
How Igneous Solves the Problem of Data Movement for Large Datasets

by Jeff Hughes – December 5, 2017

Getting data from where it is to where it needs to be sounds simple in concept, but it becomes a serious challenge when your datasets are very large. Though the aspect that most often comes to mind is moving data across geographies, dealing with different formats and limiting the impact on primary systems are equally challenging. Yet moving data well is a key function required for backup, archive, and cloud tiering.


Why is Data Movement for Large Datasets So Hard?

Many of the tools for moving data were designed when enterprise data was measured in gigabytes. Now that it's measured in petabytes, those old techniques no longer work.

For example, one way to move data off legacy file systems was NDMP, a single-threaded protocol designed to move data linearly to tape. Those constraints no longer apply today, but the protocol is still widely used.

How Does Igneous Solve this Problem?

Igneous moves data from primary storage in highly parallel streams. Rather than using legacy protocols like NDMP, we come in through front-end protocols such as NFS and SMB and open many parallel streams, the way a large population of users would. In addition, the way we scan and the way we move data are designed intelligently, specifically around how the filers are built.
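To make the idea concrete, here is a minimal sketch in Go of a parallel mover reading through a mounted NFS/SMB export. The worker count, paths, and local staging destination are hypothetical illustrations, not Igneous's actual implementation; a real mover would walk the namespace intelligently and stream into an object store rather than taking a fixed file list.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
	"sync"
)

// copyOut streams a single file from the mounted filer export to a
// destination directory. In a real mover the sink would be an object
// store, not the local filesystem.
func copyOut(src, dstDir string) error {
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()

	out, err := os.Create(filepath.Join(dstDir, filepath.Base(src)))
	if err != nil {
		return err
	}
	defer out.Close()

	_, err = io.Copy(out, in)
	return err
}

func main() {
	const workers = 32 // many parallel streams, like many concurrent users
	// Hypothetical paths on a mounted export, for illustration only.
	files := []string{"/mnt/filer/export/a.dat", "/mnt/filer/export/b.dat"}

	jobs := make(chan string)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for f := range jobs {
				if err := copyOut(f, "/staging"); err != nil {
					fmt.Fprintln(os.Stderr, "copy failed:", f, err)
				}
			}
		}()
	}
	for _, f := range files {
		jobs <- f
	}
	close(jobs)
	wg.Wait()
}
```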

Impact on Filers

Igneous is latency aware. We move data as fast as we can while the filers are quiescent, and as we detect load from users or applications, we back off intelligently.

This enables backups to run continuously, without the "backup windows" in which backup administrators tell users and application owners that data is unavailable from, say, 11pm to 4am. In our case, backups run all the time.
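One common way to implement this kind of throttling is an AIMD (additive-increase, multiplicative-decrease) controller keyed off observed filer latency. The sketch below is an assumption about the general approach, not Igneous's code; the thresholds, names, and limits are illustrative only.

```go
package mover

import "time"

// adjustWorkers is a hypothetical AIMD-style throttle: if observed
// filer latency climbs well above a quiet-time baseline, cut the
// number of parallel streams sharply; when the filer looks quiescent,
// add streams back one at a time.
func adjustWorkers(current int, observed, baseline time.Duration) int {
	const maxWorkers = 64 // illustrative upper bound

	if observed > 2*baseline {
		// Users or applications are hitting the filer: back off hard.
		if current/2 < 1 {
			return 1
		}
		return current / 2
	}
	// Filer looks quiescent: ramp back up gently.
	if current < maxWorkers {
		return current + 1
	}
	return current
}
```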

Read Consistency

When read consistency is an issue, we integrate with the filers' APIs to take a snapshot, move data from it, and release the snapshot when we're done. To date, we've integrated with NetApp, Dell EMC Isilon, and Pure Storage FlashBlade.
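Conceptually, the snapshot workflow looks like the sketch below. The SnapshotAPI interface and its method names are hypothetical placeholders standing in for each vendor's own REST calls; the point is simply that the snapshot brackets the data movement and is always released, even if the move fails.

```go
package mover

// SnapshotAPI abstracts the vendor-specific calls (NetApp, Isilon,
// and FlashBlade each expose their own API); these method names are
// hypothetical placeholders, not any vendor's actual endpoints.
type SnapshotAPI interface {
	CreateSnapshot(volume string) (snapID string, err error)
	DeleteSnapshot(volume, snapID string) error
}

// backupWithSnapshot pins a read-consistent view of the volume, runs
// the data movement against that snapshot, and releases the snapshot
// when done. The deferred delete fires even if the move fails, so no
// stale snapshots are left consuming space on the filer.
func backupWithSnapshot(api SnapshotAPI, volume string, move func(snapID string) error) error {
	snapID, err := api.CreateSnapshot(volume)
	if err != nil {
		return err
	}
	defer api.DeleteSnapshot(volume, snapID)

	return move(snapID)
}
```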

Moving Data to Other Locations or Cloud

The key element here is understanding where you need low latency. You want a low-latency connection between the filers and the data movement software, because the POSIX semantics involved in NFS and SMB transactions require it.

However, the communication between our data mover software and our storage layers uses RESTful protocols, designed to work over WAN and Internet connections just like the web. In fact, these RESTful protocols all run over HTTPS.

As a result, we can move data between Igneous systems, or between Igneous systems and public clouds, efficiently and reliably, without the retries and timeouts typical of stretching POSIX semantics across a wide-area network.
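As a rough illustration of why REST travels well, here is a sketch of a single object PUT with jittered exponential backoff. The endpoint and retry parameters are assumptions, not our actual protocol; the property that matters is that each HTTPS request is self-contained, so a transient failure costs one resend rather than a stalled POSIX session.

```go
package mover

import (
	"bytes"
	"fmt"
	"math/rand"
	"net/http"
	"time"
)

// putObject uploads one chunk to a REST endpoint over HTTPS, retrying
// with jittered exponential backoff. Because each PUT is stateless and
// idempotent, a failed attempt is simply resent; nothing has to be
// torn down and rebuilt the way a broken POSIX session would.
func putObject(url string, data []byte) error {
	const maxAttempts = 5
	backoff := time.Second

	for attempt := 1; attempt <= maxAttempts; attempt++ {
		req, err := http.NewRequest(http.MethodPut, url, bytes.NewReader(data))
		if err != nil {
			return err
		}
		resp, err := http.DefaultClient.Do(req)
		if err == nil {
			resp.Body.Close()
			if resp.StatusCode < 300 {
				return nil
			}
		}
		// Transient failure: wait with jitter, then retry.
		time.Sleep(backoff + time.Duration(rand.Int63n(int64(backoff))))
		backoff *= 2
	}
	return fmt.Errorf("upload to %s failed after %d attempts", url, maxAttempts)
}
```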

Learn more about our data movement engine on our newly launched Technology page.
