How Igneous Solves the Problem of Data Movement for Large Datasets

by Jeff Hughes – December 5, 2017

Getting data from where it is to where it needs to be sounds simple in concept, but it becomes a big issue when your datasets are very large. The aspect that most often comes to mind is moving data across geographies, but differing formats and the impact on primary systems are equally challenging. Yet moving data well is a key function required for backup, archive, and cloud tiering.

Why is Data Movement for Large Datasets So Hard?

Many ways of moving data were designed when enterprise data was measured in gigabytes. Now that it’s measured in petabytes, many of those old techniques no longer work.

For example, one way to move data off legacy file systems is NDMP, a single-threaded protocol designed to move data linearly to tape. The constraints that shaped it no longer apply today, but protocols like it are still often in use.

How Does Igneous Solve this Problem?

Igneous moves data from primary storage in highly parallel streams. Rather than using legacy protocols like NDMP, we come in via front-end protocols such as NFS and SMB, and open many parallel streams the way that many users would. In addition, both the way we scan and the way we move data are designed around how the filers themselves are built.
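To make the idea of parallel streams concrete, here is a minimal Python sketch. It is not Igneous code; it simply assumes the filer's NFS or SMB export is already mounted as a local directory, and it copies files with a pool of worker threads so that many reads are in flight at once. The function name `copy_tree_parallel` and the `streams` parameter are made up for illustration.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_tree_parallel(src_root: Path, dst_root: Path, streams: int = 8) -> int:
    """Copy every file under src_root to dst_root with `streams` workers.

    Illustrative only: src_root is assumed to be a mounted NFS/SMB export.
    Returns the total number of bytes copied.
    """
    # Scan first, then move: collect the list of regular files to copy.
    files = [p for p in src_root.rglob("*") if p.is_file()]

    def copy_one(src: Path) -> int:
        # Recreate the relative directory layout under the destination.
        dst = dst_root / src.relative_to(src_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dst)
        return src.stat().st_size

    # Each worker behaves like one more "user" reading from the filer.
    with ThreadPoolExecutor(max_workers=streams) as pool:
        return sum(pool.map(copy_one, files))
```

Threads fit here because the work is dominated by waiting on network and disk I/O, not by CPU time. A real data mover would also need to handle metadata, permissions, and files that change during the copy, which this sketch ignores.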

Impact on Filers

Igneous is latency aware. We move data as fast as we can when the filers are quiet, and as we detect load from users or applications, we back off intelligently.

This enables backups to run continuously without creating “backup windows,” in which backup administrators tell users and application owners that the data is unavailable, say from 11pm to 4am. In our case, backups run all the time.
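One way to picture this back-off behavior is a small controller that raises or lowers the number of active streams based on the latency it observes from the filer. The sketch below uses an additive-increase, multiplicative-decrease rule, which is a generic technique chosen here for illustration and not necessarily what Igneous does. The class name, the 5 ms threshold, and the stream limits are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class StreamThrottle:
    """Adjust the number of active copy streams from observed filer latency.

    Illustrative only: the threshold and limits are hypothetical values.
    """
    min_streams: int = 1
    max_streams: int = 32
    latency_threshold_ms: float = 5.0  # assumed "filer is busy" cutoff
    streams: int = 1

    def update(self, observed_latency_ms: float) -> int:
        if observed_latency_ms > self.latency_threshold_ms:
            # Filer looks busy: halve the stream count to back off quickly.
            self.streams = max(self.min_streams, self.streams // 2)
        else:
            # Filer looks idle: add one stream to ramp up gradually.
            self.streams = min(self.max_streams, self.streams + 1)
        return self.streams
```

A data mover could call `update()` after each batch of reads, using something like the median response time of that batch as the observed latency, and size its worker pool to the returned value.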

Read Consistency

When read consistency is an issue, we have integration with APIs for the filers to take a snapshot, move data, and release the snapshot after we’re done. We’ve integrated with NetApp, Dell EMC Isilon, and Pure FlashBlade to date.
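The snapshot, move, release sequence maps naturally onto a context manager, which guarantees that the snapshot is released even if the data movement fails partway through. The sketch below is a generic Python illustration: `FilerClient`, `create_snapshot`, and `delete_snapshot` are hypothetical names, not the actual NetApp, Isilon, or FlashBlade APIs.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Protocol


class FilerClient(Protocol):
    """Hypothetical interface; real filer APIs differ per vendor."""

    def create_snapshot(self, volume: str) -> str: ...

    def delete_snapshot(self, snapshot_id: str) -> None: ...


@contextmanager
def snapshot(filer: FilerClient, volume: str) -> Iterator[str]:
    """Take a snapshot, hand its id to the caller, and always release it."""
    snap_id = filer.create_snapshot(volume)
    try:
        yield snap_id
    finally:
        # Runs on success and on failure, so snapshots are not leaked.
        filer.delete_snapshot(snap_id)


def backup_volume(
    filer: FilerClient, volume: str, move_data: Callable[[str], None]
) -> None:
    """Move data from a point-in-time snapshot of `volume`."""
    with snapshot(filer, volume) as snap_id:
        move_data(snap_id)
```

Reading from the snapshot instead of the live volume means every file in the backup reflects the same point in time, no matter how long the copy takes.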

Moving Data to Other Locations or Cloud

The key element here is to understand where you need low latency. Between the filers and the data movement software you want a low-latency connection, because the POSIX semantics involved in NFS and SMB transactions require it.

However, communication between our data mover software and our storage layers uses RESTful protocols, designed to work over WAN and Internet connections just like the Web. In fact, the RESTful protocols all work over HTTPS.

As such, we can do data movement between Igneous systems or between Igneous systems and public clouds very efficiently and reliably, without the typical retries and timeouts associated with trying to run POSIX semantics over the network.

Learn more about our data movement engine on our newly launched Technology page.
