Pure Storage Data Hub: A Data-First Strategy

By Shaun Walsh on October 17, 2018

IDC’s DataAge 2025 report states that the global datasphere will grow from 16ZB in 2016 to 163ZB by 2025. This data, more than 70% of which is unstructured and object-based, is not only being generated by more diverse sources such as edge computing and IoT, but is also being operationalized in new ways with AI/ML and real-time analytics. These changes in data sources and workflows require a new strategic model to manage multi-dimensional workflow performance, protect data at scale, enable data sharing, and provide data mobility from anywhere.

Pure Storage recently introduced a new vision for modern storage, the data hub. The premise of the data hub is both simple and profound: consolidate silos of storage, make all data shareable and accessible, and accelerate application performance. On its website, Pure Storage summarizes the data hub vision this way:

A data hub is a modern, data-centric architecture for storage – powering analytics and AI by enabling enterprises to consolidate and share data in today’s data-first world. Unlike data lake and legacy DAS architectures engineered primarily to store data, a data hub is designed to share data.

Igneous follows a similar principle with our Data Protection as-a-Service capabilities. We enable data protection across all unstructured platforms, whether on-premises or cloud-based. Our technology enables you to scan, index, protect, and move your data to the right storage, at the right time, and at the right price point. Pure and Igneous share a vision for enabling high-performance workflows, scalability, adaptive data movement, and massive parallelism to create a shared storage strategy that eliminates silos at the edge, in the datacenter, and across the cloud.

The Creation of Storage Silos

Storage silos are created for many practical reasons, such as supporting legacy systems, evolving application performance requirements (Spark or SAP), new computing models (edge computing, IoT, AI), regulatory and privacy issues, and M&A activities. The other significant source of storage silos is the limitations of storage infrastructure vendors that do not provide the ability to move and share data, export metadata, or enable other forms of analytics. These limitations inhibit a single understanding of an organization's datasphere.

Data today lives in a world of silos

In the past, these issues prevented IT from building the enterprise-wide, shareable data repositories required to monetize and operationalize data via AI/ML and analytics. The data hub’s modern storage model is the first step in breaking these silos.

Igneous consistently encounters unstructured data silos in environments where multiple NAS, object, and cloud storage options are deployed. Over time, each new storage solution can become a new silo, and we are working with our customers and partners to break this repeating cycle. One of the key capabilities of our open platform is the ability to scan permissions for NFS, SMB, and S3 and then migrate data from individual silos into a unified data hub model using our data protection solution.

Unifying the Silos

As with many things in life, understanding the problem is the key to finding a solution. The diagram below shows the diverse challenges of creating a single shareable storage model. The biggest challenge in breaking the current storage silos is that traditional storage systems are not designed to deliver multiple streams of the same data with the varying levels of performance required by each workflow. Instead, they are designed to be tuned to the needs of a single set of applications.

[Diagram: storage, performance, scaling, and protection requirements by application type]

In the diagram above, you can see how each major type of application has different storage, performance, scaling, and protection requirements.

"The data hub’s software coordinates the delivery of data to applications as that data is needed and with the performance characteristics required. This becomes a true enabler of the software-defined data center, providing IT practitioners the ability to deliver capabilities that are data-centric defined by the needs of the applications. Practitioners are not limited by the inherent inflexibilities of convergence and other architectures that rely on the closely-coupled pooling of resources."

The data hub model will support multiple dimensions of storage performance across the spectrum of applications, which will enable the unification of platforms into a single shareable storage service. This will unlock the value, accessibility, and usability of data in your organization. Igneous is aligned with Pure Storage in both the vision and the execution of a data hub. We provide:

  1. High-throughput file and object data management - Our dynamic, latency-sensitive data movement engine can adjust performance to ensure that applications are never operationally impacted during data protection.
  2. Seamless native scale-out - We can scale our index, data management, and storage capabilities dynamically via the cloud to support massive requirements for billions of files and zettabytes of data.
  3. Multi-dimensional performance - We can adjust our performance based on policy, application needs, and assigned data movers.
  4. Massively parallel - Our data protection platform was designed to be massively parallel, to avoid the limitations of previous generations of data protection solutions.
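
To make the idea of latency-sensitive data movement more concrete, here is a minimal sketch of one common way such throttling can work. It is not the Igneous implementation, which this post does not describe. It assumes a hypothetical `AdaptiveThrottle` class that backs off quickly when observed latency on primary storage exceeds a target and ramps up slowly otherwise (an additive-increase, multiplicative-decrease policy). All names, defaults, and thresholds are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveThrottle:
    """Hypothetical AIMD-style throttle for a background data mover.

    Illustrative only: the class name, fields, and defaults are assumptions,
    not part of any Igneous or Pure Storage API.
    """
    target_latency_ms: float  # latency that primary storage should stay under
    min_workers: int = 1      # never drop below this many copy streams
    max_workers: int = 64     # never exceed this many copy streams
    workers: int = 8          # current number of parallel copy streams

    def update(self, observed_latency_ms: float) -> int:
        """Adjust parallelism from one latency sample and return the new value."""
        if observed_latency_ms > self.target_latency_ms:
            # Applications are feeling pressure: back off quickly by halving.
            self.workers = max(self.min_workers, self.workers // 2)
        else:
            # Headroom is available: ramp up slowly, one stream at a time.
            self.workers = min(self.max_workers, self.workers + 1)
        return self.workers


def simulate(latencies_ms, throttle):
    """Feed a sequence of latency samples through the throttle."""
    return [throttle.update(sample) for sample in latencies_ms]


if __name__ == "__main__":
    throttle = AdaptiveThrottle(target_latency_ms=10.0)
    # Latency samples in milliseconds, as a monitoring probe might report them.
    print(simulate([5, 5, 20, 5, 50, 50, 50, 50], throttle))
```

In this sketch, a single latency spike halves the number of copy streams, while recovery takes many healthy samples. That asymmetry is the design choice that favors application performance over data protection throughput.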

Learn more about how Igneous and Pure Storage work together.
