
3 Challenges of Storing Billions of Files Beyond Big Data

By Tammy Batey on March 6, 2017

Big data? Make that massive data. 

Enterprises collect more data than ever, but life at the speed of data presents a number of challenges. Igneous recently explored the challenges of massive file systems, which go beyond what is generally called “big data,” by bringing together a group of people on the infrastructure front lines to share their insights.

Thanks to everyone who participated in our recent CrowdChat conversation on massive file system data. Led by host John Furrier, participants explored the data management and backup challenges that enterprises experience when Network Attached Storage (NAS) reaches billions of files. Plans are in the works for a follow-up CrowdChat conversation and we’ll let you know when we schedule it.

Before I share the insights from our Feb. 7 CrowdChat, let’s look at a few statistics. Exactly how much data are we talking about, anyway?


How much big data do enterprises generate?

A couple of big data studies sought to answer this question:

  • In 2015, organizations stored 9.3 zettabytes of data, with more than 91 percent of that data unstructured, according to a 2014 IDC Digital Universe Study
  • By 2020, data is expected to grow to 44.1 zettabytes, with more than 79 percent of that data unstructured, per the IDC study
  • 63 percent of organizations say that data is growing at a rate of 20 percent or more annually while IT budgets are growing by just 4 percent a year, according to research by the Enterprise Strategy Group, an IT analyst and strategy company
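
To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers quoted above and assumes steady compound growth over five years, which is a simplification and not something the studies themselves claim. The function names are made up for illustration.

```python
# Figures quoted in the post; constant compound growth is an assumption.
ZB_2015 = 9.3
ZB_2020 = 44.1
YEARS = 5


def implied_annual_growth(start, end, years):
    """Constant yearly growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1 / years) - 1


def compound(rate, years):
    """Multiple reached after growing at `rate` per year for `years` years."""
    return (1 + rate) ** years


# Yearly growth implied by the two IDC data points.
idc_rate = implied_annual_growth(ZB_2015, ZB_2020, YEARS)

# ESG figures: data growing 20 percent a year, IT budgets 4 percent a year.
data_multiple = compound(0.20, YEARS)
budget_multiple = compound(0.04, YEARS)
gap = data_multiple / budget_multiple

print(f"Implied IDC growth rate: {idc_rate:.1%} per year")
print(f"Data after {YEARS} years: {data_multiple:.2f}x")
print(f"Budget after {YEARS} years: {budget_multiple:.2f}x")
print(f"Data per budget dollar: {gap:.2f}x")
```

Under these assumptions, data growing at 20 percent a year reaches roughly 2.5 times its starting size in five years, while a budget growing at 4 percent reaches only about 1.2 times, so each budget dollar has to cover about twice as much data.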

Here are three big challenges faced by enterprises generating massive amounts of data:


Challenge #1 – Backing up your data

Traditional backup is a “necessary evil” in most enterprises because long-term snaps are pricey and don’t protect data in many scenarios, according to Jeff DiNisco, P1 Technologies’ Vice President of Solutions Architecture.

Another challenge when developing a solution for backing up your data? The way that enterprise executives view data storage, according to John, the CrowdChat host’s CEO. While he considers backup “super critical,” others don’t always see the value.

“Most organizations see data storage as a cost center rather than a potential gold mine,” he said.

Deciding what data to back up depends on the governing requirements regulating the user and vertical, according to Webair Chief Technology Officer Sagi Brody. The Health Insurance Portability and Accountability Act of 1996 is one example.

“Some, like HIPAA, don’t have an exact de facto standard and are up for interpretation,” Sagi said.


Challenge #2 – Choosing automation or user-initiated migration across tiers

Big data is intimidating. Massive data even more so. Manual vs. automatic. Involved users vs. uninvolved users. CrowdChat participants disagreed on the keys to successfully migrating data among tiers.

The success of tiering depends on the clarity of the data lifecycle, according to P1 Technologies’ Jeff DiNisco.

“Tiering works when the lifecycle is clear,” Jeff said. “When it’s not, it can end up pointless, especially when there’s little cost differentiation.”

Public content networks such as Netflix and Comcast turn to tiering as a solution to large, unstructured data, said Webair’s Sagi Brody. With the ever-increasing size and complexity of big data and massive data, and the costs of private line/bandwidth lag, “enterprises will be forced to do this,” Sagi said.

Bryan Champagne “completely” agrees. Bryan is the Co-founder of Congruity, which was formed in the merger of MSDI, Source Support Services, and Rockland IT Solutions.

“Tiering has been tough for many organizations as it has been manual as the nature of unstructured data can make it hard to automate,” he said. “The policy definitions for tiering will be a key to success with this, though.”

While Sagi believes users should “never” be involved with tiering, Chris Dagdigian believes users can provide valuable direction on which datasets should be archived or moved into nearline tiers. Chris is the co-founder of BioTeam Inc., which provides life science informatics and bio-IT consulting services.

“That user involvement is essential,” Chris said.  

Bryan added that although he “totally” agrees that users shouldn’t be involved, they often must be because of the nature of some of the data in research or legal discovery automation.
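
The debate above, automated policy versus user direction, can be pictured with a small sketch. The Python below is a hypothetical illustration, not any participant's or vendor's implementation: the tier names, the 30-day and 365-day thresholds, and the `FileRecord`, `POLICY` and `choose_tier` names are all assumptions made for the example. It applies an age-based policy automatically but lets a user pin a dataset to a tier.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical metadata for one file or dataset."""
    path: str
    last_accessed: datetime
    pinned_tier: Optional[str] = None  # optional user-supplied direction


# Assumed policy: (maximum idle time, tier), checked in order.
POLICY = [
    (timedelta(days=30), "primary"),
    (timedelta(days=365), "nearline"),
]
DEFAULT_TIER = "archive"  # anything idle longer than every threshold


def choose_tier(record: FileRecord, now: datetime) -> str:
    """Return the tier for a record: a user pin wins, otherwise idle time decides."""
    if record.pinned_tier is not None:
        return record.pinned_tier
    idle = now - record.last_accessed
    for max_idle, tier in POLICY:
        if idle <= max_idle:
            return tier
    return DEFAULT_TIER


if __name__ == "__main__":
    now = datetime(2017, 2, 7)
    records = [
        FileRecord("/projects/active/run42.dat", now - timedelta(days=3)),
        FileRecord("/projects/2016/results.csv", now - timedelta(days=120)),
        FileRecord("/projects/2012/raw.tar", now - timedelta(days=1700)),
        FileRecord("/legal/hold/case.pst", now - timedelta(days=900), "nearline"),
    ]
    for record in records:
        print(record.path, "->", choose_tier(record, now))
```

A policy table like this only works when the lifecycle is clear enough to pick thresholds. The pin field is one way to take user input without making users move data by hand.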

Sagi views Storage as a Solution (SaaS) as an answer to the quandary many enterprises struggle with.

“The beauty of providing Storage as a Solution – internal or external – is you can build SLAs on how often the data is needed, accessed and (the) performance, and have the flexibility to swap out platforms based on the req,” he said. “Get folks out of platform-specific questions.”


Challenge #3 – Archiving your data

A conversation about data backup wouldn’t be complete without a discussion of what data – if any – enterprises choose to delete rather than archive, according to Dave Vellante, co-host of theCUBE and co-CEO of SiliconANGLE Media.

“I’ve never seen a successful, sustainable example of deleting and reclaiming wasted space,” he said.

But even when they keep data, many companies have just one primary tier and one “last resort backup-and-archive copy,” according to BioTeam’s Chris Dagdigian.

“Nobody touches 98 percent of data 30 days after it was created,” Chris said, “(because of) lack of data awareness compounded by the fact that human data curators are more expensive than just adding capacity to a tier and punting.”
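
One way to check a claim like this against your own file system is to measure how much of a directory tree has gone untouched. The following is a minimal Python sketch, not a production tool. The `cold_fraction` function is a hypothetical name, the 30-day window simply mirrors the quote, and the result depends on last-access times, which many systems update lazily or not at all (for example, volumes mounted with `noatime`). A single-threaded walk like this would also be far too slow for billions of files.

```python
import os
import time


def cold_fraction(root, idle_days=30, now=None):
    """Fraction of bytes under `root` whose last access is older than `idle_days`."""
    now = time.time() if now is None else now
    cutoff = now - idle_days * 86400  # seconds per day
    total_bytes = 0
    cold_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                info = os.stat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            total_bytes += info.st_size
            if info.st_atime < cutoff:
                cold_bytes += info.st_size
    return cold_bytes / total_bytes if total_bytes else 0.0


if __name__ == "__main__":
    share = cold_fraction(".")
    print(f"{share:.1%} of bytes not accessed in the last 30 days")
```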

Considering new approaches to protecting unstructured data is “by far” the most common topic in data protection, said P1 Technologies’ Jeff DiNisco.

“If you’re not today, you’re trying to,” he said.

While Dave said “tape is still the fastest way to move [data] from Point A to Point B,” Jeff architects solutions for a growing number of customers that make it clear that tape is not an option.

Stuart Miniman, co-host of theCUBE and an analyst at Wikibon, agrees with Jeff that using tape as a means of archiving unstructured data is growing less popular.

“Tape has been declared dead many times,” Stuart said. “The economics of modern solutions continue to move more big deals off tape.”


Learn more

Download the Igneous whitepaper “Secondary Storage for the Cloud Era” to learn more about the growth of big data and corresponding storage challenges. Visit our Product page for details on new solutions to the storage challenges associated with large, unstructured data.

Check out our recent CrowdChat conversation on massive file system data. Hope you can join us for a future CrowdChat conversation! 

