3 Challenges of Storing Billions of Files Beyond Big Data

by Tammy Batey – March 6, 2017

Big data? Make that massive data. 

Enterprises collect more data than ever, but life at the speed of data presents a number of challenges. Igneous recently explored the challenges of massive file systems that go beyond what is generally called “big data” by bringing together a group of people on the infrastructure frontlines to share their insights.

Thanks to everyone who participated in our recent CrowdChat conversation on massive file system data. Led by host John Furrier, participants explored the data management and backup challenges that enterprises experience when Network Attached Storage (NAS) reaches billions of files. Plans are in the works for a follow-up CrowdChat conversation and we’ll let you know when we schedule it.

Before I share the insights from our Feb. 7 CrowdChat, let’s look at a few statistics. Exactly how much data are we talking about, anyway?


How much big data do enterprises generate?

A couple of big data studies sought to answer this question:

  • In 2015, organizations stored 9.3 zettabytes of data, with more than 91 percent of that data unstructured, according to a 2014 IDC Digital Universe Study
  • By 2020, data is expected to grow to 44.1 zettabytes, with more than 79 percent of that data unstructured, per the IDC study
  • 63 percent of organizations say that data is growing at a rate of 20 percent or more annually while IT budgets are growing by just 4 percent a year, according to research by the Enterprise Strategy Group, an IT analyst and strategy company
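
To get a feel for how quickly that gap between data growth and budget growth compounds, here is a minimal Python sketch. It assumes, purely for illustration, that data volume and IT budget both start at an index of 100 and grow at constant annual rates of 20 percent and 4 percent. The `compound` function, the starting index, and the five-year horizon are hypothetical choices, not figures from the ESG research.

```python
def compound(start: float, annual_rate: float, years: int) -> float:
    """Return the value after growing `start` by `annual_rate` once per year."""
    return start * (1 + annual_rate) ** years


if __name__ == "__main__":
    # Hypothetical starting index of 100 for both data volume and IT budget.
    for year in range(0, 6):
        data = compound(100.0, 0.20, year)
        budget = compound(100.0, 0.04, year)
        print(f"year {year}: data={data:6.1f}  budget={budget:6.1f}  ratio={data / budget:.2f}")
```

Under these assumptions, data roughly doubles relative to budget within five years, which is one way to see why storage costs draw so much scrutiny.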

Download the Igneous whitepaper “Secondary Storage for the Cloud Era” for more details on this growth and corresponding storage challenges. Big data is just the start.

Here are three big challenges faced by enterprises generating massive amounts of data:


Challenge #1 – Backing up your data

Traditional backup is a “necessary evil” in most enterprises because long-term snaps are pricey and don’t protect data in many scenarios, according to Jeff DiNisco, P1 Technologies’ Vice President of Solutions Architecture.

Another challenge when developing a solution for backing up your data? The way that enterprise executives view data storage, according to John, the CrowdChat host’s CEO. While he considers backup “super critical,” others don’t always see the value.

“Most organizations see data storage as a cost center rather than a potential gold mine,” he said.

Deciding what data to back up depends on the governing requirements regulating the user and vertical, according to Webair Chief Technology Officer Sagi Brody. The Health Insurance Portability and Accountability Act of 1996 is one example.

“Some, like HIPAA, don’t have an exact de facto standard and are up for interpretation,” Sagi said.


Challenge #2 – Choosing automation or user-initiated migration across tiers

Big data is intimidating. Massive data even more so. Manual vs. automatic. Involved users vs. uninvolved users. CrowdChat participants disagreed on the keys to successfully migrating data among tiers.

The success of tiering depends on the clarity of the data lifecycle, according to P1 Technologies’ Jeff DiNisco.

“Tiering works when the lifecycle is clear,” Jeff said. “When it’s not, it can end up pointless, especially when there’s little cost differentiation.”

Public content networks such as Netflix and Comcast turn to tiering as a solution to large, unstructured data, said Webair’s Sagi Brody. With the ever-increasing size and complexity of big data and massive data, and the costs of private line/bandwidth lag, “enterprises will be forced to do this,” Sagi said.

Bryan Champagne “completely” agrees. Bryan is the Co-founder of Congruity, which was formed in the merger of MSDI, Source Support Services, and Rockland IT Solutions.

“Tiering has been tough for many organizations as it has been manual as the nature of unstructured data can make it hard to automate,” he said. “The policy definitions for tiering will be a key to success with this, though.”
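
To make the idea of a tiering policy definition concrete, here is a minimal Python sketch of an age-based rule. The tier names, the 30-day and 365-day thresholds, and the `choose_tier` function are all hypothetical. None of the CrowdChat participants described this specific approach, and a real policy would likely weigh more than last-access time.

```python
from datetime import datetime, timedelta

# Hypothetical policy: (maximum days since last access, tier name), checked in order.
POLICY = [(30, "primary"), (365, "nearline")]
DEFAULT_TIER = "archive"  # anything older than every threshold above


def choose_tier(last_access: datetime, now: datetime) -> str:
    """Pick a tier from the whole days elapsed since the file was last accessed."""
    age_days = (now - last_access).days
    for max_days, tier in POLICY:
        if age_days <= max_days:
            return tier
    return DEFAULT_TIER


if __name__ == "__main__":
    now = datetime(2017, 3, 6)
    for days in (5, 90, 400):
        print(days, "days idle ->", choose_tier(now - timedelta(days=days), now))
```

A rule this simple only works when the data lifecycle is clear. Datasets with murkier lifecycles are where the user input discussed next comes into play.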

While Sagi believes users should “never” be involved with tiering, Chris Dagdigian believes users can provide valuable direction on which datasets should be archived or moved into nearline tiers. Chris is the co-founder of BioTeam Inc., which provides life science informatics and bio-IT consulting services.

“That user involvement is essential,” Chris said.  

Bryan added that while he “totally” agrees that users shouldn’t be involved, they often must be because of the nature of some of the data in research or legal discovery automation.

Sagi views Storage as a Solution (SaaS) as the answer to a quandary with which many enterprises struggle.

“The beauty of providing Storage as a Solution – internal or external – is you can build SLAs on how often the data is needed, accessed and (the) performance, and have the flexibility to swap out platforms based on the req,” he said. “Get folks out of platform-specific questions.”


Challenge #3 – Archiving your data

A conversation about data backup wouldn’t be complete without a discussion of what data – if any – enterprises choose to delete rather than archive, according to Dave Vellante, co-host of theCUBE and co-CEO of SiliconANGLE Media.

“I’ve never seen a successful, sustainable example of deleting and reclaiming wasted space,” he said.

But even when they keep data, many companies have just one primary tier and one “last resort backup-and-archive copy,” according to BioTeam’s Chris Dagdigian.

“Nobody touches 98 percent of data 30 days after it was created,” Chris said, “(because of) lack of data awareness compounded by the fact that human data curators are more expensive than just adding capacity to a tier and punting.”

Considering new approaches to protecting unstructured data is “by far” the most common topic in data protection, said P1 Technologies’ Jeff DiNisco.

“If you’re not today, you’re trying to,” he said.

While Dave said “tape is still the fastest way to move [data] from Point A to Point B,” Jeff architects solutions for a growing number of customers that make it clear that tape is not an option.

Stuart Miniman, co-host of theCUBE and an analyst at Wikibon, agrees with Jeff that using tape as a means of archiving unstructured data is growing less popular.

“Tape has been declared dead many times,” Stuart said. “The economics of modern solutions continue to move more big deals off tape.”


Learn more

Download the Igneous whitepaper “Secondary Storage for the Cloud Era” to learn more about the growth of big data and corresponding storage challenges. Visit our Product page for details on new solutions to the storage challenges associated with large, unstructured data.

Check out our recent CrowdChat conversation on massive file system data. Hope you can join us for a future CrowdChat conversation! 

