For most of IT history, the focus of data protection and management has been on structured data. This is what most of us picture when we think of “data”: numbers and strings in neat rows and columns.
However, unstructured data is now growing faster than structured data, which poses new data management challenges and opens up exciting new opportunities. Enterprises need to pivot their data management strategies toward this increasingly valuable unstructured data.
But let’s not get ahead of ourselves. First, let’s talk about the difference between structured and unstructured data.
Structured vs. Unstructured Data
While structured data, such as numbers, dates, and strings, can be represented by rows and columns, unstructured data cannot. Examples of unstructured data include images, audio, videos, e-mails, spreadsheets, and word processing documents—essentially, things stored as files. Unstructured data tends to be much larger and take up more storage than structured data.
Why Does Unstructured Data Matter?
In today’s data-driven economy, unstructured data has become core to business offerings as well as essential to business operations.
Unstructured data makes up 80% of enterprise data, according to Gartner. It’s not just the e-mails, reports, spreadsheets, and presentations that employees produce daily; unstructured data is increasingly generated by machines such as lab equipment, electronic design software, and geospatial modeling software.
And unstructured data is growing quickly. According to IDC, it grows at 26.8% annually, compared with 19.6% for structured data.
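To get a feel for what those two rates imply, a quick compounding calculation helps. The sketch below is illustrative only: it assumes both rates stay constant from year to year and takes the 80/20 split quoted above as the starting point. Neither assumption comes from the Gartner or IDC figures themselves, and the function names are made up for this example.

```python
import math


def doubling_time(annual_rate):
    """Years for a quantity to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_rate)


def projected_unstructured_share(years, unstructured=80.0, structured=20.0,
                                 u_rate=0.268, s_rate=0.196):
    """Share of total data that is unstructured after `years` of compounding.

    Assumes constant rates and an 80/20 starting split (illustrative only).
    """
    u = unstructured * (1 + u_rate) ** years
    s = structured * (1 + s_rate) ** years
    return u / (u + s)


if __name__ == "__main__":
    print(f"unstructured doubles in {doubling_time(0.268):.1f} years")
    print(f"structured doubles in {doubling_time(0.196):.1f} years")
    print(f"unstructured share after 5 years: {projected_unstructured_share(5):.1%}")
```

Under these assumptions, unstructured data doubles in a little under three years, while structured data takes closer to four, so the gap between the two keeps widening.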
Unstructured Data in the Machine Learning Era
As businesses embrace the opportunity of machine learning, unstructured data is poised to play a key role. Why? Because many machine learning algorithms can extract insights from unstructured data that were previously impossible to derive.
That is, these ML algorithms turn unstructured data, such as images, audio, and videos, into output that can then be understood and processed programmatically.
For example, we have customers in the life sciences industry using ML to derive insights from tumor imaging data and brain scan data. Through convolutional neural networks, unstructured data in the form of images can be used to develop algorithms that help diagnose cancer and other diseases. Another prominent example of ML, natural language processing (NLP), uses unstructured data in the form of text and audio to develop algorithms that enable computers to process and understand human language.
Unstructured Data Management Needs to Be Different
With unstructured data, it is harder to know what you have, and traditional programs struggle to digest it. Managing it therefore requires a different approach.
With structured data, users have granular access to the data, which is easily processed by traditional programs. Think of how easy it is to search a relational database and understand what data you have when dealing with data represented by numbers and strings; meanwhile, it’s not so easy to search for and categorize files such as images or videos.
In addition, updating structured data is as easy as going into the database and changing the value, while updating unstructured data may require replacing the entire file.
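The contrast in the two paragraphs above can be sketched in a few lines. This is a minimal illustration, not a recommendation of any particular tool: the `orders` table, its columns, and the helper names are hypothetical, and it uses Python's built-in `sqlite3` module as a stand-in for a relational database.

```python
import os
import sqlite3


def structured_demo():
    """Search and update structured data by value (hypothetical `orders` table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer, total) VALUES (?, ?)",
        [("acme", 120.0), ("globex", 75.5), ("acme", 30.0)],
    )
    # Searching: one query answers "which orders belong to acme?"
    rows = conn.execute(
        "SELECT id, total FROM orders WHERE customer = ? ORDER BY id", ("acme",)
    ).fetchall()
    # Updating: change a single value in place.
    conn.execute("UPDATE orders SET total = ? WHERE id = ?", (125.0, 1))
    updated = conn.execute("SELECT total FROM orders WHERE id = 1").fetchone()[0]
    conn.close()
    return rows, updated


def replace_file(path, new_bytes):
    """Update a file the blunt way: write a full new copy, then swap it in."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(new_bytes)
    os.replace(tmp_path, path)  # the whole file is replaced, not one field
    return os.path.getsize(path)
```

The database side works on individual values; the file side has no notion of a field to query or update, so the simplest safe change is to rewrite the file and swap it in.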
As a result, data protection strategies differ for unstructured data. Structured data is traditionally backed up by integrating with the database transaction log and capturing only the changes; backing up unstructured data instead requires taking a snapshot of the filesystem.
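As a rough illustration of the snapshot idea, the sketch below records a content hash for every file under a directory and compares two such records to see what changed. It is a file-level stand-in written for this example, not how any particular backup product or filesystem implements snapshots, and the function names are hypothetical.

```python
import hashlib
import os


def take_snapshot(root):
    """Map each file's path (relative to `root`) to a SHA-256 hash of its contents."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(full_path, "rb") as f:
                # Read in 1 MiB chunks so large files do not need to fit in memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[os.path.relpath(full_path, root)] = digest.hexdigest()
    return manifest


def diff_snapshots(old, new):
    """Return sorted lists of added, removed, and changed paths between two snapshots."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return added, removed, changed
```

Note that this approach has to walk and read every file to find out what changed, whereas a transaction log already lists the changes. That difference is a large part of why protecting unstructured data at scale is a harder problem.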
Today’s enterprises need to take control of their growing unstructured data, or risk losing out on a valuable opportunity—and this requires a data management platform that’s built specifically to handle unstructured data at scale.
What challenges has your organization faced in managing unstructured data?
Learn more about the challenges of managing unstructured data at scale in this Data Center Knowledge article.