In the Machine Learning Era, Unstructured Data Management is More Important Than Ever

by Catherine Chiang – July 31, 2018

For most of IT history, the focus of data protection and management has been on structured data. This is what most of us picture when we think of “data”: numbers and strings in neat rows and columns.

However, unstructured data is now growing faster than structured data, which poses new challenges for data management and opens exciting new opportunities. Enterprises need to pivot their data management strategies to focus on their increasingly valuable unstructured data.

But let’s not get ahead of ourselves. First, let’s talk about the difference between structured and unstructured data.

Structured vs. Unstructured Data

While structured data, such as numbers, dates, and strings, can be represented by rows and columns, unstructured data cannot. Examples of unstructured data include images, audio, videos, e-mails, spreadsheets, and word processing documents: essentially, things stored as files. Unstructured data tends to be much larger and take up more storage than structured data.


Why Does Unstructured Data Matter?

In today’s data-driven economy, unstructured data has become core to business offerings as well as essential to business operations.

Unstructured data makes up 80% of enterprise data, according to Gartner. It’s not just the e-mails, reports, spreadsheets, and presentations that employees produce daily; unstructured data is increasingly generated by machines such as lab equipment, electronic design software, and geospatial modeling software.

And unstructured data is growing quickly. According to IDC, unstructured data grows at 26.8% annually, compared with 19.6% annually for structured data.
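To get a feel for what those growth rates mean, here is a rough back-of-the-envelope sketch in Python. It assumes both rates stay constant and compound annually, and it combines the 80% figure with the IDC growth rates purely for illustration; the function names are made up for this example.

```python
import math

def doubling_time(annual_growth: float) -> float:
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)

def unstructured_share(years: int,
                       start_share: float = 0.80,          # assumed starting point
                       unstructured_growth: float = 0.268,
                       structured_growth: float = 0.196) -> float:
    """Share of all data that is unstructured after `years` of compound growth."""
    unstructured = start_share * (1 + unstructured_growth) ** years
    structured = (1 - start_share) * (1 + structured_growth) ** years
    return unstructured / (unstructured + structured)

print(f"Unstructured data doubles in about {doubling_time(0.268):.1f} years")
print(f"Structured data doubles in about {doubling_time(0.196):.1f} years")
print(f"Unstructured share after 5 years: {unstructured_share(5):.1%}")
```

Under these assumptions, the gap between the two growth rates keeps nudging the unstructured share upward every year.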

Unstructured Data in the Machine Learning Era

As businesses embrace the opportunity of machine learning, unstructured data is poised to play a key role. Why? Because many machine learning algorithms can extract insights from unstructured data that were previously impossible to derive.

That is, these ML algorithms turn unstructured data, such as images, audio, and videos, into insights that can then be understood and processed programmatically.

For example, we have customers in the life sciences industry using ML to derive insights from tumor imaging data and brain scan data. Through convolutional neural networks, unstructured data in the form of images can be used to develop algorithms that diagnose cancer and other diseases. Another prominent example of ML, natural language processing (NLP), uses unstructured data in the form of text and audio to develop algorithms that enable computers to process and understand human language.

Unstructured Data Management Needs to Be Different

Managing unstructured data requires a different approach, because its nature makes it more difficult to know what’s there and traditional programs struggle to digest it.

With structured data, users have granular access to the data, which is easily processed by traditional programs. Think of how easy it is to search a relational database and understand what data you have when dealing with data represented by numbers and strings; meanwhile, it’s not so easy to search for and categorize files such as images or videos.
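The contrast can be sketched in a few lines of Python. The table, column names, and file names below are invented for illustration. The structured side is a single query; the unstructured side can only guess at what each file is from its name, without knowing anything about what is inside it.

```python
import mimetypes
import sqlite3
from collections import Counter

# Structured data: one query answers "what do we have?" directly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scans (patient_id TEXT, scan_date TEXT, size_mb REAL)")
conn.executemany(
    "INSERT INTO scans VALUES (?, ?, ?)",
    [("p001", "2018-07-01", 512.0), ("p002", "2018-07-15", 498.5)],
)
recent = conn.execute(
    "SELECT patient_id FROM scans WHERE scan_date >= ?", ("2018-07-10",)
).fetchall()

# Unstructured data: without opening each file, the name is the main clue.
def categorize(filenames):
    """Count files by top-level MIME type guessed from the file extension."""
    counts = Counter()
    for name in filenames:
        mime, _encoding = mimetypes.guess_type(name)
        counts[mime.split("/")[0] if mime else "unknown"] += 1
    return counts

# Hypothetical file names; the last one has no extension to guess from.
files = ["brain_scan_001.png", "interview.mp3", "site_survey.mp4", "instrument_dump"]
summary = categorize(files)

print(recent)
print(dict(summary))
```

Even this coarse categorization says nothing about the content of the image or recording, which is exactly the gap that ML-based analysis aims to fill.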

In addition, updating structured data is as easy as going into the database and changing the value, while updating unstructured data may require replacing the entire file.
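Here is a minimal sketch of that difference, again with hypothetical table and function names. The database update touches one value; the file update writes a complete new copy and swaps it into place, which is one common way to replace a file safely rather than the only way.

```python
import os
import sqlite3
import tempfile

# Structured: change a single value in place.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO samples VALUES (1, 'pending')")
conn.execute("UPDATE samples SET status = 'reviewed' WHERE id = 1")
status = conn.execute("SELECT status FROM samples WHERE id = 1").fetchone()[0]

# Unstructured: write the whole new version, then swap it in.
def replace_file(path: str, new_bytes: bytes) -> None:
    """Write the new version to a temporary file, then move it over the old one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_bytes)
        os.replace(tmp_path, path)  # swaps the new file in under the old name
    except BaseException:
        os.unlink(tmp_path)  # clean up the temporary file if anything failed
        raise
```

Note that the cost of the file update scales with the size of the file, not with the size of the change.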

As a result, data protection strategies differ for unstructured data. Rather than the traditional way of backing up structured data, which involves integrating with the database transaction log and only backing up changes, backing up unstructured data requires taking a snapshot of the filesystem.
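To illustrate the idea of capturing a point-in-time view of a set of files, here is a simplified Python sketch. It is not a true filesystem snapshot, which storage systems typically implement at a lower level, and it is not how any particular product works. It only records a manifest of file hashes and compares two manifests; the function names are assumptions made for this example.

```python
import hashlib
import os

def snapshot(root: str) -> dict:
    """Map each file's path (relative to root) to a SHA-256 digest of its contents."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            manifest[os.path.relpath(full, root)] = digest
    return manifest

def changed_since(previous: dict, current: dict) -> dict:
    """Compare two manifests and list added, removed, and modified paths."""
    return {
        "added": sorted(set(current) - set(previous)),
        "removed": sorted(set(previous) - set(current)),
        "modified": sorted(
            path for path in current
            if path in previous and current[path] != previous[path]
        ),
    }
```

Calling snapshot() before and after a change and passing both results to changed_since() shows which files would need to be protected again. Reading every file in full, as this sketch does, would not scale to very large file sets.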

Today’s enterprises need to take control of their growing unstructured data or risk losing out on a valuable opportunity, and this requires a data management platform that’s built specifically to handle unstructured data at scale.

What challenges have your organization faced in managing unstructured data?

Learn more about the challenges of managing unstructured data at scale in this Data Center Knowledge article.
