What is Unstructured Data?
Unstructured data is any data that is not in block format. It is often stored as files on network-attached storage (NAS) systems, but it can also be stored as objects, as it typically is in public cloud. It’s likely to be images, video, or log files.
Structured data, on the other hand, is often stored in relational databases or VMs. It’s more likely to consist of numbers and text.
According to most sources, unstructured data is growing much faster than structured data and makes up 80-90% of an organization’s data. In many cases, this is because unstructured data is more likely to be machine-generated, so it grows a whole dataset at a time, around the clock. In certain industries, modern design and editing software, trading algorithms, scientific equipment, or sensors can generate hundreds of terabytes per week.
What is Data Management, and How Does it Differ for Unstructured Data?
When data is important to your business, there’s a set of things you need to do in addition to storing and using it. For Igneous, data management consists of four main pillars: SEE, ORGANIZE, PROTECT, and MOVE.
Managing unstructured data is more difficult than managing structured data, and the reasons show up in each of these pillars.
Pillar 1: SEE
To get the full value out of your data, you first need to know what you have. If you’re dealing with structured data, this is fairly straightforward. If you’re dealing with unstructured data, however, it’s not as easy as outsiders would assume. You might have to spend weeks and scan tens of billions of files just to answer questions like: How much data do you have? Do you have a lot of huge or tiny files? Is your data changing, and how quickly? How often is it accessed, and by what applications and business units? How much are you paying to store your data in various tiers, and how often does this investment pay off?
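To make the first few of those questions concrete, here is a minimal sketch of a single-threaded scan that answers "how much data, how many files, and how are sizes distributed" for one directory tree. The bucket boundaries and the names `SIZE_BUCKETS`, `TreeSummary`, and `summarize_tree` are assumptions made for illustration; they are not part of any Igneous product, and a walk like this would not be fast enough for billions of files.

```python
import os
import stat
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical size buckets: (exclusive upper bound in bytes, label).
SIZE_BUCKETS = [
    (64 * 1024, "tiny (<64 KiB)"),
    (1024 ** 3, "medium (<1 GiB)"),
    (float("inf"), "huge (>=1 GiB)"),
]


@dataclass
class TreeSummary:
    file_count: int = 0
    total_bytes: int = 0
    by_bucket: Counter = field(default_factory=Counter)
    newest_mtime: float = 0.0  # most recent modification time seen


def bucket_for(size):
    """Return the label of the first bucket whose bound exceeds size."""
    for limit, label in SIZE_BUCKETS:
        if size < limit:
            return label
    return SIZE_BUCKETS[-1][1]


def summarize_tree(root):
    """Walk root once and collect basic size and change statistics."""
    summary = TreeSummary()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # count regular files only
            summary.file_count += 1
            summary.total_bytes += st.st_size
            summary.by_bucket[bucket_for(st.st_size)] += 1
            summary.newest_mtime = max(summary.newest_mtime, st.st_mtime)
    return summary
```

Calling `summarize_tree("/some/share")` on a mounted share returns one `TreeSummary`. Questions about access patterns and storage cost need data this sketch does not collect, such as access logs and per-tier pricing.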
Pillar 2: ORGANIZE
This second pillar is just as important as the first. Data needs to be in the right place at the right time, with the right permissions. It needs to be searchable based on its metadata. For structured data, this pillar is easy; for unstructured data, it’s a lot more difficult, and so it is often ignored.
Pillar 3: PROTECT
The third pillar, PROTECT, is so vital that it is often used as a stand-in for all of data management. Replication, backups, archiving of backups, and copy data management all fall under the PROTECT bucket. Across both structured and unstructured data, from purpose-built backup appliances to open-source solutions, organizations are bombarded with options for protecting their data. The challenge is to find the best data protection option for your data. For the best long-term mobility and scalability, your data protection strategy for any particular volume ought to depend on the type of data and its value, rather than being tied to a certain storage vendor.
Pillar 4: MOVE
The final pillar, MOVE, starts with the need to archive data that has long-term value but doesn’t need to be highly available at all times. For organizations dealing with more than a petabyte or so of data, it extends into the need to automatically tier cool data and copy data into secondary storage, and cold data into tertiary storage. Once you understand what data you have, you’ll want to be able to put it where it belongs, so it can be accessed by the proper teams and applications without taking up space in costly high-performance storage unless that’s warranted.
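As a rough illustration of automatic tiering, the sketch below assigns each file to a storage tier based on how long ago it was last accessed. The 30-day and 365-day thresholds, the tier names, and the functions `choose_tier` and `plan_moves` are all hypothetical; a real policy would also weigh the data’s type and value and the cost of each tier.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical thresholds; tune these to your own access patterns.
COOL_AFTER = timedelta(days=30)   # untouched this long -> secondary storage
COLD_AFTER = timedelta(days=365)  # untouched this long -> tertiary storage


def choose_tier(last_access, now):
    """Pick a tier name from the time elapsed since last access."""
    age = now - last_access
    if age >= COLD_AFTER:
        return "tertiary"
    if age >= COOL_AFTER:
        return "secondary"
    return "primary"


def plan_moves(last_access_by_path, now):
    """Group paths by target tier, given a {path: last_access} mapping."""
    plan = defaultdict(list)
    for path, last_access in sorted(last_access_by_path.items()):
        plan[choose_tier(last_access, now)].append(path)
    return dict(plan)
```

The plan is only a list of intended placements. Actually moving the data, and keeping it reachable by the teams and applications that need it, is the hard part that this sketch leaves out.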
Do I Need Specialized Tools for Unstructured Data Management (UDM)?
In short: absolutely. If, like many modern enterprises, your organization derives a significant portion of its value from unstructured data, you won’t be able to repurpose legacy tools, or tools built to manage structured data, into a future-proof UDM solution. You are likely struggling right now with the limitations of NDMP-based backups, whether you’re using tape or disk-to-disk, which leaves you without the management bandwidth to optimize data visibility, organization, or movement.
As your unstructured data grows, you’ll need to automate data management throughout your data’s lifecycle. It is simply unrealistic to run backups and set permissions and expirations manually, terabyte by terabyte, when your team is expected to handle petabytes.