Archiving data for long-term retention is a common use for cloud storage, with compelling benefits. However, as with many data protection strategies, cloud archive encounters problems when datasets are large—at the scale of hundreds of terabytes or petabytes.
Read on to discover the pros and cons of cloud archive, and how to solve the problems of cloud archive at scale.
What is Cloud Archive?
Cloud archive involves tiering old data to the cold storage tiers of a cloud provider for long-term retention.
Benefits of Cloud Archive: Cost and Accessibility
Traditional archive solutions rely on either disk or tape for long-term storage, but both have significant drawbacks that make cloud archive a more appealing solution.
An archive strategy using disk typically involves tiering old data to a lower-performance tier of NAS that is less expensive than primary NAS storage. However, even the most cost-effective NAS is still extremely expensive once the true cost of ownership is counted: managing the datacenter, ongoing software and hardware maintenance, cooling, power, and leasing or buying the physical datacenter space.
The other traditional solution for long-term storage is tape, which is cheaper per unit of capacity. However, data recovery from tape is a tedious, time-consuming, and error-prone process, making archived data on tape effectively inaccessible to end users. End users are therefore highly resistant to archiving their data to tape because they feel they'll never be able to get it back. As a result, organizations end up keeping hundreds of terabytes or petabytes of cold data on expensive primary storage, driving up storage costs and defeating the purpose of an archive solution altogether.
Cloud archive offers the best of both worlds: a lower price point than any on-premises disk solution and far better accessibility than tape. Cost-wise, AWS' recently launched S3 Glacier Deep Archive tier is priced at $0.00099 per GB per month, which adds up to about $12 per TB per year, far less than the hardware and software alone for any on-premises disk storage. From a data recovery standpoint, recall can be simplified so that data is easily accessible to end users, without the labor-intensive, error-prone workflows that tape-based solutions require.
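The quoted rate can be sanity-checked with simple arithmetic (using decimal units, where 1 TB = 1,000 GB):

```python
# Back-of-the-envelope check of the storage price quoted above,
# assuming the published rate of $0.00099 per GB-month.
RATE_PER_GB_MONTH = 0.00099

cost_per_tb_year = RATE_PER_GB_MONTH * 1000 * 12  # GB per TB, months per year
print(f"${cost_per_tb_year:.2f} per TB per year")  # → $11.88 per TB per year
```

That works out to just under $12 per TB per year, matching the figure above.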
Disadvantages of Cloud Archive at Scale
Cloud archive is not without its disadvantages, however. Although it saves organizations time and money compared to a labor-intensive solution like tape, it becomes cumbersome to manage once data grows to hundreds of terabytes or petabytes.
Moving data to the cloud becomes labor- and time-intensive when there is a lot of data. Cost is one problem: cloud providers charge a PUT request fee for each object written to the cloud, so moving millions or billions of individual files racks up substantial request fees. Another challenge is simply keeping track of what data is in the cloud; it takes significant IT effort to track individual files as they are moved to cloud buckets so that a specific dataset can be found among billions of files when recovery is needed. Additionally, transactional costs for puts, gets, retrievals, lists, scans, transfers, and a host of other cloud operations can accumulate quickly if an organization isn't managing data efficiently.
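A rough estimate shows how per-object request fees scale. The rate below is an assumed example for illustration, not a quoted price from any provider; actual request pricing varies by provider and storage tier:

```python
# Illustrative estimate of per-object request fees at scale.
# PUT_COST_PER_1000 is an assumed example rate, not a quoted price;
# check your cloud provider's current pricing.
PUT_COST_PER_1000 = 0.05  # assumed $ per 1,000 PUT requests

def put_fees(num_objects: int, rate_per_1000: float = PUT_COST_PER_1000) -> float:
    """Total request fees for writing num_objects individual objects."""
    return num_objects / 1000 * rate_per_1000

# One PUT per file: a billion small files adds up fast.
print(f"${put_fees(1_000_000_000):,.0f}")  # → $50,000

# Packing ~10,000 files per object cuts the request count (and fees) 10,000x.
print(f"${put_fees(1_000_000_000 // 10_000):,.0f}")  # → $5
```

This is why grouping small files into larger objects before upload, as described below, matters so much at scale.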
Cloud archive at scale requires an intelligent system or service to simplify the process of moving data to cloud, keep track of the data moved to cloud, and minimize the total costs of archiving data to cloud.
Solving the Problems of Cloud Archive at Scale
Igneous was designed to handle the problems of moving and managing unstructured data at scale. In the case of cloud archive, a number of Igneous features address the drawbacks that come up when data grows to the scale of hundreds of terabytes or petabytes.
- Direct API Integration: Integrates seamlessly with all storage tiers of AWS, Microsoft Azure, and GCP.
- Automated, Policy-Driven Workflows: All IT needs to do is identify what data to archive and define policies.
- Efficient Format: Many files are grouped into single objects called blobs, minimizing total PUT request costs.
- Metadata Index: Keeps track of what is stored in cloud.
- Metadata Access through Read-Only NFS: Users can browse metadata directly through read-only NFS to verify which files they need, identifying the correct files to restore and saving IT time.
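The packing-plus-index idea behind the last few features can be sketched in a few lines. This is an illustration of the general technique, not Igneous's actual blob format or index implementation:

```python
# Sketch: many small files are concatenated into one large object ("blob")
# before upload, and a separate metadata index records where each file
# lives inside the blob so it can be recalled individually later.
import io

def pack_files(paths):
    """Concatenate files into one blob; return (blob_bytes, index)."""
    blob = io.BytesIO()
    index = {}  # path -> (offset, length) within the blob
    for path in paths:
        with open(path, "rb") as f:
            data = f.read()
        index[path] = (blob.tell(), len(data))
        blob.write(data)
    return blob.getvalue(), index

def recall_file(blob, index, path):
    """Extract one file's bytes from the blob using the index."""
    offset, length = index[path]
    return blob[offset:offset + length]
```

Uploading one blob instead of thousands of files means one PUT request instead of thousands, while the index keeps every file individually addressable for recall.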
To learn more about how Igneous DataProtect handles backup and archive of massive unstructured datasets, download the datasheet.