Thinking About Archiving to Cloud? Read This First.

by Andy Ferris – April 19, 2019

Archiving data for long-term retention is a common use for cloud storage, with compelling benefits. However, as with many data protection strategies, cloud archive encounters problems when datasets are large—at the scale of hundreds of terabytes or petabytes.

Read on to discover the pros and cons of cloud archive, and how to solve the problems of cloud archive at scale.

What is Cloud Archive?


Cloud archive involves tiering old data to the cold storage tiers of a cloud provider for long-term retention.
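
For instance, with AWS this tiering can be expressed as an S3 lifecycle rule. The sketch below uses the boto3 SDK; the bucket name and the 90-day cutoff are hypothetical, chosen purely for illustration.

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket and age threshold, for illustration only.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-archive-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-cold-data",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to every object in the bucket
                "Transitions": [
                    # Move objects to a cold tier once they pass 90 days of age.
                    {"Days": 90, "StorageClass": "DEEP_ARCHIVE"}
                ],
            }
        ]
    },
)
```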

Benefits of Cloud Archive: Cost and Accessibility


Traditional archive solutions rely on either disk or tape for long-term storage, but both have significant drawbacks that make cloud archive a more appealing solution.

A disk-based archive strategy involves moving old data to a lower-performance tier of NAS that is typically less expensive than primary NAS storage. However, even the most cost-effective NAS is still extremely expensive once the true cost of ownership includes managing the datacenter, ongoing software and hardware maintenance, cooling, power, and leasing or buying the physical space for the datacenter itself.

The other traditional solution for long-term storage is tape, which is cheaper per unit. However, data recovery with tape is a tedious, time-consuming, and error-prone process, making archived data on tape effectively inaccessible to end users. As a result, end users are highly resistant to archiving their data to tape because they feel they will never be able to get it back, and organizations end up keeping hundreds of terabytes or petabytes of cold data on expensive primary storage, driving up storage costs and defeating the purpose of having an archive solution at all.

Cloud archive offers the best of both worlds, with a lower price point than any on-premises disk solution and far better accessibility than tape. Cost-wise, AWS' recently launched Glacier Deep Archive tier is priced at $0.00099 per GB per month, which adds up to about $12 per TB per year, far less expensive than the hardware and software alone for any on-premises disk storage. From a data recovery standpoint, recalling data from the cloud can be simplified so that archived data remains easily accessible to end users, without the labor-intensive workflows and error-prone processes required by tape-based solutions.
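
As a quick check on that arithmetic, here is a minimal sketch in Python using the list price cited above:

```python
# Glacier Deep Archive list price cited above: $0.00099 per GB-month.
price_per_gb_month = 0.00099

gb_per_tb = 1000          # decimal TB, as cloud pricing uses
months_per_year = 12

annual_cost_per_tb = price_per_gb_month * gb_per_tb * months_per_year
print(f"${annual_cost_per_tb:.2f} per TB per year")  # -> $11.88, roughly $12
```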

Disadvantages of Cloud Archive at Scale


However, cloud archive is not without its disadvantages. Although it helps organizations save time and money compared to a labor-intensive solution like tape, it becomes cumbersome to manage once data grows to the scale of hundreds of terabytes or petabytes.

Moving data to the cloud becomes labor- and time-intensive when there is a lot of data. One problem is cost: cloud providers charge a put cost for each object moved to the cloud, so if IT moves millions or billions of files, those per-request fees add up quickly. Another challenge is simply keeping track of what data is in the cloud, which requires significant IT effort to track individual files as they are moved to buckets so that IT knows where to find a specific dataset amongst billions of files when recovery is needed. Additionally, transactional costs for puts, gets, retrievals, lists, scans, transfers, and a whole host of other actions in the cloud can accumulate quickly if an organization isn't managing data efficiently.
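
To see how quickly per-request charges accumulate, consider a rough sketch; the per-1,000-PUT rate below is an assumed example and varies by provider and storage class.

```python
def put_request_cost(num_objects, price_per_1000_puts=0.05):
    """Estimate the one-time PUT-request charge for archiving num_objects.

    price_per_1000_puts is an assumed example rate (cold-tier PUT requests
    are typically billed per 1,000 requests); check your provider's price list.
    """
    return num_objects / 1000 * price_per_1000_puts

# One object per file: a billion small files becomes a large one-time bill.
print(put_request_cost(1_000_000_000))   # -> 50000.0, i.e. roughly $50,000
```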

Cloud archive at scale requires an intelligent system or service to simplify the process of moving data to cloud, keep track of the data moved to cloud, and minimize the total costs of archiving data to cloud.

Solving the Problems of Cloud Archive at Scale


Igneous was designed to handle the problems of moving and managing unstructured data at scale. In the case of cloud archive, a number of Igneous features address the drawbacks that come up when data grows to the scale of hundreds of terabytes or petabytes.

  • Direct API Integration: Integrates seamlessly with all storage tiers of AWS, Microsoft Azure, and GCP.
  • Automated, Policy-Driven Workflows: All IT needs to do is identify which data to archive and define the policies.
  • Efficient Format: Many files are grouped into single objects called blobs, dramatically reducing total put costs (see the sketch after this list).
  • Metadata Index: Keeps track of what is stored in the cloud.
  • Metadata Access through Read-Only NFS: Users can browse metadata directly over read-only NFS to verify which files they need, so they restore the correct files and save IT time.
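
To illustrate the "Efficient Format" point above, here is a minimal sketch of how packing many small files into larger objects shrinks the request count. The 100 MB blob size, 1 MB average file size, and per-request price are assumptions for illustration, not a description of Igneous' actual format.

```python
def puts_needed(num_files, avg_file_mb, blob_mb=None):
    """Number of PUT requests needed, with or without packing files into blobs."""
    if blob_mb is None:                      # one object per file
        return num_files
    files_per_blob = max(1, blob_mb // avg_file_mb)
    return -(-num_files // files_per_blob)   # ceiling division

num_files, avg_file_mb = 100_000_000, 1      # 100M files, ~1 MB each (assumed)
price_per_1000_puts = 0.05                   # assumed example rate

for blob_mb in (None, 100):                  # no packing vs. 100 MB blobs (assumed)
    puts = puts_needed(num_files, avg_file_mb, blob_mb)
    print(blob_mb, puts, puts / 1000 * price_per_1000_puts)
# Unpacked: 100M PUTs -> ~$5,000. Packed into 100 MB blobs: 1M PUTs -> ~$50.
```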

To learn more about how Igneous DataProtect handles backup and archive of massive unstructured datasets, download the datasheet.

