Backup vs. Archive: The Case for Incorporating Both in Your Data Management Strategy

by Catherine Chiang – July 10, 2018

We often talk about backup and archive as coupled processes, so much so that these two very different concepts may become conflated.

It’s not just a matter of semantics. Many organizations don’t differentiate clearly between backup and archive, using backups kept for long periods of time as their “archive.” Unfortunately, this approach creates problems down the road, especially as data grows.

To effectively manage their data, organizations must differentiate between backup and archive and build both processes into their infrastructure.

What’s the Difference Between Backup and Archive?

Backup is one type of data protection. It reduces the risk of data loss by creating a secondary copy of the data. Traditionally, this has been achieved by using purpose-built software to back up an organization’s entire corpus of data at regular intervals, either to LTO tape or to another, lower-cost tier of spinning disk. For instance, some organizations might run a full backup every weekend and run “differentials” each night while their employees are asleep.
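To make the full-versus-differential schedule concrete, here is a minimal sketch in Python. It assumes a hypothetical in-memory catalog that maps each file path to its last-modified time; the function name `files_for_backup` and the catalog format are illustrative and do not come from any particular backup product. A full run copies everything, while a differential run copies only what has changed since the last full backup.

```python
from datetime import datetime


def files_for_backup(mtimes, last_full, full=False):
    """Return the paths a backup run would copy.

    mtimes:    dict of path -> last-modified datetime (hypothetical catalog)
    last_full: datetime of the most recent full backup
    full:      True for the weekend full run, False for a nightly differential
    """
    if full:
        # A full backup copies every file, changed or not.
        return sorted(mtimes)
    # A differential copies only files modified since the last FULL backup.
    return sorted(path for path, mtime in mtimes.items() if mtime > last_full)


if __name__ == "__main__":
    catalog = {
        "reports/q1.xlsx": datetime(2018, 7, 1, 9, 0),
        "reports/q2.xlsx": datetime(2018, 7, 9, 16, 30),
    }
    weekend_full = datetime(2018, 7, 7, 23, 0)
    print(files_for_backup(catalog, weekend_full))             # nightly differential
    print(files_for_backup(catalog, weekend_full, full=True))  # weekend full
```

Because each differential is measured against the last full backup, a restore in this scheme needs only the full backup plus the most recent differential.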

Archive involves moving rarely accessed, old, or inactive data to a secondary or tertiary storage tier for medium- or long-term storage. Organizations may archive in order to save data that may one day be needed, to comply with industry regulations, or to offload data from expensive primary storage. With data growth increasing and the advent of high-performance primary storage, such as all-flash arrays, archiving has become a necessity for controlling costs. Best of all, when done effectively, archiving enables organizations to better understand their data and its value.

The key difference between backup and archive is that backup is a copy of the data, while archive is the main version of the data, located in an infrequently accessed, but cost-effective tier.
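At the file level, the distinction comes down to copy versus move. The sketch below is only an illustration using Python's standard library; the function names `back_up` and `archive` and the directory layout are assumptions, and real backup or archive software does far more (cataloging, verification, retention). It shows the essential point: after a backup the original still sits on primary storage, while after an archive it does not.

```python
import shutil
from pathlib import Path


def back_up(src: Path, backup_dir: Path) -> Path:
    """Create a secondary COPY; the original stays on primary storage."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves timestamps and other metadata where possible.
    return Path(shutil.copy2(src, backup_dir / src.name))


def archive(src: Path, archive_dir: Path) -> Path:
    """MOVE the data; the archive tier now holds the main version."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(str(src), str(archive_dir / src.name)))
```

In this sketch, calling `back_up` leaves two copies to store and manage, whereas `archive` frees the space on the primary tier.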

You’ll need your backup copy if a server goes down, if you accidentally delete a file, or if some data set is accidentally changed. You’ll look in the archives if your applications need to reference more historical data than anticipated, if you want to run a 25th anniversary edition of your studio’s breakout cartoon, or if you are involved in a lawsuit.

Traditional Backup Software Can’t Archive Effectively

Often, organizations attempt to fulfill their backup and archive needs through backup software. Although this approach may seem to kill two birds with one stone, it can actually result in more complexity and management overhead for enterprise IT.

For example, backup schedules will often include yearly full backups for long-term retention, referred to as “archives.” But unlike a true archive, it’s difficult to retrieve specific files out of these full backups.

Another pitfall of using backups as archives is that storing backups over long periods of time is not cost-effective. Since the backup is a copy of the data, there are now two copies of data that need to be stored. The original data is still on primary storage, eating up expensive storage capacity. In addition, having to manage two copies of the data adds to data management overhead. At scale, this is simply untenable.

How Do You Back Up and Archive Massive Unstructured Data?

Traditionally, organizations have used tape as their archive solution. While tape is a reasonable archive solution for some, today’s increasingly data-driven organizations may find that tape doesn’t allow them to fully harness the power of their archived data.

Tape workflows are labor-intensive and require administrative overhead; between shuffling the tapes between a datacenter and a tape-vaulting service and keeping track of catalogs, maintaining archives on tape is far from painless. Once the data is needed, retrieving it from tape is another laborious process that often prevents archived data from actually being used.

Organizations utilizing modern workflows, especially machine learning and artificial intelligence workflows, need to be able to access their archives in a timely manner without investing large amounts of IT resources into maintaining their secondary tier.

A modern archive solution should:

  • Help identify data that needs to be archived
  • Have automated workflows that are simple to set up and use
  • Enable end users to easily access archived data
  • Contain true archive capabilities such as cataloging and search, which make it easier to know what’s there, organize the data, and retrieve it when it’s needed
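As a rough illustration of the first item in the list above, identifying data that needs to be archived, the following Python sketch walks a directory tree and flags files that have been neither read nor modified within a chosen window. The function name `archive_candidates` and the `max_idle_days` threshold are assumptions for this example, not part of any product. Access times can also be unreliable on file systems mounted with options such as `noatime`, so treat the result as a starting point rather than a policy.

```python
import os
import time
from pathlib import Path


def archive_candidates(root, max_idle_days, now=None):
    """Return (path, size_in_bytes) for files idle longer than max_idle_days.

    "Idle" here means neither accessed nor modified since the cutoff.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400  # 86400 seconds per day
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            st = path.stat()
            # Use the later of last access and last modification.
            last_used = max(st.st_atime, st.st_mtime)
            if last_used < cutoff:
                candidates.append((path, st.st_size))
    return sorted(candidates)


if __name__ == "__main__":
    for path, size in archive_candidates(".", max_idle_days=365):
        print(f"{size:>12}  {path}")
```

Summing the reported sizes gives a quick estimate of how much primary storage an archive pass could free.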

If you would like to learn more about Igneous’ modern backup and archive capabilities for massive unstructured data, check out our product datasheet.
