The Backup Window is an Outdated Concept

by Steve Pao – July 5, 2017

Backup windows for unstructured file data will go the way of the rotary phone. Why? Backup windows were designed when tape was the primary backup target, before the proliferation of unstructured data. Let’s explore how legacy backup software backed into the concept of backup windows and how a modern approach eliminates them.

Originally Designed for Tape

Prior to disk-based backups, the primary medium for backup was tape. The medium had some interesting constraints because data had to be serially written to – and read from – tape. Moreover, the job of backup systems was to keep the data flowing at a continuous rate as the tape’s physical reels rotated.

As a result, single-threaded streams and continuous rates evolved as fundamental concepts in legacy backup protocols (such as Network Data Management Protocol or NDMP) used in primary storage systems and in legacy backup software that supports those protocols. Even after disk became a viable backup target, these concepts persisted in legacy backup software and protocols.

The Backup Window Emerges

All backup software reads data from the primary storage system, potentially impacting the performance of user and application access to data. With the concepts of single-threaded streams (or “jobs”) and continuous data rates, backup administrators had to trade off how many concurrent jobs they ran (which affected system performance) against how frequently they backed up their data.

To meet daily backup requirements, IT administrators generally ran backups at night, and for weekly backups, they ran backups over weekends. This practice enabled those administrators to run as many backup jobs as their systems could handle during these discrete periods.

The Growth of Unstructured File Data

Historically, when data sizes were small and when users and applications accessed primary storage systems only during regular working hours, the practice of scheduling backup windows outside those hours largely worked.

However, humans are no longer the only ones generating unstructured file data in the form of Word documents or Excel spreadsheets in their home directories. Most of today's data is generated by machines (e.g., medical equipment, cameras, and now autonomous vehicles) and software applications (e.g., design automation software, image rendering, and scientific computing).

The growth in both file count and total data volume strains the concept of jobs that rely on single-threaded streams. Often, there's simply too much data to move during a backup window!
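To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The figures (50 TB of data, 100 MB/s per stream, an 8-hour nightly window) are illustrative assumptions rather than measurements, and the calculation optimistically assumes that throughput scales linearly with the number of streams.

```python
def transfer_hours(data_tb: float, stream_mb_per_s: float, streams: int = 1) -> float:
    """Hours needed to move data_tb terabytes over `streams` equal-rate streams."""
    total_mb = data_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    seconds = total_mb / (stream_mb_per_s * streams)
    return seconds / 3600


def fits_window(data_tb: float, stream_mb_per_s: float,
                window_hours: float, streams: int = 1) -> bool:
    """True if the transfer finishes within the backup window."""
    return transfer_hours(data_tb, stream_mb_per_s, streams) <= window_hours


if __name__ == "__main__":
    # Illustrative numbers only: 50 TB, 100 MB/s per stream, 8-hour window.
    print(round(transfer_hours(50, 100), 1))   # about 138.9 hours on one stream
    print(fits_window(50, 100, 8))             # False: one stream misses the window
    print(fits_window(50, 100, 8, streams=20)) # True, if scaling were perfectly linear
```

With these assumed numbers, a single stream would need almost six days to move data that is supposed to fit into one night.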

When backup windows extend into working hours, user complaints often force backup administrators to turn off backups altogether, leaving no complete backup set for the day or week.

In some organizations, continuous processing and 24/7 operations challenge the notion of backup windows because data must be available to users and applications every minute of the day.

Rethinking Backup

It's time to rethink backup. A modern approach can eliminate backup windows altogether. Consider these two approaches:

  • Multi-streaming: Without the requirement to single stream data serially to tape, data can move faster in dynamic, parallel streams without administrators having to manually split backups into separate, discrete jobs.
  • Latency awareness: Without the requirement to stream data at a continuous rate, data can move faster when the primary system load from users and applications is low, and “back off” when users and applications are accessing the data.
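As a minimal sketch of the multi-streaming idea, the Python below copies a directory tree using a pool of parallel workers. It is a generic illustration, not Igneous's implementation: the function names `copy_one` and `multi_stream_backup` and the default of 8 streams are hypothetical, and a plain file copy stands in for a real backup target.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_one(src_root: Path, dst_root: Path, src_file: Path) -> int:
    """Copy one file, preserving its relative path; return the bytes copied."""
    dst_file = dst_root / src_file.relative_to(src_root)
    dst_file.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src_file, dst_file)
    return dst_file.stat().st_size


def multi_stream_backup(src_root: Path, dst_root: Path, streams: int = 8) -> int:
    """Back up src_root to dst_root with `streams` parallel workers.

    Returns the total number of bytes copied.
    """
    files = [p for p in src_root.rglob("*") if p.is_file()]
    with ThreadPoolExecutor(max_workers=streams) as pool:
        # Each idle worker picks up the next file, so the load is balanced
        # dynamically instead of being pre-split into fixed jobs by hand.
        return sum(pool.map(lambda f: copy_one(src_root, dst_root, f), files))
```

The design point is that nobody has to decide up front which directories belong to which job; the pool hands out work as streams become free.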

With these approaches, it is possible to run backups all the time without impacting users or applications. In essence, backup jobs run at maximum speeds when usage of the primary systems is low, and automatically slow down when needed.
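One simple way to picture this "speed up when idle, back off when busy" behavior is a controller that adds a stream while the primary system's latency is healthy and halves the stream count when latency climbs. The Python sketch below uses that additive-increase, multiplicative-decrease rule purely as an illustration; it is an assumption, not a description of how Igneous implements latency awareness. The class name, the 10 ms threshold, and the stream limits are all hypothetical, and how latency is actually measured is left out.

```python
from dataclasses import dataclass


@dataclass
class LatencyAwareThrottle:
    """Adjusts the number of active backup streams from observed primary latency."""
    max_streams: int = 16
    min_streams: int = 1
    latency_limit_ms: float = 10.0  # hypothetical "users are being affected" threshold
    streams: int = 1

    def update(self, observed_latency_ms: float) -> int:
        """Feed in one latency sample; return the new stream count."""
        if observed_latency_ms > self.latency_limit_ms:
            # Primary system is busy: back off quickly by halving.
            self.streams = max(self.min_streams, self.streams // 2)
        else:
            # Primary system is quiet: ramp up gradually, one stream at a time.
            self.streams = min(self.max_streams, self.streams + 1)
        return self.streams


if __name__ == "__main__":
    throttle = LatencyAwareThrottle()
    samples_ms = [2, 2, 3, 2, 25, 30, 4, 3]  # made-up latency readings
    print([throttle.update(s) for s in samples_ms])
    # prints [2, 3, 4, 5, 2, 1, 2, 3]
```

In the sample run, the backup ramps up while latency stays low, drops to a single stream during the two busy readings, and then starts climbing again.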

The ideas here are pretty straightforward, and they work! Igneous customers use Igneous Hybrid Storage Cloud to back up primary storage file systems that previously could not be backed up within acceptable backup windows.

The trick here is making these concepts work together, and this is where our engineering comes in. Look for future posts about how we overcame challenges to implement our unique secondary storage approach, including:

  • Removing the reliance on NDMP to track changes
  • Integrating with NAS systems to enforce read consistency
  • Providing a horizontally scalable and performant backend target for the multi-streaming

Download our "Secondary Storage for the Cloud Era" whitepaper for more insights on today's secondary storage challenges and solutions for overcoming them. 

