Backup windows for unstructured file data will go the way of the rotary phone. Why? Backup windows were designed when tape was the primary backup target, before the proliferation of unstructured data. Let’s explore how legacy backup software backed into the concept of backup windows and how a modern approach eliminates them.
Originally Designed for Tape
Prior to disk-based backups, the primary medium for backup was tape. The medium had some interesting constraints because data had to be serially written to – and read from – tape. Moreover, the job of backup systems was to keep the data flowing at a continuous rate as the tape’s physical reels rotated.
As a result, single-threaded streams and continuous rates evolved as fundamental concepts in legacy backup protocols (such as Network Data Management Protocol or NDMP) used in primary storage systems and in legacy backup software that supports those protocols. Even after disk became a viable backup target, these concepts persisted in legacy backup software and protocols.
The Backup Window Emerges
All backup software reads data from the primary storage system, potentially impacting the performance of user and application access to data. With the concepts of single-threaded streams (or “jobs”) and continuous data rates, backup administrators had to trade off how many concurrent jobs they ran (which impacted system performance) against how frequently they backed up their data.
To meet daily backup requirements, IT administrators generally ran backups at night, and for weekly backups, they ran backups over weekends. This practice enabled those administrators to run as many backup jobs as their systems could handle during these discrete periods.
The Growth of Unstructured File Data
Historically, when data sizes were small and when users and applications accessed primary storage systems only during regular working hours, the practice of scheduling backup windows outside those hours largely worked.
However, humans are no longer the only ones generating unstructured file data in the form of Word documents or Excel spreadsheets in their home directories. Most of today's data is generated by machines (e.g., medical equipment, cameras, and now autonomous vehicles) and software applications (e.g., design automation software, image rendering, and scientific computing).
The growth both in file count and in total data volume strains the concept of jobs that rely on single-threaded streams. Often, there's simply too much data to move during a backup window!
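To make the arithmetic concrete, here is a rough back-of-the-envelope sketch. The dataset size, per-stream throughput, and window length are hypothetical numbers chosen for illustration, not measurements from any real system, and the model assumes throughput scales linearly with the number of streams while ignoring per-file overhead (both optimistic simplifications).

```python
def hours_to_back_up(total_tb: float, streams: int, mb_per_s_per_stream: float) -> float:
    """Hours needed to move total_tb, assuming every stream sustains the same rate."""
    total_mb = total_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    aggregate_mb_per_s = streams * mb_per_s_per_stream
    return total_mb / aggregate_mb_per_s / 3600  # seconds -> hours


def fits_in_window(total_tb: float, streams: int,
                   mb_per_s_per_stream: float, window_hours: float) -> bool:
    """True if the whole dataset can be moved inside the backup window."""
    return hours_to_back_up(total_tb, streams, mb_per_s_per_stream) <= window_hours


if __name__ == "__main__":
    # Hypothetical: 100 TB of file data, 100 MB/s per stream, 8-hour nightly window.
    for streams in (1, 16):
        hours = hours_to_back_up(100, streams, 100)
        print(f"{streams:>2} stream(s): {hours:.1f} h, fits: {fits_in_window(100, streams, 100, 8)}")
```

Under these assumed numbers, a single stream needs roughly 278 hours and even 16 perfectly parallel streams need about 17 hours, so neither fits an 8-hour window. Large file counts usually make the real picture worse, because per-file overhead is not captured here.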
When backup windows extend into working hours, user complaints often force backup administrators to turn off backups altogether, leaving no complete backup set for the day or week.
In some organizations, continuous processing and 24/7 operations challenge the notion of backup windows because data must be available to users and applications every minute of the day.
It's time to rethink backup. A modern approach can eliminate backup windows altogether. Consider these two approaches:
- Multi-streaming: Without the requirement to single stream data serially to tape, data can move faster in dynamic, parallel streams without administrators having to manually split backups into separate, discrete jobs.
- Latency awareness: Without the requirement to stream data at a continuous rate, data can move faster when the primary system load from users and applications is low, and “back off” when users and applications are accessing the data.
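As a minimal sketch of the multi-streaming idea, the snippet below copies a directory tree using a pool of worker threads that each pull the next file as soon as they are free, so nobody has to split the work into jobs by hand. It is an illustration only, not how Igneous implements it: the function names (`copy_tree_parallel`, `copy_one`) are hypothetical, and a local directory copy stands in for a real backup target.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_one(src_root: Path, dst_root: Path, path: Path) -> Path:
    """Copy a single file, recreating its relative directory under dst_root."""
    target = dst_root / path.relative_to(src_root)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(path, target)  # copies file contents and metadata
    return target


def copy_tree_parallel(src_root: Path, dst_root: Path, streams: int = 4) -> list:
    """Copy every file under src_root using up to `streams` concurrent workers."""
    files = [p for p in src_root.rglob("*") if p.is_file()]
    with ThreadPoolExecutor(max_workers=streams) as pool:
        # Each worker takes the next file when it finishes the previous one,
        # so the work spreads itself across the streams.
        return list(pool.map(lambda p: copy_one(src_root, dst_root, p), files))
```

Calling `copy_tree_parallel(Path("/data/projects"), Path("/backup/projects"), streams=8)` (paths are placeholders) would move the tree over eight concurrent streams. A shared work queue like this is one simple way to get dynamic balancing. A production system would also need to handle very large single files, retries, and consistency of the source while it is being read.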
With these approaches, it is possible to run backups all the time without impacting users or applications. In essence, backup jobs run at maximum speeds when usage of the primary systems is low, and automatically slow down when needed.
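The "speed up when idle, back off when busy" behavior can be sketched as a small controller that adjusts the number of active streams based on the latency observed on the primary system. The version below uses an additive-increase, multiplicative-decrease rule, which is a generic technique chosen here for illustration and not necessarily what any particular product does. The class name, the latency limit, and the stream bounds are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class LatencyAwareThrottle:
    """Chooses how many backup streams to run from observed primary-system latency."""
    latency_limit_ms: float   # assumed threshold above which users would notice
    min_streams: int = 1
    max_streams: int = 16
    streams: int = 1          # current number of active streams

    def update(self, observed_latency_ms: float) -> int:
        if observed_latency_ms > self.latency_limit_ms:
            # Primary system is busy: back off quickly by halving the streams.
            self.streams = max(self.min_streams, self.streams // 2)
        else:
            # Primary system has headroom: ramp up gently, one stream at a time.
            self.streams = min(self.max_streams, self.streams + 1)
        return self.streams


if __name__ == "__main__":
    throttle = LatencyAwareThrottle(latency_limit_ms=5.0)
    # Hypothetical latency samples in milliseconds; the 9.0 simulates a busy period.
    for sample in [1.0, 1.0, 1.0, 1.0, 9.0, 1.0]:
        print(sample, "->", throttle.update(sample))
```

With the sample readings above, the stream count climbs from 2 to 5 while latency is low, drops to 2 when the 9 ms reading arrives, and then starts climbing again. Backing off faster than ramping up is a deliberate choice in this sketch: it favors users and applications over backup speed whenever the two compete.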
The ideas here are pretty straightforward, and they work! Igneous customers use Igneous Hybrid Storage Cloud to back up primary storage file systems that previously couldn't be backed up within acceptable backup windows.
The trick here is making these concepts work together, and this is where our engineering comes in. Look for future posts about how we overcame challenges to implement our unique secondary storage approach, including:
- Removing the reliance on NDMP to track changes
- Integrating with NAS systems to enforce read consistency
- Providing a horizontally scalable and performant backend target for multi-streaming
Download our "Secondary Storage for the Cloud Era" whitepaper for more insights on today's secondary storage challenges and solutions for overcoming them.