If you're reading this blog, chances are you've heard about the recent cloud-tier pricing announcements from AWS, Azure, and Google Cloud Platform. The listed pricing from these providers is now in the range of $12 to $14/TB/year. That's right: cloud pricing is now cheaper than comparable enterprise-grade hardware, which comes in at $15 to $25/TB/year - and that's before factoring in the cost of power, cooling, and deploying hardware in your datacenter.
Because all these cloud tiers feature "Archive" in their names, it makes sense that most enterprises equate them with archive data - data that needs to be retained for seven years or more.
But really - why wouldn't you want to use these tiers in place of tape and put all your NAS backups there? It's cheaper than on-premises storage and has better operational characteristics.
That’s why we created this list of the 5 most critical capabilities that will make NAS backup to the cloud an effective solution for your file data.
Critical Capability #1: Write data directly to an Archive tier without landing in Hot tiers
Most of us back up data just in case something happens to the primary copy. Thankfully, it's rare that anything does - which means it may not make financial sense to pay for a short RTO on all of your backup data.
However, many NAS backup solutions will only land data in S3 Standard/Infrequent Access or Azure Hot/Cool Blob storage, leaving the enterprise to write lifecycle policies that push the data into the archive tiers.
This is the difference between $23/TB/month and $4/TB/month - a difference that adds up over time and grows with capacity. So why do they land there? These solutions are simply porting a legacy disk-to-disk or disk-to-tape model, and they support only Hot/Cool tiers because they maintain compatibility with a legacy model in which all backup data is always online.
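As a back-of-envelope illustration of why the landing tier matters - using the per-TB figures cited above as stand-ins, since actual provider pricing varies by region and tier - the gap compounds quickly at scale:

```python
# Back-of-envelope comparison: landing backups in a Hot tier vs. writing
# directly to an Archive tier. Rates are the illustrative figures from the
# text ($23/TB/month Hot, $4/TB/month Archive), not current list prices.
HOT_PER_TB_MONTH = 23.0
ARCHIVE_PER_TB_MONTH = 4.0

def annual_cost(tb_stored: float, rate_per_tb_month: float) -> float:
    """Simple annual storage cost, ignoring growth and transaction fees."""
    return tb_stored * rate_per_tb_month * 12

tb = 500  # hypothetical backup footprint in TB
hot = annual_cost(tb, HOT_PER_TB_MONTH)          # 138000.0
archive = annual_cost(tb, ARCHIVE_PER_TB_MONTH)  # 24000.0
print(f"Hot: ${hot:,.0f}/yr  Archive: ${archive:,.0f}/yr  "
      f"Savings: ${hot - archive:,.0f}/yr")
```

Even before considering the cost of the lifecycle transitions themselves, data that merely passes through a Hot tier is billed at the Hot rate for however long it sits there.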
Can the backup solution natively leverage ‘Archive’ tiers of cloud storage to mitigate cost?
Critical Capability #2: Minimize transaction costs
When moving NAS data into any of these cloud tiers, there is a transaction cost to consider: the PUT request. PUTs range in cost from $0.005 to $0.05 per 1,000 requests. At these rates, backing up 1 billion files to the cloud will cost between $5,000 and $50,000 in transaction fees alone.
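The arithmetic is simple but worth making explicit - and it also shows why grouping small files into larger objects (a technique discussed further under Critical Capability #4) changes the picture. The packing factor below is a hypothetical illustration:

```python
# PUT-cost estimate for backing up many small files, using the per-request
# rates cited above ($0.005-$0.05 per 1,000 PUTs). Grouping small files
# into larger objects divides the request count, and hence the cost.
def put_cost(num_objects: int, rate_per_1000: float) -> float:
    """Total PUT charges for writing num_objects at the given rate."""
    return num_objects / 1000 * rate_per_1000

files = 1_000_000_000
print(put_cost(files, 0.005))  # 5000.0  (low end of the cited range)
print(put_cost(files, 0.05))   # 50000.0 (high end)

# Hypothetical: packing ~1,000 small files per object cuts PUTs 1000x.
print(put_cost(files // 1000, 0.05))  # 50.0
```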
A modern solution must mitigate the transaction costs of moving NAS data to the cloud. Legacy on-premises disk-to-disk and disk-to-tape solutions were never designed to account for these costs when backing up data.
Understanding these transaction costs is an important part of the TCO for any direct-to-cloud solution.
Will the NAS backup to cloud solution factor transaction costs into the TCO?
Critical Capability #3: Intelligently expire data
Many businesses have legal and financial requirements for retention and expiration. Sometimes these are driven by government regulations like SEC17a-4, other times it’s just a business governance policy, or an SLA with the end-users.
These expiration policies need to be enforced regardless of where the data is stored. However, cloud providers put mandatory minimum storage durations on their archive-class offerings. For example, AWS S3 Glacier requires 90 days, while S3 Glacier Deep Archive and Azure Archive Blob Storage each require 180 days.
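Both providers handle early deletion the same general way: an object removed before its minimum storage duration is billed for the remaining days as if it had stayed. A minimal sketch of that charge, with an assumed (not quoted) archive rate:

```python
# Sketch of the early-deletion charge archive tiers apply: objects removed
# before the minimum storage duration are billed for the remaining days at
# the archive rate. The rate below is an illustrative assumption.
ARCHIVE_RATE_PER_TB_DAY = 1.0 / 30  # assume ~$1/TB/month archive pricing

def early_delete_fee(tb: float, age_days: int, min_days: int) -> float:
    """Charge for the unserved remainder of the minimum duration."""
    remaining = max(0, min_days - age_days)
    return tb * remaining * ARCHIVE_RATE_PER_TB_DAY

# 10 TB deleted 60 days into a 180-day minimum: billed for 120 more days.
print(round(early_delete_fee(10, 60, 180), 2))  # 40.0
# Past the minimum, deletion is free of this penalty.
print(early_delete_fee(10, 200, 180))  # 0
```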
Before you put your NAS backups in the cloud, ask: will my retention policy still be enforced without incurring the penalties of deleting data too soon?
Critical Capability #4: Know when to clean-up expired data
Once you've solved the compliance problem, data still needs to be deleted from these archive tiers upon expiration. Any solution utilizing archive tiers needs business logic governing when to reclaim space: it cannot let capacity grow unbounded, but it also cannot simply delete data the moment it expires.
Here's why: to minimize transaction costs, small files must be grouped into larger objects. Reclaiming space from those grouped blobs means rehydrating the data, compacting out the expired files, and rewriting the result to the archive tier. Running new full (L0) backups monthly is also not feasible for file data at scale - it has cost, time, and capacity implications.
Identifying the tipping point - the point at which the cost of rehydrating data, removing expired files, and writing the compacted result back to the archive tier is justified by the capacity reclaimed - is key to cost-effective cloud backup.
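That tipping-point decision can be sketched as a simple cost comparison. The rates and the 12-month horizon below are stand-in assumptions for illustration, not provider list prices:

```python
# Minimal sketch of the compaction "tipping point": compact a packed
# archive object only when the storage wasted on its expired data exceeds
# the one-time cost of rehydrating, rewriting, and re-PUTting the object.
ARCHIVE_PER_TB_MONTH = 4.0  # illustrative archive storage rate
RETRIEVAL_PER_TB = 20.0     # illustrative bulk-retrieval rate
PUT_COST = 0.00005          # per rewritten object

def should_compact(object_tb: float, expired_fraction: float,
                   horizon_months: int = 12) -> bool:
    """True when wasted storage over the horizon outweighs compaction cost."""
    waste = object_tb * expired_fraction * ARCHIVE_PER_TB_MONTH * horizon_months
    compaction = object_tb * RETRIEVAL_PER_TB + PUT_COST
    return waste > compaction

print(should_compact(1.0, 0.10))  # False: 10% waste doesn't justify it yet
print(should_compact(1.0, 0.60))  # True: a mostly-expired object is worth it
```

A real solution would also fold in minimum-storage-duration penalties and retrieval urgency, but the shape of the decision is the same.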
Is the backup solution intelligent about reclaiming space for backup data?
Critical Capability #5: Restore data cost-effectively
Let's face it: the ability to restore data is the most important part of NAS backup to the cloud. Most restores initiated by business users are for individual files or directories - a modern solution should not have to rehydrate significantly more data than what was requested.
The solution also needs to be aware of the urgency of a restore: is this an immediate restore or a bulk restore? Remember, there is a cost difference between the two. Bulk restore typically meets most business SLAs at a significantly lower cost.
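To make the urgency trade-off concrete, here is a rough comparison across retrieval tiers. The per-TB rates are illustrative stand-ins (faster retrieval typically costs an order of magnitude more than bulk); check your provider's current price sheet:

```python
# Illustrative retrieval-tier cost comparison. Rates are assumptions for
# the sketch, not quoted provider prices.
RETRIEVAL_RATES_PER_TB = {
    "expedited": 30.0,  # minutes-to-hours turnaround, assumed rate
    "standard": 10.0,   # hours, assumed rate
    "bulk": 2.5,        # roughly 12-48 hours, assumed rate
}

def restore_cost(tb: float, tier: str) -> float:
    """Retrieval charge for restoring tb terabytes at the given urgency."""
    return tb * RETRIEVAL_RATES_PER_TB[tier]

tb = 50  # hypothetical restore size after a failure
for tier in RETRIEVAL_RATES_PER_TB:
    print(f"{tier}: ${restore_cost(tb, tier):,.2f}")
```

If a bulk restore completes inside the SLA, paying the expedited rate is pure waste - which is why the solution, not the operator, should pick the tier per request.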
When there is a critical failure that requires TBs of data to be restored, the process must be efficient in how much data it’s retrieving. Does it restore only what’s needed? Or does it have to pull more than required (and do it fast) even for small-file workloads? Can it restore data to any storage tier, including file systems in the cloud?
How does the backup solution restore data from these archive tiers to manage both cost and SLAs?
Putting it all together
So what would a solution with all these capabilities look like? A backup solution that stores data cost-effectively, delivers strong resiliency, relieves space pressure in the datacenter, and lays the foundation for disaster recovery to cloud file systems.