Streaming out a weekly software update brings joy to customers and engineers alike. Customers receive cutting-edge features and timely bug fixes, while engineers transform bright ideas into production realities with minimal turnaround.
At the same time, validating each weekly software update is no small feat. It takes thoughtful development and serious automation to select a high-quality release candidate each week and to ensure it is free from critical bugs or performance degradations.
At Igneous, we rely on a variety of automated techniques to identify a release candidate and further validate that it is ready for the production primetime spotlight. Today, I’ll cover how we use comprehensive test automation to identify high-quality release candidates amongst the barrage of changes that are integrated into our source code each day.
Test, Test, Test
Testing is fundamental to any software organization’s success, and as such, testing is baked into our DNA here at Igneous. We’ve invested heavily in integrating testing with our developers’ workflows in a powerful yet streamlined fashion, which routinely pays quality and efficiency dividends. It has led to the creation of an in-house continuous integration automaton dubbed “Iggybot,” which manages all test execution grunt work for us. Thanks to the unique way in which Iggybot integrates with developer workflows, our process contains multiple layers of defense against the introduction of bugs and regressions.
Our first layer of defense engages for every proposed code change. Whenever a proposed change is posted, Iggybot automatically dispatches an extensive series of test suites. For a code change to even be eligible to merge into mainline, it must pass every single one of these checks. These test suites include everything from unit tests to component-level tests to end-to-end tests of customer workflows. Each of these tests is executed across an excruciating number of product variants with massive parallelism.
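The fan-out can be sketched as a small dispatcher: every suite runs against every product variant in parallel, and the change is mergeable only if all combinations pass. This is a minimal illustration, not Iggybot's actual implementation; the suite and variant names are hypothetical.

```python
# Minimal sketch of per-change test dispatch (not Iggybot itself).
# Suite names and product variants below are hypothetical.
from concurrent.futures import ThreadPoolExecutor

SUITES = ["unit", "component", "end-to-end"]        # hypothetical suites
VARIANTS = ["variant-a", "variant-b", "variant-c"]  # hypothetical variants

def run_suite(suite: str, variant: str) -> bool:
    """Stand-in for executing one suite against one variant."""
    return True  # a real runner would execute tests and report pass/fail

def change_is_mergeable(run=run_suite) -> bool:
    """Eligible to merge only if every (suite, variant) check passes."""
    jobs = [(s, v) for s in SUITES for v in VARIANTS]
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda job: run(*job), jobs))
    return all(results)
```

The key property is the `all(...)` at the end: a single red check anywhere in the matrix blocks the merge.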
A visualization of the automated checks dispatched by Iggybot for every proposed code change. If one of these checks had failed, the proposed code change would not be allowed to merge.
Once the proposed change passes all checks and a thorough code review, it is marked “ready” for merge into mainline. We delegate the honors to Iggybot, however, to actually execute the merge and dispatch the test suites once more, just in case a concurrent merge has introduced a conflict. This is our second layer of defense, and automating it yields a major advantage: it allows us to halt merges at the first sign of trouble. Iggybot smartly monitors for test failures on recent merges. When it observes too many, it refuses to merge further changes until those failures are addressed.
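The gating logic above amounts to a sliding window over recent merge results: once failures in the window cross a threshold, merging stops. Here is a hedged sketch of that idea; the window size and failure threshold are hypothetical, not Iggybot's real tuning.

```python
# Sketch of merge gating: halt merges when too many recent merges
# have failing tests. Window size and threshold are hypothetical.
from collections import deque

class MergeGate:
    def __init__(self, window: int = 10, max_failures: int = 3):
        self.recent = deque(maxlen=window)  # pass/fail of the last N merges
        self.max_failures = max_failures

    def record(self, passed: bool) -> None:
        """Record the test outcome of one post-merge run."""
        self.recent.append(passed)

    def merges_allowed(self) -> bool:
        """Refuse further merges once failures reach the threshold."""
        failures = sum(1 for ok in self.recent if not ok)
        return failures < self.max_failures
```

Because the window is bounded, old failures age out on their own once fixes land and green merges push them out of the deque.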
The nightly tests are where things really get interesting—our third layer of defense. Each night at midnight, Iggybot selects a daily release candidate and slams it with multiple repetitions of simulated testing in order to root out pesky transient failures. Concurrently, it brutalizes that version by spinning it up on real hardware and running load tests, stress tests, and upset tests. These tests verify that our system fails over properly and degrades gracefully and safely in catastrophe scenarios. Finally, Iggybot runs the selected release candidate on real hardware systems to verify that backup and restore performance meets our benchmarks. When failures occur, we triage every single one and meticulously address every possible product concern. All told, 419 machine hours—over 17 machine days—are spent validating our product each night.
A series of merged code changes to mainline is displayed above. The entry for “Merge pull request #11602” was subjected to nightly tests. All 5 test failures were investigated for bugs. Since no critical product bugs caused any of the failures, this software version is eligible for selection as a release candidate.
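Running the nightly suites multiple times is what makes transient failures visible: a real bug fails every repetition, while a flaky failure comes and goes. That distinction can be sketched as a set operation over repeated runs; this is an illustration of the idea, not the actual nightly harness.

```python
# Hypothetical sketch of separating transient (flaky) failures from
# consistent ones by running a suite several times.
def classify_failures(run_once, repetitions: int = 5):
    """run_once() returns the set of failing test names for one repetition.

    Returns (consistent, transient): tests that failed every time,
    and tests that failed only some of the time.
    """
    outcomes = [run_once() for _ in range(repetitions)]
    consistent = set.intersection(*outcomes)       # failed on every run
    transient = set.union(*outcomes) - consistent  # failed only sometimes
    return consistent, transient
```

Either bucket still gets triaged, as the post describes; the split just tells you whether to chase a product bug or a flaky test first.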
Enter the Pipeline
After all this testing work—thank you Iggybot!—we are now ready to select a high-quality release candidate for our pipeline. We do this by starting with a software version that performed well in our nightly testing and triaging all issues to ensure there are no critical product bugs. If it looks clean, then we submit this release candidate into a validation pipeline.
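The selection step reduces to a filter over recent nightly results: take the newest version whose triaged failures produced no critical product bugs. A minimal sketch, with hypothetical field names, assuming triage has already tagged each nightly run:

```python
# Sketch of release-candidate selection. Field names are hypothetical;
# assumes triage has already recorded any critical bugs per nightly run.
def select_release_candidate(nightly_results):
    """nightly_results: list of dicts, newest first, each with a
    'version' and the 'critical_bugs' found while triaging its failures."""
    for result in nightly_results:
        if not result["critical_bugs"]:
            return result["version"]
    return None  # no clean version yet; keep triaging
```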
From here, the release candidate journeys through a series of pre-production environments and automated checks before it becomes eligible for production deployment. This journey and the automation that drives it bring forth yet another collection of defense mechanisms—longevity testing, deployment testing, and internal rehearsals of our Critical Incident Process to name a few—all of which will be covered in-depth in future blog posts.
If you enjoyed learning about our production pipeline, be sure to check out our open positions!