Zero Downtime Deployment is a strategy for deploying software without interrupting the service, often using methods such as canary, blue-green, or rolling update deployments. These methods involve many domains and techniques, and one of the most important is testing.
With the ultimate goal of building something that doesn’t resemble a house of cards, testing allows us to catch problems before they cause trouble and to ensure that the software works as expected.
There are two paradigms for testing: manual and automated. While manual testing helps ensure that new and old functionalities work as expected, it’s difficult and time-consuming to ensure that all the necessary tests are properly executed before a release. As the solution grows in complexity, running the full set of manual tests before every release becomes unfeasible.
This problem got worse as companies adopted agile practices and embraced an accelerated development lifecycle marked by frequent releases and course correction based on customer feedback. The “Continuous Everything” and zero downtime mindset simply cannot exist when you rely on manual testing only; you need test automation.
Definition
Test automation is the use of software, run without human interaction, to control the execution of tests and to check whether the outcomes match expectations. The tests can be of any kind, from unit tests to functional end-to-end tests.
This technique is built upon two elements: tests as code and a test runner. The first is what you would expect: a set of steps written in a programming language to test something. The test runner, on the other hand, is in charge of finding your tests, executing them, and delivering the results to you.
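To make the two elements concrete, here is a minimal sketch using Python's standard `unittest` module. The function under test, `apply_discount`, and the helper `run_suite` are hypothetical names invented for this illustration; any other language or test framework follows the same shape.

```python
import io
import unittest


def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test: reduce price by the given percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


class ApplyDiscountTest(unittest.TestCase):
    """Test as code: each method is a set of steps plus an expected outcome."""

    def test_regular_discount(self):
        self.assertEqual(apply_discount(200.0, 25), 150.0)

    def test_invalid_percent_is_rejected(self):
        with self.assertRaises(ValueError):
            apply_discount(200.0, 120)


def run_suite() -> unittest.TestResult:
    """Test runner: find the tests, execute them, and collect the results."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ApplyDiscountTest)
    # The textual report goes to an in-memory buffer; the result object is returned.
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=2)
    return runner.run(suite)


if __name__ == "__main__":
    result = run_suite()
    print(f"ran={result.testsRun} failures={len(result.failures)} errors={len(result.errors)}")
```

In day-to-day use you would rarely write `run_suite` yourself: a command such as `python -m unittest` does the finding, executing, and reporting for you.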
What Test Automation Supports or Enables
In the context of zero downtime deployment, automation enables developers and testers to write tests as code, which can then be executed in the DevOps pipeline, for example whenever a new build is created.
Test automation is also an important part of continuous integration and deployment.
Blindly merging new code into the trunk of a repository is likely to lead to problems later. Requiring code to pass a suite of automated tests gives developers confidence that new modifications won’t break existing functionalities, and it catches errors as early as possible.
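One common way to enforce this is to have the pipeline run the suite and turn the outcome into a process exit code, since CI tools generally treat a non-zero exit code as a failed step. The sketch below assumes that convention; `merge_gate` and the two `check_*` functions are hypothetical names used only for illustration.

```python
import io
import unittest


def check_sorting_still_works():
    """Stands in for a test of existing functionality that still passes."""
    assert sorted([3, 1, 2]) == [1, 2, 3]


def check_with_regression():
    """Stands in for a test broken by a new modification."""
    assert sorted([3, 1, 2]) == [3, 2, 1], "existing behaviour changed"


def merge_gate(checks) -> int:
    """Run the checks as a suite; return 0 to allow the merge, 1 to block it."""
    suite = unittest.TestSuite(unittest.FunctionTestCase(c) for c in checks)
    result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
    return 0 if result.wasSuccessful() else 1


if __name__ == "__main__":
    print("healthy change  ->", merge_gate([check_sorting_still_works]))
    print("breaking change ->", merge_gate([check_sorting_still_works, check_with_regression]))
    # A real pipeline step would end with: raise SystemExit(exit_code)
```

The gate is only as strict as the suite behind it, which is why a single failing check is enough to block the merge here.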
While practically all types of tests work toward making zero downtime deployment a reality, some were created specifically for this purpose, like Chaos Testing/Engineering. Chaos Testing focuses on the system’s integrity, simulating and identifying failures that could lead to downtime or a negative user experience.
For the ideal application of Chaos Engineering, you need these five principles:
- Build a Hypothesis around Steady State Behavior: What do the metrics look like when the system is in its steady state?
- Vary Real-world Events: Any event capable of disrupting a steady state is a potential variable in a Chaos experiment.
- Run Experiments in Production: To guarantee both the authenticity of how the system is exercised and relevance to the currently deployed system, Chaos Engineering strongly prefers to experiment directly on production traffic.
- Automate Experiments to Run Continuously: Running experiments manually is labor-intensive and ultimately unsustainable.
- Minimize Blast Radius: Avoid unnecessary customer pain by ensuring that the fallout from the experiments is minimized and contained.
This is only one of the strategies you can use, along with Synthetic Testing and many others. We encourage you to experiment to see what works best for you.
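The flow of a Chaos experiment can be sketched in a few lines. The example below is a toy, in-process simulation, not a real Chaos Engineering tool: `Replica`, `Service`, `run_experiment`, the 99% threshold, and the sample size are all assumptions made for illustration. It measures the steady state, injects a failure, limits the number of requests exposed to it, and always rolls the failure back. Scheduling it to run continuously is left out.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    healthy: bool = True

    def handle(self, request_id: int) -> bool:
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return True


@dataclass
class Service:
    """Toy load-balanced service that retries a request on the next replica."""
    replicas: list

    def call(self, request_id: int) -> bool:
        n = len(self.replicas)
        for attempt in range(n):
            replica = self.replicas[(request_id + attempt) % n]
            try:
                return replica.handle(request_id)
            except ConnectionError:
                continue  # fail over to the next replica
        return False  # every replica refused the request


def success_rate(service: Service, requests) -> float:
    return sum(service.call(r) for r in requests) / len(requests)


def run_experiment(service: Service, victim_index: int,
                   threshold: float = 0.99, sample_size: int = 100) -> dict:
    requests = range(sample_size)               # small sample: limited blast radius
    baseline = success_rate(service, requests)  # steady-state measurement
    victim = service.replicas[victim_index]
    victim.healthy = False                      # inject the real-world event
    try:
        during = success_rate(service, requests)
    finally:
        victim.healthy = True                   # always roll the failure back
    return {"baseline": baseline, "during": during,
            "hypothesis_holds": during >= threshold}


if __name__ == "__main__":
    service = Service([Replica("a"), Replica("b"), Replica("c")])
    print(run_experiment(service, victim_index=0))
```

With three replicas the hypothesis holds, because requests fail over to a healthy replica. With a single replica the same experiment reports a failed hypothesis, which is exactly the kind of weakness the technique is meant to surface before it causes downtime.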
Possible Challenges
- Choosing a set of tools capable of covering all of the required test cases might be difficult.
- The execution of manual tests is normally still needed.
- Some types of tests, like usability tests, usually can’t be automated.
- As test automation acts as a gatekeeper in the context of continuous delivery, the quality of the tests directly determines how much the gate can be trusted: weak tests let defects through, and unreliable tests block good changes.
- The execution of a high number of tests can slow down the release process.
- An imbalance between unit and functional testing might increase the chances of problems going undetected.
When to Use
Except for extremely short-lived projects, such as PoCs or demos, there are no restrictions on the adoption of test automation.
Adopting in a greenfield
Test automation, as a general rule, should be mandatory in greenfield projects. Different types, sizes, and shapes of testing should be brought into the build pipeline, greatly reducing the amount of manual testing required to deliver changes to production.
To ensure effectiveness and get an overall quality snapshot, test automation should be used side by side with tools that continuously inspect code quality and report test coverage metrics. The combination of static code analysis, code coverage reports, and Software Composition Analysis can go a long way in this respect.
Functional and acceptance testing can be driven using well-known frameworks such as Selenium, Cypress, Playwright, Protractor, Sikuli, and many others. If you are used to doing manual tests, these tools will bring tears of happiness to your eyes.
Performance and resiliency testing depend on the nature and requirements of the product. If these tests make sense for your product but you are reluctant to include them in the build pipeline, running them periodically is fine and can help you ensure scalability and elasticity.
Adopting in a brownfield
Test automation in brownfield projects should, at a minimum, be used to validate core functionalities, with coverage increased as time permits.
Leveraging test automation in brownfield projects can be a slow and incremental process, as it might involve more than one big change, ranging from tools to a company-wide mindset shift. The suggested course of action is to start small: write unit tests and integrate them as a build step, then add integration tests, and only then move on to functional testing. Implementing a complete testing stack as part of the build process may not be feasible due to time constraints, so covering key functionalities should be the top priority. Any new features (and bug fixes) should be covered by tests and integrated into the automated testing suite, triggered by a build.
It is not uncommon to see brownfield projects in which testing is done 100% manually. Transitioning to automation may require a shift in the skill sets of the people responsible for QA. To reduce the learning curve and accelerate adoption, one suggestion is to embrace test automation frameworks that abstract the technical aspects of core frameworks such as Selenium. Katalon Studio and Robot Framework are good examples and provide a gentler transition for people who come from a manual testing background.
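As an illustration of starting small, the sketch below adds two unit tests around a piece of existing code: one that pins the behaviour users already rely on, and one that covers a bug fix so it cannot silently return. The helper `parse_quantity` and the bug itself (input such as "1,000" being rejected) are hypothetical, invented for this example.

```python
import unittest


def parse_quantity(raw: str) -> int:
    """Hypothetical legacy helper, shown after the fix for thousands separators."""
    value = int(raw.replace(",", ""))
    if value < 0:
        raise ValueError("quantity cannot be negative")
    return value


class ParseQuantityRegressionTest(unittest.TestCase):
    def test_existing_behaviour_is_preserved(self):
        # Pins what already worked before any change was made.
        self.assertEqual(parse_quantity("42"), 42)

    def test_thousands_separator_bug_stays_fixed(self):
        # Written together with the bug fix, then kept in the suite for good.
        self.assertEqual(parse_quantity("1,000"), 1000)

    def test_negative_quantity_is_rejected(self):
        with self.assertRaises(ValueError):
            parse_quantity("-1")
```

Running `python -m unittest` as a build step is enough to execute these on every build, and the suite can then grow one fix or feature at a time.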
Acknowledgement
This article was written by João Pedro São Gregório Silva, Software Developer and co-authored by Isac Sacchi Souza, Principal DevOps Specialist, Systems Architect & member of the DevOps Technology Practice. Thanks to João Augusto Caleffi and the DevOps Technology Practice for reviews and insights.