Balancing test coverage with observability and recovery
As your serverless application grows, adding tests for every new feature and regression will prove to be the biggest drag on delivery speed. An excessive test suite will drastically slow down both your engineers and your pipelines. Instead, you need to balance pre-deployment testing against observability and resilience in production.
As Cindy Sridharan says in her seminal post “Testing Microservices, the Sane Way”: “When it comes to testing…microservices, most organizations seem to be quite attached to an antediluvian model of testing all components in unison. Elaborate testing pipeline infrastructures are considered mandatory to enable this form of end-to-end testing where the test suite of every service is executed to confirm there aren’t any regressions or breaking changes being introduced.” She goes on to suggest: “to be able to craft a holistic strategy for understanding how our services function and gain confidence in their correctness, it becomes salient to be able to pick and choose the right subset of testing techniques given the availability, reliability and correctness requirements of the service.”
By far the most effective strategy to improve delivery speed is to reduce pre-deployment test coverage. At first this may seem at odds with preserving quality, but only when the reduction is assessed in isolation: reducing test coverage without introducing any other quality assurance (QA) methods is never going to be a good idea.
Any drop in pre-deployment test coverage made to preserve delivery speed should be balanced with other forms of QA, including alerting on degraded performance of critical user experiences and the ability to recover from any bugs that may be introduced. You can read more about the emerging practice of observability in Chapter 8 and more about fault tolerance in Chapter 6.
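To make the trade-off concrete, the following is a minimal sketch of a post-deployment check that could sit behind an alert or an automated rollback. It is only an illustration: the names WindowStats and should_roll_back, the 1% error-rate threshold, and the minimum of 100 requests are all assumptions chosen for the example, not values taken from any particular service or tool.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Request counts for one critical user journey over a monitoring window."""
    requests: int
    errors: int


def should_roll_back(
    stats: WindowStats,
    max_error_rate: float = 0.01,  # assumed threshold: 1% of requests failing
    min_requests: int = 100,       # assumed floor: ignore low-traffic windows
) -> bool:
    """Return True when the observed error rate exceeds the threshold.

    Windows with too little traffic are skipped so that a handful of
    failed requests does not trigger a rollback on its own.
    """
    if stats.requests < min_requests:
        return False
    return stats.errors / stats.requests > max_error_rate


# Hypothetical usage: evaluate the first window after a deployment.
if should_roll_back(WindowStats(requests=1000, errors=50)):
    print("Error rate too high: alert the team and roll back.")
```

In practice the counts would come from your monitoring system, and the threshold would be derived from the availability requirements of the user journey in question.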
The key to a scalable, effective set of tests is defining a clear test strategy to help engineers understand what to test and when to test it. Without this strategy, test suites and staging environments can quickly balloon out of control and grind development and delivery to a halt.