Failure Modes and Effects Analysis worksheet

The Failure Modes and Effects Analysis (FMEA) worksheet in Appendix C can be used to determine and categorize potential failure modes for the services in your application. As you analyze the types of failures that could occur when operating your application in a production environment, use the FMEA worksheet to guide your analysis, recording the details of possible failures along with their causes and effects.

For each failure, you should rate, for example on a five-point scale, the probability of it occurring, the severity of its effects if it does occur, and how likely the fault is to escape detection by a test suite before deployment to production. Each failure can then be assigned a risk level by multiplying the three ratings together, so that likely, severe, and hard-to-detect failures score highest. The risk levels can be used to prioritize test coverage or work on observability and fault tolerance.
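The scoring can be sketched in a few lines of Python. This is a minimal illustration, not the worksheet from Appendix C: it assumes the conventional FMEA risk priority number, in which the three ratings are multiplied together and detection is scored so that a higher number means the fault is harder to catch before production. The `FailureMode` class, the `prioritize` helper, and the services and ratings in the sample worksheet are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One row of a hypothetical FMEA worksheet; every rating uses a 1-5 scale."""

    service: str
    description: str
    probability: int  # 1 = rare, 5 = almost certain
    severity: int     # 1 = negligible effect, 5 = critical effect
    detection: int    # 1 = almost always caught by tests, 5 = almost never caught

    def __post_init__(self):
        # Reject ratings outside the assumed five-point scale.
        for name in ("probability", "severity", "detection"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    @property
    def risk(self) -> int:
        # Assumption: risk is the product of the three ratings (range 1-125).
        return self.probability * self.severity * self.detection


def prioritize(failures):
    """Return the failure modes ordered from highest to lowest risk."""
    return sorted(failures, key=lambda failure: failure.risk, reverse=True)


# Illustrative entries only; the services and ratings are invented.
worksheet = [
    FailureMode("orders-api", "Downstream payment call times out", 3, 4, 2),
    FailureMode("orders-queue", "Poison message blocks processing", 2, 5, 4),
    FailureMode("image-resizer", "Function exceeds memory limit", 4, 2, 1),
]

for failure in prioritize(worksheet):
    print(f"{failure.risk:>3}  {failure.service}: {failure.description}")
```

Sorting by the computed risk puts the failure modes that most deserve test coverage, observability, or fault-tolerance work at the top of the list. A team that weights the ratings differently would only need to change the `risk` property.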

Designing a Serverless Test Strategy

The test-driven development (TDD) movement that was popularized in the early 2000s made testing a primary concern for software engineers and championed automation over human toil. Automated testing has since become the status quo. Manual testing still has a role to play, but it should only be applied in appropriate scenarios, never as the default. Predictability is of course a key feature of automated tests, and this will be explored later in this chapter.

Beyond the sociotechnical behaviors TDD encourages, the core practice of TDD involves first writing tests that will fail based on a feature’s requirements and then implementing that feature until the tests pass. In reality, with cloud native serverless applications you will find you rarely run tests locally, aside perhaps from directly before committing the code changes to source control. This is mainly due to the difficulties associated with emulation.

With web applications or monolithic backends, it is trivial to spin up local instances and continually run full end-to-end test suites in response to every code change you make in your IDE. Testing can be a part of the development cycle. Testing cloud native software involves a different approach in order to integrate it into a rapid development feedback loop; this has forced engineering teams to rethink the role testing plays in developer workflows.

When you’re getting started with serverless, it can seem like a disadvantage to not be able to trivially run your code locally. However, if you can find an ergonomic, quick-enough workflow that suits you (see Chapter 6 for more on this topic), exclusively running and testing your code in the cloud will provide the most accurate (if not the fastest) feedback. You certainly won’t have any “it works on my machine” debates anymore.

Serverless engineers work best when contributing tightly scoped changes and frequently integrating these changes with the rest of the codebase. The changes can be deployed in isolation to the cloud and tested in full. The difference is that the feedback an engineer receives is obtained remotely, in a delivery pipeline running on a continuous integration platform, rather than locally in a terminal on their machine.

Any serverless test strategy must be designed with the unique attributes of serverless applications in mind and optimized to support the serverless engineering workflow, as described in Chapter 6. Devising a test strategy as early as possible in the lifetime of your application is crucial to the scalability of its development and its stability. An ill-conceived or organically evolving test strategy will eventually catch up with you and drastically slow down delivery, which in turn will impact stability and quality.
