Serverless Compute with AWS Lambda
Serverless computing provisions machine resources on demand from a pool shared across many disparate clients. This shared resource model only works if no single user is allowed to hog the resources. Therefore, providers of serverless compute platforms must put usage constraints in place. As Figure 6-2 shows, in the case of AWS Lambda, the primary constraint is the number of concurrently executing functions.

Figure 6-2. Lambda lifecycle and concurrent executions model
Be sure to read Julian Wood’s article “Understanding AWS Lambda Scaling and Throughput” as you begin to leverage the Lambda service for your business-critical applications.
One of the most important lessons to instill in your team when getting started with serverless is that your unit of scale is concurrency. While memory consumption and execution duration are indicative metrics at the function level, the number of concurrent Lambda function executions is the ultimate metric to track when your application’s compute needs begin to scale under exceptional traffic. If the number of concurrent function executions exceeds your account’s limit, your functions will begin to be throttled: Lambda rejects additional invocation requests until the number of concurrent executions falls back below the limit.
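To make concurrency concrete, here is a minimal Python sketch for estimating it. It assumes the common steady-state approximation that concurrency is roughly the request rate multiplied by the average execution duration. That approximation is not stated in the text above. The function names and the default limit of 1,000 are illustrative assumptions, not part of any AWS API.

```python
import math


def estimated_concurrency(requests_per_second: float, avg_duration_seconds: float) -> int:
    """Approximate the concurrent executions needed at steady state.

    Assumption: concurrency ~= request rate x average duration.
    """
    return math.ceil(requests_per_second * avg_duration_seconds)


def would_throttle(
    requests_per_second: float,
    avg_duration_seconds: float,
    concurrency_limit: int = 1000,  # assumed account limit; adjust to your own
) -> bool:
    """Return True if the estimated concurrency exceeds the given limit."""
    needed = estimated_concurrency(requests_per_second, avg_duration_seconds)
    return needed > concurrency_limit


# 200 requests/s at 0.5 s each needs about 100 concurrent executions.
print(estimated_concurrency(200, 0.5))
# 6,000 requests/s at 0.25 s each needs about 1,500, above a limit of 1,000.
print(would_throttle(6000, 0.25))
```

The estimate shows why execution duration still matters even though concurrency is the unit of scale: halving a function's average duration roughly halves the concurrency it consumes at the same request rate.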
Each AWS account has a default Lambda concurrency limit of 1,000 executions across all functions in a Region. This limit can be increased to tens of thousands by raising a support ticket with AWS and presenting a valid use case. The upper limit differs from one AWS Region to another.
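Because the limit varies by account and Region, it can help to read the current value rather than assume the default. The following sketch wraps the Lambda `GetAccountSettings` API, which boto3 exposes as `get_account_settings`. The helper name `read_concurrency_limits` and the Region in the usage comment are hypothetical, and the client is passed in so the function can be exercised without AWS credentials.

```python
def read_concurrency_limits(lambda_client) -> dict:
    """Return the account-level concurrency limits for the client's Region.

    `lambda_client` is expected to behave like boto3.client("lambda").
    """
    settings = lambda_client.get_account_settings()
    limits = settings["AccountLimit"]
    return {
        # Total concurrent executions allowed for the account in this Region.
        "account_limit": limits["ConcurrentExecutions"],
        # Portion not set aside as reserved concurrency for specific functions.
        "unreserved": limits["UnreservedConcurrentExecutions"],
    }


# Usage (requires boto3 and AWS credentials, so it is left as a comment):
#   import boto3
#   client = boto3.client("lambda", region_name="eu-west-1")
#   print(read_concurrency_limits(client))
```

Checking this value in each Region you deploy to is a cheap way to confirm that a requested increase has actually been applied.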
Whenever you consider increasing the concurrency limit, keep in mind that throttling is a safety measure that AWS enforces to protect your resources from unexpected spikes in consumption (and therefore cost) and to prevent any downstream resources from being overwhelmed. In this way, function throttling is, paradoxically, a crucial aspect of serverless autoscaling.
The astute serverless team always works within the constraints of the cloud. In Chapter 8, we’ll dive into the subject of operating your serverless workload and how to understand the units of scale across various AWS services.