Avoiding Antipatterns in AWS Lambda
2026-03-11 - 18 min read
Lambda is deceptively simple until you hit the sharp edges. Cold starts, memory-CPU coupling, timeout chains, the 75GB storage cliff — this guide covers the antipatterns that actually bite experienced developers, with concrete fixes for each one.
Lambda's pitch is simple: upload code, trigger it, pay per invocation. The reality is a runtime with a dozen hidden constraints that interact in non-obvious ways. The AWS docs cover what Lambda does. This guide covers what Lambda does to you when you ignore its sharp edges.
These aren't theoretical concerns. Every antipattern here comes from production incidents, load testing surprises, and architecture decisions that looked reasonable until they didn't.
Cold Starts: When They Matter and When They Don't
The cold start conversation is overblown in some contexts and underappreciated in others. Let's be precise about what actually happens.
When Lambda receives a request and no warm execution environment is available, it must download your deployment package, start the language runtime, execute your global/init scope code, and (if you're in a VPC) attach an Elastic Network Interface. The cold start penalty is the sum of everything that happens before your handler runs.
What actually drives cold start duration
- Package size: A 50MB deployment package takes longer to download than a 5MB one. This is the one you have the most control over.
- Runtime choice: Python and Node.js cold starts are typically 100-500ms. Java and .NET can hit 3-10+ seconds without SnapStart. Go and Rust are the fastest at 10-50ms.
- Init code: Every import, every database connection, every SDK client you create in global scope adds to the cold start. This is the second biggest lever you have.
- VPC attachment: Historically the biggest culprit (10-30s), but AWS improved this dramatically with Hyperplane ENI in 2019. VPC cold starts now add ~1s in most cases, but they're still non-zero.
When to care
Cold starts matter for synchronous, user-facing APIs where P99 latency matters. If your API Gateway-backed Lambda powers a checkout flow and 1% of requests take 3 seconds, users notice.
Cold starts don't matter for asynchronous processing, SQS consumers, S3 event handlers, or any background workload where an extra second of latency is invisible.
Provisioned concurrency: the expensive sledgehammer
Provisioned concurrency keeps a specified number of execution environments warm at all times. It eliminates cold starts for those environments. It also means you're paying for those environments 24/7, whether they're handling requests or not.
Provisioned concurrency converts Lambda from pay-per-use to pay-per-reserved. At $0.0000041667 per GB-second of provisioned concurrency, keeping 100 instances warm with 1GB memory costs roughly $1,080/month — before you process a single request.
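The arithmetic is worth internalizing. A quick sketch using the rate quoted above (confirm current pricing for your region):

```python
# Rough monthly compute charge for provisioned concurrency.
# Rate below is the one quoted in the text — check current AWS pricing.
PC_RATE_PER_GB_SECOND = 0.0000041667

def provisioned_concurrency_monthly_cost(instances, memory_gb, days=30):
    """GB-seconds billed for keeping environments warm, times the rate."""
    gb_seconds = instances * memory_gb * days * 24 * 3600
    return gb_seconds * PC_RATE_PER_GB_SECOND

# 100 warm environments at 1 GB: ~$1,080/month before a single request
print(round(provisioned_concurrency_monthly_cost(100, 1.0)))  # 1080
```

Note this is the reservation charge only — invocations on top of it are billed at a (lower) provisioned rate.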
Before reaching for provisioned concurrency, try these first:
- Minimize deployment package size (trim unused dependencies)
- Lazy-load heavy libraries — import them inside the handler, not at module level
- Use a lighter runtime (Python/Node.js over Java, or enable SnapStart for Java)
- Move VPC-bound Lambdas to non-VPC if they don't actually need private subnet access
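Lazy-loading in practice looks like this — a minimal sketch for a hypothetical function that only occasionally needs a heavy library (pandas here is illustrative):

```python
import json  # cheap imports stay at module level; they run once per cold start

def lambda_handler(event, context):
    if event.get("action") == "report":
        # Heavy import deferred: only paid for on the code path that needs it.
        # (pandas is a stand-in — apply the same idea to any large dependency.)
        import pandas as pd
        df = pd.DataFrame(event["rows"])
        return {"statusCode": 200, "body": df.to_json()}
    # The common path never pays the heavy import cost
    return {"statusCode": 200, "body": json.dumps({"ok": True})}
```

The trade-off: the first invocation that hits the heavy path pays the import cost as request latency instead of init latency, which is usually the right trade for rarely-used code paths.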
Web Framework Layers: Stop Wrapping Flask in Lambda
This is the antipattern I see most often in migrations. A team has an existing Express or FastAPI application. They want to "go serverless." So they wrap the entire framework in a Lambda using something like aws-serverless-express, Mangum, or serverless-wsgi.
It works. That's the problem — it works just well enough to ship, and then slowly degrades as the mismatch between the framework's assumptions and Lambda's runtime model becomes apparent.
Why it's an antipattern
Web frameworks assume a long-lived server process. They're built around concepts that don't exist in Lambda:
- Connection persistence: Express keeps connections open. Lambda terminates after the response.
- In-memory state: Session stores, caches, connection pools — all tied to a process that Lambda might freeze or terminate at any time.
- Startup cost: The framework has to initialize on every cold start. FastAPI's dependency injection, Express's middleware chain, Flask's app factory — all of this runs before your handler.
- Routing overhead: Lambda already knows which function to invoke. Running the request through a framework's router is redundant work.
What to do instead
For new Lambda functions, consume the event directly:
import json

def lambda_handler(event, context):
    """Direct event consumption — no framework overhead."""
    http_method = event['requestContext']['http']['method']
    path = event['rawPath']
    if http_method == 'GET' and path == '/users':
        return get_users(event)
    elif http_method == 'POST' and path == '/users':
        return create_user(event)
    return {'statusCode': 404, 'body': 'Not Found'}

def get_users(event):
    # Query params from event['queryStringParameters']
    users = fetch_users()  # your data access layer
    return {
        'statusCode': 200,
        'headers': {'Content-Type': 'application/json'},
        'body': json.dumps(users)
    }

For Node.js:
export const handler = async (event) => {
  const { method } = event.requestContext.http;
  const path = event.rawPath;
  if (method === 'GET' && path === '/users') {
    return getUsers(event);
  }
  return { statusCode: 404, body: 'Not Found' };
};

When framework wrappers are justified
There's one legitimate use case: active migration. If you're moving a monolith to Lambda and need the application running now while you refactor, a framework wrapper buys you time. But treat it as tech debt with a deadline, not an architecture.
AWS's own Lambda Powertools library provides the right abstraction level — event parsing, validation, middleware patterns — without the overhead of a full web framework. Use that instead.
CloudFront vs API Gateway for VPC-Protected Lambdas
Most teams default to API Gateway because it's the first thing the AWS docs show you. For many workloads, this is the most expensive possible choice.
The cost math
API Gateway REST API charges $3.50 per million requests. API Gateway HTTP API is cheaper at $1.00 per million. But Lambda Function URLs are free — no per-request charge at all. Pair a Function URL with CloudFront ($0.085 per million for the first 10M requests) and you've cut your request costs by 97%.
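Here's the same comparison as a tiny calculator, using the per-million rates quoted above (verify against current pricing before making decisions):

```python
# Monthly request-layer cost at a given volume, per the rates in the text.
RATES_PER_MILLION = {
    "REST API": 3.50,
    "HTTP API": 1.00,
    "CloudFront + Function URL": 0.085,
}

def monthly_request_cost(requests_per_month):
    """Request charges only — excludes Lambda compute and data transfer."""
    millions = requests_per_month / 1_000_000
    return {name: round(rate * millions, 2) for name, rate in RATES_PER_MILLION.items()}

print(monthly_request_cost(50_000_000))
# At 50M req/month: REST API $175.00, HTTP API $50.00, CloudFront $4.25
```

Data transfer charges apply to all three options and can dominate at scale, so run the numbers with your actual payload sizes too.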
When API Gateway still wins
- Built-in authorization: Cognito integration, Lambda authorizers, API keys, and usage plans are native to API Gateway. Replicating these with CloudFront requires Lambda@Edge or CloudFront Functions.
- Request/response transformation: API Gateway can map, validate, and transform payloads without hitting your Lambda at all.
- WebSocket API: If you need WebSocket support (more on this later), API Gateway is currently the only managed option.
- Usage plans and throttling: Per-client rate limiting and quota management is built into API Gateway. CloudFront doesn't have this.
When CloudFront wins
- High-volume, cost-sensitive workloads: At 10M+ requests/month, the cost difference is significant.
- Response caching: CloudFront is a CDN. If any of your responses are cacheable, you eliminate Lambda invocations entirely for cache hits.
- Global distribution: CloudFront edge locations reduce latency for global audiences. API Gateway (Regional) doesn't.
- No 29-second timeout: Lambda Function URLs support the full 900s Lambda timeout when invoked directly. If you put CloudFront in front, note that CloudFront's origin response timeout defaults to 30 seconds — you'll need to raise it via configuration (and a quota request for the highest values) for long-running requests.
VPC considerations
If your Lambda is in a VPC (to access RDS, ElastiCache, etc.), the invocation path doesn't change — both API Gateway and CloudFront invoke the Lambda the same way. But remember that VPC-attached Lambdas still carry a cold start overhead for ENI attachment. This is independent of whether you front them with API Gateway or CloudFront.
Memory: The Most Misunderstood Configuration
Lambda's memory setting doesn't just control memory. It's a single dial that proportionally controls CPU allocation, network bandwidth, and disk I/O throughput.
The key thresholds
- 128 MB: The minimum. You get a fraction of one vCPU. CPU-bound tasks will be painfully slow. This is only appropriate for the most trivial operations.
- 1,769 MB: One full vCPU. This is where most workloads find their sweet spot. Below this, you're CPU-throttled. Above this, you're paying for multi-core capability.
- 3,009 MB+: You get multiple vCPUs, but only if your code is actually multi-threaded. Single-threaded Python doesn't benefit from 6 vCPUs.
- 10,240 MB: The maximum. Six vCPUs. Useful for parallel data processing, but expensive.
The counterintuitive cost optimization
Here's what trips people up: increasing memory can reduce your bill. Lambda charges per GB-second. If doubling memory from 512MB to 1024MB halves your execution time (because your function was CPU-bound), you pay the same GB-seconds but get faster responses.
# Benchmark different memory configurations
# Run this against your actual function payload
import base64
import re
import time

import boto3

lambda_client = boto3.client('lambda')

def benchmark_memory(function_name, payload, memory_sizes):
    results = []
    for memory in memory_sizes:
        # Update function memory
        lambda_client.update_function_configuration(
            FunctionName=function_name,
            MemorySize=memory
        )
        time.sleep(5)  # Wait for the config update to propagate
        # Run 10 invocations, parse billed duration from logs
        durations = []
        for _ in range(10):
            response = lambda_client.invoke(
                FunctionName=function_name,
                Payload=payload,
                LogType='Tail'  # Return last 4KB of logs
            )
            log_output = base64.b64decode(
                response['LogResult']
            ).decode('utf-8')
            match = re.search(
                r'Billed Duration: (\d+) ms', log_output
            )
            if match:
                durations.append(float(match.group(1)))
        avg_duration = sum(durations) / len(durations)
        gb_seconds = (memory / 1024) * (avg_duration / 1000)
        results.append({
            'memory_mb': memory,
            'avg_duration_ms': round(avg_duration, 1),
            'gb_seconds': round(gb_seconds, 4)
        })
    return results

Memory Sweet Spot
Run your function at 128MB, 512MB, 1024MB, 1769MB, and 3008MB with production-like payloads. Plot duration vs. memory. The point where doubling memory no longer halves duration is your sweet spot. For most workloads, this is between 1,024 and 1,769 MB.
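Picking the sweet spot from benchmark output can be automated — a sketch that takes results shaped like the benchmark above produces and selects the configuration within a cost tolerance of the cheapest run, preferring the fastest among those:

```python
def pick_sweet_spot(results, cost_tolerance=0.05):
    """From [{'memory_mb', 'avg_duration_ms', 'gb_seconds'}, ...],
    pick the config within cost_tolerance of the cheapest run,
    breaking ties toward lower latency."""
    cheapest = min(r["gb_seconds"] for r in results)
    candidates = [r for r in results
                  if r["gb_seconds"] <= cheapest * (1 + cost_tolerance)]
    return min(candidates, key=lambda r: r["avg_duration_ms"])

# Hypothetical CPU-bound benchmark: doubling memory roughly halves
# duration up to 1769MB, then flatlines (numbers are illustrative)
results = [
    {"memory_mb": 512,  "avg_duration_ms": 800.0, "gb_seconds": 0.4},
    {"memory_mb": 1024, "avg_duration_ms": 400.0, "gb_seconds": 0.4},
    {"memory_mb": 1769, "avg_duration_ms": 240.0, "gb_seconds": 0.4146},
    {"memory_mb": 3008, "avg_duration_ms": 230.0, "gb_seconds": 0.6756},
]
print(pick_sweet_spot(results)["memory_mb"])  # 1769
```

In this synthetic data, 1769MB costs barely more than 512MB but responds 3x faster — exactly the "same GB-seconds, faster responses" effect described above.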
Ephemeral Storage: The /tmp Trap
Lambda provides /tmp as writable ephemeral storage. The default is 512MB, configurable up to 10GB. What the docs understate: /tmp persists across warm invocations of the same execution environment.
This is both useful and dangerous.
The gotcha
import os

def lambda_handler(event, context):
    output_path = '/tmp/report.csv'
    # BUG: On warm invocations, this file already exists
    # from the previous invocation
    with open(output_path, 'a') as f:  # 'a' appends!
        f.write(generate_report(event))
    # You're now returning a file that contains data
    # from multiple unrelated invocations
    return upload_to_s3(output_path)

The fix is simple — either use unique filenames or clean up at the start of each invocation:
import os
import uuid

def lambda_handler(event, context):
    # Option 1: Unique filename per invocation
    output_path = f'/tmp/report-{uuid.uuid4()}.csv'

    # Option 2: Clean up at the start (files only — skip subdirectories)
    # In practice, pick one of the two approaches.
    for name in os.listdir('/tmp'):
        path = os.path.join('/tmp', name)
        if os.path.isfile(path):
            os.remove(path)

    # Now safe to write
    with open('/tmp/report.csv', 'w') as f:  # 'w' not 'a'
        f.write(generate_report(event))

If you're writing to /tmp and your function is invoked frequently, you can exhaust your ephemeral storage across warm invocations. Each warm call accumulates files unless you clean them up. Lambda doesn't do this for you.
When to increase /tmp
The default 512MB is generous for most use cases. Increase it when you're processing large files (image/video conversion, CSV processing, ML model loading) that need local disk. Remember that increasing /tmp also increases cost — you're charged $0.0000000309 per GB-second of ephemeral storage above 512MB.
Dependency Management and Layers
Lambda has a hard limit: your deployment package can't exceed 250MB unzipped (50MB zipped). This includes your code and all dependencies. It sounds generous until you pip install pandas numpy scipy and you're at 240MB.
Lambda Layers
Layers let you share common dependencies across functions. But they come with constraints:
- 5 layers maximum per function
- 250MB total unzipped — this is shared with your deployment package, not in addition to it
- Layers are immutable — each update creates a new version
- Cross-account sharing requires explicit permission grants
Packaging strategies that work
- Strip unnecessary files. Most Python packages ship with tests, docs, and type stubs you don't need at runtime. Use `pip install --no-cache-dir -t . -r requirements.txt`, then remove `__pycache__`, `*.dist-info`, and test directories.
- Use Lambda-optimized packages. Libraries like `aws-lambda-powertools` are designed for the Lambda environment. Avoid pulling in kitchen-sink frameworks.
- Consider container images. If your dependencies exceed 250MB, container images support up to 10GB. The cold start is slightly longer (image pull), but you're not fighting size limits anymore.
#!/bin/bash
# Build a lean Python Lambda layer
mkdir -p layer/python
pip install -r requirements.txt --no-cache-dir -t layer/python --platform manylinux2014_x86_64 --only-binary=:all:
# Strip test files and caches
find layer -type d -name "__pycache__" -exec rm -rf {} + 2>/dev/null
find layer -type d -name "*.dist-info" -exec rm -rf {} + 2>/dev/null
find layer -type d -name "tests" -exec rm -rf {} + 2>/dev/null
# Package
cd layer && zip -r ../layer.zip . -x "*.pyc"
echo "Layer size: $(du -sh ../layer.zip | cut -f1)"
# If this exceeds 50MB zipped, you need to slim dependencies

The 75GB Code Storage Limit
This one catches teams off guard. Lambda enforces a 75GB limit on total code storage per region per account. Every function version, every layer version, every $LATEST — they all count toward this limit.
If you use versioned deployments (and you should for aliases and weighted traffic shifting), old versions accumulate silently. Ship ten deployments a day, each publishing new versions of five functions with 20MB packages, and you're adding ~1GB per day. In about two and a half months, you're at the limit.
The fix
Set up a cleanup policy. This Lambda function deletes old versions automatically:
import boto3

lambda_client = boto3.client('lambda')

def cleanup_old_versions(function_name, keep_latest=5):
    """Delete all but the N most recent versions."""
    versions = lambda_client.list_versions_by_function(
        FunctionName=function_name
    )['Versions']
    # Filter out $LATEST and sort by version number
    numbered = [v for v in versions if v['Version'] != '$LATEST']
    numbered.sort(key=lambda v: int(v['Version']), reverse=True)
    for version in numbered[keep_latest:]:
        try:
            lambda_client.delete_function(
                FunctionName=function_name,
                Qualifier=version['Version']
            )
            print(f"Deleted {function_name}:{version['Version']}")
        except Exception as e:
            # Version might be referenced by an alias
            print(f"Skipped {function_name}:{version['Version']}: {e}")

Run this as a scheduled Lambda (EventBridge rule, daily) targeting all your functions. Keeping 3-5 recent versions is usually sufficient for rollback capability.
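To sweep the whole fleet, wrap the per-function cleanup in a paginated loop. A sketch — it takes the cleanup function as a parameter so it composes with `cleanup_old_versions` above:

```python
def cleanup_all_functions(cleanup_fn, keep_latest=5, client=None):
    """Run a per-function cleanup across every function in the region.

    cleanup_fn: callable taking (function_name, keep_latest=N),
    e.g. cleanup_old_versions from above.
    """
    if client is None:
        import boto3  # deferred so cold-start init stays cheap
        client = boto3.client('lambda')
    # list_functions returns at most 50 per page — always paginate
    paginator = client.get_paginator('list_functions')
    for page in paginator.paginate():
        for fn in page['Functions']:
            cleanup_fn(fn['FunctionName'], keep_latest=keep_latest)
```

Injecting the client also makes the sweep easy to unit test without touching AWS.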
Timeouts: The Chain You Didn't Think About
Lambda supports up to 900 seconds (15 minutes) of execution time. But your Lambda doesn't exist in isolation — it's invoked by something, and it calls something.
The 29-second wall
If you're behind API Gateway (REST or HTTP API), your integration has a 29-second timeout by default, regardless of what you configure on the function. (AWS now allows raising this for Regional REST APIs via a quota request, at the cost of reduced account-level throttle quota — but 29 seconds remains the default, and the ceiling for HTTP APIs.) When it trips, API Gateway returns a 504 to your client and your Lambda keeps running — consuming resources and potentially completing writes that nobody is waiting for.
The cascade
The rule is simple: downstream timeout < Lambda timeout < trigger timeout.
- Set your database query timeout shorter than your Lambda timeout
- Set your Lambda timeout shorter than your API Gateway/ALB timeout
- Leave buffer at each layer for cleanup, logging, and error handling
import requests

def lambda_handler(event, context):
    remaining_ms = context.get_remaining_time_in_millis()
    # Leave 5s buffer for cleanup
    downstream_timeout = max(1, (remaining_ms - 5000) / 1000)
    try:
        response = requests.get(
            'https://api.external-service.com/data',
            timeout=downstream_timeout
        )
        return {'statusCode': 200, 'body': response.text}
    except requests.Timeout:
        return {
            'statusCode': 504,
            'body': 'Downstream service timed out'
        }

When you need more than 29 seconds
- Use ALB instead of API Gateway. ALB supports up to 900s timeout, matching Lambda's maximum.
- Use async invocation. Return a 202 immediately, process in the background, notify via webhook/SNS/SQS when done.
- Use Step Functions. Orchestrate multi-step workflows where each step is a short Lambda invocation.
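The async-invocation option in sketch form: a front-door handler that accepts the work, hands it to a worker function asynchronously, and returns 202 immediately. The worker function name (`report-worker`) is hypothetical:

```python
import json

def submit_handler(event, context, client=None):
    """Front door: accept the job, invoke the worker asynchronously,
    return 202 right away. 'report-worker' is a placeholder name."""
    if client is None:
        import boto3
        client = boto3.client('lambda')
    job = {
        'job_id': context.aws_request_id,  # reuse the request ID as a job handle
        'payload': json.loads(event.get('body') or '{}'),
    }
    client.invoke(
        FunctionName='report-worker',
        InvocationType='Event',  # async: Lambda queues the invocation
        Payload=json.dumps(job),
    )
    return {'statusCode': 202, 'body': json.dumps({'job_id': job['job_id']})}
```

The client polls with the returned job_id, or the worker notifies via webhook/SNS when done.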
SSE, WebSockets, and Response Streaming
Lambda is fundamentally request-response. This creates friction with real-time communication patterns.
Response streaming
Lambda supports response streaming via Function URLs. This lets your function write to a response stream incrementally instead of buffering the entire response in memory. It's useful for large payloads, server-sent events (SSE), and progressive rendering.
export const handler = awslambda.streamifyResponse(
  async (event, responseStream, context) => {
    // Set SSE headers
    const metadata = {
      statusCode: 200,
      headers: {
        'Content-Type': 'text/event-stream',
        'Cache-Control': 'no-cache',
      },
    };
    responseStream = awslambda.HttpResponseStream.from(
      responseStream,
      metadata
    );
    // Stream events
    for (const item of await fetchItems()) {
      responseStream.write(`data: ${JSON.stringify(item)}\n\n`);
    }
    responseStream.end();
  }
);

Response streaming only works with Lambda Function URLs — not API Gateway. API Gateway buffers the entire response before sending it to the client, which defeats the purpose of streaming.
WebSockets
API Gateway WebSocket API exists, but it's a different mental model than traditional WebSockets. Each message triggers a separate Lambda invocation. There's no persistent connection handler — you manage connection state in DynamoDB.
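Connection management in this model is just CRUD on a table. A sketch of the $connect/$disconnect route handlers, assuming a hypothetical DynamoDB table named `ws_connections` keyed on `connection_id`:

```python
def _table(table):
    """Resolve the connections table; deferred boto3 import keeps init cheap."""
    if table is not None:
        return table
    import boto3
    return boto3.resource('dynamodb').Table('ws_connections')  # placeholder name

def connect_handler(event, context, table=None):
    """$connect route: persist the connection ID so later invocations
    can post messages back to this client."""
    t = _table(table)
    t.put_item(Item={'connection_id': event['requestContext']['connectionId']})
    return {'statusCode': 200}

def disconnect_handler(event, context, table=None):
    """$disconnect route: drop the stale connection ID."""
    t = _table(table)
    t.delete_item(Key={'connection_id': event['requestContext']['connectionId']})
    return {'statusCode': 200}
```

Sending messages back later means reading connection IDs from this table and calling the API Gateway Management API — every "push" is itself an outbound API call, which is where the per-message costs come from.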
Costs add up fast. You're charged per message ($1.00 per million), per connection minute ($0.25 per million), plus the Lambda invocation costs. For a chat application sending 100 messages/second, you're looking at ~$260/month just in WebSocket API charges before Lambda costs.
When to reach for Fargate
If your application needs:
- Persistent connections (WebSockets with high message frequency)
- True bidirectional streaming
- Long-running processes that exceed 15 minutes
- Steady-state traffic patterns where pay-per-use doesn't save money
Then Fargate (or ECS/EKS) is the right tool. Lambda and Fargate aren't competitors — they're complementary. Use Lambda for event-driven, bursty workloads. Use Fargate for long-lived, connection-oriented workloads.
Supported Runtimes and the Deprecation Clock
AWS deprecates Lambda runtimes on a fixed schedule tied to the upstream language's end-of-life. When a runtime is deprecated:
- Phase 1: You can't create new functions with the runtime, but existing functions continue to work and receive updates
- Phase 2: You can't update existing functions. They still run, but you're stuck
Current runtime landscape (as of early 2026)
| Runtime | Status | Notes |
|---|---|---|
| Python 3.13 | Current | Recommended |
| Python 3.12 | Supported | Still common in production |
| Node.js 22.x | Current | Recommended |
| Node.js 20.x | Supported | EOL approaching |
| Java 21 | Current | Use with SnapStart for fast cold starts |
| .NET 8 | Current | AOT compilation reduces cold starts |
| Go | Custom runtime | Use provided.al2023 — no managed runtime |
| Rust | Custom runtime | Use provided.al2023 via cargo-lambda |
Container images vs zip
Container images (up to 10GB) bypass the 250MB zip limit and let you use your existing Docker toolchain. The trade-off is slightly longer cold starts due to image pull time, though AWS caches images aggressively.
Use container images when:
- Your dependencies exceed 250MB
- You need custom system libraries or binaries
- Your team already has Docker build pipelines
- You want consistent local-to-cloud parity
Connection Pooling: The Serverless Paradox
Lambda's execution model creates a unique challenge for database connections. Each concurrent invocation runs in its own execution environment. At 1,000 concurrent invocations, you potentially have 1,000 open database connections — more than most databases can handle.
We covered this topic in depth in our RDS Proxy guide, but here's the summary of what matters for Lambda antipatterns.
The antipattern: opening connections in the handler
import os

import psycopg2

def lambda_handler(event, context):
    # BAD: New connection every invocation
    conn = psycopg2.connect(
        host=os.environ['DB_HOST'],
        database=os.environ['DB_NAME'],
        user=os.environ['DB_USER'],
        password=os.environ['DB_PASSWORD']
    )
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM users")
    result = cursor.fetchall()
    conn.close()  # Connection is gone, next invocation starts over
    return result

The fix: global scope with validation
import os

import psycopg2

# Connection persists across warm invocations
conn = None

def get_connection():
    global conn
    if conn is None or conn.closed:
        conn = psycopg2.connect(
            host=os.environ['DB_HOST'],
            database=os.environ['DB_NAME'],
            user=os.environ['DB_USER'],
            password=os.environ['DB_PASSWORD']
        )
    return conn

def lambda_handler(event, context):
    db = get_connection()
    cursor = db.cursor()
    cursor.execute("SELECT * FROM users")
    return cursor.fetchall()

When you need RDS Proxy
- High concurrency (hundreds of simultaneous invocations)
- Connection storms during traffic spikes when many cold starts happen simultaneously
- Database connection limits are a bottleneck (especially on smaller RDS instances)
- IAM database authentication — RDS Proxy handles token refresh automatically
For low-to-moderate concurrency (under ~50 concurrent invocations), global scope connection reuse is usually sufficient. Don't add RDS Proxy complexity until you've actually hit connection limits.
Lambda Insights and What's Worth Monitoring
Lambda sends metrics to CloudWatch automatically, but the defaults don't tell you much. Here's what to actually watch.
Metrics that matter
- Duration (P99, not average): Your average might be 200ms, but if P99 is 3 seconds, 1% of your users are having a bad time. Cold starts inflate P99 specifically.
- ConcurrentExecutions: How close are you to your account limit (default 1,000)? If you hit it, Lambda starts throttling. Set an alarm at 80%.
- Throttles: If this is non-zero, you're dropping requests. Increase your account concurrency limit or add reserved concurrency to critical functions.
- Errors: Obvious, but distinguish between your code errors and Lambda platform errors. Platform errors (out-of-memory, timeout) indicate configuration problems, not bugs.
- IteratorAge (for stream-based triggers): If this is growing, your function can't keep up with the stream. Scale up memory or increase batch size.
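The 80% concurrency alarm takes a few lines of boto3 — a sketch assuming the default 1,000 account limit (the alarm name and SNS topic ARN are placeholders):

```python
def create_concurrency_alarm(account_limit=1000, threshold_pct=0.8,
                             topic_arn='arn:aws:sns:us-east-1:123456789012:ops-alerts',
                             client=None):
    """Alarm when account-wide concurrency crosses threshold_pct of the limit."""
    if client is None:
        import boto3
        client = boto3.client('cloudwatch')
    client.put_metric_alarm(
        AlarmName='lambda-concurrency-80pct',     # placeholder name
        Namespace='AWS/Lambda',
        MetricName='ConcurrentExecutions',        # account-wide when no dimensions
        Statistic='Maximum',
        Period=60,
        EvaluationPeriods=3,                      # 3 consecutive minutes over threshold
        Threshold=account_limit * threshold_pct,
        ComparisonOperator='GreaterThanThreshold',
        AlarmActions=[topic_arn],
    )
```

If you've raised your account concurrency limit, pass the new value as `account_limit` so the threshold tracks it.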
Lambda Insights
Lambda Insights is a CloudWatch extension that provides per-invocation telemetry — memory utilization, CPU time, network I/O, and disk I/O. It costs $0.50 per function per month, which is worth it for production functions where you need to right-size memory.
Enable it to answer the question: "Am I allocating more memory than I use?" If your function uses 200MB peak but you've allocated 1,769MB, you're overpaying. If it uses 1,700MB, you're cutting it dangerously close and should increase.
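Even without Insights, the REPORT line at the end of every invocation's log already contains `Max Memory Used` — a sketch that parses it to compare allocation against actual use:

```python
import re

# Matches the "Memory Size" / "Max Memory Used" fields of a Lambda REPORT line
REPORT_RE = re.compile(r"Memory Size: (\d+) MB\s+Max Memory Used: (\d+) MB")

def memory_headroom(report_line):
    """Parse a Lambda REPORT log line.
    Returns (allocated_mb, used_mb, pct_used), or None if not a REPORT line."""
    match = REPORT_RE.search(report_line)
    if not match:
        return None
    allocated, used = int(match.group(1)), int(match.group(2))
    return allocated, used, round(100 * used / allocated, 1)

line = ("REPORT RequestId: abc Duration: 102.5 ms Billed Duration: 103 ms "
        "Memory Size: 1769 MB Max Memory Used: 212 MB")
print(memory_headroom(line))  # (1769, 212, 12.0)
```

Run this over a day of logs (or use a CloudWatch Logs Insights query on `@maxMemoryUsed`) and you get the same right-sizing answer without the per-function Insights charge.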
Custom metrics that pay for themselves
import time

from aws_lambda_powertools import Metrics
from aws_lambda_powertools.metrics import MetricUnit

# Namespace can also come from the POWERTOOLS_METRICS_NAMESPACE env var
metrics = Metrics(namespace="MyApp")

@metrics.log_metrics
def lambda_handler(event, context):
    # Track cold start vs warm
    if not hasattr(lambda_handler, '_warm'):
        metrics.add_metric(
            name="ColdStart", unit=MetricUnit.Count, value=1
        )
        lambda_handler._warm = True
    else:
        metrics.add_metric(
            name="WarmStart", unit=MetricUnit.Count, value=1
        )
    # Track downstream latency separately
    db_start = time.time()
    result = query_database(event)
    db_duration = (time.time() - db_start) * 1000
    metrics.add_metric(
        name="DatabaseLatencyMs",
        unit=MetricUnit.Milliseconds,
        value=db_duration
    )
    return result

Putting It All Together
Lambda is a powerful building block, but it rewards engineers who understand its constraints rather than those who fight them. The common thread across every antipattern in this guide:
- Understand the execution model. Lambda is ephemeral, stateless, and billed by the millisecond. Design for that reality.
- Measure before optimizing. The memory-CPU relationship, cold start impact, and connection pooling decisions all depend on your specific workload.
- Know the limits. 29-second API Gateway timeouts, 250MB package sizes, 75GB code storage, 5 layers — most of these are hard limits, and the few that can be raised come with trade-offs. Design around them rather than discovering them in production.
- Use the right tool. Lambda isn't always the answer. WebSockets, long-running processes, and high-throughput streaming are better served by Fargate. That's not a failure — it's good architecture.
If you're building on Lambda and want to avoid the expensive lessons, I can help you get the architecture right from the start — whether that's a greenfield serverless build or untangling a migration that went sideways.
Get in touch to discuss your Lambda architecture.
Resources
- Lambda Quotas — The definitive reference for all Lambda limits (memory, timeout, package size, concurrency, storage)
- Lambda Execution Environment — How Lambda manages execution environments, cold starts, and warm reuse
- Lambda Function URLs — Configuration and comparison with API Gateway
- Response Streaming — How to use response streaming with Function URLs
- Lambda Powertools — The recommended utility library for Lambda (Python and TypeScript)
- RDS Proxy Connection Best Practices — Our deep dive into connection management with RDS Proxy and Lambda