Avoiding Antipatterns in AWS Lambda
2026-03-11 - 18 min read
Lambda is deceptively simple until you hit the sharp edges. Cold starts, memory-CPU coupling, timeout chains, the 75GB storage cliff — this guide covers the antipatterns that actually bite experienced developers, with concrete fixes for each one.
Lambda's pitch is simple: upload code, trigger it, pay per invocation. The reality is a runtime with a dozen hidden constraints that interact in non-obvious ways. The AWS docs cover what Lambda does. This guide covers what Lambda does to you when you ignore its sharp edges.
These aren't theoretical concerns. Every antipattern here comes from production incidents, load testing surprises, and architecture decisions that looked reasonable until they didn't.
Cold Starts: When They Matter and When They Don't
The cold start conversation is overblown in some contexts and underappreciated in others. Let's be precise about what actually happens.
When Lambda receives a request and no warm execution environment is available, it must download your deployment package, start the language runtime, execute your global/init scope code, and (if you're in a VPC) attach an Elastic Network Interface. The cold start penalty is the sum of everything that happens before your handler runs.
What actually drives cold start duration
- Package size: A 50MB deployment package takes longer to download than a 5MB one. This is the one you have the most control over.
- Runtime choice: Python and Node.js cold starts are typically 100-500ms. Java and .NET can hit 3-10+ seconds without SnapStart. Go and Rust are the fastest at 10-50ms.
- Init code: Every import, every database connection, every SDK client you create in global scope adds to the cold start. This is the second biggest lever you have.
- VPC attachment: Historically the biggest culprit (10-30s), but AWS improved this dramatically with Hyperplane ENI in 2019. VPC cold starts now add ~1s in most cases, but they're still non-zero.
When to care
Cold starts matter for synchronous, user-facing APIs where P99 latency matters. If your API Gateway-backed Lambda powers a checkout flow and 1% of requests take 3 seconds, users notice.
Cold starts don't matter for asynchronous processing, SQS consumers, S3 event handlers, or any background workload where an extra second of latency is invisible.
Provisioned concurrency: the expensive sledgehammer
Provisioned concurrency keeps a specified number of execution environments warm at all times. It eliminates cold starts for those environments. It also means you're paying for those environments 24/7, whether they're handling requests or not.
Provisioned concurrency converts Lambda from pay-per-use to pay-per-reserved. At $0.0000041667 per GB-second of provisioned concurrency, keeping 100 instances warm with 1GB memory costs roughly $1,080/month — before you process a single request.
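The arithmetic is worth internalizing. A quick sketch using the rate quoted above (confirm current pricing for your region):

```python
# Rough monthly compute charge for provisioned concurrency.
# Rate below is the one quoted in the text — check current AWS pricing.
PC_RATE_PER_GB_SECOND = 0.0000041667

def provisioned_concurrency_monthly_cost(instances, memory_gb, days=30):
    """GB-seconds billed for keeping environments warm, times the rate."""
    gb_seconds = instances * memory_gb * days * 24 * 3600
    return gb_seconds * PC_RATE_PER_GB_SECOND

# 100 warm environments at 1 GB: ~$1,080/month before a single request
print(round(provisioned_concurrency_monthly_cost(100, 1.0)))  # 1080
```

Note this is the reservation charge only — invocations on top of it are billed at a (lower) provisioned rate.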
Before reaching for provisioned concurrency, try these first:
- Minimize deployment package size (trim unused dependencies)
- Lazy-load heavy libraries — import them inside the handler, not at module level
- Use a lighter runtime (Python/Node.js over Java, or enable SnapStart for Java)
- Move VPC-bound Lambdas to non-VPC if they don't actually need private subnet access
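Lazy-loading in practice looks like this — a minimal sketch for a hypothetical function that only occasionally needs a heavy library (pandas here is illustrative):

```python
import json  # cheap imports stay at module level; they run once per cold start

def lambda_handler(event, context):
    if event.get("action") == "report":
        # Heavy import deferred: only paid for on the code path that needs it.
        # (pandas is a stand-in — apply the same idea to any large dependency.)
        import pandas as pd
        df = pd.DataFrame(event["rows"])
        return {"statusCode": 200, "body": df.to_json()}
    # The common path never pays the heavy import cost
    return {"statusCode": 200, "body": json.dumps({"ok": True})}
```

The trade-off: the first invocation that hits the heavy path pays the import cost as request latency instead of init latency, which is usually the right trade for rarely-used code paths.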
Web Framework Layers: Stop Wrapping Flask in Lambda
This is the antipattern I see most often in migrations. A team has an existing Express or FastAPI application. They want to "go serverless." So they wrap the entire framework in a Lambda using something like aws-serverless-express, Mangum, or serverless-wsgi.
It works. That's the problem — it works just well enough to ship, and then slowly degrades as the mismatch between the framework's assumptions and Lambda's runtime model becomes apparent.
Why it's an antipattern
Web frameworks assume a long-lived server process. They're built around concepts that don't exist in Lambda:
- Connection persistence: Express keeps connections open. Lambda terminates after the response.
- In-memory state: Session stores, caches, connection pools — all tied to a process that Lambda might freeze or terminate at any time.
- Startup cost: The framework has to initialize on every cold start. FastAPI's dependency injection, Express's middleware chain, Flask's app factory — all of this runs before your handler.
- Routing overhead: Lambda already knows which function to invoke. Running the request through a framework's router is redundant work.
What to do instead
For new Lambda functions, consume the event directly:
import json

def lambda_handler(event, context):
    """Direct event consumption — no framework overhead."""
    http_method = event['requestContext']['http']['method']
    path = event['rawPath']
    if http_method == 'GET' and path == '/users':
        return get_users(event)
    elif http_method == 'POST' and path == '/users':
        return create_user(event)
    return {'statusCode': 404, 'body': 'Not Found'}

def get_users(event):
    # Query params from event['queryStringParameters']
    users = fetch_users()  # your data access layer
    return {
        'statusCode': 200,
        'headers': {'Content-Type': 'application/json'},
        'body': json.dumps(users)
    }

For Node.js:
export const handler = async (event) => {
  const { method } = event.requestContext.http;
  const path = event.rawPath;
  if (method === 'GET' && path === '/users') {
    return getUsers(event);
  }
  return { statusCode: 404, body: 'Not Found' };
};

When framework wrappers are justified
There's one legitimate use case: active migration. If you're moving a monolith to Lambda and need the application running now while you refactor, a framework wrapper buys you time. But treat it as tech debt with a deadline, not an architecture.
AWS's own Lambda Powertools library provides the right abstraction level — event parsing, validation, middleware patterns — without the overhead of a full web framework. Use that instead.
CloudFront vs API Gateway for VPC-Protected Lambdas
Most teams default to API Gateway because it's the first thing the AWS docs show you. For many workloads, this is the most expensive possible choice.
The cost math
API Gateway REST API charges $3.50 per million requests. API Gateway HTTP API is cheaper at $1.00 per million. But Lambda Function URLs are free — no per-request charge at all. Pair a Function URL with CloudFront ($0.085 per million for the first 10M requests) and you've cut your request costs by 97%.
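Here's the same comparison as a tiny calculator, using the per-million rates quoted above (verify against current pricing before making decisions):

```python
# Monthly request-layer cost at a given volume, per the rates in the text.
RATES_PER_MILLION = {
    "REST API": 3.50,
    "HTTP API": 1.00,
    "CloudFront + Function URL": 0.085,
}

def monthly_request_cost(requests_per_month):
    """Request charges only — excludes Lambda compute and data transfer."""
    millions = requests_per_month / 1_000_000
    return {name: round(rate * millions, 2) for name, rate in RATES_PER_MILLION.items()}

print(monthly_request_cost(50_000_000))
# At 50M req/month: REST API $175.00, HTTP API $50.00, CloudFront $4.25
```

Data transfer charges apply to all three options and can dominate at scale, so run the numbers with your actual payload sizes too.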
When API Gateway still wins
- Built-in authorization: Cognito integration, Lambda authorizers, API keys, and usage plans are native to API Gateway. Replicating these with CloudFront requires Lambda@Edge or CloudFront Functions.
- Request/response transformation: API Gateway can map, validate, and transform payloads without hitting your Lambda at all.
- WebSocket API: If you need WebSocket support (more on this later), API Gateway is currently the only managed option.
- Usage plans and throttling: Per-client rate limiting and quota management is built into API Gateway. CloudFront doesn't have this.
When CloudFront wins
- High-volume, cost-sensitive workloads: At 10M+ requests/month, the cost difference is significant.
- Response caching: CloudFront is a CDN. If any of your responses are cacheable, you eliminate Lambda invocations entirely for cache hits.
- Global distribution: CloudFront edge locations reduce latency for global audiences. API Gateway (Regional) doesn't.
- No 29-second timeout: Lambda Function URLs support the full 900s Lambda timeout when invoked directly. If you put CloudFront in front, note that CloudFront's origin response timeout defaults to 30 seconds — you'll need to raise it via configuration (and a quota request for the highest values) for long-running requests.
VPC considerations
If your Lambda is in a VPC (to access RDS, ElastiCache, etc.), the invocation path doesn't change — both API Gateway and CloudFront invoke the Lambda the same way. But remember that VPC-attached Lambdas still carry a cold start overhead for ENI attachment. This is independent of whether you front them with API Gateway or CloudFront.
Memory: The Most Misunderstood Configuration
Lambda's memory setting doesn't just control memory. It's a single dial that proportionally controls CPU allocation, network bandwidth, and disk I/O throughput.
The key thresholds
- 128 MB: The minimum. You get a fraction of one vCPU. CPU-bound tasks will be painfully slow. This is only appropriate for the most trivial operations.
- 1,769 MB: One full vCPU. This is where most workloads find their sweet spot. Below this, you're CPU-throttled. Above this, you're paying for multi-core capability.
- 3,009 MB+: You get multiple vCPUs, but only if your code is actually multi-threaded. Single-threaded Python doesn't benefit from 6 vCPUs.
- 10,240 MB: The maximum. Six vCPUs. Useful for parallel data processing, but expensive.
The counterintuitive cost optimization
Here's what trips people up: increasing memory can reduce your bill. Lambda charges per GB-second. If doubling memory from 512MB to 1024MB halves your execution time (because your function was CPU-bound), you pay the same GB-seconds but get faster responses.
# Benchmark different memory configurations
# Run this against your actual function payload
import base64
import re
import time

import boto3

lambda_client = boto3.client('lambda')

def benchmark_memory(function_name, payload, memory_sizes):
    results = []
    for memory in memory_sizes:
        # Update function memory
        lambda_client.update_function_configuration(
            FunctionName=function_name,
            MemorySize=memory
        )
        time.sleep(5)  # Wait for the config update to propagate
        # Run 10 invocations, parse billed duration from logs
        durations = []
        for _ in range(10):
            response = lambda_client.invoke(
                FunctionName=function_name,
                Payload=payload,
                LogType='Tail'  # Return last 4KB of logs
            )
            log_output = base64.b64decode(
                response['LogResult']
            ).decode('utf-8')
            match = re.search(
                r'Billed Duration: (\d+) ms', log_output
            )
            if match:
                durations.append(float(match.group(1)))
        avg_duration = sum(durations) / len(durations)
        gb_seconds = (memory / 1024) * (avg_duration / 1000)
        results.append({
            'memory_mb': memory,
            'avg_duration_ms': round(avg_duration, 1),
            'gb_seconds': round(gb_seconds, 4)
        })
    return results

Memory Sweet Spot
Run your function at 128MB, 512MB, 1024MB, 1769MB, and 3008MB with production-like payloads. Plot duration vs. memory. The point where doubling memory no longer halves duration is your sweet spot. For most workloads, this is between 1,024 and 1,769 MB.
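Picking the sweet spot from benchmark output can be automated — a sketch that takes results shaped like the benchmark above produces and selects the configuration within a cost tolerance of the cheapest run, preferring the fastest among those:

```python
def pick_sweet_spot(results, cost_tolerance=0.05):
    """From [{'memory_mb', 'avg_duration_ms', 'gb_seconds'}, ...],
    pick the config within cost_tolerance of the cheapest run,
    breaking ties toward lower latency."""
    cheapest = min(r["gb_seconds"] for r in results)
    candidates = [r for r in results
                  if r["gb_seconds"] <= cheapest * (1 + cost_tolerance)]
    return min(candidates, key=lambda r: r["avg_duration_ms"])

# Hypothetical CPU-bound benchmark: doubling memory roughly halves
# duration up to 1769MB, then flatlines (numbers are illustrative)
results = [
    {"memory_mb": 512,  "avg_duration_ms": 800.0, "gb_seconds": 0.4},
    {"memory_mb": 1024, "avg_duration_ms": 400.0, "gb_seconds": 0.4},
    {"memory_mb": 1769, "avg_duration_ms": 240.0, "gb_seconds": 0.4146},
    {"memory_mb": 3008, "avg_duration_ms": 230.0, "gb_seconds": 0.6756},
]
print(pick_sweet_spot(results)["memory_mb"])  # 1769
```

In this synthetic data, 1769MB costs barely more than 512MB but responds 3x faster — exactly the "same GB-seconds, faster responses" effect described above.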
Ephemeral Storage: The /tmp Trap
Lambda provides /tmp as writable ephemeral storage. The default is 512MB, configurable up to 10GB. What the docs understate: /tmp persists across warm invocations of the same execution environment.
This is both useful and dangerous.
The gotcha
import os

def lambda_handler(event, context):
    output_path = '/tmp/report.csv'
    # BUG: On warm invocations, this file already exists
    # from the previous invocation
    with open(output_path, 'a') as f:  # 'a' appends!
        f.write(generate_report(event))
    # You're now returning a file that contains data
    # from multiple unrelated invocations
    return upload_to_s3(output_path)

The fix is simple — either use unique filenames or clean up at the start of each invocation:
import os
import uuid

def lambda_handler(event, context):
    # Option 1: Unique filename per invocation
    output_path = f'/tmp/report-{uuid.uuid4()}.csv'

    # Option 2: Clean up at the start (files only — skip subdirectories)
    # In practice, pick one of the two approaches.
    for name in os.listdir('/tmp'):
        path = os.path.join('/tmp', name)
        if os.path.isfile(path):
            os.remove(path)

    # Now safe to write
    with open('/tmp/report.csv', 'w') as f:  # 'w' not 'a'
        f.write(generate_report(event))

If you're writing to /tmp and your function is invoked frequently, you can exhaust your ephemeral storage across warm invocations. Each warm call accumulates files unless you clean them up. Lambda doesn't do this for you.
When to increase /tmp
The default 512MB is generous for most use cases. Increase it when you're processing large files (image/video conversion, CSV processing, ML model loading) that need local disk. Remember that increasing /tmp also increases cost — you're charged $0.0000000309 per GB-second of ephemeral storage above 512MB.
Dependency Management and Layers
Lambda has a hard limit: your deployment package can't exceed 250MB unzipped (50MB zipped). This includes your code and all dependencies. It sounds generous until you pip install pandas numpy scipy and you're at 240MB.
Lambda Layers
Layers let you share common dependencies across functions. But they come with constraints:
- 5 layers maximum per function
- 250MB total unzipped — this is shared with your deployment package, not in addition to it
- Layers are immutable — each update creates a new version
- Cross-account sharing requires explicit permission grants
Packaging strategies that work
- Strip unnecessary files. Most Python packages ship with tests, docs, and type stubs you don't need at runtime. Use `pip install --no-cache-dir -t . -r requirements.txt`, then remove `__pycache__`, `*.dist-info`, and test directories.
- Use Lambda-optimized packages. Libraries like `aws-lambda-powertools` are designed for the Lambda environment. Avoid pulling in kitchen-sink frameworks.
- Consider container images. If your dependencies exceed 250MB, container images support up to 10GB. The cold start is slightly longer (image pull), but you're not fighting size limits anymore.
#!/bin/bash
# Build a lean Python Lambda layer
mkdir -p layer/python
pip install -r requirements.txt --no-cache-dir -t layer/python --platform manylinux2014_x86_64 --only-binary=:all:
# Strip test files and caches
find layer -type d -name "__pycache__" -exec rm -rf {} + 2>/dev/null
find layer -type d -name "*.dist-info" -exec rm -rf {} + 2>/dev/null
find layer -type d -name "tests" -exec rm -rf {} + 2>/dev/null
# Package
cd layer && zip -r ../layer.zip . -x "*.pyc"
echo "Layer size: $(du -sh ../layer.zip | cut -f1)"
# If this exceeds 50MB zipped, you need to slim dependencies

The 75GB Code Storage Limit
This one catches teams off guard. Lambda enforces a 75GB limit on total code storage per region per account. Every function version, every layer version, every $LATEST — they all count toward this limit.
If you use versioned deployments (and you should for aliases and weighted traffic shifting), old versions accumulate silently. Ship ten deployments a day, each publishing new versions of five functions with 20MB packages, and you're adding ~1GB per day. In about two and a half months, you're at the limit.
The fix
Set up a cleanup policy. This Lambda function deletes old versions automatically:
import boto3

lambda_client = boto3.client('lambda')

def cleanup_old_versions(function_name, keep_latest=5):
    """Delete all but the N most recent versions."""
    versions = lambda_client.list_versions_by_function(
        FunctionName=function_name
    )['Versions']
    # Filter out $LATEST and sort by version number
    numbered = [v for v in versions if v['Version'] != '$LATEST']
    numbered.sort(key=lambda v: int(v['Version']), reverse=True)
    for version in numbered[keep_latest:]:
        try:
            lambda_client.delete_function(
                FunctionName=function_name,
                Qualifier=version['Version']
            )
            print(f"Deleted {function_name}:{version['Version']}")
        except Exception as e:
            # Version might be referenced by an alias
            print(f"Skipped {function_name}:{version['Version']}: {e}")

Run this as a scheduled Lambda (EventBridge rule, daily) targeting all your functions. Keeping 3-5 recent versions is usually sufficient for rollback capability.
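To sweep the whole fleet, wrap the per-function cleanup in a paginated loop. A sketch — it takes the cleanup function as a parameter so it composes with `cleanup_old_versions` above:

```python
def cleanup_all_functions(cleanup_fn, keep_latest=5, client=None):
    """Run a per-function cleanup across every function in the region.

    cleanup_fn: callable taking (function_name, keep_latest=N),
    e.g. cleanup_old_versions from above.
    """
    if client is None:
        import boto3  # deferred so cold-start init stays cheap
        client = boto3.client('lambda')
    # list_functions returns at most 50 per page — always paginate
    paginator = client.get_paginator('list_functions')
    for page in paginator.paginate():
        for fn in page['Functions']:
            cleanup_fn(fn['FunctionName'], keep_latest=keep_latest)
```

Injecting the client also makes the sweep easy to unit test without touching AWS.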
Timeouts: The Chain You Didn't Think About
Lambda supports up to 900 seconds (15 minutes) of execution time. But your Lambda doesn't exist in isolation — it's invoked by something, and it calls something.
The 29-second wall
If you're behind API Gateway (REST or HTTP API), your integration has a 29-second timeout by default, regardless of what you configure on the function. (AWS now allows raising this for Regional REST APIs via a quota request, at the cost of reduced account-level throttle quota — but 29 seconds remains the default, and the ceiling for HTTP APIs.) When it trips, API Gateway returns a 504 to your client and your Lambda keeps running — consuming resources and potentially completing writes that nobody is waiting for.
The cascade
The rule is simple: downstream timeout < Lambda timeout < trigger timeout.
- Set your database query timeout shorter than your Lambda timeout
- Set your Lambda timeout shorter than your API Gateway/ALB timeout
- Leave buffer at each layer for cleanup, logging, and error handling
import requests

def lambda_handler(event, context):
    remaining_ms = context.get_remaining_time_in_millis()
    # Leave 5s buffer for cleanup
    downstream_timeout = max(1, (remaining_ms - 5000) / 1000)
    try:
        response = requests.get(
            'https://api.external-service.com/data',
            timeout=downstream_timeout
        )
        return {'statusCode': 200, 'body': response.text}
    except requests.Timeout:
        return {
            'statusCode': 504,
            'body': 'Downstream service timed out'
        }

When you need more than 29 seconds
- Use ALB instead of API Gateway. ALB supports up to 900s timeout, matching Lambda's maximum.
- Use async invocation. Return a 202 immediately, process in the background, notify via webhook/SNS/SQS when done.
- Use Step Functions. Orchestrate multi-step workflows where each step is a short Lambda invocation.
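The async-invocation option in sketch form: a front-door handler that accepts the work, hands it to a worker function asynchronously, and returns 202 immediately. The worker function name (`report-worker`) is hypothetical:

```python
import json

def submit_handler(event, context, client=None):
    """Front door: accept the job, invoke the worker asynchronously,
    return 202 right away. 'report-worker' is a placeholder name."""
    if client is None:
        import boto3
        client = boto3.client('lambda')
    job = {
        'job_id': context.aws_request_id,  # reuse the request ID as a job handle
        'payload': json.loads(event.get('body') or '{}'),
    }
    client.invoke(
        FunctionName='report-worker',
        InvocationType='Event',  # async: Lambda queues the invocation
        Payload=json.dumps(job),
    )
    return {'statusCode': 202, 'body': json.dumps({'job_id': job['job_id']})}
```

The client polls with the returned job_id, or the worker notifies via webhook/SNS when done.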
SSE, WebSockets, and Response Streaming
Lambda is fundamentally request-response. This creates friction with real-time communication patterns.
Response streaming
Lambda supports response streaming via Function URLs. This lets your function write to a response stream incrementally instead of buffering the entire response in memory. It's useful for large payloads, server-sent events (SSE), and progressive rendering.
export const handler = awslambda.streamifyResponse(
  async (event, responseStream, context) => {
    // Set SSE headers
    const metadata = {
      statusCode: 200,
      headers: {
        'Content-Type': 'text/event-stream',
        'Cache-Control': 'no-cache',
      },
    };
    responseStream = awslambda.HttpResponseStream.from(
      responseStream,
      metadata
    );
    // Stream events
    for (const item of await fetchItems()) {
      responseStream.write(`data: ${JSON.stringify(item)}\n\n`);
    }
    responseStream.end();
  }
);

Response streaming only works with Lambda Function URLs — not API Gateway. API Gateway buffers the entire response before sending it to the client, which defeats the purpose of streaming.
WebSockets
API Gateway WebSocket API exists, but it's a different mental model than traditional WebSockets. Each message triggers a separate Lambda invocation. There's no persistent connection handler — you manage connection state in DynamoDB.
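Connection management in this model is just CRUD on a table. A sketch of the $connect/$disconnect route handlers, assuming a hypothetical DynamoDB table named `ws_connections` keyed on `connection_id`:

```python
def _table(table):
    """Resolve the connections table; deferred boto3 import keeps init cheap."""
    if table is not None:
        return table
    import boto3
    return boto3.resource('dynamodb').Table('ws_connections')  # placeholder name

def connect_handler(event, context, table=None):
    """$connect route: persist the connection ID so later invocations
    can post messages back to this client."""
    t = _table(table)
    t.put_item(Item={'connection_id': event['requestContext']['connectionId']})
    return {'statusCode': 200}

def disconnect_handler(event, context, table=None):
    """$disconnect route: drop the stale connection ID."""
    t = _table(table)
    t.delete_item(Key={'connection_id': event['requestContext']['connectionId']})
    return {'statusCode': 200}
```

Sending messages back later means reading connection IDs from this table and calling the API Gateway Management API — every "push" is itself an outbound API call, which is where the per-message costs come from.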
Costs add up fast. You're charged per message ($1.00 per million), per connection minute ($0.25 per million), plus the Lambda invocation costs. For a chat application sending 100 messages/second, you're looking at ~$260/month just in WebSocket API charges before Lambda costs.
When to reach for Fargate
If your application needs:
- Persistent connections (WebSockets with high message frequency)
- True bidirectional streaming
- Long-running processes that exceed 15 minutes
- Steady-state traffic patterns where pay-per-use doesn't save money
Then Fargate (or ECS/EKS) is the right tool. Lambda and Fargate aren't competitors — they're complementary. Use Lambda for event-driven, bursty workloads. Use Fargate for long-lived, connection-oriented workloads.
Supported Runtimes and the Deprecation Clock
AWS deprecates Lambda runtimes on a fixed schedule tied to the upstream language's end-of-life. When a runtime is deprecated:
- Phase 1: You can't create new functions with the runtime, but existing functions continue to work and receive updates
- Phase 2: You can't update existing functions. They still run, but you're stuck
Current runtime landscape (as of early 2026)
| Runtime | Status | Notes |
|---|---|---|
| Python 3.13 | Current | Recommended |
| Python 3.12 | Supported | Still common in production |
| Node.js 22.x | Current | Recommended |
| Node.js 20.x | Supported | EOL approaching |
| Java 21 | Current | Use with SnapStart for fast cold starts |
| .NET 8 | Current | AOT compilation reduces cold starts |
| Go | Custom runtime | Use provided.al2023 — no managed runtime |
| Rust | Custom runtime | Use provided.al2023 via cargo-lambda |
Container images vs zip
Container images (up to 10GB) bypass the 250MB zip limit and let you use your existing Docker toolchain. The trade-off is slightly longer cold starts due to image pull time, though AWS caches images aggressively.
Use container images when:
- Your dependencies exceed 250MB
- You need custom system libraries or binaries
- Your team already has Docker build pipelines
- You want consistent local-to-cloud parity
Connection Pooling: The Serverless Paradox
Lambda's execution model creates a unique challenge for database connections. Each concurrent invocation runs in its own execution environment. At 1,000 concurrent invocations, you potentially have 1,000 open database connections — more than most databases can handle.
We covered this topic in depth in our RDS Proxy guide, but here's the summary of what matters for Lambda antipatterns.
The antipattern: opening connections in the handler
import os

import psycopg2

def lambda_handler(event, context):
    # BAD: New connection every invocation
    conn = psycopg2.connect(
        host=os.environ['DB_HOST'],
        database=os.environ['DB_NAME'],
        user=os.environ['DB_USER'],
        password=os.environ['DB_PASSWORD']
    )
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM users")
    result = cursor.fetchall()
    conn.close()  # Connection is gone, next invocation starts over
    return result

The fix: global scope with validation
import os

import psycopg2

# Connection persists across warm invocations
conn = None

def get_connection():
    global conn
    if conn is None or conn.closed:
        conn = psycopg2.connect(
            host=os.environ['DB_HOST'],
            database=os.environ['DB_NAME'],
            user=os.environ['DB_USER'],
            password=os.environ['DB_PASSWORD']
        )
    return conn

def lambda_handler(event, context):
    db = get_connection()
    cursor = db.cursor()
    cursor.execute("SELECT * FROM users")
    return cursor.fetchall()

When you need RDS Proxy
- High concurrency (hundreds of simultaneous invocations)
- Connection storms during traffic spikes when many cold starts happen simultaneously
- Database connection limits are a bottleneck (especially on smaller RDS instances)
- IAM database authentication — RDS Proxy handles token refresh automatically
For low-to-moderate concurrency (under ~50 concurrent invocations), global scope connection reuse is usually sufficient. Don't add RDS Proxy complexity until you've actually hit connection limits.
Lambda Insights and What's Worth Monitoring
Lambda sends metrics to CloudWatch automatically, but the defaults don't tell you much. Here's what to actually watch.
Metrics that matter
- Duration (P99, not average): Your average might be 200ms, but if P99 is 3 seconds, 1% of your users are having a bad time. Cold starts inflate P99 specifically.
- ConcurrentExecutions: How close are you to your account limit (default 1,000)? If you hit it, Lambda starts throttling. Set an alarm at 80%.
- Throttles: If this is non-zero, you're dropping requests. Increase your account concurrency limit or add reserved concurrency to critical functions.
- Errors: Obvious, but distinguish between your code errors and Lambda platform errors. Platform errors (out-of-memory, timeout) indicate configuration problems, not bugs.
- IteratorAge (for stream-based triggers): If this is growing, your function can't keep up with the stream. Scale up memory or increase batch size.
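The 80% concurrency alarm takes a few lines of boto3 — a sketch assuming the default 1,000 account limit (the alarm name and SNS topic ARN are placeholders):

```python
def create_concurrency_alarm(account_limit=1000, threshold_pct=0.8,
                             topic_arn='arn:aws:sns:us-east-1:123456789012:ops-alerts',
                             client=None):
    """Alarm when account-wide concurrency crosses threshold_pct of the limit."""
    if client is None:
        import boto3
        client = boto3.client('cloudwatch')
    client.put_metric_alarm(
        AlarmName='lambda-concurrency-80pct',     # placeholder name
        Namespace='AWS/Lambda',
        MetricName='ConcurrentExecutions',        # account-wide when no dimensions
        Statistic='Maximum',
        Period=60,
        EvaluationPeriods=3,                      # 3 consecutive minutes over threshold
        Threshold=account_limit * threshold_pct,
        ComparisonOperator='GreaterThanThreshold',
        AlarmActions=[topic_arn],
    )
```

If you've raised your account concurrency limit, pass the new value as `account_limit` so the threshold tracks it.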
Lambda Insights
Lambda Insights is a CloudWatch extension that provides per-invocation telemetry — memory utilization, CPU time, network I/O, and disk I/O. It costs $0.50 per function per month, which is worth it for production functions where you need to right-size memory.
Enable it to answer the question: "Am I allocating more memory than I use?" If your function uses 200MB peak but you've allocated 1,769MB, you're overpaying. If it uses 1,700MB, you're cutting it dangerously close and should increase.
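Even without Insights, the REPORT line at the end of every invocation's log already contains `Max Memory Used` — a sketch that parses it to compare allocation against actual use:

```python
import re

# Matches the "Memory Size" / "Max Memory Used" fields of a Lambda REPORT line
REPORT_RE = re.compile(r"Memory Size: (\d+) MB\s+Max Memory Used: (\d+) MB")

def memory_headroom(report_line):
    """Parse a Lambda REPORT log line.
    Returns (allocated_mb, used_mb, pct_used), or None if not a REPORT line."""
    match = REPORT_RE.search(report_line)
    if not match:
        return None
    allocated, used = int(match.group(1)), int(match.group(2))
    return allocated, used, round(100 * used / allocated, 1)

line = ("REPORT RequestId: abc Duration: 102.5 ms Billed Duration: 103 ms "
        "Memory Size: 1769 MB Max Memory Used: 212 MB")
print(memory_headroom(line))  # (1769, 212, 12.0)
```

Run this over a day of logs (or use a CloudWatch Logs Insights query on `@maxMemoryUsed`) and you get the same right-sizing answer without the per-function Insights charge.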
Custom metrics that pay for themselves
import time

from aws_lambda_powertools import Metrics
from aws_lambda_powertools.metrics import MetricUnit

# Namespace can also come from the POWERTOOLS_METRICS_NAMESPACE env var
metrics = Metrics(namespace="MyApp")

@metrics.log_metrics
def lambda_handler(event, context):
    # Track cold start vs warm
    if not hasattr(lambda_handler, '_warm'):
        metrics.add_metric(
            name="ColdStart", unit=MetricUnit.Count, value=1
        )
        lambda_handler._warm = True
    else:
        metrics.add_metric(
            name="WarmStart", unit=MetricUnit.Count, value=1
        )
    # Track downstream latency separately
    db_start = time.time()
    result = query_database(event)
    db_duration = (time.time() - db_start) * 1000
    metrics.add_metric(
        name="DatabaseLatencyMs",
        unit=MetricUnit.Milliseconds,
        value=db_duration
    )
    return result

Putting It All Together
Lambda is a powerful building block, but it rewards engineers who understand its constraints rather than those who fight them. The common thread across every antipattern in this guide:
- Understand the execution model. Lambda is ephemeral, stateless, and billed by the millisecond. Design for that reality.
- Measure before optimizing. The memory-CPU relationship, cold start impact, and connection pooling decisions all depend on your specific workload.
- Know the limits. 29-second API Gateway timeouts, 250MB package sizes, 75GB code storage, 5 layers — most of these are hard limits, and the few that can be raised come with trade-offs. Design around them rather than discovering them in production.
- Use the right tool. Lambda isn't always the answer. WebSockets, long-running processes, and high-throughput streaming are better served by Fargate. That's not a failure — it's good architecture.
If you're building on Lambda and want to avoid the expensive lessons, I can help you get the architecture right from the start — whether that's a greenfield serverless build or untangling a migration that went sideways.
Get in touch to discuss your Lambda architecture.
Resources
- Lambda Quotas — The definitive reference for all Lambda limits (memory, timeout, package size, concurrency, storage)
- Lambda Execution Environment — How Lambda manages execution environments, cold starts, and warm reuse
- Lambda Function URLs — Configuration and comparison with API Gateway
- Response Streaming — How to use response streaming with Function URLs
- Lambda Powertools — The recommended utility library for Lambda (Python and TypeScript)
- RDS Proxy Connection Best Practices — Our deep dive into connection management with RDS Proxy and Lambda