May 27, 2026 · 3 min read

Our AWS Bill Went from $200 to $14,000 in One Week

A forgotten Lambda function with a recursive trigger. That’s all it took to generate a five-figure AWS bill in a single week. Here’s exactly what happened and how to make sure it never happens to you.

The email from AWS billing alerts hit at 7 AM on Monday: “Your estimated charges have exceeded $5,000.”

Our normal monthly bill was $200.

What happened

A Lambda function that processed image uploads had a bug. When it failed to process an image, it put the image back on the SQS queue for retry. The retry triggered the same Lambda. Which failed again. Which put it back on the queue.

An infinite loop, running at cloud scale.

The math

Each Lambda invocation: 3 seconds, 1024MB memory
Cost per invocation: ~$0.00005
Invocations per minute: ~2,000 (SQS kept feeding the Lambda)
Per hour: 120,000 invocations = $6
Per day: 2,880,000 invocations = $144
Per week: ~20 million invocations = $1,000
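The Lambda figures above can be reproduced in a few lines. This is a rough sketch: the duration, memory size, and invocation rate are the numbers from this post, while the GB-second and per-request prices are assumed from public us-east-1 x86 list pricing and may differ for your region or architecture.

```python
# Rough reproduction of the Lambda cost estimate.
# Assumed list prices (us-east-1, x86); check current pricing for your region.
PRICE_PER_GB_SECOND = 0.0000166667
PRICE_PER_REQUEST = 0.20 / 1_000_000

DURATION_SECONDS = 3            # from the post
MEMORY_GB = 1024 / 1024         # 1024MB, from the post
INVOCATIONS_PER_MINUTE = 2_000  # from the post


def cost_per_invocation() -> float:
    """Compute charge plus request charge for a single invocation."""
    compute = DURATION_SECONDS * MEMORY_GB * PRICE_PER_GB_SECOND
    return compute + PRICE_PER_REQUEST


def lambda_cost(minutes: int) -> tuple[int, float]:
    """Return (invocations, dollars) for a loop running this many minutes."""
    invocations = INVOCATIONS_PER_MINUTE * minutes
    return invocations, invocations * cost_per_invocation()


if __name__ == "__main__":
    for label, minutes in [("hour", 60), ("day", 60 * 24), ("week", 60 * 24 * 7)]:
        count, dollars = lambda_cost(minutes)
        print(f"Per {label}: {count:,} invocations, about ${dollars:,.0f}")
```

The results land slightly above the rounded figures in the list because the per-request charge is included.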

But that’s just Lambda. The real cost was S3 reads and data transfer, with SQS barely registering:

Each retry read the image from S3: $0.0004 per GET request × 20 million = $8,000
SQS messages: $0.40 per million × 20 million = $8
Data transfer: the images averaged 2MB each, so 20 million reads moved roughly 40TB, and… yeah.

Total: ~$14,000 in 7 days.

Why nobody noticed for a week

Billing alerts were set to $500. Our normal bill was $200, so $500 seemed generous. The bill blew past $500 on day 3 (Saturday). Nobody checks email on Saturday.
No Lambda concurrency limits. AWS will happily run 1,000 concurrent Lambdas by default. We never set a limit.
CloudWatch alarms were on the wrong metric. We monitored error rate (percentage), not error count. The error rate was 0.1% — because the retries counted as new invocations, diluting the error rate.
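An alarm on the absolute error count would not have been diluted this way. As a sketch, the helper below builds parameters that could be passed to CloudWatch's `put_metric_alarm` through boto3. The function name, threshold, and SNS topic ARN are placeholders, not values from our setup.

```python
def error_count_alarm(function_name: str, threshold: int, topic_arn: str) -> dict:
    """Build put_metric_alarm parameters for an absolute Lambda error count."""
    return {
        "AlarmName": f"{function_name}-error-count",
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",              # raw count of failed invocations
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",                  # total errors, not a percentage
        "Period": 300,                       # evaluate 5-minute windows
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no invocations means no errors
        "AlarmActions": [topic_arn],
    }


# Placeholder values for illustration only.
params = error_count_alarm(
    "image-processor", 100, "arn:aws:sns:us-east-1:123456789012:alerts"
)
# With AWS credentials configured:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```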

The bug

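import json
import logging

logger = logging.getLogger()

def handler(event, context):
    for record in event['Records']:
        try:
            image_url = json.loads(record['body'])['url']
            process_image(image_url)  # defined elsewhere in the module
        except Exception as e:
            logger.error(f"Failed to process: {e}")
            raise  # Bug: every failure, even a permanent one, makes SQS retry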

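The raise at the end makes the invocation fail, so SQS never deletes the message: once the visibility timeout expires, it is back on the queue and triggers the Lambda again. For transient errors (network timeout), this is correct. For permanent errors (corrupt image) on a queue with no retry limit, this creates an infinite loop.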

The fix

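import json
import logging

logger = logging.getLogger()

class TransientError(Exception):
    """Temporary failure (e.g. a network timeout) that is worth retrying."""

def handler(event, context):
    for record in event['Records']:
        try:
            image_url = json.loads(record['body'])['url']
            process_image(image_url)  # raises TransientError for temporary failures
        except TransientError:
            raise  # Retry; after maxReceiveCount attempts SQS moves it to the DLQ
        except Exception as e:
            logger.error(f"Permanent failure, not retrying: {e}")
            # Don't raise: the invocation succeeds and SQS deletes the message.
            # A message deleted this way never reaches the DLQ, so forward it
            # there explicitly if you want to investigate it later.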
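One thing the handler above glosses over: SQS delivers records in batches, and a raise fails the whole batch, so records that already succeeded get redelivered too. A possible refinement is Lambda's partial batch response, sketched below. It assumes `ReportBatchItemFailures` is enabled on the event source mapping, and `process_image` is a stand-in that fails on purpose for demo URLs.

```python
import json
import logging

logger = logging.getLogger(__name__)


class TransientError(Exception):
    """Temporary failure worth retrying (stand-in for the real class)."""


def process_image(image_url: str) -> None:
    """Stand-in for the real processing; fails on purpose for demo URLs."""
    if "timeout" in image_url:
        raise TransientError("simulated network timeout")
    if "corrupt" in image_url:
        raise ValueError("simulated corrupt image")


def handler(event, context):
    failures = []
    for record in event["Records"]:
        try:
            image_url = json.loads(record["body"])["url"]
            process_image(image_url)
        except TransientError:
            # Only this message is redelivered; the rest of the batch is deleted.
            failures.append({"itemIdentifier": record["messageId"]})
        except Exception as e:
            # Permanent failure: log it and let SQS delete the message.
            logger.error(f"Permanent failure, not retrying: {e}")
    return {"batchItemFailures": failures}
```

Returning an empty `batchItemFailures` list tells Lambda the whole batch succeeded, so only the messages listed there count toward the queue's retry limit.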

Prevention measures

1. SQS Dead Letter Queue (should have been there from day 1)

{
  "RedrivePolicy": {
    "deadLetterTargetArn": "arn:aws:sqs:...:image-processing-dlq",
    "maxReceiveCount": 3
  }
}

After 3 failed attempts, the message goes to a DLQ instead of retrying forever.
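When the policy is applied through the SQS API rather than a template, `RedrivePolicy` has to be passed as a JSON string. A minimal sketch of building that attribute for boto3's `set_queue_attributes` follows; the ARN and queue URL are placeholders.

```python
import json


def redrive_attributes(dlq_arn: str, max_receive_count: int = 3) -> dict:
    """Build the Attributes argument for SQS set_queue_attributes."""
    if max_receive_count < 1:
        raise ValueError("maxReceiveCount must be at least 1")
    policy = {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": max_receive_count}
    # The SQS API expects the policy serialized as a JSON string.
    return {"RedrivePolicy": json.dumps(policy)}


# Placeholder ARN for illustration only.
attrs = redrive_attributes("arn:aws:sqs:us-east-1:123456789012:image-processing-dlq")
# With AWS credentials configured (queue_url is a placeholder):
#   import boto3
#   boto3.client("sqs").set_queue_attributes(QueueUrl=queue_url, Attributes=attrs)
```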

2. Lambda concurrency limit

{
  "ReservedConcurrentExecutions": 10
}

Even if the queue fills up, only 10 Lambdas run at once. This caps the blast radius.
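To put a number on that cap, here is a rough worst-case estimate using the figures from the math section (3-second invocations at ~$0.00005 each). It covers Lambda compute only and assumes every concurrency slot stays busy around the clock.

```python
SECONDS_PER_INVOCATION = 3     # from the post
COST_PER_INVOCATION = 0.00005  # from the post, Lambda only


def worst_case_daily(concurrency: int) -> tuple[int, float]:
    """Max invocations and Lambda cost per day if every slot is always busy."""
    invocations = concurrency * (24 * 60 * 60 // SECONDS_PER_INVOCATION)
    return invocations, round(invocations * COST_PER_INVOCATION, 2)


if __name__ == "__main__":
    for limit in (10, 1000):
        count, dollars = worst_case_daily(limit)
        print(f"limit {limit}: up to {count:,} invocations/day, ${dollars:,.2f}")
```

At the observed ~2,000 invocations per minute with 3-second runs, roughly 100 copies were executing at once, so a limit of 10 would have cut the invocation rate to about a tenth.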

3. Better billing alerts

$300 (50% above normal) — Slack notification
$500 — email + Slack
$1,000 — PagerDuty

4. AWS Budget Actions

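Set up an AWS Budget that automatically cuts off the Lambda’s trigger when spending exceeds a threshold. Budget Actions can’t disable a trigger directly (they apply IAM policies or SCPs, or stop EC2 and RDS instances), so the budget alert goes to an SNS topic that invokes a small function to disable the event source mapping. This is the nuclear option but it prevents $14,000 surprises.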
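One way to wire that up, sketched under assumptions: the budget alert publishes to an SNS topic, which invokes a small kill-switch function that disables the runaway Lambda's event source mappings. `list_event_source_mappings` and `update_event_source_mapping` are boto3 Lambda client calls; the function name and the wiring are hypothetical. The client is passed in so the logic can be exercised without AWS access, and pagination is ignored for brevity.

```python
def disable_triggers(lambda_client, function_name: str) -> list[str]:
    """Disable every event source mapping for a function; return their UUIDs."""
    disabled = []
    response = lambda_client.list_event_source_mappings(FunctionName=function_name)
    for mapping in response["EventSourceMappings"]:
        lambda_client.update_event_source_mapping(UUID=mapping["UUID"], Enabled=False)
        disabled.append(mapping["UUID"])
    return disabled


# In the kill-switch Lambda (hypothetical wiring, needs boto3 and IAM permissions):
#   import boto3
#   def handler(event, context):
#       disable_triggers(boto3.client("lambda"), "image-processor")
```

Disabling the mapping leaves the messages on the queue, so nothing is lost while you investigate.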

Did AWS refund it?

We opened a support ticket explaining the situation. AWS credited back about $9,000 (first-time courtesy). We still paid ~$5,000 for the lesson.

The takeaway

Serverless scales automatically. Including your mistakes. Every Lambda needs:

A Dead Letter Queue
A concurrency limit
Billing alerts at reasonable thresholds
Distinction between retryable and permanent errors

The cloud doesn’t care if your code has a bug. It will happily run that bug a million times and send you the bill.
