May 30, 2026 · 5 min read

Webhook Architecture Patterns — Retry, Idempotency, and Delivery Guarantees (2026)

If you’ve read our intro to webhooks, you know the basics: an event happens, an HTTP POST fires, a subscriber reacts. Simple enough on a whiteboard. In production, everything that can go wrong will — receivers go down, networks drop packets, and duplicate deliveries corrupt state. This guide covers the architecture patterns that make webhooks actually reliable.

Why exactly-once delivery is impossible

Distributed systems theory (the Two Generals Problem) tells us that exactly-once delivery over an unreliable network is impossible. The sender fires a POST. The receiver processes it and returns 200. But the response is lost in transit. The sender has no way to distinguish “delivered and processed” from “never arrived.” It retries, and now the receiver has processed the event twice.

You have two realistic options:

At-most-once — fire and forget. No retries. Simple, but you lose events.
At-least-once — retry until you get an acknowledgment. You guarantee delivery but accept the possibility of duplicates.

Almost every production webhook system chooses at-least-once delivery and pushes the duplicate problem to the receiver via idempotency. More on that below.
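To make the lost-acknowledgment scenario concrete, here is a tiny simulation. It is only a sketch: `simulateLostAck` is a made-up name, and the "network" is a local function that drops the first response on purpose.

```javascript
// Simulated sender and receiver where the first acknowledgment is lost.
function simulateLostAck() {
  let processed = 0;
  let ackDropped = false;

  // The receiver always processes the event, but its first 200 never arrives.
  const receive = () => {
    processed += 1;
    if (!ackDropped) {
      ackDropped = true;
      return null; // response lost in transit
    }
    return 200;
  };

  // At-least-once sender: keep retrying until an acknowledgment is seen.
  let attempts = 0;
  let status = null;
  while (status !== 200) {
    attempts += 1;
    status = receive();
  }
  return { attempts, processed };
}
```

The sender sees one failure and one success, while the receiver has handled the event twice. That gap is what idempotency has to close.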

Retry with exponential backoff

When a delivery attempt fails (timeout, 5xx, connection refused), you retry. Naive retries at fixed intervals will hammer a recovering server. Exponential backoff spaces attempts out progressively:

Attempt	Delay
1	immediate
2	30 seconds
3	2 minutes
4	8 minutes
5	30 minutes
6	2 hours

Add jitter (a small random offset) to each delay so that thousands of failed webhooks don’t all retry at the exact same second and create a thundering herd.
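A minimal sketch of how such a schedule could be computed. The 30-second base, the factor of 4, the 2-hour cap, and the ±20% jitter are assumptions picked to roughly match the table above, and `retryDelayMs` is a hypothetical helper name.

```javascript
// Assumed parameters: 30 s base, factor 4, capped at 2 h, ±20% jitter.
const BASE_MS = 30_000;
const FACTOR = 4;
const MAX_MS = 2 * 60 * 60 * 1000;
const JITTER = 0.2;

// Delay in milliseconds before a given attempt (1-based).
// `random` is injectable so the schedule can be tested deterministically.
function retryDelayMs(attempt, random = Math.random) {
  if (attempt <= 1) return 0; // the first attempt is immediate
  const exponential = BASE_MS * FACTOR ** (attempt - 2);
  const capped = Math.min(exponential, MAX_MS);
  // Map random() in [0, 1) to a multiplier in [0.8, 1.2).
  const multiplier = 1 + (random() * 2 - 1) * JITTER;
  return Math.round(capped * multiplier);
}
```

Because each delivery draws its own random multiplier, events that failed at the same moment spread their retries across a window instead of landing on the same second.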

A typical policy retries 5–8 times over 24–72 hours, then gives up and routes the event to a dead letter queue. Always respect the receiver’s response: a 410 Gone means the endpoint was deliberately removed — stop retrying immediately. For other error handling strategies, see our guide to API error handling.
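The decision after each attempt can be sketched as a small pure function. The attempt limit of 6 and the action names are assumptions for illustration. Treating every non-2xx response other than 410 as retryable is a simplification; many senders handle other 4xx codes differently.

```javascript
const MAX_ATTEMPTS = 6; // assumed; the text suggests 5 to 8

// status: the HTTP status code, or null for a timeout or connection error.
// attempt: the 1-based number of the attempt that just finished.
function nextAction(status, attempt) {
  if (status !== null && status >= 200 && status < 300) return "done";
  if (status === 410) return "disable"; // endpoint deliberately removed
  if (attempt >= MAX_ATTEMPTS) return "dead-letter";
  return "retry";
}
```

Keeping this logic free of I/O makes the retry policy easy to unit-test separately from the HTTP client.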

HMAC signature verification

Receivers need to verify that an incoming webhook actually came from the expected sender and wasn’t tampered with in transit. The standard approach is HMAC-SHA256: the sender signs the payload with a shared secret and includes the signature in a header. The receiver recomputes the signature and compares.

This works alongside HTTPS/TLS: TLS protects the transport, while the HMAC shows that the payload was signed by a holder of the shared secret and was not altered afterwards.

Sender side (Node.js):

import crypto from "node:crypto";

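// The receiver hashes the raw request body, so send JSON.stringify(payload)
// as the body: the bytes you sign must be the bytes you send.
function signPayload(payload, secret) {
  return crypto
    .createHmac("sha256", secret)
    .update(JSON.stringify(payload))
    .digest("hex");
}

// Attach as header: X-Webhook-Signature: sha256=<signature>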

Receiver side (Node.js / Express):

import crypto from "node:crypto";

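// Assumes req.rawBody holds the unparsed request bytes, e.g. captured with
// express.json({ verify: (req, _res, buf) => { req.rawBody = buf; } }).
function verifyWebhook(req, secret) {
  const received = req.headers["x-webhook-signature"];
  if (typeof received !== "string") return false;

  const computed =
    "sha256=" +
    crypto.createHmac("sha256", secret).update(req.rawBody).digest("hex");

  const receivedBuf = Buffer.from(received);
  const computedBuf = Buffer.from(computed);
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  if (receivedBuf.length !== computedBuf.length) return false;
  return crypto.timingSafeEqual(receivedBuf, computedBuf);
}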

Key details:

Use crypto.timingSafeEqual, a constant-time comparison that prevents timing attacks. It throws if the two buffers differ in length, so check the lengths first.
Compute the signature from the raw request body, not a re-serialized object. JSON key ordering differences will break the check.
Rotate secrets periodically. During rotation, accept signatures from both the old and new secret for a short overlap window.
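The overlap window during rotation could look something like the sketch below. `verifyWithRotation`, `sign`, and `safeEqual` are hypothetical helpers, and the header format is the `sha256=<hex>` form used above. The receiver keeps a short list of currently valid secrets and accepts a signature that matches any of them.

```javascript
import crypto from "node:crypto";

// Compute the header value in the "sha256=<hex>" form.
function sign(rawBody, secret) {
  return (
    "sha256=" +
    crypto.createHmac("sha256", secret).update(rawBody).digest("hex")
  );
}

// Constant-time comparison that returns false on a length mismatch
// instead of letting timingSafeEqual throw.
function safeEqual(a, b) {
  const bufA = Buffer.from(a);
  const bufB = Buffer.from(b);
  return bufA.length === bufB.length && crypto.timingSafeEqual(bufA, bufB);
}

// Accept a signature produced with any currently valid secret.
function verifyWithRotation(rawBody, header, secrets) {
  if (typeof header !== "string") return false;
  let ok = false;
  // Check every secret rather than returning on the first match.
  for (const secret of secrets) {
    if (safeEqual(header, sign(rawBody, secret))) ok = true;
  }
  return ok;
}
```

During rotation you would pass both secrets, for example `verifyWithRotation(rawBody, header, [newSecret, oldSecret])`, and drop the old one from the list once the overlap window ends.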

Idempotency on the receiver side

Since at-least-once delivery means duplicates are inevitable, receivers must be idempotent — processing the same event twice should produce the same result as processing it once.

The standard pattern:

The sender includes a unique X-Webhook-Event-Id (or equivalent) in every delivery.
The receiver stores processed event IDs (a database table, Redis set, or similar).
Before processing, check if the ID already exists. If it does, return 200 OK without re-processing.

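// `store` and `processEvent` are provided by your application.
async function handleWebhook(req, res) {
  const eventId = req.headers["x-webhook-event-id"];
  if (!eventId) return res.sendStatus(400); // cannot deduplicate without an ID

  // Note: check-then-add is not atomic. Two concurrent deliveries of the same
  // event can both pass this check unless the store enforces uniqueness.
  if (await store.has(eventId)) return res.sendStatus(200);

  await processEvent(req.body);
  await store.add(eventId);
  res.sendStatus(200);
}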

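Set a TTL on stored IDs (e.g., 7 days) so the store doesn’t grow unbounded. Keep the TTL longer than the sender’s full retry window (up to 72 hours in the policy above); otherwise a late retry can arrive after its ID has expired and be processed again. For a deeper dive, see Idempotency in APIs.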
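One caveat with the check-then-add sequence in the handler above: it is not atomic, so two concurrent deliveries of the same event can both pass the check. A common remedy is to claim the ID first and release it if processing fails. The sketch below uses a hypothetical in-memory `SeenEvents` class to show the idea. In a real deployment the claim would more likely be a database unique constraint or a Redis SET with NX and an expiry.

```javascript
// Hypothetical in-memory store with a TTL per event ID.
// Expired entries are only overwritten here; a real store would evict them.
class SeenEvents {
  constructor(ttlMs, now = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.expiry = new Map(); // eventId -> expiry timestamp (ms)
  }

  // Returns true only for the first caller within the TTL window.
  claim(eventId) {
    const t = this.now();
    const expiresAt = this.expiry.get(eventId);
    if (expiresAt !== undefined && expiresAt > t) return false;
    this.expiry.set(eventId, t + this.ttlMs);
    return true;
  }

  // Undo a claim so that a later retry can process the event.
  release(eventId) {
    this.expiry.delete(eventId);
  }
}

// Run `process` at most once per event ID within the TTL window.
async function handleOnce(store, eventId, process) {
  if (!store.claim(eventId)) return "duplicate";
  try {
    await process();
    return "processed";
  } catch (err) {
    store.release(eventId); // let the sender's retry try again
    throw err;
  }
}
```

The trade-off of claiming first: if the process crashes between the claim and the release, the event stays marked as seen until the TTL expires, so some systems store a status ("in progress" or "done") next to the ID.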

Dead letter queues

After exhausting all retry attempts, the event has to go somewhere. A dead letter queue (DLQ) captures these permanently failed deliveries so they aren’t silently lost.

A DLQ should store:

The full event payload
The target URL
The failure reason and HTTP status from the last attempt
A timestamp and retry count

This gives your operations team (or the subscriber themselves via a dashboard) the ability to inspect failures and manually replay events once the underlying issue is fixed. Services like AWS SQS, Google Cloud Pub/Sub, and RabbitMQ all have native DLQ support.
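As an illustration, a DLQ entry carrying the fields listed above might be built like this. The function and field names are assumptions, not a standard schema.

```javascript
// Hypothetical shape of a dead letter record.
// event:       { id, payload } as originally emitted
// lastAttempt: { reason, status? } from the final delivery attempt
function toDeadLetter(event, targetUrl, lastAttempt, retryCount, now = new Date()) {
  return {
    eventId: event.id,
    payload: event.payload, // full payload, so the event can be replayed
    targetUrl,
    failureReason: lastAttempt.reason,
    lastStatus: lastAttempt.status ?? null, // null when no HTTP response arrived
    retryCount,
    failedAt: now.toISOString(),
  };
}
```

Storing the complete payload alongside the target URL is what makes manual replay possible without going back to the system that produced the event.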

Fan-out: one event, multiple subscribers

Many systems need to notify multiple subscribers when a single event occurs — a new order might trigger an email service, an analytics pipeline, and an inventory system simultaneously.

Two approaches:

Direct fan-out — the webhook sender maintains a list of subscriber URLs per event type and delivers to each independently. Simple, but the sender bears the load of N deliveries and N retry chains.

Broker-mediated fan-out — the sender publishes the event once to a message broker (SNS, Pub/Sub, internal queue). The broker handles delivery to each subscriber. This decouples the sender from subscriber count and isolates failures — one slow subscriber doesn’t block the others.

For systems with more than a handful of subscribers, broker-mediated fan-out is almost always the right call. It also makes it trivial to add or remove subscribers without changing the sender’s code, which aligns with good API design principles.
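For comparison, here is a rough sketch of direct fan-out that still isolates failures between subscribers. `fanOut` is a hypothetical helper, and `deliver` is an assumed async function `(url, event)` that rejects when a delivery fails.

```javascript
// Deliver one event to every subscriber independently.
// Promise.allSettled waits for all deliveries, so one failing subscriber
// does not hide the results of the others.
async function fanOut(event, subscriberUrls, deliver) {
  const results = await Promise.allSettled(
    subscriberUrls.map((url) => deliver(url, event))
  );
  // Pair each URL with its outcome so failures can be retried individually.
  return results.map((result, i) => ({
    url: subscriberUrls[i],
    ok: result.status === "fulfilled",
  }));
}
```

Each failed entry would then start its own retry chain. That per-subscriber bookkeeping is the load a broker takes off the sender.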

Webhooks vs. event streaming (Kafka)

Webhooks and event streaming platforms like Kafka solve overlapping but different problems:

Dimension	Webhooks	Kafka / event streaming
Delivery model	Push (sender → receiver)	Pull (consumer reads from topic)
Coupling	Sender knows receiver URL	Producers and consumers decoupled
Replay	Not natively supported	Full replay from any offset
Ordering	No guarantees	Per-partition ordering
Best for	Cross-org integrations, SaaS	Internal microservices, high throughput

Use webhooks when you’re integrating with external systems you don’t control. Use event streaming when you own both sides and need ordering, replay, or very high throughput. Many architectures use both — Kafka internally, webhooks at the boundary.

Monitoring webhook health

A webhook system without observability is a webhook system that fails silently. Track these metrics:

Delivery success rate — percentage of first-attempt 2xx responses. A drop signals receiver issues.
Retry rate — how many events require retries. A spike means something is degrading.
DLQ depth — events that exhausted all retries. This should trend toward zero.
Delivery latency — p50/p95/p99 time from event creation to successful delivery.
Subscriber response time — slow receivers increase your retry queue depth and resource usage.

Alert on sustained DLQ growth and on delivery success rate dropping below a threshold (e.g., 99%). Expose a health dashboard to subscribers so they can self-diagnose — Stripe, GitHub, and Shopify all do this well.
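Two of these metrics are simple enough to sketch directly. The record shape `{ attempts, delivered }` and both function names are assumptions for illustration. The percentile uses the nearest-rank method, one of several common definitions.

```javascript
// Nearest-rank percentile over an array of numbers, with p in (0, 100].
function percentile(values, p) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Share of deliveries that succeeded on the first attempt.
// Each record is assumed to look like { attempts: number, delivered: boolean }.
function firstAttemptSuccessRate(deliveries) {
  if (deliveries.length === 0) return null;
  const firstTry = deliveries.filter(
    (d) => d.delivered && d.attempts === 1
  ).length;
  return firstTry / deliveries.length;
}
```

In practice you would compute these over a sliding time window and per subscriber, so that one failing endpoint stands out instead of being averaged away.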

Putting it all together

A production-grade webhook pipeline looks like this:

Event occurs → payload created and signed with HMAC
First delivery attempt to subscriber URL over HTTPS
On failure → exponential backoff with jitter, up to N retries
Receiver verifies signature, checks idempotency key, processes event
After max retries → route to dead letter queue
For multi-subscriber events → fan-out via message broker
Monitor everything: success rates, latency, DLQ depth

Webhooks are deceptively simple on the surface. The patterns above — at-least-once delivery, HMAC verification, idempotency, dead letter queues, and fan-out — are what separate a toy implementation from one that handles real traffic without losing or duplicating data.

