
System Design Interview: Design a Notification System, Step by Step


“Design a notification system” tests whether you can build a reliable, scalable system that handles multiple channels, user preferences, and delivery guarantees.

Step 1: Clarify requirements (3 min)

  • Channels: Email, push (mobile), SMS, in-app
  • Scale: 10M users, 100M notifications/day
  • Features: User preferences (opt-in/out per channel), templates, rate limiting, delivery tracking

Step 2: High-level design (10 min)

Event Source → Notification Service → Message Queue → Channel Workers → Delivery
                     ↓                                    ↓
              Template Engine                      Email/Push/SMS APIs
                     ↓
              Preference Store

Key insight: Decouple notification creation from delivery. The notification service decides what to send. Channel workers handle how to send it.

Step 3: Core components (15 min)

Notification Service

Receives events (“user signed up”, “order shipped”), resolves the template, checks user preferences, and enqueues messages.

Input:  { event: "order_shipped", userId: 123, data: { orderId: "ABC" } }
Output: [
  { channel: "email", to: "user@example.com", subject: "...", body: "..." },
  { channel: "push", token: "fcm-token", title: "...", body: "..." },
  { channel: "in_app", userId: 123, message: "..." }
]
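
The fan-out from one event to several channel messages can be sketched as below. This is a minimal illustration, not a definitive implementation: the `TEMPLATES` table, the `CONTACTS` lookup, and the `build_messages` function are hypothetical names, and the template wording is made up. A real service would load templates and contact details from its own stores.

```python
from string import Template

# Hypothetical template table: event name -> channel -> field templates.
TEMPLATES = {
    "order_shipped": {
        "email": {
            "subject": Template("Order $orderId shipped"),
            "body": Template("Your order $orderId is on its way."),
        },
        "push": {
            "title": Template("Order shipped"),
            "body": Template("Order $orderId is on its way."),
        },
        "in_app": {"message": Template("Order $orderId has shipped.")},
    }
}

# Hypothetical contact lookup; a real service would query a user store.
CONTACTS = {123: {"email": "user@example.com", "push": "fcm-token"}}


def build_messages(event: dict) -> list[dict]:
    """Render one message per channel for which the user has an address."""
    user_id = event["userId"]
    contact = CONTACTS.get(user_id, {})
    messages = []
    for channel, fields in TEMPLATES[event["event"]].items():
        # Fill every template field for this channel from the event payload.
        rendered = {name: tpl.substitute(event["data"]) for name, tpl in fields.items()}
        if channel == "email" and "email" in contact:
            messages.append({"channel": "email", "to": contact["email"], **rendered})
        elif channel == "push" and "push" in contact:
            messages.append({"channel": "push", "token": contact["push"], **rendered})
        elif channel == "in_app":
            messages.append({"channel": "in_app", "userId": user_id, **rendered})
    return messages
```

Calling `build_messages({"event": "order_shipped", "userId": 123, "data": {"orderId": "ABC"}})` yields one email, one push, and one in-app message, in the same shape as the output above.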

User Preferences

preferences: {
  userId: 123,
  email: { marketing: false, transactional: true },
  push: { marketing: true, transactional: true },
  sms: { marketing: false, transactional: false }
}

Check preferences before enqueuing. Never send to a channel the user opted out of.
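
A preference check along these lines could run just before enqueuing. It is a sketch under stated assumptions: `PREFERENCES` is an in-memory stand-in for the preference store, `is_allowed` and `filter_by_preferences` are hypothetical helpers, a missing record is treated as "not opted in", and in-app messages are assumed to be always allowed because the preference record above has no in-app entry.

```python
# In-memory stand-in for the preference store, keyed by userId.
PREFERENCES = {
    123: {
        "email": {"marketing": False, "transactional": True},
        "push": {"marketing": True, "transactional": True},
        "sms": {"marketing": False, "transactional": False},
    }
}

# Assumption: in-app notifications have no opt-out in this sketch.
ALWAYS_ALLOWED = {"in_app"}


def is_allowed(user_id: int, channel: str, category: str) -> bool:
    """True only if the user opted in to this channel for this category."""
    if channel in ALWAYS_ALLOWED:
        return True
    channel_prefs = PREFERENCES.get(user_id, {}).get(channel, {})
    # Default to False so an unknown user or channel is never messaged.
    return channel_prefs.get(category, False)


def filter_by_preferences(user_id: int, category: str, messages: list[dict]) -> list[dict]:
    """Drop every message whose channel the user has not opted in to."""
    return [m for m in messages if is_allowed(user_id, m["channel"], category)]
```

Defaulting to "deny" keeps a missing or corrupted preference record from turning into an unwanted message.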

Message Queue (Kafka/SQS)

Separate queues per channel: email-queue, push-queue, sms-queue, in-app-queue. This allows independent scaling: email might need 10 workers while SMS needs 2.
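
Routing by channel can be as simple as a lookup from channel name to queue name. In the sketch below, in-memory deques stand in for Kafka topics or SQS queues, and `QUEUE_NAMES`, `queues`, and `enqueue` are hypothetical names used only for illustration.

```python
from collections import deque

# Queue names from the text; the channel keys match the message format above.
QUEUE_NAMES = {
    "email": "email-queue",
    "push": "push-queue",
    "sms": "sms-queue",
    "in_app": "in-app-queue",
}

# In-memory stand-ins for the real queues.
queues = {name: deque() for name in QUEUE_NAMES.values()}


def enqueue(message: dict) -> str:
    """Append the message to its channel's queue and return the queue name."""
    queue_name = QUEUE_NAMES[message["channel"]]  # raises KeyError for unknown channels
    queues[queue_name].append(message)
    return queue_name
```

Because each channel has its own queue, a backlog in one (say, a slow SMS provider) does not delay the others.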

Channel Workers

Each worker pulls from its queue and calls the delivery API:

  • Email: SendGrid, SES, or Resend
  • Push: FCM (Android), APNs (iOS)
  • SMS: Twilio, SNS
  • In-app: Write to database, deliver via WebSocket

Rate Limiting

Prevent notification spam: max 3 push notifications per hour per user, max 1 SMS per day for marketing.

rate_limit: { userId_channel_type → count, window }
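
One way to realize that mapping is a fixed-window counter, sketched here in memory. The class name `FixedWindowRateLimiter`, the `LIMITS` table, and the injectable `clock` are assumptions for illustration, and this sketch applies the push limit to marketing messages only. A production system would more likely keep the counters in a shared store such as Redis (for example with `INCR` and `EXPIRE`) so that all workers see the same counts.

```python
import time

# (max sends, window in seconds) per (channel, type); values taken from the text.
LIMITS = {
    ("push", "marketing"): (3, 60 * 60),
    ("sms", "marketing"): (1, 24 * 60 * 60),
}


class FixedWindowRateLimiter:
    def __init__(self, limits: dict, clock=time.time):
        self._limits = limits
        self._clock = clock          # injectable so tests can control time
        self._counters = {}          # "userId_channel_type" -> (window_start, count)

    def allow(self, user_id: int, channel: str, category: str) -> bool:
        """Return True and count the send if it fits in the current window."""
        limit = self._limits.get((channel, category))
        if limit is None:
            return True              # no limit configured for this combination
        max_count, window = limit
        key = f"{user_id}_{channel}_{category}"
        now = self._clock()
        window_start, count = self._counters.get(key, (now, 0))
        if now - window_start >= window:
            window_start, count = now, 0   # the old window expired; start a new one
        if count >= max_count:
            return False
        self._counters[key] = (window_start, count + 1)
        return True
```

A fixed window is easy to reason about but allows short bursts around a window boundary; a sliding window or token bucket smooths that out at the cost of more state.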

Step 4: Reliability (10 min)

At-least-once delivery: Workers acknowledge messages only after successful delivery. Failed messages are retried, and messages that keep failing go to a dead-letter queue for later inspection and replay.

Idempotency: Each notification has a unique ID. Workers check if already delivered before sending (prevents duplicate emails on retry).
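
The two guarantees above fit together in the worker loop. The sketch below is illustrative only: `Worker`, `handle`, and the `send` callable are hypothetical, the set of delivered IDs stands in for a durable store, and the worker is assumed to retry a few times in process before dead-lettering.

```python
from typing import Callable


class Worker:
    def __init__(self, send: Callable[[dict], None], max_attempts: int = 3):
        self.send = send                        # wraps the provider API call
        self.max_attempts = max_attempts
        self.delivered_ids: set[str] = set()    # stand-in for a durable store
        self.dead_letters: list[dict] = []      # stand-in for a dead-letter queue

    def handle(self, message: dict) -> str:
        """Process one message; the caller acks the queue after this returns."""
        if message["id"] in self.delivered_ids:
            return "duplicate"                  # redelivered by the queue; do not resend
        for _attempt in range(self.max_attempts):
            try:
                self.send(message)
            except Exception:
                continue                        # a real worker would back off before retrying
            self.delivered_ids.add(message["id"])
            return "delivered"
        self.dead_letters.append(message)       # retries exhausted
        return "dead_letter"
```

Note the remaining gap: if the worker crashes after `send` succeeds but before the ID is recorded, the message is sent again on redelivery. That is what at-least-once means, and closing the gap requires the provider to accept an idempotency key or deduplicate on its side.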

Priority: Transactional notifications (password reset, 2FA) get a high-priority queue that is processed before marketing notifications.
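
A strict two-level priority scheme can be sketched with two queues, where the low-priority one is read only when the high-priority one is empty. `TwoLevelQueue` and the `category` field are hypothetical names for this illustration, and the deques again stand in for real queues.

```python
from collections import deque
from typing import Optional


class TwoLevelQueue:
    def __init__(self):
        self.high = deque()   # transactional: password reset, 2FA
        self.low = deque()    # everything else, e.g. marketing

    def put(self, message: dict) -> None:
        target = self.high if message["category"] == "transactional" else self.low
        target.append(message)

    def get(self) -> Optional[dict]:
        """Return the next message, always draining the high-priority queue first."""
        if self.high:
            return self.high.popleft()
        if self.low:
            return self.low.popleft()
        return None
```

Strict priority can starve marketing traffic under sustained transactional load; dedicating separate worker pools to each queue, or polling with weights, is a common alternative.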

Step 5: Scaling (5 min)

  • 100M notifications/day = ~1,200/second average, 10K/second peak
  • Kafka handles this easily with partitioning by userId
  • Email is usually the bottleneck because SES sending limits apply. Request a higher sending quota, use multiple SES accounts, or pre-warm dedicated IPs.
  • In-app notifications are cheapest: just a database write plus a WebSocket push
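
The throughput figures above come from simple arithmetic, reproduced here. The 10K/second peak is the article's own estimate; the variable names are only for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60                 # 86,400

notifications_per_day = 100_000_000
average_per_second = notifications_per_day / SECONDS_PER_DAY   # about 1,157, rounded to ~1,200

peak_per_second = 10_000                       # estimate from the text
peak_to_average = peak_per_second / average_per_second         # roughly 8.6x the average
```

Sizing workers and partitions for the peak rather than the average is what keeps queues from backing up during bursts.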

Common follow-ups

  • “How do you handle template versioning?” → Store templates with version IDs and render at send time
  • “How do you track delivery?” → Consume webhook callbacks from email/push providers and store a status per notification
  • “How do you handle timezone-aware scheduling?” → Store each user's timezone; a scheduling worker checks the user's delivery window before sending
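
For the timezone follow-up, the delivery-window check could look like the sketch below. The function name `in_delivery_window` and the 09:00 to 21:00 window are assumptions for illustration; the timezone string is assumed to be an IANA name stored with the user.

```python
from datetime import datetime, time, timezone
from zoneinfo import ZoneInfo


def in_delivery_window(
    now_utc: datetime,
    user_tz: str,
    start: time = time(9, 0),
    end: time = time(21, 0),
) -> bool:
    """True if the user's local wall-clock time falls inside [start, end)."""
    local = now_utc.astimezone(ZoneInfo(user_tz))
    return start <= local.time() < end


# Example: 14:00 UTC on a January day.
now = datetime(2024, 1, 15, 14, 0, tzinfo=timezone.utc)
in_new_york = in_delivery_window(now, "America/New_York")   # 09:00 local time
in_tokyo = in_delivery_window(now, "Asia/Tokyo")            # 23:00 local time
```

A message that falls outside the window would be re-scheduled for the next window start rather than dropped.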
