System Design Interview: Design a Notification System, Step by Step
"Design a notification system" tests whether you can build a reliable, scalable system that handles multiple channels, user preferences, and delivery guarantees.
Step 1: Clarify requirements (3 min)
- Channels: Email, push (mobile), SMS, in-app
- Scale: 10M users, 100M notifications/day
- Features: User preferences (opt-in/out per channel), templates, rate limiting, delivery tracking
Step 2: High-level design (10 min)
Event Source → Notification Service → Message Queue → Channel Workers → Delivery
                        ↓                                    ↓
                 Template Engine                    Email/Push/SMS APIs
                        ↓
                Preference Store
Key insight: Decouple notification creation from delivery. The notification service decides what to send. Channel workers handle how to send it.
Step 3: Core components (15 min)
Notification Service
Receives events ("user signed up", "order shipped"), resolves the template, checks user preferences, and enqueues messages.
Input: { event: "order_shipped", userId: 123, data: { orderId: "ABC" } }
Output: [
{ channel: "email", to: "user@example.com", subject: "...", body: "..." },
{ channel: "push", token: "fcm-token", title: "...", body: "..." },
{ channel: "in_app", userId: 123, message: "..." }
]
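The mapping from one event to several channel messages can be sketched as below. This is a minimal illustration, not a definitive implementation: the `TEMPLATES` registry, the `CONTACTS` lookup, and `build_messages` are hypothetical names, and the template wording is invented for the example. Preference checks are left out here so the rendering step stays visible.

```python
from string import Template

# Hypothetical template registry: event name -> channel -> field templates.
TEMPLATES = {
    "order_shipped": {
        "email": {
            "subject": Template("Order $orderId has shipped"),
            "body": Template("Your order $orderId is on its way."),
        },
        "push": {
            "title": Template("Order shipped"),
            "body": Template("Order $orderId is on its way."),
        },
        "in_app": {"message": Template("Order $orderId has shipped.")},
    }
}

# Hypothetical contact lookup; a real service would query a user store.
CONTACTS = {123: {"email": "user@example.com", "push": "fcm-token"}}

# Name of the address field each external channel expects.
ADDRESS_FIELD = {"email": "to", "push": "token"}


def build_messages(event):
    """Render one message per channel that has a template for this event."""
    user_id = event["userId"]
    contact = CONTACTS.get(user_id, {})
    messages = []
    for channel, fields in TEMPLATES.get(event["event"], {}).items():
        message = {"channel": channel}
        if channel in ADDRESS_FIELD:
            if channel not in contact:
                continue  # no address on file for this channel
            message[ADDRESS_FIELD[channel]] = contact[channel]
        else:
            message["userId"] = user_id  # in-app messages are addressed by user
        for name, template in fields.items():
            message[name] = template.substitute(event["data"])
        messages.append(message)
    return messages


messages = build_messages(
    {"event": "order_shipped", "userId": 123, "data": {"orderId": "ABC"}}
)
```

Unknown events produce an empty list, so the service can ignore events that have no templates yet.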
User Preferences
preferences: {
userId: 123,
email: { marketing: false, transactional: true },
push: { marketing: true, transactional: true },
sms: { marketing: false, transactional: false }
}
Check preferences before enqueuing. Never send to a channel the user opted out of.
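A preference check before enqueuing might look like the sketch below. `PREFERENCES` mirrors the record above, and `allowed_channels` is a hypothetical helper. Treating a missing user or missing channel entry as opted out is an assumption of this sketch; under it, a channel absent from the record (such as in-app here) is filtered out too.

```python
# In-memory stand-in for the preference store, keyed by user ID.
PREFERENCES = {
    123: {
        "email": {"marketing": False, "transactional": True},
        "push": {"marketing": True, "transactional": True},
        "sms": {"marketing": False, "transactional": False},
    }
}


def allowed_channels(user_id, category, channels):
    """Return the channels the user has opted into for this category.

    Assumption: anything without an explicit opt-in is treated as opted out.
    """
    prefs = PREFERENCES.get(user_id, {})
    return [c for c in channels if prefs.get(c, {}).get(category, False)]
```

For user 123, a transactional notification passes for email and push but not SMS, and a marketing notification passes for push only.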
Message Queue (Kafka/SQS)
Separate queues per channel: email-queue, push-queue, sms-queue, in-app-queue. This allows independent scaling: email might need 10 workers while SMS needs 2.
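Routing by channel can be sketched with in-memory queues standing in for Kafka topics or SQS queues. The queue names come from the text; `queue_name` and `enqueue` are hypothetical helpers, and a real producer would call the broker's client library instead of `Queue.put`.

```python
from queue import Queue

# One queue per channel; in-memory stand-ins for Kafka topics or SQS queues.
QUEUES = {
    name: Queue()
    for name in ("email-queue", "push-queue", "sms-queue", "in-app-queue")
}


def queue_name(channel):
    """Map a channel such as 'in_app' to its queue name, 'in-app-queue'."""
    return channel.replace("_", "-") + "-queue"


def enqueue(message):
    """Put the message on its channel's queue and return the queue name."""
    name = queue_name(message["channel"])
    if name not in QUEUES:
        raise ValueError(f"no queue for channel {message['channel']!r}")
    QUEUES[name].put(message)
    return name
```

Because each channel has its own queue, a backlog in one channel (say, email) does not delay messages in another.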
Channel Workers
Each worker pulls from its queue and calls the delivery API:
- Email: SendGrid, SES, or Resend
- Push: FCM (Android), APNs (iOS)
- SMS: Twilio, SNS
- In-app: Write to database, deliver via WebSocket
Rate Limiting
Prevent notification spam: max 3 push notifications per hour per user, max 1 SMS per day for marketing.
rate_limit: { userId_channel_type → count, window }
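One simple way to enforce these limits is a fixed-window counter, sketched below. The class name and the `"*"` wildcard are hypothetical, and the sketch assumes the push limit applies to all notification types combined. It keeps counters in process memory; a production system would more likely keep them in a shared store such as Redis so every worker sees the same counts.

```python
import time

# (channel, type) -> (max_count, window_seconds); "*" matches any type.
LIMITS = {
    ("push", "*"): (3, 3600),          # max 3 push per hour per user
    ("sms", "marketing"): (1, 86400),  # max 1 marketing SMS per day
}


class FixedWindowRateLimiter:
    def __init__(self, limits, clock=time.time):
        self.limits = limits
        self.clock = clock   # injectable so tests can control time
        self.windows = {}    # (user_id, channel, type) -> (window_start, count)

    def allow(self, user_id, channel, kind):
        """Return True and count the send if the user is under the limit."""
        for limit_key in ((channel, kind), (channel, "*")):
            if limit_key in self.limits:
                break
        else:
            return True  # no limit configured for this channel and type
        max_count, window_seconds = self.limits[limit_key]
        key = (user_id,) + limit_key
        now = self.clock()
        start, count = self.windows.get(key, (now, 0))
        if now - start >= window_seconds:
            start, count = now, 0  # the previous window expired
        if count >= max_count:
            return False
        self.windows[key] = (start, count + 1)
        return True
```

A fixed window is easy to reason about but allows bursts at window boundaries; a sliding window or token bucket smooths that out at the cost of more state.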
Step 4: Reliability (10 min)
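At-least-once delivery: Workers acknowledge a message only after successful delivery, so a crash mid-send causes redelivery rather than loss. Messages that still fail after a bounded number of retries go to a dead-letter queue for inspection and replay.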
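The acknowledge-after-delivery pattern can be sketched with a deque standing in for the broker. This sketch assumes a common variant in which a failed message is retried a few times before it moves to the dead-letter queue; `MAX_ATTEMPTS`, `process`, and the `attempts` field are hypothetical.

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumed retry budget before dead-lettering


def process(queue, dead_letters, send):
    """Drain the queue; a message is removed (acked) only after send succeeds."""
    delivered = []
    while queue:
        message = queue[0]  # peek: the message stays queued while in flight
        try:
            send(message)
        except Exception:
            queue.popleft()
            message["attempts"] = message.get("attempts", 0) + 1
            if message["attempts"] >= MAX_ATTEMPTS:
                dead_letters.append(message)  # give up; park for inspection
            else:
                queue.append(message)  # redeliver later
        else:
            queue.popleft()  # acknowledge only after a successful send
            delivered.append(message["id"])
    return delivered
```

Real brokers add a delay or backoff between attempts; the sketch requeues immediately to stay short.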
Idempotency: Each notification has a unique ID. Workers check if already delivered before sending (prevents duplicate emails on retry).
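A delivered-ID check can be sketched as follows. `IdempotentSender` is a hypothetical wrapper, and the in-memory set stands in for a durable store shared by all workers.

```python
class IdempotentSender:
    """Wrap a send function so each notification ID is delivered once."""

    def __init__(self, send):
        self.send = send
        self.delivered_ids = set()  # stand-in for a durable, shared store

    def deliver(self, notification):
        """Send unless this ID was already delivered; return True if sent."""
        notification_id = notification["id"]
        if notification_id in self.delivered_ids:
            return False  # duplicate from a retry; skip it
        self.send(notification)
        self.delivered_ids.add(notification_id)  # record only after success
        return True
```

A crash between the send and the record can still produce one duplicate, which is why this pairs with at-least-once delivery rather than promising exactly-once.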
Priority: Transactional notifications (password reset, 2FA) get a high-priority queue that's processed before marketing notifications.
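A two-queue priority scheme can be sketched as below, with the worker always checking the high-priority queue first. `TwoLevelQueue` and the `type` field are hypothetical names for illustration.

```python
from collections import deque


class TwoLevelQueue:
    """Strict priority: transactional messages are served before marketing."""

    def __init__(self):
        self.high = deque()  # transactional (password reset, 2FA)
        self.low = deque()   # marketing

    def put(self, message):
        target = self.high if message["type"] == "transactional" else self.low
        target.append(message)

    def get(self):
        """Return the next message, or None when both queues are empty."""
        if self.high:
            return self.high.popleft()
        if self.low:
            return self.low.popleft()
        return None
```

Strict priority can starve the low queue under sustained transactional load, so some designs reserve a share of worker capacity for marketing traffic.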
Step 5: Scaling (5 min)
- 100M notifications/day = ~1,200/second average, 10K/second peak
- Kafka handles this easily with partitioning by userId
- Email is usually the bottleneck: SES sending rate limits apply. Request higher quotas, use multiple SES accounts, or warm up dedicated IPs.
- In-app notifications are cheapest: just a database write plus a WebSocket push
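The throughput figure and the idea of partitioning by userId can be checked with a few lines. The 10K/second peak is the estimate given above, not something derived here. `partition_for` is a hypothetical helper that uses MD5 only as a stable hash for illustration; it is not the hash Kafka's own partitioner uses.

```python
import hashlib

NOTIFICATIONS_PER_DAY = 100_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Average rate: 100M / 86,400, which the estimate above rounds up to ~1,200/s.
average_rate = NOTIFICATIONS_PER_DAY / SECONDS_PER_DAY

# Ratio between the estimated 10K/s peak and the average rate.
peak_to_average = 10_000 / average_rate


def partition_for(user_id, num_partitions):
    """Map a user ID to a partition with a stable hash (illustrative only)."""
    digest = hashlib.md5(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

Keying by userId sends all of one user's notifications to the same partition, which preserves their relative order for that user.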
Common follow-ups
- "How do you handle template versioning?" → Store templates with version IDs and render at send time
- "How do you track delivery?" → Use webhook callbacks from email/push providers and store a status per notification
- "How do you handle timezone-aware scheduling?" → Store each user's timezone; a scheduling worker checks the delivery window before sending
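The delivery-window check from the last follow-up can be sketched as below. The 09:00 to 21:00 window and the function name are assumptions. The example uses a fixed UTC offset to stay self-contained; with stored IANA timezone names, a `zoneinfo.ZoneInfo` object could be passed as `user_tz` instead.

```python
from datetime import datetime, time, timedelta, timezone

# Assumed delivery window in the user's local time.
WINDOW_START = time(9, 0)
WINDOW_END = time(21, 0)


def next_send_time(now_utc, user_tz):
    """Return now if inside the user's window, else the next window start (UTC)."""
    local = now_utc.astimezone(user_tz)
    if WINDOW_START <= local.time() < WINDOW_END:
        return now_utc
    start = local.replace(
        hour=WINDOW_START.hour, minute=WINDOW_START.minute, second=0, microsecond=0
    )
    if local.time() >= WINDOW_END:
        start += timedelta(days=1)  # too late today; wait for tomorrow's window
    return start.astimezone(timezone.utc)


# Example: a user at UTC-5, checked at 03:00 UTC (22:00 local, outside the window).
user_tz = timezone(timedelta(hours=-5))
send_at = next_send_time(datetime(2024, 1, 15, 3, 0, tzinfo=timezone.utc), user_tz)
```

The scheduling worker can store the returned time with the message and enqueue it once that time arrives.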