System Design Interview Guide

This guide covers system design as a discipline: the fundamental trade-offs, common patterns, and an approach to interview preparation. System design interviews test your ability to design large-scale distributed systems under constraints.

What Is System Design?

System design focuses on architecting large-scale distributed systems under real-world constraints: millions of users, terabytes of data, five-nines (99.999%) availability. Unlike software architecture, which focuses on code structure, system design covers the full stack: databases, caching, load balancers, message queues, CDNs, and service decomposition.

Core Trade-offs

Every system design decision involves balancing competing concerns:

| Trade-off | Description | Key Concept |
| --- | --- | --- |
| Consistency vs. Availability | Strong consistency costs availability during partitions | CAP Theorem |
| Latency vs. Consistency | Fast reads may be stale; consistent reads require coordination | PACELC Theorem |
| Read vs. Write optimization | Optimize for reads = denormalize; optimize for writes = normalize | CQRS |
| Vertical vs. Horizontal scaling | Scale up (bigger machine) vs. scale out (more machines) | Quality Attributes |
| SQL vs. NoSQL | ACID and joins vs. scale and flexibility | CAP Theorem |
| Sync vs. Async | Immediate response vs. queue and process later | Event-Driven Architecture |

System Design Building Blocks

Scalability Layer

| Component | Purpose | Key Concepts |
| --- | --- | --- |
| Load Balancer | Distribute traffic; enable horizontal scale | Round-robin, least connections, health checks |
| CDN | Cache static assets geographically | Edge caching, origin pull, cache invalidation |
| Horizontal Scaling | Add nodes rather than bigger nodes | Stateless services, sticky sessions |
| Database Sharding | Partition data across multiple DB nodes | Hash vs. range sharding, resharding |
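The difference between hash and range sharding can be sketched in a few lines. This is an illustrative sketch only: the function names `hash_shard` and `range_shard`, the choice of MD5 as a stable hash, and the example boundaries are assumptions, not part of any particular database.

```python
import bisect
import hashlib
from typing import Sequence


def hash_shard(key: str, num_shards: int) -> int:
    """Hash sharding: spread keys evenly, at the cost of losing key order."""
    # A stable hash is used because Python's built-in hash() is salted per process.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def range_shard(key: str, upper_bounds: Sequence[str]) -> int:
    """Range sharding: keep adjacent keys together, at the risk of hot ranges.

    upper_bounds is a sorted list of exclusive upper bounds; keys at or above
    the last bound fall into one final shard.
    """
    return bisect.bisect_right(upper_bounds, key)


def keys_moved_on_reshard(keys: Sequence[str], old_n: int, new_n: int) -> int:
    """Count keys whose shard changes when the shard count changes."""
    return sum(1 for k in keys if hash_shard(k, old_n) != hash_shard(k, new_n))


if __name__ == "__main__":
    bounds = ["h", "p"]  # shard 0: keys < "h", shard 1: < "p", shard 2: the rest
    for name in ["alice", "mallory", "zed"]:
        print(name, "hash ->", hash_shard(name, 4), "range ->", range_shard(name, bounds))

    keys = [f"user:{i}" for i in range(1000)]
    print("moved when going from 4 to 5 shards:", keys_moved_on_reshard(keys, 4, 5))
```

With plain modulo hashing, changing the shard count remaps most keys, which is why resharding is expensive and why schemes such as consistent hashing are often discussed alongside it.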

Data Layer

| Component | Purpose | Considerations |
| --- | --- | --- |
| Relational DB | ACID transactions, complex queries | PostgreSQL, MySQL (typically favor consistency over availability) |
| NoSQL | Scale and flexibility | Cassandra (AP), DynamoDB (AP/tunable) |
| Cache | Reduce DB load; low-latency reads | Redis, Memcached; cache-aside, write-through |
| Search | Full-text, relevance ranking | Elasticsearch, OpenSearch |
| Time Series DB | Metrics, IoT, monitoring data | InfluxDB, TimescaleDB |
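The cache-aside strategy named in the Cache row can be illustrated with a small in-memory sketch. Everything here is hypothetical: the `CacheAside` class, the `load_from_db` callback, and the TTL value stand in for a real cache such as Redis and a real database, and are not taken from any library.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CacheAside:
    """Cache-aside: the application checks the cache first, then falls back to the DB."""

    def __init__(
        self,
        load_from_db: Callable[[str], Optional[str]],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load_from_db  # hypothetical DB accessor supplied by the caller
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[str]:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Miss or expired entry: read from the source of truth and populate the cache.
        self.misses += 1
        value = self._load(key)
        if value is not None:
            self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Call after a write to the DB so the next read reloads fresh data."""
        self._entries.pop(key, None)


if __name__ == "__main__":
    db = {"user:1": "Ada"}
    cache = CacheAside(db.get, ttl_seconds=30.0)
    print(cache.get("user:1"), cache.get("user:1"))
    print("hits:", cache.hits, "misses:", cache.misses)
```

A write-through cache would instead update the cache as part of every write; cache-aside keeps the write path simple but accepts that entries can be stale until they expire or are invalidated.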

Messaging & Event Layer

| Pattern | Purpose | Tool |
| --- | --- | --- |
| Message Queue | Async task processing; buffer spikes | Apache Kafka, RabbitMQ, SQS |
| Pub/Sub | Fan-out to multiple consumers | Publish-Subscribe Pattern, Kafka topics |
| Event Streaming | Durable, replayable event log | Apache Kafka, Kinesis |
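To make the fan-out behavior of pub/sub concrete, here is a toy in-memory broker. It is a sketch under stated assumptions: `InMemoryBroker`, its methods, and the topic name are invented for illustration and do not mirror the API of Kafka, RabbitMQ, or SQS, and nothing here is durable or distributed.

```python
from collections import defaultdict, deque
from typing import Any, Deque, Dict


class InMemoryBroker:
    """Toy pub/sub broker: every subscriber to a topic gets its own copy of each message."""

    def __init__(self) -> None:
        # topic -> {subscriber name -> that subscriber's private queue}
        self._subscribers: Dict[str, Dict[str, Deque[Any]]] = defaultdict(dict)

    def subscribe(self, topic: str, subscriber: str) -> Deque[Any]:
        """Register a subscriber and return the queue it should consume from."""
        return self._subscribers[topic].setdefault(subscriber, deque())

    def publish(self, topic: str, message: Any) -> int:
        """Append the message to every subscriber queue; return the fan-out count."""
        queues = self._subscribers.get(topic, {})
        for queue in queues.values():
            queue.append(message)
        return len(queues)


if __name__ == "__main__":
    broker = InMemoryBroker()
    email_queue = broker.subscribe("order.created", "email-service")
    analytics_queue = broker.subscribe("order.created", "analytics-service")

    delivered = broker.publish("order.created", {"order_id": 1})
    print("delivered to", delivered, "subscribers")
    print(email_queue.popleft(), analytics_queue.popleft())
```

A plain message queue differs in that competing consumers share one queue, so each message is processed by only one of them; the per-subscriber queues above are what buffer spikes for slow consumers.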

Reliability Patterns

| Pattern | Problem Solved | Related Concept |
| --- | --- | --- |
| Circuit Breaker | Prevent cascade failures | Circuit Breaker Pattern |
| Retry with backoff | Transient failures | Resiliency Patterns |
| Bulkhead | Isolate failures to one component | Bulkhead Pattern |
| Rate Limiting | Protect from overload | API Gateway feature |
| Idempotency | Safe retries | Idempotency |
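Two of these patterns, retry with backoff and the circuit breaker, can be sketched together. This is a minimal single-threaded illustration, not a production implementation: the names `retry_with_backoff`, `CircuitBreaker`, and `CircuitOpenError`, the default thresholds, and the "full jitter" delay choice are all assumptions made for the example.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry a failing operation, waiting longer (with jitter) after each failure."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:  # real code would catch only known transient errors
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff with full jitter: random delay up to the cap.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
    raise ValueError("max_attempts must be at least 1")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and the call is rejected without being tried."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, operation: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_timeout:
                raise CircuitOpenError("circuit is open; failing fast")
            # Timeout elapsed: half-open, so let this one trial call through.
        try:
            result = operation()
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self._threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Retries are only safe when the operation is idempotent, which is why the two rows usually appear together; the breaker complements retries by stopping callers from hammering a dependency that is already failing.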

System Design Interview Framework

A standard approach to system design interviews:

1. Clarify Requirements (5 min)

  • Functional: What does the system do? (URL shortener, Twitter feed, payment system)
  • Non-functional: Scale (users, requests/sec), availability SLA, consistency requirements, latency targets
  • Constraints: Read-heavy or write-heavy? Global or regional?

2. Back-of-Envelope Estimates (5 min)

  • Daily active users × requests/user/day ÷ 86,400 seconds/day = average QPS (multiply by a peak factor for peak QPS)
  • Storage needs per entity × entity count = total storage
  • Derive read:write ratio
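These estimates are simple arithmetic, and a short script makes the unit conversions explicit. The input figures (10 million daily active users, 20 requests per user per day, 500 bytes per record, a 2x peak factor) are made-up assumptions for illustration, as are the function names.

```python
SECONDS_PER_DAY = 86_400


def average_qps(daily_active_users: int, requests_per_user_per_day: float) -> float:
    """Requests per day divided by seconds per day gives average queries per second."""
    return daily_active_users * requests_per_user_per_day / SECONDS_PER_DAY


def total_storage_bytes(bytes_per_entity: int, entity_count: int) -> int:
    """Storage per entity times number of entities."""
    return bytes_per_entity * entity_count


def read_write_ratio(reads_per_day: float, writes_per_day: float) -> float:
    """How many reads occur for each write."""
    return reads_per_day / writes_per_day


if __name__ == "__main__":
    qps = average_qps(daily_active_users=10_000_000, requests_per_user_per_day=20)
    peak_qps = qps * 2  # assumed peak factor; real traffic shapes vary
    storage = total_storage_bytes(bytes_per_entity=500, entity_count=1_000_000_000)

    print(f"average QPS: {qps:,.0f}")            # about 2,315
    print(f"peak QPS:    {peak_qps:,.0f}")       # about 4,630
    print(f"storage:     {storage / 1e9:,.0f} GB")  # 500 GB
    print(f"read:write:  {read_write_ratio(190_000_000, 10_000_000):.0f}:1")  # 19:1
```

In an interview, rounding 86,400 to 100,000 is usually acceptable; the goal is the order of magnitude, not precision.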

3. High-Level Design (10 min)

  • Draw major components: clients, load balancer, application servers, databases, caches, message queues
  • Identify data flows for key use cases

4. Deep Dive (15 min)

  • Database schema design
  • API design (key endpoints)
  • Address bottlenecks (caching strategy, sharding approach)
  • Handle failure scenarios

5. Trade-offs Discussion (5 min)

  • What would you do differently with more time?
  • What are the weakest points?

Common System Design Problems

| Problem | Key Concepts |
| --- | --- |
| URL Shortener | Consistent hashing, KV store, redirect caching |
| Twitter Feed (Timeline) | Fan-out on write vs. read, Redis cache, Kafka |
| Rate Limiter | Token bucket, sliding window, Redis atomic ops |
| Distributed Cache | Consistent hashing, cache coherence, eviction policies |
| Notification System | Pub/sub, push/pull, priority queues |
| Search Autocomplete | Trie, top-k prefix search, distributed indexing |
| Ride-Sharing Matching | Geospatial indexing, real-time updates, WebSocket |
| Payment System | ACID transactions, idempotency, exactly-once processing |
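As one worked example from this list, the token bucket used in rate limiter designs fits in a few lines. This is a single-process sketch under stated assumptions: the `TokenBucket` class and its parameters are hypothetical, and a distributed limiter would keep this state in a shared store (for example with atomic operations in Redis) rather than in local memory.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, with a sustained rate of `refill_rate` per second."""

    def __init__(
        self,
        capacity: float,
        refill_rate: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self._clock = clock
        self._tokens = capacity  # start full so an initial burst is allowed
        self._last_refill = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and spend tokens if the request fits; otherwise reject it."""
        now = self._clock()
        elapsed = now - self._last_refill
        self._last_refill = now
        # Refill lazily based on elapsed time, never exceeding capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(capacity=5, refill_rate=1.0)
    results = [bucket.allow() for _ in range(7)]
    print(results)  # the first 5 calls pass as a burst; later calls depend on elapsed time
```

Compared with a sliding window, the token bucket tolerates short bursts while still bounding the long-run rate, which is a useful trade-off to state explicitly during the deep dive.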

Learning Resources

  • System Design Primer (donnemartin) — comprehensive GitHub guide
  • System Design 101 (ByteByteGo) — visual explanations
  • Designing Data-Intensive Applications (Kleppmann) — deep theory