Raft Consensus Algorithm
A consensus algorithm designed for understandability that achieves the same safety guarantees as Paxos by decomposing consensus into three independent sub-problems: leader election, log replication, and safety. Used by etcd, CockroachDB, TiKV, and Consul.
Problem
Distributed systems need to agree on shared state (e.g., a key-value store’s contents, a database transaction log) even when some nodes fail. The classic algorithm, Paxos, is famously hard to understand and implement correctly — leading to implementation bugs and subtle consistency violations in production systems. How can you build a consensus algorithm that is both provably correct and implementable by ordinary engineers?
Solution / Explanation
Diego Ongaro and John Ousterhout designed Raft (2014) specifically for understandability. Raft decomposes consensus into three clearly separated sub-problems:
1. Leader Election
Every node starts as a follower. If a follower receives no heartbeat from a leader within a timeout period, it becomes a candidate and requests votes from other nodes.
Election rules:
- Each node votes for at most one candidate per term (terms are monotonically increasing logical clocks)
- A node grants its vote only to a candidate whose log is at least as up-to-date as its own (prevents electing a node that is missing committed entries)
- Winning candidate becomes leader and starts sending heartbeats
Split vote: If no candidate wins majority, a new election term starts after a random timeout (prevents perpetual split votes).
Follower ──timeout──► Candidate ──majority votes──► Leader
    ▲                     │                            │
    └─────────────────────┴────────────────────────────┘
      steps down on discovering a current leader
      or a higher term
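The voting rules above can be sketched as a single vote-granting function. This is an illustrative simplification, not an excerpt from any real implementation: the node is a plain dict, log indices are 1-based as in the Raft paper, and the "up-to-date" comparison checks the last entry's term first, then its index.

```python
import random

def election_timeout_ms(base=150, spread=150):
    # Randomized timeout (e.g. 150-300 ms): candidates rarely time out
    # together, which is what resolves split votes quickly.
    return base + random.uniform(0, spread)

def grant_vote(node, candidate_term, candidate_id,
               candidate_last_log_term, candidate_last_log_index):
    # Rule 1: reject stale terms; adopt newer terms (and clear the old vote).
    if candidate_term < node["current_term"]:
        return False
    if candidate_term > node["current_term"]:
        node["current_term"] = candidate_term
        node["voted_for"] = None
    # Rule 2: at most one vote per term.
    if node["voted_for"] not in (None, candidate_id):
        return False
    # Rule 3: the candidate's log must be at least as up-to-date as ours
    # (compare last entry's term first, then log length).
    my_last_term = node["log"][-1]["term"] if node["log"] else 0
    my_last_index = len(node["log"])
    up_to_date = (candidate_last_log_term, candidate_last_log_index) >= \
                 (my_last_term, my_last_index)
    if up_to_date:
        node["voted_for"] = candidate_id
    return up_to_date
```

Rule 3 is what makes Leader Completeness (below) work: a majority must contain at least one node that holds every committed entry, and that node refuses to vote for any candidate missing one.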
2. Log Replication
All writes go through the leader. The leader appends the entry to its log and sends AppendEntries RPCs to all followers.
Commit protocol:
- Leader appends entry with current term to its log
- Sends AppendEntries to all followers (contains the new entry plus the index and term of the preceding entry)
- Waits for a majority of nodes (including itself) to acknowledge
- Once a majority ACKs → leader marks the entry committed
- Leader applies entry to state machine and responds to client
- Next heartbeat/AppendEntries tells followers about the commit; they apply it too
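The leader's commit decision can be sketched as follows. The function and its parameter names are illustrative; `match_index` is the per-follower bookkeeping the Raft paper describes (the highest log index known to be replicated on each follower). Note the extra condition: a leader only commits entries from its *own* term by counting replicas; older entries become committed indirectly.

```python
def advance_commit_index(match_index, current_term, log, commit_index):
    """Leader rule: an entry is committed once it is stored on a majority
    AND it was created in the leader's current term. match_index maps
    follower id -> highest replicated index; the leader itself always
    holds its whole log. Indices are 1-based, as in the Raft paper."""
    n_nodes = len(match_index) + 1  # followers + the leader
    for index in range(len(log), commit_index, -1):  # newest first
        replicas = 1 + sum(1 for m in match_index.values() if m >= index)
        if replicas * 2 > n_nodes and log[index - 1]["term"] == current_term:
            return index
    return commit_index
```

The current-term check prevents a subtle bug from the Raft paper's §5.4.2: a majority-replicated entry from an *old* term can still be overwritten by a later leader, so it must not be counted as committed on replication alone.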
3. Safety Guarantees
Raft ensures:
- Election Safety: At most one leader per term
- Log Matching: If two logs contain an entry with the same index and term, the logs are identical in all entries up to that index
- Leader Completeness: A leader has all committed entries from previous terms
- State Machine Safety: Once a node applies an entry at index i, no other node will apply a different entry at index i
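The Log Matching property is enforced by a consistency check on the follower side of AppendEntries, sketched below. One simplification to flag: this version truncates the follower's entire suffix on a conflict, whereas the paper's rule truncates only from the first conflicting entry (which matters if RPCs can arrive reordered).

```python
def append_entries(log, prev_log_index, prev_log_term, entries):
    """Follower-side AppendEntries check (simplified sketch).
    Accept new entries only if the log already contains an entry at
    prev_log_index with prev_log_term; otherwise reject, and the leader
    retries with an earlier prev_log_index until the logs match."""
    if prev_log_index > 0:
        if len(log) < prev_log_index or \
           log[prev_log_index - 1]["term"] != prev_log_term:
            return False  # consistency check failed
    # Drop any conflicting suffix, then append the leader's entries.
    del log[prev_log_index:]
    log.extend(entries)
    return True
```

By induction, every accepted append preserves the invariant that matching (index, term) implies matching prefixes, which is exactly the Log Matching property above.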
Cluster Membership Changes
Changing the cluster topology (adding/removing nodes) uses a two-phase joint consensus protocol to avoid split-brain during the transition.
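The key rule of joint consensus can be stated in a few lines. During the transition, any decision (electing a leader, committing an entry) requires separate majorities in *both* the old and the new configuration; this sketch assumes node ids in plain sets.

```python
def joint_quorum(acks, c_old, c_new):
    """During joint consensus C_old,new, a decision needs a majority of
    C_old AND a majority of C_new, so neither configuration can decide
    unilaterally while membership is changing (no split-brain)."""
    def majority(cfg):
        return sum(1 for n in cfg if n in acks) * 2 > len(cfg)
    return majority(c_old) and majority(c_new)
```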
Real-World Implementations
| System | Use of Raft |
|---|---|
| etcd | All key-value operations; Kubernetes cluster state |
| CockroachDB | Per-range consensus for distributed SQL |
| TiKV | Distributed key-value store (TiDB’s storage layer) |
| Consul | Cluster membership and KV store |
| RethinkDB | Table shards |
| InfluxDB IOx | Write-ahead log replication |
Raft vs Paxos
| Aspect | Raft | Paxos |
|---|---|---|
| Understandability | Designed for it | Notoriously difficult |
| Safety guarantees | Equivalent | Equivalent |
| Leader | Always has all committed logs | Leader may need to catch up |
| Log structure | Explicit log as first-class concept | Log is implicit |
| Membership changes | Joint consensus | Complex multi-round protocol |
| Industrial adoption | etcd, CockroachDB, Consul | Chubby; ZooKeeper's ZAB is Paxos-like |
Trade-offs
| Benefit | Cost |
|---|---|
| Understandable; fewer implementation bugs | Requires stable leader; leader is a bottleneck |
| Strong consistency guarantees | Quorum required; minority partition becomes unavailable for writes |
| Clean leader/follower mental model | Leader election causes brief unavailability |
| Handles arbitrary node failures | Not designed for Byzantine failures |
Related
- etcd — canonical Raft implementation
- CAP Theorem — Raft systems are CP
- PACELC Theorem — Raft systems are PC/EC
- Distributed Transactions — Raft enables distributed consensus for transactions
- Eventual Consistency — contrast: Raft provides linearizability, not eventual consistency