Building Raft Consensus from Scratch: Leader Election
I was curious about distributed consensus, and I figured that the best way to really understand it was to build something myself. I asked myself questions like: Why does each rule exist? What would happen if we took one away? This exploration led me to implement leader election in Go.
The full source code is available on GitHub. I highly recommend cloning it and setting up a 3-node cluster on your local machine as you go through the rest of the article.
The Dilemma: Agreeing Without a Referee
Picture this: you and four friends are trying to decide where to go for dinner in a group chat. Everyone needs to agree on one restaurant, but it’s not as simple as it sounds. Sometimes, phones unexpectedly die or messages take a while to come through. And what about those moments when two friends suggest different places at the same time?
If everyone just shouts out their suggestions whenever they feel like it, a few things can go awry:
1. You end up with multiple choices and no clear front-runner.
2. It takes ages to sort it all out.
3. The group ends up splitting up, and some friends go to Place A while others head to Place B.
That last scenario is a big deal! In the world of distributed systems it's known as "split-brain": two parts of the group making conflicting decisions, each convinced it speaks for the whole. Consensus is about making sure everyone ends up on the same page.
A leader sidesteps all three problems. One person decides, tells everyone, done.
The clever insight at the heart of Raft is this:
Turn the hard problem (agreeing on every operation) into an easier problem (agreeing on a leader, then trusting the leader).
What If the Leader Dies?
Easy when the leader is alive. But what happens when the “decider” goes silent? Someone has to become the new leader without a higher authority telling them to.
Now we have new problems:
How does anyone notice the leader is gone?
How do we avoid two people declaring “I’m the new leader!” at the same time?
What if the old leader comes back online? What should happen?
Let me work through how I reasoned about each.
Problem 1: Detecting the leader is gone
The leader sends regular "I'm alive" messages (heartbeats). If a follower hasn't heard one in a while, it assumes the leader is dead and decides it's time to elect a new one.
That “while” is called the election timeout.
Problem 2: Two candidates declaring at once
Here’s where it gets interesting. If two followers’ timers expire simultaneously, they both become candidates and ask everyone for votes. The cluster splits its votes. Nobody wins. The election fails.
So how do we make this collision unlikely?
Each node picks a random election timeout - say, somewhere between 150ms and 300ms. Node A might wait 173ms, B waits 241ms, C waits 198ms. Whoever’s timer fires first becomes a candidate first, asks for votes first, and usually wins before anyone else even wakes up.
Why a range and not fixed, unique timeouts? Because if two nodes had fixed timeouts and a collision happened anyway (network delay, partition healing), they’d retry with the same fixed timeout and collide again. Forever. This is called a livelock. The system is “alive” but making no progress, which is worse than a crash because crashes are obvious.
Randomness breaks symmetry. There’s no central authority to say “you go first,” so we use randomness to make sure that, statistically, somebody goes first.
Why not 150–155ms instead of 150–300ms? A range that narrow is essentially a fixed timeout. Collisions become likely again. Wider range = more reliable elections, but slower failover when the leader actually dies. That’s the tradeoff.
Problem 3: The returning ghost leader
The old leader comes back online after a network blip. From its perspective, it was just briefly offline; it might still think it's in charge and start giving orders. Now we have two leaders.
The solution is a monotonic counter called the term. Every time an election starts, the term goes up: term 1, term 2, term 3...
The rule:
Every message includes the sender’s term. If you ever see a term higher than yours, you immediately step down to Follower and update your term.
So when the ghost leader sends its first stale heartbeat, the response carries a higher term, and it instantly demotes itself.
Problem 4: Double voting
What stops a follower from voting "yes" to multiple candidates in the same election? Nothing, unless we add a rule:
Each node casts at most one vote per term, first-come-first-served.
Once a follower votes for A in term 5, it remembers and refuses everyone else until term 6. This prevents the disaster of two candidates both thinking they got a majority.
The Three Roles
Every node is always in exactly one of three states:
Follower : listening for heartbeats, voting when asked
Candidate : running for election
Leader : sending heartbeats, in charge
And the transitions:
Follower --(timeout, no heartbeat)--> Candidate
Candidate --(majority votes)--> Leader
Candidate --(saw higher term)--> Follower
Leader --(saw higher term)--> Follower
A Note on Randomness (Raft vs Web3)
Is this the same kind of randomness used in blockchain systems?
Same philosophy, very different strength. Both answer the question “who goes first when nobody is in charge?” But:
Raft uses cheap randomness (rand.Intn) because nobody is attacking it. It's just breaking ties between friendly peers.
Web3 uses cryptographic randomness (VRFs, RANDAO) because adversaries actively try to predict and exploit it.
If a Raft random number is predicted, nothing bad happens. If a Web3 one is, money gets stolen.
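To make the contrast concrete, here is a toy Go snippet showing both kinds side by side (my illustration, not code from the repo):
package main

import (
    crand "crypto/rand"
    "encoding/binary"
    "fmt"
    "math/rand"
)

func main() {
    // math/rand: fast and predictable. Fine for Raft, where a guessed
    // timeout costs an attacker nothing.
    fmt.Printf("election timeout: %dms\n", 150+rand.Intn(150))

    // crypto/rand: unpredictable even to an adversary. The family Web3
    // protocols build on, because a guessed value can be exploited.
    var b [8]byte
    if _, err := crand.Read(b[:]); err != nil {
        panic(err)
    }
    fmt.Printf("crypto random: %d\n", binary.LittleEndian.Uint64(b[:]))
}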
The Implementation
I built this in Go because Go’s goroutines and channels map naturally to how Raft thinks. Each node is a tiny independent state machine that occasionally talks to others.
For networking, I picked the standard library (net/http with JSON) over a P2P framework like libp2p because it only needs a fixed list of peers and two simple RPCs. Using a heavyweight framework would hide the very mechanics I was trying to learn.
Project structure
raft/
├── main.go // entry point: parse flags, start a node
├── node.go // the Node struct & state machine logic
├── rpc.go // HTTP handlers + RPC client calls
└── types.go // request/response structs
The two RPCs
Just two message types flow between nodes, each as a request/response pair:
// types.go
type RequestVoteArgs struct {
    Term        int    `json:"term"`
    CandidateID string `json:"candidateId"`
}

type RequestVoteReply struct {
    Term        int  `json:"term"`
    VoteGranted bool `json:"voteGranted"`
}

type AppendEntriesArgs struct {
    Term     int    `json:"term"`
    LeaderID string `json:"leaderId"`
}

type AppendEntriesReply struct {
    Term    int  `json:"term"`
    Success bool `json:"success"`
}
Notice that every message, request and response alike, carries a term. This is what makes the "higher term wins" rule work in both directions. A response of "no, I won't vote for you, and by the way my term is 8" instantly tells a stale candidate it's behind.
AppendEntries : named generically because, in a complete Raft implementation, it carries log entries too. For leader election, it's just a heartbeat.
The Node struct
All state lives in one struct, guarded by one mutex. The fields are tightly coupled: you almost always read or write several together, so splitting them across multiple structs would create more problems than it solves.
// node.go
type Role int

const (
    Follower Role = iota
    Candidate
    Leader
)

type Node struct {
    mu sync.Mutex

    // Identity & config (set at startup, never change)
    id    string
    peers []string

    // Persistent state (would survive a restart if we wrote to disk)
    currentTerm int
    votedFor    string

    // Volatile state
    role     Role
    leaderID string

    // Timing
    lastHeartbeat time.Time
}
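None of the snippets here show a Node being constructed, so here is a plausible constructor (a sketch; the name NewNode is my assumption, not necessarily the repo's):
// node.go
func NewNode(id string, peers []string) *Node {
    return &Node{
        id:            id,
        peers:         peers,
        role:          Follower,   // every node starts life as a follower
        lastHeartbeat: time.Now(), // so it doesn't time out instantly at boot
    }
}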
The main loop
The node runs forever, dispatching to a different function based on its current role:
func (n *Node) Run() {
    for {
        n.mu.Lock()
        role := n.role
        n.mu.Unlock()

        switch role {
        case Follower:
            n.runFollower()
        case Candidate:
            n.runCandidate()
        case Leader:
            n.runLeader()
        }
    }
}
Follower behavior
Sleep until the election timeout. If no heartbeat arrived during the sleep, become a candidate.
func (n *Node) runFollower() {
    timeout := randomElectionTimeout()
    time.Sleep(timeout)

    n.mu.Lock()
    defer n.mu.Unlock()
    if time.Since(n.lastHeartbeat) >= timeout {
        log.Printf("[%s] follower timed out, becoming candidate", n.id)
        n.role = Candidate
    }
}
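The randomElectionTimeout helper isn't shown in the repo snippets here; a minimal version, assuming the 150–300ms range from earlier and the standard library's math/rand:
// node.go
func randomElectionTimeout() time.Duration {
    // Uniform in [150ms, 300ms): wide enough that two nodes rarely fire together.
    return time.Duration(150+rand.Intn(150)) * time.Millisecond
}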
Candidate behavior: the heart of leader election
A candidate increments the term, votes for itself, then asks every peer for a vote in parallel:
func (n *Node) runCandidate() {
    n.mu.Lock()
    n.currentTerm++
    n.votedFor = n.id
    term := n.currentTerm
    n.lastHeartbeat = time.Now()
    n.mu.Unlock()

    votes := 1 // we voted for ourselves
    var votesMu sync.Mutex
    var wg sync.WaitGroup

    for _, peer := range n.peers {
        wg.Add(1)
        go func(peer string) {
            defer wg.Done()
            args := RequestVoteArgs{Term: term, CandidateID: n.id}
            var reply RequestVoteReply
            if err := sendRequestVote(peer, args, &reply); err != nil {
                return
            }

            n.mu.Lock()
            defer n.mu.Unlock()

            // Higher term in reply? Step down.
            if reply.Term > n.currentTerm {
                n.currentTerm = reply.Term
                n.votedFor = ""
                n.role = Follower
                return
            }
            if reply.Term != term || n.role != Candidate {
                return // stale reply
            }
            if reply.VoteGranted {
                votesMu.Lock()
                votes++
                count := votes
                votesMu.Unlock()
                if count > (len(n.peers)+1)/2 {
                    n.role = Leader
                    n.leaderID = n.id
                }
            }
        }(peer)
    }

    // Wait for all replies, but don't block past a fresh election timeout;
    // if no majority arrived by then, Run() re-enters runCandidate and a new
    // election starts in a higher term.
    done := make(chan struct{})
    go func() { wg.Wait(); close(done) }()
    select {
    case <-done:
    case <-time.After(randomElectionTimeout()):
    }
}
Three rules appear in this code, each mapping to a concept we derived:
Saw a higher term in any reply → step down. That’s the ghost-leader defense.
Stale reply (term doesn’t match) → ignore. Networks are slow; old replies arrive in new elections.
Got a majority, (len(peers)+1)/2 + 1 votes (3 in a 5-node cluster) → become leader. A strict majority is what makes Raft safe even with crashes.
Leader behavior
Send heartbeats to all peers periodically, on an interval much shorter than the election timeout:
func (n *Node) runLeader() {
    n.mu.Lock()
    term := n.currentTerm
    n.mu.Unlock()

    for _, peer := range n.peers {
        go func(peer string) {
            args := AppendEntriesArgs{Term: term, LeaderID: n.id}
            var reply AppendEntriesReply
            if err := sendAppendEntries(peer, args, &reply); err != nil {
                return
            }

            n.mu.Lock()
            defer n.mu.Unlock()
            if reply.Term > n.currentTerm {
                n.currentTerm = reply.Term
                n.votedFor = ""
                n.role = Follower
                n.leaderID = ""
            }
        }(peer)
    }

    time.Sleep(heartbeatInterval)
}
The heartbeat interval must be much shorter than the election timeout. Otherwise, followers will time out between heartbeats and start unnecessary elections.
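heartbeatInterval itself isn't defined in the snippet; a value consistent with that constraint might be (my choice, assuming the 150–300ms election timeouts above):
// node.go
const heartbeatInterval = 50 * time.Millisecond // a third of the minimum election timeout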
Handling incoming requests
When a vote request arrives:
func (n *Node) HandleRequestVote(args RequestVoteArgs) RequestVoteReply {
    n.mu.Lock()
    defer n.mu.Unlock()

    reply := RequestVoteReply{Term: n.currentTerm, VoteGranted: false}

    if args.Term < n.currentTerm {
        return reply // stale candidate, reject
    }
    if args.Term > n.currentTerm {
        n.currentTerm = args.Term
        n.votedFor = ""
        n.role = Follower
        n.leaderID = ""
        reply.Term = args.Term
    }

    // Vote if we haven't voted, OR we already voted for this same candidate
    if n.votedFor == "" || n.votedFor == args.CandidateID {
        n.votedFor = args.CandidateID
        reply.VoteGranted = true
        n.lastHeartbeat = time.Now()
    }
    return reply
}
Notice the one-vote-per-term rule on the last if: only grant the vote if we haven’t voted yet, or if it’s the same candidate re-asking (which is safe, it’s not a double vote).
When a heartbeat arrives:
func (n *Node) HandleAppendEntries(args AppendEntriesArgs) AppendEntriesReply {
    n.mu.Lock()
    defer n.mu.Unlock()

    reply := AppendEntriesReply{Term: n.currentTerm, Success: false}

    if args.Term < n.currentTerm {
        return reply // stale leader, reject
    }
    if args.Term > n.currentTerm {
        n.currentTerm = args.Term
        n.votedFor = ""
        reply.Term = args.Term
    }

    n.role = Follower
    n.leaderID = args.LeaderID
    n.lastHeartbeat = time.Now()
    reply.Success = true
    return reply
}
A valid heartbeat resets lastHeartbeat, which prevents the follower’s timer from firing.
The HTTP wiring
This part is pure bookkeeping. Each node runs an HTTP server with two endpoints:
// rpc.go
func (n *Node) StartServer(port string) error {
    mux := http.NewServeMux()

    mux.HandleFunc("/request-vote", func(w http.ResponseWriter, r *http.Request) {
        var args RequestVoteArgs
        if err := json.NewDecoder(r.Body).Decode(&args); err != nil {
            http.Error(w, err.Error(), http.StatusBadRequest)
            return
        }
        json.NewEncoder(w).Encode(n.HandleRequestVote(args))
    })

    mux.HandleFunc("/append-entries", func(w http.ResponseWriter, r *http.Request) {
        var args AppendEntriesArgs
        if err := json.NewDecoder(r.Body).Decode(&args); err != nil {
            http.Error(w, err.Error(), http.StatusBadRequest)
            return
        }
        json.NewEncoder(w).Encode(n.HandleAppendEntries(args))
    })

    return http.ListenAndServe(":"+port, mux)
}
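The client side, sendRequestVote and sendAppendEntries, isn't shown above. A minimal sketch using the same net/http + JSON approach (the postJSON helper and the 100ms timeout are my assumptions):
// rpc.go
func postJSON(peer, path string, args any, reply any) error {
    body, err := json.Marshal(args)
    if err != nil {
        return err
    }
    // Bound every RPC: a dead peer must not stall an election round.
    client := &http.Client{Timeout: 100 * time.Millisecond}
    resp, err := client.Post("http://"+peer+path, "application/json", bytes.NewReader(body))
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    return json.NewDecoder(resp.Body).Decode(reply)
}

func sendRequestVote(peer string, args RequestVoteArgs, reply *RequestVoteReply) error {
    return postJSON(peer, "/request-vote", args, reply)
}

func sendAppendEntries(peer string, args AppendEntriesArgs, reply *AppendEntriesReply) error {
    return postJSON(peer, "/append-entries", args, reply)
}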
A bonus of using HTTP/JSON: you can debug your cluster with curl. Want to manually send a vote request to see how a node responds? You can.
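For instance, this should coax a vote out of n1 while bumping its term (the term value and candidate ID here are made up for the experiment):
curl -s localhost:8001/request-vote -d '{"term":99,"candidateId":"debug"}'
If n1 hasn't voted in term 99 yet, it replies with something like {"term":99,"voteGranted":true}, and the cluster soon holds a fresh election in a higher term.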
Running It
go mod init raft
# Three terminals:
go run . -id=n1 -port=8001 -peers=localhost:8002,localhost:8003
go run . -id=n2 -port=8002 -peers=localhost:8001,localhost:8003
go run . -id=n3 -port=8003 -peers=localhost:8001,localhost:8002
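For completeness, here is roughly what main.go does with those flags (a sketch that leans on the hypothetical NewNode constructor from earlier; the real repo may differ):
// main.go
package main

import (
    "flag"
    "log"
    "strings"
)

func main() {
    id := flag.String("id", "", "unique node ID")
    port := flag.String("port", "8000", "HTTP port to listen on")
    peers := flag.String("peers", "", "comma-separated peer addresses")
    flag.Parse()

    n := NewNode(*id, strings.Split(*peers, ","))
    go n.Run()                      // role-dispatch loop
    log.Fatal(n.StartServer(*port)) // blocks, serving the two RPC endpoints
}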
Within seconds, one node logs WON election for term N and starts heartbeating. Kill that node, and within a few seconds a new leader emerges in a higher term. Restart the dead one and it rejoins as a follower in the new term.
What I Built vs What Real Raft Does
What’s here is Layer 1: Leader Election only. Real Raft has more layers:
Log replication : the leader actually accepts commands and replicates them to followers, so they all execute the same ordered sequence of operations.
Persistence : writing currentTerm, votedFor, and the log to disk so a node can survive a crash without violating safety.
Snapshots / log compaction : logs grow forever otherwise.
Cluster membership changes : adding and removing nodes safely.
What I Took Away
The most surprising thing about Raft isn't how it works; it's how little there is to it. Three states. Two RPCs. Five rules. And from those five rules, every safety property falls out naturally:
Randomized timeouts prevent livelock
Monotonic terms prevent stale leaders
One vote per term prevents split decisions
Strict majority prevents split brain
“Higher term wins” in every message keeps everyone in sync
Other consensus algorithms often have many complex components. Raft simplifies this by eliminating anything that isn't strictly necessary, resulting in a model that is small enough for one person to fully understand.
I believe this highlights an important lesson in distributed systems: simplicity is not merely about aesthetics. It is a means to correctness that can actually be verified.
References
In Search of an Understandable Consensus Algorithm (Ongaro & Ousterhout) - the original Raft paper
The Raft Visualization - an interactive diagram of the algorithm