A quorum-replicated, deterministic key-value database designed for control-plane workloads and predictable behavior under failure.
1. Project Overview
LankaDB is a distributed key-value database built for control-plane workloads, where correctness, predictability, and operational clarity matter more than raw feature surface.
It is designed for systems that require:
- strong consistency (linearizable writes)
- deterministic replication across nodes
- bounded resource usage under load
- clear failure modes and recovery paths
- minimal operational complexity
LankaDB is deployed as a single binary per node and forms a cluster using a quorum-based consensus protocol (Raft-style).
The system is intentionally narrow in scope:
- not a general-purpose database
- not an analytics engine
- not a document store
It is a replicated coordination and state system, similar in role to etcd or Chubby.
2. Design Principles
- Deterministic state machine replication as the source of truth
- Leader-only writes with quorum commit guarantees
- Append-only log + snapshot as the durability model
- Single-writer, ordered application of state transitions
- Explicit backpressure over unbounded buffering
- Operational simplicity over feature richness
- Predictable failure modes over best-effort behavior
3. System Architecture
LankaDB is a modular monolith:
- single process per node
- multiple async tasks (actors)
- no shared mutable state in the consensus core
3.1 Core Components
- Consensus Core (Node Actor)
  - owns term, role, commit index, log coordination
  - processes all consensus events serially
  - enforces ordering and safety invariants
- Replication Engine
  - manages follower state (next_index, match_index)
  - handles APPEND and snapshot installation
  - implements Raft backtracking with conflict hints
- WAL Storage
  - append-only segmented log
  - CRC-protected records
  - crash-safe recovery via forward scan
- State Machine (Apply Loop)
  - applies committed entries in order
  - maintains in-memory KV state
  - emits watch events
- Transport Layer
  - persistent TCP connections between peers
  - multiplexed message channels
  - priority-aware write queues
- Client API Layer
  - GET / PUT / DEL / TXN / RANGE / WATCH
  - leader redirection semantics
  - idempotency support (in-memory, v1)
- Snapshot & Compaction
  - periodic state snapshotting
  - WAL truncation for bounded disk usage
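For illustration, a minimal sketch of the single-writer consensus core pattern: one task owns term, role, and commit index and drains an event channel serially, so no shared mutable state is needed in the consensus path. The event and role types below are illustrative, not LankaDB's actual definitions.

```rust
use std::sync::mpsc;
use std::thread;

#[allow(dead_code)]
#[derive(Debug)]
enum Role {
    Follower,
    Candidate,
    Leader,
}

enum ConsensusEvent {
    ElectionTimeout,
    AppendEntries { term: u64 },
    QuorumAck { index: u64 },
}

struct NodeCore {
    term: u64,
    role: Role,
    commit_index: u64,
}

impl NodeCore {
    fn handle(&mut self, event: ConsensusEvent) {
        match event {
            ConsensusEvent::ElectionTimeout => {
                // become a candidate and request votes (vote requests omitted here)
                self.term += 1;
                self.role = Role::Candidate;
            }
            ConsensusEvent::AppendEntries { term } if term >= self.term => {
                // a current leader exists: follow it (log append omitted here)
                self.term = term;
                self.role = Role::Follower;
            }
            ConsensusEvent::AppendEntries { .. } => { /* stale term: reject */ }
            ConsensusEvent::QuorumAck { index } => {
                // the commit index only moves forward
                self.commit_index = self.commit_index.max(index);
            }
        }
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let core = thread::spawn(move || {
        let mut node = NodeCore { term: 0, role: Role::Follower, commit_index: 0 };
        // strictly serial: every safety invariant only has to hold across this loop
        for event in rx {
            node.handle(event);
        }
        println!("role {:?}, term {}, commit index {}", node.role, node.term, node.commit_index);
    });
    tx.send(ConsensusEvent::ElectionTimeout).unwrap();
    tx.send(ConsensusEvent::QuorumAck { index: 1 }).unwrap();
    drop(tx);
    core.join().unwrap();
}
```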
4. Replication Model
4.1 Consensus
LankaDB uses a Raft-style protocol:
- Leader election via randomized timeouts
- Log replication via APPEND entries
- Commit requires quorum acknowledgment
- Apply occurs after commit
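As a sketch of the quorum commit rule: the leader may advance commit_index to the highest log index that a majority of nodes (itself included) holds durably. The function and field names below are illustrative; a full Raft implementation additionally checks that the entry belongs to the leader's current term before committing it.

```rust
// Hedged sketch: compute the highest index replicated on a quorum, given the
// leader's own durable index and each follower's match_index.
fn quorum_commit_index(leader_last_index: u64, follower_match_index: &[u64]) -> u64 {
    // the leader's own durable index plus every follower's match_index
    let mut indexes: Vec<u64> = follower_match_index.to_vec();
    indexes.push(leader_last_index);
    // sort descending; the value at position n/2 is held by a majority
    indexes.sort_unstable_by(|a, b| b.cmp(a));
    indexes[indexes.len() / 2]
}

fn main() {
    // 5-node cluster: leader at 10, followers at 10, 9, 7, 4
    // a majority (3 of 5) has index >= 9, so commit_index can advance to 9
    assert_eq!(quorum_commit_index(10, &[10, 9, 7, 4]), 9);
    println!("ok");
}
```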
4.2 Write Flow
- Client sends write to leader
- Leader appends entry to WAL
- Entry replicated to followers
- On quorum ACK + durable write:
  - commit_index advances
  - entry applied
- Response returned to client
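A hedged sketch of this leader-side write path, with in-memory stand-ins for the WAL, replication, and state machine; none of these types are LankaDB's real interfaces.

```rust
use std::collections::HashMap;

struct Leader {
    wal: Vec<(Vec<u8>, Vec<u8>)>,       // stand-in for the durable log
    commit_index: u64,
    applied: HashMap<Vec<u8>, Vec<u8>>, // stand-in for the state machine
    cluster_size: usize,
}

impl Leader {
    fn put(&mut self, key: Vec<u8>, value: Vec<u8>) -> Result<u64, &'static str> {
        // 1. append the entry to the local WAL (fsync in a real node)
        self.wal.push((key.clone(), value.clone()));
        let index = self.wal.len() as u64;

        // 2. replicate to followers and count durable ACKs (self counts as one)
        let acks = 1 + self.replicate(index);

        // 3. commit only once a quorum has the entry durably
        if acks * 2 > self.cluster_size {
            self.commit_index = index;
            // 4. apply the committed entry in order
            self.applied.insert(key, value);
            // 5. respond to the client with the committed index
            Ok(index)
        } else {
            Err("quorum not reached; entry stays uncommitted")
        }
    }

    fn replicate(&self, _index: u64) -> usize {
        // placeholder: a real leader sends APPEND over the replication channel
        // and waits for follower acknowledgments
        self.cluster_size - 1
    }
}

fn main() {
    let mut leader = Leader {
        wal: Vec::new(),
        commit_index: 0,
        applied: HashMap::new(),
        cluster_size: 3,
    };
    let index = leader.put(b"config/feature-x".to_vec(), b"on".to_vec()).unwrap();
    println!("committed index {index}, commit_index is now {}", leader.commit_index);
}
```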
4.3 Read Model
- Reads are served by leader only (v1)
- Reads observe committed state
- RANGE supports revision-based consistency
5. Storage Model
5.1 Write-Ahead Log (WAL)
- segmented, append-only files
- each record includes:
  - length prefix
  - CRC checksum
- recovery scans until last valid record
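A minimal sketch of a record layout consistent with this description (length prefix, CRC-32, payload) and of forward-scan recovery that truncates at the first torn or corrupt record. The exact on-disk layout is an assumption for illustration.

```rust
// Bitwise CRC-32 (IEEE polynomial), enough for the sketch.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

// Record = [len: u32 LE][crc: u32 LE][payload]; layout is illustrative.
fn encode_record(payload: &[u8]) -> Vec<u8> {
    let mut record = Vec::with_capacity(8 + payload.len());
    record.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    record.extend_from_slice(&crc32(payload).to_le_bytes());
    record.extend_from_slice(payload);
    record
}

// Forward scan: keep every complete record with a valid CRC and report the
// byte offset where the valid prefix of the log ends.
fn recover(buf: &[u8]) -> (Vec<Vec<u8>>, usize) {
    let mut records = Vec::new();
    let mut offset = 0;
    while offset + 8 <= buf.len() {
        let len = u32::from_le_bytes(buf[offset..offset + 4].try_into().unwrap()) as usize;
        let crc = u32::from_le_bytes(buf[offset + 4..offset + 8].try_into().unwrap());
        let end = offset + 8 + len;
        if end > buf.len() || crc32(&buf[offset + 8..end]) != crc {
            break; // torn or corrupt tail: truncate here
        }
        records.push(buf[offset + 8..end].to_vec());
        offset = end;
    }
    (records, offset)
}

fn main() {
    let mut log = Vec::new();
    log.extend(encode_record(b"PUT a=1"));
    log.extend(encode_record(b"PUT b=2"));
    log.extend_from_slice(&[0x07, 0x00]); // simulate a torn final write
    let (records, valid_len) = recover(&log);
    assert_eq!(records.len(), 2);
    println!("recovered {} records, truncate log to {} bytes", records.len(), valid_len);
}
```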
5.2 Snapshots
- full state materialization
- includes:
  - KV map
  - lease table
  - global revision
- used for:
  - compaction
  - follower catch-up
5.3 Compaction
- WAL segments whose highest index is ≤ the snapshot index are deleted
- tombstones retained until snapshot boundary
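A small sketch of the compaction rule, assuming each segment is described by its (first_index, last_index) range:

```rust
// Once a snapshot covers log index S, every WAL segment whose highest index
// is <= S can be deleted. Segment bounds here are illustrative.
fn segments_to_delete(segments: &[(u64, u64)], snapshot_index: u64) -> Vec<(u64, u64)> {
    segments
        .iter()
        .copied()
        .filter(|&(_, last_index)| last_index <= snapshot_index)
        .collect()
}

fn main() {
    // three segments covering indexes 1-100, 101-200, 201-250; snapshot at 180
    let segments = [(1, 100), (101, 200), (201, 250)];
    // only the first segment is fully covered by the snapshot and can go
    assert_eq!(segments_to_delete(&segments, 180), vec![(1, 100)]);
    println!("ok");
}
```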
6. Lease System
LankaDB includes a deterministic lease engine:
- leases are granted with TTL
- keys can be attached to leases
- expiration is leader-driven via log entries
Key property: expiry is not local; it is replicated.
The leader emits LEASE_EXPIRE entries:
- ensures identical behavior across all nodes
- avoids clock drift inconsistencies
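A hedged sketch of the mechanism: only the leader checks wall-clock deadlines, and it converts them into LEASE_EXPIRE log entries, so every replica applies identical expirations without consulting its own clock. Types and names are illustrative.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

struct Lease {
    deadline: Instant,
    keys: Vec<String>,
}

enum LogEntry {
    LeaseExpire { lease_id: u64 },
}

// Runs on the leader only: turn expired deadlines into replicated entries.
fn propose_expirations(leases: &HashMap<u64, Lease>, now: Instant) -> Vec<LogEntry> {
    leases
        .iter()
        .filter(|(_, lease)| lease.deadline <= now)
        .map(|(&lease_id, _)| LogEntry::LeaseExpire { lease_id })
        .collect()
}

// Runs on every node when the entry commits: deterministic, no clock involved.
fn apply(entry: &LogEntry, leases: &mut HashMap<u64, Lease>, kv: &mut HashMap<String, String>) {
    match entry {
        LogEntry::LeaseExpire { lease_id } => {
            if let Some(lease) = leases.remove(lease_id) {
                for key in lease.keys {
                    kv.remove(&key);
                }
            }
        }
    }
}

fn main() {
    let now = Instant::now();
    let mut leases = HashMap::new();
    leases.insert(7, Lease { deadline: now, keys: vec!["svc/worker-1".to_string()] });
    let mut kv = HashMap::from([("svc/worker-1".to_string(), "alive".to_string())]);

    // leader-side check, one second past the deadline
    let entries = propose_expirations(&leases, now + Duration::from_secs(1));
    // every replica applies the same committed entries in the same order
    for entry in &entries {
        apply(entry, &mut leases, &mut kv);
    }
    assert!(kv.is_empty());
    println!("lease 7 expired via a replicated LEASE_EXPIRE entry");
}
```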
7. Watch System
7.1 Semantics
- prefix-based subscription
- ordered event stream by revision
7.2 Backpressure Policy
- bounded per-watcher queues
- if exceeded:
  - watcher is closed with BEHIND
  - client must resync via RANGE
This avoids:
- unbounded memory growth
- hidden data loss
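A minimal sketch of this policy using a bounded channel: events are offered with a non-blocking send, and a watcher whose queue is full is closed with BEHIND instead of buffering without limit. The queue size and types are illustrative.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

struct Event {
    revision: u64,
    key: String,
}

fn main() {
    // per-watcher queue bounded at 2 events for demonstration
    let (tx, rx) = sync_channel::<Event>(2);

    let mut closed_behind = false;
    for revision in 1..=4 {
        let event = Event { revision, key: "config/flag".to_string() };
        match tx.try_send(event) {
            Ok(()) => {}
            Err(TrySendError::Full(_)) => {
                // the watcher cannot keep up: close it with BEHIND; the client
                // must resync via a RANGE read and re-subscribe
                closed_behind = true;
                break;
            }
            Err(TrySendError::Disconnected(_)) => break, // watcher already gone
        }
    }
    drop(tx);

    if closed_behind {
        println!("watcher closed: BEHIND (resync with RANGE)");
    }
    // the slow watcher still drains what it had before being closed
    for event in rx {
        println!("delivered revision {} for {}", event.revision, event.key);
    }
}
```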
8. Network & Protocol
8.1 Transport
- persistent TCP connections
- one connection per peer
- optional TLS/mTLS
8.2 Framing
- fixed 64-byte header
- little-endian encoding
- payload opaque for replication
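A hedged sketch of a fixed 64-byte, little-endian header. Only the size and endianness come from the description above; the specific fields (magic, channel, flags, lengths, reserved padding) are assumptions for illustration.

```rust
const HEADER_LEN: usize = 64;

#[derive(Debug, PartialEq)]
struct FrameHeader {
    magic: u32,       // constant marker for resynchronization (assumed)
    channel: u8,      // CONTROL / REPL / CLIENT / WATCH / SNAP
    flags: u8,
    payload_len: u32, // payload itself stays opaque to the transport
    payload_crc: u32,
}

fn encode(h: &FrameHeader) -> [u8; HEADER_LEN] {
    let mut buf = [0u8; HEADER_LEN]; // remaining bytes stay zero (reserved)
    buf[0..4].copy_from_slice(&h.magic.to_le_bytes());
    buf[4] = h.channel;
    buf[5] = h.flags;
    buf[8..12].copy_from_slice(&h.payload_len.to_le_bytes());
    buf[12..16].copy_from_slice(&h.payload_crc.to_le_bytes());
    buf
}

fn decode(buf: &[u8; HEADER_LEN]) -> FrameHeader {
    FrameHeader {
        magic: u32::from_le_bytes(buf[0..4].try_into().unwrap()),
        channel: buf[4],
        flags: buf[5],
        payload_len: u32::from_le_bytes(buf[8..12].try_into().unwrap()),
        payload_crc: u32::from_le_bytes(buf[12..16].try_into().unwrap()),
    }
}

fn main() {
    let header = FrameHeader { magic: 0x4C4B_4442, channel: 2, flags: 0, payload_len: 128, payload_crc: 0 };
    let wire = encode(&header);
    assert_eq!(decode(&wire), header);
    println!("round-tripped a {}-byte header", wire.len());
}
```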
8.3 Channels
- CONTROL: elections, heartbeats
- REPL: log replication
- CLIENT: requests/responses
- WATCH: event streams
- SNAP: snapshot transfer
8.4 Priority Model
Lower numeric value = higher priority:
- 0: control (heartbeat, votes)
- 1: client writes
- 2: replication
- 3: background (watch, snapshot)
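A small sketch of draining an outgoing queue lowest-number-first; the queue type (a min-ordered binary heap) is illustrative, not necessarily the transport's real write queue.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

fn main() {
    // (priority, label): 0 control, 1 client writes, 2 replication, 3 background
    let mut queue = BinaryHeap::new();
    queue.push(Reverse((3u8, "watch event")));
    queue.push(Reverse((1u8, "client PUT")));
    queue.push(Reverse((0u8, "heartbeat")));
    queue.push(Reverse((2u8, "APPEND batch")));

    // drains as: heartbeat, client PUT, APPEND batch, watch event
    while let Some(Reverse((priority, label))) = queue.pop() {
        println!("p{priority}: {label}");
    }
}
```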
9. Failure & Recovery Model
9.1 Node Failure
- leader failure → election
- follower failure → catch-up via WAL or snapshot
9.2 Network Partition
- minority partition cannot commit
- majority partition continues
9.3 Disk Failure
- WAL corruption detected via CRC
- recovery truncates to last valid record
9.4 Shutdown (SIGTERM)
- stop accepting new writes
- flush WAL
- apply committed entries
- step down if leader
- exit within bounded time
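A hedged sketch of this shutdown order with an overall time budget; the component methods are stubs standing in for the real subsystems.

```rust
use std::time::{Duration, Instant};

struct Node {
    is_leader: bool,
}

impl Node {
    fn stop_accepting_writes(&mut self) { println!("1. rejecting new writes"); }
    fn flush_wal(&mut self) { println!("2. WAL flushed (fsync)"); }
    fn apply_committed(&mut self) { println!("3. committed entries applied"); }
    fn step_down(&mut self) { println!("4. stepped down as leader"); }
}

fn shutdown(node: &mut Node, budget: Duration) {
    let started = Instant::now();
    node.stop_accepting_writes();
    node.flush_wal();
    node.apply_committed();
    if node.is_leader {
        node.step_down();
    }
    // a real node would force-exit once the budget is exceeded
    assert!(started.elapsed() <= budget, "shutdown exceeded its time budget");
    println!("5. exit within {:?}", budget);
}

fn main() {
    shutdown(&mut Node { is_leader: true }, Duration::from_secs(5));
}
```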
10. Observability
Metrics (v1)
- current term
- commit index
- last applied index
- global revision
- peer replication lag
- election count
- leader changes
- WAL fsync latency
- request counts by type/status
Ops Endpoints
- /healthz
- /readyz
- /metrics
- /status
11. Safety Properties
- No committed write is lost
- Leader-only mutation
- Log matching property enforced
- Monotonic revision ordering
- Deterministic state application
12. Non-Goals
- multi-region quorum optimization (v1)
- follower reads (v1)
- dynamic cluster membership (v1)
- rich query language
- secondary indexes
13. Deployment Model
- one container per node
- persistent volume for storage
- static cluster membership (v1)
- headless service for peer discovery
Works with:
- Kubernetes
- bare metal clusters
- homelab environments
14. What Comes Next (v2)
- persisted idempotency cache
- dynamic membership changes
- follower reads with leases
- improved snapshot streaming (zero-copy)
- optional TLS reload without restart
- performance tuning (io_uring, batching strategies)
Closing Note
LankaDB is intentionally constrained.
The goal is not to build the most feature-rich database, but to build a system where:
- behavior is predictable
- failures are explainable
- recovery is deterministic
Most complexity exists to ensure that under stress or failure, the system behaves in a way operators can reason about.