← LOGBOOK LOG-398
EXPLORING · SOFTWARE ·
FORMAL-VERIFICATIONTLA+COQLEANALLOYCORRECTNESSPROOF

Verification Programming Languages

Exploring formal verification languages — TLA+, Coq, Lean, Alloy — and what it means to prove that software is correct.

What is Verification?

Normal testing asks: does this work for the inputs I tried? Verification asks: does this work for all possible inputs, forever? One is sampling. The other is proof.

Formal verification treats software as a mathematical object and uses logic to prove properties about it — no bugs for any input, no race conditions under any schedule, no state the system can enter that violates your invariants.

The Landscape

Verification languages split roughly into two camps:

Specification languages — describe what the system should do, then check if any execution violates it (model checking).

Proof assistants — describe how something is true, step by step, using logic; the computer verifies the proof is valid.


TLA+ — Leslie Lamport’s Language

What it is: A specification language for concurrent and distributed systems. You describe states and transitions; TLC (the model checker) exhaustively explores all reachable states.

Core ideas:

  • A system is a set of states and a transition relation — just like a state machine
  • Temporal Logic of Actions: you write formulas that must hold across all time steps, not just at one moment
  • []P means P holds in every state (always). <>P means P eventually holds.

Typical use: Specifying distributed protocols — consensus algorithms, database transactions, lock-free data structures. Amazon AWS uses TLA+ to verify S3, DynamoDB, and other services.

VARIABLES pc, x

Init == pc = "start" /\ x = 0

Next == \/ (pc = "start" /\ x' = x + 1 /\ pc' = "done")
        \/ (pc = "done"  /\ UNCHANGED <<pc, x>>)

Invariant == x <= 1

What it catches: Deadlocks, safety violations, liveness failures. Not line-by-line bugs — systemic design errors.

Limits: You write a model of the system, not the system itself. The code can still diverge from the spec. Gap between spec and implementation.


Alloy — Relations All the Way Down

What it is: Daniel Jackson’s relational modeling language. Everything is a set or relation. Alloy Analyzer auto-generates counterexamples to your assertions.

sig Node {
  next: lone Node
}

fact NoSelfLoop { no n: Node | n in n.^next }

assert Acyclic { no n: Node | n in n.^next }
check Acyclic for 5

Push the button. Alloy either says “no counterexample found up to scope 5” or shows you a concrete violated case as a diagram.

Strength: Instantly visual. Great for data models, access control policies, protocols with complex state. Jackson’s Software Abstractions is the canonical reference.

Limit: Bounded model checking — it only checks up to a finite scope. Misses counterexamples larger than your bound.


Coq — Dependent Types as Logic

What it is: A proof assistant based on the Calculus of Constructions. Programs and proofs are the same thing (Curry-Howard correspondence). You write a theorem; Coq checks that your proof term is type-correct.

Theorem add_comm : forall n m : nat, n + m = m + n.
Proof.
  intros n m. induction n as [| n' IH].
  - simpl. rewrite <- plus_n_O. reflexivity.
  - simpl. rewrite -> IH. rewrite <- plus_n_Sm. reflexivity.
Qed.

Major achievements:

  • CompCert — a formally verified C compiler. Every optimization proven to preserve semantics.
  • Four Color Theorem proof — checked in Coq.
  • Feit-Thompson theorem — 170,000 lines of Coq proof.

The learning curve: Steep. You’re doing interactive theorem proving. Tactics (intro, induction, rewrite, reflexivity) guide proof search. Getting comfortable takes weeks.


Lean 4 — The New Wave

What it is: A proof assistant and functional programming language developed at Microsoft Research, now the preferred tool for the mathematical formalization community (Mathlib).

Why Lean 4 is interesting now:

  • Mathlib has 100,000+ theorems — the largest single library of formalized math in history
  • Lean 4 is also a practical general-purpose language (compiled, fast)
  • AI-assisted proving is exploding here — models that generate tactic proofs
theorem add_comm (n m : Nat) : n + m = m + n := by
  induction n with
  | zero => simp
  | succ n ih => simp [Nat.succ_add, ih]

Why it matters beyond math: The Lean toolchain can extract verified Lean code to run. The gap between proof and implementation shrinks.


Dafny — Verification-Aware Programming

What it is: A language from Microsoft Research where you write normal-looking code, annotate it with preconditions, postconditions, and loop invariants, and the verifier (backed by Z3, an SMT solver) proves them automatically.

method BinarySearch(a: array<int>, key: int) returns (index: int)
  requires forall i, j :: 0 <= i < j < a.Length ==> a[i] <= a[j]
  ensures 0 <= index < a.Length ==> a[index] == key
  ensures index < 0 ==> key !in a[..]
{
  var lo, hi := 0, a.Length;
  while lo < hi
    invariant 0 <= lo <= hi <= a.Length
    invariant key !in a[..lo] && key !in a[hi..]
  { ... }
}

Why it’s approachable: Annotations look like documentation you’d write anyway. The verifier does the heavy lifting. You don’t write proof terms manually.


The Spectrum

ToolStyleAutomationBest For
TLA+Temporal logic specModel checkerDistributed protocols, system design
AlloyRelational modelsSAT-based counterexampleData models, access policies
DafnyAnnotated codeSMT solver (Z3)Algorithms with correctness specs
CoqDependent typesManual proof tacticsMath proofs, compiler verification
Lean 4Dependent typesManual + AI-assistedModern math formalization

Why This Matters Now

Two forces are converging:

  1. Distributed systems are everywhere — every SaaS product is now a distributed system. Race conditions and protocol bugs cost real money and user trust. TLA+ is cheap compared to a data loss incident.

  2. LLMs writing code — if machines generate code, the only way to trust it is to verify it, not review it. Formal methods suddenly become a multiplier on AI code generation, not just an academic exercise.

The question isn’t whether formal verification is worth it. It’s whether the tooling is approachable enough that engineers outside research labs will use it. Dafny and Lean 4 are the bets to watch.