Zulip Chat Archive

Stream: Program verification

Topic: Formalizing a theoretical AI safety result

Geoffrey Irving (Nov 27 2023 at 11:02):

https://github.com/google-deepmind/debate formalizes a theoretical result related to AI safety in Lean 4 (the paper is https://arxiv.org/abs/2311.14125). It is a fairly simple theorem, but was a fun exercise to formalize (and was useful in learning the differences between Lean 3 and Lean 4). And now we have the basic setting down in case we want to formalize follow-up results (this paper is just a baby step).

Screenshot-2023-11-27-at-10.58.57am.png

Geoffrey Irving (Nov 27 2023 at 11:05):

In particular, we have a simple monad for expressing stochastic computations relative to multiple oracles:

/-- A stochastic computation that can make oracle queries.
    Importantly, the computation does not know the oracle, so we can model query complexity.
    The `Comp` constructors are not very user friendly due to kernel restrictions on inductive,
    but we replace them with clean ones below. -/
inductive Comp {I : Type} (s : Set I) (α : Type) : Type 1 where
  /-- Return a result with no computation -/
  | pure' : α → Comp s α
  /-- Sample a value with some probability distribution, then continue -/
  | sample' : {β : Type} → Prob β → (β → Comp s α) → Comp s α
  /-- Query an oracle `o ∈ s`, and branch on the result -/
  | query' : (o : I) → o ∈ s → (n : ℕ) → Vector Bool n → Comp s α → Comp s α → Comp s α

The current result uses only one oracle in spirit, but still uses the multiple-oracle functionality to express per-player query complexity. Future results along this research path (if we succeed in getting there) will likely use multiple oracles in more fundamental ways, in particular to express machine heuristics than humans don't have access to.

Geoffrey Irving (Nov 27 2023 at 11:06):

Thank you to @Eric Wieser for his careful review and various helpful suggestions! (But all inelegant proofs are my fault. :))

Geoffrey Irving (Nov 27 2023 at 11:14):

One more note: rather than using PMF I made a custom Prob monad for finitely supported probability distributions: https://github.com/google-deepmind/debate/blob/main/Prob/Defs.lean#L31

/-- Prob α is a finitely supported probability distribution over results α -/
structure Prob (α : Type) where
  /-- Finitely supported probabilities -/
  prob : α →₀ ℝ
  /-- prob is nonnegative -/
  prob_nonneg : ∀ {x}, 0 ≤ prob x
  /-- The total probability is 1 -/
  total : prob.sum (λ _ p ↦ p) = 1

This was in part a learning exercise, but I think it's also useful to avoid integrability side conditions that appear with PMF. You can avoid integrability if summing over ENNReal, but finite support means expectations over the reals are also fine (https://github.com/google-deepmind/debate/blob/main/Prob/Defs.lean#L43):

/-- Integral w.r.t. a distribution -/
@[pp_dot] def Prob.exp (f : Prob α) (g : α → ℝ) : ℝ :=
  f.prob.sum (λ x p ↦ p * g x)

I'm not perfectly confident this is the right design decision, but integrability was annoying enough in the Prob / PMF comparison file that I have some evidence: https://github.com/google-deepmind/debate/blob/main/Prob/Pmf.lean

Geoffrey Irving (Nov 27 2023 at 11:15):

Ah, also: monad syntax is lovely as always. Here's the main debate protocol that the repo formalizes:

/-- One step of the debate protocol -/
def step (alice : Alice) (bob : Bob) (vera : Vera) (y : Vector Bool n) :
    Comp AllIds (State (n+1)) := do
  let p ← (alice _ y).allow_all
  if ←(bob _ y p).allow_all then do  -- Bob accepts Alice's probability, so take the step
    let x ← bernoulli p  -- This is Figure 4, steps 2b,2c,2d, as a fixed part of the protocol
    return .ok (y.cons x)
  else  -- Bob rejects, so we call the verifier and end the computation
    return .error (←(vera _ y p).allow_all)

/-- n steps of the debate protocol -/
def steps (alice : Alice) (bob : Bob) (vera : Vera) : (n : ℕ) → Comp AllIds (State n)
| 0 => pure (.ok Vector.nil)
| n+1 => do match ←steps alice bob vera n with
  | .ok y => step alice bob vera y
  | .error r => return .error r

/-- The full debate protocol that stitches these together -/
def debate (alice : Alice) (bob : Bob) (vera : Vera) (t : ℕ) : Comp AllIds Bool := do
  match ←steps alice bob vera (t+1) with
  | .ok y => return y.head
  | .error r => return r

Shreyas Srinivas (Nov 27 2023 at 12:10):

This is wonderful news.

Shreyas Srinivas (Nov 27 2023 at 12:11):

About the design choices: Have you found it more convenient to avoid the more abstract mathlib definitions at other places too?

Geoffrey Irving (Nov 27 2023 at 12:38):

Nothing that occurs to me offhand. Mathlib is usually pretty good at providing simple variants (e.g., Finset).

Geoffrey Irving (Nov 27 2023 at 13:25):

Prob might make sense as an addition to Mathlib (Finprob or FinPMF or something).

Last updated: May 02 2025 at 03:31 UTC