Zulip Chat Archive

Stream: job postings

Topic: Quick job writing cryptography equivalence proofs

Daniel Windham (Jan 09 2025 at 21:10):

Position filled - thanks @Hanting Zhang!

Hi folks,

We are looking for someone to write equivalence proofs of simple cryptography code in the next couple weeks as part of a research project on spec formalization and review. There is also some debugging of machine-generated Lean code that we expect to be pretty quick for someone experienced with Lean (e.g. debug typeclass and typechecker perf issues).

We have reference implementations and machine-generated target implementations of a few cryptography algorithms (SHA-1, AES). They are all in Lean. The target implementations are from Rust reference implementations that have been transpiled to Lean using the Aeneas toolchain, and currently overflow the typechecker. We'd like to resolve the issues with the generated code and prove equivalence between the two implementations.

You can see the implementations here: https://github.com/atlas-computing-org/verified-fips-in-rust. Here is an example sorry to prove: https://github.com/atlas-computing-org/verified-fips-in-rust/blob/main/lean/VerifiedFipsCryptography/Equivalence/AES.lean#L18.

If you would like to run this experiment for us, please reply here or DM me. Let us know how much time you'd expect this to take and your hourly rate so we can compensate you fairly.

If you're interested in work that goes beyond this exercise, there's lots more in this vein that we could follow up with.

Thank you!

Kim Morrison (Jan 09 2025 at 22:33):

Could you point to a sorry that is indicative of the work you want done?

Daniel Windham (Jan 09 2025 at 23:30):

Good point, thanks Kim. I can't get to this immediately but will post a sorry soon.

François G. Dorais (Jan 09 2025 at 23:50):

At a glance, the transpiled Rust code doesn't work... Is fixing that part of the work?

Daniel Windham (Jan 10 2025 at 01:26):

Thanks @François G. Dorais, I've updated the original post to clarify that. Yes, fixing this is part of the work, though we expect that's low lift. I've also linked to an example sorry.

François G. Dorais (Jan 10 2025 at 01:32):

Actually, the issue you're running into is perhaps not low lift at all. If my initial assessment of the issue is correct, the fix probably requires changes to the Lean compiler. This is not only heavy lifting but a no-go since the Lean compiler is frozen until work on the new compiler resumes (perhaps this year).

François G. Dorais (Jan 10 2025 at 01:34):

PS: Feel free to pressure the FRO to resume compiler work as soon as possible!

Daniel Windham (Jan 10 2025 at 01:37):

Francois why is that?

Making modest changes to the generated code is totally fine. We claim the code is equivalent to hand-written Lean code that we've tested, so with sufficient changes to the generated code it would work with the current compiler.

(Not that I'm against requesting compiler improvements if I have a good motivation :wink: )

François G. Dorais (Jan 10 2025 at 01:47):

The main issue is that the current Lean compiler has a difficult time processing constants that are large arrays of non-scalar values. There are workarounds but that is a separate job, I think.

Kim Morrison (Jan 10 2025 at 02:45):

François G. Dorais said:

until work on the new compiler resumes

Just noting this has already happened.

Daniel Windham (Jan 10 2025 at 05:49):

Dumb question, but is the U8 type any more non-scalar than UInt8? To me, they both look non-scalar, but similarly optimize-able to scalar.

A length-256 Array UInt8 works fine in our reference implementation (line 35), while the length-256 Primitives.Array U8 256#usize in the generated implementation (line 62) chokes. The generated code defines its own U8 and Array types as shown below.

inductive ScalarTy :=
| U8
| ...

@[reducible] def U8 := Scalar .U8

structure Scalar (ty : ScalarTy) where
  val : Int
  hmin : Scalar.min ty ≤ val
  hmax : val ≤ Scalar.max ty
deriving Repr, BEq, DecidableEq

def Array (α : Type u) (n : Usize) := { l : List α // l.length = n.val }

def algorithms.aes.SBOX_body : Result (Array U8 256#usize) :=
  Result.ok
  (Array.make 256#usize [
    99#u8, ...
   ])

François G. Dorais (Jan 10 2025 at 07:27):

Both UInt8 is a scalar type but I'm not sure about U8. (This brief translation to C guide helps.) Regardless, the custom Array type though is not the Lean Array type so it does not benefit from custom hacks that make Lean's Array type tolerate larger arrays. Using Lean's new Vector type might help.

François G. Dorais (Jan 10 2025 at 07:30):

Regardless, these weird hacks should be fixed when the new compiler rolls out :fingers_crossed:

Hanting Zhang (Jan 10 2025 at 07:35):

Cross posted from DM's

I've written up a small work-around:

import Base
open Primitives

lemma U8min {x : UInt8} : Scalar.min ScalarTy.U8 ≤ x.val := by simp [Scalar.min]
lemma U8max {x : UInt8} : x.val ≤ Scalar.max ScalarTy.U8 := by
  simp only [UInt8.val_val_eq_toNat, Scalar.max, U8.max, Nat.cast_le_ofNat]
  exact Nat.le_of_lt_succ x.toNat_lt

def toU8 (x : UInt8) : U8 := ⟨x.val.val, U8min, U8max⟩

def sBox : List UInt8 := [
  0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5, 0x30, 0x01, 0x67, 0x2B, 0xFE, 0xD7, 0xAB, 0x76,
  0xCA, 0x82, 0xC9, 0x7D, 0xFA, 0x59, 0x47, 0xF0, 0xAD, 0xD4, 0xA2, 0xAF, 0x9C, 0xA4, 0x72, 0xC0,
  0xB7, 0xFD, 0x93, 0x26, 0x36, 0x3F, 0xF7, 0xCC, 0x34, 0xA5, 0xE5, 0xF1, 0x71, 0xD8, 0x31, 0x15,
  0x04, 0xC7, 0x23, 0xC3, 0x18, 0x96, 0x05, 0x9A, 0x07, 0x12, 0x80, 0xE2, 0xEB, 0x27, 0xB2, 0x75,
  0x09, 0x83, 0x2C, 0x1A, 0x1B, 0x6E, 0x5A, 0xA0, 0x52, 0x3B, 0xD6, 0xB3, 0x29, 0xE3, 0x2F, 0x84,
  0x53, 0xD1, 0x00, 0xED, 0x20, 0xFC, 0xB1, 0x5B, 0x6A, 0xCB, 0xBE, 0x39, 0x4A, 0x4C, 0x58, 0xCF,
  0xD0, 0xEF, 0xAA, 0xFB, 0x43, 0x4D, 0x33, 0x85, 0x45, 0xF9, 0x02, 0x7F, 0x50, 0x3C, 0x9F, 0xA8,
  0x51, 0xA3, 0x40, 0x8F, 0x92, 0x9D, 0x38, 0xF5, 0xBC, 0xB6, 0xDA, 0x21, 0x10, 0xFF, 0xF3, 0xD2,
  0xCD, 0x0C, 0x13, 0xEC, 0x5F, 0x97, 0x44, 0x17, 0xC4, 0xA7, 0x7E, 0x3D, 0x64, 0x5D, 0x19, 0x73,
  0x60, 0x81, 0x4F, 0xDC, 0x22, 0x2A, 0x90, 0x88, 0x46, 0xEE, 0xB8, 0x14, 0xDE, 0x5E, 0x0B, 0xDB,
  0xE0, 0x32, 0x3A, 0x0A, 0x49, 0x06, 0x24, 0x5C, 0xC2, 0xD3, 0xAC, 0x62, 0x91, 0x95, 0xE4, 0x79,
  0xE7, 0xC8, 0x37, 0x6D, 0x8D, 0xD5, 0x4E, 0xA9, 0x6C, 0x56, 0xF4, 0xEA, 0x65, 0x7A, 0xAE, 0x08,
  0xBA, 0x78, 0x25, 0x2E, 0x1C, 0xA6, 0xB4, 0xC6, 0xE8, 0xDD, 0x74, 0x1F, 0x4B, 0xBD, 0x8B, 0x8A,
  0x70, 0x3E, 0xB5, 0x66, 0x48, 0x03, 0xF6, 0x0E, 0x61, 0x35, 0x57, 0xB9, 0x86, 0xC1, 0x1D, 0x9E,
  0xE1, 0xF8, 0x98, 0x11, 0x69, 0xD9, 0x8E, 0x94, 0x9B, 0x1E, 0x87, 0xE9, 0xCE, 0x55, 0x28, 0xDF,
  0x8C, 0xA1, 0x89, 0x0D, 0xBF, 0xE6, 0x42, 0x68, 0x41, 0x99, 0x2D, 0x0F, 0xB0, 0x54, 0xBB, 0x16
]

-- These are instant.
example : sBox.length = 256 := by rfl
def sBoxArrayU8 : Primitives.Array U8 256#usize := ⟨sBox.map toU8, by rfl⟩

The issue is that Aeneas implements U8 literals via the #u8 macro. This expands into

macro:max x:term:max noWs "#u8"    : term => `(U8.ofInt $x (by first | decide | scalar_tac))

for every instance. So the generated code is elaborating hundreds (!!) of by first | decide | scalar_tac statements. In the above approach, Lean has native support for UInt8, and in the example : sBoxU8.length = 256 := by rfl snippet, I guess rfl is smart enough to figure out that mapping doesn't change the length. (Actually, I'm decently sure rfl would still be unfolding everything here, but the difference is that toU8 avoids macroexpanding #u8 notation, saving the elaboration overhead.)

François G. Dorais (Jan 10 2025 at 07:46):

Yes, the rfl portion reminds me of this closed issue lean4#5502

Bas Spitters (Jan 10 2025 at 08:21):

@Daniel Windham Great! This is the same methodology we've been following with hacspec/hax/libcrux (see the papers here: https://hacspec.org/, we have another submitted paper to be released soon). Hax and aenaes share quite a bit of code. Making a Lean backend for Hax is feasible, and might give you simpler code then aeneas.
E.g. in previous work, we connected hacspec rust with fiat-crypto (I see @Jason Gross is also involved in your project).

Daniel Windham (Jan 10 2025 at 15:59):

Nice fix @Hanting Zhang ! I've pushed your update and the compiler no longer hangs on the generated code. There are about a half-dozen errors showing in the rest of the generated code (not counting duplicates). Most if not all of them are local issues, but I haven't explored much.

Daniel Windham (Jan 10 2025 at 15:59):

For the job posting: It's clear there are two sequential jobs: (1) get the Aeneas-generated code working, making changes as needed, and (2) complete the equivalence proofs. A couple folks have reached out who I'm following up with, but I'm still taking new expressions of interest for now. Let me know if you're interested in either/both.

Daniel Windham (Jan 10 2025 at 16:04):

And thanks @Bas Spitters! It's great to learn from what you're doing here. You posted the wrong link to your papers though :wink: .

How much work would it be to add a Lean backend to Hax? Know anyone who wants to do it? The research my team is working on is focused on spec formalization and review, and we're interested in comparing different formalization backends - if there were a backend with Hax we'd try it out.

Bas Spitters (Jan 10 2025 at 21:23):

@Daniel Windham Thanks. I've fixed the link.

Daniel Windham (Jan 10 2025 at 22:52):

Position filled - thanks @Hanting Zhang!

Last updated: May 02 2025 at 03:31 UTC