Zulip Chat Archive
Stream: general
Topic: SHA-3 implementation in pure Lean
Carbon (Oct 21 2024 at 23:04):
I experimented with implementing SHA-3 in pure Lean. This is my first exposure to pure functional programming, and dependent types. The library provides one-shot, and streaming APIs for hash and XOF functions, and uses dependent typing, type classes, and macros.
https://github.com/gdncc/Cryptography/tree/main
I had a blast. I will probably try to validate a few more properties, and to improve performance. I am hoping it will be useful educational resource for how to implement low(-ish) level primitives in Lean. This was certainly a rewarding exercise for me.
Shreyas Srinivas (Oct 21 2024 at 23:05):
https://github.com/gdncc/Cryptography/blob/main/Cryptography%2FHashes%2FSHA3%2FBasic.lean#L50
Here you can use the vector implementation of Batteries.Vector
instead of creating your own statically sized array
Carbon (Oct 21 2024 at 23:11):
Yep! @Chris Bailey mentioned this to me last Thursday at our San Francisco Lean Meetup. I will consider using this in the next iteration of the implementation.
Eric Wieser (Oct 21 2024 at 23:16):
I'm puzzled by the stable
in the lean-toolchain file. What version of lean does this mean?
Eric Wieser (Oct 21 2024 at 23:17):
Either this is some old version of lean, which is fine but you may as well update; or more worryingly it tracks some "latest" version of lean, in which case your project may stop building at random points in future.
Carbon (Oct 21 2024 at 23:29):
I assume this means the latest release e.g. 4.12. I did have to fix a few things for the past 2 releases. Breaking changes included changes to Parsec
, and affected the NIST test cases.
Eric Wieser (Oct 21 2024 at 23:34):
If you write leanprover/lean4:v4.12.0
in that file, then the code will work forever, and you can manually change it to the most recent release each time you are ready to fix things
Eric Wieser (Oct 21 2024 at 23:36):
Perhaps @Mac Malone can comment on whether putting stable
in the lean-toolchain
is supposed to be legal in the first place.
Mac Malone (Oct 21 2024 at 23:38):
It is not really meant to be legal. Future versions of elan
will try to prevent such occurrences.
Shreyas Srinivas (Oct 21 2024 at 23:55):
Eric Wieser said:
I'm puzzled by the
stable
in the lean-toolchain file. What version of lean does this mean?
stable is the default toolchain when lake generates a non-math project
Shreyas Srinivas (Oct 21 2024 at 23:56):
And I tested this very recently, so this is not an issue with only old lean versions.
Shreyas Srinivas (Oct 21 2024 at 23:57):
for non-math projects, lake is very barebones at the moment
Shreyas Srinivas (Oct 22 2024 at 00:01):
Actually it isn't a bad idea to maintain the stable tag. I use it whenever I want to jump from stable toolchain to stable toolchain. I recall explicitly asking for this last year and Mac suggested this idea.
Shreyas Srinivas (Oct 22 2024 at 00:02):
@Carbon there were some breaking changes in the Array and List API from 4.12 to 4.13-rc versions. So when updating to 4.13, there will definitely be build breakages
Arthur Paulino (Oct 22 2024 at 00:03):
This is exciting work! I'm eager to see more of this.
Carbon (Nov 07 2024 at 16:09):
Shreyas Srinivas said:
Carbon there were some breaking changes in the Array and List API from 4.12 to 4.13-rc versions. So when updating to 4.13, there will definitely be build breakages
I updated to 4.13, and I did not experience any issues. Were these breaking Array changes implemented, or should we expect them to materialize later?
Kim Morrison (Nov 07 2024 at 21:19):
99% of changes in List and Array come with deprecation warnings, so should be pretty easy to adapt to. Please ping me if you encounter places where the upgrade path for List/Array is painful!
Wrenna Robson (Nov 13 2024 at 13:39):
@Shreyas Srinivas out of interested, why are all your definitions marked "private"?
Shreyas Srinivas (Nov 13 2024 at 13:41):
What definitions? I am not the author of this project
Wrenna Robson (Nov 13 2024 at 13:41):
Oh sorry!
Wrenna Robson (Nov 13 2024 at 13:41):
Misread the above :)
Wrenna Robson (Nov 13 2024 at 13:41):
@Carbon, sorry
Carbon (Nov 13 2024 at 16:16):
To ensure library users do not call the wrong functions.
Eric Wieser (Nov 13 2024 at 17:50):
As far as I can tell, the result is that your public function has a private type
Carbon (Nov 13 2024 at 17:58):
Not all functions are actually private
. I have ~private~ functions but also public (no private
modifier) functions (the public functions are actually implemented via macros). Upon import, I can only call the public functions, which is the desired effect. Am I missing something?
Eric Wieser (Nov 13 2024 at 18:08):
My claim is that you provide a public SHA3_224.k
definition, but the type of this definition is private
Eric Wieser (Nov 13 2024 at 18:10):
Separately, the definition
/-- Sponge function, defined by its capacity, padding, and output bit length -/
private inductive HashFunction {α : Type u} {β : Type v} {γ : Type w} : α → β → γ → Type (max u v w) where
| f (capacity : α) (paddingDelimiter : β) (outputBitsLen : γ)
(property :
(capacity = c)
∧ (paddingDelimiter = p)
∧ (outputBitsLen = o)): HashFunction c p o
seems quite strange to me; as far as I can tell, this is isomorphic to Unit
!
Mario Carneiro (Nov 14 2024 at 08:42):
Eric Wieser said:
My claim is that you provide a public
SHA3_224.k
definition, but the type of this definition is private
this should be a linter warning
Carbon (Nov 14 2024 at 09:44):
Eric Wieser said:
My claim is that you provide a public
SHA3_224.k
definition, but the type of this definition is private
Are you referring to SHA3_224.mk
? Do you mean that one can call the public SHA3_224.mk
, and access components of the returned value that are meant to be private? Is there a way around that?
Carbon (Nov 14 2024 at 09:55):
Eric Wieser said:
Separately, the definition
/-- Sponge function, defined by its capacity, padding, and output bit length -/ private inductive HashFunction {α : Type u} {β : Type v} {γ : Type w} : α → β → γ → Type (max u v w) where | f (capacity : α) (paddingDelimiter : β) (outputBitsLen : γ) (property : (capacity = c) ∧ (paddingDelimiter = p) ∧ (outputBitsLen = o)): HashFunction c p o
seems quite strange to me; as far as I can tell, this is isomorphic to
Unit
!
What's the significance of this? The idea was to be able to create distinct types such that HashFunction 1 2 3
is different than HashFunction 1 1 1
for instance. This problem and solution was suggested in https://leanprover.zulipchat.com/#narrow/channel/270676-lean4/topic/Deriving.20dependent.20type.20from.20inductive.20type.20constructor/near/439302814
And this is how it is instantiated in the code:
private def mkHashFunction (c : Capacity) (p o : Nat) :=
HashFunction.f c p o (by
apply And.intro
case left => rfl
case right =>
apply And.intro
case left => rfl
case right => rfl
)
-- (...)
#check mkHashFunction 56 0x06 28 -- mkHashFunction 56 6 28 : HashFunction 56 6 28
Mario Carneiro (Nov 14 2024 at 10:28):
Carbon said:
Eric Wieser said:
My claim is that you provide a public
SHA3_224.k
definition, but the type of this definition is privateAre you referring to
SHA3_224.mk
? Do you mean that one can call the publicSHA3_224.mk
, and access components of the returned value that are meant to be private? Is there a way around that?
No, the problem is that you can't even call the function without producing a value whose type you are not allowed to refer to
Mario Carneiro (Nov 14 2024 at 10:30):
Is this function supposed to be called outside the module?
Mario Carneiro (Nov 14 2024 at 10:31):
if not, you should just make SHA3_224.mk
private
Eric Wieser (Nov 14 2024 at 11:35):
Carbon said:
What's the significance of this? The idea was to be able to create distinct types such that
HashFunction 1 2 3
is different thanHashFunction 1 1 1
for instance. This problem and solution was suggested in https://leanprover.zulipchat.com/#narrow/channel/270676-lean4/topic/Deriving.20dependent.20type.20from.20inductive.20type.20constructor/near/439302814
I think my claim is that this is an #xy problem, and actually you would be fine with
structure HashFunction where
capacity : Capacity
p : Nat
o : Nat
Carbon (Nov 14 2024 at 16:09):
Mario Carneiro said:
Is this function supposed to be called outside the module?
Yes. In file example.lean, I show examples how to use the lean API. I imported import «Cryptography».Hashes.SHA3.Basic
to that effect. it does call mk()
successfully, but you make me wonder if I can call mk()
from completely outside Cryptography
. I might have misunderstood scoping .
Carbon (Nov 14 2024 at 16:18):
Eric Wieser said:
I think my claim is that this is an #xy problem, and actually you would be fine with
structure HashFunction where capacity : Capacity p : Nat o : Nat
I see. I had an issue in my initial implementation where one could do something that I wanted to avoid - see example.lean. Basically the returned hash function state was generic enough that I could pass it between different hash functions, which is not ideal.
If I used a structure as above (which was more or less what I was doing), I assumed would have add to check the values at runtime? Whereas now, each state is completely dependent on c, p, o, which is different for each hash functions. Maybe there is a better way to do it.
Eric Wieser (Nov 14 2024 at 16:21):
Carbon said:
Basically the returned hash function state was generic enough that I could pass it between different hash functions, which is not ideal.
What type is the returned hash function state?
Eric Wieser (Nov 14 2024 at 16:23):
Your existing code had this very strange property where your functions accepted c
, p
, o
, and an f : HashFunction c p o
that contained no new information.
Carbon (Nov 14 2024 at 16:38):
Eric Wieser said:
What type is the returned hash function state?
So the state can be either AbsorbingKeccakC α n
, or SqueezingKeccakC α n
(this models the two phases of SHA-3 hash and XOF functions. For example function absorb()
takes an absorbing state, and return sonly an absorbing state:
def absorb {α : Type} {n : Capacity} (k : AbsorbingKeccakC α n) (inputBytes : ByteArray) : AbsorbingKeccakC α n
Above, α is the type for HashFunction
, and for instance HashFunction 56 6 28
. The macros then ensure that each hash function can only accept the correct parameter and for instance, for SHA3_224 it will only work with a state (AbsorbingKeccakC
or SqueezingKeccakC
) with HashFunction 56 6 28
. I hope this makes sense.
Eric Wieser (Nov 14 2024 at 16:43):
Then I recommend changing the signature to be
def absorb {hf : HashFunction} {n : Capacity} (k : AbsorbingKeccakC hf n) (inputBytes : ByteArray) : AbsorbingKeccakC hf n
Eric Wieser (Nov 14 2024 at 16:45):
And changing to
private structure KeccakC (hf : HashFunction) (n : Capacity) where
A : State
state : SpongeState := SpongeState.absorbing
rate : RateValue n := ⟨ 0, by omega ⟩
buffer : FixedBuffer
bufPos : RateIndex n := ⟨ 0, by simp [KeccakPPermutationSize]; omega ⟩
outputBytesLen := 0
Carbon (Nov 14 2024 at 17:00):
Eric Wieser said:
Then I recommend changing the signature to be
def absorb {hf : HashFunction} {n : Capacity} (k : AbsorbingKeccakC hf n) (inputBytes : ByteArray) : AbsorbingKeccakC hf n
In both changes, HashFunction
is a structure? Or did you mean something like HashFunction c p o
? If the latter, is this for clarity? If the former, users (well nobody really, this is just an API experiment for now :smile: ) could call the wrong function on a state?
Eric Wieser (Nov 14 2024 at 17:05):
Yes, HashFunction
is now the structure
Eric Wieser (Nov 14 2024 at 17:05):
If the former, users (well nobody really, this is just an API experiment for now :smile: ) could call the wrong function on a state?
no, they can't :)
Eric Wieser (Nov 14 2024 at 17:07):
Try playing around with https://github.com/gdncc/Cryptography/pull/1
Carbon (Nov 14 2024 at 17:36):
Eric Wieser said:
Try playing around with https://github.com/gdncc/Cryptography/pull/1
Wow, amazing. Thanks so much! This makes the code much simpler. I somehow lost sight of the obvious in my struggles. That is, my state can depend on a single struct value with 3 fields. Well I learnt a few things today, that, and the XY problem.
Eric Wieser (Nov 14 2024 at 17:39):
I think you can probably also eliminate your macros by using dot notation too, but that's a much more minor point, and it depends whether your private
experiment is intended to forbid users making their own hash functions
Carbon (Nov 14 2024 at 18:08):
Yeah the idea was to attempt implementing an API without footguns, so users should not be able to create their own hash function parameters (although I suspect the implementation could ensure parameters are safe somehow). But I am interested at a high-level how you would dispense of them? I wanted to experiment with Lean macros, and removing (admittedly light) code duplication was an obvious target.
Eric Wieser (Nov 14 2024 at 18:24):
I can make a follow-up PR after you merge / adapt https://github.com/gdncc/Cryptography/pull/1
Carbon (Nov 14 2024 at 18:25):
Eric Wieser said:
I can make a follow-up PR after you merge / adapt https://github.com/gdncc/Cryptography/pull/1
Done!
Eric Wieser (Nov 14 2024 at 23:01):
https://github.com/gdncc/Cryptography/pull/2 has an outline
Eric Wieser (Nov 14 2024 at 23:02):
The trick is that you don't need to define SHA1.mk
etc, you can just define SHA1 : HashFunction
and HashFunction.mk
, and lean resolves the method notation for you.
Carbon (Nov 14 2024 at 23:24):
Eric Wieser said:
The trick is that you don't need to define
SHA1.mk
etc, you can just defineSHA1 : HashFunction
andHashFunction.mk
, and lean resolves the method notation for you.
This is pretty cool. Shorter than the macro code, and it seems to provide the same functionality. I don't think I came across usage of a private constructor before e.g. structure HashFunction where private ofParams ::
I will merge this after a bit of work. Thank you!
Eric Wieser (Nov 14 2024 at 23:43):
Yes, the private constructor is really orthogonal to the rest of the patch, but allows you to say "this type is a public interface, but only I get to construct it"
Eric Wieser (Nov 14 2024 at 23:43):
(the ofParam ::
part is because I neeed to reclaim mk
to use elsewhere!)
Eric Wieser (Nov 14 2024 at 23:44):
Feel free to discard my PR and use it for inspiration, rather than merging it, if you'd prefer
Carbon (Jan 30 2025 at 19:18):
I am porting my SHA-3 implementation from Lean v4.14.0 to v4.15.0, and I had to re-write function storeUInt64()
. Below is my latest working attempt. I can't find a way to remove the repetition in the rw
tactic. I've tried simp
and repeat rw
without success. Any ideas on how I can resolve this?
namespace ByteArray
@[simp] theorem size_set (b : ByteArray) (i : Nat) (v : UInt8) (h : i < b.size) :
(b.set i v h).size = b.size := by
simp [ByteArray.size, ByteArray.set]
end ByteArray
-- Store little-endian `UInt64` in `ByteArray`. All Lean supported platforms are little-endian.
private def storeUInt64 (num : UInt64) : ByteArray := Id.run do
let mut bs := ByteArray.mk $ mkArray 8 0
bs := bs.set 0 (num &&& 0xff).toUInt8 (by decide)
bs := bs.set 1 (num >>> 0x08 &&& 0xff).toUInt8 (by rw [ByteArray.size_set] ;decide )
bs := bs.set 2 (num >>> 0x10 &&& 0xff).toUInt8 (by rw [ByteArray.size_set,ByteArray.size_set] ;decide )
bs := bs.set 3 (num >>> 0x18 &&& 0xff).toUInt8 (by rw [ByteArray.size_set,ByteArray.size_set,ByteArray.size_set] ; decide )
bs := bs.set 4 (num >>> 0x20 &&& 0xff).toUInt8 (by rw [ByteArray.size_set,ByteArray.size_set,ByteArray.size_set,ByteArray.size_set] ; decide )
bs := bs.set 5 (num >>> 0x28 &&& 0xff).toUInt8 (by rw [ByteArray.size_set,ByteArray.size_set,ByteArray.size_set,ByteArray.size_set,ByteArray.size_set] ; decide )
bs := bs.set 6 (num >>> 0x30 &&& 0xff).toUInt8 (by rw [ByteArray.size_set,ByteArray.size_set,ByteArray.size_set,ByteArray.size_set,ByteArray.size_set,ByteArray.size_set] ; decide )
bs := bs.set 7 (num >>> 0x38 &&& 0xff).toUInt8 (by rw [ByteArray.size_set,ByteArray.size_set,ByteArray.size_set,ByteArray.size_set,ByteArray.size_set,ByteArray.size_set,ByteArray.size_set] ; decide )
bs
Chris Bailey (Jan 30 2025 at 19:36):
There's probably a better overall solution, but you should just be able to do by (repeat rw [ByteArray.size_set]); decide
, which will stop doing the rewrite step the first time it fails.
Carbon (Jan 30 2025 at 19:53):
Thanks @Chris Bailey ! Surrounding repeat rw with parenthesis did the trick!
Vlad Tsyrklevich (Jan 30 2025 at 20:29):
I am not an expert here, but this may be more ergonomic to what you're trying to perform, so you don't need to provide intermediate proofs:
private def storeUInt64 (num : UInt64) : ByteArray :=
ByteArray.mk $ mkArray 8 0
|>.set 0 (num &&& 0xff).toUInt8
|>.set 1 (num >>> 0x08 &&& 0xff).toUInt8
|>.set 2 (num >>> 0x10 &&& 0xff).toUInt8
-- rest here
Kim Morrison (Jan 30 2025 at 20:59):
Can you use Vector UInt8
? The set
operation on Vector knows that the size is constant. There will be much better support for Vector in v4.16 and even more in v4.17.0-rc1, both expected on Monday.
François G. Dorais (Jan 30 2025 at 21:43):
Vlad Tsyrklevich said:
I am not an expert here, but this may be more ergonomic to what you're trying to perform, so you don't need to provide intermediate proofs:
private def storeUInt64 (num : UInt64) : ByteArray := ByteArray.mk $ mkArray 8 0 |>.set 0 (num &&& 0xff).toUInt8 |>.set 1 (num >>> 0x08 &&& 0xff).toUInt8 |>.set 2 (num >>> 0x10 &&& 0xff).toUInt8 -- rest here
Still more ergonomic:
import Batteries.Data.ByteArray
private def storeUInt64 (num : UInt64) : ByteArray :=
ByteArray.ofFn fun (i : Fin 8) => (num >>> (8 * i.val).toUInt64).toUInt8
The &&& 0xff
is redundant.
Carbon (Jan 30 2025 at 21:50):
Vlad Tsyrklevich said:
I am not an expert here, but this may be more ergonomic to what you're trying to perform, so you don't need to provide intermediate proofs:
private def storeUInt64 (num : UInt64) : ByteArray := ByteArray.mk $ mkArray 8 0 |>.set 0 (num &&& 0xff).toUInt8 |>.set 1 (num >>> 0x08 &&& 0xff).toUInt8 |>.set 2 (num >>> 0x10 &&& 0xff).toUInt8 -- rest here
Wow, this works well indeed!
Carbon (Jan 30 2025 at 21:51):
Kim Morrison said:
Can you use
Vector UInt8
? Theset
operation on Vector knows that the size is constant. There will be much better support for Vector in v4.16 and even more in v4.17.0-rc1, both expected on Monday.
Vectors are boxed right?
Carbon (Jan 30 2025 at 21:58):
François G. Dorais said:
Still more ergonomic:
import Batteries.Data.ByteArray private def storeUInt64 (num : UInt64) : ByteArray := ByteArray.ofFn fun (i : Fin 8) => (num >>> (8 * i.val).toUInt64).toUInt8
The
&&& 0xff
is redundant.
Very nice! I want something self-contained for now so I won't use Batteries. Nice catch on &&& 0xff
, I will remove it, thank you.
Kim Morrison (Jan 30 2025 at 23:44):
I'm happy to upstream ByteArray.ofFn
, is someone tells me what they need / that it is ready.
François G. Dorais (Jan 31 2025 at 19:15):
@Kim Morrison I just PR a more efficient implementation of ByteArray.ofFn
if you wish to upstream.
Kim Morrison (Jan 31 2025 at 22:47):
@François G. Dorais, I missed while reviewing that there is still a
@[csimp] private theorem ofFn_eq_ofFnAux : @ofFn = @ofFnAux := by
further down the file. Should we still have that?
François G. Dorais (Jan 31 2025 at 23:24):
@Kim Morrison Good catch! Thanks! That's probably a remnant from before we had optimized Fin.folds. The new def is better since it reuses a sturdy wheel rather than inventing one. I'll fix that.
Last updated: May 02 2025 at 03:31 UTC