Zulip Chat Archive

Stream: Program verification

Topic: Problem with USize

Vassil Keremidchiev (Nov 14 2024 at 21:53):

I'm very new to Lean, but I'm trying to give it a chance with creating a small library for bitmaps. I try to use theorem proving in the place of unit testing on places where that should be possible.

I stuck on a small function, but I split the problem in pieces and found that this theorem does not verify for some reason:
theorem th0 : USize.toNat ((1 : USize) * (1 : USize)) = USize.toNat 1 := by rfl

How should I approach similar problems?

The same with Nats is verifying:
theorem th1 : 1 * 1 = 1 := rfl

Henrik Böving (Nov 14 2024 at 23:16):

The UIntX and IntX API isn't particularly fleshed out at this point for proving purposes unfortunately. You can still get there if you are willing to unfold UInt operations to their rathe well supported BitVec equivalent but you'll be building a lot of API yourself if you want to prove things.

Yann Herklotz (Nov 14 2024 at 23:24):

I think the main issue is here is that the size of a USize is architecture dependent and depends on an opaque constant: System.Platform.numBits. You cannot know its exact value, just that it is 32 or 64. It's therefore even a bit painful to prove simple things like:

-- BitVec System.Platform.numBits == USize
theorem th2 : (@BitVec.toNat System.Platform.numBits 1) = 1 := by
  simp
  -- reason about 64 and 32 bits
  have := System.Platform.numBits_eq
  omega

This is also where rfl is getting stuck, because it can't compute the constant System.Platform.numBits, which is needed to fully evaluate both sides of the equation. Declaring the bitsize directly makes it computable then:

theorem th1 : (@BitVec.toNat 32 1) = 1 := by rfl

In this case though, simp does have enough theorems to be able to prove your property, so maybe you can try that instead and see if it works for other tests as well. In this case it is probably making use of a fact that 1 * 1 is 1 for any bitsize (this is a theorem from Mathlib actually mul_one). simp doesn't do pure computation, but instead performs rewrites of theorems in its database.

import Mathlib

theorem th0 : USize.toNat ((1 : USize) * (1 : USize)) = USize.toNat 1 := by simp only [mul_one,
  USize.toNat_ofNat]
-- other solution is to use BitVec 32 or 64 directly
theorem th1 : BitVec.toNat ((1 : BitVec 32) * (1 : BitVec 32)) = @BitVec.toNat 32 1 := by rfl

Vassil Keremidchiev (Nov 15 2024 at 21:32):

Yann Herklotz said:

I think the main issue is here is that the size of a USize is architecture dependent and depends on an opaque constant: System.Platform.numBits. You cannot know its exact value, just that it is 32 or 64. It's therefore even a bit painful to prove simple things like:
-- BitVec System.Platform.numBits == USize
theorem th2 : (@BitVec.toNat System.Platform.numBits 1) = 1 := by
  simp
  -- reason about 64 and 32 bits
  have := System.Platform.numBits_eq
  omega
This is also where rfl is getting stuck, because it can't compute the constant System.Platform.numBits, which is needed to fully evaluate both sides of the equation. Declaring the bitsize directly makes it computable then:
theorem th1 : (@BitVec.toNat 32 1) = 1 := by rfl
In this case though, simp does have enough theorems to be able to prove your property, so maybe you can try that instead and see if it works for other tests as well. In this case it is probably making use of a fact that 1 * 1 is 1 for any bitsize (this is a theorem from Mathlib actually mul_one). simp doesn't do pure computation, but instead performs rewrites of theorems in its database.
import Mathlib

theorem th0 : USize.toNat ((1 : USize) * (1 : USize)) = USize.toNat 1 := by simp only [mul_one,
  USize.toNat_ofNat]
-- other solution is to use BitVec 32 or 64 directly
theorem th1 : BitVec.toNat ((1 : BitVec 32) * (1 : BitVec 32)) = @BitVec.toNat 32 1 := by rfl

First, thank you very much about the explanation and the examples!
It's interesting to me, why this constant is not a constant in the type space also? It cannot change during runtime. It's related to the compilation and can be fixed.

Henrik Böving (Nov 15 2024 at 22:48):

You are proving something about your Lean program, Lean programs can generally run on both 32 and 64 bit platforms so you need to prove something about all platforms that your program can run on. If a proof is invalid depending on the platform you're on that is probably not good.

Vassil Keremidchiev (Nov 20 2024 at 08:54):

Henrik Böving said:

You are proving something about your Lean program, Lean programs can generally run on both 32 and 64 bit platforms so you need to prove something about all platforms that your program can run on. If a proof is invalid depending on the platform you're on that is probably not good.

But it's not like running same script or same Python on different platforms. It's a recompilation of the program to have two separate executables for each platform. Each compilation could have different settings, so it is like you have two separate programs with a separate proof checks. Those separate compilations lead to very different instructions used.

Henrik Böving (Nov 20 2024 at 09:00):

Yes, still just compiling a Lean proof on a different computer should not make a difference as to whether it type checks or not.

Chris Bailey (Nov 20 2024 at 16:05):

Vassil Keremidchiev said:

Henrik Böving said:

You are proving something about your Lean program, Lean programs can generally run on both 32 and 64 bit platforms so you need to prove something about all platforms that your program can run on. If a proof is invalid depending on the platform you're on that is probably not good.

But it's not like running same script or same Python on different platforms. It's a recompilation of the program to have two separate executables for each platform. Each compilation could have different settings, so it is like you have two separate programs with a separate proof checks. Those separate compilations lead to very different instructions used.

"A program written in Lean" and "the artifact obtained by compiling a program written in Lean" are two different things. The compiler is not part of the trusted code base, and there's no model of the compiler that you can use within Lean to reason about compilation. When you write proofs about USize, you're doing so before it's known what the compilation target or pointer size are, so it cannot be fixed.

If you want to assume the program will be run in a 32 bit system, presumably you can write your proofs with a hypothesis System.Platform.numBits = 32.

Alternatively you can go the sel4 route and use the proof assistant to reason about a compiled program by making the assembly output by the compiler the domain instead of the source code.

Vassil Keremidchiev (Nov 26 2024 at 20:48):

Chris Bailey said:

Vassil Keremidchiev said:

Henrik Böving said:

You are proving something about your Lean program, Lean programs can generally run on both 32 and 64 bit platforms so you need to prove something about all platforms that your program can run on. If a proof is invalid depending on the platform you're on that is probably not good.

But it's not like running same script or same Python on different platforms. It's a recompilation of the program to have two separate executables for each platform. Each compilation could have different settings, so it is like you have two separate programs with a separate proof checks. Those separate compilations lead to very different instructions used.

"A program written in Lean" and "the artifact obtained by compiling a program written in Lean" are two different things. The compiler is not part of the trusted code base, and there's no model of the compiler that you can use within Lean to reason about compilation. When you write proofs about USize, you're doing so before it's known what the compilation target or pointer size are, so it cannot be fixed.

If you want to assume the program will be run in a 32 bit system, presumably you can write your proofs with a hypothesis System.Platform.numBits = 32.

Alternatively you can go the sel4 route and use the proof assistant to reason about a compiled program by making the assembly output by the compiler the domain instead of the source code.

Thanks for the detailed explanation! It totally makes sense!

Last updated: May 02 2025 at 03:31 UTC