Zulip Chat Archive

Stream: lean4

Topic: Simplifying Access to Array Literal


David Pearce (May 26 2024 at 22:24):

I feel confused why simp fails to make progress here. Did I miss something?

example : #[00][0] = 0 :=
by
  simp

Henrik Böving (May 26 2024 at 22:25):

This particular case you can solve with by decide. Simp probably doesn't have lemmas that allow this type of reduction.

David Pearce (May 26 2024 at 22:26):

I figured out you can do it with rfl. What is decide ?

Henrik Böving (May 26 2024 at 22:27):

rfl and decide are similar but decide is more powerful in some situations. rfl can prove goals that hold by definitional equality while decide can solve all goals for which a docs#Decidable instance can be synthesized. It will synthesize this instance, reduce it and use the proof it produces to close the goal.

David Pearce (May 26 2024 at 22:27):

Ok, ta

Henrik Böving (May 26 2024 at 22:28):

There are also cases where rfl is stronger than decide though. For example if you don't have a Decidable instance for something but it still holds definitionally. This might in particular be the case when there are variables floating around within your goal. In the end it depends on the situation when to use which

David Pearce (May 26 2024 at 22:29):

Hmmm, well I can just try it and see whether it helps in any given instance.

David Pearce (May 26 2024 at 22:37):

It makes using arrays really unpleasant, because I can't simplify a large expression involving e.g. a match #[0][0].

Henrik Böving (May 26 2024 at 22:42):

You can enable ground term reduction in simp (hover over ground to see what it does):

  simp (config := { ground := true })

Note that depending on your goal this can do a lot of computation so this is the reason it is not generally enabled. If this happens you can in theory still zoom in on the relevant terms with conv mode

David Pearce (May 26 2024 at 22:43):

It feels like doing even seemingly simple things in Lean lead to complexity. I'm constantly getting tripped by a myriad of details like this ... ?

David Pearce (May 26 2024 at 22:44):

What's conv mode?

Henrik Böving (May 26 2024 at 22:46):

I think we can teach simp about this with a simproc that is enabled by default. If you are working with a theory fragment that not many people have worked with before (I don't think many people work with indices on constant arrays) it is possible that proof automation in this region has not been brought to the level of other regions yet. If you find such things please do report them here, we are looking to fix as many of these papercuts as possible!

Henrik Böving (May 26 2024 at 22:46):

David Pearce said:

What's conv mode?

https://lean-lang.org/theorem_proving_in_lean4/conv.html. Note that you do not need to use conv mode if just the simp with the config option works fine. I just explained the reason why it is not on by default and how, if this bites you, you can get around it if you want.

David Pearce (May 26 2024 at 22:47):

All I'm actually doing is writing an example which serves as a test to ensure a piece of code behaves as expected. I'm using lean more in the style of a programming language than for e.g. proving mathmetics.

Henrik Böving (May 26 2024 at 22:49):

Right. Are you just evaluating a program without any free variables? In that case you can use eval + guard_msgs to write quick unit tests, check e.g. https://github.com/leanprover/leansat/blob/main/Test/AIG/Shared.lean#L10-L12. The idea is that you write the eval, then you put guard_msgs in above it and use the code action that guard_msgs provides to auto generate the doc string that determines the desired output.

David Pearce (May 26 2024 at 22:49):

Are you just evaluating a program without any free variables?

Yes, that is what I'm doing in this case

David Pearce (May 26 2024 at 22:50):

So, conversion mode allows you to apply rewrites to very specific subterms of an expression? Interesting.

Henrik Böving (May 26 2024 at 22:51):

That is correct yes.

David Pearce (May 26 2024 at 22:51):

But, no, #eval is not what I want. I want to write a proof that something reduces to something.

Henrik Böving (May 26 2024 at 22:51):

In that case why doesn't rfl suffice?

Henrik Böving (May 26 2024 at 22:52):

If a proof can be done by reduction rfl will find it

David Pearce (May 26 2024 at 22:53):

Well it does and it doesn't. I got stuck as rflcouldn't solve my example. There was actually a bug so the example should fail. But, the expression I was left with was very large and incomprehensible. I was expecting simp to reduce it down to something more manageable. But it did not.

Henrik Böving (May 26 2024 at 22:54):

Can you show me?

David Pearce (May 26 2024 at 22:54):

In fact, using List directly instead of Arrayand I get the desired behaviour. To be honest, I think using arrays is just a bad idea.

David Pearce (May 26 2024 at 22:54):

Can you show me?

Uhh, I mean ... I could ... but its part of a large example.

David Pearce (May 26 2024 at 22:55):

Let me try

Henrik Böving (May 26 2024 at 22:55):

It depends on the context. Writing proof automation about List is quite simple and a lot of it has been done already, Array is certainly lacking behind in a few cases like this but it will eventually be fixed. The main motivation to use Array should be performance.

David Pearce (May 26 2024 at 22:56):

Well, performance is not a concern for me write now. But, an array is the natural decription for what I'm doing.

Henrik Böving (May 26 2024 at 22:58):

What is it that you are doing then?

David Pearce (May 26 2024 at 22:59):

Leaning Lean :grinning:

Henrik Böving (May 26 2024 at 23:00):

Right^^ I was talking about the thing that has a natural description as arrays

David Pearce (May 26 2024 at 23:00):

Using a toy example which is based on an EVM: https://github.com/DavePearce/LeanEVM

Henrik Böving (May 26 2024 at 23:00):

Right, where is the piece of code that you are struggling with?

David Pearce (May 26 2024 at 23:01):

Hang on

David Pearce (May 26 2024 at 23:01):

I'm just simplifying it

David Pearce (May 26 2024 at 23:10):

Hmmm, I don't think this is going to help:

import Batteries.Data.List.Lemmas

-- A set of zero or more bytes upto a given size.
structure FinVec {n:Nat} (T) where
  data : List T
  isLt : data.length <= n

def Bytes32 := FinVec (n:=32) UInt8


abbrev UInt4 := Fin 16
@[simp]
abbrev U4_MAX : UInt4 := {val:=15, isLt:=(by simp_arith)}

inductive Bytecode where
-- 0s: Stop and Arithmetic Operations
| Stop
| Add
| Sub
| Mul
| Div
-- 10s: Comparison & Bitwise Logic Operations
-- 20s: SHA3
-- 30s: Environment Information
-- 40s: Block Information
-- 50s: Stack, Memory Storage and Flow Operations
| Pop
-- 60s & 70s: Push Operations
| Push(bs:Bytes32)
-- 80s: Duplication Operations
| Dup(n:UInt4)
-- 90s: Exchange Operations
| Swap(n:UInt4)
-- a0s: Logging Operations
-- f0s: System operations
| Invalid


/- =============================================================== -/
/- Code ROM -/
/- =============================================================== -/

def MAX_CODE_SIZE := 24576

def EvmCode := Array UInt8

@[simp]
def EvmCode.read(st:EvmCode)(pc:Nat) : UInt8 :=
  if r:pc >= st.size
  then
    0
  else
    st.get {val:=pc,isLt:=(by omega)}

-- Read `n` bytes from the code sequence starting a given `pc` position.
@[simp]
def EvmCode.slice(st:EvmCode)(pc:Nat)(n:Nat)(p:n32) : Bytes32 :=
  let head := (st.data.splitAt pc).snd
  let bytes := head.takeD n 0
  -- Prove bytes has at most 32 elements.
  have q : bytes.length = n := by sorry
  -- Construct FinVec
  {data:=bytes, isLt:=by exact le_of_eq_of_le q p}

-- Decode the instruction at a given `pc` position within the code sequence.
@[simp]
def EvmCode.decode (st:EvmCode)(pc:Nat) : Bytecode :=
  -- Read opcode
  let opcode := st.read pc;
  -- Decode opcode
  match opcode.val with
  -- 0s: Stop and Arithmetic Operations
  | 0x00 => Bytecode.Stop
  | 0x01 => Bytecode.Add
  | 0x02 => Bytecode.Mul
  | 0x03 => Bytecode.Sub
  | 0x04 => Bytecode.Div
  -- 10s: Comparison & Bitwise Logic Operations
  -- 20s: SHA3
  -- 30s: Environment Information
  -- 40s: Block Information
  -- 50s: Stack, Memory Storage and Flow Operations
  | 0x50 => Bytecode.Pop
  -- 60s & 70s: Push Operations
  | 0x60 => Bytecode.Push (st.slice pc 1 (by simp_arith))
  | 0x61 => Bytecode.Push (st.slice pc 2 (by simp_arith))
  | 0x62 => Bytecode.Push (st.slice pc 3 (by simp_arith))
  | 0x63 => Bytecode.Push (st.slice pc 4 (by simp_arith))
  | 0x64 => Bytecode.Push (st.slice pc 5 (by simp_arith))
  | 0x65 => Bytecode.Push (st.slice pc 6 (by simp_arith))
  | 0x66 => Bytecode.Push (st.slice pc 7 (by simp_arith))
  | 0x67 => Bytecode.Push (st.slice pc 8 (by simp_arith))
  | 0x68 => Bytecode.Push (st.slice pc 9 (by simp_arith))
  | 0x69 => Bytecode.Push (st.slice pc 10 (by simp_arith))
  | 0x6a => Bytecode.Push (st.slice pc 11 (by simp_arith))
  | 0x6b => Bytecode.Push (st.slice pc 12 (by simp_arith))
  | 0x6c => Bytecode.Push (st.slice pc 13 (by simp_arith))
  | 0x6d => Bytecode.Push (st.slice pc 14 (by simp_arith))
  | 0x6e => Bytecode.Push (st.slice pc 15 (by simp_arith))
  | 0x6f => Bytecode.Push (st.slice pc 16 (by simp_arith))
  -- 80s: Duplication Operations
  | 0x80 => Bytecode.Dup {val:=0, isLt:=(by simp_arith)}
  | 0x81 => Bytecode.Dup {val:=1, isLt:=(by simp_arith)}
  | 0x82 => Bytecode.Dup {val:=2, isLt:=(by simp_arith)}
  | 0x83 => Bytecode.Dup {val:=3, isLt:=(by simp_arith)}
  | 0x84 => Bytecode.Dup {val:=4, isLt:=(by simp_arith)}
  | 0x85 => Bytecode.Dup {val:=5, isLt:=(by simp_arith)}
  | 0x86 => Bytecode.Dup {val:=6, isLt:=(by simp_arith)}
  | 0x87 => Bytecode.Dup {val:=7, isLt:=(by simp_arith)}
  | 0x88 => Bytecode.Dup {val:=8, isLt:=(by simp_arith)}
  | 0x89 => Bytecode.Dup {val:=9, isLt:=(by simp_arith)}
  | 0x8a => Bytecode.Dup {val:=10, isLt:=(by simp_arith)}
  | 0x8b => Bytecode.Dup {val:=11, isLt:=(by simp_arith)}
  | 0x8c => Bytecode.Dup {val:=12, isLt:=(by simp_arith)}
  | 0x8d => Bytecode.Dup {val:=13, isLt:=(by simp_arith)}
  | 0x8e => Bytecode.Dup {val:=14, isLt:=(by simp_arith)}
  | 0x8f => Bytecode.Dup {val:=15, isLt:=(by simp_arith)}
  -- 90s: Exchange Operations
  -- a0s: Logging Operations
  -- f0s: System operations
  | _ => Bytecode.Invalid

example : (EvmCode.decode #[0x60,0xf5] 0) = Bytecode.Push {data:=[0x60],isLt:=by simp} :=
by
  simp

David Pearce (May 26 2024 at 23:11):

I wasn't able to simplify it down more. In the end, note that I know how to prove the example. That's not the problem.

David Pearce (May 26 2024 at 23:13):

A slightly shorter version:

import Batteries.Data.List.Lemmas

inductive Bytecode where
-- 0s: Stop and Arithmetic Operations
| Stop
| Add
| Sub
| Mul
| Div
-- 10s: Comparison & Bitwise Logic Operations
-- 20s: SHA3
-- 30s: Environment Information
-- 40s: Block Information
-- 50s: Stack, Memory Storage and Flow Operations
| Pop
-- 60s & 70s: Push Operations
| Push(bs:List UInt8)
-- 80s: Duplication Operations
| Dup(n:Fin 16)
-- 90s: Exchange Operations
| Swap(n:Fin 16)
-- a0s: Logging Operations
-- f0s: System operations
| Invalid

/- =============================================================== -/
/- Code ROM -/
/- =============================================================== -/

def MAX_CODE_SIZE := 24576

def EvmCode := Array UInt8

@[simp]
def EvmCode.read(st:EvmCode)(pc:Nat) : UInt8 :=
  if r:pc >= st.size
  then
    0
  else
    st.get {val:=pc,isLt:=(by omega)}

-- Read `n` bytes from the code sequence starting a given `pc` position.
@[simp]
def EvmCode.slice(st:EvmCode)(pc:Nat)(n:Nat)(p:n32) : List UInt8 :=
  let head := (st.data.splitAt pc).snd
  let bytes := head.takeD n 0
  -- Prove bytes has at most 32 elements.
  --have q : bytes.length = n := by sorry
  -- Construct FinVec
  bytes

-- Decode the instruction at a given `pc` position within the code sequence.
@[simp]
def EvmCode.decode (st:EvmCode)(pc:Nat) : Bytecode :=
  -- Read opcode
  let opcode := st.read pc;
  -- Decode opcode
  match opcode.val with
  -- 0s: Stop and Arithmetic Operations
  | 0x00 => Bytecode.Stop
  | 0x01 => Bytecode.Add
  | 0x02 => Bytecode.Mul
  | 0x03 => Bytecode.Sub
  | 0x04 => Bytecode.Div
  -- 10s: Comparison & Bitwise Logic Operations
  -- 20s: SHA3
  -- 30s: Environment Information
  -- 40s: Block Information
  -- 50s: Stack, Memory Storage and Flow Operations
  | 0x50 => Bytecode.Pop
  -- 60s & 70s: Push Operations
  | 0x60 => Bytecode.Push (st.slice pc 1 (by simp_arith))
  | 0x61 => Bytecode.Push (st.slice pc 2 (by simp_arith))
  | 0x62 => Bytecode.Push (st.slice pc 3 (by simp_arith))
  | 0x63 => Bytecode.Push (st.slice pc 4 (by simp_arith))
  | 0x64 => Bytecode.Push (st.slice pc 5 (by simp_arith))
  | 0x65 => Bytecode.Push (st.slice pc 6 (by simp_arith))
  | 0x66 => Bytecode.Push (st.slice pc 7 (by simp_arith))
  | 0x67 => Bytecode.Push (st.slice pc 8 (by simp_arith))
  | 0x68 => Bytecode.Push (st.slice pc 9 (by simp_arith))
  | 0x69 => Bytecode.Push (st.slice pc 10 (by simp_arith))
  | 0x6a => Bytecode.Push (st.slice pc 11 (by simp_arith))
  | 0x6b => Bytecode.Push (st.slice pc 12 (by simp_arith))
  | 0x6c => Bytecode.Push (st.slice pc 13 (by simp_arith))
  | 0x6d => Bytecode.Push (st.slice pc 14 (by simp_arith))
  | 0x6e => Bytecode.Push (st.slice pc 15 (by simp_arith))
  | 0x6f => Bytecode.Push (st.slice pc 16 (by simp_arith))
  -- 80s: Duplication Operations
  | 0x80 => Bytecode.Dup {val:=0, isLt:=(by simp_arith)}
  | 0x81 => Bytecode.Dup {val:=1, isLt:=(by simp_arith)}
  | 0x82 => Bytecode.Dup {val:=2, isLt:=(by simp_arith)}
  | 0x83 => Bytecode.Dup {val:=3, isLt:=(by simp_arith)}
  | 0x84 => Bytecode.Dup {val:=4, isLt:=(by simp_arith)}
  | 0x85 => Bytecode.Dup {val:=5, isLt:=(by simp_arith)}
  | 0x86 => Bytecode.Dup {val:=6, isLt:=(by simp_arith)}
  | 0x87 => Bytecode.Dup {val:=7, isLt:=(by simp_arith)}
  | 0x88 => Bytecode.Dup {val:=8, isLt:=(by simp_arith)}
  | 0x89 => Bytecode.Dup {val:=9, isLt:=(by simp_arith)}
  | 0x8a => Bytecode.Dup {val:=10, isLt:=(by simp_arith)}
  | 0x8b => Bytecode.Dup {val:=11, isLt:=(by simp_arith)}
  | 0x8c => Bytecode.Dup {val:=12, isLt:=(by simp_arith)}
  | 0x8d => Bytecode.Dup {val:=13, isLt:=(by simp_arith)}
  | 0x8e => Bytecode.Dup {val:=14, isLt:=(by simp_arith)}
  | 0x8f => Bytecode.Dup {val:=15, isLt:=(by simp_arith)}
  -- 90s: Exchange Operations
  -- a0s: Logging Operations
  -- f0s: System operations
  | _ => Bytecode.Invalid

example : (EvmCode.decode #[0x60,0xf5] 0) = Bytecode.Push [0x60] :=
by
  simp

David Pearce (May 26 2024 at 23:14):

Removed UInt4, FinVec and Bytes32

David Pearce (May 26 2024 at 23:15):

Error message is:

(match #[96, 245][0].val with
  | 0 => Bytecode.Stop
  | 1 => Bytecode.Add
  | 2 => Bytecode.Mul
  | 3 => Bytecode.Sub
  | 4 => Bytecode.Div
  | 80 => Bytecode.Pop
  | 96 => Bytecode.Push [(List.splitAt 0 [96, 245]).snd.head?.getD 0]
  | 97 => Bytecode.Push [(List.splitAt 0 [96, 245]).snd.head?.getD 0, (List.splitAt 0 [96, 245]).snd.tail.head?.getD 0]
  | 98 =>
...

Which makes sense, except that of course I'm expecting it to simplify away all the dead cases.

Henrik Böving (May 26 2024 at 23:19):

Right. This does make sense and I think we can get that kind of issue fixed.

As a general remark on your code, you are using the data function on Array to obtain a List. That is generally a bad idea. You want to express your algorithm in terms of Array operations only and use the Array.data function only in proofs if at all. This will make your code both more performant and should also make it easier to reason about (though it will not fix this particular issue) as the proof automation on Array is geared towards working with Array and not the conversion function to linked lists as much.

David Pearce (May 26 2024 at 23:21):

As a general remark on your code, you are using the data function

hmmmm, there is only one case in that example above: st.data.splitAt

Henrik Böving (May 26 2024 at 23:22):

Yes, this is already not a good idea. Calling Array.data on your array will take the array and convert it into a linked list in O(n), that's not something you want to do

David Pearce (May 26 2024 at 23:22):

Array does not have many functions defined on it, at least in Batteries: https://leanprover-community.github.io/mathlib4_docs/Batteries/Data/Array/Basic.html

David Pearce (May 26 2024 at 23:23):

Yes, this is already not a good idea. Calling Array.data on your array will take the array and convert it into a linked list in O(n), that's not something you want to do

Well, I understand that. But, I did not easily find a better way of doing it. All I really want is a slice of the array from an index n to a later index m. I was surprised I couldn't find that in Batteries.

Henrik Böving (May 26 2024 at 23:24):

There does exist a type called Subarray for this but it has quite little theory built around it as nobody so far has been interested in using it.

David Pearce (May 26 2024 at 23:25):

There's also ByteArray I just noticed ... which might be helpfuil in this case.

David Pearce (May 26 2024 at 23:25):

There does exist a type called Subarray for this but it has quite little theory built around it as nobody so far has been interested in using it.

Yeah, I just don't think my use case is well supported by Lean at this time.

Henrik Böving (May 26 2024 at 23:25):

ByteArray is an optimized version of Array UInt8 with the UInt8 being completely unboxed, note that this one has even less theory built around it. This is again mostly because nobody has used it yet.

Henrik Böving (May 26 2024 at 23:26):

writing simulators for low level architectures is very much something that's possible right now, for example https://github.com/leanprover/LNSym does this for Armv8

Henrik Böving (May 26 2024 at 23:32):

All that being said, if you do not care about performance because you do not actually want to execute thus program, running this whole thing with List instead of array is perfectly fine and it will give you a much smoother time.

David Pearce (May 26 2024 at 23:34):

running this whole thing with List instead of array is perfectly fine and it will give you a much smoother time.

Well, its a learning exercise :) And, yes, this is definitely the path of least resistance!

Mario Carneiro (May 27 2024 at 00:41):

David Pearce said:

Array does not have many functions defined on it, at least in Batteries: https://leanprover-community.github.io/mathlib4_docs/Batteries/Data/Array/Basic.html

Note that Batteries only contains supplemental definitions, on top of the pre-existing set of Array functions in core. These functions are distributed about somewhat, but most are in Init.Prelude and Init.Data.Array.Basic

David Pearce (May 27 2024 at 00:43):

So, Array.extract is what I was originally looking for then I think

David Pearce (May 27 2024 at 00:45):

Its hard to find things if you don't know exactly where to look. I used the search function in the mathlib4 docs, but didn't find it :frown:

Mario Carneiro (May 27 2024 at 00:45):

Have you tried loogle?

David Pearce (May 27 2024 at 00:45):

?

Mario Carneiro (May 27 2024 at 00:46):

@loogle Array

loogle (May 27 2024 at 00:46):

:search: Array, Array.empty, and 4005 more

David Pearce (May 27 2024 at 00:46):

No, I haven't come across that before. Will bookmark it :)

David Pearce (May 27 2024 at 00:47):

Array.extract is there ... buried under a bunch of other stuff.

Mario Carneiro (May 27 2024 at 00:49):

well, I made a very broad search there.

@loogle Array ?a -> Nat -> Nat -> Array ?a

loogle (May 27 2024 at 00:49):

:search: Array.extract, Array.appendCore.loop, and 11 more

David Pearce (May 27 2024 at 00:50):

Yeah, but what else can you do if you don't know what you're looking for. "Array slice" did not provide any useful answers for me.

David Pearce (May 27 2024 at 00:50):

Error

application type mismatch
  Array slice
argument
  slice
has type
  Lean.ParserDescr : Type
but is expected to have type
  Type ?u.144514 : Type (?u.144514 + 1)

Mario Carneiro (May 27 2024 at 00:50):

loogle lets you search by type signature, as well as by label segment

David Pearce (May 27 2024 at 00:51):

Yeah, i see

Mario Carneiro (May 27 2024 at 00:51):

note that it has structured input, see the loogle docs

David Pearce (May 27 2024 at 00:51):

Yeah, and if I do that then Array.extract is the top hit. Ok, so that works :)

Mario Carneiro (May 27 2024 at 00:51):

if you want unstructured (natural language) input, there is also moogle.ai

Mario Carneiro (May 27 2024 at 00:53):

and searching for "Array slice" on moogle actually yields Array.extract, near the bottom of the first page


Last updated: May 02 2025 at 03:31 UTC