Zulip Chat Archive
Stream: lean4
Topic: Simplifying Access to Array Literal
David Pearce (May 26 2024 at 22:24):
I feel confused why simp
fails to make progress here. Did I miss something?
example : #[00][0] = 0 :=
by
simp
Henrik Böving (May 26 2024 at 22:25):
This particular case you can solve with by decide
. Simp probably doesn't have lemmas that allow this type of reduction.
David Pearce (May 26 2024 at 22:26):
I figured out you can do it with rfl
. What is decide
?
Henrik Böving (May 26 2024 at 22:27):
rfl
and decide
are similar but decide
is more powerful in some situations. rfl
can prove goals that hold by definitional equality while decide
can solve all goals for which a docs#Decidable instance can be synthesized. It will synthesize this instance, reduce it and use the proof it produces to close the goal.
David Pearce (May 26 2024 at 22:27):
Ok, ta
Henrik Böving (May 26 2024 at 22:28):
There are also cases where rfl
is stronger than decide
though. For example if you don't have a Decidable instance for something but it still holds definitionally. This might in particular be the case when there are variables floating around within your goal. In the end it depends on the situation when to use which
David Pearce (May 26 2024 at 22:29):
Hmmm, well I can just try it and see whether it helps in any given instance.
David Pearce (May 26 2024 at 22:37):
It makes using arrays really unpleasant, because I can't simplify a large expression involving e.g. a match #[0][0]
.
Henrik Böving (May 26 2024 at 22:42):
You can enable ground term reduction in simp (hover over ground
to see what it does):
simp (config := { ground := true })
Note that depending on your goal this can do a lot of computation so this is the reason it is not generally enabled. If this happens you can in theory still zoom in on the relevant terms with conv
mode
David Pearce (May 26 2024 at 22:43):
It feels like doing even seemingly simple things in Lean lead to complexity. I'm constantly getting tripped by a myriad of details like this ... ?
David Pearce (May 26 2024 at 22:44):
What's conv
mode?
Henrik Böving (May 26 2024 at 22:46):
I think we can teach simp about this with a simproc that is enabled by default. If you are working with a theory fragment that not many people have worked with before (I don't think many people work with indices on constant arrays) it is possible that proof automation in this region has not been brought to the level of other regions yet. If you find such things please do report them here, we are looking to fix as many of these papercuts as possible!
Henrik Böving (May 26 2024 at 22:46):
David Pearce said:
What's
conv
mode?
https://lean-lang.org/theorem_proving_in_lean4/conv.html. Note that you do not need to use conv mode if just the simp
with the config option works fine. I just explained the reason why it is not on by default and how, if this bites you, you can get around it if you want.
David Pearce (May 26 2024 at 22:47):
All I'm actually doing is writing an example
which serves as a test to ensure a piece of code behaves as expected. I'm using lean more in the style of a programming language than for e.g. proving mathmetics.
Henrik Böving (May 26 2024 at 22:49):
Right. Are you just evaluating a program without any free variables? In that case you can use eval
+ guard_msgs
to write quick unit tests, check e.g. https://github.com/leanprover/leansat/blob/main/Test/AIG/Shared.lean#L10-L12. The idea is that you write the eval
, then you put guard_msgs in
above it and use the code action that guard_msgs provides to auto generate the doc string that determines the desired output.
David Pearce (May 26 2024 at 22:49):
Are you just evaluating a program without any free variables?
Yes, that is what I'm doing in this case
David Pearce (May 26 2024 at 22:50):
So, conversion mode allows you to apply rewrites to very specific subterms of an expression? Interesting.
Henrik Böving (May 26 2024 at 22:51):
That is correct yes.
David Pearce (May 26 2024 at 22:51):
But, no, #eval
is not what I want. I want to write a proof that something reduces to something.
Henrik Böving (May 26 2024 at 22:51):
In that case why doesn't rfl
suffice?
Henrik Böving (May 26 2024 at 22:52):
If a proof can be done by reduction rfl
will find it
David Pearce (May 26 2024 at 22:53):
Well it does and it doesn't. I got stuck as rfl
couldn't solve my example
. There was actually a bug so the example should fail. But, the expression I was left with was very large and incomprehensible. I was expecting simp
to reduce it down to something more manageable. But it did not.
Henrik Böving (May 26 2024 at 22:54):
Can you show me?
David Pearce (May 26 2024 at 22:54):
In fact, using List
directly instead of Array
and I get the desired behaviour. To be honest, I think using arrays is just a bad idea.
David Pearce (May 26 2024 at 22:54):
Can you show me?
Uhh, I mean ... I could ... but its part of a large example.
David Pearce (May 26 2024 at 22:55):
Let me try
Henrik Böving (May 26 2024 at 22:55):
It depends on the context. Writing proof automation about List
is quite simple and a lot of it has been done already, Array
is certainly lacking behind in a few cases like this but it will eventually be fixed. The main motivation to use Array
should be performance.
David Pearce (May 26 2024 at 22:56):
Well, performance is not a concern for me write now. But, an array is the natural decription for what I'm doing.
Henrik Böving (May 26 2024 at 22:58):
What is it that you are doing then?
David Pearce (May 26 2024 at 22:59):
Leaning Lean :grinning:
Henrik Böving (May 26 2024 at 23:00):
Right^^ I was talking about the thing that has a natural description as arrays
David Pearce (May 26 2024 at 23:00):
Using a toy example which is based on an EVM: https://github.com/DavePearce/LeanEVM
Henrik Böving (May 26 2024 at 23:00):
Right, where is the piece of code that you are struggling with?
David Pearce (May 26 2024 at 23:01):
Hang on
David Pearce (May 26 2024 at 23:01):
I'm just simplifying it
David Pearce (May 26 2024 at 23:10):
Hmmm, I don't think this is going to help:
import Batteries.Data.List.Lemmas
-- A set of zero or more bytes upto a given size.
structure FinVec {n:Nat} (T) where
data : List T
isLt : data.length <= n
def Bytes32 := FinVec (n:=32) UInt8
abbrev UInt4 := Fin 16
@[simp]
abbrev U4_MAX : UInt4 := {val:=15, isLt:=(by simp_arith)}
inductive Bytecode where
-- 0s: Stop and Arithmetic Operations
| Stop
| Add
| Sub
| Mul
| Div
-- 10s: Comparison & Bitwise Logic Operations
-- 20s: SHA3
-- 30s: Environment Information
-- 40s: Block Information
-- 50s: Stack, Memory Storage and Flow Operations
| Pop
-- 60s & 70s: Push Operations
| Push(bs:Bytes32)
-- 80s: Duplication Operations
| Dup(n:UInt4)
-- 90s: Exchange Operations
| Swap(n:UInt4)
-- a0s: Logging Operations
-- f0s: System operations
| Invalid
/- =============================================================== -/
/- Code ROM -/
/- =============================================================== -/
def MAX_CODE_SIZE := 24576
def EvmCode := Array UInt8
@[simp]
def EvmCode.read(st:EvmCode)(pc:Nat) : UInt8 :=
if r:pc >= st.size
then
0
else
st.get {val:=pc,isLt:=(by omega)}
-- Read `n` bytes from the code sequence starting a given `pc` position.
@[simp]
def EvmCode.slice(st:EvmCode)(pc:Nat)(n:Nat)(p:n≤32) : Bytes32 :=
let head := (st.data.splitAt pc).snd
let bytes := head.takeD n 0
-- Prove bytes has at most 32 elements.
have q : bytes.length = n := by sorry
-- Construct FinVec
{data:=bytes, isLt:=by exact le_of_eq_of_le q p}
-- Decode the instruction at a given `pc` position within the code sequence.
@[simp]
def EvmCode.decode (st:EvmCode)(pc:Nat) : Bytecode :=
-- Read opcode
let opcode := st.read pc;
-- Decode opcode
match opcode.val with
-- 0s: Stop and Arithmetic Operations
| 0x00 => Bytecode.Stop
| 0x01 => Bytecode.Add
| 0x02 => Bytecode.Mul
| 0x03 => Bytecode.Sub
| 0x04 => Bytecode.Div
-- 10s: Comparison & Bitwise Logic Operations
-- 20s: SHA3
-- 30s: Environment Information
-- 40s: Block Information
-- 50s: Stack, Memory Storage and Flow Operations
| 0x50 => Bytecode.Pop
-- 60s & 70s: Push Operations
| 0x60 => Bytecode.Push (st.slice pc 1 (by simp_arith))
| 0x61 => Bytecode.Push (st.slice pc 2 (by simp_arith))
| 0x62 => Bytecode.Push (st.slice pc 3 (by simp_arith))
| 0x63 => Bytecode.Push (st.slice pc 4 (by simp_arith))
| 0x64 => Bytecode.Push (st.slice pc 5 (by simp_arith))
| 0x65 => Bytecode.Push (st.slice pc 6 (by simp_arith))
| 0x66 => Bytecode.Push (st.slice pc 7 (by simp_arith))
| 0x67 => Bytecode.Push (st.slice pc 8 (by simp_arith))
| 0x68 => Bytecode.Push (st.slice pc 9 (by simp_arith))
| 0x69 => Bytecode.Push (st.slice pc 10 (by simp_arith))
| 0x6a => Bytecode.Push (st.slice pc 11 (by simp_arith))
| 0x6b => Bytecode.Push (st.slice pc 12 (by simp_arith))
| 0x6c => Bytecode.Push (st.slice pc 13 (by simp_arith))
| 0x6d => Bytecode.Push (st.slice pc 14 (by simp_arith))
| 0x6e => Bytecode.Push (st.slice pc 15 (by simp_arith))
| 0x6f => Bytecode.Push (st.slice pc 16 (by simp_arith))
-- 80s: Duplication Operations
| 0x80 => Bytecode.Dup {val:=0, isLt:=(by simp_arith)}
| 0x81 => Bytecode.Dup {val:=1, isLt:=(by simp_arith)}
| 0x82 => Bytecode.Dup {val:=2, isLt:=(by simp_arith)}
| 0x83 => Bytecode.Dup {val:=3, isLt:=(by simp_arith)}
| 0x84 => Bytecode.Dup {val:=4, isLt:=(by simp_arith)}
| 0x85 => Bytecode.Dup {val:=5, isLt:=(by simp_arith)}
| 0x86 => Bytecode.Dup {val:=6, isLt:=(by simp_arith)}
| 0x87 => Bytecode.Dup {val:=7, isLt:=(by simp_arith)}
| 0x88 => Bytecode.Dup {val:=8, isLt:=(by simp_arith)}
| 0x89 => Bytecode.Dup {val:=9, isLt:=(by simp_arith)}
| 0x8a => Bytecode.Dup {val:=10, isLt:=(by simp_arith)}
| 0x8b => Bytecode.Dup {val:=11, isLt:=(by simp_arith)}
| 0x8c => Bytecode.Dup {val:=12, isLt:=(by simp_arith)}
| 0x8d => Bytecode.Dup {val:=13, isLt:=(by simp_arith)}
| 0x8e => Bytecode.Dup {val:=14, isLt:=(by simp_arith)}
| 0x8f => Bytecode.Dup {val:=15, isLt:=(by simp_arith)}
-- 90s: Exchange Operations
-- a0s: Logging Operations
-- f0s: System operations
| _ => Bytecode.Invalid
example : (EvmCode.decode #[0x60,0xf5] 0) = Bytecode.Push {data:=[0x60],isLt:=by simp} :=
by
simp
David Pearce (May 26 2024 at 23:11):
I wasn't able to simplify it down more. In the end, note that I know how to prove the example
. That's not the problem.
David Pearce (May 26 2024 at 23:13):
A slightly shorter version:
import Batteries.Data.List.Lemmas
inductive Bytecode where
-- 0s: Stop and Arithmetic Operations
| Stop
| Add
| Sub
| Mul
| Div
-- 10s: Comparison & Bitwise Logic Operations
-- 20s: SHA3
-- 30s: Environment Information
-- 40s: Block Information
-- 50s: Stack, Memory Storage and Flow Operations
| Pop
-- 60s & 70s: Push Operations
| Push(bs:List UInt8)
-- 80s: Duplication Operations
| Dup(n:Fin 16)
-- 90s: Exchange Operations
| Swap(n:Fin 16)
-- a0s: Logging Operations
-- f0s: System operations
| Invalid
/- =============================================================== -/
/- Code ROM -/
/- =============================================================== -/
def MAX_CODE_SIZE := 24576
def EvmCode := Array UInt8
@[simp]
def EvmCode.read(st:EvmCode)(pc:Nat) : UInt8 :=
if r:pc >= st.size
then
0
else
st.get {val:=pc,isLt:=(by omega)}
-- Read `n` bytes from the code sequence starting a given `pc` position.
@[simp]
def EvmCode.slice(st:EvmCode)(pc:Nat)(n:Nat)(p:n≤32) : List UInt8 :=
let head := (st.data.splitAt pc).snd
let bytes := head.takeD n 0
-- Prove bytes has at most 32 elements.
--have q : bytes.length = n := by sorry
-- Construct FinVec
bytes
-- Decode the instruction at a given `pc` position within the code sequence.
@[simp]
def EvmCode.decode (st:EvmCode)(pc:Nat) : Bytecode :=
-- Read opcode
let opcode := st.read pc;
-- Decode opcode
match opcode.val with
-- 0s: Stop and Arithmetic Operations
| 0x00 => Bytecode.Stop
| 0x01 => Bytecode.Add
| 0x02 => Bytecode.Mul
| 0x03 => Bytecode.Sub
| 0x04 => Bytecode.Div
-- 10s: Comparison & Bitwise Logic Operations
-- 20s: SHA3
-- 30s: Environment Information
-- 40s: Block Information
-- 50s: Stack, Memory Storage and Flow Operations
| 0x50 => Bytecode.Pop
-- 60s & 70s: Push Operations
| 0x60 => Bytecode.Push (st.slice pc 1 (by simp_arith))
| 0x61 => Bytecode.Push (st.slice pc 2 (by simp_arith))
| 0x62 => Bytecode.Push (st.slice pc 3 (by simp_arith))
| 0x63 => Bytecode.Push (st.slice pc 4 (by simp_arith))
| 0x64 => Bytecode.Push (st.slice pc 5 (by simp_arith))
| 0x65 => Bytecode.Push (st.slice pc 6 (by simp_arith))
| 0x66 => Bytecode.Push (st.slice pc 7 (by simp_arith))
| 0x67 => Bytecode.Push (st.slice pc 8 (by simp_arith))
| 0x68 => Bytecode.Push (st.slice pc 9 (by simp_arith))
| 0x69 => Bytecode.Push (st.slice pc 10 (by simp_arith))
| 0x6a => Bytecode.Push (st.slice pc 11 (by simp_arith))
| 0x6b => Bytecode.Push (st.slice pc 12 (by simp_arith))
| 0x6c => Bytecode.Push (st.slice pc 13 (by simp_arith))
| 0x6d => Bytecode.Push (st.slice pc 14 (by simp_arith))
| 0x6e => Bytecode.Push (st.slice pc 15 (by simp_arith))
| 0x6f => Bytecode.Push (st.slice pc 16 (by simp_arith))
-- 80s: Duplication Operations
| 0x80 => Bytecode.Dup {val:=0, isLt:=(by simp_arith)}
| 0x81 => Bytecode.Dup {val:=1, isLt:=(by simp_arith)}
| 0x82 => Bytecode.Dup {val:=2, isLt:=(by simp_arith)}
| 0x83 => Bytecode.Dup {val:=3, isLt:=(by simp_arith)}
| 0x84 => Bytecode.Dup {val:=4, isLt:=(by simp_arith)}
| 0x85 => Bytecode.Dup {val:=5, isLt:=(by simp_arith)}
| 0x86 => Bytecode.Dup {val:=6, isLt:=(by simp_arith)}
| 0x87 => Bytecode.Dup {val:=7, isLt:=(by simp_arith)}
| 0x88 => Bytecode.Dup {val:=8, isLt:=(by simp_arith)}
| 0x89 => Bytecode.Dup {val:=9, isLt:=(by simp_arith)}
| 0x8a => Bytecode.Dup {val:=10, isLt:=(by simp_arith)}
| 0x8b => Bytecode.Dup {val:=11, isLt:=(by simp_arith)}
| 0x8c => Bytecode.Dup {val:=12, isLt:=(by simp_arith)}
| 0x8d => Bytecode.Dup {val:=13, isLt:=(by simp_arith)}
| 0x8e => Bytecode.Dup {val:=14, isLt:=(by simp_arith)}
| 0x8f => Bytecode.Dup {val:=15, isLt:=(by simp_arith)}
-- 90s: Exchange Operations
-- a0s: Logging Operations
-- f0s: System operations
| _ => Bytecode.Invalid
example : (EvmCode.decode #[0x60,0xf5] 0) = Bytecode.Push [0x60] :=
by
simp
David Pearce (May 26 2024 at 23:14):
Removed UInt4
, FinVec
and Bytes32
David Pearce (May 26 2024 at 23:15):
Error message is:
(match #[96, 245][0].val with
| 0 => Bytecode.Stop
| 1 => Bytecode.Add
| 2 => Bytecode.Mul
| 3 => Bytecode.Sub
| 4 => Bytecode.Div
| 80 => Bytecode.Pop
| 96 => Bytecode.Push [(List.splitAt 0 [96, 245]).snd.head?.getD 0]
| 97 => Bytecode.Push [(List.splitAt 0 [96, 245]).snd.head?.getD 0, (List.splitAt 0 [96, 245]).snd.tail.head?.getD 0]
| 98 =>
...
Which makes sense, except that of course I'm expecting it to simplify away all the dead cases.
Henrik Böving (May 26 2024 at 23:19):
Right. This does make sense and I think we can get that kind of issue fixed.
As a general remark on your code, you are using the data
function on Array
to obtain a List
. That is generally a bad idea. You want to express your algorithm in terms of Array
operations only and use the Array.data
function only in proofs if at all. This will make your code both more performant and should also make it easier to reason about (though it will not fix this particular issue) as the proof automation on Array
is geared towards working with Array
and not the conversion function to linked lists as much.
David Pearce (May 26 2024 at 23:21):
As a general remark on your code, you are using the data function
hmmmm, there is only one case in that example above: st.data.splitAt
Henrik Böving (May 26 2024 at 23:22):
Yes, this is already not a good idea. Calling Array.data
on your array will take the array and convert it into a linked list in O(n), that's not something you want to do
David Pearce (May 26 2024 at 23:22):
Array does not have many functions defined on it, at least in Batteries: https://leanprover-community.github.io/mathlib4_docs/Batteries/Data/Array/Basic.html
David Pearce (May 26 2024 at 23:23):
Yes, this is already not a good idea. Calling Array.data on your array will take the array and convert it into a linked list in O(n), that's not something you want to do
Well, I understand that. But, I did not easily find a better way of doing it. All I really want is a slice of the array from an index n
to a later index m
. I was surprised I couldn't find that in Batteries.
Henrik Böving (May 26 2024 at 23:24):
There does exist a type called Subarray
for this but it has quite little theory built around it as nobody so far has been interested in using it.
David Pearce (May 26 2024 at 23:25):
There's also ByteArray
I just noticed ... which might be helpfuil in this case.
David Pearce (May 26 2024 at 23:25):
There does exist a type called Subarray for this but it has quite little theory built around it as nobody so far has been interested in using it.
Yeah, I just don't think my use case is well supported by Lean at this time.
Henrik Böving (May 26 2024 at 23:25):
ByteArray
is an optimized version of Array UInt8
with the UInt8
being completely unboxed, note that this one has even less theory built around it. This is again mostly because nobody has used it yet.
Henrik Böving (May 26 2024 at 23:26):
writing simulators for low level architectures is very much something that's possible right now, for example https://github.com/leanprover/LNSym does this for Armv8
Henrik Böving (May 26 2024 at 23:32):
All that being said, if you do not care about performance because you do not actually want to execute thus program, running this whole thing with List instead of array is perfectly fine and it will give you a much smoother time.
David Pearce (May 26 2024 at 23:34):
running this whole thing with List instead of array is perfectly fine and it will give you a much smoother time.
Well, its a learning exercise :) And, yes, this is definitely the path of least resistance!
Mario Carneiro (May 27 2024 at 00:41):
David Pearce said:
Array does not have many functions defined on it, at least in Batteries: https://leanprover-community.github.io/mathlib4_docs/Batteries/Data/Array/Basic.html
Note that Batteries only contains supplemental definitions, on top of the pre-existing set of Array functions in core. These functions are distributed about somewhat, but most are in Init.Prelude and Init.Data.Array.Basic
David Pearce (May 27 2024 at 00:43):
So, Array.extract
is what I was originally looking for then I think
David Pearce (May 27 2024 at 00:45):
Its hard to find things if you don't know exactly where to look. I used the search function in the mathlib4 docs, but didn't find it :frown:
Mario Carneiro (May 27 2024 at 00:45):
Have you tried loogle?
David Pearce (May 27 2024 at 00:45):
?
Mario Carneiro (May 27 2024 at 00:46):
@loogle Array
loogle (May 27 2024 at 00:46):
:search: Array, Array.empty, and 4005 more
David Pearce (May 27 2024 at 00:46):
No, I haven't come across that before. Will bookmark it :)
David Pearce (May 27 2024 at 00:47):
Array.extract
is there ... buried under a bunch of other stuff.
Mario Carneiro (May 27 2024 at 00:49):
well, I made a very broad search there.
@loogle Array ?a -> Nat -> Nat -> Array ?a
loogle (May 27 2024 at 00:49):
:search: Array.extract, Array.appendCore.loop, and 11 more
David Pearce (May 27 2024 at 00:50):
Yeah, but what else can you do if you don't know what you're looking for. "Array slice" did not provide any useful answers for me.
David Pearce (May 27 2024 at 00:50):
Error
application type mismatch
Array slice
argument
slice
has type
Lean.ParserDescr : Type
but is expected to have type
Type ?u.144514 : Type (?u.144514 + 1)
Mario Carneiro (May 27 2024 at 00:50):
loogle lets you search by type signature, as well as by label segment
David Pearce (May 27 2024 at 00:51):
Yeah, i see
Mario Carneiro (May 27 2024 at 00:51):
note that it has structured input, see the loogle docs
David Pearce (May 27 2024 at 00:51):
Yeah, and if I do that then Array.extract
is the top hit. Ok, so that works :)
Mario Carneiro (May 27 2024 at 00:51):
if you want unstructured (natural language) input, there is also moogle.ai
Mario Carneiro (May 27 2024 at 00:53):
and searching for "Array slice" on moogle actually yields Array.extract, near the bottom of the first page
Last updated: May 02 2025 at 03:31 UTC