Zulip Chat Archive

Stream: general

Topic: (Byte)Array lemmas

ZHAO Jiecheng (Mar 22 2024 at 04:04):

Scott Morrison said:

Could you say more about these lemmas? Generally I'd prefer lemmas about Array and ByteArray go as high as possible (i.e. Lean or Std).

Something like this (s1 + s2 ≤ a.size) → (e2 + s1 ≤ e1) → (e1 ≤ a.size) → (a.extract s1 e1).extract s2 e2 = a.extract (s1 + s2) (s1 + e2) to show that extract twice is equal to extract once with some shift. And the lemma to prove this in my path.

Kim Morrison (Mar 22 2024 at 04:05):

Could you please PR such lemmas direct to the lean4 repo? Please feel free to ping me when they need review.

ZHAO Jiecheng (Mar 22 2024 at 04:06):

Scott Morrison said:

Could you please PR such lemmas direct to the lean4 repo? Please feel free to ping me when they need review.

So I shall fork the lean4 and then PR that from my fork? Where should it place?

Kim Morrison (Mar 22 2024 at 04:06):

Yes, that's right.

Kim Morrison (Mar 22 2024 at 04:07):

Hopefully these lemmas could go in Init/Data/Array/Lemmas.lean, in a section /-! # extract -/.

ZHAO Jiecheng (Mar 22 2024 at 06:26):

Scott Morrison said:

Hopefully these lemmas could go in Init/Data/Array/Lemmas.lean, in a section /-! # extract -/.

I have set up the env. But I found I developed it with some tactic provided by Mathlib. How can I change my code in such a low level lib like init. Is there any advice?

Kim Morrison (Mar 22 2024 at 06:52):

Which tactics? Can you give me a link to your code?

Kim Morrison (Mar 22 2024 at 06:53):

We can put the code somewhere lower if the use of Mathlib tactics makes a big difference, but I'd prefer to make sure we can't easily remove them first.

Notification Bot (Mar 22 2024 at 06:55):

8 messages were moved here from #mathlib4 > github permission by Johan Commelin.

ZHAO Jiecheng (Mar 22 2024 at 07:37):

Scott Morrison said:

We can put the code somewhere lower if the use of Mathlib tactics makes a big difference, but I'd prefer to make sure we can't easily remove them first.

https://gist.github.com/JiechengZhao/d5dcafd2e34c38703c2e2d7bdcc4c4bd This is my code, it works before but full of errors now when I pasted it in. I am fixing and rearranging it. I found rw works different and nth_rw is missing.

ZHAO Jiecheng (Mar 22 2024 at 08:10):

Scott Morrison said:

Which tactics? Can you give me a link to your code?

Sorry, the rw should be the same and it is simp caused the problem. simp used some lemma in mathlib automatically, and in init it works less and then the following rw failed. I think there is a simp like tactic can print out all the lemmas it used, what is that?

Kim Morrison (Mar 22 2024 at 08:10):

simp?

ZHAO Jiecheng (Mar 22 2024 at 08:11):

Scott Morrison said:

simp?

Yes, simp, It will automatically use any @[simp] lemma in mathlib. And this blocks me.

Kim Morrison (Mar 22 2024 at 08:12):

simp? is the command that tells you what simp lemmas simp used. :-)

ZHAO Jiecheng (Mar 22 2024 at 08:12):

Scott Morrison said:

simp? is the command that tells you what simp lemmas simp used. :-)

Understood. Thank you.

Kim Morrison (Mar 22 2024 at 08:13):

Okay, this code is not ready to go in the lean4 repo. Perhaps you can put some small parts of it in a PR to mathlib for some review? I don't really want it going in mathlib, but it needs cleanup first and that's how you might get it.

ZHAO Jiecheng (Mar 22 2024 at 08:17):

Scott Morrison said:

Okay, this code is not ready to go in the lean4 repo. Perhaps you can put some small parts of it in a PR to mathlib for some review? I don't really want it going in mathlib, but it needs cleanup first and that's how you might get it.

I think so. It is not a piece of neat and clean code yet. Could you give me an invitation to the mathlib4. We can review it there first.

Kim Morrison (Mar 22 2024 at 08:57):

Okay, invitation sent!

Kim Morrison (Mar 22 2024 at 08:59):

@ZHAO Jiecheng, here's a suggestion.

Could you make an initial PR that just shows the final theorems you want (or you use proof_wanted: look in Std for examples)? (i.e. without the proofs or all the intermediate lemmas).

Kim Morrison (Mar 22 2024 at 09:00):

From this, we can work out what is actually needed, and then work backwards from there.

ZHAO Jiecheng (Apr 19 2024 at 00:30):

Hi @Kim Morrison, since the Array part has been added, let's talk about the ByteArray part. There are two points to it:

To speed up the calculation, the append operator in ByteArray is defined via copeslice. However, most proof utilities come from Array and List. So, I added an append_to_data_append theorem to connect them. Shall we put it in std or lean. https://gist.github.com/JiechengZhao/d5dcafd2e34c38703c2e2d7bdcc4c4bd#file-arraytools-lean-L253
When I write something, I feel it is common to pad zeros to a ByteArray. So, I also added some tools about it. But I am not sure if there are some better practices. https://gist.github.com/JiechengZhao/d5dcafd2e34c38703c2e2d7bdcc4c4bd#file-arraytools-lean-L81

Kim Morrison (Apr 19 2024 at 01:16):

Sorry, large gists take a long time to read and work out what you're wanting to do. Could you give a briefer summary here?

ZHAO Jiecheng (Apr 19 2024 at 01:20):

Add this to somewhere

theorem ByteArray.append_eq_data_append {a b: ByteArray}: a ++ b = {data := a.data ++ b.data} := by

Add some utils like this:

def ByteArray.zeros (n : Nat) : ByteArray :=
  (double_until n {data := #[0, 0, 0, 0]: ByteArray} (by decide)).extract 0 n

theorem ByteArray.zeros_get_zero (h : i < n)
  : (ByteArray.zeros n)[i]'(by rw [zeros_size] ; exact h) = 0 := by

@[simp]
theorem ByteArray.zeros_size: (zeros n).size = n := by

Kim Morrison (Apr 19 2024 at 01:21):

Should the first one not be (a ++ b).data = ...?

ZHAO Jiecheng (Apr 19 2024 at 01:24):

Kim Morrison said:

Should the first one not be (a ++ b).data = ...?

They should be the same. However, I think when you rewrite something in ByteArray, (a ++ b) would be more convenient to use because there is no need to rewrite the ByteArray to Array first.

Kim Morrison (Apr 19 2024 at 01:29):

I'm skeptical that it would even be a good idea to rewrite by append_eq_data_append as written.

Kim Morrison (Apr 19 2024 at 01:31):

For your second point, there is no double_until, so I presume you are suggesting adding a bunch of other stuff. Start with that?

ZHAO Jiecheng (Apr 19 2024 at 01:34):

Kim Morrison said:

I'm skeptical that it would even be a good idea to rewrite by append_eq_data_append as written.

Any suggestion about how to apply lemmas in Array to ByteArray? I saw append_eq_data_append in Array and List. I usually use that in my proofs, but if there is a more smart tactic, please tell me.

ZHAO Jiecheng (Apr 19 2024 at 01:36):

Kim Morrison said:

For your second point, there is no double_until, so I presume you are suggesting adding a bunch of other stuff. Start with that?

Alright. Should I add that to Mathlib first?

Kim Morrison (Apr 19 2024 at 01:43):

Sure, open a PR, or describe here.

Kim Morrison (Apr 19 2024 at 01:44):

Could you link the existing append_eq_data_append? I can't find it.

ZHAO Jiecheng (Apr 19 2024 at 01:45):

double util looks like this, it simply flatten the current bytearray and truncate it to the desired length. :

def ByteArray.double_until (n: Nat) (d: ByteArray) (h: d.size > 0) : ByteArray :=
    if hd: d.size ≥ n then d
    else
      have : n - ByteArray.size (d ++ d) < n - ByteArray.size d := by
        rw [size_append_eq]
        rw [Nat.sub_add_eq]
        apply Nat.sub_lt
        exact tsub_pos_iff_not_le.mpr hd
        exact h
      have : (d ++ d).size > 0 := by
        rw [size_append_eq]
        simp
        exact h
      double_until n (d ++ d) this
termination_by _ => n - d.size

Kim Morrison (Apr 19 2024 at 01:48):

Both of your haves can finish by omega a few steps earlier, I think. :-)

Kim Morrison (Apr 19 2024 at 01:49):

I'd like to be a bit more convinced of the end uses of these utilities first before adding them to the core Lean API. Either lets discuss that, or perhaps send these to Std for now.

ZHAO Jiecheng (Apr 19 2024 at 01:51):

Sorry, the append_eq_data_append is a lemma I wrapped for convenience. The original version is here. It looks like what you suggested: https://leanprover-community.github.io/mathlib4_docs/Init/Data/Array/Lemmas.html#Array.appendList_data

Kim Morrison (Apr 19 2024 at 01:51):

Okay, yeah, I am unconvinced by the usefulness of your new versions.

ZHAO Jiecheng (Apr 19 2024 at 02:08):

Here is why I introduce zeros in my implementation of a memory: when you retrieve something from memory but exceed the current size, the result is padded with zeros to ensure the get function returns the desired size of content. When you set something to memory but the current memory is too short, the ByteArray is extended to the desired length first. I guess this would be common for people wanting to mimic memory in Lean.

abbrev Memory := ByteArray

namespace Memory

def get_helper (data: Memory) (offset size: Nat) : ByteArray :=
  data.extract offset (offset + size) ++
    ByteArray.zeros (min (offset + size - data.size) size)

def get (data : Memory) (offset size: U256) : ByteArray :=
  get_helper data offset.val size.val


def set_helper (self: Memory) (offset : Nat) (value: ByteArray) (target_size: Nat)
  : Memory :=
  let value_ := if value.size < target_size
    then (value ++ ByteArray.zeros (target_size - value.size))
    else value
  let data_ := if self.size < offset + target_size
    then (self ++ ByteArray.zeros (offset + target_size - self.size))
    else self
  value_.copySlice 0 data_ offset target_size

def set (self: Memory) (offset : U256) (value: ByteArray) (target_size: U256)
  : Memory :=
  set_helper self offset.val value target_size.val

end Memory

ZHAO Jiecheng (Apr 19 2024 at 02:16):

I guess you are right, almost no reference of appendList_data in mathlib and std.

Timo Carlin-Burns (Apr 19 2024 at 02:33):

ZHAO Jiecheng said:

when you retrieve something from memory but exceed the current size, the result is padded with zeros to ensure the get function returns the desired size of content. When you set something to memory but the current memory is too short, the ByteArray is extended to the desired length first.

For this use case I would suggest adding an Array.resize function with the same specification as Rust's Vec::resize. Then you would use ByteArray.resize data (max data.size idx) 0 to extend with 0s if the index is out of bounds.

ZHAO Jiecheng (Apr 19 2024 at 02:46):

Timo Carlin-Burns said:

ZHAO Jiecheng said:

when you retrieve something from memory but exceed the current size, the result is padded with zeros to ensure the get function returns the desired size of content. When you set something to memory but the current memory is too short, the ByteArray is extended to the desired length first.

For this use case I would suggest adding an Array.resize function with the same specification as Rust's Vec::resize. Then you would use ByteArray.resize data (max data.size idx) 0 to extend with 0s if the index is out of bounds.

Yes, repeating only zeros in a byteArray may not be very common. Resizing an array is a better idea, especially if you can assign a different default value.

I will add my current code to a PR first. Though it is not we want now but we can start there.

Timo Carlin-Burns (Apr 19 2024 at 02:52):

For the use case of creating an array of all 0s, is there a reason to construct such an array by doubling? It seems like it would be better to have a function Array.replicate which is analogous to docs#List.replicate and constructs the array via repeated applications of Array.push

Timo Carlin-Burns (Apr 19 2024 at 02:54):

I expect doubling to have much worse performance as it has to create new allocations at each iteration. Array.push can re-use the same allocation, especially if you start with Array.mkEmpty n to set the appropriate initial capacity

ZHAO Jiecheng (Apr 19 2024 at 02:56):

Timo Carlin-Burns said:

For the use case of creating an array of all 0s, is there a reason to construct such an array by doubling? It seems like it would be better to have a function Array.replicate which is analogous to docs#List.replicate and constructs the array via repeated applications of Array.push

I want to take advantage of copySlice which is written in C, but I think you've convinced me a bit that this is not a good idea.

ZHAO Jiecheng (Apr 19 2024 at 04:27):

@Kim Morrison @Timo Carlin-Burns You convince me that this code might be useless, and I will not go ahead. Thank you for your time.

Kim Morrison (Apr 19 2024 at 04:28):

Sorry for the discouraging response. Designing the APIs for these basic datatypes takes a while. Looking forward to further contributions!

Last updated: May 02 2025 at 03:31 UTC