Zulip Chat Archive

Stream: CSLib

Topic: SortedArray design (#247)

Shreyas Srinivas (Dec 25 2025 at 14:58):

Why reinvent Finvecs from mathlib?

Shreyas Srinivas (Dec 25 2025 at 15:01):

import Mathlib

structure SortedArray (α : Type u) (size : Nat) [LinearOrder α] where
  array : Fin size → α
  sorted : ∀ i j : Fin size, i < j → array i ≤ array j

Shreyas Srinivas (Dec 25 2025 at 15:02):

~~Also docstring comments only seem to work on top level declarations at the moment~~

Shreyas Srinivas (Dec 25 2025 at 15:04):

Lean is complaining about your use of where in the first place

Shreyas Srinivas (Dec 25 2025 at 15:07):

Another thing : you can save some effort with sorting if you count List.get calls as unit calls. Then you have sorted lists as well as all the List API including List.Sorted : https://leanprover-community.github.io/mathlib4_docs/Mathlib/Deprecated/Sort.html#List.Sorted

Shreyas Srinivas (Dec 25 2025 at 15:17):

Also core already has binary search on arrays : https://leanprover-community.github.io/mathlib4_docs/Init/Data/Array/BinSearch.html#Array.binSearch

Sorrachai Yingchareonthawornchai (Dec 25 2025 at 15:20):

Please read the content of the PR.

Shreyas Srinivas (Dec 25 2025 at 15:21):

I have already read and made review comments

Shreyas Srinivas (Dec 25 2025 at 15:22):

Also Nat indexing would make your array potentially infinite and then it would just be a stream which is also defined in mathlib.

Shreyas Srinivas (Dec 25 2025 at 15:25):

The indexing proof obligations are really not a big issue

Shreyas Srinivas (Dec 25 2025 at 15:26):

There is a whole tactic for that get_elem_tactic

Chris Henson (Dec 25 2025 at 15:36):

In the future let's try to not veer too far off the topic of the thread (a specific problem adding a docstring), but since the discussion has started, I think we do need to discuss the design of SortedArray in this PR.

My first question is how this fits in with cslib#165, where you defined operations on lists. We should standardize on these modules working over the same types. I agree with @Shreyas Srinivas that, especially given that there is already binary search in core, it doesn't seem to be too much of a burden to do this over arrays (or lists?).

Sorrachai Yingchareonthawornchai (Dec 25 2025 at 16:07):

I tried it a bit using an array in the core; I think the proof becomes quite complex quickly (since more invariants are needed to verify the array index). If experts know a better way to write proofs using array in the core, I am fine with that. The principle here is that I want the analysis to be as simple as possible. I cannot think a simpler way than this. My feeling is that objects in the core are not designed to aid analysis and proofs.

If one insists on writing proofs on the core, I am happy to do so in a separate PR. One way to implement this strategy is to reduce this function to the form of the array representation. But if someone comes up with a direct proof that is as simple as this function representation, I am fine with that.

Sorrachai Yingchareonthawornchai (Dec 25 2025 at 16:10):

TimeM in general is a way to keep track of time complexity. In the analysis of binary search, I track only the number of array accesses, which is equal to the number of three-way comparisons (<, =, >). It is not hard to verify that it tracks correctly.

Sorrachai Yingchareonthawornchai (Dec 25 2025 at 16:39):

For a case study, this paper illustrates formalization efforts for binary search. For me, using Lean to formalize binary search with functional array has been relatively easy and fun. Not sure if I need to write a technical report on this. But for sure, using an array in the core would not be as clean.

Shreyas Srinivas (Dec 25 2025 at 16:44):

It’s a paper explaining verifun. Binary search is a good case study. It is fairly normal in PL and verification papers to choose basic examples for two reasons : formal verification (even of seemingly simple things) can be incredibly hard. This was even more so in 2002. There is usually little room in the paper to explain complicated examples. Further that would defeat the purpose of a case study, which is to showcase the tool itself.

Shreyas Srinivas (Dec 25 2025 at 16:46):

I don’t immediately see if it would translate well to lean. Further my objection is not to using Lean Array. It is to not using Finvecs or streams for which the API already exists. It’s about reinventing definitions and (more so) notations that already exist.

Chris Henson (Dec 25 2025 at 17:01):

I am in full agreement with Shreyas. We have plenty of existing definitions like FinVec, Array, etc. that are compatible with the TimeM notion. We should pick one and standardize on it for this area of CSLib.

Chris Henson (Dec 25 2025 at 19:15):

I realized I have permission to move messages within this channel (thanks whoever set that up!) so have moved these design questions into a separate topic.

Shreyas Srinivas (Dec 25 2025 at 21:31):

One small suggestion on top of all else above: when making a PR it is better to use a non-main branch of your fork. Otherwise you risk adding the commits of this PR to all future PRs

Shreyas Srinivas (Dec 25 2025 at 21:32):

and I just noticed that size := n makes no sense here. When initialised there is no guarantee that size = n. All it does is set size to n if no one supplies a different value.

Chris Henson (Dec 25 2025 at 21:52):

Shreyas Srinivas said:

One small suggestion on top of all else above: when making a PR it is better to use a non-main branch of your fork. Otherwise you risk adding the commits of this PR to all future PRs

It is also nice for the reviewers because it is easier to push a commit to the PR directly when it's not made from the main branch.

Eric Wieser (Dec 25 2025 at 22:19):

Shreyas Srinivas said:

import Mathlib

structure SortedArray (α : Type u) (size : Nat) [LinearOrder α] where
  array : Fin size → α
  sorted : ∀ i j : Fin size, i < j → array i ≤ array j

This is just docs#OrderHom Nat α

Shreyas Srinivas (Dec 25 2025 at 22:26):

Fin size not Nat, but essentially yes

Chris Henson (Dec 25 2025 at 22:32):

Eric Wieser said:

Shreyas Srinivas said:

import Mathlib

structure SortedArray (α : Type u) (size : Nat) [LinearOrder α] where
  array : Fin size → α
  sorted : ∀ i j : Fin size, i < j → array i ≤ array j

This is just docs#OrderHom Nat α

I thought of suggesting this, but LinearOrder seemed like a complication? I'd also just rather work with arrays/lists if it's not significantly more complex for consistency with other modules.

Shreyas Srinivas (Dec 25 2025 at 22:33):

Honestly I think with mvcgen that would be the best way to go

Shreyas Srinivas (Dec 25 2025 at 22:33):

But as this PR stands, it still needs quite a bit of fixing (see above). For one thing, I believe indices can be Fin without too much effort.

Shreyas Srinivas (Dec 25 2025 at 22:34):

If anything, it is a red flag if we cannot prove the side conditions (note the type of lo in docs#Array.binSearchAux )

Shreyas Srinivas (Dec 25 2025 at 22:35):

Another broader point: I think we need to have a TimeMonad m typeclass for a given monad m

Shreyas Srinivas (Dec 25 2025 at 22:36):

Instead of a single TimeM monad

Shreyas Srinivas (Dec 25 2025 at 22:36):

Because there is no reason this binary search can only run in the Time Monad. I just don't want to tinker with this until my Free monad stuff is done which will be around the end of the holidays

Shreyas Srinivas (Dec 26 2025 at 03:48):

Can I make commits to this PR somehow? I ran into something weird called "internal exception #3"

Shreyas Srinivas (Dec 26 2025 at 03:50):

which incredibly happens when a proof seems to go through (tried done)

Shreyas Srinivas (Dec 26 2025 at 03:51):

internal exception #3 is the complete error message

Last updated: Feb 28 2026 at 14:05 UTC