Zulip Chat Archive
Stream: computer science
Topic: Splay trees
Derrick R Burns (Aug 23 2025 at 04:51):
As a first experiment with Lean, I created proofs of correctness and amortized running time for various operations on splay trees. I used Claude to create the proofs. Is there any appetite for these proofs in Mathlib.
Specifically, I've implemented a complete formalization of splay trees for mathlib4:
- 6 modules with full proofs (no sorries)
- Amortized O(log n) complexity analysis
- Working set property, dynamic finger theorem
- Branch: https://github.com/derrickburns/mathlib4/tree/splay-mathlib4
If there is interest, what is the process?
Yuyang Zhao (Aug 23 2025 at 10:27):
I haven't looked at everything in detail, but I don't think it proves the dynamic finger theorem.
Derrick R Burns (Aug 23 2025 at 17:32):
Yuyang Zhao said:
I haven't looked at everything in detail, but I don't think it proves the dynamic finger theorem.
Perhaps you are thinking of the open dynamic optimality conjecture?
Yuyang Zhao (Aug 23 2025 at 18:46):
I meant that your lean code does not correctly state the dynamic finger theorem (and a lot of other things). Were they written by Claude? LLMs often fail to formalize a definition or statement correctly, and you need to check it yourself.
Derrick R Burns (Aug 23 2025 at 19:07):
Ah! Yes this is my first test of Claude for this. I will check the work! Thank you for this feedback!
Kim Morrison (Aug 23 2025 at 23:17):
@Derrick R Burns, you need to clearly label work done largely by LLMs as such, up front. Otherwise you are egregiously wasting other people's time. LLMs can produce both good and useful code. But they usually don't.
I'm a bit annoyed that I gave you feedback when you first posted about this work and you reposted it without first addressing my feedback. That's really wasting people's time, and just adds to the negative feeling towards AI assisted work.
Derrick R Burns (Aug 24 2025 at 01:21):
@Kim Morrison I’m sorry to annoy you. I have likely read more pull requests than most people you know. I have no issue with providing and receiving constructive feedback. I’m particularly kind to newcomers. I suggest you consider doing the same.
You said that my work was likely out of scope for Mathlib. You said I didn’t use a Mathlib tree. Perhaps you are annoyed that I did not make changes to use the Mathlib tree? Or perhaps you are annoyed that I used AI and it produced erroneous statements of theorems that I did not catch? I can see where this might be annoying. But consider this...
Your other advice was to post to the computer science project. I did.
I clearly stated in my very first message that the files were produced by Claude:
As a first experiment with Lean, I created proofs of correctness and amortized running time for various operations on splay trees. I used Claude to create the proofs. Is there any appetite for these proofs in Mathlib.
Perhaps you can suggest a more effective way to communicate that.
I’m trying to learn lean, automated theorem proving, the calculus of inductive constructions, and to understand the limits of AI, particularly in the area of logical deduction and reasoning.
I am 61, an actual former student of Robert Tarjan from 1985 when he first discovered splay trees, and recently retired. I’m trying to figure out how to engage a new community. I appreciate all suggestions. My vision is to provide proofs of complexity bounds for his proofs in appreciation of his advocacy for me. If my time is not valued here, I can find something else to do!
Alternatively, perhaps we can start over. :)
Derrick R Burns (Aug 24 2025 at 01:22):
(deleted)
Matt Diamond (Aug 24 2025 at 02:38):
I've loaded one of your files into the web editor and I'm seeing errors. Did you attempt to verify that the code type-checked correctly?
Derrick R Burns (Aug 24 2025 at 05:33):
I did. Let me try again to confirm. Also, as pointed out earlier, some of the theorems are not correct. Let me fix it. Thx!
Kim Morrison (Aug 24 2025 at 06:41):
My apologies --- I simply read your message too quickly and missed the reference to Claude. That was clearly sufficient flagging, and I was in error to criticize as quickly as I did.
In particular, I hope that:
Alternatively, perhaps we can start over. :)
is a good option!
Kim Morrison (Aug 24 2025 at 06:41):
(I'm actually quite enthusiastic about the prospects for LLMs to assist in both programming and theorem proving in Lean; indeed somewhat exasperated sometimes about resistance to it, and so when I see what I (in this case mistakenly!) think is someone "dragging down" LLMs by failing to explain that they are being used, I can get critical quickly.)
Kim Morrison (Aug 24 2025 at 06:41):
Now, that said, my criticism about not using the Mathlib tree is only a small one. But I think interesting to think about in the context of using LLMs to assist new developments. AIs are very prone to "doing things over again", and it's easy for them, and its easy for the human driving it to just accept the duplication. This is something that both needs to be addressed in prompting (e.g. when you notice an LLMs duplicating things, update Claude.md to steer it away from doing so!), but also I'm curious how it interacts with the review process. When, as has happened, a reviewer says "actually, your Tree definition is isomorphic to the existing definition", can you then turn that into detailed instructions to the LLM about how to refactor the entire development? (e.g. a basic question is: do we define abbrev SplayTree := Tree, and then put functions in that namespace, or do we just add new operations on the existing tree, e.g. Tree.splayInsert; can an LLM help with that question? can it coherently implement a decision on this question?).
Kim Morrison (Aug 24 2025 at 06:41):
Getting material into Mathlib or Cslib always requires substantial revision --- and usually the material gets much better along the way. Can LLMs cope with this process? I think an easy pitfall here is that the ridiculous speed of current LLMs means a development can quickly get so big that it becomes expensive to be responsive to review comments that require refactors of the entire corpus. But maybe that expense can be paid off by the speed of the LLM.
Sorrachai Yingchareonthawornchai (Sep 16 2025 at 20:26):
theorem sequential_access_optimal (xs : List α) (t : Tree α)
(h_sorted : xs.Sorted (· < ·)) (h_all : ∀ x ∈ xs, x ∈ t) :
∃ (total_cost : ℝ), total_cost ≤ 3 * xs.length :=
This statement does not seem to capture the cost of the splay tree. It states that there is a real number at most 3*xs.length.
Last updated: Dec 20 2025 at 21:32 UTC