Zulip Chat Archive

Stream: new members

Topic: Understanding newbie questions

Will Crichton (Aug 07 2024 at 20:40):

Hi folks, I am giving a Topos Institute talk in a few weeks about the human factors of formalized mathematics. I'm gathering some anecdata about the most common challenges faced by Lean newbies. So if you frequently answer questions in #new members, a few questions for you:

What are the most common categories of questions asked by newbies? Does it differ between mathematicians and software engineers?
Which Lean features do newbies struggle with the most early on? Which Lean features does everyone seem to struggle with no matter how long they've used Lean?
What are common misconceptions about formalized mathematics that you observe in newbie questions? What resources do you use to correct those misconceptions?

Feel free to answer any or all of these, and thank you in advance!

Eric Wieser (Aug 07 2024 at 21:09):

Two reasonably common categories are:

I am having installation troubles, due to some combination of:
- A weird system or hostile antivirus
- Following bad instructions from lean 3, ChatGPT, an outdated course repo
- A bug
I have a cryptic error message that I don't understand the meaning of

Will Crichton (Aug 07 2024 at 21:11):

@Eric Wieser I guess Lean does often go viral, so the former is understandable. Re: errors, are there particular kinds of error messages you see brought up a lot? Is there a feedback loop where the Lean compiler devs get informed about confusing errors?

Edward van de Meent (Aug 07 2024 at 21:13):

looking at my answers in this channel, the kinds of questions that i tend to answer (beware selection bias) are typically ones resulting from lack of familiarity/experience with lean and/or mathlib. example patterns are:

"how do i formalize (mathematical) notion X"
"how do i prove property X of notion Y" (similar to "is there code for X", but without direction as to what lemma is wanted/needed/expected). this one seems to be pretty frequent as a pattern.
"how do i change the goal state in X way", where generally the answer is "use Y tactic"
"how do i prove (property X for) instance Y for type Z" (can be seen as sub-pattern of the second, but answers can take other shapes)

Eric Wieser (Aug 07 2024 at 21:16):

Is there a feedback loop where the Lean compiler devs get informed about confusing errors?

I think this loop is very slow, because there are not very many Lean compiler devs and improving the error messages of things that are not supposed to work is usually lower priority that fixing bugs in places where things are supposed to work. Issues like lean4#2950 are the mechanism behind such a loop

Edward van de Meent (Aug 07 2024 at 21:17):

as a general pattern, i'd say for mathematicians it can be non-trivial to know what definitions in mathlib corresponds to the "paper" mathematics they're interested in, as this tends to come with experience using both lean and mathlib

Eric Wieser (Aug 07 2024 at 21:18):

To that end, you might be interested in what happens in #Is there code for X? too

Edward van de Meent (Aug 07 2024 at 21:22):

although i do think in there there is a higher likelihood of people providing a #mwe for what lemma they want, which makes me think that that doesn't quite reflect people being new to lean, per se...

Will Crichton (Aug 07 2024 at 21:25):

This is super helpful, thank you both for the insights! @Edward van de Meent: is it fair to summarize the principal issue as "discoverability", i.e. discovering structures, tactics, etc.? or is it more perhaps "making connections", as in people know of a tactic but don't realize it's applicable in a given situation?

Edward van de Meent (Aug 07 2024 at 21:40):

i don't think i'd say that "things are hard to discover", but rather that "lack of experience makes things harder to discover" or "for newbies, things can be hard to discover".
For me, it took some time to get a good idea of what kind of shape i should expect lemmas and lemma names to have, and how to search for these things (using things like @ loogle and the searchbar and links in the mathlib docs).

Eric Wieser (Aug 07 2024 at 21:41):

Edward van de Meent said:

and how to search for these things

Discovering how to discover is one of the more important tricks!

Edward van de Meent (Aug 07 2024 at 21:41):

i can think of one case where someone didn't realize X was applicable in his question but did know about X, but i don't think that case is representative...

Will Crichton (Aug 07 2024 at 21:44):

"How to discover" is a great point. (Also TIL about Loogle! I'd just been using the search bar.....) Like one strategy I use is when I want to apply? in a complicated proof, it often times out. So I'll try and extract a simplified version of the subgoal into a standalone lemma, use apply? then translate the result back into the complicated proof.

Are there any resources that teach Lean users about stuff like Loogle, or effective use of the search bar?

Edward van de Meent (Aug 07 2024 at 21:45):

for tactics i'd definitely say that there are some that are hard to come across when reading docs. i'd say the main source of learning about tactics is from the books and games, and there are definitely some niche ones which you have to know about to realise you should use them.

Will Crichton (Aug 07 2024 at 21:49):

So this seems like a good use case for LLMs (which I don't say often or lightly!), although my experience with the internet-trained ones (e.g. ChatGPT, Claude) and the Lean-specific ones (LeanCopilot) has been that LLMs are at best marginally useful. Have y'all seen the same?

Eric Wieser (Aug 07 2024 at 21:49):

One of the reasons that tactics are a painpoint is that the documentation does not yet include them

Eric Wieser (Aug 07 2024 at 21:50):

Which is not so much because the raw documentation hasn't been written, but because the infrastructure for rendering it on the webpage is still in the works

Edward van de Meent (Aug 07 2024 at 21:50):

well... you can look at Mathlib.Tactics and go through one by one, but that is hardly the ideal way to come by them.

Edward van de Meent (Aug 07 2024 at 21:50):

and then read the docstrings

Edward van de Meent (Aug 07 2024 at 21:51):

but these do sometimes leave me wanting

Eric Wieser (Aug 07 2024 at 21:51):

Compared with https://leanprover-community.github.io/mathlib_docs/tactics.html (for lean 3), which gave you everything in one place.

Will Crichton (Aug 07 2024 at 21:52):

IME, the docstrings are marginally helpful. I've found it more useful to find examples of a tactic being used in Mathlib, and translate those examples into my code.

Edward van de Meent (Aug 07 2024 at 21:54):

yes, quite so. the docstring gives a good general idea of what a tactic does/ is intended for, but the examples there (for example, look at induction or cases) don't do a great job at getting across what the various syntaxes do (imo).

Will Crichton (Aug 07 2024 at 21:55):

Alright, so the take home message is: discover how to discover so you can induct how to induct.

Edward van de Meent (Aug 07 2024 at 21:56):

i don't know what you mean by "how to induct", but i completely agree with the first part.

Will Crichton (Aug 07 2024 at 21:56):

Oh hehe I just mean "how to use the induction tactic" :-)

Will Crichton (Aug 07 2024 at 21:57):

Anyways but this discussion has been helpful, thank you both again!!

Edward van de Meent (Aug 07 2024 at 21:57):

do note that this merely touches on newbies issues with mathlib, not so much lean.

Shreyas Srinivas (Aug 07 2024 at 21:57):

I answer questions here sometimes. Usually these questions about under documented features

Shreyas Srinivas (Aug 07 2024 at 21:58):

Example: termination proofs

Shreyas Srinivas (Aug 07 2024 at 21:58):

Or naming the condition in a conditional expression

Edward van de Meent (Aug 07 2024 at 21:58):

those are great examples, yes

Shreyas Srinivas (Aug 07 2024 at 21:59):

Once I recall that the answer to a question was autoImplicit introducing a variable

Shreyas Srinivas (Aug 07 2024 at 21:59):

But I also puzzlingly find autoImplicit to be something that even some experts have trouble with

Eric Wieser (Aug 07 2024 at 22:00):

Shreyas Srinivas said:

Once I recall that the answer to a question was autoImplicit introducing a variable

This one you can count the number of occurences of by searching for autoImplicit false in this stream

Shreyas Srinivas (Aug 07 2024 at 22:01):

Sometimes there are debugging questions

Shreyas Srinivas (Aug 07 2024 at 22:01):

Like how to use some debugging feature

Shreyas Srinivas (Aug 07 2024 at 22:02):

The set_option stuff is not easily found in documentation

Shreyas Srinivas (Aug 07 2024 at 22:03):

More broadly I think newbies struggle to come up with MWEs

Shreyas Srinivas (Aug 07 2024 at 22:04):

This is because when you are new to a language, especially a proof assistant, it is hard to understand how to debug things and isolate the cause of errors enough to produce a minimum working example

Shreyas Srinivas (Aug 07 2024 at 22:04):

There is all the learning curve that comes with learning a functional language, plus the proof stuff.

Shreyas Srinivas (Aug 07 2024 at 22:07):

Also, in the math side, it is not easy to pick the right finiteness type for a given discrete math problem. This was discussed in a different topic recently

Shreyas Srinivas (Aug 07 2024 at 22:10):

Another class of errors comes from typeclasses and implicit parameters. Sometimes you have two seemingly equal objects in a hypothesis or a goal but rw fails, or rfl won't close the goal and so on. Sometimes it is because there are two competing instances of the same typeclass or some implicit parameter was instantiated differently

Shreyas Srinivas (Aug 07 2024 at 22:17):

Another common class of questions is "why doesn't such and such tactic /proof from <insert book/tutorial>" work and the answers range from "this tactic is <insert resource> specific" to "the resource isn't up to date with current changes"

Edward van de Meent (Aug 07 2024 at 22:18):

or alternatively, "the tactic only works for X types/typeclass, and not Y"

Edward van de Meent (Aug 07 2024 at 22:18):

(see omega, ring, abel)

Eric Wieser (Aug 07 2024 at 22:18):

(ring gets me sometimes, because it does not work when Y = Ring!)

Edward van de Meent (Aug 07 2024 at 22:19):

wait what?

Eric Wieser (Aug 07 2024 at 22:19):

Yes, exactly :)

Edward van de Meent (Aug 07 2024 at 22:20):

i was thinking of the fact that you might forget that your fact depends on commutativity or something like that, i didn't know about that one!

Edward van de Meent (Aug 07 2024 at 22:20):

could you give an example?

Eric Wieser (Aug 07 2024 at 22:20):

import Mathlib

example {R} (x y : R) [Ring R] : y + x + y = 2*y + x := by ring -- fails

Edward van de Meent (Aug 07 2024 at 22:22):

why isn't the left side here in normal form?

Edward van de Meent (Aug 07 2024 at 22:23):

or rather, why is that considered normal form?

Edward van de Meent (Aug 07 2024 at 22:23):

i'd expect it not to be

Riccardo Brasca (Aug 07 2024 at 22:26):

There are a lot of questions about natural subtraction and related stuff.

Eric Wieser (Aug 08 2024 at 00:01):

Edward van de Meent said:

why isn't the left side here in normal form?

Because ring is powerless on non-commutative rings

Ruben Van de Velde (Aug 08 2024 at 06:02):

Oh, is it the powers that get in the way? (sorry)

Adomas Baliuka (Aug 08 2024 at 06:36):

Shreyas Srinivas said:

The set_option stuff is not easily found in documentation

Can you provide a link? I haven't been able to find it yet.

Ralf Stephan (Aug 08 2024 at 07:39):

I think the tactics need an ontology, i.e., a hierarchy or grouping that categorizes them. All documents about tactics are simply long lists. (PS: Actually that task is nontrivial, as GPT-4o fails to give a satisfying complete answer after uploading such a list).

Questions often asked: rewrite inside sums/other binders (Answer: simp_rw).

Ralf Stephan (Aug 08 2024 at 09:34):

Adomas Baliuka said:

Shreyas Srinivas said:

The set_option stuff is not easily found in documentation

Can you provide a link? I haven't been able to find it yet.

#help option

Shreyas Srinivas (Aug 08 2024 at 14:21):

On the one hand, I am grateful #help option exists. On the other hand, having searchable doc pages for these options in a neatly categorised way would be super helpful. If it weren't behind the FROwall, I'd probably make a PR.

Ralf Stephan (Aug 08 2024 at 14:37):

Problem is how to maintain a manual hierarchy, would you constantly add things? It's not that the inflow is decreasing.

Ralf Stephan (Aug 08 2024 at 14:41):

(I thought more about #help tactics than options when writing this)

Shreyas Srinivas (Aug 08 2024 at 14:50):

No, but I'd probably start by extracting the options and their docstrings systematically and display them as a, say, sphinx doc page

Shreyas Srinivas (Aug 08 2024 at 14:50):

Clearly extracting this information is possible given that mathlib docs and loogle can supply them for other declarations

Ralf Stephan (Aug 08 2024 at 14:52):

For example for tactics: https://seasawher.github.io/mathlib4-tactics/

Luigi Massacci (Aug 08 2024 at 15:54):

For the original question, as for the mathematician vs software engineer split, my impression is that when it comes to questions of calculus / single variable real analysis this is quite visible, the latter group often having questions that are really about math (even if disguised as lean questions) especially surrounding rigorous notions of limit and derivative

Luigi Massacci (Aug 08 2024 at 15:55):

And functions to / from the empty type (but more generally vacuously true statements) similarly seem to elicit confusion quite frequently

Michal Wallace (tangentstorm) (Aug 09 2024 at 01:33):

Coming from the software side and having just gone through the first part of the learning curve, here are some things I noticed:

You probably don't just want Lean, you want Lean + Mathlib.Tactics, even if you're not doing things that are particularly mathematical.
There is not a lot of "quick reference" material available, aside from the searchable Mathlib docs. I found myself going back and skimming numerous texts over and over looking for the name of a tactic or something.
As an example of what would be helpful, I found myself referring to this page from project xena numerous times: https://www.ma.imperial.ac.uk/~buzzard/xena/formalising-mathematics-2022/Part_C/cheatsheet.html (though even then, it occasionally pointed me in the wrong direction: in my particular case, if I wanted to do something with an ∃x hypothesis, what I almost always was looking for was obtain x.
I knew about simp almost from the start but I didn't understand how powerful / fundamental it was: that it can unfold a def or that all the other search tools work better if you run simp first.
As you say, the public LLMs are mostly useless. ChatGPT seems to confuse lean 3 and 4 no matter what I try. Claude was able to generate a script for me that would list all the definitions in a namespace, but it took several rounds of error correcting.
The nightmare problems for me were simple issues with natural number division. I had several occasions where I spent literally hours trying to prove a simple fact about arithmetic, only to post a question here and have it answered in minutes by someone with more experience. :)

Michal Wallace (tangentstorm) (Aug 09 2024 at 01:40):

Actually, it wasn't a question, but the number one problem I had was probably wasting time doing what the UI seemed to suggest that I do instead of thinking about what I really wanted to do. It's very easy to zone out and just react to what's on the screen... Even if I accidentally reversed the direction of an arrow, or if the first step prompted me to prove a statement that wasn't the proof I had planned.

Eduardo Ochs (Aug 10 2024 at 01:35):

@Michal Wallace (tangentstorm), I tried to write a program that would list all definitions in a namespace - without LLMs - and failed miserably... can you share your code?

Damiano Testa (Aug 10 2024 at 09:02):

You could do this. It does not filter by namespace, but just by beginning of declaration name (it would be easy to make it search a specific namespace, but this seemed more appropriate):

import Mathlib

open Lean Elab Command

elab "#autocomplete " id:ident bostx:(num)? : command => do
  let bo := match bostx with | some n => n.getNat | none => 5000
  let env ← getEnv
  let consts := env.constants
  let pref := id.getId.toString
  let prel := pref.length
  let newDeclNames ← consts.foldM (init := ({} : NameSet)) fun a b _c => do
    if (!(← b.isBlackListed)) && b.toString.take prel == pref then return a.insert b else return a
  let sorted := newDeclNames.toArray.qsort Name.lt
  let sa : Subarray Name :=
    { array:= sorted
      start := 0
      stop := min sorted.size bo
      start_le_stop := Nat.zero_le _
      stop_le_array_size := Nat.min_le_left .. }
  match newDeclNames.size with
    | 0 => logInfo m!"No declarations starting with '{id}'"
    | n => logInfoAt id <| m!"{n} declaration{if n == 1 then "" else "s"} starting with '{id}'" ++
                      (if bo < sorted.size then m!"\nShowing the first {bo}" else "") ++
                      (if bostx.isNone then m!"\n(you can modify the default bound of 5000 using `#autocomplete {id} bound`)" else "") ++
                      (indentD m!"\n{sa}")

/-
4008 declarations starting with 'Nat'
Showing the first 4

  [Nat, NatCast, NatOrdinal, NatPow]
-/
#autocomplete Nat 4

Michal Wallace (tangentstorm) (Aug 10 2024 at 16:35):

Nice. The one Claude generated no longer works for me, and IIRC, the output was rather ugly and overwhelming anyway.

Adomas Baliuka (Aug 12 2024 at 21:02):

Ralf Stephan said:

Adomas Baliuka said:

Shreyas Srinivas said:

The set_option stuff is not easily found in documentation

Can you provide a link? I haven't been able to find it yet.

#help option

Sorry for digging this out so late, but I don't understand how to use it. I just get

unexpected token '#'; expected command

Eric Wieser (Aug 12 2024 at 22:04):

import Mathlib.Tactic?

Last updated: May 02 2025 at 03:31 UTC