Zulip Chat Archive

Stream: general

Topic: rejected by leanchecker but accepted by trepplein


David Renshaw (May 09 2021 at 03:57):

I've been running some fuzz testing on the various Lean 3 checkers. One finding is the following file, which is rejected by leanchecker but accepted by trepplein:

9 #NS 0 u
2 #NS 0 eq
3 #NS 0 alpha
1 #UP 1
0 #ES 1
4 #NS 0 a
1 #EV 0
5 #NS 0 alpt
2 #EV 1
3 #ES 0
4 #EP #BD 5 2 3
5 #EP #BD 4 1 4
6 #EP #BI 3 0 5
6 #NS 2 refl
7 #EC 2 1
8 #EA 7 2
9 #EA 8 1
10 #EA 9 1
11 #EP #BD 4 1 10
12 #EP #BI 3 0 11
#IND 2 2 6 1 6 12 1

I suspect the reason for the discrepancy has something to do with laziness, and that it does not imply the existence of any serious unsoundness.
However, it's somewhat troubling to me that such a discrepancy exists at all.

Is this troubling to anyone else? Should there be an effort to standardize checker behavior to avoid this kind of discrepancy?

Mario Carneiro (May 09 2021 at 04:08):

There is at least one known theoretical discrepancy between trepplein and leanchecker, to do with universe handling

Mario Carneiro (May 09 2021 at 04:14):

Lean uses a normalization approach to defeq testing for level expressions, but the rules it uses for normalization are not complete for the language of nat with zero/succ/max/imax. As a result, there are some level expressions that lean will not think are equal even though they are equal for all concrete values of the universe variables:

universes u v w
example (α : Sort (imax u (imax v w))) (β : Sort (imax (max u v) w)) := α = β -- fails in lean, works in trepplein

Trepplein uses a case splitting approach instead to normalize imax in the goal; as a result it has a complete decision procedure for this language. The rules given in my thesis are also following trepplein here instead of lean.

Mario Carneiro (May 09 2021 at 04:36):

I worked out that data by hand using export_format.md as a guide.

9 #NS 0 u = `u
2 #NS 0 eq = `eq
3 #NS 0 alpha = `alpha
1 #UP 1 = u1 (???)
0 #ES 1 = Sort u1
4 #NS 0 a = `a
1 #EV 0 = #0
5 #NS 0 alpt = `alpt
2 #EV 1 = #1
3 #ES 0 = Sort 0
4 #EP #BD 5 2 3 = Π (alpt : #1), Sort 0
5 #EP #BD 4 1 4 = Π (a : #0) (alpt : #0), Sort 0
6 #EP #BI 3 0 5 = Π {alpha : Sort u1} (a : alpha) (alpt : alpha), Sort 0
6 #NS 2 refl = `refl
7 #EC 2 1 = eq.{u1}
8 #EA 7 2 = eq.{u1} #1
9 #EA 8 1 = eq.{u1} #1 #0
10 #EA 9 1 = eq.{u1} #1 #0 #0
11 #EP #BD 4 1 10 = Π (a : #0), eq.{u1} #0 a a
12 #EP #BI 3 0 11 = Π {alpha : Sort u1} (a : alpha), eq.{u1} alpha a a
#IND 2 2 6 1 6 12 1 =
  inductive {u1} eq {alpha : Sort u1} (a : alpha) : Π (alpt : alpha), Sort 0
  | refl : Π {alpha : Sort u1} (a : alpha), eq.{u1} alpha a a

It appears to be a perfectly correct definition of eq as an inductive type, except that line 4 constructs a universe parameter that I have called u1 which references name index 1, but there are no previous definitions of a name with that index (grep for 1 #N), so it seems reasonable to throw an error.

Mario Carneiro (May 09 2021 at 04:39):

I guess the fuzz tester decided to change the first character in the file, since it looks like the first line was supposed to be the name

Mario Carneiro (May 09 2021 at 04:41):

However, it's somewhat troubling to me that such a discrepancy exists at all.

Is this troubling to anyone else? Should there be an effort to standardize checker behavior to avoid this kind of discrepancy?

I don't think the checkers have been fuzz tested before, so I would put high probability on you finding at least one bug. The checkers are not actually that well tested against things other than mathlib.txt

Gabriel Ebner (May 09 2021 at 10:50):

I also suspect that there at least some logical bugs in the checkers. Like missing a side-condition for inductives, etc. It is unfortunately hard to construct text files that should be rejected---everything that lean --export produces should be correct after all. So you have to resort to constructing them by hand.

David Renshaw (May 09 2021 at 12:58):

Here's another example:

1 #NS 0 u
2 #NS 0 eq
3 #NS 0 alpha
1 #UP 1
0 #ES 1
4 #NS 0 a
1 #EV 0
5 #NS 0 alpt
2 #EV 1
3 #ES 0
4 #EP #BD 5 2 3
5 #EP #BD 4 1 4
6 #EP #BI 3 0 5
6 #NS 2 refl
7 #EC 2 1
8 #EA 7 2
9 #EA 8 1
10 #EA 9 1
11 #EP #BD 0 1 10
12 #EP #BI 3 0 11
#IND 2 2 6 1 6 12 1

This is rejected by leanchecker with output:

line 21: eq: invalid anonymous binder name

but accepted by trepplein with output:

-- successfully checked 3 declarations

David Renshaw (May 09 2021 at 13:38):

I also suspect that there at least some logical bugs in the checkers. Like missing a side-condition for inductives, etc.

I opened https://github.com/gebner/trepplein/issues/3.

Mario Carneiro (May 09 2021 at 18:20):

I think leanchecker is being too strict here. The problem that was introduced here is the 11 #EP line that uses 0 = [anon] for the name instead of 4 = `a. That is, it is a construction of:

  inductive {u} eq {alpha : Sort u} (a : alpha) : Π (alpt : alpha), Sort 0
  | refl : Π {alpha : Sort u} ([anon] : alpha), eq.{u} alpha [anon] [anon]

A user can't write this, but [anon] here is only needed for pretty printing and I think that it should be allowed to just stub out all binder names in this way.

Chris B (May 09 2021 at 18:35):

Mario Carneiro said:

It appears to be a perfectly correct definition of eq as an inductive type, except that line 4 constructs a universe parameter that I have called u1 which references name index 1, but there are no previous definitions of a name with that index (grep for 1 #N), so it seems reasonable to throw an error.

I think this should just produce a parse error.

Chris B (May 09 2021 at 18:37):

The rust one rejects the first one and accepts the second.

Chris B (May 09 2021 at 18:58):

David Renshaw said:

However, it's somewhat troubling to me that such a discrepancy exists at all.

Is this troubling to anyone else? Should there be an effort to standardize checker behavior to avoid this kind of discrepancy?

I'd like to think this is is redundancy working more or less as planned. What kind of standards would you suggest? There are still a bunch of open questions about how external checking will work for lean 4, but I think this points to a good idea of having a collection of known bad examples that shouldn't get past a type checker.

David Renshaw (May 09 2021 at 19:38):

Chris B said:

What kind of standards would you suggest?

Ideally, we would agree on the correct behavior for checkers on all edge cases, like the anonymous name case identified above.

Mario Carneiro (May 09 2021 at 19:40):

it's not like there was ever a standards committee meeting for this stuff

David Renshaw (May 09 2021 at 19:44):

Another source of discrepancy: leanchecker apparently can omit validation of lines that are not referenced by any declaration. So, e.g.

1 #NS p l1

is accepted by leanchecker but rejected by trepplein.

David Renshaw (May 09 2021 at 19:46):

All of these little discrepancies make it more difficult to validate the checkers against each other.

Mario Carneiro (May 09 2021 at 19:51):

That should definitely be a syntax error

Mario Carneiro (May 09 2021 at 19:51):

I can understand delaying some validation but that's a straight-up parse error, there is a letter instead of a number

David Renshaw (May 09 2021 at 19:52):

Yeah, I agree.

Chris B (May 09 2021 at 20:09):

I agree on the anonymous identifier thing. IMO those kinds of implementation details aren't unimportant, but it's hard to find them all before they come up in a context like this where someone's exploring edge cases. If you have a list, I can at least make sure the rust version conforms with the ones that seem reasonable.

David Renshaw (May 09 2021 at 20:19):

Yep, totally. I don't have a list (other than what I've reported above), and I don't expect anyone to just come up with one. My motivation in starting this discussion is mainly to gauge the interest in working towards more uniformity.

Chris B (May 09 2021 at 20:38):

I know this is a bad answer since "yeah well when Lean 4 is ready..." is sort of a weasle-ish way of justifying inaction, but when lean 4 is at a point where we can start hammering this out there will probably be much more interest from the broader community in tightening and battle-testing different verifiers. I think fuzzing and collecting negative examples is definitely a good idea for where to start beyond just "does it check mathlib".

Mario Carneiro (May 09 2021 at 22:14):

David Renshaw said:

Another source of discrepancy: leanchecker apparently can omit validation of lines that are not referenced by any declaration. So, e.g.

1 #NS p l1

is accepted by leanchecker but rejected by trepplein.

I took a look at the code that does the parsing, and it seems that the istream is not error checked, meaning that it will read the line as if it said 1 #NS 0 "". Not sure what it thinks about empty name components but that is a valid name otherwise

David Renshaw (May 09 2021 at 22:47):

Ah, interesting. I gather that this is the code you're referring to: https://github.com/leanprover-community/lean/blob/05dd36d1717649932fccaafa0868321fb87f916d/src/checker/text_import.cpp#L182

Mario Carneiro (May 09 2021 at 22:49):

Yes. I'm preparing a fix PR now

Mario Carneiro (May 09 2021 at 22:51):

Regarding the anonymous name issue, I am thinking of just removing this line, although that will have an effect on the kernel, not just the text importer. A more conservative change would be to have the text importer make up a fresh name if the name that was read is anonymous

Mario Carneiro (May 09 2021 at 22:58):

lean#574

Mario Carneiro (May 09 2021 at 23:03):

Not sure what it thinks about empty name components but that is a valid name otherwise

I guess lean is totally fine with empty names:

def «» := 1
#print «»

Mario Carneiro (May 10 2021 at 00:03):

Could someone on macOS test lean#574? I think it's a line endings issue, but I'm not sure if the import or export is to blame

David Renshaw (May 10 2021 at 00:13):

Just make test? I'm doing that on MacOS right now...

David Renshaw (May 10 2021 at 00:14):

 2/1411 Test    #2: leanchecker .................................................***Failed    5.20 sec

Mario Carneiro (May 10 2021 at 01:33):

You can do the test steps:

in library/:
../build/shell/lean --recursive --export=../build/shell/library_export.out .
checker/leanchecker shell/library_export.out

Mario Carneiro (May 10 2021 at 01:34):

apparently this fails on line 1, so I would like to know what library_export.out looks like

David Renshaw (May 10 2021 at 01:36):

1 #NS 0 u
2 #NS 0 eq
3 #NS 0 α
1 #UP 1
0 #ES 1
4 #NS 0 a
1 #EV 0
5 #NS 0 
2 #EV 1
3 #ES 0
4 #EP #BD 5 2 3
5 #EP #BD 4 1 4
6 #EP #BI 3 0 5
6 #NS 2 refl
7 #EC 2 1
8 #EA 7 2
9 #EA 8 1
10 #EA 9 1
11 #EP #BD 4 1 10
12 #EP #BI 3 0 11
#IND 2 2 6 1 6 12 1
...

David Renshaw (May 10 2021 at 01:40):

I'll see if I can narrow down the cause...

David Renshaw (May 10 2021 at 01:43):

https://github.com/leanprover-community/lean/pull/574/files#r629004579

David Renshaw (May 10 2021 at 23:45):

@Chris B Here's an input that leanchecker and trepplein reject, but nanoda_lib accepts:

1 #NS 8 u

David Renshaw (May 10 2021 at 23:50):

and tc also accepts it

Chris B (May 11 2021 at 02:27):

Thanks. In my case it goes uncontested because there are no declarations, I suspect it's the same for tc. AFAIK Lean always puts the export items in order, but there's nothing in the (limited) spec which requires primitives to appear in order, they just have to appear before being used in a declaration. It's an easy fix though, if it'll make what you're doing easier I'll change it.

Chris B (May 11 2021 at 02:32):

I think it's probably preferable to require them to be in order in a future lean 4 export format since that should be easy to implement across the board. I think eagerly evaluated languages that don't use indirection for the graph elements would be annoyed with the unordered approach.

David Renshaw (May 11 2021 at 02:34):

In my case it goes uncontested because there are no declarations

nanoda_lib also accepts it in this case (and leanchecker still rejects):

1 #NS 8 u
2 #NS 0 eq
3 #NS 0 alpha
1 #UP 1
0 #ES 1
4 #NS 0 a
1 #EV 0
5 #NS 0 alphahat
2 #EV 1
3 #ES 0
4 #EP #BD 5 2 3
5 #EP #BD 4 1 4
6 #EP #BI 3 0 5
6 #NS 2 refl
7 #EC 2 1
8 #EA 7 2
9 #EA 8 1
10 #EA 9 1
11 #EP #BD 4 1 10
12 #EP #BI 3 0 11
#IND 2 2 6 1 6 12 1

David Renshaw (May 11 2021 at 02:36):

It's an easy fix though, if it'll make what you're doing easier I'll change it.

It would help me find more such discrepancies. I'll leave it up to you to determine whether that is a positive outcome. :)

Chris B (May 11 2021 at 02:38):

lol touche. Do you need it to terminate nicely, or can it just panic?

David Renshaw (May 11 2021 at 02:40):

In my current setup, panic would be fine, because I'm just invoking nanoda_lib as a black box subprocess.

David Renshaw (May 11 2021 at 02:41):

But in general, richer error information and avoidance of panics is better.

David Renshaw (May 11 2021 at 02:43):

I should also note that I don't have a strong grasp the meanings of the generated inputs. I'm trusting you all to make good judgments about what the "correct" behavior a checker should be.

David Renshaw (May 11 2021 at 02:46):

Like, I don't want to encourage everyone to converge on a suboptimal standardization.

Chris B (May 11 2021 at 02:51):

https://github.com/ammkrn/nanoda_lib/tree/ptr_check

Chris B (May 11 2021 at 02:54):

I think you are right on this one since it's not actually "no declarations", it's "no declarations in which the identity of 1 #NS 8 u matters enough to ever be read". It's wrong in a weird way but it's still wrong.

Chris B (May 11 2021 at 03:00):

I'm on a different computer, but I don't know why it's saying I'm six commits ahead of master. They seem to be the same other than those changes but lmk if it gives you any issues.

David Renshaw (May 11 2021 at 03:18):

@Chris B nanoda_lib seems to only check the prefix of commands, so it accepts this:

1 #NSBAD 0 u

(and leanchecker rejects it with line 1: unknown term definition kind: #NSBAD)

Chris B (May 11 2021 at 03:54):

Thanks. Should be fixed on that branch.

David Renshaw (May 11 2021 at 11:01):

Rejected by leanchecker, trepplein, and tc, but accepted by nanoda_lib:

1 #NS 0 u
2 #NS 0 eq
3 #NS 0 alpha
1 #UP 1
0 #ES 1
4 #NS 0 o
1 #EV 0
5 #NS 0 alphahat
2 #EV 1
3 #ES 0
4 #EP #BD 5 2 3
5 #EP #BD 4 1 4
6 #EP #BI 3 0 5
6 #NS 2 refl
7 #EC 2 1
8 #EA 7 2
9 #EA 8 1
10 #EA 9 5
11 #EP #BD 4 1 10
12 #EP #BI 3 0 11
#IND 2 2 6 1 6 12 1

Mario Carneiro (May 11 2021 at 14:45):

Chris B said:

Thanks. In my case it goes uncontested because there are no declarations, I suspect it's the same for tc. AFAIK Lean always puts the export items in order, but there's nothing in the (limited) spec which requires primitives to appear in order, they just have to appear before being used in a declaration. It's an easy fix though, if it'll make what you're doing easier I'll change it.

I agree that we should make this part of the spec. No forward references, not just to help eager checkers but also to avoid cyclic references

Mario Carneiro (May 11 2021 at 14:56):

The last one is a bug in nanoda_lib, and for a pretty reasonable example:

  inductive {u} eq {alpha : Sort u} (a : alpha) : Π (alphahat : alpha), Sort 0
  | refl : Π {alpha : Sort u} (a : alpha), eq.{u} alpha a (Π (a' : a) (alphahat : a), Sort 0)

You can see that the second argument in the type of refl has been corrupted. This is a type error because Π (a' : a) (alphahat : a), Sort 0 has type Sort 1 and it is expecting something of type alpha

Chris B (May 11 2021 at 18:18):

These are helpful, thank you. That should be fixed now. I don't have a local export file for mathlib on this computer, but I'll test it more thoroughly tonight.


Last updated: Dec 20 2023 at 11:08 UTC