Zulip Chat Archive

Stream: mathlib4

Topic: CI failure when building MathlibExtras


Eric Wieser (Sep 22 2023 at 22:03):

I'm getting this rather alarming CI failure

[3777/3779] Building MathlibExtras.LibrarySearch
error: > LEAN_PATH=./lake-packages/std/build/lib:./lake-packages/Qq/build/lib:./lake-packages/aesop/build/lib:./lake-packages/Cli/build/lib:./lake-packages/proofwidgets/build/lib:./build/lib LD_LIBRARY_PATH=./build/lib /home/lean/.elan/toolchains/leanprover--lean4---v4.1.0-rc1/bin/lean ./././MathlibExtras/LibrarySearch.lean -R ././. -o ./build/lib/MathlibExtras/LibrarySearch.olean -i ./build/lib/MathlibExtras/LibrarySearch.ilean -c ./build/ir/MathlibExtras/LibrarySearch.c
error: stderr:
libc++abi: terminating due to uncaught exception of type lean::throwable: deep recursion was detected at 'interpreter' (potential solution: increase stack space in your system)
interpreter stacktrace:
#1 Mathlib.Tactic.LibrarySearch.buildDiscrTree._lambda_1
#2 Lean.Meta.DiscrTree.Trie.mapArraysM._at.Lean.Meta.DiscrTree.mapArrays._spec_2._rarg
#3 Array.mapMUnsafe.map._at.Lean.Meta.DiscrTree.mapArrays._spec_3._rarg
#4 Lean.Meta.DiscrTree.Trie.mapArraysM._at.Lean.Meta.DiscrTree.mapArrays._spec_2._rarg
#5 Array.mapMUnsafe.map._at.Lean.Meta.DiscrTree.mapArrays._spec_3._rarg
<about another 5000 repeats of the last two lines>
#9902 Lean.Meta.DiscrTree.Trie.mapArraysM._at.Lean.Meta.DiscrTree.mapArrays._spec_2._rarg
#9903 Array.mapMUnsafe.map._at.Lean.Meta.DiscrTree.mapArrays._spec_3._rarg
#9904 Lean.Meta.DiscrTree.Trie.mapArraysM._at.Lean.Meta.DiscrTree.mapArrays._spec_2._rarg
#9905 Lean.Meta.DiscrTree.mapArraysM._at.Lean.Meta.DiscrTree.mapArrays._spec_1._rarg._lambda_1
#9906 Lean.Meta.DiscrTree.mapArraysM._at.Lean.Meta.DiscrTree.mapArrays._spec_1._rarg._lambda_1._boxed
#9907 Array.mapMUnsafe.map._at.Lean.Meta.DiscrTree.mapArrays._spec_6._rarg
#9908 Lean.PersistentHashMap.mapMAux._at.Lean.Meta.DiscrTree.mapArrays._spec_5._rarg
#9909 Array.mapMUnsafe.map._at.Lean.Meta.DiscrTree.mapArrays._spec_6._rarg
#9910 Lean.PersistentHashMap.mapMAux._at.Lean.Meta.DiscrTree.mapArrays._spec_5._rarg
#9911 Array.mapMUnsafe.map._at.Lean.Meta.DiscrTree.mapArrays._spec_6._rarg
#9912 Lean.PersistentHashMap.mapMAux._at.Lean.Meta.DiscrTree.mapArrays._spec_5._rarg
#9913 Lean.PersistentHashMap.mapM._at.Lean.Meta.DiscrTree.mapArrays._spec_4._rarg
#9914 Lean.Meta.DiscrTree.mapArraysM._at.Lean.Meta.DiscrTree.mapArrays._spec_1._rarg
#9915 Mathlib.Tactic.DiscrTreeCache.mk._at.Mathlib.Tactic.LibrarySearch.buildDiscrTree._spec_4._lambda_4
#9916 Mathlib.Tactic.DiscrTreeCache.mk._at.Mathlib.Tactic.LibrarySearch.buildDiscrTree._spec_4._lambda_4._boxed
#9917 Mathlib.Tactic.DeclCache.mk._rarg._lambda_1
#9918 Mathlib.Tactic.DeclCache.mk._rarg._lambda_1._boxed
#9919 Lean.profileitM._at.Lean.Meta.synthInstance?._spec_16._rarg
#9920 Mathlib.Tactic.DeclCache.mk._rarg._lambda_2
#9921 Mathlib.Tactic.DeclCache.mk._rarg._lambda_2._boxed
#9922 Mathlib.Tactic.Cache.get._rarg._lambda_6

Scott Morrison (Sep 22 2023 at 23:32):

Curiously, I cannot reproduce locally!

Sebastian Ullrich (Sep 23 2023 at 07:56):

You two have different ulimit -s values probably

Sebastian Ullrich (Sep 23 2023 at 07:57):

It would be good to homogenize that setting somewhere along the path. In Lake?

Eric Wieser (Sep 23 2023 at 08:16):

Is the assumption then that this isn't an infinite loop, and that my PR genuinely just needs an awful lot of stack space?

Sebastian Ullrich (Sep 23 2023 at 08:31):

It looks like it's linear in the height of the tree. In which case that might not be a reasonable height for a discrimination tree!

Eric Wieser (Sep 23 2023 at 09:00):

Does this suggest that recursion-based processing of discrimination trees is a bad idea?

Eric Wieser (Sep 23 2023 at 09:01):

Or simply that we should expect high stack usage?

Joachim Breitner (Sep 23 2023 at 09:13):

The problematic code is in Mathlib.Lean.Meta.DiscrTree, the implementation of DiscrTree.mapArrays. I wonder if the monadic mapM on arrays is a problem, stack-space wise? It seems only a pure map is needed here in the end. Although judging from the stack trace, it doesn’t look like the array length matters…
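(For illustration, a rough sketch of the shape of such a recursive traversal – not the exact code in Mathlib.Lean.Meta.DiscrTree – assuming the Trie.node (values) (children) shape used in the snippets later in this thread; it shows why the stack depth tracks the height of the trie rather than the array lengths:)

-- Sketch only: each recursive visit of a child sits below the parent's stack frame,
-- so a trie of height h needs h nested frames, no matter how wide the arrays are.
partial def Trie.mapArraysM [Monad m] (t : Trie α s) (f : Array α → m (Array β)) :
    m (Trie β s) := do
  let .node vs children := t
  return .node (← f vs) (← children.mapM fun (k, c) => do return (k, ← c.mapArraysM f))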

Scott Morrison (Sep 23 2023 at 10:03):

@Eric Wieser, do you have a branch exhibiting this problem? Your workflow log seems to be for some previous commit?

Eric Wieser (Sep 23 2023 at 10:04):

It is indeed for a previous commit on the branch

Scott Morrison (Sep 23 2023 at 10:04):

I made a non-monadic version of DiscrTree.mapArrays.

Scott Morrison (Sep 23 2023 at 10:04):

Could you make a branch for me, and tell me the name?

Eric Wieser (Sep 23 2023 at 10:05):

Well, there's a "branch" called 5654bed32ccdb8ae40830e4094050a385fd4b299 :wink:

Eric Wieser (Sep 23 2023 at 10:05):

(which you can get locally with git fetch origin 5654bed32ccdb8ae40830e4094050a385fd4b299)

Eric Wieser (Sep 23 2023 at 10:07):

This is a commit from #7244

Scott Morrison (Sep 23 2023 at 10:07):

Okay, you can try merging branch#DiscrTree.mapArrays in and see if it helps.

Scott Morrison (Sep 23 2023 at 10:08):

(If you'd made a branch for me I would have done the merge for you, and even followed up to check the CI...)

Eric Wieser (Sep 23 2023 at 10:21):

That branch doesn't appear to exist

Mauricio Collares (Sep 23 2023 at 10:32):

branch#eric_CI_troubles

Joachim Breitner (Sep 23 2023 at 10:35):

@Scott Morrison , isn’t the monadic version unused anyway, so it can be replaced with the pure one in any case? (Unless I mis-grepped)

Eric Wieser (Sep 23 2023 at 10:47):

Scott Morrison said:

(If you'd made a branch for me I would have done the merge for you, and even followed up to check the CI...)

I've cherry-picked your change into #7244, as the head of that branch is still failing in the same way (and I'm done with the other changes now)

Eric Wieser (Sep 23 2023 at 12:24):

Thanks for trying Scott, but I'm afraid this didn't help

Scott Morrison (Sep 24 2023 at 00:48):

Can you reproduce locally?

Scott Morrison (Sep 24 2023 at 00:48):

I have some more code to inspect DiscrTrees, I can send you this to try to find what is going wrong.

Scott Morrison (Sep 24 2023 at 00:50):

On master, we already have some excessively long entries in the DiscrTree (2596!). I'm guessing you've just set a new record and gone over some bound.

Scott Morrison (Sep 24 2023 at 00:50):

Presumably the fix is to truncate the keys before we insert them into the DiscrTree.

Scott Morrison (Sep 24 2023 at 00:51):

Current longest key is for src#PowerBasis.quotientEquivQuotientMinpolyMap_symm_apply

Scott Morrison (Sep 24 2023 at 01:19):

This is the current "spectrum" of key lengths in the exact? discrimination tree:

[(1, 19243), (2, 8706), (3, 10585), (4, 8973), (5, 10958), (6, 10332), (7, 9878), (8, 8599),
(9, 7929), (10, 7187), (11, 8856), (12, 6699), (13, 8449), (14, 5550), (15, 5543), (16, 5001),
(17, 5433), (18, 4457), (19, 4319), (20, 3699), (21, 3402), (22, 3227), (23, 3397), (24, 2777),
(25, 3211), (26, 2355), (27, 2334), (28, 2422), (29, 1933), (30, 1844), (31, 1807), (32, 1592),
(33, 1595), (34, 1624), (35, 1293), (36, 1310), (37, 1164), (38, 1263), (39, 1122), (40, 1138),
(41, 1038), (42, 950), (43, 905), (44, 743), (45, 776), (46, 753), (47, 740), (48, 692),
(49, 708), (50, 624), (51, 614), (52, 645), (53, 503), (54, 517), (55, 580), (56, 411), (57, 457),
(58, 478), (59, 461), (60, 398), (61, 366), (62, 327), (63, 339), (64, 341), (65, 332), (66, 295),
(67, 266), (68, 297), (69, 288), (70, 283), (71, 297), (72, 261), (73, 253), (74, 223), (75, 215),
(76, 238), (77, 214), (78, 213), (79, 227), (80, 200), (81, 181), (82, 185), (83, 188), (84, 153),
(85, 195), (86, 150), (87, 170), (88, 133), (89, 149), (90, 136), (91, 146), (92, 121), (93, 113),
(94, 139), (95, 131), (96, 130), (97, 96), (98, 116), (99, 123), (100, 115), (101, 91), (102, 73),
(103, 96), (104, 97), (105, 88), (106, 99), (107, 97), (108, 86), (109, 73), (110, 55), (111, 73),
(112, 81), (113, 76), (114, 65), (115, 47), (116, 78), (117, 68), (118, 75), (119, 58), (120, 59),
(121, 70), (122, 46), (123, 42), (124, 64), (125, 44), (126, 57), (127, 46), (128, 60), (129, 47),
(130, 55), (131, 49), (132, 52), (133, 50), (134, 51), (135, 36), (136, 31), (137, 38), (138, 35),
(139, 34), (140, 23), (141, 30), (142, 35), (143, 40), (144, 28), (145, 43), (146, 32), (147, 29),
(148, 19), (149, 24), (150, 37), (151, 32), (152, 20), (153, 27), (154, 28), (155, 22), (156, 27),
(157, 28), (158, 25), (159, 19), (160, 28), (161, 33), (162, 21), (163, 38), (164, 18), (165, 25),
(166, 28), (167, 18), (168, 25), (169, 12), (170, 19), (171, 15), (172, 16), (173, 13), (174, 20),
(175, 32), (176, 26), (177, 32), (178, 21), (179, 22), (180, 10), (181, 10), (182, 14), (183, 18),
(184, 11), (185, 11), (186, 7), (187, 14), (188, 17), (189, 21), (190, 7), (191, 12), (192, 16),
(193, 7), (194, 16), (195, 11), (196, 9), (197, 9), (198, 8), (199, 14), (200, 7), (201, 16),
(202, 11), (203, 6), (204, 7), (205, 5), (206, 14), (207, 20), (208, 12), (209, 16), (210, 15),
(211, 9), (212, 13), (213, 12), (214, 11), (215, 9), (216, 19), (217, 14), (218, 8), (219, 8),
(220, 4), (221, 5), (222, 14), (223, 6), (224, 11), (225, 1), (226, 12), (227, 12), (228, 7),
(229, 3), (230, 5), (231, 5), (232, 6), (233, 2), (234, 11), (235, 10), (236, 12), (237, 7),
(238, 3), (239, 3), (240, 2), (241, 6), (242, 5), (243, 8), (244, 11), (245, 9), (246, 5), (247, 4),
(248, 11), (249, 6), (250, 8), (251, 3), (252, 8), (253, 5), (254, 5), (255, 6), (256, 1), (257, 10),
(258, 5), (259, 8), (260, 4), (261, 6), (262, 3), (263, 3), (264, 9), (265, 4), (266, 2), (267, 5),
(268, 3), (269, 2), (270, 6), (271, 7), (272, 4), (273, 4), (274, 4), (275, 6), (276, 2), (277, 1),
(278, 1), (279, 4), (280, 2), (281, 3), (282, 3), (284, 4), (285, 2), (286, 2), (287, 4), (288, 4),
(289, 2), (290, 3), (291, 4), (292, 1), (293, 2), (295, 4), (296, 1), (297, 1), (298, 5), (299, 3),
(300, 4), (301, 3), (302, 2), (303, 3), (304, 5), (305, 3), (306, 1), (307, 2), (308, 4), (309, 2),
(310, 1), (312, 3), (314, 2), (315, 3), (316, 4), (317, 3), (318, 3), (319, 3), (320, 2), (321, 3),
(323, 1), (324, 3), (325, 1), (326, 3), (327, 5), (328, 1), (329, 6), (330, 3), (331, 1), (332, 5),
(333, 1), (334, 1), (336, 3), (338, 3), (339, 2), (340, 1), (341, 1), (342, 2), (343, 2), (344, 2),
(346, 4), (347, 1), (348, 1), (349, 2), (350, 3), (352, 4), (353, 3), (354, 3), (355, 2), (356, 2),
(357, 1), (358, 1), (359, 1), (360, 1), (361, 2), (362, 1), (364, 1), (367, 2), (368, 1), (369, 5),
(372, 1), (373, 5), (374, 2), (375, 1), (379, 3), (382, 2), (384, 1), (385, 3), (386, 2), (387, 2),
(388, 1), (392, 1), (393, 2), (394, 1), (395, 2), (396, 1), (397, 3), (398, 1), (400, 2), (401, 1),
(403, 1), (404, 3), (406, 1), (407, 2), (409, 3), (410, 1), (411, 3), (414, 1), (415, 2), (416, 1),
(417, 2), (419, 1), (421, 2), (422, 2), (423, 3), (424, 1), (425, 2), (426, 1), (428, 1), (429, 2),
(430, 4), (431, 1), (432, 2), (433, 1), (436, 1), (438, 2), (439, 2), (441, 1), (442, 1), (449, 1),
(451, 1), (453, 1), (454, 3), (459, 1), (461, 1), (462, 2), (466, 2), (471, 1), (472, 3), (474, 2),
(475, 1), (481, 1), (485, 2), (488, 1), (489, 1), (490, 2), (492, 1), (494, 1), (499, 2), (501, 1),
(503, 1), (509, 1), (510, 2), (511, 1), (512, 1), (513, 1), (515, 2), (517, 4), (518, 1), (520, 1),
(521, 2), (524, 2), (525, 2), (533, 2), (535, 1), (539, 2), (544, 1), (545, 1), (549, 1), (554, 3),
(555, 2), (560, 1), (566, 3), (570, 1), (575, 2), (582, 2), (585, 2), (587, 1), (591, 1), (592, 1),
(595, 1), (597, 3), (598, 1), (599, 2), (602, 1), (606, 1), (608, 1), (611, 2), (612, 1), (613, 1),
(614, 1), (619, 2), (621, 2), (626, 1), (628, 2), (635, 1), (657, 1), (667, 1), (676, 1), (693, 3),
(697, 1), (717, 1), (723, 2), (743, 2), (755, 1), (766, 1), (771, 1), (773, 1), (777, 1), (812, 1),
(813, 1), (826, 1), (831, 2), (845, 2), (852, 1), (895, 1), (926, 2), (960, 1), (965, 2), (1000, 1),
(1187, 1), (1376, 1), (1397, 1), (1482, 1), (1543, 1), (1572, 1), (1660, 1), (1710, 1), (1968, 1),
(2053, 1), (2597, 1)]
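(For reference, a sketch of how one can compute such a spectrum – this is not Scott's actual inspection code, and the function name is made up; it assumes the DiscrTree/Trie shape used elsewhere in this thread, where the depth at which a value is stored equals the length of its key sequence:)

-- Hypothetical helper: count, for every value in the tree, the depth (key length) at which it sits.
partial def keyLengthSpectrum (d : DiscrTree α s) : Lean.RBMap Nat Nat compare :=
  d.root.foldl (fun acc _ t => go 1 acc t) Lean.RBMap.empty
where
  go (depth : Nat) (acc : Lean.RBMap Nat Nat compare) : Trie α s → Lean.RBMap Nat Nat compare
    | .node vs children =>
      let acc := if vs.isEmpty then acc else acc.insert depth (acc.findD depth 0 + vs.size)
      children.foldl (fun acc (_, c) => go (depth + 1) acc c) acc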

Scott Morrison (Sep 24 2023 at 01:23):

I propose we just truncate at 1000. This will be "safe" for this stack overflow issue, and affects vanishingly few lemmas.

It's possible that truncating won't even prevent those few lemmas from being applied (possibly if we truncate carefully).

However I'm inclined to not spend the time investigating that, and just leave a note.
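(A minimal sketch of the kind of truncation being proposed – not necessarily what #7345 does, and the bound and helper name below are made up; as noted above, a naive cut-off may need more care if we want the affected lemmas to remain findable:)

-- Hypothetical helper: cap the encoded key sequence at a fixed bound before inserting it.
def maxKeyLength : Nat := 1000

def truncateKeys (keys : Array (Key s)) : Array (Key s) :=
  if keys.size ≤ maxKeyLength then keys else keys.extract 0 maxKeyLength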

Scott Morrison (Sep 24 2023 at 01:32):

@Eric Wieser, can I leave #7345 in your hands?

Joachim Breitner (Sep 24 2023 at 07:36):

This is a trie data structure, right? We could implement path compression: a sequence of degree one nodes without associated values is stored in a flat array. Should also greatly reduce the footprint of the trie.
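(Sketch of that idea with hypothetical constructor names, not any actual implementation: a run of single-child, value-free nodes collapses into one flat path node, so walking it costs one step instead of one per key.)

inductive CompressedTrie (α : Type) (s : Bool) where
  | node (values : Array α) (children : Array (Key s × CompressedTrie α s))
  | path (keys : Array (Key s)) (child : CompressedTrie α s)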

Joachim Breitner (Sep 24 2023 at 07:46):

I see more optimization potential there (array of tuples looks more expensive in terms of memory footprint and cache lines than a tuple of arrays). The DiscrTree is used all over the place, so maybe worthwhile to see what middle-hanging fruit there is left to pluck?
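(Again only a sketch with made-up names: the "tuple of arrays" layout replaces the single array of pairs by parallel arrays, saving one pair allocation per child and keeping the keys contiguous when scanning.)

inductive TrieSoA (α : Type) (s : Bool) where
  | node (values : Array α) (keys : Array (Key s)) (children : Array (TrieSoA α s))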

Sebastian Ullrich (Sep 24 2023 at 07:48):

Looking into optimizations would be great as we do use it all over the place! But also it is an approximating data structure. We should evaluate whether there is even any benefit of keys of length 1000 compared to, say, 100, or even 10.

Joachim Breitner (Sep 24 2023 at 08:03):

If we had dependent arrays we could safely build compact data structures (e.g. an ArrayPair a b stored as a single array with 2n elements).

Joachim Breitner (Sep 24 2023 at 15:28):

(see <https://leanprover.zulipchat.com/#narrow/stream/287929-mathlib4/topic/More.20compact.20DiscrTree> for a possible way to make that data structure smaller and much less likely to cause stack issues.)

Eric Wieser (Sep 25 2023 at 16:49):

Scott Morrison said:

Eric Wieser, can I leave #7345 in your hands?

I tested it but unfortunately it didn't help.

Joachim Breitner (Sep 25 2023 at 21:17):

https://github.com/leanprover/lean4/pull/2577 should help reliably, but it's not decided yet if that change will go in

Scott Morrison (Sep 25 2023 at 23:27):

@Eric Wieser there's a toolchain for that PR as leanprover/lean4-pr-releases:pr-release-2577. Could you test your branch against that?

I'm worried that if truncating at 1000 didn't help, we are misdiagnosing the problem.

Eric Wieser (Sep 25 2023 at 23:29):

I likely won't have any time to do that tomorrow, but feel free to push whatever to #7244 to diagnose as long as you don't rewrite the history of commits authored by me

Eric Wieser (Sep 25 2023 at 23:29):

(force-pushing away my cherry-pick of your change is fine)

Scott Morrison (Sep 25 2023 at 23:30):

No hurry. @Joachim Breitner's lean4#2577 may not happen quickly, in any case.

Eric Wieser (Sep 25 2023 at 23:30):

(relatedly, a review of #7237 would substantially decrease the size of that PR)

Eric Wieser (Sep 25 2023 at 23:31):

I assume you mean lean4#2577?

Joachim Breitner (Sep 26 2023 at 11:23):

I pushed a CPS’d and thus tail-recursive implementation of Trie.mapArray to #7244… but it still overflows the stack.
It says “interpreter stacktrace” – does the interpreter not perform tail-calls without using stack space?
And is it expected that there is an interpreter involved? Shouldn’t this be using compiled code?

Mario Carneiro (Sep 26 2023 at 11:31):

Is there a need for this to be monadic?

Mario Carneiro (Sep 26 2023 at 11:33):

CPS'ing this function doesn't help, you still will get a giant stack when evaluating the closure you have built up

Joachim Breitner (Sep 26 2023 at 11:33):

No, but switching to a pure mapArray didn’t help, Scott tried that already (see https://leanprover.zulipchat.com/#narrow/stream/287929-mathlib4/topic/CI.20failure.20when.20building.20MathlibExtras/near/392674332)

Joachim Breitner (Sep 26 2023 at 11:34):

Hmm, I see. I was expecting that if each closure itself continues in a tail-recursive way, no stack would be used.

Joachim Breitner (Sep 26 2023 at 11:34):

(Of course, the nested closures would be giant on the heap, but that’s heap space then.)

Mario Carneiro (Sep 26 2023 at 11:34):

this is the interpreter stack we're talking about, it's already "on the heap"

Joachim Breitner (Sep 26 2023 at 11:35):

So the interpreter doesn’t do tail-call optimization?

Mario Carneiro (Sep 26 2023 at 11:35):

I ... don't think so?

Mario Carneiro (Sep 26 2023 at 11:36):

the compiler is the one that does TCO

Mario Carneiro (Sep 26 2023 at 11:36):

including the byte code compiler which prepares things for the interpreter

Joachim Breitner (Sep 26 2023 at 11:37):

Ok, then I was too optimistic :-)

Joachim Breitner (Sep 26 2023 at 11:38):

But what about the other question: Why is this interpreted, and not compiled? Should it be that way?

Mario Carneiro (Sep 26 2023 at 11:39):

why would it be compiled? Isn't it just an #eval in MathlibExtras?

Joachim Breitner (Sep 26 2023 at 11:40):

(It seems that src/library/compiler/ir_interpreter.cpp has support for non-mutual tail-recursion, but there are no other mentions of tail in that file.)

Joachim Breitner (Sep 26 2023 at 11:43):

I am still ignorant of when things are evaluated/executed and how :-).
Maybe again optimistically, I’d assume it would use compiled code when evaluating imported things – the interpreter code says “Whenever possible, we try to switch to native code by checking for the mangled symbol”.

So if we rewrote that from being an #eval to a proper executable that loads Mathlib from its main function, would it be using compiled code?

Joachim Breitner (Sep 26 2023 at 11:43):

Would it make a difference whether Trie.mapArray is defined in mathlib, std or init?

Mario Carneiro (Sep 26 2023 at 11:46):

Here's a slightly cleaned up version of the IR for these two functions (from set_option trace.compiler.ir.result true):

def go._rarg._lambda_1 (x_1 : u8) (x_2 : obj) (x_3 : obj) (x_4 : obj) : obj :=
  let x_5 : obj := ctor_0[Lean.Meta.DiscrTree.Trie.node] x_2 x_4;
  let x_6 : obj := app x_3 x_5;
  ret x_6
def go._rarg (x_1 : u8) (x_2 : ◾) (x_3 : obj) (x_4 : obj) (x_5 : obj) : obj :=
  let x_6 : obj := proj[0] x_4;
  inc x_6;
  let x_7 : obj := proj[1] x_4;
  inc x_7;
  dec x_4;
  inc x_3;
  let x_8 : obj := app x_3 x_6;
  let x_9 : obj := Array.size ◾ x_7;
  let x_10 : obj := Array.mkEmpty ◾ x_9;
  dec x_9;
  let x_11 : obj := box x_1;
  let x_12 : obj := pap go._rarg._lambda_1._boxed x_11 x_8 x_5;
  let x_13 : obj := goA._rarg x_1 ◾ x_3 x_7 x_10 x_12;
  ret x_13
def goA._rarg._lambda_1 (x_1 : obj) (x_2 : obj) (x_3 : u8) (x_4 : obj) (x_5 : obj) (x_6 : obj) (x_7 : obj) : obj :=
  let x_8 : obj := ctor_0[Prod.mk] x_1 x_7;
  let x_9 : obj := Array.push ◾ x_2 x_8;
  let x_10 : obj := goA._rarg x_3 ◾ x_4 x_5 x_9 x_6;
  ret x_10
def goA._rarg (x_1 : u8) (x_2 : ◾) (x_3 : obj) (x_4 : obj) (x_5 : obj) (x_6 : obj) : obj :=
  let x_7 : obj := Array.size ◾ x_5;
  let x_8 : obj := Array.size ◾ x_4;
  let x_9 : u8 := Nat.decLt x_7 x_8;
  dec x_8;
  case x_9 : u8 of
  Bool.false →
    dec x_7;
    dec x_4;
    dec x_3;
    let x_10 : obj := app x_6 x_5;
    ret x_10
  Bool.true →
    let x_11 : obj := Array.get ◾ x_4 x_7;
    dec x_7;
    let x_12 : obj := proj[0] x_11;
    inc x_12;
    let x_13 : obj := proj[1] x_11;
    inc x_13;
    dec x_11;
    let x_14 : obj := box x_1;
    inc x_3;
    let x_15 : obj := pap goA._rarg._lambda_1._boxed x_12 x_5 x_14 x_3 x_4 x_6;
    let x_16 : obj := go._rarg x_1 ◾ x_3 x_13 x_15;
    ret x_16

Mario Carneiro (Sep 26 2023 at 11:48):

One thing that immediately sticks out is that if the interpreter wanted to notice that e.g. the call to go._rarg on the second to last line is a tail call it would have to look ahead to the ret instruction

Mario Carneiro (Sep 26 2023 at 11:49):

IRs designed for tail calls make this syntactically obvious with a call like ret go._rarg x_1 ◾ x_3 x_13 x_15, so this bodes ill for the IR having native tail call support

Mario Carneiro (Sep 26 2023 at 11:51):

Joachim Breitner said:

Maybe again optimistically, I’d assume it would use compiled code when evaluating imported things – the intepreter code says “Whenever possible, we try to switch to native code by checking for the mangled symbol”

That's the precompileImports setting. We don't use it in mathlib because it has issues with having to dynamically link hundreds or thousands of files, which seems not to be an intended use case

Mario Carneiro (Sep 26 2023 at 11:54):

Joachim Breitner said:

So if we’d rewrite that from being an #eval to a proper executable that loads Mathlib from it’s main function it would be using compiled?

Yes, if you ran it as an executable (e.g. like lake exe cache, which is compiled).

Would it make a difference whether Trie.mapArray is defined in mathlib, std or init?

No, you would have to compile any dependencies in order to make an executable, so it would be compiled no matter where it ends up (although you probably want to not pull in too much of mathlib to compile this executable)
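(Roughly what that route would look like as a lakefile fragment – the executable name and root module here are hypothetical, and the root module would have to provide a `main` that imports what it needs and builds the caches:)

lean_exe buildDiscrTreeCaches where
  root := `Scripts.BuildDiscrTreeCaches  -- hypothetical module defining `main`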

Mario Carneiro (Sep 26 2023 at 11:57):

actually, it seems the interpreter does have tail recursion detection after all:

                    // tail recursion?
                    if (expr_tag(e) == expr_kind::FAp && expr_fap_fun(e) == get_frame().m_fn &&
                        fn_body_tag(cont) == fn_body_kind::Ret && !arg_is_irrelevant(fn_body_ret_arg(cont)) &&
                        arg_var_id(fn_body_ret_arg(cont)) == fn_body_vdecl_var(b)) {

Joachim Breitner (Sep 26 2023 at 11:58):

That’s what I saw, too. But that’s only recursion, not arbitrary calls, right?

Mario Carneiro (Sep 26 2023 at 11:59):

yeah, the first line is checking if it is a full application to the current function

Mario Carneiro (Sep 26 2023 at 12:00):

In any case I think the proper way to do this iteratively is using an explicit stack

Joachim Breitner (Sep 26 2023 at 12:05):

Let me try turning this into a single tail-recursive function, just to see if that optimization kicks in :-)

Joachim Breitner (Sep 26 2023 at 12:09):

We don't use it in mathlib because it has issues with having to dynamically link hundreds or thousands of files, which seems not to be an intended use case

I see that this would be expensive.
But I am a bit surprised. Do all the tactics like simp etc. always run completely interpreted? Isn’t this significantly slower than if we could run all meta-code in compiled form? (Is that part of what Sebastian wants to solve by having different imports for meta and non-meta definitions?)

Mario Carneiro (Sep 26 2023 at 12:10):

no, things from init/lean/lake are compiled

Joachim Breitner (Sep 26 2023 at 12:17):

Ah, if mapArrays was defined in init, it would have avoided the interpreter here.

Joachim Breitner (Sep 26 2023 at 12:23):

Thanks anyway, that was very helpful. Just to double-check that I got this right:
In mathlib, meta-code (tactics, elaboration) is executed using compiled code if it’s defined in the lean repo, but if it’s defined in std or aesop or mathlib, it’s executed by the IR interpreter.
In particular, all the data structures defined in std, and the operations std provides for existing data structures, are not necessarily as fast as they could be.

Mario Carneiro (Sep 26 2023 at 12:24):

here's my attempt at turning this into a single tail-recursive function with an explicit stack:

inductive Cont (α β : Type) (s)
  | nil : Cont α β s
  | cons : Array β → Array (Key s × Trie α s) → Array (Key s × Trie β s) →
    Key s → Cont α β s → Cont α β s

inductive State (α β : Type) (s)
  | arr : Array β → Array (Key s × Trie α s) → Array (Key s × Trie β s) →
    Cont α β s → State α β s
  | ret : Cont α β s → Trie β s → State α β s

/-- Apply a function to the array of values at each node in a `DiscrTree`. -/
partial def Trie.mapArrays (t : Trie α s) (f : Array α → Array β) : Trie β s :=
  go (start t .nil)
where
  @[inline] start : Trie α s → Cont α β s → State α β s
    | .node vs cs, k => .arr (f vs) cs (.mkEmpty cs.size) k

  go : State α β s → Trie β s
  | .arr vs' cs cs' k =>
    if h : cs'.size < cs.size then
      let (key, c) := cs.get ⟨cs'.size, h⟩
      go (start c (.cons vs' cs cs' key k))
    else
      go (.ret k (.node vs' cs'))
  | .ret .nil t => t
  | .ret (.cons vs' cs cs' key k) c' =>
    go (.arr vs' cs (cs'.push (key, c')) k)

Mario Carneiro (Sep 26 2023 at 12:25):

yes to the stuff about compilation in std/aesop/mathlib

Joachim Breitner (Sep 26 2023 at 12:48):

Nice; is that a straightforward defunctionalization of the CPS variant?

Here is another variant, using arrays for the explicit stack (essentially an implicit zipper). Fun puzzle indeed :-)

/-- Apply a function to the array of values at each node in a `DiscrTree`. -/
partial def Trie.mapArrays (t : Trie α s) (f : Array α → Array β) : Trie β s :=
  let .node vs0 cs0 := t
  go #[] #[] (f vs0) 0 cs0
where
  go parents cs' vs' i cs :=
    if h : i < cs.size then -- move down
      let (k, .node vs0 cs0) := cs.get ⟨i, h⟩
      go (parents.push (cs', vs', i, cs, k)) #[] (f vs0) 0 cs0
    else
      let t := .node vs' cs'
      if parents.isEmpty then -- we are done
        t
      else -- move up one level
        let (cs', vs', i, cs, k) := parents.back
        go parents.pop (cs'.push (k, t)) vs' (i + 1) cs

Mario Carneiro (Sep 26 2023 at 12:55):

I don't know whether it is better to use an array or a list here; the Cont type is list-like although it is a custom inductive so that it gets a flatter representation

Mario Carneiro (Sep 26 2023 at 12:56):

in your case you probably want to also use a custom inductive for the elements of the parents array because otherwise you need 5 allocations for the tuples

Joachim Breitner (Sep 26 2023 at 12:58):

I mean, this is all mostly just idle finger exercises. I don't think we should have to rewrite this function, and instead the trie shouldn’t be excessively deep.

Mario Carneiro (Sep 26 2023 at 12:58):

or lean should just get good and handle deep stacks

Joachim Breitner (Sep 26 2023 at 12:59):

s/or/and/ :-)

Mario Carneiro (Sep 26 2023 at 13:00):

another possibility is to detect the case when the array has length 1 and tail recurse then

Mario Carneiro (Sep 26 2023 at 13:00):

because I'm guessing that most of the deep recursion is coming from non-branching nodes

Joachim Breitner (Sep 26 2023 at 13:01):

Yes, definitely – that’s why I am compressing them in lean4#2557. But you’d still need to pass some form of context or continuation to rebuild that path here, wouldn’t you?

Joachim Breitner (Sep 26 2023 at 13:03):

I am still wondering what to think of meta code from std not being compiled…
Do we have a sense of how much slowdown this causes? Both for compiling mathlib, but also for things like people running apply interactively?
Is it possible to precompileImports selectively (e.g. “precompile all of Std, Mathlib.Meta, Mathlib.Tactics”)?

Mario Carneiro (Sep 26 2023 at 13:05):

partial def Trie.mapArrays (t : Trie α s) (f : Array α → Array β) : Trie β s :=
  go t id
where
  go
  | .node vs children, k =>
    if children.size == 1 then
      let (key, t') := children[0]!
      go t' fun c => k (.node (f vs) #[(key, c)])
    else
      k (.node (f vs) (children.map fun (k, t') => (k, t'.mapArrays f)))

Mario Carneiro (Sep 26 2023 at 13:07):

I think it's like a 20%-40% speedup IIRC

Mario Carneiro (Sep 26 2023 at 13:07):

but it helps less than you might think because most of the heavy lifting tactics are compiled

Mario Carneiro (Sep 26 2023 at 13:08):

norm_num and ring benefit from compilation

Mario Carneiro (Sep 26 2023 at 13:09):

I'm not sure what you mean by running apply interactively

Mario Carneiro (Sep 26 2023 at 13:09):

apply is compiled

Mario Carneiro (Sep 26 2023 at 13:10):

No, it is not currently possible to use precompileImports selectively, and I agree that we want to do something like that

Joachim Breitner (Sep 26 2023 at 13:12):

And that go’s continuation will not itself grow large?

Joachim Breitner (Sep 26 2023 at 13:12):

With lists for the parents stack it gets even prettier:

/-- Apply a function to the array of values at each node in a `DiscrTree`. -/
partial def Trie.mapArrays (t : Trie α s) (f : Array α → Array β) : Trie β s :=
  let .node vs0 cs0 := t
  go [] #[] (f vs0) cs0.toList
where
  go  | [], cs', vs', []                       => .node vs' cs'
      | (cs', vs',  cs, k)::ps, cs'', vs'', [] => go ps (cs'.push (k, .node vs'' cs'')) vs' cs
      | ps, cs', vs', (k, .node vs0 cs0)::cs   => go ((cs', vs', cs, k)::ps) #[] (f vs0) cs0.toList

Joachim Breitner (Sep 26 2023 at 13:12):

Sorry, I meant apply?.

Mario Carneiro (Sep 26 2023 at 13:15):

A closure stack is basically an ad-hoc linked list

Mario Carneiro (Sep 26 2023 at 13:17):

The continuation there will only grow as long as the length of a run of length-1 arrays, because it is reset in the else case (which recurses via the call to mapArrays)

Mario Carneiro (Sep 26 2023 at 13:18):

I guess it could use go t' id there instead, not sure if lean knows to inline it or whether it will avoid doing so because of the recursion

Joachim Breitner (Sep 26 2023 at 13:23):

I expect these runs to be almost as long as the height of the tree, so the resetting won’t save us in this case.

Mario Carneiro (Sep 26 2023 at 13:23):

no, long runs are fine here

Mario Carneiro (Sep 26 2023 at 13:25):

hm, actually maybe not: when evaluating the closure stack it doesn't tail-call, even though it is calling itself, because it is using Ap instead of FAp for the call

Mario Carneiro (Sep 26 2023 at 13:29):

so you would need to defunctionalize

partial def Trie.mapArrays (t : Trie α s) (f : Array α → Array β) : Trie β s :=
  go t []
where
  go
  | .node vs children, k =>
    if children.size == 1 then
      let (key, t') := children[0]!
      go t' ((f vs, key) :: k)
    else
      ret k (.node (f vs) (children.map fun (k, t') => (k, go t' [])))
  ret
  | [], t => t
  | (vs', key) :: k, c => ret k (.node vs' #[(key, c)])

which loses a lot of the appeal

Joachim Breitner (Sep 26 2023 at 13:30):

Yeah, then you can just defunctionalize the whole thing

Mario Carneiro (Sep 26 2023 at 13:31):

the interpreter should really just get good, this is silly

Joachim Breitner (Sep 26 2023 at 13:31):

Good = arbitrary tail calls, or good = unbounded stack? (Or both)

Mario Carneiro (Sep 26 2023 at 13:32):

arbitrary tail calls

Mario Carneiro (Sep 26 2023 at 13:32):

there are defensible reasons for not having unbounded stack

Joachim Breitner (Sep 26 2023 at 13:32):

So you'd have to CPS, but wouldn't have to defunctionalize

Mario Carneiro (Sep 26 2023 at 13:32):

like being able to print that stack trace

Mario Carneiro (Sep 26 2023 at 13:33):

but it should be able to do tail calls to variable functions and other constant functions

Joachim Breitner (Sep 26 2023 at 14:48):

Looking at the lake code, it already seems well equipped to support selective precompilation of just some modules. It might suffice to replace the per-library boolean queried in https://github.com/leanprover/lean4/blob/e6fe3bee71f359c866a29766f774864577099778/src/lake/Lake/Config/Module.lean#L119 with something selective (a list of names as a first approximation).
@Mac Malone , would that be a useful contribution, or are there bigger issues with that I can't see yet?
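(For comparison, the switch as it exists today is per-library and all-or-nothing, e.g. as a lakefile fragment:)

lean_lib Mathlib where
  precompileModules := true  -- currently applies to every module of the library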

Jannis Limperg (Sep 26 2023 at 14:54):

We used to enable Lake's precompileModules option for Aesop and std (?), but this broke the cache. I'm not sure what the current status of this issue is.

Joachim Breitner (Sep 26 2023 at 15:03):

Thanks, that explains why not even std is compiled; I assume there must have been a reason.

Sebastian Ullrich (Sep 26 2023 at 15:08):

The most recent investigation for mathlib was at https://github.com/leanprover-community/mathlib4/pull/3575. It looked like a 5% speedup was feasible but would certainly require more work in Lake and cache as well as testing on all platforms

Joachim Breitner (Sep 26 2023 at 16:40):

@Eric Wieser , I pushed my stackless implementation to your branch (which lets CI proceed), so that you are unblocked there.

Eric Wieser (Sep 26 2023 at 16:40):

I'm blocked by the dependent PR anyway, but thanks!

Joachim Breitner (Sep 26 2023 at 16:49):

Ok, in that case I better submit it as its own PR (mathlib4#7387), before more innocent PRs like yours get stuck due to this.

Eric Wieser (Oct 09 2023 at 20:11):

This is now the last dependency of #7244

Joachim Breitner (Oct 09 2023 at 20:54):

If you don't see anything wrong with it and it unblocks you, you can just merge it, I'd say.

Eric Wieser (Oct 09 2023 at 20:55):

I think I'd rather have the eyes of @Scott Morrison or @Mario Carneiro give it a final review

Joachim Breitner (Oct 09 2023 at 20:57):

:+1:


Last updated: Dec 20 2023 at 11:08 UTC