Zulip Chat Archive
Stream: mathlib4
Topic: CI failure when building MathlibExtras
Eric Wieser (Sep 22 2023 at 22:03):
I'm getting this rather alarming CI failure
[3777/3779] Building MathlibExtras.LibrarySearch
error: > LEAN_PATH=./lake-packages/std/build/lib:./lake-packages/Qq/build/lib:./lake-packages/aesop/build/lib:./lake-packages/Cli/build/lib:./lake-packages/proofwidgets/build/lib:./build/lib LD_LIBRARY_PATH=./build/lib /home/lean/.elan/toolchains/leanprover--lean4---v4.1.0-rc1/bin/lean ./././MathlibExtras/LibrarySearch.lean -R ././. -o ./build/lib/MathlibExtras/LibrarySearch.olean -i ./build/lib/MathlibExtras/LibrarySearch.ilean -c ./build/ir/MathlibExtras/LibrarySearch.c
error: stderr:
libc++abi: terminating due to uncaught exception of type lean::throwable: deep recursion was detected at 'interpreter' (potential solution: increase stack space in your system)
interpreter stacktrace:
#1 Mathlib.Tactic.LibrarySearch.buildDiscrTree._lambda_1
#2 Lean.Meta.DiscrTree.Trie.mapArraysM._at.Lean.Meta.DiscrTree.mapArrays._spec_2._rarg
#3 Array.mapMUnsafe.map._at.Lean.Meta.DiscrTree.mapArrays._spec_3._rarg
#4 Lean.Meta.DiscrTree.Trie.mapArraysM._at.Lean.Meta.DiscrTree.mapArrays._spec_2._rarg
#5 Array.mapMUnsafe.map._at.Lean.Meta.DiscrTree.mapArrays._spec_3._rarg
<about another 5000 repeats of the last two lines>
#9902 Lean.Meta.DiscrTree.Trie.mapArraysM._at.Lean.Meta.DiscrTree.mapArrays._spec_2._rarg
#9903 Array.mapMUnsafe.map._at.Lean.Meta.DiscrTree.mapArrays._spec_3._rarg
#9904 Lean.Meta.DiscrTree.Trie.mapArraysM._at.Lean.Meta.DiscrTree.mapArrays._spec_2._rarg
#9905 Lean.Meta.DiscrTree.mapArraysM._at.Lean.Meta.DiscrTree.mapArrays._spec_1._rarg._lambda_1
#9906 Lean.Meta.DiscrTree.mapArraysM._at.Lean.Meta.DiscrTree.mapArrays._spec_1._rarg._lambda_1._boxed
#9907 Array.mapMUnsafe.map._at.Lean.Meta.DiscrTree.mapArrays._spec_6._rarg
#9908 Lean.PersistentHashMap.mapMAux._at.Lean.Meta.DiscrTree.mapArrays._spec_5._rarg
#9909 Array.mapMUnsafe.map._at.Lean.Meta.DiscrTree.mapArrays._spec_6._rarg
#9910 Lean.PersistentHashMap.mapMAux._at.Lean.Meta.DiscrTree.mapArrays._spec_5._rarg
#9911 Array.mapMUnsafe.map._at.Lean.Meta.DiscrTree.mapArrays._spec_6._rarg
#9912 Lean.PersistentHashMap.mapMAux._at.Lean.Meta.DiscrTree.mapArrays._spec_5._rarg
#9913 Lean.PersistentHashMap.mapM._at.Lean.Meta.DiscrTree.mapArrays._spec_4._rarg
#9914 Lean.Meta.DiscrTree.mapArraysM._at.Lean.Meta.DiscrTree.mapArrays._spec_1._rarg
#9915 Mathlib.Tactic.DiscrTreeCache.mk._at.Mathlib.Tactic.LibrarySearch.buildDiscrTree._spec_4._lambda_4
#9916 Mathlib.Tactic.DiscrTreeCache.mk._at.Mathlib.Tactic.LibrarySearch.buildDiscrTree._spec_4._lambda_4._boxed
#9917 Mathlib.Tactic.DeclCache.mk._rarg._lambda_1
#9918 Mathlib.Tactic.DeclCache.mk._rarg._lambda_1._boxed
#9919 Lean.profileitM._at.Lean.Meta.synthInstance?._spec_16._rarg
#9920 Mathlib.Tactic.DeclCache.mk._rarg._lambda_2
#9921 Mathlib.Tactic.DeclCache.mk._rarg._lambda_2._boxed
#9922 Mathlib.Tactic.Cache.get._rarg._lambda_6
Scott Morrison (Sep 22 2023 at 23:32):
Curiously, I cannot reproduce locally!
Sebastian Ullrich (Sep 23 2023 at 07:56):
You two have different ulimit -s
values probably
Sebastian Ullrich (Sep 23 2023 at 07:57):
It would be good to homogenize that setting somewhere along the path. In Lake?
Eric Wieser (Sep 23 2023 at 08:16):
Is the assumption then that this isn't an infinite loop, and that my PR genuinely just needs an awful lot of stack space?
Sebastian Ullrich (Sep 23 2023 at 08:31):
It looks like it's linear in the height of the tree. In which case that might not be a reasonable height for a discrimination tree!
Eric Wieser (Sep 23 2023 at 09:00):
Does this suggest that recursion-based processing of discrimination trees is a bad idea?
Eric Wieser (Sep 23 2023 at 09:01):
Or simply that we should expect high stack usage?
Joachim Breitner (Sep 23 2023 at 09:13):
The problematic code is in Mathlib.Lean.Meta.DiscrTree
, the implementation of DiscrTree.mapArrays
. I wonder if the monadic mapM
on arrays is a problem, stack-space wise? It seems only a pure map is needed here in the end. Although the stack trace doesn’t look like the array length matters…
Scott Morrison (Sep 23 2023 at 10:03):
@Eric Wieser, do you have a branch exhibiting this problem? You workflow log seems to be for some previous commit?
Eric Wieser (Sep 23 2023 at 10:04):
It is indeed for a previous commit on the branch
Scott Morrison (Sep 23 2023 at 10:04):
I made a non-monadic version of DiscrTree.mapArrays
.
Scott Morrison (Sep 23 2023 at 10:04):
Could you make a branch for me, and tell me the name?
Eric Wieser (Sep 23 2023 at 10:05):
Well, there's a "branch" called 5654bed32ccdb8ae40830e4094050a385fd4b299
:wink:
Eric Wieser (Sep 23 2023 at 10:05):
(which you can get locally with git fetch origin 5654bed32ccdb8ae40830e4094050a385fd4b299
)
Eric Wieser (Sep 23 2023 at 10:07):
This is a commit from #7244
Scott Morrison (Sep 23 2023 at 10:07):
Okay, you can try merging branch#DiscrTree.mapArrays in and see if it helps.
Scott Morrison (Sep 23 2023 at 10:08):
(If you'd made a branch for me I would have done the merge for you, and even followed up to check the CI...)
Eric Wieser (Sep 23 2023 at 10:21):
That branch doesn't appear to exist
Mauricio Collares (Sep 23 2023 at 10:32):
Joachim Breitner (Sep 23 2023 at 10:35):
@Scott Morrison , isn’t the monadic version unused anyways and can be replaced with the pure one in any case? (Unless I mis-grepped)
Eric Wieser (Sep 23 2023 at 10:47):
Scott Morrison said:
(If you'd made a branch for me I would have done the merge for you, and even followed up to check the CI...)
I've cherry-picked your change into #7244, as the head of that branch is still failing in the same way (and I'm done with the other changes now)
Eric Wieser (Sep 23 2023 at 12:24):
Thanks for trying Scott, but I'm afraid this didn't help
Scott Morrison (Sep 24 2023 at 00:48):
Can you reproduce locally?
Scott Morrison (Sep 24 2023 at 00:48):
I have some more code to inspect DiscrTree
s, I can send you this to try to find what is going wrong.
Scott Morrison (Sep 24 2023 at 00:50):
On master
, we already have some excessively long entries in the DiscrTree (2596!) I'm guessing you've just set a new record and gone over some bound.
Scott Morrison (Sep 24 2023 at 00:50):
Presumably the fix is to truncate the keys before we insert them into the DiscrTree.
Scott Morrison (Sep 24 2023 at 00:51):
Current longest key is for src#PowerBasis.quotientEquivQuotientMinpolyMap_symm_apply
Scott Morrison (Sep 24 2023 at 01:19):
This is the current "spectrum" of key lengths in the exact?
discrimination tree:
[(1, 19243), (2, 8706), (3, 10585), (4, 8973), (5, 10958), (6, 10332), (7, 9878), (8, 8599),
(9, 7929), (10, 7187), (11, 8856), (12, 6699), (13, 8449), (14, 5550), (15, 5543), (16, 5001),
(17, 5433), (18, 4457), (19, 4319), (20, 3699), (21, 3402), (22, 3227), (23, 3397), (24, 2777),
(25, 3211), (26, 2355), (27, 2334), (28, 2422), (29, 1933), (30, 1844), (31, 1807), (32, 1592),
(33, 1595), (34, 1624), (35, 1293), (36, 1310), (37, 1164), (38, 1263), (39, 1122), (40, 1138),
(41, 1038), (42, 950), (43, 905), (44, 743), (45, 776), (46, 753), (47, 740), (48, 692),
(49, 708), (50, 624), (51, 614), (52, 645), (53, 503), (54, 517), (55, 580), (56, 411), (57, 457),
(58, 478), (59, 461), (60, 398), (61, 366), (62, 327), (63, 339), (64, 341), (65, 332), (66, 295),
(67, 266), (68, 297), (69, 288), (70, 283), (71, 297), (72, 261), (73, 253), (74, 223), (75, 215),
(76, 238), (77, 214), (78, 213), (79, 227), (80, 200), (81, 181), (82, 185), (83, 188), (84, 153),
(85, 195), (86, 150), (87, 170), (88, 133), (89, 149), (90, 136), (91, 146), (92, 121), (93, 113),
(94, 139), (95, 131), (96, 130), (97, 96), (98, 116), (99, 123), (100, 115), (101, 91), (102, 73),
(103, 96), (104, 97), (105, 88), (106, 99), (107, 97), (108, 86), (109, 73), (110, 55), (111, 73),
(112, 81), (113, 76), (114, 65), (115, 47), (116, 78), (117, 68), (118, 75), (119, 58), (120, 59),
(121, 70), (122, 46), (123, 42), (124, 64), (125, 44), (126, 57), (127, 46), (128, 60), (129, 47),
(130, 55), (131, 49), (132, 52), (133, 50), (134, 51), (135, 36), (136, 31), (137, 38), (138, 35),
(139, 34), (140, 23), (141, 30), (142, 35), (143, 40), (144, 28), (145, 43), (146, 32), (147, 29),
(148, 19), (149, 24), (150, 37), (151, 32), (152, 20), (153, 27), (154, 28), (155, 22), (156, 27),
(157, 28), (158, 25), (159, 19), (160, 28), (161, 33), (162, 21), (163, 38), (164, 18), (165, 25),
(166, 28), (167, 18), (168, 25), (169, 12), (170, 19), (171, 15), (172, 16), (173, 13), (174, 20),
(175, 32), (176, 26), (177, 32), (178, 21), (179, 22), (180, 10), (181, 10), (182, 14), (183, 18),
(184, 11), (185, 11), (186, 7), (187, 14), (188, 17), (189, 21), (190, 7), (191, 12), (192, 16),
(193, 7), (194, 16), (195, 11), (196, 9), (197, 9), (198, 8), (199, 14), (200, 7), (201, 16),
(202, 11), (203, 6), (204, 7), (205, 5), (206, 14), (207, 20), (208, 12), (209, 16), (210, 15),
(211, 9), (212, 13), (213, 12), (214, 11), (215, 9), (216, 19), (217, 14), (218, 8), (219, 8),
(220, 4), (221, 5), (222, 14), (223, 6), (224, 11), (225, 1), (226, 12), (227, 12), (228, 7),
(229, 3), (230, 5), (231, 5), (232, 6), (233, 2), (234, 11), (235, 10), (236, 12), (237, 7),
(238, 3), (239, 3), (240, 2), (241, 6), (242, 5), (243, 8), (244, 11), (245, 9), (246, 5), (247, 4),
(248, 11), (249, 6), (250, 8), (251, 3), (252, 8), (253, 5), (254, 5), (255, 6), (256, 1), (257, 10),
(258, 5), (259, 8), (260, 4), (261, 6), (262, 3), (263, 3), (264, 9), (265, 4), (266, 2), (267, 5),
(268, 3), (269, 2), (270, 6), (271, 7), (272, 4), (273, 4), (274, 4), (275, 6), (276, 2), (277, 1),
(278, 1), (279, 4), (280, 2), (281, 3), (282, 3), (284, 4), (285, 2), (286, 2), (287, 4), (288, 4),
(289, 2), (290, 3), (291, 4), (292, 1), (293, 2), (295, 4), (296, 1), (297, 1), (298, 5), (299, 3),
(300, 4), (301, 3), (302, 2), (303, 3), (304, 5), (305, 3), (306, 1), (307, 2), (308, 4), (309, 2),
(310, 1), (312, 3), (314, 2), (315, 3), (316, 4), (317, 3), (318, 3), (319, 3), (320, 2), (321, 3),
(323, 1), (324, 3), (325, 1), (326, 3), (327, 5), (328, 1), (329, 6), (330, 3), (331, 1), (332, 5),
(333, 1), (334, 1), (336, 3), (338, 3), (339, 2), (340, 1), (341, 1), (342, 2), (343, 2), (344, 2),
(346, 4), (347, 1), (348, 1), (349, 2), (350, 3), (352, 4), (353, 3), (354, 3), (355, 2), (356, 2),
(357, 1), (358, 1), (359, 1), (360, 1), (361, 2), (362, 1), (364, 1), (367, 2), (368, 1), (369, 5),
(372, 1), (373, 5), (374, 2), (375, 1), (379, 3), (382, 2), (384, 1), (385, 3), (386, 2), (387, 2),
(388, 1), (392, 1), (393, 2), (394, 1), (395, 2), (396, 1), (397, 3), (398, 1), (400, 2), (401, 1),
(403, 1), (404, 3), (406, 1), (407, 2), (409, 3), (410, 1), (411, 3), (414, 1), (415, 2), (416, 1),
(417, 2), (419, 1), (421, 2), (422, 2), (423, 3), (424, 1), (425, 2), (426, 1), (428, 1), (429, 2),
(430, 4), (431, 1), (432, 2), (433, 1), (436, 1), (438, 2), (439, 2), (441, 1), (442, 1), (449, 1),
(451, 1), (453, 1), (454, 3), (459, 1), (461, 1), (462, 2), (466, 2), (471, 1), (472, 3), (474, 2),
(475, 1), (481, 1), (485, 2), (488, 1), (489, 1), (490, 2), (492, 1), (494, 1), (499, 2), (501, 1),
(503, 1), (509, 1), (510, 2), (511, 1), (512, 1), (513, 1), (515, 2), (517, 4), (518, 1), (520, 1),
(521, 2), (524, 2), (525, 2), (533, 2), (535, 1), (539, 2), (544, 1), (545, 1), (549, 1), (554, 3),
(555, 2), (560, 1), (566, 3), (570, 1), (575, 2), (582, 2), (585, 2), (587, 1), (591, 1), (592, 1),
(595, 1), (597, 3), (598, 1), (599, 2), (602, 1), (606, 1), (608, 1), (611, 2), (612, 1), (613, 1),
(614, 1), (619, 2), (621, 2), (626, 1), (628, 2), (635, 1), (657, 1), (667, 1), (676, 1), (693, 3),
(697, 1), (717, 1), (723, 2), (743, 2), (755, 1), (766, 1), (771, 1), (773, 1), (777, 1), (812, 1),
(813, 1), (826, 1), (831, 2), (845, 2), (852, 1), (895, 1), (926, 2), (960, 1), (965, 2), (1000, 1),
(1187, 1), (1376, 1), (1397, 1), (1482, 1), (1543, 1), (1572, 1), (1660, 1), (1710, 1), (1968, 1),
(2053, 1), (2597, 1)]
Scott Morrison (Sep 24 2023 at 01:23):
I propose we just truncate at 1000. This will be "safe" for this stackoverflow issue, and affects vanishingly few lemmas.
It's possible that truncating won't even prevent those few lemmas from being applied (possibly if we truncate carefully).
However I'm inclined to not spend the time investigating that, and just leave a note.
Scott Morrison (Sep 24 2023 at 01:32):
@Eric Wieser, can I leave #7345 in your hands?
Joachim Breitner (Sep 24 2023 at 07:36):
This is a trie data structure, right? We could implement path compression: a sequence of degree one nodes without associated values is stored in a flat array. Should also greatly reduce the footprint of the trie.
Joachim Breitner (Sep 24 2023 at 07:46):
I see more optimization potential there (array of tuples looks more expensive in terms of memory footprint and cache lines than a tuple of arrays). The DiscrTree is used all over the place, so maybe worthwhile to see what middle-hanging fruit there is left to pluck?
Sebastian Ullrich (Sep 24 2023 at 07:48):
Looking into optimizations would be great as we do use it all over the place! But also it is an approximating data structure. We should evaluate whether there is even any benefit of keys of length 1000 compared to, say, 100, or even 10.
Joachim Breitner (Sep 24 2023 at 08:03):
If we had dependent arrays we could safely build compact data structures (e.g. a ArrayPair a b
stored as a single array with 2n elements).
Joachim Breitner (Sep 24 2023 at 15:28):
(see <https://leanprover.zulipchat.com/#narrow/stream/287929-mathlib4/topic/More.20compact.20DiscrTree> for a possible way to make that data structure smaller and much less likely to cause stack issues.)
Eric Wieser (Sep 25 2023 at 16:49):
Scott Morrison said:
Eric Wieser, can I leave #7345 in your hands?
I tested it but unfortunately it didn't help.
Joachim Breitner (Sep 25 2023 at 21:17):
https://github.com/leanprover/lean4/pull/2577 should help reliably, but it's not decided yet if that change will go in
Scott Morrison (Sep 25 2023 at 23:27):
@Eric Wieser there's a toolchain for that PR as leanprover/lean4-pr-releases:pr-release-2577. Could you test your branch against that?
I'm worried that if truncating at 1000 didn't help, we are misdiagnosing the problem.
Eric Wieser (Sep 25 2023 at 23:29):
I likely won't have any time to do that tomorrow, but feel free to push whatever to #7244 to diagnose as long as you don't rewrite the history of commits authored by me
Eric Wieser (Sep 25 2023 at 23:29):
(force-pushing away my cherry-pick of your change is fine)
Scott Morrison (Sep 25 2023 at 23:30):
No hurry. @Joachim Breitner's lean4#2577 may not happen quickly, in any case.
Eric Wieser (Sep 25 2023 at 23:30):
(relatedly, a review of #7237 would substantially decrease the size of that PR)
Eric Wieser (Sep 25 2023 at 23:31):
I assume you mean lean4#2577?
Joachim Breitner (Sep 26 2023 at 11:23):
I pushed a CPS’sed and thus tail-recursive implementation of Trie.mapArray
to #7244… but it still overflows the stack.
It says “interpreter stacktrace” – does the interpreter not perform tail-calls without using stack space?
And is it expected that there is an interpreter involved, shouldn’t this be using compiled code?
Mario Carneiro (Sep 26 2023 at 11:31):
Is there a need for this to be monadic?
Mario Carneiro (Sep 26 2023 at 11:33):
CPS'ing this function doesn't help, you still will get a giant stack when evaluating the closure you have built up
Joachim Breitner (Sep 26 2023 at 11:33):
No, but switching to a pure mapArray
didn’t help, Scott tried that already (see https://leanprover.zulipchat.com/#narrow/stream/287929-mathlib4/topic/CI.20failure.20when.20building.20MathlibExtras/near/392674332)
Joachim Breitner (Sep 26 2023 at 11:34):
Hmm, I see. I was expecting that if each closure itself continues in a tail-recursive way, no stack would be used.
Joachim Breitner (Sep 26 2023 at 11:34):
(Of course, the nested closures would be giant on the heap, but that’s heap space then.)
Mario Carneiro (Sep 26 2023 at 11:34):
this is the interpreter stack we're talking about, it's already "on the heap"
Joachim Breitner (Sep 26 2023 at 11:35):
So the interpreter doesn’t do tail-call optimization?
Mario Carneiro (Sep 26 2023 at 11:35):
I ... don't think so?
Mario Carneiro (Sep 26 2023 at 11:36):
the compiler is the one that does TCO
Mario Carneiro (Sep 26 2023 at 11:36):
including the byte code compiler which prepares things for the interpreter
Joachim Breitner (Sep 26 2023 at 11:37):
Ok, then I was too optimistic :-)
Joachim Breitner (Sep 26 2023 at 11:38):
But what about the other question: Why is this interpreted, and not compiled? Should it be that way?
Mario Carneiro (Sep 26 2023 at 11:39):
why would it be compiled? Isn't it just an #eval
in MathlibExtras
?
Joachim Breitner (Sep 26 2023 at 11:40):
(It seems that In src/library/compiler/ir_interpreter.cpp has support for non-mutual tail-recursion, but no other mentions of tail
in that file.)
Joachim Breitner (Sep 26 2023 at 11:43):
I am still ignorant of when things are evaluated/executed how :-).
Maybe again optimistically, I’d assume it would use compiled code when evaluating imported things – the intepreter code says “Whenever possible, we try to switch to native code by checking for the mangled symbol”
So if we’d rewrite that from being an #eval
to a proper executable that loads Mathlib
from it’s main
function it would be using compiled?
Joachim Breitner (Sep 26 2023 at 11:43):
Would it make a difference whether Trie.mapArray
is defined in mathlib, std or init?
Mario Carneiro (Sep 26 2023 at 11:46):
Here's a slightly cleaned up version of the IR for these two functions (from set_option trace.compiler.ir.result true
):
def go._rarg._lambda_1 (x_1 : u8) (x_2 : obj) (x_3 : obj) (x_4 : obj) : obj :=
let x_5 : obj := ctor_0[Lean.Meta.DiscrTree.Trie.node] x_2 x_4;
let x_6 : obj := app x_3 x_5;
ret x_6
def go._rarg (x_1 : u8) (x_2 : ◾) (x_3 : obj) (x_4 : obj) (x_5 : obj) : obj :=
let x_6 : obj := proj[0] x_4;
inc x_6;
let x_7 : obj := proj[1] x_4;
inc x_7;
dec x_4;
inc x_3;
let x_8 : obj := app x_3 x_6;
let x_9 : obj := Array.size ◾ x_7;
let x_10 : obj := Array.mkEmpty ◾ x_9;
dec x_9;
let x_11 : obj := box x_1;
let x_12 : obj := pap go._rarg._lambda_1._boxed x_11 x_8 x_5;
let x_13 : obj := goA._rarg x_1 ◾ x_3 x_7 x_10 x_12;
ret x_13
def goA._rarg._lambda_1 (x_1 : obj) (x_2 : obj) (x_3 : u8) (x_4 : obj) (x_5 : obj) (x_6 : obj) (x_7 : obj) : obj :=
let x_8 : obj := ctor_0[Prod.mk] x_1 x_7;
let x_9 : obj := Array.push ◾ x_2 x_8;
let x_10 : obj := goA._rarg x_3 ◾ x_4 x_5 x_9 x_6;
ret x_10
def goA._rarg (x_1 : u8) (x_2 : ◾) (x_3 : obj) (x_4 : obj) (x_5 : obj) (x_6 : obj) : obj :=
let x_7 : obj := Array.size ◾ x_5;
let x_8 : obj := Array.size ◾ x_4;
let x_9 : u8 := Nat.decLt x_7 x_8;
dec x_8;
case x_9 : u8 of
Bool.false →
dec x_7;
dec x_4;
dec x_3;
let x_10 : obj := app x_6 x_5;
ret x_10
Bool.true →
let x_11 : obj := Array.get ◾ x_4 x_7;
dec x_7;
let x_12 : obj := proj[0] x_11;
inc x_12;
let x_13 : obj := proj[1] x_11;
inc x_13;
dec x_11;
let x_14 : obj := box x_1;
inc x_3;
let x_15 : obj := pap goA._rarg._lambda_1._boxed x_12 x_5 x_14 x_3 x_4 x_6;
let x_16 : obj := go._rarg x_1 ◾ x_3 x_13 x_15;
ret x_16
Mario Carneiro (Sep 26 2023 at 11:48):
One thing that immediately sticks out is that if the interpreter wanted to notice that e.g. the call to go._rarg
on the second to last line is a tail call it would have to look ahead to the ret
instruction
Mario Carneiro (Sep 26 2023 at 11:49):
IRs designed for tail calls make this syntactically obvious with a call like ret go._rarg x_1 ◾ x_3 x_13 x_15
, so this bodes ill for the IR having native tail call support
Mario Carneiro (Sep 26 2023 at 11:51):
Joachim Breitner said:
Maybe again optimistically, I’d assume it would use compiled code when evaluating imported things – the intepreter code says “Whenever possible, we try to switch to native code by checking for the mangled symbol”
That's the precompileImports
setting. We don't use it in mathlib because it has issues with having to dynamically link hundreds or thousands of files, which seems not to be an intended use case
Mario Carneiro (Sep 26 2023 at 11:54):
Joachim Breitner said:
So if we’d rewrite that from being an
#eval
to a proper executable that loadsMathlib
from it’smain
function it would be using compiled?
Yes, if you ran it as an executable (e.g. like lake exe cache
, which is compiled).
Would it make a difference whether Trie.mapArray is defined in mathlib, std or init?
No, you would have to compile any dependencies in order to make an executable, so it would be compiled no matter where it ends up (although you probably want to not pull in too much of mathlib to compile this executable)
Mario Carneiro (Sep 26 2023 at 11:57):
actually, it seems the interpreter does have tail recursion detection after all:
// tail recursion?
if (expr_tag(e) == expr_kind::FAp && expr_fap_fun(e) == get_frame().m_fn &&
fn_body_tag(cont) == fn_body_kind::Ret && !arg_is_irrelevant(fn_body_ret_arg(cont)) &&
arg_var_id(fn_body_ret_arg(cont)) == fn_body_vdecl_var(b)) {
Joachim Breitner (Sep 26 2023 at 11:58):
That’s what I saw, too. But that’s only recursion, not arbitrary calls, right?
Mario Carneiro (Sep 26 2023 at 11:59):
yeah, the first line is checking if it is a full application to the current function
Mario Carneiro (Sep 26 2023 at 12:00):
In any case I think the proper way to do this iteratively is using an explicit stack
Joachim Breitner (Sep 26 2023 at 12:05):
Let me try turning this into a single tail-recursive function, just to see if that optimization kicks in :-)
Joachim Breitner (Sep 26 2023 at 12:09):
We don't use it in mathlib because it has issues with having to dynamically link hundreds or thousands of files, which seems not to be an intended use case
I see that this would be expensive.
But I am a bit surprised. Do do all the tactics like simp
etc. run always completely interpreted? Isn’t this significantly slower than if we could run all meta-code in compiled form? (Is that part of what Sebastian wants to solve by having different import for meta and non-meta definitions)?
Mario Carneiro (Sep 26 2023 at 12:10):
no, things from init/lean/lake are compiled
Joachim Breitner (Sep 26 2023 at 12:17):
Ah, if mapArrays
was defined in init, it would have avoided the interpreter here.
Joachim Breitner (Sep 26 2023 at 12:23):
Thanks anyways, that was very helpful. Just to double check that I got this right:
In mathlib, meta-code (tactics, elaboration) is performed using compiled code if its defined in the lean
repo, but if it’s defined in std
or aesop
or mathlib
, it’s executed by the IR interpreter.
In particular, all the data structures defined in std
, and operations in std
for existing data structures, are not necessary as fast as they could be.
Mario Carneiro (Sep 26 2023 at 12:24):
here's my attempt at turning this into a single tail-recursive function with an explicit stack:
inductive Cont (α β : Type) (s)
| nil : Cont α β s
| cons : Array β → Array (Key s × Trie α s) → Array (Key s × Trie β s) →
Key s → Cont α β s → Cont α β s
inductive State (α β : Type) (s)
| arr : Array β → Array (Key s × Trie α s) → Array (Key s × Trie β s) →
Cont α β s → State α β s
| ret : Cont α β s → Trie β s → State α β s
/-- Apply a monadic function to the array of values at each node in a `DiscrTree`. -/
partial def Trie.mapArrays (t : Trie α s) (f : Array α → Array β) : Trie β s :=
go (start t .nil)
where
@[inline] start : Trie α s → Cont α β s → State α β s
| .node vs cs, k => .arr (f vs) cs (.mkEmpty cs.size) k
go : State α β s → Trie β s
| .arr vs' cs cs' k =>
if h : cs'.size < cs.size then
let (key, c) := cs.get ⟨cs'.size, h⟩
go (start c (.cons vs' cs cs' key k))
else
go (.ret k (.node vs' cs'))
| .ret .nil t => t
| .ret (.cons vs' cs cs' key k) c' =>
go (.arr vs' cs (cs'.push (key, c')) k)
Mario Carneiro (Sep 26 2023 at 12:25):
yes to the stuff about compilation in std/aesop/mathlib
Joachim Breitner (Sep 26 2023 at 12:48):
Nice; is that a straight forward defunctionalization of the CPS variant?
Here is another variant, using arrays for the explicit stack (essentially an implicit zipper). Fun puzzle indeed :-)
/-- Apply a function to the array of values at each node in a `DiscrTree`. -/
partial def Trie.mapArrays (t : Trie α s) (f : Array α → Array β) : Trie β s :=
let .node vs0 cs0 := t
go #[] #[] (f vs0) 0 cs0
where
go parents cs' vs' i cs :=
if h : i < cs.size then -- move down
let (k, .node vs0 cs0) := cs.get ⟨i, h⟩
go (parents.push (cs', vs', i, cs, k)) #[] (f vs0) 0 cs0
else
let t := .node vs' cs'
if parents.isEmpty then -- we are done
t
else -- move up one level
let (cs', vs', i, cs, k) := parents.back
go parents.pop (cs'.push (k, t)) vs' (i + 1) cs
Mario Carneiro (Sep 26 2023 at 12:55):
I don't know whether it is better to use an array or a list here; the Cont
type is list-like although it is a custom inductive so that it gets a flatter representation
Mario Carneiro (Sep 26 2023 at 12:56):
in your case you probably want to also use a custom inductive for the elements of the parents
array because otherwise you need 5 allocations for the tuples
Joachim Breitner (Sep 26 2023 at 12:58):
I mean, this is all mostly just idle finger exercises. I don't think we should have to rewrite this function, and instead the trie shouldn’t be excessively deep.
Mario Carneiro (Sep 26 2023 at 12:58):
or lean should just get good and handle deep stacks
Joachim Breitner (Sep 26 2023 at 12:59):
s/or/and/ :-)
Mario Carneiro (Sep 26 2023 at 13:00):
another possibility is to detect the case when the array has length 1 and tail recurse then
Mario Carneiro (Sep 26 2023 at 13:00):
because I'm guessing that most of the deep recursion is coming from non-branching nodes
Joachim Breitner (Sep 26 2023 at 13:01):
Yes, definitely – that’s why I am compressing them in lean4#2557. But you’d still need to pass some form of context or continuation to rebuild that path here, wouldn’t you?
Joachim Breitner (Sep 26 2023 at 13:03):
I am still wondering what to think of meta code from std not being compiled…
do we have a sense of how much slow down this causes? Both for compiling mathlib, but also for things like people running apply
interactively?
Is it possible to precompileImports
selectively (e.g. “precompile all of Std
, Mathlib.Meta
, Mathlib.Tactics
”)
Mario Carneiro (Sep 26 2023 at 13:05):
partial def Trie.mapArrays (t : Trie α s) (f : Array α → Array β) : Trie β s :=
go t id
where
go
| .node vs children, k =>
if children.size == 1 then
let (key, t') := children[0]!
go t' fun c => k (.node (f vs) #[(key, c)])
else
k (.node (f vs) (children.map fun (k, t') => (k, t'.mapArrays f)))
Mario Carneiro (Sep 26 2023 at 13:07):
I think it's like a 20%-40% speedup IIRC
Mario Carneiro (Sep 26 2023 at 13:07):
but it helps less than you might think because most of the heavy lifting tactics are compiled
Mario Carneiro (Sep 26 2023 at 13:08):
norm_num
and ring
benefit from compilation
Mario Carneiro (Sep 26 2023 at 13:09):
I'm not sure what you mean by running apply
interactively
Mario Carneiro (Sep 26 2023 at 13:09):
apply
is compiled
Mario Carneiro (Sep 26 2023 at 13:10):
No, it is not currently possible to use precompileImports
selectively, and I agree that we want to do something like that
Joachim Breitner (Sep 26 2023 at 13:12):
And that go
’s continuation will not itself grow large?
Joachim Breitner (Sep 26 2023 at 13:12):
With lists for the parents stack it gets even prettier:
/-- Apply a function to the array of values at each node in a `DiscrTree`. -/
partial def Trie.mapArrays (t : Trie α s) (f : Array α → Array β) : Trie β s :=
let .node vs0 cs0 := t
go [] #[] (f vs0) cs0.toList
where
go | [], cs', vs', [] => .node vs' cs'
| (cs', vs', cs, k)::ps, cs'', vs'', [] => go ps (cs'.push (k, .node vs'' cs'')) vs' cs
| ps, cs', vs', (k, .node vs0 cs0)::cs => go ((cs', vs', cs, k)::ps) #[] (f vs0) cs0.toList
Joachim Breitner (Sep 26 2023 at 13:12):
Sorry, I meant apply?
.
Mario Carneiro (Sep 26 2023 at 13:15):
A closure stack is basically an ad-hoc linked list
Mario Carneiro (Sep 26 2023 at 13:17):
The continuation there will only grow as long as the length of a run of length-1 arrays, because it is reset in the else
case (which recurses via the call to mapArrays
)
Mario Carneiro (Sep 26 2023 at 13:18):
I guess it could use go t' id
there instead, not sure if lean knows to inline it or whether it will avoid doing so because of the recursion
Joachim Breitner (Sep 26 2023 at 13:23):
I expect these runs to be almost as long as the height of the tree, so the resetting won't safe us in this case.
Mario Carneiro (Sep 26 2023 at 13:23):
no, long runs are fine here
Mario Carneiro (Sep 26 2023 at 13:25):
hm, actually maybe not, when evaluating the closure stack it doesn't tail-call even though it is calling itself because it is using Ap
instead of FAp
for the call
Mario Carneiro (Sep 26 2023 at 13:29):
so you would need to defunctionalize
partial def Trie.mapArrays (t : Trie α s) (f : Array α → Array β) : Trie β s :=
go t []
where
go
| .node vs children, k =>
if children.size == 1 then
let (key, t') := children[0]!
go t' ((f vs, key) :: k)
else
ret k (.node (f vs) (children.map fun (k, t') => (k, go t' [])))
ret
| [], t => t
| (vs', key) :: k, c => ret k (.node vs' #[(key, c)])
which loses a lot of the appeal
Joachim Breitner (Sep 26 2023 at 13:30):
Yeah, then you can just defunctionalize the whole thing
Mario Carneiro (Sep 26 2023 at 13:31):
the interpreter should really just get good, this is silly
Joachim Breitner (Sep 26 2023 at 13:31):
Good = arbitrary tail calls, or good = unbounded stack? (Or both)
Mario Carneiro (Sep 26 2023 at 13:32):
arbitrary tail calls
Mario Carneiro (Sep 26 2023 at 13:32):
there are defensible reasons for not having unbounded stack
Joachim Breitner (Sep 26 2023 at 13:32):
So you'd have to CPS, but wouldn't have to defunctionalize
Mario Carneiro (Sep 26 2023 at 13:32):
like being able to print that stack trace
Mario Carneiro (Sep 26 2023 at 13:33):
but it should be able to do tail calls to variable functions and other constant functions
Joachim Breitner (Sep 26 2023 at 14:48):
Looking at the lake code, it already seems well equipped the support selective precompilation of just some modules. It might suffice to change the per-library boolean queried in https://github.com/leanprover/lean4/blob/e6fe3bee71f359c866a29766f774864577099778/src/lake/Lake/Config/Module.lean#L119 with something selective (list of names as a first approximation).
@Mac Malone , would that be a useful contribution, or are there bigger issues with that I can't see yet?
Jannis Limperg (Sep 26 2023 at 14:54):
We used to enable Lake's precompileModules
option for Aesop and std (?), but this broke the cache. I'm not sure what the current status of this issue is.
Joachim Breitner (Sep 26 2023 at 15:03):
Thanks, that explains why not even std
is compiled; I assume there must have been a reason.
Sebastian Ullrich (Sep 26 2023 at 15:08):
The most recent investigation for mathlib was at https://github.com/leanprover-community/mathlib4/pull/3575. It looked like a 5% speedup was feasible but would certainly require more work in Lake and cache as well as testing on all platforms
Joachim Breitner (Sep 26 2023 at 16:40):
@Eric Wieser , I pushed my stackless implementation to your branch (which lets CI proceed), so that you are unblocked there.
Eric Wieser (Sep 26 2023 at 16:40):
I'm blocked by the dependent PR anyway, but thanks!
Joachim Breitner (Sep 26 2023 at 16:49):
Ok, in that case I better submit it as its own PR (mathlib4#7387), before more innocent PRs like yours get stuck due to this.
Eric Wieser (Oct 09 2023 at 20:11):
This is now the last dependency of #7244
Joachim Breitner (Oct 09 2023 at 20:54):
If you don't see anything wrong with it and it unblocks you, you can just merge it, I'd say.
Eric Wieser (Oct 09 2023 at 20:55):
I think I'd rather have the eyes of @Scott Morrison or @Mario Carneiro give it a final review
Joachim Breitner (Oct 09 2023 at 20:57):
:+1:
Last updated: Dec 20 2023 at 11:08 UTC