Zulip Chat Archive

Stream: general

Topic: knuth bendix completion

Jared green (Aug 11 2024 at 16:04):

what would it take to implement a knuth-bendix completion tactic (kb) where from available simp lemmas(if they happen to terminate), new unnamed ones are created out of critical pairs, until the equivalence to be proven can be proven by simp, and the new unnamed simp lemmas persist into later calls to kb, even in a different file (until they are named)?

Andrés Goens (Aug 15 2024 at 20:02):

hmm, that's an interesting thought. Not sure you want this as a tactic tbh, it's probably more like a custom command. Also the idea of persisting them in a different file gives me a bit of pause, Lean already generates a bunch of stuff while elaborating (like recursors, noconfusion lemmas, equational lemmas on recursive definitions,etc), why woulnt they have a scheme like that and stay in memory?

What it would take depends a lot on how familiar you are with metaprogramming, I reckon. A baseline prototype should be relatively simple, but e.g. dealing with typclasses properly might be a bit more interesting, etc.

How seriously are you thinking about doing this? I've been thinking of implementing knuth bendix for a while, would be happy to help out with this

Jared green (Aug 15 2024 at 20:28):

i didnt know whether or how the persistance could be done, especially through imports, though surely @[simp] and other annotations do exactly that. i have no knowledge of how to do metaprogramming, only just now reading the book on it. the thought was triggered by seeing the equality saturation
https://github.com/opencompl/egg-tactic-code?tab=readme-ov-file
which does a similar thing, and i wondered if in some cases knuth bendix could be more efficient, or even a combination of both.

Andrés Goens (Aug 16 2024 at 05:45):

yeah, I think equality saturation is complementary to knuth bendix completion, knuth bendix should be better when it works (because it's a 1-off thing and gives you a set of rewrites, whereas equality saturation works for concrete terms) but often knuth bendix will not work/terminate, and there equality saturation should be better

Andrés Goens (Aug 16 2024 at 05:46):

btw, that's our old prototype repo that you linked, we've been working on a newer version based on it with @Marcus Rossel https://github.com/marcusrossel/lean-egg

Andrés Goens (Aug 16 2024 at 05:47):

this one's still WIP but much more versatile (e.g. when dealing with typeclasses, etc), we should probably edit the readme on the old one to mention this

Jared green (Aug 16 2024 at 06:15):

as they come, neither one terminates generally(though for different reasons). but from what i read both are 'complete' in the same way. ('if from the provided equations a given equivalence can be proven, _ will be able to eventually prove it') only difference being that e-saturation doesn't need the initial equations to be reducing, or rely on a given ordering metric.
i have an idea for how e-saturation can stop much sooner, though im not sure whether it sacrifices completeness. it goes like this: alter the e-graph structure so that when a node is part of an e-node, all the links to its candidate parents are instead connected to the e-node. this way the same equivalence of the form x ~> t(x) doesnt end up being chained twice.
as for the combination, i was thinking that (non-simp and non-orientable) equations would be handled by e-sat after simp, then for each element of the resulting equivalence class, repeat the process and then pick the least one.

Andrés Goens (Aug 16 2024 at 09:04):

you're right, let me be more precise :sweat_smile: in the case that the word problem for the theory defined by the set of equations is decidable and a confluent, terminating rewrite system exists, both equality saturation and knuth bendix completion will solve the word problem for that set of equations. In that case, I think knuth bendix is preferable because it just has to be done once, and then simp will solve the word problem more efficiently for that set of equations.

I think the interesting case is the semi-decidable one. Knuth bendix will not terminate, but eqsat might. In practice, it seems eqsat always terminates way before saturation/quite quickly, or it doesn't terminate at all. In that case eqsat is preferable because it works effectively as a semi-decision procedure, whereas knuth bendix just doesn't work at all (I guess you could just stop knuth bendix at some point, take some of the equations generated byit and use them for simp as a semi-decision procedure too, that's an interesting comparison and I'm not sure what would win there)

Andrés Goens (Aug 16 2024 at 09:04):

Jared green said:

i have an idea for how e-saturation can stop much sooner, though im not sure whether it sacrifices completeness. it goes like this: alter the e-graph structure so that when a node is part of an e-node, all the links to its candidate parents are instead connected to the e-node. this way the same equivalence of the form x ~> t(x) doesnt end up being chained twice.

not sure I quite understand this, what do you mean when a node is part of an e-node?

Andrés Goens (Aug 16 2024 at 09:06):

Jared green said:

as for the combination, i was thinking that (non-simp and non-orientable) equations would be handled by e-sat after simp, then for each element of the resulting equivalence class, repeat the process and then pick the least one.

oh, that's an interesting thought, using simp before eqsat! might actually be more efficient, not sure, we should test that :thinking: but where does knuth bendix come in here?

Jared green (Aug 16 2024 at 13:31):

i meant a term node, which represents either a value/variable or a function, and links to its inputs as its children. in an e-graph, term nodes are placed inside an e-node, representing an equivalence class, so that any one term in it can be switched for another, without storing full terms.
as for the combination, knuth bendix would be the outer loop, and the reductions that knuth bendix does would be done in that way.

Andrés Goens (Aug 16 2024 at 13:52):

i meant a term node, which represents either a value/variable or a function, and links to its inputs as its children. in an e-graph, term nodes are placed inside an e-node, representing an equivalence class, so that any one term in it can be switched for another, without storing full terms.

nope, not really, in an e-graph, terms are built from e-nodes, which have e-classes as children, and then e-classes have multiple e-nodes representing the equivalence classes. Full terms can then be extracted by selecting e-nodes for every e-class. What you are calling an e-node sounds like a mix of an e-class and an e-node.

Andrés Goens (Aug 16 2024 at 13:53):

as for the combination, knuth bendix would be the outer loop, and the reductions that knuth bendix does would be done in that way.

So you want eqsat to prove the equations generated from knuth bendix, do I get that right?

Jared green (Aug 16 2024 at 13:58):

it didnt look that way in the presentation i watched.
and no, e-sat would be used to find new terms when only non-orientable equations are applicable to a simplified term, and if any of the new terms can be further simplified, we try simplifying them.

Andrés Goens (Aug 16 2024 at 13:59):

it didnt look that way in the presentation i watched.

what presentation was that?

Jared green (Aug 16 2024 at 14:01):

for knuth bendix algorithm, to prove an equivalence, each time a critical pair is found, reduction is tried on both sides of it and if they converge, thats the proof.

Jared green (Aug 16 2024 at 14:05):

https://www.youtube.com/watch?v=ZLkHl7YmpIY 3 minutes in, i must have mispercieved it.

Andrés Goens (Aug 16 2024 at 14:09):

Jared green said:

https://www.youtube.com/watch?v=ZLkHl7YmpIY 3 minutes in, i must have mispercieved it.

(just for context, to be sure, did you look at the author list at the beginning of that talk?)

Jared green (Aug 16 2024 at 14:10):

i did now

Andrés Goens (Aug 16 2024 at 14:14):

it's a bit confusing the terminology, so I'm still not sure what you mean with how to speed it up, but it would be cool to understand and try it out maybe!

Jared green (Aug 16 2024 at 14:16):

somehow i actually remember a earlier version being different...

Jared green (Aug 16 2024 at 14:18):

the combination i suggest, im not even sure its necessarily faster than (unfailing) KB, but at least it doesnt require backtracking.

Andrés Goens (Aug 16 2024 at 14:20):

I see, sounds interesting. If you want to give that a try I'm happy to give you a hand (also with using our eqsat tactic for that version)

Jared green (Aug 16 2024 at 14:50):

if we do, what are we using as a reference implementation of KB?

Jared green (Aug 16 2024 at 17:38):

and whose github will it be on?

Jared green (Aug 17 2024 at 17:04):

it occurs to me that the combination i came up with would, instead of creating critical pairs, create critical sets of n terms from which (n-1) equations would be generated, in each iteration of its main loop.

Notification Bot (Aug 20 2024 at 06:56):

26 messages were moved from this topic to #general > simp local confluence checker by Joachim Breitner.

Last updated: May 02 2025 at 03:31 UTC