Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: dreamcoder


mikey liu (Apr 12 2021 at 23:28):

hi, i've been playing with ATPs for a bit and just saw this paper called DreamCoder. it learns concepts by itself and is able to perform certain nontrivial tasks given just a few examples (sorting, painting specific images, etc.). it is also able to grow on its own, so to speak, by a process called dreaming, which i think is kinda like a more sophisticated 'exploratory play' in reinforcement learning. i felt like this might work very well with Lean and other ATPs in general to further automate theorem proving. here's the link to the paper: https://arxiv.org/abs/2006.08381 and here's its github link (if you want to install it, search for 'singularity 2.5' and install that as well): https://github.com/ellisk42/ec

Jason Rute (Apr 13 2021 at 02:41):

I’ve been meaning for a long time to write up something on this. I agree this paper has a lot of potential (so much that I feel if I start talking about all my ideas related to it that I won’t be able to stop :smile:).

Jason Rute (Apr 13 2021 at 02:41):

If I write something up it will be on the #Machine Learning for Theorem Proving stream.

Jason Rute (Apr 13 2021 at 02:43):

Also this might be a coincidence but Yannic Kilcher just made a video on this paper yesterday: https://m.youtube.com/watch?v=qtu0aSTDE2I

Jason Rute (Apr 13 2021 at 02:44):

Also Kevin Ellis (the lead author) has a video: https://m.youtube.com/watch?v=NYIeP1hns6A&feature=youtu.be

Jason Rute (Apr 13 2021 at 02:59):

@mikey liu have you been able to run it? If so, how long does it take to train?

Notification Bot (Apr 13 2021 at 03:46):

This topic was moved here from #general > dreamcoder by Scott Morrison

mikey liu (Apr 13 2021 at 10:13):

Jason Rute said:

Also this might be a coincidence but Yannic Kilcher just made a video on this paper yesterday: https://m.youtube.com/watch?v=qtu0aSTDE2I

not a coincidence. i sub to his channel. otherwise i would not be able to dig up something like this.

Joe Palermo (S2'17) (Apr 13 2021 at 19:22):

This is a really cool paper! @Jason Rute I would love to hear your thoughts on it if you decide to do a writeup.

I can imagine applying this to theorem proving in Lean.

The wake phase could be proof search with a model like gpt-f.

The sleep-abstraction phase could involve constructing new tactics by searching for frequently used combinations of tactics. Essentially, abstracting new tactics from parts of proofs found during the wake phase.
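
A crude first cut at that mining step could just count frequent tactic n-grams over a proof corpus. Here is a minimal sketch (DreamCoder's real abstraction phase does compression over program fragments rather than frequency counting, and all names here are hypothetical):

```python
from collections import Counter

def mine_tactic_ngrams(proofs, n=2, min_count=2):
    """Count length-n tactic subsequences across a corpus of proofs.

    `proofs` is a list of proofs, each given as a list of tactic names.
    Frequent n-grams are candidates for abstraction into new tactics.
    """
    counts = Counter()
    for tactics in proofs:
        for i in range(len(tactics) - n + 1):
            counts[tuple(tactics[i:i + n])] += 1
    return [(g, c) for g, c in counts.most_common() if c >= min_count]

proofs = [["intro", "simp", "refl"], ["intro", "simp", "ring"], ["intro", "simp"]]
print(mine_tactic_ngrams(proofs))  # [(('intro', 'simp'), 3)] -> abstract `intro; simp`
```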

The sleep-dreaming phase appears to require randomly generating new theorems. @Jason Rute , as you know I've been thinking a lot about how to do this in Lean (for others, see this thread: https://leanprover.zulipchat.com/#narrow/stream/113489-new-members/topic/Inferring.20Theorem.20from.20a.20Proof.20Term.3F).

I think being able to generate new theorems is critical for open-ended improvement in neural provers. I also think it may be possible to get a feedback loop working between a neural generator and a neural prover (basically a form of asymmetric self-play, unlike the symmetric self-play seen in systems like AlphaZero/MuZero). I have some ideas for how to do this.

One of the main problems is how you optimize the theorem generator. A purely adversarial objective (as in GANs) is probably not going to work. One idea is to optimize the generator to produce theorems that are near the limit of the prover's capability. Since the prover uses tree search (again, let's assume a gpt-f-like model), you can use the branch factor of the prover's search tree as a metric for how hard the proof was. If the prover fails completely, there's nothing you can say except that the theorem was too hard. But if it succeeds, then the branch factor is a measure of how much search was required: an easy proof requires little search (the branch factor could be as low as 1 if the search is linear). The prover's search times out after a certain number of steps, so one could optimize the generator (via RL) to produce theorems that induce a large branch factor in the prover's search but are still provable within the timeout limit.
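
To make that objective concrete, here is a minimal sketch of the reward signal (purely illustrative; the `prover` interface and its return values are hypothetical):

```python
def generator_reward(theorem, prover, max_steps=1000):
    """Reward a generated theorem by how much search the prover needed.

    `prover(theorem, max_steps)` is a hypothetical interface returning
    (success, nodes_expanded, proof_depth). The effective branch factor
    is estimated as nodes_expanded ** (1 / proof_depth).
    """
    success, nodes_expanded, proof_depth = prover(theorem, max_steps)
    if not success or proof_depth == 0:
        return 0.0  # unprovable within the timeout, or trivial: no reward
    # Larger branch factor = harder, but still provable within the budget.
    return nodes_expanded ** (1.0 / proof_depth)

# Toy demo with a fake prover that "finds" a depth-3 proof after 27 nodes:
fake_prover = lambda thm, max_steps: (True, 27, 3)
print(generator_reward("<some theorem>", fake_prover))  # 3.0
```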

Would love to know what people think of this idea, especially reasons it might not work or ways one could improve it.

Overall I think the wake-abstraction-dream cycle is a very interesting framework for advancing program synthesis / reasoning. I'm reminded of Francois Chollet's recent talk on program synthesis. I did a write-up on it if you prefer reading. Basically he argues that abstraction is the key to generalization, and that strong generalization from little experience is the essence of intelligence. Furthermore, he argues that while deep learning is great at what he calls "value-centric abstraction", it cannot perform "program-centric abstraction", the solution to which he claims is program synthesis.

Jason Rute (Apr 14 2021 at 00:16):

For anyone looking for more details on DreamCoder (like how does it actually work), here is the extended version of the paper: https://web.mit.edu/ellisk/www/documents/dreamcoder_with_supplement.pdf

Jason Rute (Apr 14 2021 at 00:28):

Speaking of Chollet and his views, you may already know this, but there is a group (or two groups?) at MIT working on applying DreamCoder to Chollet's ARC challenge. Here is a preliminary paper, a news article, and an informal talk.

Joe Palermo (S2'17) (Apr 14 2021 at 16:40):

Thanks for sharing that. I was aware some people at MIT were working on applying DreamCoder to ARC but hadn't seen that talk. It's a very interesting talk. It seems like what they're proposing has a shot at really working on ARC.

Daniel Selsam (Apr 14 2021 at 17:57):

FYI @Ryan Krueger @Jesse Michael Han and I played around with ARC last year. We built an extremely flexible higher-order tactic-based synthesis framework for it (https://github.com/dselsam/arc). It is essentially a prototype of the framework we want to try on IMO. Our ARC solver didn't solve all of the training/eval problems, but I think it is just a few missing features/abstractions away from solving almost all of them.

Jason Rute (Apr 14 2021 at 23:01):

@Daniel Selsam My understanding of your ARC work (based on Jesse's writeup) is that you built a hand-crafted DSL of components with branching points. Then you efficiently searched over the branching points, finding programs which could solve each ARC problem. When you say "a few missing features/abstractions", do you think that approach is scalable to solving ARC, or do you need machine learning to make the search space efficient (and possibly, like DreamCoder, expand your DSL)?

Jason Rute (Apr 14 2021 at 23:02):

And I should add that my understanding is that this DSL is in a new abstraction framework (StateT) which is supposed to simplify the search space for solving problems like this.

Daniel Selsam (Apr 15 2021 at 00:25):

Jason Rute said:

Daniel Selsam My understanding of your ARC work (based on Jesse's writeup) is that you built a hand-crafted DSL of components with branching points.

Our guiding principle was to not build a DSL. The tactics are arbitrary nondeterministic Haskell code that produce subproblems in arbitrary ways, and specify arbitrary ways of constructing guesses to the original problem given guesses to the subproblems. It is very similar to the Lean tactic framework, except the nondeterminism is reified rather than implicit in <|> calls.

Then you efficiently searched over the branching points, finding programs which could solve each ARC problem.

We performed (un-optimized) search over the choice points reached by the nondeterministic tactics during execution.

When you say "a few missing features/abstractions", do you think that approach is scalable to solving ARC, or do you need machine learning to make the search space efficient (and possibly, like DreamCoder, expand your DSL)?

All three of us agreed to never even glance at the eval problems until near the very end. The system we built looking only at the training problems solved most of them (I forget exactly, but roughly ~70%). Then we ran it on the eval problems and it only got ~20%, but then in one weekend, without changing the architecture at all, we bumped this up to (again roughly) ~60% by adding only (modular) features that we spotted in the eval set, along with one or two new (modular) tactics. For both the training and eval sets, there are several other classes of problems that we had promising abstractions for but did not finish implementing, which we think would get us to "almost all". But there is still a pretty long tail of "one-offs" for which we didn't figure out how to write good abstractions. Even after reaching "almost all", if Chollet studied our solver, he could probably construct an entire test set that humans could solve but that our solver would fail on.

I think most of the abstractions need to be built-in, but to some extent, the better the heuristic search, the less precise you need to make the abstractions. So, great heuristics would help and could simplify parts of the codebase. I don't see how DL in particular would help much though. There isn't much data, and generatively modeling the problem distribution well enough to provide good synthetic data seems as hard as building a solver in the first place.

Daniel Selsam (Apr 15 2021 at 00:36):

Then you efficiently searched over the branching points, finding programs which could solve each ARC problem.

We performed (un-optimized) search over the choice points reached by the nondeterministic tactics during execution.

This may be a subtle point, so let me clarify. There is no DSL and there are no programs. The tactics execute nondeterministically and directly produce guesses. It would be easy to refactor so that the tactics return Haskell functions that map grids to guesses, but it would be a nightmare to try our approach with an actual concrete DSL. Our solver is much more like one big Lean tactic than a traditional synthesis engine.
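
As a toy illustration of the difference (the solver itself is Haskell; this Python sketch and every name in it are made up), a "tactic" can be a generator whose yields are the reified choice points, with a separate driver deciding how to explore them rather than the tactic committing via an implicit `<|>`:

```python
def tactic_align_shapes(problem):
    # Hypothetical tactic stub: would yield guesses built by aligning shapes.
    yield ("align-shapes", problem)

def tactic_impute_motion(problem):
    # Hypothetical tactic stub: would yield guesses built by imputing motion.
    yield ("impute-motion", problem)

def solve(problem):
    """A nondeterministic tactic: a generator of candidate guesses.

    Choice points are reified as yields, so the driver below decides
    how to explore them; no program object is ever constructed.
    """
    for strategy in (tactic_align_shapes, tactic_impute_motion):
        yield from strategy(problem)

def first_consistent_guess(problem, check):
    """Depth-first driver over the reified choice points."""
    for guess in solve(problem):
        if check(problem, guess):
            return guess
    return None
```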

Jason Rute (Apr 15 2021 at 12:00):

@Daniel Selsam I clearly remembered it wrong, and this is my fault. If I understand correctly now, you wrote a nondeterministic tactic that given the three grid pairs and the unpaired grid as a "goal", finds the solution to the last grid. By "nondeterministic" you mean the tactic has branch points and making the execution of your tactic behave like a (DFS?) tree search, correct? (I'm unclear what "abstractions" mean here, but I think it is something like sub tactics that your main tactic calls.) Even with new tactic search framework, it is surprising that you claim you were able to almost solve ARC. I find this surprising since ARC is considered a big deal in the ML world, but you say this very nonchalantly, and it doesn't sound like you plan to work on it further. Do you consider this to be a missed opportunity since the official challenge is over, and you have already looked at the eval set? I think Chollet has a whole suit of secret problems he keeps adding to, but I admit I don't know when he is going to open it up for others to check against them, or what he thinks is the right way to work on ARC since data leakage is a real danger.

Joe Palermo (S2'17) (Apr 15 2021 at 14:58):

For some context, the winner of the ARC Kaggle competition shared these stats on their eval performance:

"Depth 3, without adding flipped images. Faster, but probably worse than a full run.

Total: 419 (Some of the 400 tasks have multiple tests)
Size : 409 (97.6% deduced the correct output size)
Cands: 176 (42% had the correct answer among the generated candidates)
Correct: 164 (39% gave the correct answer among top 3 answers)"

I think a "full run" for them implies searching to a max depth of 4; I'm not sure how much that would improve the 39% eval accuracy they reported. Of course, their test accuracy fell to about 20%.

https://www.kaggle.com/c/abstraction-and-reasoning-challenge/discussion/154597

Daniel Selsam (Apr 15 2021 at 16:11):

Jason Rute said:

By "nondeterministic" you mean the tactic has branch points, making its execution behave like a (DFS?) tree search, correct?

Yes. For example, there are many different ways of parsing shapes. Sometimes you need to treat diagonals as connections, and sometimes you need to ignore them. Sometimes you need to consider multi-color shapes, and sometimes color boundaries denote shape boundaries. There are many more variations. But a tactic will just say `shapes <- parse grids`, where `parse` is nondeterministic and effectively tries all possibilities behind the scenes.
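
In the same toy Python rendering as above (the `segment` stub and its parameter names are invented), that idiom might look like:

```python
import itertools

def parse(grid):
    """Nondeterministic shape parsing: one branch per parsing convention."""
    for diagonal, multicolor in itertools.product([True, False], repeat=2):
        yield segment(grid, diagonal=diagonal, multicolor=multicolor)

def segment(grid, diagonal, multicolor):
    # Stub standing in for a real connected-components pass over the grid.
    return {"diagonal": diagonal, "multicolor": multicolor, "shapes": []}

# A tactic just iterates `for shapes in parse(grid): ...`; the search
# driver explores all four parsing conventions behind the scenes.
print(len(list(parse([[0]]))))  # -> 4 parsing conventions
```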

Daniel Selsam (Apr 15 2021 at 16:21):

Jason Rute said:

(I'm unclear what "abstractions" means here, but I think it is something like sub-tactics that your main tactic calls.)

There are a few different types of abstractions. For example:

  • types (shapes, objects, rectangles, axes, ...)
  • features (has uniform frame, nearest unique shape, how many holes in shape, ...)
  • tactics (align shape, impute motion, remove background, fill blanks, impute permutation, ...)

There is a general-purpose (non-ARC-specific) higher-order synthesis engine in https://github.com/dselsam/arc/tree/master/src/Synth that is called as a subroutine by most tactics and makes use of features in a generic way. So adding features (which is extremely modular) improves all the tactics. And adding types opens the door to adding more features.
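
To illustrate why adding features is so modular, here is a minimal sketch (not the actual engine; the feature pool and example data are invented):

```python
def synthesize(examples, features):
    """Find a named feature that explains all (object, label) examples.

    Any tactic can call this as a subroutine; enlarging the `features`
    pool automatically makes every such call more powerful.
    """
    for name, f in features.items():
        if all(f(x) == y for x, y in examples):
            return name
    return None

features = {
    "num_holes": lambda s: s.get("holes", 0),               # hypothetical feature
    "has_uniform_frame": lambda s: s.get("frame", False),   # hypothetical feature
}
print(synthesize([({"holes": 2}, 2), ({"holes": 0}, 0)], features))  # num_holes
```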

Daniel Selsam (Apr 15 2021 at 16:23):

Jason Rute said:

I find this surprising since ARC is considered a big deal in the ML world

Really? I have never heard anybody talk about it. Why would ML people in particular be interested? No ML-first approach has a chance.

Daniel Selsam (Apr 15 2021 at 16:36):

Jason Rute said:

it doesn't sound like you plan to work on it further.

I do not plan to work on it further. It is an extremely interesting domain, and it was the perfect playground for us to develop our architecture for IMO. It was also absurdly fun. But I think it has a few significant limitations. First, now that we know the architecture basically works, it essentially comes down to reverse-engineering boring low-level abstractions that are built into humans. Second, it is fundamentally ill-posed: one person has a finite set of secrets, and the challenge basically reduces to guessing those secrets. In contrast, once our IMO architecture is in place, I think the tricks we need to write for IMO will be much closer to the tricks that are taught to humans, as opposed to built in. And regarding evaluation, we get new problems every year from all over the world.

Joe Palermo (S2'17) (Apr 15 2021 at 16:52):

Daniel Selsam said:

Really? I have never heard anybody talk about it. Why would ML people in particular be interested? No ML-first approach has a chance.

As an ML person interested in ARC, I'd say ARC is still pretty niche, but there seems to be a growing subset of researchers who find it very interesting. In my case I'm interested in it precisely because it highlights the limitations of an ML-first approach to AI/AGI.

Daniel Selsam (Apr 15 2021 at 18:56):

Joe Palermo (S2'17) said:

Daniel Selsam said:

Really? I have never heard anybody talk about it. Why would ML people in particular be interested? No ML-first approach has a chance.

As an ML person interested in ARC, I'd say ARC is still pretty niche, but there seems to be a growing subset of researchers who find it very interesting. In my case I'm interested in it precisely because it highlights the limitations of an ML-first approach to AI/AGI.

There are some cool things one could try with ML besides just learning heuristics for search. As I mentioned above, new features lift all ships in our solver. One could easily extract additional features from NNs and throw them into the stew.

Jason Rute (Apr 15 2021 at 21:38):

@Daniel Selsam Thanks. This helps clarify a lot. (Also, you might be interested in making a bet with Szegedy.)

Vladislav Bezhentsev (Apr 15 2021 at 22:42):

By the way, here is an interesting question. Is it possible to modify DreamCoder in such a way that it becomes capable of iteratively refining itself? I mean DreamCoder learns to generate programs that solve arbitrary given tasks. So, in theory, you can ask DreamCoder to generate programs "that learn to generate programs that solve arbitrary given tasks".

Jason Rute (Apr 15 2021 at 23:42):

I think you are giving DreamCoder way too much credit for what it is capable of.

Jason Rute (Apr 16 2021 at 17:14):

@Vladislav Bezhentsev Let me give a better answer. I think in theory one could have an outer DreamCoder which has, as one of its DSL primitives, a function dreamcoder whose inputs are a DSL (given as a list of functions), a number of steps to run, and a list of example problems. The output would be both a new, expanded DSL and solutions to the given problems. The outer DreamCoder could call the inner DreamCoder with a curated DSL and problem list. I do, however, see a lot of initial problems getting something like this off the ground. DreamCoder (and other program-synthesis engines) is designed around the programs it generates being fast, since it needs to explore a large space of programs. The inner dreamcoder function is not fast at all. (This is for a similar reason that hyperparameter tuning usually doesn't use reinforcement learning with neural networks: training the inner model is time-consuming, so you only have a few data points to use for hyperparameter selection. Instead, one usually uses Bayesian optimization for the search, which is very data-efficient.) Now, we shouldn't rule out code which programs itself when the Singularity comes :smile:, but I personally think DreamCoder isn't anywhere close to up to this yet.
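
For concreteness, the primitive described above might have a signature like this (entirely hypothetical; the body is a stub, which is rather the point, since a real call would wrap a full wake/sleep run and be far too slow to sit inside a synthesis search):

```python
def dreamcoder(dsl, steps, tasks):
    """Hypothetical DSL primitive wrapping an inner DreamCoder run.

    dsl:   list of primitive functions
    steps: number of wake/sleep iterations to run
    tasks: list of example problems

    Returns an expanded DSL plus whatever task solutions were found.
    """
    expanded_dsl = list(dsl)  # would grow during the abstraction phase
    solutions = {}            # task -> program, found during the wake phase
    return expanded_dsl, solutions
```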

Jason Rute (Apr 21 2021 at 22:28):

I'm very interested in trying out DreamCoder. I have a number of toy tasks (and if those work, more interesting tasks) to try out on it. Has anyone here (@mikey liu, @Joe Palermo, @Julian Berman) tried to run it, either on the predesigned training sets or on custom ones? I'm going to work through the documentation, but if anyone has any advice, I'd appreciate it.

Joe Palermo (Apr 22 2021 at 01:12):

@Jason Rute I haven't but I'm very interested in trying it also.

For others, here's the repo: https://github.com/ellisk42/ec

Julian Berman (Apr 22 2021 at 21:48):

I got as far as installing it but haven't used it yet personally. I put it on my list too, though; I might give it a shot tomorrow.


Last updated: Dec 20 2023 at 11:08 UTC