Zulip Chat Archive

Stream: new members

Topic: Introducing myself: John Burnham


John Burnham (Sep 21 2021 at 21:31):

Hi everyone, my name is John Burnham and I'm one of the authors of the Yatima language, https://github.com/yatima-inc/yatima, a dependently-typed language featuring IPFS-based content-addressing (somewhat like the unisonweb.org language). I'm really interested in what you're doing with Lean, and I've written a parser and IPFS content addresser for the Lean kernel (using the Lean 3 export format) that I'm hoping to develop into a full independent checker in Rust (like https://github.com/ammkrn/nanoda_lib). We've implemented two interesting reducers for Yatima that can likely be adapted to the Lean kernel: one based on lambda-DAG pointer graphs (bottom-up beta reduction: https://www.ccs.neu.edu/home/wand/papers/shivers-wand-10.pdf), and another based on the G-machine (https://github.com/yatima-inc/yatima/blob/gb/typechecker-machine/core/src/machine/machine.rs). We're curious to see how these techniques compare performance-wise to the existing checkers.
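
To give a flavor of the lambda-DAG approach: the key idea in the Shivers–Wand paper is that terms form a DAG whose nodes also carry parent pointers, so beta reduction can copy just the paths from a variable's occurrences up to the redex while shared subterms stay shared. A rough sketch of the node structure in Rust (illustrative names, not Yatima's actual types):

```rust
// Illustrative lambda-DAG nodes in the style of Shivers & Wand's
// bottom-up beta reduction; not Yatima's actual representation.
use std::cell::RefCell;
use std::rc::{Rc, Weak};

type Strong = Rc<RefCell<Node>>;
type Parent = Weak<RefCell<Node>>;

enum Node {
    // A variable; all occurrences share this node, and `parents` lets
    // substitution walk *upward* from each occurrence toward the redex.
    Var { parents: Vec<Parent> },
    // A lambda, pointing at its (shared) bound-variable node and body.
    Lam { var: Strong, body: Strong, parents: Vec<Parent> },
    // An application node.
    App { fun: Strong, arg: Strong, parents: Vec<Parent> },
}
```

Reducing `(λx. body) arg` then only copies the nodes on the paths from `x`'s occurrences up to the redex and splices `arg` in, which is where the win over top-down substitution comes from.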

We're also looking at potentially modifying Yatima to align with the Lean kernel. We've received a grant from the Web3 Foundation (https://github.com/w3f/Grants-Program/blob/master/applications/yatima.md) to implement our language core on their Substrate blockchain platform (so that proofs can be checked trustlessly), and being able to use Lean in this context would be really interesting. Currently, Yatima's inductive datatypes use self-types and lambda encodings, based on https://homepage.divms.uiowa.edu/~astump/papers/fu-stump-rta-tlca-14.pdf, but these have some odd behavior, and Lean's approach to implementing inductive families seems much safer and better understood.

Eric Wieser (Sep 22 2021 at 08:49):

to implement our language core on their Substrate blockchain platform (so that proofs can be checked trustlessly)

There was a discussion about blockchain in a previous thread that you may be interested in. Regarding "checked trustlessly": it's not entirely clear why it's better to trust a blockchain platform instead of just trusting nanoda / trepplein / leanchecker...

Bolton Bailey (Sep 22 2021 at 20:25):

I guess one reason is that once a particular proof has been checked on the blockchain, rather than others rechecking the proof on their own computers, one could just look back at the blockchain to see that the proof was already carried out.

John Burnham (Sep 22 2021 at 21:13):

I can see from the thread that there are a lot of strong opinions about blockchains here! Frankly, I think the skepticism is deserved: the field is full of specious and badly implemented projects. Here's a leaderboard of the worst DeFi smart contract failures, for example: https://rekt.news/leaderboard/. To me, though, that's what makes the idea of using formal methods interesting here. In blockchain there are programs with extremely high ratios of capital to code complexity, where tens of millions of dollars rely on the correctness of a few hundred lines.

John Burnham (Sep 22 2021 at 22:18):

@Bolton Bailey, the idea of having a global persistent cache of checked results is also really interesting to me. Of course, this is really trading the cost of typechecking for the cost of searching through a big database, but for many common/expensive theorems it might be worth it

Bolton Bailey (Sep 22 2021 at 22:30):

One thing that is (for me) perhaps the biggest pain point with Lean and mathlib is so-called "orange bar hell". When you are editing a file in mathlib and realize a lemma is missing, you may want to quickly insert that lemma in a lower-level file and go back to work. But because you have edited an upstream file, you must wait a long time for all the files downstream of it to recompile. It doesn't necessarily have to be done with blockchain, but it would be nice if there were some kind of isolation where Lean would realize that all of the lemmas you already used in your high-level file are still valid when you do this.

John Burnham (Sep 22 2021 at 22:53):

Oh, does Lean not have incremental rechecking/recompilation on a declaration-by-declaration level? I think part of what I'm working on could maybe help with that. Not blockchain per se, but in Yatima we serialize and hash individual definitions (via ipld.io, which is JSON-like). If we do that in Lean and then persist a store of "already checked hashes", then we could maybe skip rechecking declarations in downstream files that don't depend on the changes made
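
Something like this minimal sketch (the `blake3` crate is real; `Decl`, `serialize_decl`, and `check_decl` are hypothetical stand-ins for the actual Lean machinery, and serialization is assumed canonical):

```rust
use std::collections::HashSet;

struct Decl;       // stand-in for a fully elaborated declaration
struct CheckError; // stand-in for a kernel error

fn serialize_decl(_d: &Decl) -> Vec<u8> {
    unimplemented!("canonical serialization, e.g. via an IPLD encoding")
}

fn check_decl(_d: &Decl) -> Result<(), CheckError> {
    unimplemented!("the actual kernel typechecker")
}

struct Checker {
    // In practice this set would be persisted (or shared) across runs.
    verified: HashSet<[u8; 32]>,
}

impl Checker {
    /// Typecheck `decl` only if an identical (content-addressed)
    /// declaration has not been verified before.
    fn check_cached(&mut self, decl: &Decl) -> Result<(), CheckError> {
        let hash = *blake3::hash(&serialize_decl(decl)).as_bytes();
        if self.verified.contains(&hash) {
            return Ok(()); // cache hit: skip the expensive recheck
        }
        check_decl(decl)?;
        self.verified.insert(hash);
        Ok(())
    }
}
```

Whether it's ever sound to skip on a cache hit is of course exactly the question.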

Mario Carneiro (Sep 22 2021 at 22:56):

Rechecking can be done on a declaration-by-declaration level, but not proof construction, which uses tactics that might change their behavior depending on intermediate declarations that are not obviously referenced

Mario Carneiro (Sep 22 2021 at 22:58):

But a hash collision will definitely cause everything to come tumbling down. It's not that hard to prove that a hash-based system is unsound

John Burnham (Sep 22 2021 at 22:58):

Oh, I see, it's because of the metaprogramming/elaboration. The stuff I'm doing on the export format probably isn't relevant to that then

John Burnham (Sep 22 2021 at 23:01):

I think the risk of hash collisions is negligible using a 256-bit cryptographic hash. Performance might be an issue, but my experience with BLAKE is that it's roughly as fast as e.g. SipHash or MurmurHash

Mario Carneiro (Sep 22 2021 at 23:04):

Eh, I'm not a fan of risks that can't be proven to be small

Eric Wieser (Sep 22 2021 at 23:19):

They're better than the current risk, which has been proven to happen: our hashes are only 32-bit!

John Burnham (Sep 22 2021 at 23:51):

Assuming a valid cryptographic hash function (which is an assumption), with an N-bit digest, the number of hashes K you would need to make before the probability of still having *no* collision drops to some small P (i.e., before a collision becomes all but certain) can be estimated with the birthday bound:

K ≈ sqrt(-2 * (2^N) * ln(P))

For N = 256 and P = 10^(-10), K ≈ 10^39. So at 1 exahash per second (10^18 hashes/s; for comparison, the entire Bitcoin network does on the order of 10^20), we would expect to see one collision roughly every 10^21 seconds, which is far greater than the age of the universe. If this is still too high, we can use 512-bit hashes; a variable-digest-width hash can do this without too much performance loss.
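
If anyone wants to sanity-check the arithmetic, here it is as a few lines of Rust (same parameters as above; the 10^18 hashes/second attacker is hypothetical):

```rust
// Sanity check of the birthday-bound numbers above.
fn main() {
    let n_bits: i32 = 256; // digest width N
    let p: f64 = 1e-10;    // probability of still having *no* collision
    // K ≈ sqrt(-2 * 2^N * ln P)
    let k = (-2.0 * 2f64.powi(n_bits) * p.ln()).sqrt();
    let rate = 1e18; // hashes per second
    println!("K    ≈ {:.2e} hashes", k);         // ≈ 2.3e39
    println!("time ≈ {:.2e} seconds", k / rate); // ≈ 2.3e21
    // The age of the universe is ≈ 4.3e17 seconds, for comparison.
}
```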

It's more likely, though, that the specific cryptographic hash function we're using is somehow flawed. This is definitely a concern, and can be mitigated by using multiple hash functions, like SHA-256, SHA-3, BLAKE3, etc., and comparing results. However, we should note that a practical attack on SHA-256 would break a lot of things, including, for example, both Git and Nix. Actually, Git uses SHA-1 by default, for which practical attacks already exist, so if this is a concern we should make sure we're using Git's experimental SHA-256 object format.
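
For instance, a sketch of what I mean by comparing results, using the real `blake3` and `sha2` crates (the particular pairing is just illustrative):

```rust
use sha2::{Digest, Sha256};

// Content-address by two independent hash functions at once; a break
// in either one alone is then not enough to forge an address.
fn double_address(bytes: &[u8]) -> ([u8; 32], [u8; 32]) {
    let b3 = *blake3::hash(bytes).as_bytes();
    let mut sha = [0u8; 32];
    sha.copy_from_slice(&Sha256::digest(bytes));
    (b3, sha)
}
```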

Mario Carneiro (Sep 22 2021 at 23:57):

Yes, my main concern is the possibility that the hash is flawed. The existence of cryptographic hashes in the first place is an open question (which I think implies P != NP). But as a formal mathematician I am aware I am being "pathologically untrusting" here

Bolton Bailey (Sep 23 2021 at 00:06):

If you don't like cryptographic hashes, you'll hate what I'm about to suggest next :smiling_devil:.

I think it could be fun to write a recursive SNARK to verify Lean proofs. This way, we could just throw out all the proofs in mathlib and replace them with ~500-byte, quickly verifiable cryptographic proofs that someone knew a proof at some point. Millions of mathematicians would cry out in horror at the disappearance of all semantic nuance in mathlib.

John Burnham (Sep 23 2021 at 00:11):

@Mario Carneiro For sure! I definitely think it's the right attitude to have, especially when it comes to proofs or cryptography. I think you're right about the existence of cryptographic hashes implying P != NP, since if P = NP, then there would be practical ways to compute pre-images. P = NP would be such a surprising result though, and even then there would still need to be practical algorithms for NP-complete problems. I mean, it's possible that P = NP, with some ludicrously big lower bound on the polynomial.

Mario Carneiro (Sep 23 2021 at 00:14):

I think that the probability that the blake-3 hash is broken before the end of the universe is quite high, but it is very hard to put probabilities on scientific advancement

John Burnham (Sep 23 2021 at 00:14):

@Bolton Bailey I like the idea of using zero-knowledge proofs here. If the concept of "proofs on the blockchain" turns out to be viable/useful, I think that might be the way to go in the long run, so that the whole chain could be quickly reverified

John Burnham (Sep 23 2021 at 00:17):

Mario Carneiro said:

I think that the probability that the blake-3 hash is broken before the end of the universe is quite high, but it is very hard to put probabilities on scientific advancement

I agree with you, but if we discovered that blake-3 was broken, couldn't we reverify with e.g. sha-3? I would also say the probability is high that at any future time t, there will exist a hash function that is cryptographic in practice (conditional on P != NP though).

Bolton Bailey (Sep 23 2021 at 00:50):

Of course, if P = NP, we wouldn't need mathlib anyway, because we could just search for a proof (in time polynomial in the length of the proof) whenever we wanted to verify something. The worst-case scenario would be Impagliazzo's Pessiland world, where P != NP but one-way functions don't exist.

Mario Carneiro (Sep 23 2021 at 00:52):

I forget if Impagliazzo had a world where P != NP but we couldn't prove it

Mario Carneiro (Sep 23 2021 at 00:53):

the relevance of the observation that cryptographic hashes imply P != NP is that it's really hard to prove any hash has good properties

Scott Morrison (Sep 23 2021 at 02:00):

(The world even worse than any of Impagliazzo's is where P = NP but we couldn't prove it. I imagine one day Klapaucius handing us a black box that just works, but gives no clues. The rest of the world might be happy, but can you imagine how miserable the mathematicians would be?)

Mario Carneiro (Sep 23 2021 at 02:05):

IIRC if P = NP then we already know a polynomial time algorithm for SAT
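
Namely Levin's universal search: a fixed dovetailing procedure that finds satisfying assignments in polynomial time iff P = NP. A sketch, where `run_program` is a hypothetical interpreter over some fixed enumeration of programs; only the verifier needs to be trusted, and note it only terminates on satisfiable inputs:

```rust
type Cnf = Vec<Vec<i32>>; // each clause is a list of signed literals

// The polynomial-time check that puts SAT in NP.
fn satisfies(cnf: &Cnf, assignment: &[bool]) -> bool {
    cnf.iter().all(|clause| {
        clause.iter().any(|&lit| {
            let value = assignment[lit.unsigned_abs() as usize - 1];
            if lit > 0 { value } else { !value }
        })
    })
}

// Hypothetical: run the i-th program for at most `steps` steps on
// `input`, returning its output if it halts within the budget.
fn run_program(_i: u64, _steps: u64, _input: &Cnf) -> Option<Vec<bool>> {
    unimplemented!("interpreter over a fixed enumeration of programs")
}

// Dovetail: in phase t, program i gets a budget of 2^(t-i) steps. If
// some fixed program i* solves SAT in time p(n), this succeeds within
// O(2^(i*) * p(n)) = O(p(n)) work, so it is polynomial iff P = NP.
fn universal_search(cnf: &Cnf) -> Vec<bool> {
    for t in 0u64.. {
        for i in 0..=t {
            let budget = 1u64.checked_shl((t - i) as u32).unwrap_or(u64::MAX);
            if let Some(a) = run_program(i, budget, cnf) {
                if satisfies(cnf, &a) {
                    return a;
                }
            }
        }
    }
    unreachable!()
}
```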

Mario Carneiro (Sep 23 2021 at 02:06):

One of my favorite paper titles: "The Fastest and Shortest Algorithm for All Well-Defined Problems"

Scott Morrison (Sep 23 2021 at 03:34):

"What somewhat spoils the practical applicability of M_p is the large additive constant c_p." :-)


Last updated: Dec 20 2023 at 11:08 UTC