Zulip Chat Archive

Stream: general

Topic: Tree notation


Yaël Dillies (Feb 12 2022 at 18:18):

Is there notation for docs#tree? If not, what could one be?

Yaël Dillies (Feb 12 2022 at 18:21):

The context is #10645 where we're gonna use a tactic to generate a (huge) certificate tree, then cache it by putting it in the code.

Yaël Dillies (Feb 12 2022 at 18:22):

roth? will output Try this: roth <insert big tree here> and roth t where t is a tree will apply a custom tactic for nodes and another one for leaves.

Yaël Dillies (Feb 12 2022 at 18:24):

So I'm actually only interested in tree unit, with specific notation if possible because I don't want to have to write () in every node.

Yaël Dillies (Feb 12 2022 at 18:25):

Of course, we can always flatten tree α to list (option α) (or maybe in our case to list bool. I'm open to suggestions. I just want something usable.

Mario Carneiro (Feb 12 2022 at 23:10):

Do you care if the tree is "readable"? You could output some compressed representation as a string and parse it in the tactic

Yaël Dillies (Feb 12 2022 at 23:12):

No, I don't care about readability at all, and it's meant to be processed by a tactic anyway so an additional parsing shouldn't. However, the length of the said string could be of the order of 1000 characters, so they can't hold on a line. Is "stuff" ++ "other stuff" an acceptable way to circumvent the problem?

Yaël Dillies (Feb 12 2022 at 23:29):

Do you think we could go as far as outputting one big number?

Eric Wieser (Feb 12 2022 at 23:30):

Why is that better than a string?

Eric Wieser (Feb 12 2022 at 23:30):

Can you wrap numbers in lean?

Yaël Dillies (Feb 12 2022 at 23:32):

It's better than a string of t and f, but it's worse than a string of literally any allowable character.

Reid Barton (Feb 12 2022 at 23:33):

base64 encoding would be the common way to encode a blob of bits in a text file.

Yaël Dillies (Feb 12 2022 at 23:42):

SHould I write such a parser from scratch and dedicated to tree or should I set up an API for turning a repr into a base64 string?

Kyle Miller (Feb 13 2022 at 02:33):

@Yaël Dillies I had a go at this:

tree <-> string

Kyle Miller (Feb 13 2022 at 02:34):

It seems better to have a bespoke parser for your application, but having functions in mathlib to convert numbers to and from base64 seems useful.

Kyle Miller (Feb 13 2022 at 02:35):

(There was a some complexity in the implementation to help Lean see everything terminated. It runs a little stack manipulation language to build the tree from a list of bools.)

Mario Carneiro (Feb 13 2022 at 07:51):

If the base64 parser ignores non-base64 content, then you can just put newlines and indentation in the string if you need multiple lines (lean doesn't require single line literals). If this is for roth, you also don't need to encode the tree structure; you can just have a stream of "oracle calls" and just pull from the stream whenever you need to make a decision at any point in the tree

Mario Carneiro (Feb 13 2022 at 07:52):

BTW, it's nice to see us finally getting proofs that are a blob of bytes like metamath proofs :)

Kevin Buzzard (Feb 13 2022 at 08:01):

Didn't you do this already somewhere in the Matiyasevich proof?

Mario Carneiro (Feb 13 2022 at 08:04):

I don't think I have, although some applications have come close

Mario Carneiro (Feb 13 2022 at 08:05):

the matiyasevich proof is just a whole lot of single letter variables

Mario Carneiro (Feb 13 2022 at 08:05):

You're probably thinking of this: https://github.com/leanprover-community/mathlib/blob/master/src/number_theory/dioph.lean#L601-L612

Mario Carneiro (Feb 13 2022 at 08:06):

The bit at the beginning with a bunch of notations is more like manual typeclass inference proving that an expression is diophantine

Mario Carneiro (Feb 13 2022 at 08:07):

and it's not particularly "encoded" - it's a literal transcription of the input but with de bruijn variables

Kevin Buzzard (Feb 13 2022 at 08:10):

Whenever I think I can read mathlib I always remind myself that I still never made it through that part

Yaël Dillies (Feb 13 2022 at 08:31):

Mario Carneiro said:

you also don't need to encode the tree structure; you can just have a stream of "oracle calls" and just pull from the stream whenever you need to make a decision at any point in the tree

But is it not easier, or at least more principled; to build the tree first and then flatten it? To be clear, I need to make decisions all the time.

Yaël Dillies (Feb 13 2022 at 08:47):

Very nice, Kyle! I'm thinking that what we should have is a typeclass has_bytes to generally declare that a given type can be encoded with list bool and decoded from it. Then we can have a translation between list bool and base64. Both seem generally useful.

Yaël Dillies (Feb 13 2022 at 08:50):

I really like this little "Automate away Roth calculations" project. It's getting me to learn metaprogramming!

Kevin Buzzard (Feb 13 2022 at 08:57):

I think the reals biject with list bool (if you allow infinite lists)

Yaël Dillies (Feb 13 2022 at 08:58):

Have you ever heard of countability, Kevin? :grinning:

Kevin Buzzard (Feb 13 2022 at 09:01):

Oh it's finite lists of course :-(

Kevin Buzzard (Feb 13 2022 at 09:02):

So isn't this just denumerable then??

Kevin Buzzard (Feb 13 2022 at 09:03):

No -- I mean docs#encodable

Kevin Buzzard (Feb 13 2022 at 09:05):

And then you just biject list bool with nat (it bijects very naturally with pnat: regard ff as 0 and tt as 1, stick a 1 in front and you have a binary natural). Hopefully I'm getting the hang of this exotic list type now

Yaël Dillies (Feb 13 2022 at 09:07):

It is, yeah, except that we pick list bool as the encding state. Don't know if that matters too much for performance. That was my reasoning.

Eric Wieser (Feb 13 2022 at 10:35):

  • dioph: a predicate stating that a set S ⊆ ℕ^α is Diophantine, i.e. that

Alright, keep your secrets

Eric Rodriguez (Feb 13 2022 at 10:40):

Eric Wieser said:

  • dioph: a predicate stating that a set S ⊆ ℕ^α is Diophantine, i.e. that

Alright, keep your secrets

#12011

Kyle Miller (Feb 13 2022 at 11:54):

Mario Carneiro said:

If the base64 parser ignores non-base64 content, then you can just put newlines and indentation in the string if you need multiple lines (lean doesn't require single line literals).

Good idea -- the one I wrote already does :smile: Though I did that without fully realizing Lean strings allow raw newlines.

Kyle Miller (Feb 13 2022 at 12:12):

Yaël Dillies said:

I'm thinking that what we should have is a typeclass has_bytes to generally declare that a given type can be encoded with list bool and decoded from it. Then we can have a translation between list bool and base64. Both seem generally useful.

I know you like exploring rabbitholes, but this is a surprisingly deep one and I worry you might never return. :wink:

To make a serialization system that's robust, you would probably want a system that encodes a version into binary blob so that if you ever want to make changes to the interface later, you don't have to go and re-encode all your blobs, for example. If you don't have a generic system, then what you can do instead is create new decoding functions when you have new versions, and the choice of decoding function in the source code serves as the version indicator.

If embedding base64 certificates in proofs becomes more prevalent, then we'd have a better idea of what properties we'd want from serialization, and it might be worth having something more generic. But until then, I'd say it's additional work that shouldn't be done. (Even though I see why it would be useful.)

Kyle Miller (Feb 13 2022 at 12:16):

There was a trick in the base64 encoder for trees, by the way. It rounds the list bool up to the nearest multiple-of-six length, since base64 is a sequence of 6-bit numbers. I was relying on the fact that ff is interpreted in of_bool_list_aux as "push tree.nil onto the stack" and that at the end of evaluation, the function returns the top of stack. That means it can serve as a no-operation if it's at the beginning of the list.

The base64 encoder could deal with arbitrary list bool, though, by having it encode at the beginning how many bits to drop, using 3 additional bits to do so.

Kyle Miller (Feb 13 2022 at 12:22):

Kevin Buzzard said:

And then you just biject list bool with nat (it bijects very naturally with pnat: regard ff as 0 and tt as 1, stick a 1 in front and you have a binary natural). Hopefully I'm getting the hang of this exotic list type now

The first thing I tried was using docs#encodable for list bool, but the naturals it produced were hilariously large for the application.

The encoding scheme you mentioned would work, though. For these trees, you could skip having that 1 in front, actually, if you have the beginning of the list be the least-significant digits, because, incidentally, the list bool -> tree unit decoder can handle an arbitrary number of ff appended to the end.

Yaël Dillies (Feb 13 2022 at 12:22):

I'm afraid the only way to preserve me from a rabbithole is to fill it up yourself :grinning:

Yaël Dillies (Feb 13 2022 at 12:25):

What about we merge the correctness proof of the algorithm now and worry about making it effective later? (#10645 for reference)

Kyle Miller (Feb 13 2022 at 13:14):

Yaël Dillies said:

we pick list bool as the encoding state. Don't know if that matters too much for performance.

I mainly used list bool so that there'd be something to recurse on in of_bool_list_aux, but also it turns out nat.to_digits is very slow.

With a faster nat.to_digits, we can make reasonably efficient nat.to_base64 and nat.from_base64 functions.

Here's a simpler implementation than before using these functions:

tree <-> string via nat

Yakov Pechersky (Feb 13 2022 at 14:52):

Ah! I would suggest using parsers for base64_of_char! docs#parser.numeral

Yakov Pechersky (Feb 13 2022 at 14:53):

We can write a hex one and a base64 also.

Mario Carneiro (Feb 13 2022 at 21:17):

I want to point out that typeclasses like has_bytes and non-meta code is unnecessary here (again, assuming this is for roth). The main thing you need is:

  • A type bitstream
  • function read_bit : state bitstream bool which peels off one bit from the stream
  • bitstream.mk : string -> bitstream which initializes a bitstream from a base64 string (the stream is self-closing so it is okay to have trailing 0's)
  • (optional) a general typeclass implementing read : state bitstream A by calling read_bit several times, until you have enough entropy to reconstitute the type. This is basically a deserializer

Mario Carneiro (Feb 13 2022 at 21:17):

For Roth, I think the actual data you want to encode is bool, so you don't need the typeclass

Mario Carneiro (Feb 13 2022 at 21:19):

For constructing these streams we also need a utility to produce the stream and turn it into a string literal that can be pasted into the code

Mario Carneiro (Feb 13 2022 at 21:19):

but that would only be used during authoring, it wouldn't be run normally

Yaël Dillies (Feb 13 2022 at 21:30):

bitstream = list bool or do you have something else in mind? Kyle mentioned that doing arithmetic on N\N might be the fastest because it has special support in the kernel.

Kyle Miller (Feb 13 2022 at 21:38):

It seems compelling having abbreviation entropy := nat and then having rand_nat (hi : nat) : state entropy nat to get a number in the range [0,hi) when hi is positive (defined by returning s % hi and replacing the current state s with s / hi). (Maybe not using randomness terminology)

Kyle Miller (Feb 13 2022 at 22:01):

(An example for (de)serializing trees.)

now with monads

Mario Carneiro (Feb 13 2022 at 22:57):

Yaël Dillies said:

bitstream = list bool or do you have something else in mind? Kyle mentioned that doing arithmetic on N\N might be the fastest because it has special support in the kernel.

I think that pulling bits from a list is faster than base conversion if you pull things of non-power-of-two sizes from the stream

Mario Carneiro (Feb 13 2022 at 23:00):

The fastest way to implement something that pulls directly from a base64 string is something like string.iterator * nat * nat where the pair of nat represents the last decoded number and the number of bits remaining in it


Last updated: Dec 20 2023 at 11:08 UTC