Zulip Chat Archive

Stream: lean4

Topic: advent of code!!!


Huỳnh Trần Khanh (Dec 02 2021 at 10:09):

hey... would you like to do this year's advent of code in lean 4? https://adventofcode.com

Henrik Böving (Dec 02 2021 at 10:21):

I would imagine that at some point the (at the moment) rather rudimentary stdlib might hit its limits but the first few days should definitely be doable.

Mario Carneiro (Dec 02 2021 at 10:38):

obviously the goal is to build the standard library in the process :)

Alcides Fonseca (Dec 02 2021 at 11:52):

Huỳnh Trần Khanh said:

hey... would you like to do this year's advent of code in lean 4? https://adventofcode.com

I've been trying to keep up with it: https://github.com/alcides/AoC2021Lean4

I couldn't find the IO documented, so I've looked at the solutions of last year: https://github.com/rwbarton/advent-of-lean-4

I wonder whether the lean4 would be open for PRs regarding the documentation, and if there were any general guidelines regarding expecting quality.

Huỳnh Trần Khanh (Dec 02 2021 at 12:08):

@Alcides Fonseca ah yeah lean4 documentation is like... pretty bare bones at the moment, i assume this is because the devs are still pretty busy developing lean4 itself :joy:

but i think this guide is pretty helpful lol https://agentultra.github.io/lean-4-hackers/

Henrik Böving (Dec 02 2021 at 12:08):

They are in principle open for docs and little tutorials over here: https://github.com/leanprover/lean4/tree/master/doc as noted here https://github.com/leanprover/lean4/blob/master/CONTRIBUTING.md#documentation

Henrik Böving (Dec 02 2021 at 12:09):

My approach for if I want to know something about Lean 4 at the moment is to just look how the stdlib/compiler does stuff by reading the source code really, I've found most of it to be quite understandable.

Mac (Dec 02 2021 at 13:37):

Here is a little snippet of code I wrote a while back that makes working with File I/O a little easier. It may be of use inn tasks like these:

import Std.Data.HashSet

open Std System

partial def forEachLine
(h : IO.FS.Handle) (init : α) (f : String  α  IO (ForInStep α)) : IO α := do
  let rec loop a := do
    match ( f ( h.getLine) a) with
    | ForInStep.done a  => pure a
    | ForInStep.yield a => loop a
  loop init

instance : ForIn IO IO.FS.Handle String where
  forIn := forEachLine

instance : ForIn IO FilePath String where
  forIn path init f := do
    let h  IO.FS.Handle.mk path IO.FS.Mode.read false
    forEachLine h init f

/- Print the common lines between two files. -/
def comm (file1 : FilePath) (file2 : FilePath) : IO PUnit := do
  let mut lines := HashSet.empty
  for line in file1 do
    lines := lines.insert line
  for line in file2 do
    if lines.contains line then
      IO.print line

Mac (Dec 02 2021 at 13:38):

With the two ForIn instances you can just use for ... in to iterate over the lines in a file! :)

Gabriel Ebner (Dec 02 2021 at 13:40):

In general I'm a big fan of adding more ForIn instances, but these seem really misleading.

Gabriel Ebner (Dec 02 2021 at 13:40):

I would have expected them to iterate over the path components, if anything.

Mac (Dec 02 2021 at 13:40):

@Gabriel Ebner yeah, the FilePath one is probably excessive. The Handle one seems good though.

Gabriel Ebner (Dec 02 2021 at 13:41):

I'd at least put a wrapper around it, like def Lines := IO Handle and def linesOf : FilePath -> Lines etc.

Mac (Dec 02 2021 at 13:42):

@Gabriel Ebner I mostly was just copying Python which also has for ... in iterate over the lines of a handle.

Sebastian Ullrich (Dec 02 2021 at 13:45):

Obviously it should be a for elaborator that adapts semantics depending on whether you name the loop variable line, word, or char

Mac (Dec 02 2021 at 13:45):

Following, @Gabriel Ebner's idea, here is a snippet that uses a distinct def:

def Lines (α : Type u) := α
@[inline] def linesOf : α  Lines α := id

instance : ForIn IO (Lines IO.FS.Handle) String where
  forIn := forEachLine

instance : ForIn IO (Lines FilePath) String where
  forIn path init f := do
    let h  IO.FS.Handle.mk path IO.FS.Mode.read false
    forEachLine h init f

/- Print the common lines between two files. -/
def comm (file1 : FilePath) (file2 : FilePath) : IO PUnit := do
  let mut lines := HashSet.empty
  for line in linesOf file1 do
    lines := lines.insert line
  for line in linesOf file2 do
    if lines.contains line then
      IO.print line

Sebastian Ullrich (Dec 02 2021 at 13:48):

Not much difference to IO.FS.lines left then except for streaming?

Mac (Dec 02 2021 at 13:49):

@Sebastian Ullrich exactly, the main point is streaming.

Sebastian Ullrich (Dec 02 2021 at 13:50):

Which should not be relevant for AoC I'd hope, or else I'd pity their servers :grimacing:

Mac (Dec 02 2021 at 13:50):

@Sebastian Ullrich Fair enough. It is also more declarative, which is nice. :)

Mac (Dec 02 2021 at 13:56):

@Sebastian Ullrich also, this does not trim newlines (which IO.FS.lines does, but Python's for ... in does not).

Sebastian Ullrich (Dec 02 2021 at 14:00):

I would not be surprised if stdlibs of various languages can be split 50/50 on this question

Calvin Lee (Dec 03 2021 at 05:25):

Mario Carneiro said:

obviously the goal is to build the standard library in the process :)

hullo, I'm doing AOC in Lean 4 :)
I just updated my old vector library for use in day #3, is that something that should get added to the stdlib or is it something that would be good as a standalone library?

Mario Carneiro (Dec 03 2021 at 05:28):

not knowing anything about it, I would say sure, PRs to mathlib4 to add tools for dealing with lean 4 stuff are welcome

Mario Carneiro (Dec 03 2021 at 05:30):

you may or may not want to hold off, since you might find something else to add for day 4. Either way is fine

Mario Carneiro (Dec 03 2021 at 05:32):

most projects of any appreciable size accrue a "for_mathlib" file containing stuff that really ought to be upstreamed to the relevant library because it's not project specific

Calvin Lee (Dec 03 2021 at 05:40):

I'm not using mathlib right now because I didn't think mathlib4 was very complete and I'm mostly just programming
I thought some standalone libraries for "outside of mathlib programming" might be nice in general

Mario Carneiro (Dec 03 2021 at 05:58):

It's not very complete. The point is for you to change that :)

Mario Carneiro (Dec 03 2021 at 06:01):

as for mathlib vs stdlib, that question will work itself out eventually, and my suggestion is just to treat mathlib4 as a programming library for now. I don't see any reason not to depend on it for your AOC stuff

Mario Carneiro (Dec 03 2021 at 06:02):

the important thing is to actually start building those libraries and dumping them somewhere; rearranging files and splitting projects is comparatively easy

Alcides Fonseca (Dec 03 2021 at 11:56):

I had the same feeling as @Calvin Lee. I was hoping for a count method in Lists (even if it required decidable_eq or similar). I believe it should life in the lean4 stdlib, but I believe many smallish PRs would not be welcome at this point. I'll try to make a list and commit them to mathlib4 or to another specific repo for archival purposes.

Henrik Böving (Dec 03 2021 at 11:58):

There was recently a thread about stdlib here: https://leanprover.zulipchat.com/#narrow/stream/270676-lean4/topic/Absolute.20value.20for.20Float/near/262585765

It boils down to the fact that what Lean 4 calls Std is basically the minimum set of Lean utilities required to compile the compiler and LSP server and not actually a standard library in the regular sense...and after some discussion we more or less agreed to stuff everything into mathlib4 now and maybe have some separate std lib extension library later on.

Arthur Paulino (Dec 03 2021 at 13:07):

Mario Carneiro said:

not knowing anything about it, I would say sure, PRs to mathlib4 to add tools for dealing with lean 4 stuff are welcome

What is the plan for mathlib4 to grow? Forking and PRing or acquiring writing access to non-protected branches?

Mac (Dec 03 2021 at 13:34):

Henrik Böving said:

after some discussion we more or less agreed to stuff everything into mathlib4 now and maybe have some separate std lib extension library later on.

We did? XD

Henrik Böving (Dec 03 2021 at 14:28):

More or less :P

Mario Carneiro (Dec 03 2021 at 17:50):

@Arthur Paulino PRs should be fine, we don't have a heavy CI setup like mathlib

Sebastian Ullrich (Dec 08 2021 at 08:53):

I was goaded into finally participating in AoC by friends. My solution for today takes 134s in the interpreter... or 4s when compiled :open_mouth: .

Kevin Buzzard (Dec 08 2021 at 08:57):

It would be lovely to see people working together on the problems in Lean, but is this sort of thing frowned upon? If so, it would be lovely to see some discussion afterwards about how different people found the questions

Kevin Buzzard (Dec 08 2021 at 08:57):

Last year it was just Reid saying "I did it, here's the repo"

Henrik Böving (Dec 08 2021 at 08:58):

Usually in AoC everyone works for themselves and afterwards there is some discussion about how people did it. E.g. someone on reddit posted a mini paper on yesterday: https://www.reddit.com/r/adventofcode/comments/rawxad/2021_day_7_part_2_i_wrote_a_paper_on_todays/

Huỳnh Trần Khanh (Dec 08 2021 at 09:06):

Kevin Buzzard said:

It would be lovely to see people working together on the problems in Lean, but is this sort of thing frowned upon? If so, it would be lovely to see some discussion afterwards about how different people found the questions

not frowned upon at all, feel free to do this if you like

Adam Topaz (Dec 08 2021 at 14:41):

I just finished day 8, and using #eval ... inside the lean file with no compilation gets me the right answer in about 2 seconds.
I don't know if I had the most efficient approach. this was a fun puzzle :)

My code is here (spoiler ahead, obviously)

Sebastian Ullrich (Dec 08 2021 at 14:42):

In my defense, I tried for the simplest possible solution :) . It's 23 lines.

Adam Topaz (Dec 08 2021 at 14:42):

Yeah, mine is a bit longer :)

Eric Taucher (Dec 08 2021 at 14:44):

Maybe next year the site could add an Advent of Code stream. Then if the code is indexed by search engines it might give Lean more visibility in the programming world.

Alcides Fonseca (Dec 15 2021 at 17:58):

I don’t know if anyone has solved part 2 of today’s challenge (day 15). It is a standard pathfinding algorithm, but with a largish size. I’ve found lean to be really slow (even using a tail recursive function). My solution took around 10 minutes on my mbp, but I was expecting it to be solved in one minute, based on other languages.

I guess my slowdown could be due to using arrays (has a sorting method included, unlike lists). Anyone else having the same issue?

https://github.com/alcides/AoC2021Lean4/blob/master/15/b.lean

Mario Carneiro (Dec 15 2021 at 17:59):

Is it actually tail recursive?

Gabriel Ebner (Dec 15 2021 at 18:00):

1) You're using insertionSort, this could be a bit slow.

Mario Carneiro (Dec 15 2021 at 18:01):

failure of TCO and accidental sharing are the first things I would check for

Gabriel Ebner (Dec 15 2021 at 18:03):

2) Array sorting is apparently pretty slow if you run the code in the interpreter. You might need to compile it, see this other discussion: https://leanprover.zulipchat.com/#narrow/stream/270676-lean4/topic/include.20lean.2Eh/near/264881145

Mario Carneiro (Dec 15 2021 at 18:03):

I would expect arrays to be faster than lists, not slower, unless you have a lot of sharing

Sebastian Ullrich (Dec 15 2021 at 18:07):

(My bad Dijkstra takes 4s interpreted, 0.4s compiled for part 2)

Mac (Dec 15 2021 at 18:09):

Your string parsing (especially parseMatrix) is also doing way to much conversion between Lists and Arrays (toList and toArray are both O(n) operations).

Mac (Dec 15 2021 at 18:11):

Everything in there should ideally just be Arrays

Mario Carneiro (Dec 15 2021 at 18:11):

that's probably not the bottleneck though, since it only happens once. Now that I see this is Dijkstra, that insertionSort looks like the culprit since it's doing an O(n^2) operation where it should be O(log n)

Mac (Dec 15 2021 at 18:17):

@Mario Carneiro The script is converting an Array of lines (of a very large file) to a List. Then for each String line, it is converting it to a List(allocating a list node for each character of the line) and then to an Array. Then it is converting the entire result back to an Array. That is a lot of heavy duty memory allocations. While insertionSort is a problem, I would not be surprised if the bottleneck is nonetheless the memory conversion.

Mario Carneiro (Dec 15 2021 at 18:17):

It does happen in the first phase of the program, so it should not be hard to benchmark that part

Mario Carneiro (Dec 15 2021 at 18:17):

I agree that there are a lot of unnecessary conversions there

Mario Carneiro (Dec 15 2021 at 18:18):

The issue is that the insertionSort sorts the entire priority queue on every iteration, meaning that the overall complexity is somewhere around O(n^3) or O(n^4) for an n * n grid graph

Alcides Fonseca (Dec 15 2021 at 18:19):

I am aware of that bottleneck in the parsing. But I have been doing the same thing every day, and it has not been an issue. The reason for that is that I am trying to spend less time writing code (I have other tasks to do), and unfortunately List and Array do not have the same utility methods, so I end up doing the conversion every now and then. But I'm sure that is not the issue, but rather Dijksgtra (i didn't mention it, because I didn't want to spoiler anyone :-)

Mario Carneiro (Dec 15 2021 at 18:21):

If you can't be bothered to write the code yourself, you could instead put an issue on mathlib4 for any missing utility theorems so that we can at least record the problem and maybe get around to it later

Alcides Fonseca (Dec 15 2021 at 18:21):

I tried to use the binary insert instead of sorting, but it didn't work correct (probably misuse, or I didn't understand what the function was supposed to do).

I know that I should have implemented an ordered insert on the array, but I wanted to take this opportunity to use the current stdlib and identify what I need from it. That is one example.

Sebastian Ullrich (Dec 15 2021 at 18:22):

Finally a use case for spoiler tags :grinning_face_with_smiling_eyes:

Alcides Fonseca (Dec 15 2021 at 18:22):

Mario Carneiro said:

If you can't be bothered to write the code yourself, you could instead put an issue on mathlib4 for any missing utility theorems so that we can at least record the problem and maybe get around to it later

That is the point of this exercice. I am writing down all the missing functions I feel I need, and I hope to contribute them early next year.

Alcides Fonseca (Dec 15 2021 at 18:23):

I would also like to prove some things about these programs, or write them in a more type-safe approach (I am using a lot of ! methods). But I'll leave that for after the challenge.+

Mario Carneiro (Dec 15 2021 at 18:24):

It's usually pretty easy to know what to prove once you have the functions, provided the function isn't written in a way that precludes proof (i.e. partial)

Mario Carneiro (Dec 15 2021 at 18:28):

I'm not sure what your input is, maybe the distribution makes this reasonable, but the wrap9 function has a really bad asymptotic complexity

Alcides Fonseca (Dec 15 2021 at 18:31):

The highest input for that function will be 19. That is now the source of the slowdown. I agree with you that it should be the insertionSort.

Mario Carneiro (Dec 15 2021 at 18:32):

@Sebastian Ullrich Do you know why Array.back and Array.pop have the interface that they do? It seems like it would work better with the reference counting to have them combined into one function. And I'm always afraid of the pop causing a copy if the back call floats after it somehow.

Sebastian Ullrich (Dec 15 2021 at 18:34):

Allocating a pair for the return value wouldn't exactly be ideal either, no? I know the floating story is not completely clear yet, but ideally we should guarantee that this never happens imo.

Gabriel Ebner (Dec 15 2021 at 18:34):

I think we're all counting on return value unboxing. :smile:

Sebastian Ullrich (Dec 15 2021 at 18:35):

Eventually :)

Mario Carneiro (Dec 15 2021 at 18:35):

Is there any hope of LLVM doing this already?

Mario Carneiro (Dec 15 2021 at 18:36):

An alternative interface that doesn't rely on pair unboxing is IORef (Array A) -> IO A

Gabriel Ebner (Dec 15 2021 at 18:36):

More seriously, the same problem is also there with tons of other functions and other types like string. Bundling back/pop only fixes a single instance.

Mario Carneiro (Dec 15 2021 at 18:37):

you mean back/pop on string?

Sebastian Ullrich (Dec 15 2021 at 18:37):

Mario Carneiro said:

Is there any hope of LLVM doing this already?

Last time I looked, LLVM doesn't know anything about annihilating (custom) malloc-free pairs

Mario Carneiro (Dec 15 2021 at 18:38):

maybe if it inlines the custom code... generating LLVM directly would probably help here

Sebastian Ullrich (Dec 15 2021 at 18:39):

We definitely don't want to inline the allocator any more than we already do

Mario Carneiro (Dec 15 2021 at 18:40):

Are lean_objects ever stack allocated?

Mario Carneiro (Dec 15 2021 at 18:40):

These transient pairs would work well with escape analysis

Sebastian Ullrich (Dec 15 2021 at 18:41):

Not right now, no

Tim Hunter (Dec 17 2021 at 14:14):

By the way, what is a recommended way to write small unit tests for lean? I started to compile a collection of extra methods that I was missing from scala, haskell, etc. but I am not sure how to add small tests to them:
https://github.com/tjhunter/advent_of_code_lean4_2021/blob/main/MyAdvent/AdventStd.lean

The most interesting can be sent as a PR if there is interest for it. (also, any advice on the code is welcome!)

Mac (Dec 17 2021 at 21:50):

@Tim Hunter the common approach is just to write a test file with #eval statements that test the code and then run lean on the file to verify that it does not error.

Alcides Fonseca (Dec 19 2021 at 10:30):

Mac said:

Tim Hunter the common approach is just to write a test file with #eval statements that test the code and then run lean on the file to verify that it does not error.

I've been using eval for small utilities. However, because some functions have to be tested with large-ish inputs (like today's puzzle), I end up just making a lot of changes to the main function. But I agree some little DSL might be useful.

Alcides Fonseca (Dec 19 2021 at 10:57):

Btw, regarding today's puzzle, I only managed to run it when compiled. The interpreted version was too slow to be practical. I wonder if this could be useful for performance (regression) testing?

Link to my solution

James Sully (Dec 01 2023 at 11:51):

Doing day 01 2023, I have
(λ line ↦ firstAndLastDigit (digitsOnly line) sorry)
I'd like to write this as a composition, but the argument order of firstAndLastDigit makes it tricky

def firstAndLastDigit (digits : List Char) (ok : digits  []) : Char × Char :=
  (digits.head ok, digits.getLast ok)

the evidence that digits is nonempty has to come after digits in order to refer to it.
Is it possible to neaten this?

James Gallicchio (Dec 01 2023 at 18:59):

I'm nearly positive that's the neatest way to write it. If you want to abuse dot notation you can write

(firstAndLastDigit (digits := digitsOnly ·) sorry)

but that doesn't feel more readable. Or

(digitsOnly · |> firstAndLastDigit <| sorry)

which also feels worse. Of course you can use a Haskell bird combinator (generalized for dependent functions) but that is code obfuscation more than anything.

James Gallicchio (Dec 01 2023 at 19:00):

(sidenote: i discovered that named-argument parentheses are transparent to dot notation, which is a nice workaround for cases like this... although, it's definitely less readable :P)

James Sully (Dec 02 2023 at 01:17):

righto, thank you!


Last updated: Dec 20 2023 at 11:08 UTC