Zulip Chat Archive

Stream: mathlib4

Topic: Tracking pi-base


Josha Dekker (Feb 10 2024 at 10:17):

To see how far the formalization of topology is progressing and to identify some nice small projects for myself (and possibly new members), I’m interested in tracking how many results in the pi-base (https://topology.pi-base.org/) (a website that contains topological definitions and relations between spaces) have been formalized so far in Lean. I started with two simple files which import Mathlib, one that runs #check on every property in pi-base (to see if the definition is present) and one that runs #synth for every result to deduce the various theorems from pi-base. Formalization would be complete if all these commands pass without error. However, doing all of pi-base would be a lot of work, and I think that many results are nice first problems to tackle (provided some experience in topology).

As such, I was wondering if it would be interesting to set this up as a project with a Mathlib dependency in a similar fashion as done for the Matrix Cookbook by Eric Wieser? I think it can provide for a nice overview of missing results for new members. Let me know what you think, and what would be the best way to set this up!

Johan Commelin (Feb 10 2024 at 13:08):

Sounds good to me!

Eric Wieser (Feb 10 2024 at 13:28):

It looks like there are 434 theorems, which is about the same as the matrix cookbook

Eric Wieser (Feb 10 2024 at 13:29):

Except it looks far more likely that the ones here are actually correct!

Eric Wieser (Feb 10 2024 at 13:31):

Oh, I guess the property / space grid is in addition to all the theorems

Josha Dekker (Feb 10 2024 at 14:11):

Yes, they collect over three dimensions, Spaces, Properties and Theorems. I think that tracking the Theorems would be a nice way of seeing what is still missing in this regard. To this end it is important to check that all Properties are defined as well, of course. With the experience of the matrix cookbook, would you suggest that I proceed similarly here, or do you have new insights that suggest another route?

Eric Wieser (Feb 10 2024 at 14:20):

I think the main learning in the matrix cookbook was to generate as much as possible before doing any manual work

Eric Wieser (Feb 10 2024 at 14:20):

So that means, scrape the website and generate a file for each statement of the form theorem T000XYZ : sorry := sorry, probably with the LaTeX docstring above it

Eric Wieser (Feb 10 2024 at 14:21):

It might be a nice experiment to try to use AI autoformalization to state the theorems, though I don't know how effective that would actually be. Probably the output is easier to fix than write from scratch.

Josha Dekker (Feb 10 2024 at 14:26):

Okay, my experience with web scraping is non-existent, but I’ll give it a try soon! I know that they have a GitHub as well, perhaps there’s an easy file that contains what I’m looking for!

Eric Wieser (Feb 10 2024 at 14:32):

Yes, that's probably better than scraping!

Josha Dekker (Feb 10 2024 at 16:31):

I've checked the GitHub, this part should be easy

Josha Dekker (Feb 10 2024 at 17:29):

I wrote some Julia code that reads in the properties and automatically tries to infer the Mathlib names. Then it creates a file that looks as follows. Of course, for any name that deviates, I should add an exception to the code, but that is fine. Would a file like this be useful to check if Properties have been defined in Mathlib? For Theorems, I'll do a similar thing in a few days, I don't have time right now!

import Mathlib

open TopologicalSpace

-- P000001 : $T_0$
#check T0Space

-- P000002 : $T_1$
#check T1Space

-- P000003 : $T_2$
#check T2Space

-- P000004 : $T_{2 \frac{1}{2}}$
#check T25Space

-- P000005 : $T_3$
#check T3Space

-- P000006 : $T_{3 \frac{1}{2}}$
#check T35Space


....


-- P000176 : Cardinality $\geq 4$
#check CardinalityGeq4Space

-- P000177 : $\sigma$-space
#check SigmaSpace

-- P000178 : $\aleph$-space
#check AlephSpace

-- P000179 : $\aleph_0$-space
#check Aleph0Space

-- P000180 : Hereditarily separable
#check HereditarilySeparableSpace

Josha Dekker (Feb 10 2024 at 17:38):

(If you're interested, I just forked the pi-base GitHub (it seems to not have grown a lot recently) and added a file 'properties_to_cleaned.jl', which outputs 'properties_in_lean.txt' and 'properties_cleaned.txt'. (The latter should be a lean-file, but I'll change the extension manually later, to avoid errors regarding that there is no project.). Everything is very hacky, but that's fine for this purpose, as we only need to get the files once I guess)

Josha Dekker (Feb 12 2024 at 18:42):

I can generate Lean versions of the theorems in pi-base automatically now. As the names for the various spaces are determined by (hacky) pattern matching, I will add modifications as I go. For now, I can generate files that look like

import Mathlib

variable {X : Type*}
--T000001:
example [TopologicalSpace X] [CompactSpace X]
  : CountablyCompactSpace X
 := by infer_instance

--T000002:
example [TopologicalSpace X] [CountablyCompactSpace X]
  : WeaklyCountablyCompactSpace X
 := by infer_instance

--T000003:
example [TopologicalSpace X] [SequentiallyCompactSpace X]
  : CountablyCompactSpace X
 := by infer_instance

--T000004:
example [TopologicalSpace X] [CountablyCompactSpace X]
  : PseudocompactSpace X
 := by infer_instance

and

import Mathlib
--T000001:
variable {X : Type*} [TopologicalSpace X]
variable [CompactSpace X]
#synth CountablyCompactSpace X

--T000002:
variable {X : Type*} [TopologicalSpace X]
variable [CountablyCompactSpace X]
#synth WeaklyCountablyCompactSpace X

--T000003:
variable {X : Type*} [TopologicalSpace X]
variable [SequentiallyCompactSpace X]
#synth CountablyCompactSpace X

--T000004:
variable {X : Type*} [TopologicalSpace X]
variable [CountablyCompactSpace X]
#synth PseudocompactSpace X

So, most automatisation seems done

Josha Dekker (Feb 12 2024 at 18:48):

However, I have two questions:

  1. I will probably want to run this in some project that depends on Lean4, what is the best approach for this? I don't need anything fancy, if it just updates Mathlib and Lean daily and runs the files I'm happy. Extra credit if someone can set this up for me real quick (as it should probably also be able to run Julia) (I'm happy to convert my code to Python if this is substantially easier).

  2. Now I'm using infer_instance, which seems to work, but I'm not sure how to get that to work when I get spaces that do not have a certain property, like ¬CompactSpace. The same problem seems to occur for using #synth. For infer_instance, I'm just adding hypotheses like h1 : ¬CompactSpace, but I don't think that'll allow infer_instance to work. I tried aesopbut that fails at e.g.

import Mathlib

--T000099:
example [TopologicalSpace X] [T1Space X] [NormalSpace X]
  : T4Space X
 := by aesop

Josha Dekker (Feb 12 2024 at 18:49):

Currently, the project is sitting here (https://github.com/JADekker/pi_base_data), which is a fork of the pi-base GitHub. The files that check the properties are in the folders dubbed lean_checklist_..., where I've split up the files to make sure they are not too long.

Josha Dekker (Feb 12 2024 at 18:51):

As 140 out of 174 definitions in pi-base seem missing (probably less because of my bad pattern-matching), it is a bit hard to easily generate a long list of examples where it works/fails.

Josha Dekker (Feb 12 2024 at 18:54):

If that number is right, 341 out of 447 theorems in Pi-base cannot be attempted because a definition is lacking, so I think we now have a very nice list of problems that people may tackle if they want a new, hopefully usually relatively short, project

Josha Dekker (Feb 12 2024 at 18:55):

Would there be a nice way to summarise the files that I made/automatically count the number of missing definitions/identify the most important missing definitions and so forth, so we can identify what things one might focus on? Let me know!

Emilie (Shad Amethyst) (Feb 12 2024 at 20:55):

Josha Dekker said:

  1. I will probably want to run this in some project that depends on Lean4, what is the best approach for this? I don't need anything fancy, if it just updates Mathlib and Lean daily and runs the files I'm happy. Extra credit if someone can set this up for me real quick (as it should probably also be able to run Julia) (I'm happy to convert my code to Python if this is substantially easier).

Github actions would be the obvious way to go. There seems to be a CRON tool, and in general those give you a way to run commands for you, so as long as you can already run all the things you need from a bash script, you should be able to automate it :)

Kevin Buzzard (Feb 12 2024 at 21:25):

This is a very cool project BTW! Once it's up and running you might want to ping some topologists to see if we can persuade them to fill in the missing definitions :-)

Eric Wieser (Feb 12 2024 at 21:48):

I'd recommend writing

import Mathlib

namespace PiBase

variable {X : Type*}
theorem T000001 [TopologicalSpace X] [CompactSpace X]
  : CountablyCompactSpace X
 := by infer_instance

theorem T000002 [TopologicalSpace X] [CountablyCompactSpace X]
  : WeaklyCountablyCompactSpace X
 := by infer_instance

etc, because then you can easily check which theorems are present and missing, as I do in the matrix cookbook

Eric Wieser (Feb 12 2024 at 21:49):

I'm using this script and these CI steps to generate this progress bar.

Josha Dekker (Feb 13 2024 at 06:18):

Emilie (Shad Amethyst) said:

Josha Dekker said:

  1. I will probably want to run this in some project that depends on Lean4, what is the best approach for this? I don't need anything fancy, if it just updates Mathlib and Lean daily and runs the files I'm happy. Extra credit if someone can set this up for me real quick (as it should probably also be able to run Julia) (I'm happy to convert my code to Python if this is substantially easier).

Github actions would be the obvious way to go. There seems to be a CRON tool, and in general those give you a way to run commands for you, so as long as you can already run all the things you need from a bash script, you should be able to automate it :)

I’m kind of a GitHub novice, how hard would it be to set this up?

Josha Dekker (Feb 13 2024 at 06:19):

Eric Wieser said:

I'm using this script and these CI steps to generate this progress bar.

Thank you, I’ll look into it!

Yury G. Kudryashov (Feb 16 2024 at 05:36):

#10621 brings us 1 step closer by defining R0R_0 spaces

Yury G. Kudryashov (Feb 16 2024 at 05:59):

BTW, is there a name for "exists a UniformSpace structure that agrees with the topology and is a CompleteSpace"? Something between completely regular and Polish.

Josha Dekker (Feb 16 2024 at 07:18):

Yury G. Kudryashov said:

#10621 brings us 1 step closer by defining R0R_0 spaces

That’s great! #10601 Adds the implication from sigma compact to Lindelöf by the way!

Josha Dekker (Feb 16 2024 at 07:20):

Eric Wieser said:

I'm using this script and these CI steps to generate this progress bar.

I’ll try to learn some programming in Lean the coming days to properly understand this so I can modify it for pi-base, if time allows! (I can’t neglect my own research of course!)

Josha Dekker (Feb 16 2024 at 07:27):

Yury G. Kudryashov said:

BTW, is there a name for "exists a UniformSpace structure that agrees with the topology and is a CompleteSpace"? Something between completely regular and Polish.

I’m not aware of a name, good question!

Yury G. Kudryashov (Feb 16 2024 at 07:28):

Josha Dekker said:

Yury G. Kudryashov said:

#10621 brings us 1 step closer by defining R0R_0 spaces

That’s great! #10601 Adds the implication from sigma compact to Lindelöf by the way!

I merged it but I think that we should split the file and change the import hierarchy so that we can use this fact in the file about σ-compact spaces to deduplicate API.

Yury G. Kudryashov (Feb 16 2024 at 07:29):

As a first step, we can move all the compactness-related definitions to a new file Topology/Compactness/Defs, moving some of them from Topology/Defs/Filter and some from other files.

Josha Dekker (Feb 16 2024 at 07:30):

Yury G. Kudryashov said:

Josha Dekker said:

Yury G. Kudryashov said:

#10621 brings us 1 step closer by defining R0R_0 spaces

That’s great! #10601 Adds the implication from sigma compact to Lindelöf by the way!

I merged it but I think that we should split the file and change the import hierarchy so that we can use this fact in the file about σ-compact spaces to deduplicate API.

Thank you!

Josha Dekker (Feb 16 2024 at 07:31):

Yury G. Kudryashov said:

As a first step, we can move all the compactness-related definitions to a new file Topology/Compactness/Defs, moving some of them from Topology/Defs/Filter and some from other files.

Yes, this sounds like a good approach!

Yury G. Kudryashov (Feb 16 2024 at 07:31):

Another possible name for the file: Topology/Defs/Compact

Yury G. Kudryashov (Feb 16 2024 at 07:31):

Not sure which one is better.

Yury G. Kudryashov (Feb 16 2024 at 08:23):

Fun fact: π-base can't prove that https://topology.pi-base.org/spaces?q=Pseudometrizable+%2B+Separable+%2B+%7ESecond+Countable is impossible

Josha Dekker (Feb 16 2024 at 09:34):

Yury G. Kudryashov said:

#10621 brings us 1 step closer by defining R0R_0 spaces

This should close/make it able to close at least three theorems (T286 - T288) immediately or relatively easily, so that is very nice!

Josha Dekker (Feb 16 2024 at 09:36):

Yury G. Kudryashov said:

Fun fact: π-base can't prove that https://topology.pi-base.org/spaces?q=Pseudometrizable+%2B+Separable+%2B+%7ESecond+Countable is impossible

They take a database approach as far as I can tell, so if nobody added the result, it doesn't know this indeed. Once we have most things in Mathlib, a similar website/page on the site where you pick properties and it tries to infer the result in Lean would be very nice

Junyan Xu (Feb 16 2024 at 10:19):

Yury G. Kudryashov said:

BTW, is there a name for "exists a UniformSpace structure that agrees with the topology and is a CompleteSpace"? Something between completely regular and Polish.

https://en.wikipedia.org/wiki/Completely_uniformizable_space

Patrick Massot (Feb 16 2024 at 12:28):

I am a bit worried that this pi-base tracking idea could lead to performance issues, or at least code bloat. I think we really don't want every topology lemma to send instance synthesis through a maze of hundreds of classes defining extremely exotic properties of topological spaces. Also we don't want each lemma to require 20 lines of type class assumptions like we have in differential geometry.

Patrick Massot (Feb 16 2024 at 12:30):

I think we should have definitions only for properties that show up in mathematics outside of the corner of mathematics that is studying pathological topological spaces as an end goal.

Josha Dekker (Feb 16 2024 at 12:38):

Patrick Massot said:

I think we should have definitions only for properties that show up in mathematics outside of the corner of mathematics that is studying pathological topological spaces as an end goal.

That is a fair concern, I think we should probably label which results are ‘useful but missing’ and which ones are ‘missing but currently not of immediate interest’. I wouldn’t make most of them instances either, we can keep most them as theorems?

Kevin Buzzard (Feb 16 2024 at 12:54):

There should be a condensed sets version of this so we can see how many of the pathologies survive :-)

Josha Dekker (Feb 16 2024 at 13:12):

I agree that we shouldn't have too many properties/results in Mathlib that are (/seem) not necessary. However, I believe that a systematic approach in which we identify all properties that have (and have not) been added to Lean might be useful. Missing properties could then be labelled, e.g. WIP, missing, needed for X, Y, Z or missing, exotic, to indicate what the community thinks on whether these should/should not be added.

I also agree that not all theorems in pi-base should be made instances, but they could be made theorems/examples. However, it seems to me that getting a 1-1 mapping between theorems in pi-base and lean proofs of these statements might be very useful.

Kevin Buzzard (Feb 16 2024 at 13:28):

In practice it seemed to be quite difficult to establish a 1-1 mapping between theorems in the Stacks project and theorems in mathlib, because there were theorems in the Stacks project which we didn't want in mathlib (e.g. they were not stated in the right generality, they were too weird, they were actually definitions, they were used only in proofs which we didn't want because we had a different proof of the same theorem etc etc)

Yury G. Kudryashov (Feb 16 2024 at 14:11):

I added R₀ because I was generalizing a lemma from T2 to R1 and a dependency was done for T1 (R0 is enough).

Yury G. Kudryashov (Feb 16 2024 at 15:32):

!bench found no significant changes

Yury G. Kudryashov (Feb 16 2024 at 21:41):

A few more ideas (slightly outside of the scope of the pi base, but close enough):

  • create a table with rows T3Space etc and columns X × Y and other constructions (probably, separate rows for ∀ i : ι, X i with finite, countable, and generic ι) for each cell try to generate the instance;
  • code and linter to autogenerate instances for "unrelated" synonyms (e.g., topological instances for OrderDual or Multiplicative).

Josha Dekker (Feb 17 2024 at 06:54):

Could be interesting!

Steven Clontz (Apr 25 2024 at 21:37):

Cross-referencing some related discussion that spun out of my newbie questions at https://leanprover.zulipchat.com/#narrow/stream/113489-new-members/topic/general.20topology.20in.20mathlib where I'm taking naive steps to start writing formalizations of some of our assertions at the pi-Base (including for some properties not currently modeled in mathlib).

Steven Clontz (Apr 25 2024 at 21:39):

Patrick Massot said:

I think we should have definitions only for properties that show up in mathematics outside of the corner of mathematics that is studying pathological topological spaces as an end goal.

Sounds quite reasonable to me - as a general topologist I'm quite interested in formalization of my field, but that of course could be done as a project that just uses mathlib4 as a dependency. But it'd be great if there was a clear workflow so that things of more general use can be implemented upstream: I'm a bad judge of what those are as someone who loves pathologies in set-theoretic topology. :upside_down:

Steven Clontz (Apr 25 2024 at 22:43):

Eric Wieser said:

scrape the website

:triumph: :laughing: Enabling folks who want to build our on database without having to scrape our app (or maintain an API) is one of the several reasons it's all just text files stored at https://github.com/pi-base/data Another option is the JSON artifact we produce as part of our CI. https://github.com/pi-base/data/actions/runs/8794979526 I wish more databases open-sourced their data (though we have an advantage that we're not storing a huge computer-generated dataset)

Steven Clontz (Apr 25 2024 at 22:48):

Anyway, thanks to Josha in that other thread for catching me up on this project. I'm quite interested in collaborating with anyone who wants to work with us to find the best way our projects can support each other (and build engagement in formalized mathematics by topologists).

Richard Osborn (Apr 26 2024 at 12:39):

Steven Clontz said:

Eric Wieser said:

scrape the website

:triumph: :laughing: Enabling folks who want to build our on database without having to scrape our app (or maintain an API) is one of the several reasons it's all just text files stored at https://github.com/pi-base/data Another option is the JSON artifact we produce as part of our CI. https://github.com/pi-base/data/actions/runs/8794979526 I wish more databases open-sourced their data (though we have an advantage that we're not storing a huge computer-generated dataset)

A JSON artifact would be ideal for lean as the markdown+yaml output is quite hard to parse in general, so lean will unlikely have a parser for that format any time soon. Is there an easy way to link to the latest bundle.json artifact?

Steven Clontz (Apr 30 2024 at 02:12):

https://pi-base-bundles.s3.us-east-2.amazonaws.com/refs/heads/main.json

I should add that link explicitly at the https://topology.pi-base.org/dev page so it's easier to find (and will do so now).

Steven Clontz (Apr 30 2024 at 02:39):

And done. Thanks for pointing out how obnoxious it is to find that link. (Well, to be honest, I was trying to show a student earlier today how the pi-Base worked, and that made me remember this thread.) :sweat_smile: Anyway, hopefully the next person who wants to hack on the pi-Base can discover that we have a relatively easy JSON blob to ingest.


Last updated: May 02 2025 at 03:31 UTC