Zulip Chat Archive

Stream: PrimeNumberTheorem+

Topic: Formalizing Riemann-Stieltjes theory


Harald Helfgott (Jan 29 2026 at 00:04):

I have been attempting to get a formalization of Appendix A in Montgomery–Vaughan, which sets up a basic Riemann–Stieltjes integration theory—good enough, I presume, for everything that Montgomery–Vaughan do, which is plenty. It will certainly enable us to integrate by parts when one function is of bounded variation and the other one is in C1C^1, thus filling a yawning gap in the theory. In this way, for instance, it will finally let us prove the standard decay estimate for Fourier transforms of functions of bounded variation, the lack of which has been blocking progress in more than one place.

Most of this work has been carried out using Aristotle, with a fair amount of prompt engineering (via item 3 in its menu). One thing that became clear is that coding this kind of low-level combinatorial infrastructure in Lean —interval partitions, refinements, unions, mesh estimates—is a particularly difficult regime for Aristotle (and other AIs?), especially compared with its successes in matters that naturally impress us much more as mathematicians. Progress on Theorem A.1 only became possible after iteratively refining prompts, backtracking whenever matters did not seem to converge towards a result. I am now carefully cleaning up and restructuring the generated code by hand.

I should also say that I am still a beginner in Lean/Mathlib, and that I am getting plenty of help from AI tools (including ChatGPT and VS Code's own chatbot) as I try to understand what is going on and improve the code. Once I have something halfway satisfactory, I would like to post the .lean file so that others can help improve it further.

At any rate, here comes the crucial part. Before going much further, I would really appreciate advice on two structural questions:
(1) Abstractions for partitions.
I am currently working with a very simple structure RSPartition (a Monotone map x:Fin(n+1)Rx : \mathrm{Fin}(n+1) \to \mathbb{R} with specified endpoints). While the definition itself is trivial, proving even basic lemmas cleanly has been surprisingly painful. (Here I am speaking as the unelected representative of a team consisting of one hapless human and several AIs.) In particular, constructing a partition from a finite set of points (RSPartition.fromPoints) and then using it to define the union / refinement of partitions leads to a lot of code and code manipulations that I still find unintuitive.
Is there already an existing abstraction or API in Mathlib that I should be using instead of rolling this by hand?
(2) Interface with Mathlib’s Riemann integration.
For the theory to actually be useful, it seems essential to connect this mesh-based Riemann(-Stieltjes) theory with Mathlib’s existing Riemann integration framework (via BoxIntegral). I have asked about this before, but I would like to emphasize that we need an actual interface here, or the whole development risks remaining isolated. (More concretely: I would like to be able to use the fact that a bounded function on [a,b][a,b] that is continuous outside a set of measure 00 is Riemann-integrable.) What is the recommended way to set up this bridge?

Any pointers would be very welcome.

Sébastien Gouëzel (Jan 29 2026 at 07:54):

Isn't the box integral framework already providing all the partitions you need? I.e., what do you need to develop which isn't already there?
@Yury G. Kudryashov could probably say more, as the author of box integrals in Mathlib.

Yury G. Kudryashov (Jan 29 2026 at 07:57):

Mathlib has box partitions of a box in I -> Real, but has no specification to dimension one (at least, I didn't write any, but I didn't follow this folder since then).

Yury G. Kudryashov (Jan 29 2026 at 07:58):

Did you try building an equivalence to partitions of a box in I -> Real? Also, https://aristotle.harmonic.fun/ may be better in Lean than ChatGPT (but it has no interactive version yet, so it's more suitable for harder goals).

Yury G. Kudryashov (Jan 29 2026 at 07:59):

COI disclosure: I work for Harmonic

Harald Helfgott (Jan 29 2026 at 08:01):

My question went partly in that direction (at least in my head): what can one get by specializing box integrals to dimension $1$?

Yes, I've been using Aristotle throughout - I thought I had made it clear in the above. It is just that it got lost in the woods when proceeding on its own; things went forward only once I made ChatGPT (which was of limited utility on its own, in part due to its perception of Mathlib being rather impressionistic) draft scripts to give to Aristotle.

Harald Helfgott (Jan 29 2026 at 08:03):

I can post the .lean file here now (it still needs some golfing, and I also wanted to introduce some namespace discipline first). Then people can tell me whether this can all be done more easily through BoxIntegral.

At the very least, however, building an equivalence should give the answer to (2). Can you give me a hand? I'm not even sure of where to find the formal definition of a partition of a box in BoxIntegral.

Harald Helfgott (Jan 29 2026 at 08:06):

By the way, as I've said elsewhere, I am very impressed with Aristotle - it managed to formalize chunks of a measure-theory/Fourier-analysis textbook (Wheeden-Zygmund), often with little or no human intervention.

Harald Helfgott (Jan 29 2026 at 08:08):

I think setting up a low-level combinatorial structure in Lean is the issue here. (I can sympathize with Aristotle, as it is not natural for me either.)

Yury G. Kudryashov (Jan 29 2026 at 10:34):

See docs#BoxIntegral.Prepartition and docs#BoxIntegral.Prepartition.IsPartition

Harald Helfgott (Jan 29 2026 at 11:35):

Right; I can see that a BoxIntegral.Prepartition for n=1 must be mathematically equivalent to the definition IPartition I have in the file.

All right - here is the Lean file (my first real Lean file — expect horrors).

I don’t know whether the RS.IPart layer will eventually collapse completely into an existing Mathlib framework; I’m very much looking forward to hearing what @Yury G. Kudryashov has to say about that. I am not placing my hopes on that, since if I understood correctly, the special case n=1n=1 has not been worked out in particular in BoxIntegral, and there is much that works out differently or more simply for n=1n=1.

What will be important going forward:
(a) A need: an interface to BoxIntegral.
What I would like is the following workflow. Start with a function
f:[a,b]Rf : [a,b] \to \mathbb{R}, assume it is bounded and continuous outside a set of Lebesgue measure zero. Use Mathlib’s existing theory (BoxIntegral) to obtain that ff is Riemann integrable in the BoxIntegral sense — and then (this is the crucial interface) have this imply that ff is Riemann integrable in the lightweight theory developed in this file.

Conceptually this should be straightforward, but I don’t yet see the cleanest way to set up the bridge, and I’d be very grateful for advice from people familiar with BoxIntegral (especially its author :slight_smile:).
(b) A want: simplifying “obvious” combinatorial lemmas.
There are several lemmas in RS.IPart that are mathematically trivial but turned out to be surprisingly painful to formalize — both for Aristotle and for a human beginner like myself. I suspect that some Mathlib machinery should make these much shorter and cleaner, but I don’t yet know how to use it properly.

Examples include:

  • fromPoints

  • points_fromPoints

  • refinement_index_map

  • sum_partition_blocks

If there are standard tricks, existing lemmas, or better abstractions that would let these collapse to a few lines, I’d love to learn about them.

Any feedback — especially on structure, abstraction choices, or existing Mathlib APIs I may have missed — would be very welcome.
MyMV_A1.lean

Yury G. Kudryashov (Jan 29 2026 at 14:51):

For (a), you can apply Mathlib API to the function g : (Fin 1 → Real) → E given by g v = f (v 0). You'll have to manually transfer assumptions from f to g. This is not very convenient, but doable with the current Mathlib. For converting results about Bochner integral back and forth, see draft PR #34575 (not ready for review yet). You may want to develop a similar API for box integrals.

Yury G. Kudryashov (Jan 29 2026 at 14:52):

It would be easier to review your code if you push it to Github.

Harald Helfgott (Jan 29 2026 at 15:18):

Very well, I'll push it to GitHub. Can you give us a few more details as to (a)?

Alex Kontorovich (Jan 29 2026 at 23:44):

Harald, the general mathlib way to say Rn\R^n is Fin n → Real (an nn-tuple of real numbers is a map from (0,...,n1)(0,...,n-1) to R\R). So if you just want one-variable, Yury is suggesting Fin 1 → Real for the domain.

Yury G. Kudryashov (Jan 29 2026 at 23:48):

Since the translation between the n-dimensional theory and 1-dimensional theory is far from being automatic, it makes sense to add definitions about functions f ⁣:REf\colon \mathbb R \to E, where E is a normed space, at least for notions like integrability and main theorems about it. I wouldn't translate boxes etc until you actually need them (and the proof can't be done in dimension nn, then specialized).

Yury G. Kudryashov (Jan 29 2026 at 23:50):

E.g., a function f ⁣:REf\colon\mathbb R\to E is box integrable on [a,b][a, b], if the function gf ⁣:R1Eg_f\colon \mathbb R^1\to E given by gf(v)=f(v0)g_f(v)=f(v_0) is integrable on [consta,constb][\operatorname{const} a, \operatorname{const} b]. In Lean, it's fun _ : Fin 1 => a and fun _ : Fin 1 => b. You can also use ![a] and ![b]. While they're equal to the constant functions, they aren't definitionally equal. I'm not sure which encoding will be more convenient down the path.

Alex Kontorovich (Jan 29 2026 at 23:55):

Come to think of it, is there not general Mathlib API for coercions between Real and Fin 1 → Real, and functions on such? I would think that would come in handy all over the place?

Yury G. Kudryashov (Jan 30 2026 at 00:01):

We have them as Equivs and some bundled equivalences (e.g., MeasurableEquivs, Homeomorphs), but we don't have a good way to automatically transfer definitions/theorems along these equivalences.

Harald Helfgott (Jan 30 2026 at 00:02):

Then it sounds like I've been proceeding correctly, no? I am working on R\mathbb{R}, and I'll call on the theory on R1\mathbb{R}^1 at the last possible moment.

What I need will be really something minimal: that, if a function f:RCf:\mathbb{R}\to \mathbb{C} is continuous outside a set of Lebesgue measure zero, then it is Riemann-integrable in a very transparent sense (namely: I create a filter of meshes, and ask for the Riemann sums to converge). What one would do here would be to construct a function $g_f:\mathbb{R}^1\to \mathbb{C}$, show it is continuous outside a set of Lebesgue measure zero (duh! but I imagine there's some trickery in proving this trivial fact), call a library function to show that it is Riemann-integrable in the BoxIntegral sense, and then show that this implies that it is Riemann-integrable in the "very transparent" sense (a special case of Riemann-Stieltjes: integrability with respect to dx).

Incidentally, where and how do I push my code to GitHub? A kind soul told me at some point, but I've forgotten how. I take I can't just push it anywhere - I have to use a particular directory and so forth?

Yury G. Kudryashov (Jan 30 2026 at 00:02):

You can look at file#MeasureTheory/Integral/DivergenceTheorem for an example of transferring results about Fin n -> Real to results about Real or Prod Real Real (n = 2).

Yury G. Kudryashov (Jan 30 2026 at 00:04):

You need to create a repository (a new project using lake new) or a fork of an existing project like PNT or Mathlib.

Yury G. Kudryashov (Jan 30 2026 at 00:04):

After creating an account, if you haven't done this yet.

Harald Helfgott (Jan 30 2026 at 00:07):

An account on Github? Aha - what would be recommended protocol? It seems that we have by now some sort of ever-expanding analytic number theory project that is much wider than PNT+'s original purview, yet doesn't have the same goals or standards as Mathlib (though I take the aim is that whatever fits in Mathlib becomes part of Mathlib once it reaches Mathlib's standards).

Alex Kontorovich (Jan 30 2026 at 01:11):

Harald, if you want, I can put the code into PNT+ for you - just email it to me

Harald Helfgott (Jan 30 2026 at 07:59):

Thanks, Alex, though a man should also learn to fish and so forth.

Apparently I already have a Github account, and even my own fork created two weeks ago, somehow (https://github.com/Garald1/PrimeNumberTheoremAnd). Let me try to figure this out - I'll no doubt have questions.

So what is the accepted procedure - I set up a Lean directory in my fork, upload my files to it, and then I request for them to go into PNT+ (meaning Alex's Github)?

Harald Helfgott (Jan 30 2026 at 08:05):

I'm reading Contributing.md - the following instructions are the relevant ones, no?
image.png

Yury G. Kudryashov (Jan 30 2026 at 08:07):

You make a checkout of your fork, add files there, then open a pull request.

Aayush Rajasekaran (Jan 30 2026 at 14:26):

@Harald Helfgott I'm happy to jump on a quick call to teach you how to fish, if you'd like :fishing:

Aayush Rajasekaran (Jan 30 2026 at 14:28):

Harald Helfgott said:

I'm reading Contributing.md - the following instructions are the relevant ones, no?
image.png

Those are more related to the specifics of any changes one is making -- they're not instructions for setting up your fork and opening Pull Requests.

Harald Helfgott (Jan 31 2026 at 23:46):

@Aayush Rajasekaran Thank you! I just have a single lean file to push - care to tell me how to do it? It contains the basics of Riemann -Stieltjes integration.

Aayush Rajasekaran (Feb 01 2026 at 02:05):

Harald Helfgott said:

Aayush Rajasekaran Thank you! I just have a single lean file to push - care to tell me how to do it? It contains the basics of Riemann -Stieltjes integration.

You first need to check out your fork as Yury said above. If you've already done that, add the file to the appropriate folder, then run:

  • git add <name of the file>
  • git commit -m "feat: Riemann-Stieltjes basics"
  • git push origin main.

If you haven't checked out your fork yet, there's a few steps you have to run first. I'm happy to jump on a quick call to walk you through it, or ChatGPT can probably help you too.

Harald Helfgott (Feb 01 2026 at 02:31):

Oh, I have a fork now. Let me first finish getting a working proof of Theorem A.3 in Montgomery-Vaughan - then we can upload it.

Harald Helfgott (Feb 02 2026 at 11:53):

@Aayush Rajasekaran I now have typechecked code for Theorems 1, 2 and 3a. Care to walk me through things?

What is still unclear to me is what particular directory I should upload to - this is not a GitHub question, this is a question on what is the norm here.

Terence Tao (Feb 02 2026 at 18:51):

Right now we are placing all lean files in PrimeNumberTheoremAnd, though if there is a cluster of related Lean files it may be suitable for a subdirectory of that directory. It is a relatively simple matter to reorganize these files later if needed.

Harald Helfgott (Feb 02 2026 at 19:00):

In the end, I will just need three facts about Riemann integrals on an interval [a,b]:
(a) if f is Riemann-integrable, then it is Lebesgue integrable, and its Riemann and Lebesgue integral are the same;
(b) if f is Riemann-integrable, then it is bounded;
(c) if f and g are Riemann-integrable, then f g is Riemann integrable.

I imagine all of these are already in BoxIntegral? What are their names? (A brief Mathlib search yielded only confusion on my end, perhaps BoxIntegral.HasIntegral.integral_eq is (a)?)

If that is the case, then all we need is to show that the "naive" definition of Riemann integral:

/-
A Riemann-Stieltjes partition of [a, b] is a monotone map x : Fin (n+1) → ℝ with x(0) = a and x(n) = b.
-/
structure IPartition (a b : ) where
  n : 
  x : Fin (n + 1)  
  monotone : Monotone x
  left : x 0 = a
  right : x (Fin.last n) = b

structure TaggedPartition (a b : ) extends IPart.IPartition a b where
  xi : Fin n  
  h_xi :  i : Fin n, x i.castSucc  xi i  xi i  x i.succ

def RSum {a b : } (f :   ) (P : TaggedPartition a b) :  :=
   i : Fin P.n, f (P.xi i) * ((P.x i.succ) - (P.x i.castSucc))

def HasRSIntegral (f :   ) (a b I : ) : Prop :=
   ε > 0,  δ > 0,  (P : TaggedPartition a b), P.mesh  δ  |RSum f P - I| < ε

coincides with the BoxIntegral definition on one-dimensional intervals [a,b]. I imagine this is straightforward?

Harald Helfgott (Feb 03 2026 at 19:49):

@Yury G. Kudryashov and everybody else: I have just uploaded files to Github - see https://github.com/AlexKontorovich/PrimeNumberTheoremAnd/pull/871 I'd be interested in your feedback.

Terence Tao (Feb 04 2026 at 02:19):

I guess the main question for me is to what extent the main Riemann-Stieltjes integration by parts results ( Theorem_A2 of MyMV_A3a.lean‎, I guess) be usefully deployed to resolve tasks that are currently blocked by the need to do something like Riemann-Stieltjes integration by parts, such as PNT#650, PNT#562, or PNT#563. It may be that a lot more "API" connecting the RS integral in that file with more conventional Mathlib concepts, such as the Lebesgue-Stieltjes integral, may be needed.

Harald Helfgott (Feb 04 2026 at 02:42):

I believe we do not need much more than what we already have. Let me explain why.

Once we have Theorem A3b, that is, the second half of Theorem A3 (we have got as far as the first half) we will just need a lemma identifying the Riemann integral as in this file (not the Riemann-Stieltjes integral in general) with the Riemann integral in BoxIntegral; that is the main missing step. Then, thanks to the Mathlib lemmas that come with BoxIntegral, we will have that a Riemann-integrable function is Lebesgue-integrable, and that the two integrals then coincide. Once we have that, the entire theory is connected, and, in particular, PNT #562 and similar statements (that is, decay estimates for the Fourier transforms of functions of bounded variation) should follow with only a little additional work.

Now, to finish Theorem A3b, we can either go through a somewhat painful process (similar to what I've been through in the last few days to obtain Theorem A3a) or we can use the link to BoxIntegral we will need anyhow to obtain basic results that we would have to prove otherwise (a Riemann-integrable function is bounded, the product a Riemann-integrable and a continuous function is Riemann-integrable [in fact, the product of any two R-integrable functions is R-integrable]; I imagine these results, or others that imply them, are already in Mathlib, for Riemann integrals as defined in BoxIntegral - that is why I have asked others to confirm this). I think the second course would be wiser.

Let me reiterate: you don't need to have a link with the Lebesgue-Stieltjes theory; the "edge that makes the graph connected" is the edge between two definitions of the Riemann integral, namely, the one here (a _special case_ of RS) and the one in the library; once we have that the two definitions of "Riemann integral" are equivalent, the graph is connected, so to speak.

Terence Tao (Feb 04 2026 at 02:54):

OK. What are the plans for documenting these several thousand lines of Lean code? We are using the blueprint to organize the rest of the PNT+ project, so one possibility is to attach some blueprint metadata to the most important theorems in these files (the ones other contributors would actually use for their purposes). (Sublemmas that are only used internally within the file do not need to be blueprinted if they have already been formalized.)

In the rest of the project we proceed in the other direction: we lay out the blueprint first, and then create Lean code for each of the components identified by the blueprint. This generally makes it easier for other contributors to understand the structure of the files without having to directly read thousands of lines of Lean code.

Harald Helfgott (Feb 04 2026 at 03:00):

Well, my hope was that people would hack away at it and replace it by better code. I should say, however, that Aristotle has done a pretty good job of documenting its own code, or rather of describing what the lemma statements say. It is the proofs that are often ugly.

I'll do a second pass of the Fejér file at some point. I find the structure of MyMVA3a.lean to be pretty readable, perhaps precisely because it was the outcome of lots of back-and-forth between Aristotle and myself (or shall I say my alter ego).

Terence Tao (Feb 04 2026 at 03:02):

Possibly all that is needed is some summary docstring at the top of the Lean file describing the main results and any general notes, similar to what is done in Mathlib Lean files (random example: https://leanprover-community.github.io/mathlib4_docs/Mathlib/MeasureTheory/Measure/Stieltjes.html / https://github.com/leanprover-community/mathlib4/blob/be8defd50c175d2ce7598a43c1be199f53b7cae6/Mathlib/MeasureTheory/Measure/Stieltjes.lean , which has a few paragraphs about the main definitions and implementation notes before diving into the lemmas and proofs).

Harald Helfgott (Feb 04 2026 at 03:03):

Sure, I can try adding that tomorrow. Let me understand that these are Aristotle and Dr. Gotthelf's first attempts, and that, even though they do typecheck, they are meant for others to improve (or replace) immediately - hopefully while giving us feedback.

Terence Tao (Feb 04 2026 at 03:07):

Well, I think there will be few volunteers to offer constructive feedback on a block of 1000+ lines of AI-generated code. If there is a high-level structure that is described in a human-readable format and describes the results that would be most useful for applications, this would make it easier to review both at a high level, and at the low level of an individual lemma or other result identified in that structure.

Harald Helfgott (Feb 04 2026 at 03:08):

Sure. In a sense there is such a detailed, human-readable structure already: the files I've uploaded follow the relevant passages in Wheeden-Zygmund and Montgomery-Vaughan.

It's 4:08am - let me document more tomorrow.

Terence Tao (Feb 04 2026 at 03:09):

In that case I would recommend tagging key theorems from WZ and MV with blueprint metadata, much as is done with other files in the blueprint that are following some literature source.

Harald Helfgott (Feb 04 2026 at 03:09):

Aha - how do I do that exactly? Should I do that within the lean file, or in some other format?

Terence Tao (Feb 04 2026 at 03:12):

That's done within the Lean file. Example: https://github.com/AlexKontorovich/PrimeNumberTheoremAnd/blob/main/PrimeNumberTheoremAnd/MobiusLemma.lean (compare with how the blueprint actually gets rendered at https://alexkontorovich.github.io/PrimeNumberTheoremAnd/blueprint/primary-chapter.html#a0000000032). (You need import Architect as a dependency of your file in order for the blueprint metadata to compile.) There are a few other files that need to be tweaked in order for this metadata to actually show up in the PNT+ blueprint, but I can sort that out manually.

Harald Helfgott (Feb 04 2026 at 03:14):

OK, I'll look into this. It seems to me that you are asking for two levels of documentation (and that's completely reasonable): one, a very basic one, to enable others to use and perhaps build on the existing code; another, more thorough, to make it easier to edit the code or replace parts of it.

Harald Helfgott (Feb 04 2026 at 03:15):

I'm quite happy to see this as a ship of Theseus that actually floats but whose individual parts can and arguably should be replaced.

Terence Tao (Feb 04 2026 at 03:16):

Yeah, those are the two basic options (Mathlib style documentation, or blueprinting.) In principle we could do both, but in practice we are using the blueprint only (and some minimal docstrings for each theorem, which Aristotle is already providing).

Terence Tao (Feb 04 2026 at 03:18):

This is what the lean documentation looks like right now for say your Mobius lemma: https://alexkontorovich.github.io/PrimeNumberTheoremAnd/docs/PrimeNumberTheoremAnd/MobiusLemma.html . It's very bare bones. One could add more text to it, but because we have the blueprint, there really isn't a strong need to.

Harald Helfgott (Feb 04 2026 at 03:19):

I wouldn't quite speak of blueprinting here, since we are trying to explain something that already exists. At any rate what the experience has taught me is that, once AIs are in the picture, blueprinting is sometimes unnecessary (Aristotle can formalize some proofs in textbooks directly) and sometimes insufficient (there are spots where the AI goes badly astray).

Terence Tao (Feb 04 2026 at 03:21):

Blueprinting becomes more valuable when it isn't just a single human using AI to formalize a project, but many humans working together (either with AI tools, or manually). So if the only objective is to document something that has already been formalized without any intention to modify the proofs further, a more bare-bones docstring would be fine, but if there are plans to refactor the code or to have different participants volunteer to work on different portions of the code, then the blueprint structure becomes quite useful.

Harald Helfgott (Feb 04 2026 at 03:23):

Right, blueprinting is meant more for humans, and we do want humans to look at this, but it's not meant quite for this situation. At any rate, we'll play this by ear. It could be an interesting experiment.

Terence Tao (Feb 04 2026 at 03:24):

Yeah, that's why I listed two separate options for how to document this code for others to be able to use.


Last updated: Feb 28 2026 at 14:05 UTC