Zulip Chat Archive

Stream: new members

Topic: Paul Chun-Kit Lee


Paul C Lee (Sep 07 2025 at 20:37):

Hello, I'm Paul. I'm a practicing cardiologist (affiliated with NYU) based in Brooklyn, NY, but my nights are dedicated to a second life as an independent researcher in mathematical foundations. My project is Axiom Calibration (AxCal), a framework I am formalizing in Lean 4 to measure the logical strength of classical theorems.

My clinical background includes applied AI, and I initially hoped LLMs could accelerate my return to the math I set aside as a teenager. After being repeatedly burned by hallucinations, I now use Lean to ground every claim and, in a sense, to make the AI prove its work.

The initial spark for this came from functional analysis. Thinking of functionals as “measuring devices” and the bidual as a “mirror,” I focused on the persistent gap in infinite dimensions where a space fails to be reflexive—that is, where the canonical embedding into its bidual space is not surjective. I initially suspected Gödel-style limits were at play, but the answer only became clear through formalization. The rigor demanded by Lean 4 was the tool that allowed me to narrow the phenomenon down, showing the trail led not to incompleteness, but consistently to the Weak Limited Principle of Omniscience (WLPO).
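For the record, here is a minimal standalone sketch of WLPO in Lean 4 (my phrasing, not the repo's definition):

```lean
-- WLPO: every binary sequence is either identically false, or not.
-- A weak fragment of excluded middle; not provable constructively.
def WLPO : Prop :=
  ∀ α : Nat → Bool, (∀ n, α n = false) ∨ ¬ (∀ n, α n = false)
```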

I am not a professional mathematician and am actively seeking mentorship, code review, and direction. If this program is misguided, I would be grateful to hear it plainly now rather than wander for years. If there’s promise, I’m happy to do the unglamorous work of a trainee and keep buying the coffee.

My repository, with papers and the Lean builds, is here: https://github.com/AICardiologist/FoundationRelativity

Matt Diamond (Sep 07 2025 at 20:46):

Is this related to reverse mathematics?

Paul C Lee (Sep 07 2025 at 20:47):

Yes, it is related to constructive reverse mathematics (CRM).

Niels Voss (Sep 07 2025 at 23:44):

Welcome! As a quick note: if you plan to rely primarily on AI for formal proofs, I'd recommend that you write the definitions and theorem statements by hand and rely on AI only for the proofs. You state that you are using Lean because you don't really trust AI because of hallucinations, but if the AI writes your definitions and theorem statements, then it isn't actually any safer than normal AI usage. We've had a lot of people in this Zulip who claim to have solved an open problem with AI, but they got the theorem statement wrong, so the proof was meaningless.
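As a toy illustration (a hypothetical theorem, nobody's actual repo), a mis-stated hypothesis can make the "result" vacuous:

```lean
-- `h : n < 0` is impossible for a natural number, so anything follows;
-- the proof is airtight and the theorem is meaningless.
theorem looks_deep (n : Nat) (h : n < 0) : n ≥ 1 :=
  absurd h (Nat.not_lt_zero n)
```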

Matt Diamond (Sep 07 2025 at 23:50):

@Paul C Lee you talk a lot about "bidual gaps" but when I google the phrase the only results are your papers. Is this a wholly original concept or is it known by another name in the literature?

Paul C Lee (Sep 08 2025 at 00:23):

Matt Diamond said:

Paul C Lee you talk a lot about "bidual gaps" but when I google the phrase the only results are your papers. Is this a wholly original concept or is it known by another name in the literature?

@Matt Diamond You are correct: "bidual gap" is a very descriptive and intuitive term, but it is not the standard terminology in the functional analysis literature. The usual term for a Banach space where the canonical embedding J : X → X** is not surjective is a "non-reflexive Banach space." But I was more interested in what is in the "gap". By the time I realized what it was really called, my GitHub repo was already filled with the nickname.
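For reference, here is roughly what the standard notion looks like in mathlib's terms (a sketch; I am assuming mathlib's NormedSpace.inclusionInDoubleDual, and the example itself is illustrative):

```lean
import Mathlib

open NormedSpace

-- Non-reflexivity: the canonical embedding J : E → E**
-- (mathlib's `inclusionInDoubleDual`) fails to be surjective.
-- The "bidual gap" is my nickname for that failure.
example (E : Type*) [NormedAddCommGroup E] [NormedSpace ℝ E] : Prop :=
  ¬ Function.Surjective ⇑(inclusionInDoubleDual ℝ E)
```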

Paul C Lee (Sep 08 2025 at 00:47):

Niels Voss said:

Welcome! As a quick note: if you plan to rely primarily on AI for formal proofs, I'd recommend that you write the definitions and theorem statements by hand and rely on AI only for the proofs. You state that you are using Lean because you don't really trust AI because of hallucinations, but if the AI writes your definitions and theorem statements, then it isn't actually any safer than normal AI usage. We've had a lot of people in this Zulip who claim to have solved an open problem with AI, but they got the theorem statement wrong, so the proof was meaningless.

Thank you for pointing this out. LLMs can lie and create a seemingly correct proof that is hard to catch (hidden sorrys, sub-files that hide illegal shortcuts). I ended up using multiple agents: Gemini Deep Think serves as the architect that designs the proof architecture and acts as chief math consultant (a senior math consultant agent); a GPT-5 Pro agent crafts the Lean code and framework; and Claude Code is in charge of repo management and execution only. If the junior-professor GPT-5 Pro agent fails, it decides whether to ask Gemini Deep Think, which only provides mathematical support and Lean code snippets. A separate agent is responsible for auditing the proof.

I do this because I am not an expert and am "not in the loop", and, quite frankly, every time I have reached out to math professors I have been ignored. Therefore, I present my work here so that someone with actual expertise can assist me. Perhaps my proofs are all wrong, but any comments, especially criticism, help me make my work less wrong, which hopefully can benefit someone in the future.

Yakov Pechersky (Sep 08 2025 at 02:10):

Where is the actual Lean code for one of the "main results", godel_banach_main?

Yakov Pechersky (Sep 08 2025 at 02:13):

I only see a markdown file with an example lean snippet here
https://github.com/AICardiologist/FoundationRelativity/blob/617411835886e8d9e5bc79dcefb695423829ba7d/docs/analysis/lean-latex-alignment-p1.md?plain=1#L24-L35

**Lean Formalization**:
```lean
theorem godel_banach_main :
    consistencyPredicate peanoArithmetic ↔
    Function.Surjective (godelOperator (.diagonalization)).toLinearMap
```

which references some `consistencyPredicate`, which I can only see defined here as

```lean
def consistencyPredicate : Prop := sorry
```

Yakov Pechersky (Sep 08 2025 at 02:14):

I think it will be difficult to get people to engage with the content while you have sorried data.

Matt Diamond (Sep 08 2025 at 02:25):

this file claims in the opening comment block "We do NOT open classical or use by_cases here." and then proceeds to do both

Weiyi Wang (Sep 08 2025 at 02:29):

(what is wrong with by_cases?)

Matt Diamond (Sep 08 2025 at 02:32):

the file is in a folder named "Constructive" so I assume the LLM was trying to avoid excluded middle, choice, etc.

Matt Diamond (Sep 08 2025 at 02:33):

(of course I don't have a problem with by_cases, I was just pointing out an example of how LLMs are unreliable)

Paul C Lee (Sep 08 2025 at 02:55):

Yakov Pechersky said:

Where is the actual lean code for one of the "main results" godel_banach_main?

In Paper 1, we initially thought we could artificially put Gödel into an operator; that turned out to be wrong as I tried to prove it. I did not clean up all the work folders/.md files, and I apologize. The final paper was much more mundane, and the result is not novel: https://github.com/AICardiologist/FoundationRelativity/blob/main/Papers/P1_GBC/documentation/paper1-rankone-toggle-current.tex

Paul C Lee (Sep 08 2025 at 03:20):

Matt Diamond said:

this file claims in the opening comment block "We do NOT open classical or use by_cases here." and then proceeds to do both

Matt, you're absolutely right that the header comment in Ishihara.lean is misleading and contradicts the code. The claim "We do NOT open classical or use by_cases here" is flat-out wrong; the file opens Classical globally and uses classical, by_contra, and by_cases throughout. This is a documentation slip-up (likely from an LLM scaffold that didn't match the proof's needs), not a flaw in the math itself, but it definitely erodes trust in the artifact. We'll update the comment ASAP to reflect reality.

To clarify the context: this file implements the forward direction (bidual gap ⇒ WLPO) using Ishihara's kernel technique, which is standard in constructive reverse mathematics (CRM). In CRM, we use classical meta-logic to analyze implications over a constructive base like BISH: the object logic stays intuitionistic, but the proofs of equivalences (like this one) rely on classical tools for case splits on undecidable propositions (e.g., "is α all-false?"). The kernel's definition of g α explicitly cases on that, which requires LEM at the meta-level, as noted in the paper's Section 2.2 / Remark 2.5. So the code is doing exactly what it should for CRM, but the "Constructive/" folder name and the comment set false expectations.

@Weiyi Wang: Great question on by_cases: nothing is inherently wrong with it! In constructive settings, it's fine (and computable) if the proposition is decidable (e.g., n = 0 for n : ℕ). But here it's used on undecidable statements like x = 0 in normed spaces or the all-false sequence, so it invokes Classical.em under the hood (enabled by open Classical). That's the classical usage we need for the meta-reasoning.
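A standalone sketch of that distinction in plain Lean 4 (not from the repo):

```lean
-- On an arbitrary Prop there is no Decidable instance, so `by_cases`
-- quietly falls back on Classical.em:
example (p : Prop) : p ∨ ¬ p := by
  by_cases h : p
  · exact Or.inl h
  · exact Or.inr h

-- On a decidable proposition (equality on Nat), the same tactic
-- needs no classical axioms:
example (n : Nat) : n = 0 ∨ n ≠ 0 := by
  by_cases h : n = 0
  · exact Or.inl h
  · exact Or.inr h
```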

Thank you both for going over my work. :slight_smile:

Paul Schwahn (Sep 08 2025 at 03:47):

Hi Paul, I'm going to be the ass and just say it: we would all appreciate if you didn't communicate to us through LLM output (just judging from the first paragraph of your reply). This is a disrespectful use of everyone's time.

Paul C Lee (Sep 08 2025 at 04:06):

Paul Schwahn said:

Hi Paul, I'm going to be the ass and just say it: we would all appreciate if you didn't communicate to us through LLM output (just judging from the first paragraph of your reply). This is a disrespectful use of everyone's time.

You are right and I apologize. I am an amateur and I know I am dealing with a very difficult area of mathematics, and I just want to make sure I do not sound like a fool in my response.

Matt Diamond (Sep 08 2025 at 04:16):

Trust me, it's way worse to rely on an LLM than to say something stupid on here... the "new members" channel is (ideally) a safe space for asking stupid questions. I'm sure I've asked plenty of stupid questions on here (I'm also not a professional mathematician).

I would also advise you to actually take the time to learn Lean rather than relying on LLMs. You mentioned being ignored by math professors. If you want to be taken seriously, your best bet is to put in the work yourself, not outsource it to a language model.

Paul C Lee (Sep 09 2025 at 18:40):

Matt Diamond said:

Trust me, it's way worse to rely on an LLM than to say something stupid on here... the "new members" channel is (ideally) a safe space for asking stupid questions. I'm sure I've asked plenty of stupid questions on here (I'm also not a professional mathematician).

I would also advise you to actually take the time to learn Lean rather than relying on LLMs. You mentioned being ignored by math professors. If you want to be taken seriously, your best bet is to put in the work yourself, not outsource it to a language model.

Indeed my proof is flawed: I will see if it can be fixed. Thanks.

(deleted) (Sep 09 2025 at 19:00):

I don't see much hope...

(deleted) (Sep 09 2025 at 19:04):

It's rather well known that Claude Code is very bad at Lean

(deleted) (Sep 09 2025 at 19:07):

I think people using AI agents to write Lean code without a good understanding of how Lean works can make the community lose faith in AI. Even when AI does show promise.

Paul C Lee (Sep 09 2025 at 19:44):

Huỳnh Trần Khanh said:

I don't see much hope...

I will keep working on it even if it looks hopeless, until one day I get it right. Even if it will take five years or more.

suhr (Sep 11 2025 at 04:17):

Huỳnh Trần Khanh said:

I think people using AI agents to write Lean code without a good understanding of how Lean works can make the community lose faith in AI. Even when AI does show promise.

It's a common theme: people without skills try to use AI to compensate for their lack of skill, while people who are proficient prefer doing things themselves instead of relying on someone (or something) else.

suhr (Sep 11 2025 at 04:30):

Paul C Lee said:

You are right and I apologize. I am an amateur and I know I am dealing with a very difficult area of mathematics, and I just want to make sure I do not sound like a fool in my response.

Some general advice: start small. Any difficult area starts from well-known and not-so-difficult results. Formalizing them will give you some ground to do more fancy stuff.

Do not outsource learning to somebody else. If something is difficult to you, do it yourself.

And don't forget about examples and counterexamples. They are as important as definitions and theorems.

Paul C Lee (Sep 11 2025 at 17:22):

suhr said:

Paul C Lee said:

You are right and I apologize. I am an amateur and I know I am dealing with a very difficult area of mathematics, and I just want to make sure I do not sound like a fool in my response.

Some general advice: start small. Any difficult area starts from well-known and not-so-difficult results. Formalizing them will give you some ground to do more fancy stuff.

Do not outsource learning to somebody else. If something is difficult to you, do it yourself.

And don't forget about examples and counterexamples. They are as important as definitions and theorems.

Thank you for your encouragement.

Just a clarification, and take it from the perspective of a Lean novice: Claude Code cannot write Lean code, but it is good at managing PRs/CI on GitHub. Furthermore, Claude Code can take illegal shortcuts. Goedel Prover v2 did not seem to work well, but perhaps a one-shot solution is suboptimal. My Python code for cardiology use cases has always been developed iteratively over a long period of time as we look for edge cases. So my naive thinking was: instead of expecting any AI to produce working code in one go, the process has to be iterative, with different agents, ideally from different LLMs, to take advantage of their individual strengths. I noticed GPT-5 Pro has long memory and context, and can handle Lean 4 code at a tactical level. However, GPT-5 Pro seems to have problems coming up with a proper architecture and strategy when the answer is not clear. In contrast, Gemini Deep Think seems to be very good at strategy and designing a framework, but may not get the Lean code to compile.

Taking advantage of their individual strengths and weaknesses, I have gotten my Lean code to compile and to kind of work, at least by a Lean novice's standards. GPT-5 Pro even told me to bring my repo to this forum to get real feedback in case we had any blind spots (which were many).

To a Lean expert, what I am doing may be blasphemous. But I get to experience the joy of Lean, and I appreciate all feedback. (Actually, I was more worried about people ignoring me than about people flaming me with large torches.)

In my own field (medicine), we encountered the same issue with multi-agent AI collaboration: an expert needs to oversee the final output. However, it was clear to me that in medicine the use of AI definitely augmented the clinician's abilities.

Kenny Lau (Sep 11 2025 at 18:05):

yeah, but AI cannot replace years of medical training

suhr (Sep 11 2025 at 18:06):

Unlike medicine, math relies more on understanding than on knowledge. The core of Lean is actually pretty small, so you don't need much knowledge to use it effectively.

Also, probably unlike medicine (I don't know a lot about that field), a complex problem in math can often be decomposed into simpler problems. This allows for the "rising sea" approach to problem solving: you prove a bunch of simple lemmas and then suddenly a bigger result is within your reach.

This is arguably the right approach to formalization in Lean: when you encounter a difficulty, think about lemmas you could prove to make your problem easier.
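A toy illustration (illustrative lemma names):

```lean
-- Two small lemmas...
theorem small₁ (a b : Nat) : a + b = b + a := Nat.add_comm a b
theorem small₂ (a : Nat) : a + 0 = a := Nat.add_zero a

-- ...and suddenly a bigger result is within reach.
theorem bigger (a b : Nat) : a + b + 0 = b + a := by
  rw [small₂, small₁]
```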

But again, to build intuition you should start small. Maybe even as small as Natural Number Game, which does a great job at building some intuition on natural numbers.

Kenny Lau (Sep 11 2025 at 18:06):

I might trust a self-driving car, but I wouldn't trust a random person operating on me (surgery) using GPT instructions.

Bo Qin (Sep 11 2025 at 18:21):

I wasn't sure why he even mentioned his cardiology profession until I read that his Lean code was mostly produced by an LLM. I'd sooner trust someone who says "I'm a nobody and here is my paper and associated Lean code".

Bo Qin (Sep 11 2025 at 18:21):

Jokes write themselves

Bo Qin (Sep 11 2025 at 18:25):

Trying to run before they even learned to walk. I've seen it too often

Paul C Lee (Sep 12 2025 at 22:15):

I think I have got it. It is just like in software engineering, where we use a producer/consumer architecture; I did not know that in CRM they call it a witness and a test object. So the book says I am allowed to use classical reasoning in the witness. Basically, Axel Ljungström did a similar trick in his thesis. This is fun. Thanks for all the help.
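A bare-bones sketch of what I mean (hypothetical names, plain Lean 4): the witness side may be built with a classical case split on an undecidable proposition.

```lean
-- Case-splitting on "α is identically false" needs Classical.propDecidable,
-- hence `noncomputable`; this is the classical step confined to the witness.
open Classical in
noncomputable def witness (α : Nat → Bool) : Bool :=
  if ∀ n, α n = false then true else false
```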

Paul C Lee (Sep 25 2025 at 19:27):

Not that anyone cares, but as I move along, I realize Lean encourages a mathematician's mindset: polish the proof, sprinkle [simp], and push toward sorry-free. As the project gets large, sorrys explode, with runaway simp searches and code regressions. I was running into sorry explosions: from a few sorrys, it suddenly exploded to 60 or more. The Liquid Tensor Experiment also encountered similar problems (which I discovered while searching for solutions to my own).

I'd like to share a potential solution for my hobby project (if anyone's interested, since I use LLMs and am persona non grata here): adopt a software engineering mindset by treating Lean like a software compiler. This involves relying on metrics, error budgets, guardrails, simple fences, staged activation, and CI monitoring. This approach prevents sorrys from exploding and prevents code regressions. Hope that is helpful to someone.
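One concrete guardrail of that kind, as a sketch: Lean's built-in #print axioms exposes hidden sorryAx (and classical axiom) dependencies, so CI can audit a "finished" theorem mechanically.

```lean
theorem demo (n : Nat) : n + 0 = n := Nat.add_zero n

-- In CI, grep this command's output: any mention of `sorryAx` means a
-- sorry is hiding somewhere in the dependency tree.
#print axioms demo   -- 'demo' does not depend on any axioms
```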

Weiyi Wang (Sep 25 2025 at 21:35):

I am not sure what "sorry explode" means or why it is a bad thing. If an active WIP project has a lot of explicitly marked sorry, that means more opportunity for contribution, no?

Weiyi Wang (Sep 25 2025 at 21:37):

Of course, that requires good judgement to write correct statements in the first place

Niels Voss (Sep 26 2025 at 03:07):

Paul C Lee said:

I'd like to share a potential solution for my hobby project (if anyone's interested, since I use LLMs and am persona non grata here): adopt a software engineering mindset by treating Lean like a software compiler. This involves relying on metrics, error budgets, guardrails, simple fences, staged activation, and CI monitoring. This approach prevents sorrys from exploding and prevents code regressions. Hope that is helpful to someone.

I think mathlib already adopts a lot of this. In particular, there is benchmarking for regressions and compile time, CI, linters, etc. Simp lemmas are carefully controlled, and proofs that go into mathlib are written in a way that makes them less likely to break as the rest of the code changes (so things like non-terminal simps are discouraged).

The sorry explosion can easily arise from non-judicious use of LLMs, because they tend to make a lot of statements that are almost correct but actually wrong, and if you try to get the LLM to fill those in, there's nothing it can do except add more lemmas with sorries. Probably the best idea is to be really careful about which definitions and theorem statements you commit to; that should avoid a lot of the sorry explosion.

suhr (Sep 26 2025 at 04:25):

Paul C Lee said:

I'd like to share a potential solution for my hobby project (if anyone's interested, since I use LLMs and am persona non grata here): adopt a software engineering mindset by treating Lean like a software compiler. This involves relying on metrics, error budgets, guardrails, simple fences, staged activation, and CI monitoring. This approach prevents sorrys from exploding and prevents code regressions. Hope that is helpful to someone.

Not a bad idea. Keep in mind that there's a difference between a development version and a release.

sorry is like todo!(): when you start doing something, there are going to be a lot of them, but in the release version there should be none. And the length of an iteration depends on how good your intuition is: mathematicians are pretty good at stating true things, so they can afford to delay filling in sorrys for a rather long time. Your intuition (and mine as well, since I'm not a professional mathematician) is less solid, so iterations should be shorter.

since I use LLM

Again, the problem is not the usage of LLMs per se. It's a matter of understanding. Though there is one problem with LLMs: when you understand how to do something, you don't need them much, since you can easily do it yourself. But when you don't understand, you should not use them until you do.


Last updated: Dec 20 2025 at 21:32 UTC