Zulip Chat Archive

Stream: PhysLean

Topic: Spelling in PhysLean


Joseph Tooby-Smith (Oct 23 2025 at 09:43):

I made a low-tech linter for spelling documentation in PhysLean.

I basically made a list of all the words that appear in PhysLean to date, got an AI to remove the ones which where incorrectly spelled or manifestly do not correspond to code etc., thereby creating a list of allowed words. Then checked all the documentation agains this list of allowed words. Naturally any new words would have to be added to the allowed words list, and it doesn't deal with code snippets in documentation in any smart way.

The corresponding PR is PhysLean#781. If anyone has better ways we could 'lint' for spelling I would be very glad to hear them.

Kenny Lau (Oct 23 2025 at 09:46):

interesting, i was thinking about doing something similar for mathlib (also with the help of GPT) but it ended up very slow so i ended up only correcting 300 files; i think you'll just have to do it every month or so; did you make the corrections manually or with LLM? i was also wondering if there were ways to speed up the process

Joseph Tooby-Smith (Oct 23 2025 at 09:52):

I ended up doing the actual corrections manually, it took about 2 hours to do so wasn't so bad. Though I had to keep pasting words in to MS word to get suggestions for the correct spellings which was a bit of a pain.

I think if one keeps on top of it, it should be much quicker to do it in the future. But yeah, I am hoping there is a much quicker way to go about this. I think the main problem is the number of specialized words in maths/physics, so that a normal dictionary would not cover it.

Kenny Lau (Oct 23 2025 at 09:52):

yeah I just fed entire files to GPT so that it has enough context

Kenny Lau (Oct 23 2025 at 09:53):

I suppose you could isolate the errors and then feed like -2/+2 lines to GPT and have it correct the error

Joseph Tooby-Smith (Oct 24 2025 at 05:55):

Yeah - and then maybe a yes or no type thing to allow someone to easily allow or disallow a change :).

Joseph Tooby-Smith (Oct 24 2025 at 05:57):

Based on @Damiano Testa suggestion I have segregated the meta code for the spelling PR away from the 100s of spelling corrections. The meta code is in PhysLean#783.


Last updated: Dec 20 2025 at 21:32 UTC