Zulip Chat Archive

Stream: general

Topic: Spell check


Patrick Massot (Oct 10 2023 at 15:03):

Does anyone knows how to setup spell checking in VSCode so that it spells-checks only Lean comments? There is extra credit if this exclude things that are quoted in backticks in comments. I know about the cSpell extension and I know it claims to be able to do this, but I haven't been able to understand how.

Utensil Song (Oct 10 2023 at 16:14):

Spell Right looks promising, it can be configured to only spell check comment by something like "spellright.spellContext": "comments" or per language "spellright.spellContextByClass": { "lean4": "comments" }. It has no extra credit for code in comments, though.

It doesn't support Lean out-of-box, adding support is easy (although not well documented, the code supports it):

    "spellright.parserByClass": {
        "lean4": {
            "parser": "code",
            "comment_start": "/-",
            "comment_end": "-/",
            "comment_line": "--"
        }
    }

It confirms to work like this (green is expected, red is less than ideal):

image.png

Patrick Massot (Oct 10 2023 at 16:58):

This already looks very useful but I can't make it do anything. Could you add details?

Utensil Song (Oct 10 2023 at 17:32):

Sorry, I omitted some important details:

  1. the JSONs mentioned above are supposed to be added to user settings of VSCode.

  2. after the extension is installed, the right-bottom corner would have something looks like off, click it, choose the languages like English and French, click OK to confirm

  3. I tested only Mac, on which it uses Apple's spell check API, thus might be less hassle. It uses Windows spell check API on Windows and bundled spell check engine on Linux so they should work too.

Utensil Song (Oct 10 2023 at 17:33):

Not doing 2 is likely why it didn't work for you for any programming language

Utensil Song (Oct 10 2023 at 17:36):

1 is essential for it to work for Lean

Utensil Song (Oct 10 2023 at 17:37):

Better to test with Python first, with only 2 it should work, if not then it might be due to 3: OS

Patrick Massot (Oct 10 2023 at 17:42):

I did 1. I also tried 2. It allows only selecting French and I have no idea how to make English appear there. And selecting French doesn't do anything.

Utensil Song (Oct 10 2023 at 17:43):

Weird, I tested both English and French

Utensil Song (Oct 10 2023 at 17:44):

What's the OS?

Patrick Massot (Oct 10 2023 at 17:47):

Linux

Patrick Massot (Oct 10 2023 at 17:47):

ubuntu 22.04

Utensil Song (Oct 10 2023 at 17:48):

Seems to require extra download: https://github.com/bartosz-antosik/vscode-spellright#linux-and-windows-7

Patrick Massot (Oct 10 2023 at 18:01):

Ok, it required ln -s /usr/share/hunspell/* ~/.config/Code/Dictionaries.

Patrick Massot (Oct 10 2023 at 18:02):

Now it does something. But spelling mistakes are reported exactly as Lean errors, this is very distracting, especially since it flag all code in comments.

Damiano Testa (Oct 10 2023 at 18:05):

If you go to the settings of the extension, there is Spellright: Notification Class and you can choose Hint, which is minimally distracting (maybe too little!).

Patrick Massot (Oct 10 2023 at 18:20):

It is indeed much better. There is also a spellright.ignoreRegExpsByClass setting that looks promising to exclude quoted code, but I can't get it to work.

Utensil Song (Oct 10 2023 at 18:26):

It should work if provided with multiline regex

Utensil Song (Oct 10 2023 at 18:26):

I'll try it tomorrow.

Utensil Song (Oct 10 2023 at 18:37):

"spellright.ignoreRegExpsByClass": {
 "lean4": [ "/```[^]*?```/gm" ]
}

Utensil Song (Oct 10 2023 at 18:37):

may be something like this, can't test at the moment though

Damiano Testa (Oct 10 2023 at 18:47):

@Utensil Song, I tested very minimally your regex and it seems to work.

Patrick Massot (Oct 10 2023 at 19:28):

I can't get it to work.

Damiano Testa (Oct 10 2023 at 19:33):

In .vscode/settings.json I have

    "spellright.parserByClass": {
        "lean4": {
            "parser": "code",
            "comment_start": "/-",
            "comment_end": "-/",
            "comment_line": "--"
        }
    },
    "spellright.documentTypes": [
        "lean4",
        "markdown",
        "latex",
        "plaintext"
    ],
    "spellright.notificationClass": "hint",
    "spellright.ignoreRegExpsByClass": {
        "lean4": [ "/```[^]*?```/gm" ]
    }

and then the spell check automatically ignores text contained between (I guess odd-numbered) ``` and (even-numbered) ``` .

Utensil Song (Oct 11 2023 at 02:36):

image.png

Utensil Song (Oct 11 2023 at 02:37):

It seems to work, also for single `.

    "spellright.spellContextByClass": { "lean4": "comments" },
    "spellright.parserByClass": {
        "lean4": {
            "parser": "code",
            "comment_start": "/-",
            "comment_end": "-/",
            "comment_line": "--"
        }
    },
    "spellright.notificationClass": "hint",
    "spellright.ignoreRegExpsByClass": {
        "lean4": [ "/```[^]*?```/gm", "/`[^]*?`/g"]
    }

Patrick Massot (Oct 11 2023 at 02:38):

Does it work for double backtick? I know the question sounds crazy but this is what rst expects...

Utensil Song (Oct 11 2023 at 02:39):

It can be. Wait a sec...

Utensil Song (Oct 11 2023 at 02:52):

image.png

Utensil Song (Oct 11 2023 at 02:52):

"lean4": [ "/```[^]*?```/gm", "/``[^]*?``/g", "/`[^]*?`/g"]

can work with both md and rst intertwined . Still investigating rst code block...

Patrick Massot (Oct 11 2023 at 02:55):

Thanks a lot for your work on this!

Patrick Massot (Oct 11 2023 at 02:58):

I confirm I can now spell-check #mil!

Scott Morrison (Oct 11 2023 at 02:59):

Could this configuration be stored somewhere?

Scott Morrison (Oct 11 2023 at 02:59):

Perhaps even in Mathlib's .vscode directory?? Otherwise on the community webpage?

Utensil Song (Oct 11 2023 at 03:00):

    "spellright.ignoreRegExpsByClass": {
        "lean4": [
            "/```[^]*?```/gm",
            "/``[^]*?``/g",
            "/`[^]*?`/g",
            "/(?<indentation>[ \\t]*)(?<directiveline>\\.\\. code-block::[^\\n]*\\n)((?<codeline>\\k<indentation>+[ \\t]+[^\\n]*\\n)|(?<emptyline>\\k<indentation>*\\n))*/gm"
        ]
    }

works for rst code block, it's a bit complicated but straight forward if you take a closer look, I named the matches. Or you can past the following non-escape version into https://regex101.com/ to better understand it:

/(?<indentation>[ \t]*)(?<directiveline>\.\. code-block::[^\n]*\n)((?<codeline>\k<indentation>+[ \t]+[^\n]*\n)|(?<emptyline>\k<indentation>*\n))*/gm

Before:

image.png

After:

image.png

Utensil Song (Oct 11 2023 at 04:35):

OK, now it really works. I've updated the configs and screenshots above to avoid confusion.

Utensil Song (Oct 11 2023 at 05:59):

Scott Morrison said:

Could this configuration be stored somewhere?

The UX I'm imagining is VSCode Lean 4 could register a command "Spell check", which would prompt to install "Spell Right" then install it via workbench.extensions.installExtension, then override the configuration via contributes.configurationDefaults, so the installation and configuration would be pretty transparent except for requiring user consent.

But for now maybe a simple configuration guide would be sufficient.

Utensil Song (Oct 11 2023 at 06:35):

https://gist.github.com/utensil/9591618f9cf5a0026d3154d062f36c78

Marc Huisinga (Oct 11 2023 at 07:25):

Utensil Song said:

Scott Morrison said:

Could this configuration be stored somewhere?

The UX I'm imagining is VSCode Lean 4 could register a command "Spell check", which would prompt to install "Spell Right" then install it via workbench.extensions.installExtension, then override the configuration via contributes.configurationDefaults, so the installation and configuration would be pretty transparent except for requiring user consent.

But for now maybe a simple configuration guide would be sufficient.

Generally, I consider workspace recommended extensions to be a good tool for this that doesn't require baking anything into the extension, but it won't install the dictionaries for you. As for the configuration, you should be able to add your spellcheck configuration to the workspace-wide settings.json.

Utensil Song (Oct 11 2023 at 08:47):

I see, then all these can be workspace configurations, no need to bake them into the extension. I had the impression of some extension recommends some other extensions to me, but I don't recall the exact UX and can't find an API to support this directly. I also had the impression that some extension can bundle other extensions. But all these may be too much.

Floris van Doorn (Oct 11 2023 at 09:18):

I am strongly against putting extensions like this in the workspace recommended extensions if that means that all new users see a pop-up that prompts them to install this extension (which I think it does).

Utensil Song (Oct 11 2023 at 09:20):

I believe what was suggested is only a solution to do so project-wise, like only for Mathlib development, or only for books/tutorials authoring, and it should not affect general users of Lean or Mathlib.

Floris van Doorn (Oct 11 2023 at 09:22):

Fair, most new users don't clone mathlib for their first project. Still, I don't think it should be recommended for mathlib. For custom projects it's fine of course.

Eric Wieser (Oct 11 2023 at 10:58):

I'm pretty surprised that a better spell check extension isn't available; having to provide regexes to tell it where to spell check seems ridiculous when it could be looking at the syntax highlighting scopes

Utensil Song (Oct 11 2023 at 11:02):

Eric Wieser said:

I'm pretty surprised that a better spell check extension isn't available; having to provide regexes to tell it where to spell check seems ridiculous when it could be looking at the syntax highlighting scopes

Indeed, that's what I was looking for first, but after some search of VSCode issues, it seems that such syntax information (used in code highlight) is not available to extension developers. So I fall back to this seemingly extendable extension.

Eric Wieser (Oct 11 2023 at 11:03):

https://github.com/microsoft/vscode/issues/580 suggests this is now possibly fixed (by an external library)

Utensil Song (Oct 11 2023 at 11:17):

The mentioned library is in maintainace, it depends on Textmate grammar hat has certain features, and what superseded it is vscode-anycode which in turn relies on tree-sitter parsers. Lean doesn't have both parser configurations, Lean only has Lean itself as a parser and this JSON with many regrexes.

Eric Wieser (Oct 11 2023 at 11:20):

I believe that json file is a textmate grammar, though maybe it's a vscode dialect

Utensil Song (Oct 11 2023 at 11:20):

Even if we managed to feed parser configurations to them, then we need to PR a spellchecker to make use of these services to get syntax highlighting scopes. It looks to me a long shot. But I agree, this should be much simpler than how hard it is now.

Utensil Song (Oct 11 2023 at 11:22):

Sorry, Lean 4 has tree-sitter-lean (not actively maintained it seems), I even starred it but forgot.

Utensil Song (Oct 11 2023 at 11:27):

This reminds me of my failed attempt to use the Lean language server to make a Jupyter kernel. This shortcut seems natural and feasible, especially with jupyterlab-lsp, but it was a deep rabbit hole.

Utensil Song (Oct 11 2023 at 15:24):

image.png

Any single quote will break the regex, I guess this happens a lot for quote-heavy tactic code.

Patrick Massot (Oct 11 2023 at 15:35):

That's true, but you regex is already very useful for teaching material which is not about meta-programming.

Utensil Song (Oct 12 2023 at 08:39):

Utensil Song said:

image.png

Any single quote will break the regex, I guess this happens a lot for quote-heavy tactic code.

It annoyed me while reading meta-heavy code and I realized that it's an easy fix. Fixed in the latest config and added test cases:
image.png

Utensil Song (Oct 12 2023 at 08:47):

Incidentally, I just noticed this: https://github.com/streetsidesoftware/vscode-spell-checker/blob/main/FAQ.md#is-it-possible-to-only-spell-check-comments which the same approach (regex inclusion and exclusion) applies.


Last updated: Dec 20 2023 at 11:08 UTC