Zulip Chat Archive

Stream: lean4

Topic: stripping comments


Damiano Testa (Mar 12 2025 at 09:00):

I would like a function that "reads" a string and removes all comments from it, optionally with an extra bit of processing trailing/leading whitespaces. Does it exist?

Here is a sample of what I would like it to do.

def stripComments (s : String) : String := s

/-  I would like `stripComments to have the following input-output pairs: -/
#guard stripComments  "text /- I am a comment -/ more text" ==
                      "text more text"
#guard stripComments  "text /- I am a comment -/   more text" ==
                      "text more text"
#guard stripComments  "text -- /- I am a comment -/   more text" ==
                      "text " -- bonus if it removes the space after `text`
#guard stripComments  "text /- comment /- nested -/-/" ==
                      "text" -- but ok if it ignores nesting
#guard stripComments  "text /-- doc-string -/" ==
                      "text /-- doc-string -/"

Damiano Testa (Mar 13 2025 at 13:17):

In case anyone else finds this useful, here is an implementation of the above. It is not nesting aware and "trims right", instead of left, but seems to work well on mathlib.

Damiano Testa (Mar 13 2025 at 13:17):

/--
Returns the pair consisting of
* longest initial segment of `s` that does not contain `pattern` as a substring;
* the rest of the string `s`.

In particular, concatenating the two factors yields `s`.
-/
partial
def findString (s pattern : String) : String × String :=
  if pattern.isEmpty then (s, pattern) else
  if s.length < pattern.length then (s, pattern) else
  let candidatePos := s.find ("".push · == pattern.take 1)
  let notContains := {s.toSubstring with stopPos := candidatePos}.toString
  let rest := {s.toSubstring with startPos := candidatePos}.toString
  if rest.startsWith pattern then
    (notContains, rest)
  else
    let (init, tail) := findString (rest.drop 1) pattern
    (notContains ++ (pattern.take 1) ++ init, tail)

/--
`trimComments s` eliminates comments from `s`, disregarding nesting.

If `compressDocs` is `true`, then it also compresses doc-strings that might be present in `s`,
by collapsing consecutive sequences of at least one space into a single space.
-/
partial
def trimComments (s : String) (compressDocs : Bool) : String :=
  if s.length  1 then s else
  let (beforeFirstDash, rest) := findString s "-"
  if rest.length  1 then s else
  match beforeFirstDash.takeRight 1, (rest.take 2).drop 1 with
  | "/", "-" => -- this is a doc-string
    let (takeDocs, rest) := findString (rest.drop 2) "-/"
    let finalDocs :=
      -- Replace each consecutive group of at least one space in `takeDocs` with a single space.
      -- The begin/end `|`-markers take care of preserving initial and terminal spaces, if there
      -- are any.  We remove them in the next step.
      if compressDocs then
        let intermediate := ("|" ++ takeDocs ++ "|").splitOn " " |>.filter (!·.isEmpty)
        " ".intercalate intermediate |>.drop 1 |>.dropRight 1
      else
        takeDocs
    beforeFirstDash ++ "--" ++ finalDocs ++ trimComments rest compressDocs
  | "/", _ => -- this is a multiline comment
    let (_comment, rest) := findString (rest.drop 2) "-/"
    --let rest := if rest.startsWith "-/" then rest.drop 2 else rest
    (beforeFirstDash.dropRight 1).trimRight ++ trimComments (rest.drop 2) compressDocs
  | _, "-" => -- this is a single line comment
    let dropComment := rest.dropWhile (· != '\n')
    beforeFirstDash.trimRight ++ trimComments dropComment compressDocs
  | _, _ => beforeFirstDash ++ "-" ++ trimComments (rest.drop 1) compressDocs

/-  I would like `trimComments to have the following input-output pairs: -/
#guard trimComments  "text /- I am a comment -/ more text" false ==
                      "text more text"
#guard trimComments  "text /- I am a comment -/   more text" false ==
                      "text   more text"
#guard trimComments  "text -- /- I am a comment -/   more text" false ==
                      "text" -- bonus if it removes the space after `text`
#guard trimComments  "text /- comment /- nested -/-/" false ==
                      "text-/" -- it ignores nesting
#guard trimComments  "text /-- doc-string -/" false ==
                      "text /-- doc-string -/"

Eric Wieser (Mar 13 2025 at 13:37):

What's the motivation here?

Damiano Testa (Mar 13 2025 at 13:37):

The linter for whitespace trimming.

Damiano Testa (Mar 13 2025 at 13:38):

The pretty-printer erases comments, but the linter compares the surface syntax to the pretty-printed one.

Damiano Testa (Mar 13 2025 at 13:38):

So, in order to get some comparison, I thought of pre-processing the user syntax by removing the comments.

Damiano Testa (Mar 13 2025 at 13:39):

It then turns out that the pretty-printer also does weird reformatting of the doc-strings, but that was easy to "fix" once I had the code above.

Damiano Testa (Mar 13 2025 at 13:39):

As a consequence, the linter is now seeing a lot of extra violations that before were hidden by declarations that had a doc-string/comment and the linter ignored.

Jz Pan (Mar 13 2025 at 15:40):

Do you consider using docs#Std.Internal.Parsec.String.Parser ?

Damiano Testa (Mar 13 2025 at 16:04):

I am not really sure how to do that...


Last updated: May 02 2025 at 03:31 UTC