Zulip Chat Archive
Stream: lean4
Topic: pre-rfc: zig style multi-line string literals
JJ (Aug 08 2025 at 04:22):
The Zig programming language has an interesting design with regards to its multi-line string literals. Instead of the usual pair of triple-quotes """, the \\ operator (c.f. both twice a string escape backslash and the usual C-style // comment syntax) is used.
const hello_world_in_c =
\\#include <stdio.h>
\\
\\int main(int argc, char **argv) {
\\ printf("hello world\n");
\\ return 0;
\\}
;
This \\ operates similar to a comment -- it takes in as input any characters until the end of the line. It then merges with subsequent invocations of \\, so long as they immediately follow on the next line with only whitespace between (same as Lean's rules for gaps). These are then treated as one string literal: where the new \\s are treated as new newlines. The string literal is ended upon a line that does not start with \\ as the first post-whitespace token.
Zig is currently the only language that does this (because they came up with it). I really like this syntax! It's not very well known and I wish it was. I'm of the opinion it's a much better approach than the usual style. In particular, there are some distinct advantages:
- Newlines are overt. Does the final
"of a multi-line string literal ending on its own line add a\nor not? What about the starting"? What about preceding or trailing whitespace on either of those? Many languages have many different behaviors with the de facto multiline string syntax. Lean's behavior here is very reasonable, but not evident without needing to read the manual. - String blocks appear at the same indentation level of the rest of the code (really, an arbitrary indentation). This helps with readability. It also helps with clarity: I know of some languages that instead implicitly trim leading whitespace within their multi-line strings to support this (which has made me wary of the syntax ever since). Zig appears to use this especially wrt. inline assembly and C, but it's more generally useful.
- This syntax doubles as a raw string literal syntax in an elegant fashion. (See https://github.com/ziglang/zig/issues/8540 for examples.) The
\\character takes in input until a newline. It doesn't matter what follows, so long as it is not a newline -- escapes can be treated literally, a literal\\can be recognized, any number of literal\\s can be recognized.
Would the Lean community be interested in such a feature? Is the double \\ available? I think this is clearer than multiline "" and better than the existing r"", though it could exist alongside or replace either or both. If so, I can write this up into a proper RFC and stick it somewhere (and maybe take a stab at implementing it to learn some about the compiler internals. i don't believe it's doable purely in terms of macros? you would have to recognize this regex pattern: \\\\[^\n]*(\n *\\\\[^\n]*)*)
I'm not a Zig aficionado (just watch it from afar) but my impression is they're all broadly happy with this syntax. (The language creator certainly is.) Since Lean is in a stage where it's not afraid to make breaking changes, I thought I'd pitch this to see what people think.
Kim Morrison (Aug 08 2025 at 05:11):
We already have this, just using \ at the end of a line.
JJ (Aug 08 2025 at 05:49):
It's very different!
The Zig \\ syntax eats whitespace following a newline up until the next \\ character. The Lean string gaps syntax eats whitespace up until the next non-whitespace character. This means that the gaps syntax cannot support code blocks well (since the leading whitespace must be trimmed), and cannot support use as a raw string literal.
The newlines preceding the whitespace-prefixed \\ are also kept in the Zig string syntax, unlike with \.
JJ (Aug 08 2025 at 05:53):
Here's an example that showcases how Zig's \\ differs from Lean's " + \.
def str1 := "String with \
a gap"
def str2 := "String with a gap"
example : str1 = str2 := rfl
def str3 := \\String with
\\ a gap
def str4 := "String with\n a gap"
example : str3 = str4 := rfl
JJ (Aug 08 2025 at 06:17):
Here is another example showing the utility of a \\ syntax.
def prefix_strings := \\ match lctx with
\\ | { fvarIdToDecl := map, decls := decls } =>
\\ match lctx.find? fvarId with
\\ | none => "oops! string"
\\ | some decl =>
\\ let decl := f decl
\\ { fvarIdToDecl := map.insert decl.fvarId decl
\\ decls := decls.set decl.index decl }
def normal_strings := " match lctx with
| { fvarIdToDecl := map, decls := decls } =>
match lctx.find? fvarId with
| none => \"oops! string\"
| some decl =>
let decl := f decl
{ fvarIdToDecl := map.insert decl.fvarId decl
decls := decls.set decl.index decl }"
def gap_strings := " match lctx with\
\n | { fvarIdToDecl := map, decls := decls } =>\
\n match lctx.find? fvarId with\
\n | none => \"oops! string\"\
\n | some decl =>\
\n let decl := f decl\
\n { fvarIdToDecl := map.insert decl.fvarId decl\
\n decls := decls.set decl.index decl }\
"
def raw_strings := r#" match lctx with
| { fvarIdToDecl := map, decls := decls } =>
match lctx.find? fvarId with
| none => "oops! string"
| some decl =>
let decl := f decl
{ fvarIdToDecl := map.insert decl.fvarId decl
decls := decls.set decl.index decl }"#
example : prefix_strings = normal_strings := rfl
example : normal_strings = gap_strings := rfl
example : normal_strings = raw_strings := rfl
The \\ style is more amenable to large strings of verbatium text containing newlines. With \, you lose newlines. Without \, you give up having a consistent indentation level. In either case, you will have to fix escapes, and if you use r"" you both consign yourself to giving up \ and worrying about nested ".
JJ (Aug 08 2025 at 06:20):
(Actually, maybe I should have opened this with discussing r#""#. I didn't realize just how big an improvement this would be over it for multi-line strings until now.)
Michael Rothgang (Aug 08 2025 at 08:00):
To get str4 in Lean, I just write
def str1 := "String with \n\
a gap"
It works very well.
JJ (Aug 08 2025 at 08:07):
Right. That example was for explaining the difference between Zig \\ and Lean \, not for motivating it. See the next set of examples for motivation.
JJ (Aug 08 2025 at 08:10):
(by the way, if not already clear -- this syntax would complement "" and \, could-possibly-replace r#""#)
Kyle Miller (Aug 08 2025 at 14:23):
gap_strings in was one of the intended ways of getting multiline strings (it appears not to have been included in the RFC though, lean4#2838)
Something neat about Lean is that the different string syntaxes give you the tools you need to add your own multiline strings on top of it. In tests/lean/run/string_gaps.lean there's an example with user-defined "d!" strings:
#eval d!"this is \
line 1
| line 2, indented
|line 3"
-- `"this is line 1\n line 2, indented\nline 3"`
This transformation is done at compile time too.
It works with raw strings, though it needs a space between d! and r because d!r is a valid identifier:
#eval d! r#"this is line 1
| line 2, indented
|line 3"#
Niels Voss (Aug 10 2025 at 07:12):
For the zig style strings, if you had a function f : String -> String -> Nat, how would you call f with two multiline strings?
Niels Voss (Aug 10 2025 at 07:14):
It also seems less natural to wrap the strings in parentheses, since you'd need the closing parenthesis on its own line. Although maybe this isn't that big of a deal.
Niels Voss (Aug 10 2025 at 07:27):
In any case, I think the largest problem with trying to use zig strings in Lean is the lack of explicit delimiters, not the indentation markers (kotlin uses indentation markers plus a .trim function, for example) because that causes parsing ambiguity and makes things like dot notation more tedious
JJ (Aug 16 2025 at 03:33):
Niels Voss said:
For the zig style strings, if you had a function
f : String -> String -> Nat, how would you callfwith two multiline strings?
-- With Zig-style multiline strings:
def foo := f \\ This is my {first} string.
\\ It contains multiple lines.
\\ This is my "{second}" \\ string. It contains one line.
-- As stands, using d! strings and r#...# strings to avoid escapes and capture newlines:
def bar := f d! r#"This is my {first} string.
It contains multiple lines."#
r#"This is my "{second}" \\ string. It contains one line."#
In any case, I think the largest problem with trying to use zig strings in Lean is the lack of explicit delimiters, not the indentation markers (kotlin uses indentation markers plus a .trim function, for example) because that causes parsing ambiguity and makes things like dot notation more tedious
Yeah. I have not yet learned Lean's metaprogramming, so I do not know how feasible this is for the parser.
Last updated: Dec 20 2025 at 21:32 UTC