Zulip Chat Archive
Stream: new members
Topic: String.replace with empty pattern
Serena (Oct 26 2025 at 04:49):
Why does String.replace with the empty pattern just return the original string? I feel like it should instead insert the replacement between every character like in other languages.
Ex.
#eval "abc".replace "" "k" -- "abc"
Compare to JavaScript
"abc".replaceAll("","k") // "kakbkck"
Chris Bailey (Oct 26 2025 at 05:33):
This seems like reasonable behavior given the docstring: "replaces all occurrences of pattern with replacement". If the pattern is empty, it seems reasonable to say it doesn't occur. At least more reasonable than the javascript behavior.
Serena (Oct 26 2025 at 05:34):
If the pattern is the empty string, then it occurs between every char
Serena (Oct 26 2025 at 05:34):
This also matches regex behavior
Kenny Lau (Oct 26 2025 at 08:04):
Serena said:
If the pattern is the empty string, then it occurs between every char
but it also occurs twice between every two chars...
Eric Wieser (Oct 26 2025 at 10:08):
I think the Python behavior aligns with JavaScript here
Eric Wieser (Oct 26 2025 at 10:09):
I recommend opening an RFC and listing the behaviors in ~5 other languages
Markus Himmel (Oct 26 2025 at 10:13):
No need to go through the trouble of writing an RFC, I'm about to redefine String.replace anyway and I will make sure that it matches the JavaScript behavior.
Kevin Buzzard (Oct 26 2025 at 18:20):
What is "".replaceAll("","k") supposed to give? Is this consistent across languages?
Aaron Liu (Oct 26 2025 at 18:45):
js says "k", python says "k", Lean says "", I don't know how to do this operation in any other languages
Kenny Lau (Oct 27 2025 at 00:42):
With ChatGPT's help, I've manually run the code in the languages it gave me:
- Python: kakbkck, k
- Java: kakbkck, k
- C#: error (String cannot be of zero length)
- PHP: abc, ""
- Ruby: kakbkck, k
- Kotlin: kakbkck, k
- Dart: kakbkck, k
- R: kakbkck, k
Niels Voss (Oct 27 2025 at 08:13):
It is worth noting that the languages behavior of different languages differs when used on non-ASCII characters.
When I ran something like "ℕ".replace("", "z")and "𝔸".replace("", "z")and "v̂".replace("", "z") on different languages I got:
- Python 3: "zℕz" and "z𝔸z" and "zvẑz"
- Java and Kotlin: "zℕz" and "z?z?z" and "zvẑz"
- Ruby: "zℕz" and "z𝔸z" and "zvẑz"
- Dart: "zℕz" and "z�z�z" and "zvẑz"
- R: "zℕz" and "z𝔸z" and "zvẑz"
- C++: "z�z�z�z" and "z�z�z�z�z" and "zvz�z�z"
(Note: ℕ is in the BMP, 𝔸 is not in the BMP, and v̂ is actually two characters but one grapheme cluster)
Markus Himmel (Oct 27 2025 at 08:22):
To add a language whose strings work with grapheme clusters:
- Swift: "zℕz" and "z𝔸z" and "zv̂z"
Since Lean's strings are based on Unicode scalar values, the desired behavior in Lean is "zℕz" and "z𝔸z" and "zvẑz".
Kenny Lau (Oct 27 2025 at 09:45):
relevant: https://lukeplant.me.uk/blog/posts/breaking-provably-correct-leftpad/
Niels Voss (Oct 27 2025 at 21:32):
I see. I don't quite understand why Lean strings have an api based on unicode scalar values, but given that they do, this behavior makes sense.
Edit: Nevermind, the data model of string apparently changed last month to be what I would expect it to be.
Markus Himmel (Oct 28 2025 at 06:30):
Sorry for being unclear. What I meant was that the conceptual model of strings (basically: the answer to "a string is a sequence of what exactly?") that Lean chooses is that a string is a sequence of Unicode scalar values, just like Python and Rust. How to actually represent those Strings at runtime is a somewhat orthogonal question; here Lean chooses UTF-8.
Last updated: Dec 20 2025 at 21:32 UTC