Zulip Chat Archive

Stream: lean4

Topic: deprecated String.toAsciiByteArray


David Stainton 🦡 (Sep 03 2024 at 05:37):

since String.toAsciiByteArray is now deprecated in stable, what should i use instead?

Kim Morrison (Sep 03 2024 at 06:18):

Can you just copy the declaration into your project? What is the use case?

Kim Morrison (Sep 03 2024 at 06:19):

I can also just move this to Batteries quickly, and undeprecate it there, if that would be useful.

David Stainton 🦡 (Sep 03 2024 at 06:23):

I use it for some of my test cases, as a convenience to produce ByteArray input for testing with. I could easily do something else... but having this live on in Batteries sounds good.

David Stainton 🦡 (Sep 03 2024 at 06:23):

I also wouldn't mind just putting the declaration in my test code

Henrik Böving (Sep 03 2024 at 06:32):

David Stainton 🦡 said:

since String.toAsciiByteArray is now deprecated in stable, what should i use instead?

Are you sure you need the ASCII truncating behavior?

David Stainton 🦡 (Sep 03 2024 at 06:33):

I do not need the ASCII truncating behavior.. I'm just incidentally using that function... I think I can just replace it with literal ByteArrays.

Henrik Böving (Sep 03 2024 at 06:35):

If you don't need that you can also just use String.toUTF8

David Stainton 🦡 (Sep 03 2024 at 06:35):

perfect. thanks!

Kim Morrison (Sep 03 2024 at 06:38):

batteries#943 has a note now that will hopefully avoid anyone else using this when they should be using String.toUTF8!

Henrik Böving (Sep 03 2024 at 06:39):

Can someone tell me a proper use case for the ASCII variant? Either you'll get the toUTF8 behavior or you'll get some unexpected output because of the modulo, why would people want to use this?

Kim Morrison (Sep 03 2024 at 06:40):

If I remember correctly, it was claimed that this function could be faster if you know it's all ASCII.

Henrik Böving (Sep 03 2024 at 06:43):

Sounds unlikely that any lean code is going to out pace a probably SIMD optimized mem copy that doesnt care about the contents at all: https://github.com/leanprover/lean4/blob/4a2458b51d4c5cf7b1a72b10d01907f6ca5cc0a2/src/runtime/object.cpp#L1685

Henrik Böving (Sep 03 2024 at 06:46):

On top of that it doesnt even Pre allocate with the correct capacity if I'm seeing that correctly? So for large strings it will even realloc the byte array in flight. If someone was to measure this function would most likely always be notabky slower than toUTF8

Henrik Böving (Sep 03 2024 at 06:47):

@Kim Morrison I'd say we should just deprecate it? It seems to have no performance benefit (and even a penalty) to me.

Eric Wieser (Sep 03 2024 at 08:43):

I think this isn't about performance, but about having something like Python's str.encode('ascii', errors='replace'), which preserves the invariant of one byte per character.


Last updated: May 02 2025 at 03:31 UTC