Zulip Chat Archive

Stream: Zulip meta

Topic: Encoding of uploaded text files


Joscha Mennicken (Jul 04 2025 at 07:57):

The text file from #Machine Learning for Theorem Proving > MCP Tools for LLMs and Agentic Mathematics @ 💬 is a normal utf-8 encoded text file with some higher Unicode characters. The server only serves it with content-type: text/plain though, which means that browsers (which usually display the text instead of downloading it) tend to guess the wrong encoding. Is there some way to fix this? (In this concrete case, calling the file something.lean would make the browser download it.)

The only related issue I found was https://github.com/zulip/zulip/issues/22682, but when uploading such a text file, my browser doesn't send any encoding info in the headers in the first place. Forcing the browser to download the file using content-disposition would also fix this issue, though it'll be annoying for other kinds of files (images, videos, audio, ...).

Eric Wieser (Jul 04 2025 at 09:45):

To save someone a click, the file is https://leanprover.zulipchat.com/user_uploads/3121/Y8FcCTr653dlldtFhGJQe1K3/pabc_claude.lean.txt

Eric Wieser (Jul 04 2025 at 09:49):

If the server sent a content-type: text/plain; charset=utf-8 header then this would work correctly
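
As a rough illustration of what goes wrong without the charset, here is a minimal Python sketch. The sample string is made up for the example and is not taken from the uploaded file, and Python's "latin-1" codec stands in for the browser's single-byte fallback guess.

```python
# Illustrative sample text (an assumption, not the contents of the upload).
text = "theorem foo : ∀ x, x → x"
raw = text.encode("utf-8")

# Roughly what a reader sees when the bytes are interpreted with a
# single-byte legacy encoding instead of UTF-8: every byte of a
# multi-byte character turns into its own (wrong) character.
wrong = raw.decode("latin-1")

# What the reader sees when the response says charset=utf-8.
right = raw.decode("utf-8")

print(wrong)
print(right)
```

The bytes on the wire are identical in both cases; only the declared (or guessed) charset changes how they are displayed.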

Eric Wieser (Jul 04 2025 at 09:50):

It's up for debate whether that's always the right thing to do for .txt files, but it is certainly the right thing to do for .lean files

Eric Wieser (Jul 04 2025 at 09:50):

Most static server configurations have a way to set the content-type by extension
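
For a concrete picture of a per-extension mapping, here is a small sketch using Python's standard-library `http.server`. It is only an example of the general idea and says nothing about how Zulip serves uploads; the class name `Utf8TextHandler` and the `serve` helper are hypothetical names chosen for this sketch.

```python
import http.server


class Utf8TextHandler(http.server.SimpleHTTPRequestHandler):
    # extensions_map is consulted by guess_type() before the generic
    # mimetypes lookup, so an entry here overrides the default type.
    extensions_map = {
        **http.server.SimpleHTTPRequestHandler.extensions_map,
        ".lean": "text/plain; charset=utf-8",
    }


def serve(port: int = 8000) -> None:
    # Hypothetical helper: serves the current directory until interrupted.
    with http.server.ThreadingHTTPServer(("", port), Utf8TextHandler) as httpd:
        httpd.serve_forever()
```

Note that a name like pabc_claude.lean.txt ends in .txt, so a mapping keyed on .lean alone would not apply to it.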

Alex Vandiver (Jul 17 2025 at 15:36):

We don't want to assume utf-8 for all text/plain files -- as common as it is, unilaterally adding that will break the rendering of files which are not in UTF-8.

We trust the content-type that the browser sends when it uploads a file. Unfortunately, those content-types don't come with charsets.

It's a little surprising to me that the default for Chrome (and Safari and Firefox) for content-type: text/plain with no explicit charset is iso-8859-1, and not utf-8.
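
To make the trade-off concrete, here is a hedged Python sketch of one conservative option: add `charset=utf-8` only when the uploaded bytes actually decode as UTF-8. This is an assumption made for illustration, not a description of what Zulip does, and `content_type_for` is a hypothetical helper name.

```python
def content_type_for(raw: bytes, base: str = "text/plain") -> str:
    """Return base with charset=utf-8 only if raw is valid UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not UTF-8: leave the charset off rather than mislabel the file.
        return base
    return f"{base}; charset=utf-8"


utf8_upload = "café".encode("utf-8")
legacy_upload = "café".encode("latin-1")

print(content_type_for(utf8_upload))
print(content_type_for(legacy_upload))

# Unconditionally labelling the legacy file as UTF-8 would garble it:
print(legacy_upload.decode("utf-8", errors="replace"))
```

A strict decode can still accept a file in another encoding whose bytes happen to form valid UTF-8 (pure ASCII being the harmless case), so this is a heuristic rather than real charset detection.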

Alex Vandiver (Jul 17 2025 at 15:37):

This came up recently on chat.zulip.org -- you can follow the further discussion there.

Alex Vandiver (Jul 25 2025 at 20:18):

We've merged #35334 for this; it's not deployed to Zulip Cloud yet, but probably will be in the next week or so.

Joscha Mennicken (Jul 25 2025 at 20:26):

Link to the correct repo: https://github.com/zulip/zulip/pull/35334


Last updated: Dec 20 2025 at 21:32 UTC