Zulip Chat Archive
Stream: lean4
Topic: will verso support internationalization?
Bulhwi Cha (Jan 19 2024 at 05:51):
Sphinx uses gettext to generate PO files that can be translated into languages other than English. @David Thrane Christiansen I wonder if Verso will also use localization tools like gettext or Fluent to support internationalization.
Related topic: #lean4 > Internationalization
David Thrane Christiansen (Jan 19 2024 at 06:57):
That's not on the current development roadmap, but it would be a valuable thing to have. Ideally it's something that can be a library, rather than a built in feature - that way I'm not a single blocker for progress. Are you interested in building this? If so, I'd be happy to consult on how it can be done, and whether core implementation changes are needed to make it a good experience.
David Thrane Christiansen (Jan 19 2024 at 07:01):
Have you used Sphinx this way before? What works well and what are the pain points? I'd like to learn from others rather than just hack hack hack
Bulhwi Cha (Jan 19 2024 at 10:33):
David Thrane Christiansen said:
Have you used Sphinx this way before? What works well and what are the pain points? I'd like to learn from others rather than just hack hack hack
I've only used mdBook so far. There's a PR implementing the localization option of mdBook. I found it convenient, but it's never been merged for years. So I'm thinking of using Sphinx to write notes about Mathlib this year.
Bulhwi Cha (Jan 19 2024 at 10:33):
I'm using a translation memory application called OmegaT. It has various features including fuzzy matching and user glossaries. Since there's a Markdown filter plugin for OmegaT, it can open Markdown source files. It can also open PO files.
I'll use OmegaT every time I want to translate English documents into Korean. It helps me translate technical terms consistently and refer to my previous translations easily.
omegat.png (See https://git.sr.ht/~chabulhwi/lean-notes.)
Bulhwi Cha (Jan 19 2024 at 10:40):
David Thrane Christiansen said:
That's not on the current development roadmap, but it would be a valuable thing to have. Ideally it's something that can be a library, rather than a built in feature - that way I'm not a single blocker for progress. Are you interested in building this?
I wish I could develop the library, but I don't know how to program yet. I'm currently learning C programming. Lean will be the second programming language I'll have learned.
David Thrane Christiansen (Jan 22 2024 at 08:56):
Thanks for the additional information and context. I took a look at the docs, and I had a bit of a hard time understanding them with the time I had available, so I hope you can check my understanding and let me know if it's wrong! The way I understand it, supporting these tools with Verso could look something like this:
- While building the document, .pofiles are created with entries for each (sentence? paragraph?). These would be some kind of easily-reparsed markup (perhaps something markdownish or HTMLish).
- The translator would then use OmegaT or Transifex or some other tool to add the translations to the .pofile
- When the document is built again, rather than creating a .po file, it should _synchronize_ it, making sure that any new translatable elements are added but not deleting old ones (as they may have translations in them). Additionally, it generates outputs for each localization based on the contents of the .pofiles post-synchronization.
- As the document is maintained, the collection of strings needing translation in the .pogrows, because added or modified units show up as new ones. Presumably there's tooling to check for coverage and to flag obsolete units to translators.
Is that about right? Once consequence of this workflow is that translations don't know about language extensions, so they're provided with the output of them. For instance, if I defined a Verso macro that inserted the current date, the .po would have the date rather than the call to the macro. This seems needed to me, as a language extension won't be able to e.g. insert gendered second-person pronouns, perform case declension, or adopt capitalization rules. But I'm a total beginner here - my localization experience is limited to some contributions to the Danish translation of DrRacket - so your feedback is very welcome.
Bulhwi Cha (Jan 23 2024 at 12:42):
David Thrane Christiansen said:
- While building the document,
.pofiles are created with entries for each (sentence? paragraph?). These would be some kind of easily-reparsed markup (perhaps something markdownish or HTMLish).- The translator would then use OmegaT or Transifex or some other tool to add the translations to the
.pofile- When the document is built again, rather than creating a .po file, it should _synchronize_ it, making sure that any new translatable elements are added but not deleting old ones (as they may have translations in them). Additionally, it generates outputs for each localization based on the contents of the
.pofiles post-synchronization.- As the document is maintained, the collection of strings needing translation in the
.pogrows, because added or modified units show up as new ones. Presumably there's tooling to check for coverage and to flag obsolete units to translators.Is that about right?
I think so, although I'm also a beginner in localization.
One consequence of this workflow is that translations don't know about language extensions, so they're provided with the output of them. For instance, if I defined a Verso macro that inserted the current date, the .po would have the date rather than the call to the macro. This seems needed to me, as a language extension won't be able to e.g. insert gendered second-person pronouns, perform case declension, or adopt capitalization rules.
I agree that .po files should have the outputs of Verso macros. I'd want to translate "January 23, 2024" to "2024년 1월 23일" while using OmegaT to translate .po files.
David Thrane Christiansen (Jan 29 2024 at 12:59):
Thanks! This is still not on the roadmap for me to do, but it's good to have insights into what would be required.
Last updated: May 02 2025 at 03:31 UTC
