Zulip Chat Archive

Stream: lean4

Topic: Internationalization


Patrick Massot (Nov 16 2023 at 19:13):

Has anyone started to think about internationalization of Lean programs? The minimal version would be defining a new kind of interpolated string, a meta-program creating po files and a way to set language used in the current file. The motivation is that I'm trying to port Lean verbose tactics but I would like error messages and suggestions to be translated. In Lean 3 I had two completely different Lean project for French and English but that led to the English version being a lot less usable than the original one.

Patrick Massot (Nov 16 2023 at 19:13):

It would be great if someone who cares about Lean as a general purpose programming language could be tempted to work on that.

Mario Carneiro (Nov 16 2023 at 19:20):

I would be interested to know what kind of localization tools are used these days. I have been looking into https://projectfluent.org/, this is what firefox and rustc use, I know less about gettext

Mario Carneiro (Nov 16 2023 at 19:21):

Internationalizing lean 4 error messages sounds like a pretty big task

Mario Carneiro (Nov 16 2023 at 19:22):

A framework for writing internationalized programs sounds easier but I don't think it would be that helpful for something like lean-verbose since that's all about tactics and elaboration

Mario Carneiro (Nov 16 2023 at 19:23):

Internationalizing the input syntax of lean 4 is comparatively easy, I think you had a prototype of that in lean 4 at ICERM

Mario Carneiro (Nov 16 2023 at 19:24):

Presumably the current work on #lang would make it even easier since then you could change the keywords altogether instead of being merely additive

Kyle Miller (Nov 16 2023 at 19:25):

#lang français

Patrick Massot (Nov 16 2023 at 19:26):

I really mean I have a bunch of tactic which use throwError and I want to translate those error messages, not all messages coming from Lean (which would be great but is a much bigger task).

Patrick Massot (Nov 16 2023 at 19:27):

And I know only gettext but I have nothing against fancier modern solution especially if they are implemented by someone else.

Mario Carneiro (Nov 16 2023 at 19:28):

If you are okay with conditional compilation you could just have a bunch of string constants and just swap out the file defining them

Patrick Massot (Nov 16 2023 at 19:29):

Sure, I could use such a very low-tech version, but that's pretty painful compared to using gettext.

Mario Carneiro (Nov 16 2023 at 19:29):

is it?

Mario Carneiro (Nov 16 2023 at 19:29):

What's the interface you are after anyway?

Mario Carneiro (Nov 16 2023 at 19:30):

It doesn't have to be string constants either, it can be arbitrary lean functions

Patrick Massot (Nov 16 2023 at 19:31):

Yes it is. gettext will automatically collect all strings that needs translations. When you look at the code you see almost nothing special. You see _("This is an error message") instead of "This is an error message". You can write as many such translatable strings as you want without breaking the programming flow.

Mario Carneiro (Nov 16 2023 at 19:32):

What if you want to interpolate into the string?

Patrick Massot (Nov 16 2023 at 19:32):

Same.

Patrick Massot (Nov 16 2023 at 19:32):

Interpolated strings are of course a crucial part of the story.

Patrick Massot (Nov 16 2023 at 19:33):

You write _("This is an error message about {stuff}") instead of "This is an error message about {stuff}"

Mario Carneiro (Nov 16 2023 at 19:35):

Okay, so something like:

...
  throwError tr% "This is an error message about {e}"

translations/fr.json:

{"This is an error message about {e}": "Ceci est un message d'erreur pour {e}"}

Patrick Massot (Nov 16 2023 at 19:35):

Yes. And fr.json is automatically generated.

Mario Carneiro (Nov 16 2023 at 19:35):

How?

Mario Carneiro (Nov 16 2023 at 19:36):

What's the input?

Mario Carneiro (Nov 16 2023 at 19:36):

That's what I'm asking

Patrick Massot (Nov 16 2023 at 19:36):

Some meta-program crawls your source file and looks for tr%.

Patrick Massot (Nov 16 2023 at 19:36):

The input is your lean files.

Mario Carneiro (Nov 16 2023 at 19:36):

Where are the translations coming from?

Mario Carneiro (Nov 16 2023 at 19:37):

I can see how you can validate this file but not how or where you want to put the translations

Patrick Massot (Nov 16 2023 at 19:37):

The meta-program generates a json file where all translations are missing originally.

Patrick Massot (Nov 16 2023 at 19:37):

Then human beings put in translation. And next time the meta-program is ran (because the software was updated) it doesn't touch things that were already translated of course.

Patrick Massot (Nov 16 2023 at 19:38):

Note that gettext does not use json for this (probably because it long predates json) but its own format po.

Kyle Miller (Nov 16 2023 at 19:45):

I wonder if this behavior could be implemented using an environment extension, where when a tr!"..." string is elaborated it adds this string to the extension (maybe it tags it with the identifier of the current definition too, just in case that is useful), and then the metaprogram that generates the po-analogue just imports the main Lean file and looks at the contents of the environment extension?

Patrick Massot (Nov 16 2023 at 19:46):

Yes, I was thinking about using environment extensions.

Mario Carneiro (Nov 16 2023 at 19:48):

how are the strings keyed? Just by the english interpolation like I showed?

Patrick Massot (Nov 16 2023 at 19:49):

Yes.

Mario Carneiro (Nov 16 2023 at 19:49):

are they ordered in any particular way?

Kyle Miller (Nov 16 2023 at 19:50):

One question is how translation substitutions work. If it's a compile-time transformation, then that's clearer, but if you want it runtime, I suppose you can validate that the substitutions refer to the exact same expressions between the {...}s, up to some permutation.

Patrick Massot (Nov 16 2023 at 19:50):

I don't think there is any particular order. Again the wikipedia page shows examples.

Mario Carneiro (Nov 16 2023 at 19:50):

I was thinking to do it as a compile time transformation

Patrick Massot (Nov 16 2023 at 19:50):

I don't say that we have to use the exact same format, bit those files have tooling such as https://poedit.net/

Mario Carneiro (Nov 16 2023 at 19:50):

you would just look up the interpolation at elaboration time and return the results of interpolating into the translated string

Mario Carneiro (Nov 16 2023 at 19:51):

One reason not to use po is that you would not be able to use lean style interpolated strings with them

Mario Carneiro (Nov 16 2023 at 19:51):

I suppose you could translate the format?

Kyle Miller (Nov 16 2023 at 19:52):

Maybe you could use %1, %2, etc. in place of the {...}s sequentially

Kyle Miller (Nov 16 2023 at 19:53):

And then at the site of the tr! string you use that to permute the interpolated expressions

Mario Carneiro (Nov 16 2023 at 19:55):

I wonder whether we need a typeclass to unify s! ,f!, m! because otherwise this causes issues here

Joachim Breitner (Nov 16 2023 at 19:56):

For general programming, I'd expect that language can be changed at runtime. For that it seems that tr! probably ought to be monadic in a suitable monad for getting the current language, and the translations database.

Mario Carneiro (Nov 16 2023 at 19:57):

That could be a separate mode/macro, I would expect it to be more annoying to use that way

Joachim Breitner (Nov 16 2023 at 19:58):

Yup, but hard to avoid in a pure language, I fear.

Patrick Massot (Nov 16 2023 at 20:05):

Many software require to restart to apply language preferences. My use case is a bit special, I want to translate messages coming from tactics.

Patrick Massot (Nov 16 2023 at 20:06):

Ideally I want to have simply import MyTactics.French and get messages in French.

Joachim Breitner (Nov 16 2023 at 20:23):

Restarting is fine, but (for general programming) not recompilation, I’d say. But if that's not the use-case needed right now, don’t let me distract you at this point :-)

Patrick Massot (Nov 16 2023 at 20:24):

In the long run it would be really nice to have a robust internationalization framework that would accommodate many use cases. But indeed in the short run I'd happy to have something that is good enough for my use case.

Patrick Massot (Nov 16 2023 at 20:25):

And my use case is very important for teaching outside the US, so hopefully this is something the FRO also finds interesting.

Mario Carneiro (Nov 16 2023 at 20:47):

I'm not sure this is something the FRO itself needs to do, this can be an external project well enough

Patrick Massot (Nov 16 2023 at 20:47):

Anyone should feel free to try, I was only trying to encourage Joachim.

Joachim Breitner (Nov 16 2023 at 21:39):

Ah, I didn’t even notice :-D

Bulhwi Cha (Nov 16 2023 at 23:33):

Patrick Massot said:

I don't say that we have to use the exact same format, bit those files have tooling such as https://poedit.net/

Minju Kim, @Shim SeongWoo (Shim is the surname) and I used OmegaT to translate NNG3 into Korean early last year. It's a translation memory application and free software.

Kyle Miller (Nov 16 2023 at 23:33):

@Patrick Massot I got a prototype of this together. It's only for String at the moment. It provides tr!"" strings that are monad-valued, and they read the current value of the lang option. When compiling, the system checks to see if there are translations for every declared language, and if there are not it suggests a new @[translation] definition to add to the environment. This system expects all translations of all keys to exist ahead of their use.

Here's an example of usage:

import Mathlib.I18n.Basic

decl_lang fr "français"

@[translation fr "Expecting an identifier, not {1}"]
def tr1 (x : String) : String :=
  s!"attendu : un identifiant, pas {x}"

elab "foo" x:num : tactic => do
  throwError  tr!"Expecting an identifier, not {x.getNat}"

example : true := by
  foo 1 -- Expecting an identifier, not 1

set_option lang "fr"

example : true := by
  foo 1 -- attendu : un identifiant, pas 1

elab "foo2" : tactic => do
  throwError  tr!"unknown error"
/-
Warning: Missing translation for fr (français) with key "unknown error".

Template:

@[translation fr "unknown error"]
def tr : String :=
  s!"unknown error"
-/

elab "foo3" : tactic => do
  throwError  tr!"The following are not equal: {1} and {2}"
/-
Missing translation for fr (français) with key "The following are not equal:{1} and {2}".

Template:

@[translation fr "The following are not equal:{1} and {2}"]
def tr (s_1 : String) (s_2 : String) : String :=
  s! "The following are not equal: {s_1 } and {s_2}"
-/

Kyle Miller (Nov 16 2023 at 23:33):

Mathlib.I18n.Init

Mathlib.I18n.Basic

Kyle Miller (Nov 16 2023 at 23:37):

These are basically for s!"" string translation, and I was imagining that it could be additionally keyed by whether it's for an s!, f!, or m! translation. You'd need some other prefixes for these options; maybe ts!, tf!, and tm!, or a tr% prefix that's aware of these three types of strings.

Patrick Massot (Nov 16 2023 at 23:53):

Thanks! That's already great. Of course translating before using isn't the most convenient thing to do, but this is definitely usable.

Kyle Miller (Nov 16 2023 at 23:55):

The workflow I'm imagining is this:

  • You want to translate a new language, so you add a decl_lang
  • You compile your project
  • The collection of warnings is new translation tasks
  • You create a module with all these translations and import it
  • You rebuild

Kyle Miller (Nov 16 2023 at 23:56):

Either some tool could look at the output and collect the warnings, or the elaborator could write them to a file (say if there's some option set that you pass in on the command line)

Patrick Massot (Nov 16 2023 at 23:57):

Having some mechanism to write templates to a file would definitely be a huge boost of usability.

Kyle Miller (Nov 16 2023 at 23:58):

Right now it's not convenient for people who want to translate without recompiling, and you also can't use these translations from within the MacroM monad (it doesn't have getOptions)

Kyle Miller (Nov 17 2023 at 00:09):

For m! strings, I believe there's not that sort of limitation, since you can use docs#Lean.MessageData.ofPPFormat to have it lazily translate once it finally gets displayed, rather translated when it's generated.


Last updated: Dec 20 2023 at 11:08 UTC