Zulip Chat Archive

Stream: Lean Together 2024

Topic: Automatic Differentiation - Tomas Skrivan


Eric Wieser (Jan 09 2024 at 16:10):

The relevant file: https://github.com/lecopivo/SciLean/blob/master/SciLean/Core/FunctionTransformations/FDeriv.lean

Kevin Buzzard (Jan 09 2024 at 16:11):

These tactics (fprop etc) are not in mathlib, right?

Eric Rodriguez (Jan 09 2024 at 16:14):

Not as far as I'm aware - they seem amazing!

Eric Wieser (Jan 09 2024 at 16:16):

For anyone else wondering, this fun x =>L[R] x^2 notation is coming from

macro "fun " x:ident " =>L[" R:term "] " b:term : term =>
  `(ContinuousLinearMap.mk' $R (fun $x => $b) (by fprop))

Kevin Buzzard (Jan 09 2024 at 16:16):

Completely dumb question: these outputs are formally verified, or just some kind of #eval trick?

Kevin Buzzard (Jan 09 2024 at 16:18):

(the #check stuff)

Eric Wieser (Jan 09 2024 at 16:24):

@Yury G. Kudryashov's #2202 would make it slightly easier to use fprop with mathlib morphism types

Eric Wieser (Jan 09 2024 at 16:26):

Though it would be a paradigm shift from using operations on bundled morphisms everywhere (via comp variants) to using constructors where the linearity is proved afresh every time

Kevin Buzzard (Jan 09 2024 at 16:31):

Wow! That was fabulous! Patrick and Johan had questions, maybe you could take them in the conference stream?

Kevin Buzzard (Jan 09 2024 at 16:32):

oh lol sorry, that was supposed to be a DM

Kevin Buzzard (Jan 09 2024 at 16:32):

@Tomas Skrivan

Johan Commelin (Jan 09 2024 at 16:34):

My question is that in the final ML example, you were considering functions between floaty types. So in terms of the pictures on one of your first slides, you were already in "approx" column. Is the reason that your AD still works because you work directly on the level of the AST?

Tomas Skrivan (Jan 09 2024 at 16:35):

Kevin Buzzard said:

These tactics (fprop etc) are not in mathlib, right?

They are not, at some point I would like to merge it to mathlib and I think it is getting to the point where it is mature enough.

Patrick Massot (Jan 09 2024 at 16:37):

My question was about the fprop tactic. I guess everyone had this question: can we have a fast continuity and differentiability tactics in Mathlib?

Tomas Skrivan (Jan 09 2024 at 16:37):

Johan Commelin said:

My question is that in the final ML example, you were considering functions between floaty types. So in terms of the pictures on one of your first slides, you were already in "approx" column. Is the reason that your AD still works because you work directly on the level of the AST?

Usually everything is formulated and proven polymorphic in the field, i.e. something like{K} [IsROrC K]. When I want to run the program, I just provide inconsistent instance IsROrC Float. Sometimes, like in the ML example, I do not bother in writing it in field polymorphic way.

Tomas Skrivan (Jan 09 2024 at 16:38):

Patrick Massot said:

My question was about the fprop tactic. I guess everyone had this question: can we have a fast continuity and differentiability tactics in Mathlib?

Yes I would like to merge it to mathlib ;)

Jannis Limperg (Jan 09 2024 at 16:43):

Is there an approximate timeline for this merge? I've been looking on and off into speeding up the Aesop-based continuity and measurability tactics, but if better special-purpose versions are coming soon anyway, I'll shelve this.

Tomas Skrivan (Jan 09 2024 at 16:44):

Kevin Buzzard said:

Completely dumb question: these outputs are formally verified, or just some kind of #eval trick?

All that was done with fderiv and Continuity was fully verified. However, results with ∂ ∇ were not, as they do not use mathlib definition of derivative and adjoint, mainly because X × Y is not inner product space by default. I tried using something like ProdL2 X Y but it is too easy abuse defeq with X × Y that sometimes fails and leads to horrible error that I just gave up on dealing with :)

Eric Wieser (Jan 09 2024 at 16:45):

In theory using WithLp.equiv liberally would save you from the defeq issues, but it's still annoying to do so

Tomas Skrivan (Jan 09 2024 at 16:45):

Here are the slides if anyone is interested

Tomas Skrivan (Jan 09 2024 at 16:47):

Eric Wieser said:

In theory using WithLp.equiv liberally would save you from the defeq issues, but it's still annoying to do so

Another issues is that I want to do automatic differentiation of programs and I really can't tell people to not to use Prod if they want their program to be differentiated.

Eric Wieser (Jan 09 2024 at 16:47):

The choice of norm doesn't actually change the derivative, right? I think @Yury G. Kudryashov has some ideas in the work to remove the norm requirement from docs#fderiv

Yury G. Kudryashov (Jan 09 2024 at 16:49):

Yes, the norm is not relevant. I will look at my branch about f =oTVS[k, l] g tonight.

Tomas Skrivan (Jan 09 2024 at 16:50):

But also for gradient it would be nice to be able to have inner product without requiring norm. Such that you can write gradient fun x : X × Y => x.1

Tomas Skrivan (Jan 09 2024 at 23:22):

Jannis Limperg said:

Is there an approximate timeline for this merge? I've been looking on and off into speeding up the Aesop-based continuity and measurability tactics, but if better special-purpose versions are coming soon anyway, I'll shelve this.

I'm still a bit unsure if my version is good enough for mathlib purposes. I haven't done much work with stuff like ContinuityAt/On.

Within the next month I will try to make a PR to mathlib adding fprop and see how many continuity can be replaced with fprop. Then we will have a better idea if continuity can be replaced by fprop or not.

Yury G. Kudryashov (Jan 10 2024 at 08:49):

Yury G. Kudryashov said:

Yes, the norm is not relevant. I will look at my branch about f =oTVS[k, l] g tonight.

In branch#YK-isO-TVS, you can find a definition of IsLittleOTVS that doesn't depend on the norm with 2 theorems:

lemma isLittleOTVS_iff_tendsto_inv_smul {𝕜 α E : Type*} [NontriviallyNormedField 𝕜]
    [AddCommGroup E] [Module 𝕜 E] [TopologicalSpace E] [TopologicalAddGroup E] [ContinuousSMul 𝕜 E]
    {f : α  𝕜} {g : α  E} {l : Filter α} (h₀ : ∀ᶠ x in l, f x = 0  g x = 0) :
    IsLittleOTVS 𝕜 g f l  Tendsto (fun x  (f x)⁻¹  g x) l (𝓝 0) := _

lemma isLittleOTVS_iff_isLittleO {E F : Type*} [NormedAddCommGroup E] [NormedAddCommGroup F]
    [NormedSpace 𝕜 E] [NormedSpace 𝕜 F] {f : α  E} {g : α  F} {l : Filter α} :
    IsLittleOTVS 𝕜 f g l  f =o[l] g := by

Yury G. Kudryashov (Jan 10 2024 at 08:58):

Before it lants to Mathlib, a few things have to be done:

  • rename docs#gauge to something like rayGauge
  • introduce egauge (as in my branch, thanks to @Anatole Dedecker for formulating the definition in terms of it)
  • introduce gauge K s x := (egaue K s x).toReal
  • rewrite the definition as
def IsLittleOTVS (f : α  E) (g : α  F) (l : Filter α) : Prop :=
   U  𝓝 (0 : E),  V  𝓝 (0 : F), (gauge 𝕜 U <| f ) =o[l] (gauge 𝕜 V <| g )
  • adjust the theorems to the new definition
  • redefine HasFDerivAtFilter etc in terms of IsLittleOTVS without breaking API besides the anonymous constructor/cases
  • gradually generalize theorems to topological vector spaces (as needed).

Yury G. Kudryashov (Jan 10 2024 at 08:58):

Do you want to help with either of these steps?

Tomas Skrivan (Jan 10 2024 at 09:19):

I can give it a try but I have very little experience with actually formalizing mathematics and even less working with filters. So no promises :)

namibj (Jan 10 2024 at 09:32):

Johan Commelin said:

My question is that in the final ML example, you were considering functions between floaty types. So in terms of the pictures on one of your first slides, you were already in "approx" column. Is the reason that your AD still works because you work directly on the level of the AST?

Might be a place/case to have a "perfect"/idealized IEEE-style float, to force respect of Inf/signed zero/NaN without discretizing.

Eric Wieser (Jan 10 2024 at 09:35):

@Yury G. Kudryashov, do you have an example in mind where your new gauge disagrees with the renamed rayGauge? Edit: ah, when s = Icc 0 2 and x = -1, rayGauge s x = 0 but egauge Real s x = 0.5

Eric Wieser (Jan 10 2024 at 10:00):

Though I don't see why that matters here, since you're assuming an topological additive group, so aren't all the sets you use symmetric wrt - anyway?

Anatole Dedecker (Jan 10 2024 at 10:07):

I need to dive back into this (does someone have a link to the original discussion?) but we should think twice before renaming because AFAIK gauge is a standard name for the current docs#gauge

Eric Wieser (Jan 10 2024 at 10:13):

Maybe the motivation was just generalizing the scalar

Eric Wieser (Jan 10 2024 at 10:17):

In which case; would egauge ℝ≥0 be effectively the same as rayGauge?

Jannis Limperg (Jan 10 2024 at 10:21):

Tomas Skrivan said:

Within the next month I will try to make a PR to mathlib adding fprop and see how many continuity can be replaced with fprop. Then we will have a better idea if continuity can be replaced by fprop or not.

Great, then I'll hold off on my experiments for now. Let me know how it goes.

Tomas Skrivan (Jan 10 2024 at 10:32):

namibj said:

Might be a place/case to have a "perfect"/idealized IEEE-style float, to force respect of Inf/signed zero/NaN without discretizing.

Can you expand on this? What do you mean by "idealized IEEE float"? Do you mean actual IEEE float or reals + infinity + NaN + negative zero?

namibj (Jan 10 2024 at 12:49):

Tomas Skrivan said:

Can you expand on this? What do you mean by "idealized IEEE float"? Do you mean actual IEEE float or reals + infinity + NaN + negative zero?

I guess two kinds at least could be useful: the latter you described, and if you want to express certain bit-hacking/binary-float-particularities-exploiting things, at least swapping the exponent to be unlimited (i.e., non-finite Int) and I'd hope somehow making the mantissa be like a rational or probably better a "unspecified, but finite"-length normalized binary as usual.
I'm difference from IEEE, and in consideration of common fast SIMD restrictions, I'd probably actually skip/exclude sub-normals.

The difference between the "bignum"(-ish) floats and the "decorated" reals would be whether one wants to reason about effects like catastrophic cancellation, or just do analysis with functions that (aside from rounding errors!) behave (provably, if desired) correctly when fed actual (true) IEEE floats.

Yury G. Kudryashov (Jan 10 2024 at 13:43):

Eric Wieser said:

In which case; would egauge ℝ≥0 be effectively the same as rayGauge?

AFAIK, we don't have normed semifields and I don't know what's a good definition here.

Yury G. Kudryashov (Jan 10 2024 at 13:43):

Anatole Dedecker said:

I need to dive back into this (does someone have a link to the original discussion?) but we should think twice before renaming because AFAIK gauge is a standard name for the current docs#gauge

Is there a standard name for the "new" gauge?

Yury G. Kudryashov (Jan 10 2024 at 13:45):

Eric Wieser said:

Though I don't see why that matters here, since you're assuming an topological additive group, so aren't all the sets you use symmetric wrt - anyway?

I don't know how to define gauge over a general field so that it agrees with docs#gauge in case of docs#Real.

Kevin Buzzard (Jan 10 2024 at 14:23):

The concept of a gauge exists in nonarchimedean functional analysis: see e.g. top of p5 of https://docenti.math.unipd.it/geometria/sites/default/files/Basic%20notions%20on%20non-archimedian%20functional%20analysis_0.pdf .

Eric Wieser (Jan 10 2024 at 15:21):

Yury G. Kudryashov said:

Eric Wieser said:

In which case; would egauge ℝ≥0 be effectively the same as rayGauge?

AFAIK, we don't have normed semifields and I don't know what's a good definition here.

I suspect dist a (a + b) = norm b is a suitable axiom (on top of all the other easy ones)

Yury G. Kudryashov (Jan 10 2024 at 15:22):

I don't want to redesign normed groups/rings/fields right now.

Eric Wieser (Jan 10 2024 at 15:52):

Maybe we can just stick a norm on NNReal for now, which would be enough to merge the definitions of gauge and eGauge.toReal

Eric Wieser (Jan 10 2024 at 15:52):

We'd still need to duplicate all the theorems

Yury G. Kudryashov (Jan 10 2024 at 15:53):

And move existing lemmas to the NNReal namespace?

Yury G. Kudryashov (Jan 10 2024 at 15:55):

BTW, normed semifields can be useful to unify docs#tangentConeAt and docs#posTangentConeAt

Eric Wieser (Jan 11 2024 at 00:58):

Yury G. Kudryashov said:

I don't want to redesign normed groups/rings/fields right now.

#9642 makes a start on this


Last updated: May 02 2025 at 03:31 UTC