Zulip Chat Archive
Stream: lean4
Topic: Proof relevance
awalterschulze (Jan 21 2024 at 17:53):
We are doing some proof-relevant proofs, so the proofs live in the Type universe instead of Prop. For this we have had to redefine quite a bit of Eq and Decidable, and we don't have the rewrite tactic working yet. Has anyone ever attempted this in Lean, or are we the first?
https://github.com/katydid/proofs/blob/main/Katydid/Std/Tipe.lean
https://github.com/katydid/proofs/blob/main/Katydid/Std/TDecidable.lean
awalterschulze (Jan 21 2024 at 17:54):
Also, this function seems illegal to me:
inductive TEq {α : Type u} : α -> α -> Type u where
  | refl (x : α) : TEq x x

-- legal
def eq_of_teq {α : Type u} {a a' : α} (h : TEq a a') : Eq a a' :=
  sorry

-- illegal
def teq_of_eq {α : Type u} {a a' : α} (h : Eq a a') : TEq a a' :=
  sorry
since technically I am now defining a function that goes from Prop to Type, which is usually not allowed.
Adam Topaz (Jan 21 2024 at 18:42):
Seems fine to me:
inductive TEq {α : Type u} : α -> α -> Type u where
  | refl (x : α) : TEq x x

-- legal
def eq_of_teq {α : Type u} {a a' : α} (h : TEq a a') : Eq a a' :=
  h.rec rfl

-- illegal
def teq_of_eq {α : Type u} {a a' : α} (h : Eq a a') : TEq a a' :=
  h.rec <| .refl _
Adam Topaz (Jan 21 2024 at 18:54):
But I also see now that you have essentially the same thing in the file linked above. What do you mean by "illegal"?
Kyle Miller (Jan 21 2024 at 19:05):
Take a look at https://lean-forward.github.io/logical-verification/2018/41_notes.html, the sections about elimination. Usually Prop to Type is disallowed, but Eq is a "syntactic subsingleton" so it's allowed to eliminate to Type.
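A minimal sketch of that rule, with illustrative names (castWith, pick) not taken from the linked notes:
-- Eq is a syntactic subsingleton, so eliminating it into Type is permitted:
def castWith {α β : Type u} (h : α = β) (a : α) : β := h ▸ a

-- Or has two constructors, so the analogous definition is rejected:
-- def pick {p q : Prop} (h : p ∨ q) : Bool :=
--   match h with
--   | .inl _ => true
--   | .inr _ => false
-- (rejected: a proof of Or can only be eliminated into Prop)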
Andrew Carter (Jan 22 2024 at 03:01):
I've been thinking about this somewhat with asymptotics, in particular I want to keep track of a multiple but I don't really care which. And so I've been trying to work with:
structure ALE [CanonicallyOrderedLatticeCommSemiring θ] (x y : CostFunction α θ) where
  m : θ
  k : θ
  ale : x ≤ ↑m * y + ↑k
instead of ∃ (m k: θ), x ≤ ↑m * y + ↑k
but it's definitely awkward not having simp in this case, so it would be kind of cool if we could have simp-equivalent tactics. So I've given it some thought in terms of: I want to be able to produce a variable of this type, but I'm not super picky about which variable it is (although tighter bounds are "nicer").
That being said, I feel like most of the infrastructure already exists. I'm not sure that, from a practical perspective, TEq differs from Eq. And I think in general most of the infrastructure already exists using normal Lean functions; for example, I think Tiso is just Equiv.
Eric Wieser (Jan 22 2024 at 08:01):
Andrew Carter said:
And I think in general most of the infrastructure already exists using normal Lean functions; for example, I think Tiso is just Equiv.
This sounds correct to me; I wouldn't expect you to need any new types in your goal to make things "proof" relevant; we already have all the types you need (And → PProd, Or → PSum, Exists → PSigma, no change needed for Eq).
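A small sketch of that dictionary, with illustrative names (ExistsT, OrT), assuming the predicates involved are Type-valued:
-- Proof-relevant counterpart of `∃ x, P x ∧ Q x`, built from the core types
-- mentioned above: PSigma in place of Exists, PProd in place of And.
def ExistsT {α : Type u} (P Q : α → Type v) : Type (max u v) :=
  PSigma fun x => PProd (P x) (Q x)

-- Or becomes PSum in the same way:
def OrT (P Q : Type u) : Type u :=
  PSum P Q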
Kyle Miller (Jan 22 2024 at 08:06):
Don't you still need to boost Eq to TEq? Otherwise it'll be subject to proof irrelevance.
Kyle Miller (Jan 22 2024 at 08:44):
I'm also not sure about Tiso being Equiv, since Equiv puts the left- and right-inverse proofs into Prop, so it's hiding proof irrelevance.
Certainly they're equivalent, however:
inductive TEq {α : Type u} : α -> α -> Type u where
  | refl (x : α) : TEq x x

def TEq_equiv {x y : α} : TEq x y ≃ (x = y) where
  toFun | .refl _ => rfl
  invFun | rfl => .refl _
  left_inv | .refl _ => rfl
  right_inv | rfl => rfl
Eric Rodriguez (Jan 22 2024 at 08:53):
Doesn't this mean that TEq is basically proof irrelevant too?
Kyle Miller (Jan 22 2024 at 09:01):
What does "basically" proof irrelevant mean? The condition of proof irrelevance would be that given two terms of TEq x y, they are defeq.
David Thrane Christiansen (Jan 22 2024 at 09:10):
@Eric Rodriguez As I understand it, your question could be rephrased as "are all inhabitants of TEq x y equal to each other". In Lean, they are propositionally equal (this terminology is a bit confusing here - what's meant is "they can be proven to be equal", not "it's a Prop"):
inductive TEq {α : Type u} : α -> α -> Type u where
  | refl (x : α) : TEq x x

theorem TEq.all_tequal (p q : TEq x y) : TEq p q := by
  cases p
  cases q
  apply refl

theorem TEq.all_equal (p q : TEq x y) : p = q := by
  cases p
  cases q
  rfl
But as @Kyle Miller pointed out, we also care about what the type theory lets you consider equal without requiring a proof:
theorem TEq.all_equal (p q : TEq x y) : p = q := by
  /-
  tactic 'rfl' failed, equality lhs
    p
  is not definitionally equal to rhs
    q
  α✝ : Type u_1
  x y : α✝
  p q : TEq x y
  ⊢ p = q
  -/
  rfl
theorem Eq.all_equal (p q : x = y) : p = q := by
  rfl
The fact that TEq.all_equal and TEq.all_tequal are true in Lean is actually an instance of a deep issue in type theory. Everyone used to just assume that this was the case, but nobody succeeded in proving it. Some version of it was often adopted as an axiom (the things to search for are "UIP" and "axiom K"), and it was provable using Coquand's initial formulation of dependent pattern matching, but the proof never went through. Then, Hofmann and Streicher published "The Groupoid Interpretation of Type Theory" in which they give a model in which this axiom is false.
In Lean, we just treat types as sets and so this is fine, and you don't really have to worry about which equality proof you have. But this is not true in every system out there, and it rules out certain type system features that are incompatible with it.
Eric Rodriguez (Jan 22 2024 at 09:24):
I did mean that 'this means that TEq is a subsingleton/TSubsingleton', so I feel like this makes this construction useless in Lean. I'm a bit surprised it's true, however, as I thought systems without UIP/K basically define equality the same way, but without having proof irrelevance in the kernel, it no longer even becomes provable propositionally that all instances of the identity type are equal - why does this fail with TEq here?
Jannis Limperg (Jan 22 2024 at 09:34):
UIP is known to be infectious like this. Otherwise we could do HoTT by just moving to TEq. :(
Junyan Xu (Jan 22 2024 at 15:11):
The Lean 3 HoTT repo does define Type-valued equality and implement a rewrite tactic, but it also disables large elimination altogether. I can't find the same def in this Lean 4 repo.
I think I had a similar idea as awalterschulze here, but realized it doesn't work (inconsistent with univalence + large elimination).
awalterschulze (Jan 22 2024 at 20:04):
Wow amazing responses. Thank you for everyone's help
We have been reading through this and it has been really interesting, but I wouldn't say we understood everything. Definitely more reading to do.
But it seems like there might not be a way to do proof relevance in Lean?
The use case is that we want to represent a language using:
def Lang : Type (u + 1) :=
  List α -> Type u
We want proof relevance to distinguish between different parses.
Andrew Carter (Jan 22 2024 at 23:27):
awalterschulze said:
We want proof relevance to distinguish between different parses.
Yeah, we are facing similar issues when working towards computational complexity.
The approach I'm currently taking is to build a parallel language in Lean structures, and an execution model, then simulate the execution model in Lean. Ideally there would be some sort of nice DSL interface so it looks like you are writing normal Lean, but actually you are building structures.
I.e. write a normal Lean function that takes a Lean structure that represents a Lean function, works through proving proof relevance, and provides a trace of all the steps it did (probably some form of Nat -> Step).
This might be a heavy lift for proof relevance, since you'd have to simulate a lot of Lean inside of Lean, as opposed to computational complexity, which can start with very simple constructs.
Jannis Limperg (Jan 23 2024 at 11:50):
It's not so clear to me what your respective applications are, but note that UIP doesn't necessarily mean that you can't do proof-relevant developments in Lean. For example, here is a proof-relevant membership predicate on lists:
inductive In : α → List α → Type
  | here : ∀ x xs, In x (x :: xs)
  | later : ∀ y, In x xs → In x (y :: xs)

def In.getIndex : In x xs → Σ' i, xs.get? i = some x
  | here x xs => ⟨0, by simp [List.get?]⟩
  | later y h =>
    let ⟨i, ih⟩ := h.getIndex
    ⟨i + 1, by simp [List.get?, ih]⟩
If you do need specifically proof-relevant equality, I would switch to Coq or Agda. Lean actively works against you in that case.
Shreyas Srinivas (Jan 23 2024 at 13:58):
I don't see why you need a proof relevant prop for your usage. You need an inductive type for counting steps and props on this type.
Shreyas Srinivas (Jan 23 2024 at 14:04):
You can add the traces as a parameter to one of your constructors. Thus objects with different traces are treated unequally.
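A rough sketch of that idea, with hypothetical names (Step, Run) rather than anything from either project:
inductive Step where
  | shift
  | reduce
  deriving DecidableEq

inductive Run (α : Type) where
  | done (value : α) (trace : List Step)
  deriving DecidableEq

-- Two runs that differ only in their trace are provably distinct terms:
example : Run.done 0 [Step.shift] ≠ Run.done 0 [Step.reduce] := by decide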
Shreyas Srinivas (Jan 23 2024 at 14:07):
That being said, I think you might find it easier to create a simple imperative language in which you can embed Lean functions with separate complexity annotations and derive conditional results, rather than recreate Lean in Lean and trace the number of Lean steps in executing that function. That would actually be much closer to how modern researchers work (although they do use complexity assertions which are either theorems or conjectures, rather than arbitrary assertions). Alternatively, instead of annotating complexities, you could just parametrize them and then write conditional theorem statements with values for those parameters.
Shreyas Srinivas (Jan 23 2024 at 14:12):
I find the current attempts in the ITP community to verify algorithms by writing implementable/executable code pointless, since that is very far from how most algorithms theory research works these days.
Siegmentation Fault (Jan 26 2024 at 08:08):
Junyan Xu said:
The Lean 3 HoTT repo does define Type-valued equality and implement a rewrite tactic, but it also disables large elimination altogether. I can't find the same def in this Lean 4 repo.
awalterschulze (Jan 26 2024 at 16:43):
This is amazing, wow
So a little guesswork here, but it seems like: if you add the hott prefix to a theorem, then it checks that large elimination is not used, and this way proof relevance is retained. Is that true?
Siegmentation Fault (Jan 27 2024 at 10:11):
awalterschulze said:
If you add the hott prefix to a theorem, then it checks that large elimination is not used, and this way proof relevance is retained. Is that true?
Yes exactly.
awalterschulze (Jan 30 2024 at 08:23):
We tried hott and it is pretty seamless, but it becomes really hard not to use large elimination:
hott def hp : a_or_b ['a'] -> Bool :=
  fun _ => true
since our regular expressions seem to have it built in, in Char.ofNat:
uses large eliminator: Eq.rec <- Eq.ndrec <- Bool.noConfusion <- Nat.ble_succ_eq_true.match_1 <- Nat.ble_succ_eq_true <- Nat.ble_eq_true_of_le <- Nat.not_le_of_not_ble_eq_true <- Nat.decLe <- Nat.decLt <- Char.ofNat.proof_2 <- Char.ofNat <- hp
So I think it is also going to be really hard to actually use hott.
awalterschulze (Jan 30 2024 at 10:35):
Jannis Limperg said:
It's not so clear to me what your respective applications are, but note that UIP doesn't necessarily mean that you can't do proof-relevant developments in Lean. For example, here is a proof-relevant membership predicate on lists:
inductive In : α → List α → Type
  | here : ∀ x xs, In x (x :: xs)
  | later : ∀ y, In x xs → In x (y :: xs)

def In.getIndex : In x xs → Σ' i, xs.get? i = some x
  | here x xs => ⟨0, by simp [List.get?]⟩
  | later y h =>
    let ⟨i, ih⟩ := h.getIndex
    ⟨i + 1, by simp [List.get?, ih]⟩
If you do need specifically proof-relevant equality, I would switch to Coq or Agda. Lean actively works against you in that case.
We took a while to understand this, but it has been very helpful.
I understand now: we can use Eq in Prop if we pair it in a PSigma Σ' with the proof-relevant part to preserve how the string was parsed. This is still a work in progress, but already an improvement:
inductive TEq.{tu} {α : Type tu} (a b : α) : Type tu where
  | mk : a = b -> TEq a b

infixl:65 " ≡ " => TEq

def Lang : Type (u + 1) :=
  List α -> Type u

def concat (P : Lang α) (Q : Lang α) : Lang α :=
  fun (w : List α) =>
    Σ' (x : List α) (y : List α), w ≡ (x ++ y) × P x × Q y
With some use of cases we can use rewrite tactics, so that is great.
But it is a bit annoying to have to destruct; I was wondering if there is a way to always unfold, but it seems unfolding from Type to Prop, in TEq's case, is not really possible without explicitly calling cases.
Siegmentation Fault (Jan 31 2024 at 05:57):
awalterschulze said:
We tried hott and it is pretty seamless, but it becomes really hard not to use large elimination:
hott def hp : a_or_b ['a'] -> Bool :=
  fun _ => true
since our regular expressions seem to have it built in, in Char.ofNat:
uses large eliminator: Eq.rec <- Eq.ndrec <- Bool.noConfusion <- Nat.ble_succ_eq_true.match_1 <- Nat.ble_succ_eq_true <- Nat.ble_eq_true_of_le <- Nat.not_le_of_not_ble_eq_true <- Nat.decLe <- Nat.decLt <- Char.ofNat.proof_2 <- Char.ofNat <- hp
So I think it is also going to be really hard to actually use hott.
Lean’s standard library and many standard tactics (like simp) heavily depend on large elimination, so yes, the HoTT way won’t be a cakewalk.
However, you can try either rewriting all Char-related machinery yourself, avoiding large elimination (which is probably not that easy), or marking the required machinery as @[hottAxiom] to use it (I don’t think that stuff like Char.ofNat or Nat.decLt is incompatible with univalence, so it should be safe to do that, but you need to work very carefully here).
Siegmentation Fault (Jan 31 2024 at 06:18):
Actually we can do both: construct, without large elimination (i.e. in the hott environment), a term of the same type as the problematic one (like Nat.decLe) as evidence of its compatibility with univalence, and then mark the problematic term as @[hottAxiom].
Kim Morrison (Jan 31 2024 at 06:36):
In fact you could probably introduce an attribute @[hottReplacement foo] that would check that foo has the same type as the current declaration. Then later invocations of @[hott] could take this into account.
Siegmentation Fault (Jan 31 2024 at 06:44):
There will be no problems in this approach with Props and (probably) hprops, but what to do with the rest?
Siegmentation Fault (Jan 31 2024 at 11:10):
Anyway I tried: https://github.com/forked-from-1kasper/ground_zero/blob/805cee1e60c020ab6c6469be8217943f9a86bfd5/GroundZero/Theorems/Nat.lean#L328-L417
awalterschulze (Feb 03 2024 at 17:45):
Using @Jannis Limperg's technique we found a way to preserve proof relevance, but still use normal equality = and also not require HoTT. So this simplifies things a lot.
def concat (P : Lang α) (Q : Lang α) : Lang α :=
  fun (w : List α) =>
    Σ' (x : List α) (y : List α), (_: P x) ×' (_: Q y) ×' w = (x ++ y)
awalterschulze (Feb 03 2024 at 17:50):
We do still have a few use cases where we need Type-valued equality:
infixl:65 " ≡ " => TEq

def emptyStr : Lang α :=
  fun w => w ≡ []

def char (a : α) : Lang α :=
  fun w => w ≡ [a]
We are struggling to decide between two definitions of TEq
1) Allows us to use rfl, but we cannot use rw:
inductive TEq.{u} {α : Type u} : α -> α -> Type u where
  | refl (x : α) : TEq x x

attribute [refl] TEq.refl
Hopefully we are doing something wrong and there is a way to help this definition to work with rw?
2) Doesn't allow us to use rfl, but allows us to use rw after constructor or cases, which gets annoying, but is useful:
inductive TEq.{tu} {α : Type tu} (a b : α) : Type tu where
  | mk : a = b -> TEq a b
theorem rewrite_test:
    ∀ (_: a ≡ b) (_: b ≡ c),
    a ≡ c := by
  intro ab bc
  constructor
  cases ab with
  | mk ab =>
    rw [ab]
    cases bc with
    | mk bc =>
      rw [bc]
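For what it's worth, one possible workaround for definition 1, as a sketch (TEq.subst is not from the thread): rw only accepts Eq and Iff, but a substitution principle for TEq can be derived once by pattern matching and then applied where rw would have been used.
inductive TEq {α : Type u} : α -> α -> Type u where
  | refl (x : α) : TEq x x

-- A hand-rolled substitution principle, playing the role of `rw` for TEq:
def TEq.subst {α : Type u} {a b : α} (motive : α → Sort v)
    (h : TEq a b) (m : motive a) : motive b :=
  match h with
  | .refl _ => m

-- For example, transitivity (the goal of rewrite_test above) follows directly:
def TEq.trans {α : Type u} {a b c : α} (hab : TEq a b) (hbc : TEq b c) : TEq a c :=
  hbc.subst (TEq a) hab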
Jannis Limperg (Feb 04 2024 at 12:48):
awalterschulze said:
def concat (P : Lang α) (Q : Lang α) : Lang α :=
  fun (w : List α) =>
    Σ' (x : List α) (y : List α), (_: P x) ×' (_: Q y) ×' w = (x ++ y)
I think the ×' in there could be ∧s.
Jannis Limperg (Feb 04 2024 at 12:49):
Why do you need TEq in the definitions of emptyStr and char?
awalterschulze (Feb 04 2024 at 13:24):
Jannis Limperg said:
Why do you need TEq in the definitions of emptyStr and char?
I think what might be relevant is the definition of Lang, sorry for not sharing that earlier
def Lang : Type (u + 1) :=
  List α -> Type u
As far as I understand Lang has to be in Type, because otherwise you can't do pattern matching (proof relevance) on the Sigma parts that are returned by concat.
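A small sketch of that constraint, with a hypothetical getSplit (not from the repo): the witnesses of a Prop-valued split cannot be extracted as data, while the Type-valued Σ' version can be taken apart.
-- Rejected: an Exists proof cannot be eliminated into data (Prop into Type).
-- def getSplitP {w : List Char} (h : ∃ x y : List Char, w = x ++ y) :
--     List Char × List Char :=
--   match h with
--   | ⟨x, y, _⟩ => (x, y)

-- Accepted: the Σ' (PSigma) version carries its witnesses as data.
def getSplit {w : List Char} (h : Σ' (x : List Char) (y : List Char), w = x ++ y) :
    List Char × List Char :=
  match h with
  | ⟨x, y, _⟩ => (x, y)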
Sebastian Ullrich (Feb 04 2024 at 13:33):
I think an example of what you actually want to do with/prove about concat in the end would be very helpful; this is too vague to really help you.
awalterschulze (Feb 04 2024 at 13:38):
Fair enough, hard to find the balance between sharing too much and too little and TBH I only figured out what I want this week. Sorry for not sharing earlier.
This is not an exact goal, but it gives an idea. I want to be able to print out the parse tree, which is represented by the proof.
For this I need to be able to do cases on the resulting expression to see which paths were contradictions and which were successful.
def example_of_proof_relevant_parse : (or (char 'a') (char 'b')) (toList "a") -> Nat := by
  intro x
  cases x with
  | inl xa =>
    cases xa with
    | mk eq =>
      cases eq with
      | refl =>
        exact 0
  | inr xb =>
    cases xb with
    | mk eq =>
      contradiction
I am still working on a print function.
Sebastian Ullrich (Feb 04 2024 at 13:52):
Thanks, it makes more sense now to me. But is your set of operations really open, or do you know all of them beforehand? If it's the latter, you should define them as a standard inductive type, and then parse trees of them as a standard inductive family of types.
awalterschulze (Feb 04 2024 at 14:23):
Glad it makes sense, phew. I was struggling to quickly write a better function.
We tried the inductive predicate route in Coq a few years ago, but ended up running into the strictly-positive limitation, because of the not operator. So we didn't even try it again in Lean ... until now.
Here is a small reproduction of the issue in Lean:
inductive Regex : Type where
  | emptyset : Regex
  | not : Regex → Regex
  deriving Repr

inductive Lang : Regex -> List α -> Prop where
  | lang_emptyset (str : List α):
      False ->
      Lang emptyset str
  | not (str : List α) (r : Regex):
      ((Lang r) str -> False) ->
      Lang (not r) str
It gives the error:
(kernel) arg #4 of 'Lang.not' has a non positive occurrence of the datatypes being declared
Keeping things open solves this strict-positivity problem, but also, less importantly, opens up the operators to be easily extendible by users. For example, there are many variants of the shuffle operator; I know which one I prefer, but I don't know which a user might prefer.
Another important reason is that we are trying to replicate a paper in Agda, so we are trying to stick as closely as possible to the definitions, so we can have a fair comparison between the two languages.
Sebastian Ullrich (Feb 04 2024 at 14:32):
Ah, not is an interesting addition, but now I'm even more confused about your ultimate goals - what even is a parse tree of not? Essentially that is the question Lean is failing to answer when you give it that definition.
Another important reason is that we are trying to replicate a paper in Agda, so we are trying to stick as closely as possible to the definitions, so we can have a fair comparison between the two languages.
Sounds like that paper may be relevant for us to better understand your goal :)
awalterschulze (Feb 04 2024 at 14:51):
Good points, and now that you mention it, that is a very good question; I see now that I got my requirements mixed up.
I had and have the requirement for the not operator from a previous and a future project.
But it doesn't seem as though the not operator is present in this Agda reproving project, so I got confused.
I'll have to do some homework to find out why the Agda paper, without the not operator, chose to create an open definition. Here is the paper http://conal.net/papers/language-derivatives/ we are trying to reprove in Lean.
Sebastian Ullrich (Feb 04 2024 at 15:26):
Ah, sections 4.2/4.3, with the inductive definition of the language vs the coinductive definition of parsing, are quite nice, though of course the latter is less easily replicated in Lean.
awalterschulze (Mar 09 2024 at 10:08):
Apparently Conal has an idea for how to get around the coinductive part in Lean, but we are aware that this might be a problem in future.
awalterschulze (Mar 09 2024 at 10:14):
In the meantime I tried:
inductive Regex : (α : Type u) -> Type (u + 1) where
  | emptyset : Regex α
  | emptystr : Regex α
  | char : α → Regex α
  | or : Regex α → Regex α → Regex α
  | concat : Regex α → Regex α → Regex α
  | star : Regex α → Regex α

inductive Lang : {α : Type u} -> Regex α -> List α -> Type (u + 2) where
  | lang_emptyset (str : List α):
      False ->
      Lang Regex.emptyset str
  | lang_emptystr:
      Lang Regex.emptystr ([] : List α)
  | lang_char (c : α):
      Lang (Regex.char c) [c]
  | lang_or (str : List α) (r1 r2 : Regex α):
      Lang r1 str ⊕ Lang r2 str ->
      Lang (Regex.or r1 r2) str
But I get the error:
(kernel) invalid nested inductive datatype 'Sum', nested inductive datatypes parameters cannot contain local variables.
This also has the same problem in Prop.
In this case, I am pretty sure I am making a noob mistake.
Arthur Adjedj (Mar 09 2024 at 12:10):
This error is to be expected, and there is no easy way to use Sum the way you want to here. This definition is equivalent, although it might be more bothersome to work with:
inductive Regex : (α : Type u) -> Type (u + 1) where
  | emptyset : Regex α
  | emptystr : Regex α
  | char : α → Regex α
  | or : Regex α → Regex α → Regex α
  | concat : Regex α → Regex α → Regex α
  | star : Regex α → Regex α

inductive Lang : {α : Type u} -> Regex α -> List α -> Type (u + 2) where
  | lang_emptyset (str : List α):
      False ->
      Lang Regex.emptyset str
  | lang_emptystr:
      Lang Regex.emptystr ([] : List α)
  | lang_char (c : α):
      Lang (Regex.char c) [c]
  | lang_or (str : List α) (r1 r2 : Regex α):
      (Σ r : Regex α, (r = r1 ⊕' r = r2)) →
      Lang r str ->
      Lang (Regex.or r1 r2) str
Arthur Adjedj (Mar 09 2024 at 12:16):
As a sidenote, your inductive definition would be accepted in Coq, since they do not check for free variable occurrences in nested applications like Lean does.
Inductive Regex (A : Type) : Type :=
| regex_or : Regex A -> Regex A -> Regex A.

Inductive Lang (A : Type) : Regex A -> list A -> Type :=
| lang_or {str r1 r2} : sum (Lang A r1 str) (Lang A r2 str) -> Lang A (regex_or _ r1 r2) str.
Would anyone know what the justification for such a check is? My intuition tells me that such a check is not useful on "small" parameters, since they can arbitrarily be lifted to indices without reducing the logical expressivity of the inductive type.
Arthur Adjedj (Mar 09 2024 at 12:31):
Another way to make this work would be to manually translate your nested inductive type into a mutual one as such:
mutual
  inductive Lang {α : Type u} : Regex α -> List α -> Type (u + 2) where
    | lang_emptyset (str : List α):
        False ->
        Lang Regex.emptyset str
    | lang_emptystr:
        Lang Regex.emptystr ([] : List α)
    | lang_char (c : α):
        Lang (Regex.char c) [c]
    | lang_or (str : List α) (r1 r2 : Regex α):
        SumLang r1 r2 str ->
        Lang (Regex.or r1 r2) str

  inductive SumLang {α : Type u} : Regex α → Regex α → List α → Type (u + 2) where
    | inl : Lang r1 str → SumLang r1 r2 str
    | inr : Lang r2 str → SumLang r1 r2 str
end
awalterschulze (Mar 09 2024 at 13:05):
This one is probably the simplest
(Σ r : Regex α, (r = r1 ⊕' r = r2))
But I would also like to know why it was decided to check these free variables, when it isn't done in Coq
François G. Dorais (Mar 09 2024 at 13:09):
Just do two rules for or
:
inductive Lang : {α : Type u} -> Regex α -> List α -> Type (u + 2) where
| lang_emptyset (str : List α):
False ->
Lang Regex.emptyset str
| lang_emptystr:
Lang Regex.emptystr ([]: List α)
| lang_char (c: α):
Lang (Regex.char c) [c]
| lang_or1 (str: List α) (r1 r2: Regex α):
Lang r1 str -> Lang (Regex.or r1 r2) str
| lang_or2 (str: List α) (r1 r2: Regex α):
Lang r2 str -> Lang (Regex.or r1 r2) str
Arthur Adjedj (Mar 09 2024 at 13:20):
awalterschulze said:
This one is probably the simplest
(Σ r : Regex α, (r = r1 ⊕' r = r2))
But I would also like to know why it was decided to check these free variables, when it isn't done in Coq
The main reason, as far as I can see, is that to check for nested positivity in the kernel, Lean first converts nested inductive types into equivalent mutual inductive types, similarly to what I've done above with SumLang. The issue seems to be that converting nested inductives with free variables in the nested appearances is just harder to do; I don't think there is any theoretical reason behind it.
Mario Carneiro (Mar 11 2024 at 08:34):
@Arthur Adjedj said:
As a sidenote, your inductive definition would be accepted in Coq, since they do not check for free variable occurrences in nested applications like Lean does.
Inductive Regex (A : Type) : Type :=
| regex_or : Regex A -> Regex A -> Regex A.

Inductive Lang (A : Type) : Regex A -> list A -> Type :=
| lang_or {str r1 r2} : sum (Lang A r1 str) (Lang A r2 str) -> Lang A (regex_or _ r1 r2) str.
Would anyone know what the justification for such a check is? My intuition tells me that such a check is not useful on "small" parameters, since they can arbitrarily be lifted to indices without reducing the logical expressivity of the inductive type.
This doesn't really address the main question, but I want to remark on the polarity of this implication: A justification is required to omit a check. The baseline position is that everything even slightly weird is disallowed because we only want to permit those things we understand and have a reasonable soundness argument for. One does not need a justification for why some variation is disallowed.
Arthur Adjedj (Mar 11 2024 at 09:01):
I totally agree with this, and perhaps I worded my remark wrong. To rephrase my thoughts: the way nested inductive types are currently checked in the kernel is that nested inductives are translated into mutual ones before checking for strict positivity, universe sizes, generating recursors… The current implementation of the nested => mutual preprocessor does not handle free variables in parameters of nested appearances, and I am now fairly confident its algorithm could be extended to support that. Indeed, any local variables appearing in parameters of a nested appearance could be translated into indices of the translated mutual type. Such a preprocessor would be fairly more complex than the current one, but wouldn't endanger the soundness of the kernel in any way, since the fundamental checks would still be performed on the mutual translation.
Mario Carneiro (Mar 11 2024 at 10:12):
Could you do a worked example? I'm still not sure how you are generalizing the variables
Mario Carneiro (Mar 11 2024 at 10:13):
I think it also runs into issues identifying when you have generated the type already
Arthur Adjedj (Mar 11 2024 at 10:26):
The translation made earlier in the thread, where SumLang appears, is such an example, though I plan on making a more detailed explanation of this process when I have time to do so. I am also already working on such an implementation out of my own curiosity, and hope to have something working in the following weeks, even if it were never to be merged. I can get back to you later today with more details.
Mario Carneiro (Mar 11 2024 at 10:28):
What would you do if it was Sum (Lang r1 "a") (Lang r2 str)? And if it shows up twice or with variations, would it count as a new mutual inductive or not?
Arthur Adjedj (Mar 11 2024 at 11:11):
Since the kernel already handles such cases where the indices may vary in the nesting (i.e. writing Lang .emptystr "a" ⊕ Lang .emptystr "b" already works), I don't think handling such things would be a problem.
As for checking variations of the inductive, it is true that the kernel checks whether the mutual equivalent of the nested appearance has already been created, so as not to duplicate those. I believe a safe, although expensive, way to make sure all needed variations are made without overlaps would be either to not check for duplicates and always create a new inductive in the mutual block when encountering a nested appearance, or to add a notion of equivalence between inductive types in the kernel, where two inductives would be equivalent iff they have the same type signatures, same constructor names, and same constructor types, all modulo alpha-equivalence.
Arthur Adjedj (Mar 11 2024 at 11:12):
Again, I am short on time now, but should be able to give a more detailed presentation of my idea later today.
Arthur Adjedj (Mar 11 2024 at 18:08):
Consider an inductive type Foo Ps Is with parameters Ps and indices Is, and a constructor foo whose type contains a term of the form I As Bs, where I is a previously defined inductive type, As are parameters of I, and Bs indices of I, such that Foo Ps occurs in As.
Currently, the kernel creates an auxiliary inductive type I_aux, with parameters Ps and with the same indices as I, for which the type signature (resp. the types of its constructors) is the type signature of I in which its original parameters are replaced with As. However, if As were to contain local variables (read: free variables that aren't parameters Ps), this would lead to an ill-formed and ill-typed auxiliary inductive. Thus, a good solution would be to collect the local variables in As, and to prepend the indices of I_aux with types corresponding to those variables.
In practice, the following changes need to be made:
- In is_nested_inductive_app, return not only the inductive val, but also the loose bvars collected in the parameters of the term.
- In replace_if_nested, add a parameter to the function, that being a buffer of the types of variables encountered during the traversal of the constructor; prepend auxJ_type and each auxJ_cnstr_type with the types associated with the loose bvars.
- Modify replace_all_nested to store the types of variables encountered during the traversal, and to give those to replace_if_nested.
Furthermore, regarding your remark about duplications, I believe storing not only the parameters but also the indices of nested occurrences in m_nested_aux, and adapting the check in replace_if_nested accordingly, would be enough to ensure the same behaviour as the current one in that regard.
As for restoring the nested results back from the mutual ones, I believe there should be little work to do on that part, if all goes well.
awalterschulze (Aug 28 2024 at 20:54):
We got it working: https://github.com/katydid/symbolic-automatic-derivatives
I just want to thank everyone for their help and say sorry for disappearing.
It is tough to find time to have fun in Lean in between work, etc.
I say working, but we still have one big problem. We are using unsafe to get around the coinduction issue and would like to find another solution, but this is probably a discussion for a new thread. The rest is trivial: just a few small proofs that we probably know how to do, but just need to find the time to complete.
All the type definitions are otherwise working. Thank you for all your help
Last updated: May 02 2025 at 03:31 UTC