Zulip Chat Archive

Stream: new members

Topic: Way to Automate Code

Colin Jones ⚛️ (Feb 03 2025 at 04:32):

Is there a way to automate / simplify this piece of code?

import Mathlib

/-
# Genetic code
This file defines 2 functions called `codon` and `anticodon` that match a 3 letter string comprised
  of 4 letters (`U`, `A`, `G`, or `C`) with their respective amino acid.

open Function

/- # The Requirements for a String to be a Triplet or an Amino Acid -/
def rna_triplet := {s : String // s.length == 3 && s.dropWhile "UCGA".toList.contains == ""}

/- # The Triplet Code # -/
@[match_pattern]
def UUU : rna_triplet := ⟨"UUU", by aesop⟩
@[match_pattern]
def UUC : rna_triplet := ⟨"UUC", by aesop⟩
-- so on

I want to say that that each three-letter, combination of "UCGA" is within rna_triplet. I was wondering if Lean has a way to do that automatically without the lengthy proofs

Yakov Pechersky (Feb 03 2025 at 05:04):

You might want to first define isRnaBase predicate on Char. Then, you can make a isRna predicate on string which would be String.toList.Forall isRnaBase

Yakov Pechersky (Feb 03 2025 at 05:10):

Then, an RNA codon is isRna and length 3.

Yakov Pechersky (Feb 03 2025 at 05:11):

We don't currently have a list combination / outer product function

Kyle Miller (Feb 03 2025 at 05:13):

I think it would be a lot better to avoid String, since that brings in all the details of unicode.

What about defining an enumeration for bases and then work with lists of these?

inductive Base
  | U | A | G | C
open Base

def rna_triplet := { s : List Base // s.length = 3 }

def UUU : rna_triplet := ⟨[U, U, U], rfl⟩

Yakov Pechersky (Feb 03 2025 at 05:17):

One might avoid the enum inductive since nucleotide sequences can have degeneracy or "superposition" bases like supporting both U and T for translation, X as a wildcard, etc.

Kyle Miller (Feb 03 2025 at 05:21):

The enumeration could have these additional constructors. Adding constructors for different cases is surely better than bringing the details of UTF-8 into the situation, right?

Kyle Miller (Feb 03 2025 at 06:28):

Example dealing with non-bases, and making use of a bit of automation:

inductive DBase
  | U | A | G | C | X
open DBase

def DBase.isBase (b : DBase) : Bool := b matches U | A | G | C

def rna_triplet := { s : List DBase // s.length = 3 ∧ s.all DBase.isBase }
def rna_triplet.mk (s : List DBase)
    (h : s.length = 3 ∧ s.all DBase.isBase := by exact ⟨rfl, rfl⟩) :
    rna_triplet :=
  ⟨s, h⟩

def UUU : rna_triplet := rna_triplet.mk [U, U, U]

Colin Jones ⚛️ (Feb 04 2025 at 22:35):

Thank you all! I'll try this out and let you know if I have any questions

Colin Jones ⚛️ (Feb 17 2025 at 13:44):

From the suggestions I made a background system that fits my project needs, but I'm having difficulty fixing two eval errors.

open Function Classical

/- # Definition of Nucleotide Base #
  U := Uracil
  A := Adenosine
  G := Guanosine
  C := Cytosine
  T := Thymine
  I := Inosine
  N := Anything else
-/
inductive NucBase
  | U | A | G | C | T | I | N
open NucBase

def NucBase.isRNABase (b : NucBase) : Bool := b matches U | A | G | C | I

def NucBase.isDNABase (b : NucBase) : Bool := b matches T | A | G | C

def redundant (f : α → β) : Prop := ¬ Injective f


def dna_to_rna_singlet (n : NucBase) :=
  match n with
  | T => A
  | A => U
  | G => C
  | C => G
  | _ => A
#eval dna_to_rna_singlet A

def dna_to_rna (L : List NucBase) : List NucBase :=
  match L with
  | [] => []
  | y :: ys => [dna_to_rna_singlet y] ++ dna_to_rna ys
#eval dna_to_rna [A, T, T] -- Error 1 here


def seq_to_triplet (s : List NucBase) := s.toChunks 3
#eval seq_to_triplet [U, A, T, I, T, C, G, C] -- Error 2 here

Could someone help debug?

Aaron Liu (Feb 17 2025 at 13:47):

It doesn't know how to print your lists of bases, you can deriving instance Repr for NucBase to fix it

Aaron Liu (Feb 17 2025 at 13:48):

Or you could write your own

Colin Jones ⚛️ (Feb 17 2025 at 13:49):

Thank you it fixed

Colin Jones ⚛️ (Feb 20 2025 at 17:44):

I'm wondering on how to prove how my function does terminate:

inductive NucBase
  | U | A | G | C | T | I | N
open NucBase
deriving instance Repr for NucBase

def redundant (f : α → β) : Prop := ¬ Injective f

/- # DNA Function Definitions # -/
def dna_to_rna_singlet (n : NucBase) :=
  match n with
  | T => A
  | A => U
  | G => C
  | C => G
  | _ => A

def dna_to_rna (s : List NucBase) : List NucBase :=
  let t := s.reverse
  match t with
  | [] => []
  | y :: ys => [dna_to_rna_singlet y] ++ dna_to_rna ys
termination_by ys => ys.length

I'm confused on how to use termination_by and not experienced with using that clause. If I define the s.reverse as a separate function it doesn't have any issue because Lean is able to prove that both terminate e.g.:

def dna_to_rna_p1 (s : List NucBase) : List NucBase := s.reverse

def dna_to_rna_p2 (t : List NucBase) : List NucBase :=
  match t with
  | [] => []
  | y :: ys => [dna_to_rna_singlet y] ++ dna_to_rna_p2 ys

Philippe Duchon (Feb 20 2025 at 18:07):

(That's a strange function you are trying to write, eating a list from both ends alternatively)

I have not tried to solve your specific problem, but I did look into similar ones in Coq.

My understanding (from Theorem Proving in Lean) is that termination_byshould provide a well-founded relation (and apparently mapping lists to Nat is sufficient, because Lean knows that the usual order on naturals is well-founded), and then (if needed) decreasing_by should provide the proof that your recursive argument is indeed smaller according to this relation.

So, adding decreasing_by _ should lead to a proof state where you are asked to prove something like ys.length < s.length. I tried this, but in this case Lean doesn't remember anything of the relationship between s and ys, making the proof impossible.

Another thing that I don't understand is that in the proof state, the identifier for the original list becomes inaccessible (as shown by a dagger symbol).

(And here I am reaching the limits of my current understanding)

Aaron Liu (Feb 20 2025 at 18:31):

termination_by should be looking at the function's arguments, but are you sure this is the definition you want? You are reversing the list every time the function is called recursively.

inductive NucBase
  | U | A | G | C | T | I | N
deriving Repr
open NucBase

def redundant (f : α → β) : Prop := ¬ Injective f

/- # DNA Function Definitions # -/
def dna_to_rna_singlet (n : NucBase) :=
  match n with
  | T => A
  | A => U
  | G => C
  | C => G
  | _ => A

def dna_to_rna (s : List NucBase) : List NucBase :=
  let t := s.reverse
  match _h : t with
  | [] => []
  | y :: ys => [dna_to_rna_singlet y] ++ dna_to_rna ys
termination_by s.length
decreasing_by
  simp_all (zetaDelta := true)

-- [A, U, G, C]
#eval dna_to_rna [A, G, C, T]

Yakov Pechersky (Feb 20 2025 at 18:34):

Are you sure you don't really want a (dna.map to_rna).reverse?

Matteo Cipollina (Feb 20 2025 at 19:03):

import Mathlib

inductive NucBase
  | U | A | G | C | T | I | N
open NucBase
deriving instance Repr for NucBase

def redundant {α β : Type} (f : α → β) : Prop := ¬ Function.Injective f

/- # DNA Function Definitions # -/
def dna_to_rna_singlet (n : NucBase) :=
  match n with
  | T => A
  | A => U
  | G => C
  | C => G
  | _ => A

def dna_to_rna (s : List NucBase) : List NucBase :=
  match s with
  | [] => []
  | y :: ys => dna_to_rna_singlet y :: dna_to_rna ys
termination_by s  -- Lean can infer s.length

def dna_to_rna' (s : List NucBase) : List NucBase :=
  s.map dna_to_rna_singlet

theorem dna_to_rna_eq (s : List NucBase) :
  dna_to_rna s = dna_to_rna' s := by
  induction s with
  | nil =>
    simp [dna_to_rna, dna_to_rna']  -- simp works for the base case
  | cons x xs ih =>
    simp only [dna_to_rna, dna_to_rna', List.map, ih]

theorem dna_to_rna_singlet_redundant :
  redundant dna_to_rna_singlet := by
  simp [redundant, Function.Injective]
  use I, N
  constructor
  · simp [dna_to_rna_singlet]
  · simp [dna_to_rna_singlet]

theorem dna_to_rna_length (s : List NucBase) :
  (dna_to_rna s).length = s.length := by
  induction s with
  | nil =>
    simp [dna_to_rna]  -- simp handles the base case
  | cons x xs ih =>
    simp [dna_to_rna, ih]  -- simp uses the induction hypothesis

theorem dna_to_rna_composition (s t : List NucBase) :
  dna_to_rna (s ++ t) = dna_to_rna s ++ dna_to_rna t := by -- corrected order!
  induction s with
  | nil =>
    simp [dna_to_rna]
  | cons x xs ih =>
    simp [dna_to_rna, ih]

inductive ValidNucBase
  | A | T | G | C
deriving DecidableEq

def dna_to_rna_safe (n : ValidNucBase) : ValidNucBase :=
  match n with
  | ValidNucBase.T => ValidNucBase.A
  | ValidNucBase.A => ValidNucBase.T
  | ValidNucBase.G => ValidNucBase.C
  | ValidNucBase.C => ValidNucBase.G

def dna_to_rna'' (s : List ValidNucBase) : List ValidNucBase :=
  match s with
  | [] => []
  | y :: ys => dna_to_rna_safe y :: dna_to_rna'' ys
termination_by s

Matteo Cipollina (Feb 20 2025 at 19:52):

here with some easy and perhaps redundant extra bits that yet may be helpful to maintain type safety and to explore some more abstract perspective on this basic setting:

import Mathlib

inductive NucBase
  | U | A | G | C | T | I | N
open NucBase
deriving instance Repr for NucBase
deriving instance DecidableEq for NucBase

def redundant {α β : Type} (f : α → β) : Prop := ¬ Function.Injective f

/- # DNA Function Definitions # -/
def dna_to_rna_singlet (n : NucBase) :=
  match n with
  | T => A
  | A => U
  | G => C
  | C => G
  | _ => A

def dna_to_rna (s : List NucBase) : List NucBase := -- Converts a list of DNA bases to RNA (recursive).
  match s with
  | [] => []
  | y :: ys => dna_to_rna_singlet y :: dna_to_rna ys
termination_by s  -- Termination proof: the length of 's' decreases in each recursive call.

-- Converts a list of DNA bases to RNA (using map).
def dna_to_rna' (s : List NucBase) : List NucBase :=
  s.map dna_to_rna_singlet

-- Proves equivalence of recursive and map-based DNA->RNA conversion.
theorem dna_to_rna_eq (s : List NucBase) :
  dna_to_rna s = dna_to_rna' s := by
  induction s with
  | nil =>
    simp [dna_to_rna, dna_to_rna']
  | cons x xs ih =>
    simp only [dna_to_rna, dna_to_rna', List.map, ih]

-- Shows that dna_to_rna_singlet is not injective (redundant).
theorem dna_to_rna_singlet_redundant :
  redundant dna_to_rna_singlet := by
  simp [redundant, Function.Injective]
  use I, N
  constructor
  · simp [dna_to_rna_singlet]
  · simp [dna_to_rna_singlet]

-- Proves that DNA->RNA conversion preserves list length.
theorem dna_to_rna_length (s : List NucBase) :
  (dna_to_rna s).length = s.length := by
  induction s with
  | nil =>
    simp [dna_to_rna]
  | cons x xs ih =>
    simp [dna_to_rna, ih]

-- Shows DNA->RNA conversion distributes over list concatenation.
theorem dna_to_rna_composition (s t : List NucBase) :
  dna_to_rna (s ++ t) = dna_to_rna s ++ dna_to_rna t := by
  induction s with
  | nil =>
    simp [dna_to_rna]
  | cons x xs ih =>
    simp [dna_to_rna, ih]

-- Enables decidable equality for ValidNucBase.
inductive ValidNucBase
  | A | T | G | C
deriving DecidableEq

-- Converts a single valid DNA base to its valid RNA complement.
def dna_to_rna_safe (n : ValidNucBase) : ValidNucBase :=
  match n with
  | ValidNucBase.T => ValidNucBase.A
  | ValidNucBase.A => ValidNucBase.T
  | ValidNucBase.G => ValidNucBase.C
  | ValidNucBase.C => ValidNucBase.G

-- Converts a list of valid DNA bases to RNA (recursive).
def dna_to_rna'' (s : List ValidNucBase) : List ValidNucBase :=
  match s with
  | [] => []
  | y :: ys => dna_to_rna_safe y :: dna_to_rna'' ys
termination_by s -- Termination: the length of 's' decreases.

-- Defines a type class for biological bases with complement and validity.
class BiologicalBase (α : Type) where
  complement : α → α
  valid : α → Prop

-- Checks if a sequence contains only valid DNA bases (A, T, G, C).
def is_valid_sequence (s : List NucBase) : Prop :=
  ∀ n, n ∈ s → n ∉ [I, N]

-- Shows DNA->RNA conversion maintains sequence validity.
theorem dna_to_rna_preserves_validity (s : List NucBase) :
  is_valid_sequence s → is_valid_sequence (dna_to_rna s) := by
  intro h
  induction s with
  | nil =>
    simp [dna_to_rna, is_valid_sequence]
  | cons x xs ih =>
      simp [dna_to_rna, is_valid_sequence] at *
      cases h with
      | intro hx hxs =>
        constructor
        · simp [dna_to_rna_singlet]
          cases x <;> simp [hx]
        · intros n hmem
          simp_all only [not_false_eq_true, and_self, implies_true, forall_const]

-- Declares List as a Functor, allowing mapping functions over lists.
instance : Functor (λ α => List α) where
  map := List.map

-- Shows dna_to_rna' respects the functorial property of list concatenation.
theorem dna_to_rna_functorial :
  ∀ s t, dna_to_rna' (s ++ t) = dna_to_rna' s ++ dna_to_rna' t := by
  intros s t
  simp [dna_to_rna']

-- Shows that applying dna_to_rna'' twice is the identity.
theorem dna_to_rna_idempotent (s : List ValidNucBase) :
  dna_to_rna'' (dna_to_rna'' s) = s := by
  induction s with
  | nil =>
    simp [dna_to_rna'']
  | cons x xs ih =>
    have h_x : dna_to_rna_safe (dna_to_rna_safe x) = x :=
      by cases x <;> simp [dna_to_rna_safe]
    simp [dna_to_rna'']
    rw [h_x, ih]
    aesop

-- Proves that dna_to_rna_safe is bijective (injective and surjective).
theorem dna_to_rna_safe_bijective :
  Function.Bijective dna_to_rna_safe := by
  constructor
  · -- Prove injectivity
    intro x y h
    cases x <;> cases y <;>
    simp [dna_to_rna_safe] at h <;>
    try contradiction
    · rfl
    · rfl
    · rfl
    · rfl
  · -- Prove surjectivity
    intro y
    cases y
    · exact ⟨ValidNucBase.T, by simp [dna_to_rna_safe]⟩
    · exact ⟨ValidNucBase.A, by simp [dna_to_rna_safe]⟩
    · exact ⟨ValidNucBase.C, by simp [dna_to_rna_safe]⟩
    · exact ⟨ValidNucBase.G, by simp [dna_to_rna_safe]⟩

-- Defines a structure representing a nucleic acid with bases, complement, and validity.
structure NucleicAcid where
  bases : Type
  complement : bases → bases
  valid : bases → Prop
  complement_involution : ∀ b, valid b → complement (complement b) = b

-- Defines a type class for DNA categories, capturing functorial properties.
class DNACategory (C : Type → Type) where
  map : ∀ {α β}, (α → β) → C α → C β
  preserves_id : ∀ {α}, map (id : α → α) = id
  preserves_comp : ∀ {α β γ} (f : α → β) (g : β → γ),
    map (g ∘ f) = map g ∘ map f

-- Defines what it means for two NucBase values to be complements.
def is_complement_pair (x y : NucBase) : Prop :=
  dna_to_rna_singlet x = y ∧ dna_to_rna_singlet y = x

-- Shows the complement relation is symmetric.
theorem complement_symmetry :
  ∀ x y, is_complement_pair x y → is_complement_pair y x := by
  intro x y h
  exact ⟨h.right, h.left⟩

-- Defines a subtype: lists of NucBase with a specific length and validity.
def ValidSequence (n : Nat) := {s : List NucBase // s.length = n ∧ is_valid_sequence s}

-- Shows that dna_to_rna maps valid sequences to valid sequences, preserving length.
theorem valid_sequence_closure :
  ∀ (n : Nat) (s : ValidSequence n),
  ∃ (t : ValidSequence n), dna_to_rna s.val = t.val := by
  intros n s
  exists ⟨dna_to_rna s.val, (And.intro (Eq.trans (dna_to_rna_length s.val) s.property.left) (dna_to_rna_preserves_validity s.val s.property.right))⟩

-- Extends NucleicAcid with additional properties for complement.
structure NucleicAcid' extends NucleicAcid where
  complement_preserves_validity :
    ∀ b, valid b → valid (complement b)
  complement_faithful : -- Injective on a Valid domain
    ∀ b₁ b₂, valid b₁ → valid b₂ →
    complement b₁ = complement b₂ → b₁ = b₂

-- Type alias: Functors preserving validity when mapping over subtypes.
def ValidityFunctor (F : Type → Type) [Functor F] :=
  ∀ {α} (p : α → Prop), F {x : α // p x} → F α

-- Shows how to map a list of valid subtypes to a list of the underlying type.
def dna_to_rna_validity_functor :
  ValidityFunctor List := by
  intro α p xs
  exact xs.map Subtype.val

Colin Jones ⚛️ (Feb 24 2025 at 02:04):

Aaron Liu said:

termination_by should be looking at the function's arguments, but are you sure this is the definition you want? You are reversing the list every time the function is called recursively.

inductive NucBase
  | U | A | G | C | T | I | N
deriving Repr
open NucBase

def redundant (f : α → β) : Prop := ¬ Injective f

/- # DNA Function Definitions # -/
def dna_to_rna_singlet (n : NucBase) :=
  match n with
  | T => A
  | A => U
  | G => C
  | C => G
  | _ => A

def dna_to_rna (s : List NucBase) : List NucBase :=
  let t := s.reverse
  match _h : t with
  | [] => []
  | y :: ys => [dna_to_rna_singlet y] ++ dna_to_rna ys
termination_by s.length
decreasing_by
  simp_all (zetaDelta := true)

-- [A, U, G, C]
#eval dna_to_rna [A, G, C, T]

OHHH Okay yeah I did not realize I was reversing the list every time it was called. My bad. Thank you for pointing this out.

Colin Jones ⚛️ (Feb 24 2025 at 02:06):

@Matteo Cipollina Thank you for the proof, but I realize I've made a mistake in my code that Yakov Phillippe pointed out. Sorry all for bothering with a silly comp. sci. mistake.

Matteo Cipollina (Feb 24 2025 at 12:36):

@Colin Jones ⚛️ Thanks for the heads up, indeed in the version I shared I've already got rid of s.reverse, so that should be correct in that respect :)

Last updated: May 02 2025 at 03:31 UTC