Zulip Chat Archive

Stream: general

Topic: Fintype Char?

Bjørn Kjos-Hanssen (Jun 18 2024 at 00:42):

It seems that a

Fintype Char

instance should be an easy goal? A

Fintype UInt32

instance would probably suffice, although maybe that is too large somehow.

Kim Morrison (Jun 18 2024 at 00:52):

What is the use case?

Bjørn Kjos-Hanssen (Jun 18 2024 at 01:05):

I want to look for patterns in words over an alphabet like {A,...,Z}.

Eric Wieser (Jun 18 2024 at 01:28):

Here's a prototype:

import Mathlib



instance : Fintype Char where
  elems := ((Finset.Ico 0 0xd800).disjUnion (Finset.Ioo 0xdfff 0x110000: Finset <| Fin (2 ^ 32)) <| by
    simp_rw [Finset.disjoint_left, Finset.mem_Ico, Finset.mem_Ioo, Not, and_imp]
    intro a ha ha' hb hb'
    have := hb.trans ha'
    rw [Fin.lt_def] at this
    change 57343 < 55296 at this
    norm_num at this
    ).attach.map ⟨ fun x => {
      val := ⟨x.val⟩
      valid := by simpa using x.prop
    }, fun x y h => by ext1; cases x; cases y; simp at *; injection h with h; injection h with h⟩

  complete c := by
    simp [Char.ext_iff]
    sorry

#eval (Fintype.card Char) -- 1112064

Eric Wieser (Jun 18 2024 at 01:29):

The proofs are a bit sloppy because the dance between Char/UInt32/Nat/Fin is a little awkward

Bjørn Kjos-Hanssen (Jun 18 2024 at 01:39):

Thanks @Eric Wieser
It does look complex!

Eric Wieser (Jun 18 2024 at 01:40):

I think it would be a lot cleaner with a LocallyFiniteOrder UInt32 instance

Eric Wieser (Jun 18 2024 at 01:42):

( @Yaël Dillies, is there a one-liner to pull that along an equiv, here from Fin?)

Eric Wieser (Jun 18 2024 at 01:44):

Ah yes, docs#OrderIso.locallyFiniteOrder

Eric Wieser (Jun 18 2024 at 09:55):

Ok, a cleaned up version:

import Mathlib

section shouldBeInCore

theorem UInt32.val_lt_val {a b : UInt32} : a.val < b.val ↔ a < b := Iff.rfl
theorem UInt32.val_le_val {a b : UInt32} : a.val ≤ b.val ↔ a ≤ b := Iff.rfl

@[simp]
theorem UInt32.val_ofNat (n : ℕ) :
  (no_index (OfNat.ofNat n) : UInt32).val = OfNat.ofNat n := rfl

theorem UInt32.val_max (a b : UInt32) : (max a b).val = max a.val b.val := by
  simp [max_def, instMaxUInt32, maxOfLe, instLEUInt32]
  aesop

theorem UInt32.val_min (a b : UInt32) : (min a b).val = min a.val b.val := by
  simp [min_def, instMinUInt32, minOfLe, instLEUInt32]
  aesop


theorem UInt32.compare_def (a b : UInt32) : compare a b = compare a.val b.val := by
  simp only [instOrdUInt32, instOrdFin, compareOfLessAndEq, instOrdNat, instLTUInt32,
  Fin.val_injective.eq_iff, UInt32.val_injective.eq_iff]
  rfl

theorem UInt32.isValidChar_iff {a : UInt32} :
    a.isValidChar ↔ a < 0xd800 ∨ 0xdfff < a ∧ a < 0x110000 := rfl

@[simp]
theorem UInt32.zero_le (x : UInt32) : 0 ≤ x := Nat.zero_le _

theorem Char.ext_iff {a b : Char} : a = b ↔ a.val = b.val :=
  ⟨fun h => h ▸ rfl, Char.ext⟩

end shouldBeInCore

instance : Sup UInt32 where sup := max
instance : Inf UInt32 where inf := min

instance : LinearOrder UInt32 :=
  LinearOrder.liftWithOrd UInt32.val UInt32.val_injective
    UInt32.val_max UInt32.val_min UInt32.compare_def

def UInt32.orderIsoFin : UInt32 ≃o Fin (2 ^ 32) where
  toFun := UInt32.val
  invFun := UInt32.mk
  map_rel_iff' := Iff.rfl
  left_inv _ := rfl
  right_inv _ := rfl

instance : LocallyFiniteOrder UInt32 :=
  UInt32.orderIsoFin.locallyFiniteOrder

instance : Fintype Char where
  elems := (
    (Finset.Ico 0 (0xd800 : UInt32)).disjUnion (Finset.Ioo 0xdfff 0x110000) <| by
        simp_rw [Finset.disjoint_left, Finset.mem_Ico, Finset.mem_Ioo, Not, and_imp]
        intro a _ ha hb _
        have := hb.trans ha
        revert this
        decide
      ).attach.map
    ⟨fun x => .mk x.val <| by simpa [UInt32.isValidChar_iff] using x.prop,
      fun x y h => Subtype.ext <| congr_arg Char.val h⟩
  complete x := by
    have := x.valid
    rw [UInt32.isValidChar_iff] at this
    simpa [Char.ext_iff]

Eric Wieser (Jun 18 2024 at 09:56):

@Kim Morrison, do you agree with my shouldBeInCore judgement above?

Kim Morrison (Jun 18 2024 at 10:03):

Yes, seems reasonable. (Although we're hoping to replace UIntX with a BitVec based implementation, so ... everything will change. :-)

Kyle Miller (Jun 18 2024 at 14:59):

I didn't expect this to work:

#eval Fintype.card Char
-- 1112064

I guess everything is suitably tail recursive when building a list with 1,112,064 elements!

Eric Wieser (Jun 18 2024 at 15:02):

I was very careful to not invoke quadratic behavior with union or image :)

Eric Wieser (Jun 19 2024 at 23:48):

I had a very quick go at upstreaming, but:

The max and min lemmas cannot be upstreamed, because core has no Min (Fin _) instance (it comes from the linear order)
It is unclear which file the compare lemmas would go in; in mathlib the rfl lemmas would go immediately after the definition, but this seems not to be the style used in core

Bjørn Kjos-Hanssen (Jun 20 2024 at 00:29):

Maybe it can go in "Mathematical aspects of Unicode"

Eric Wieser (Jun 20 2024 at 00:31):

All the Uint32 stuff isn't really about unicode at all though; that's just "Here's how the thing lean defined is defined"

Kim Morrison (Jun 20 2024 at 01:21):

Eric Wieser said:

It is unclear which file the compare lemmas would go in; in mathlib the rfl lemmas would go immediately after the definition, but this seems not to be the style used in core

This style is in flux, and I'm actually switching towards this, so if you would like to put these with their definitions that will be fine.

François G. Dorais (Jun 20 2024 at 01:58):

@Eric Wieser In the mean time, consider adding the missing bits to Batteries.Data.UInt.

I just added some batteries#853 batteries#854. Note that most of the theorems use toNat instead of val to avoid relying on the implementation of UInt types.

Eric Wieser (Jun 20 2024 at 07:49):

Arguably the solution here is to rename val to toFin, since I don't think the existence of a canonical map to fin is an implementation detail

Eric Wieser (Jun 20 2024 at 07:50):

(the implementation detail is the fact this happens to be a projection)

François G. Dorais (Jun 20 2024 at 09:16):

Indeed, that would be better. It's a reference implementation anyhow so the fact that it's a projection is not actually significant.

Eric Wieser (Jun 20 2024 at 09:21):

I would imagine the proposed refactor is something like:

structure UInt32 where
   toBitvec : Bitvec 32

@[deprecated] def UInt32.val (u : UInt32) : Fin (2 ^ 32) := u.toBitvec.toFin

in which case the val lemmas would be replaced by toBitvec lemmas, hopefully with little change in proof.

François G. Dorais (Jun 20 2024 at 09:59):

We'll have to see what actually happens. I think it's fine for a scalar type like this to have multiple views so long as they are backed by efficient code. Of course, only one can be chosen as the reference. Right now, the Batteries.Data.UInt file favors the toNat/toNat_lt view, but that doesn't mean that the goal is to ignore the toFin and toBitvec views.

Eric Wieser (Jun 20 2024 at 10:13):

I think it's fine for a scalar type like this to have multiple views so long as they are backed by efficient code.

Not every view needs to be efficient for computation; there is value to views that are only efficient for proofs

François G. Dorais (Jun 20 2024 at 10:17):

So long as they are marked noncomputable so that they aren't used in code.

Last updated: May 02 2025 at 03:31 UTC