Zulip Chat Archive
Stream: general
Topic: Fintype Char?
Bjørn Kjos-Hanssen (Jun 18 2024 at 00:42):
It seems that a
Fintype Char
instance should be an easy goal? A
Fintype UInt32
instance would probably suffice, although maybe that is too large somehow.
Kim Morrison (Jun 18 2024 at 00:52):
What is the use case?
Bjørn Kjos-Hanssen (Jun 18 2024 at 01:05):
I want to look for patterns in words over an alphabet like {A,...,Z}.
Eric Wieser (Jun 18 2024 at 01:28):
Here's a prototype:
import Mathlib
instance : Fintype Char where
elems := ((Finset.Ico 0 0xd800).disjUnion (Finset.Ioo 0xdfff 0x110000: Finset <| Fin (2 ^ 32)) <| by
simp_rw [Finset.disjoint_left, Finset.mem_Ico, Finset.mem_Ioo, Not, and_imp]
intro a ha ha' hb hb'
have := hb.trans ha'
rw [Fin.lt_def] at this
change 57343 < 55296 at this
norm_num at this
).attach.map ⟨ fun x => {
val := ⟨x.val⟩
valid := by simpa using x.prop
}, fun x y h => by ext1; cases x; cases y; simp at *; injection h with h; injection h with h⟩
complete c := by
simp [Char.ext_iff]
sorry
#eval (Fintype.card Char) -- 1112064
Eric Wieser (Jun 18 2024 at 01:29):
The proofs are a bit sloppy because the dance between Char
/UInt32
/Nat
/Fin
is a little awkward
Bjørn Kjos-Hanssen (Jun 18 2024 at 01:39):
Thanks @Eric Wieser
It does look complex!
Eric Wieser (Jun 18 2024 at 01:40):
I think it would be a lot cleaner with a LocallyFiniteOrder UInt32
instance
Eric Wieser (Jun 18 2024 at 01:42):
( @Yaël Dillies, is there a one-liner to pull that along an equiv, here from Fin?)
Eric Wieser (Jun 18 2024 at 01:44):
Ah yes, docs#OrderIso.locallyFiniteOrder
Eric Wieser (Jun 18 2024 at 09:55):
Ok, a cleaned up version:
import Mathlib
section shouldBeInCore
theorem UInt32.val_lt_val {a b : UInt32} : a.val < b.val ↔ a < b := Iff.rfl
theorem UInt32.val_le_val {a b : UInt32} : a.val ≤ b.val ↔ a ≤ b := Iff.rfl
@[simp]
theorem UInt32.val_ofNat (n : ℕ) :
(no_index (OfNat.ofNat n) : UInt32).val = OfNat.ofNat n := rfl
theorem UInt32.val_max (a b : UInt32) : (max a b).val = max a.val b.val := by
simp [max_def, instMaxUInt32, maxOfLe, instLEUInt32]
aesop
theorem UInt32.val_min (a b : UInt32) : (min a b).val = min a.val b.val := by
simp [min_def, instMinUInt32, minOfLe, instLEUInt32]
aesop
theorem UInt32.compare_def (a b : UInt32) : compare a b = compare a.val b.val := by
simp only [instOrdUInt32, instOrdFin, compareOfLessAndEq, instOrdNat, instLTUInt32,
Fin.val_injective.eq_iff, UInt32.val_injective.eq_iff]
rfl
theorem UInt32.isValidChar_iff {a : UInt32} :
a.isValidChar ↔ a < 0xd800 ∨ 0xdfff < a ∧ a < 0x110000 := rfl
@[simp]
theorem UInt32.zero_le (x : UInt32) : 0 ≤ x := Nat.zero_le _
theorem Char.ext_iff {a b : Char} : a = b ↔ a.val = b.val :=
⟨fun h => h ▸ rfl, Char.ext⟩
end shouldBeInCore
instance : Sup UInt32 where sup := max
instance : Inf UInt32 where inf := min
instance : LinearOrder UInt32 :=
LinearOrder.liftWithOrd UInt32.val UInt32.val_injective
UInt32.val_max UInt32.val_min UInt32.compare_def
def UInt32.orderIsoFin : UInt32 ≃o Fin (2 ^ 32) where
toFun := UInt32.val
invFun := UInt32.mk
map_rel_iff' := Iff.rfl
left_inv _ := rfl
right_inv _ := rfl
instance : LocallyFiniteOrder UInt32 :=
UInt32.orderIsoFin.locallyFiniteOrder
instance : Fintype Char where
elems := (
(Finset.Ico 0 (0xd800 : UInt32)).disjUnion (Finset.Ioo 0xdfff 0x110000) <| by
simp_rw [Finset.disjoint_left, Finset.mem_Ico, Finset.mem_Ioo, Not, and_imp]
intro a _ ha hb _
have := hb.trans ha
revert this
decide
).attach.map
⟨fun x => .mk x.val <| by simpa [UInt32.isValidChar_iff] using x.prop,
fun x y h => Subtype.ext <| congr_arg Char.val h⟩
complete x := by
have := x.valid
rw [UInt32.isValidChar_iff] at this
simpa [Char.ext_iff]
Eric Wieser (Jun 18 2024 at 09:56):
@Kim Morrison, do you agree with my shouldBeInCore
judgement above?
Kim Morrison (Jun 18 2024 at 10:03):
Yes, seems reasonable. (Although we're hoping to replace UIntX
with a BitVec
based implementation, so ... everything will change. :-)
Kyle Miller (Jun 18 2024 at 14:59):
I didn't expect this to work:
#eval Fintype.card Char
-- 1112064
I guess everything is suitably tail recursive when building a list with 1,112,064 elements!
Eric Wieser (Jun 18 2024 at 15:02):
I was very careful to not invoke quadratic behavior with union
or image
:)
Eric Wieser (Jun 19 2024 at 23:48):
I had a very quick go at upstreaming, but:
- The
max
andmin
lemmas cannot be upstreamed, because core has noMin (Fin _)
instance (it comes from the linear order) - It is unclear which file the
compare
lemmas would go in; in mathlib therfl
lemmas would go immediately after the definition, but this seems not to be the style used in core
Bjørn Kjos-Hanssen (Jun 20 2024 at 00:29):
Maybe it can go in "Mathematical aspects of Unicode"
Eric Wieser (Jun 20 2024 at 00:31):
All the Uint32
stuff isn't really about unicode at all though; that's just "Here's how the thing lean defined is defined"
Kim Morrison (Jun 20 2024 at 01:21):
Eric Wieser said:
- It is unclear which file the
compare
lemmas would go in; in mathlib therfl
lemmas would go immediately after the definition, but this seems not to be the style used in core
This style is in flux, and I'm actually switching towards this, so if you would like to put these with their definitions that will be fine.
François G. Dorais (Jun 20 2024 at 01:58):
@Eric Wieser In the mean time, consider adding the missing bits to Batteries.Data.UInt.
I just added some batteries#853 batteries#854. Note that most of the theorems use toNat
instead of val
to avoid relying on the implementation of UInt types.
Eric Wieser (Jun 20 2024 at 07:49):
Arguably the solution here is to rename val
to toFin
, since I don't think the existence of a canonical map to fin is an implementation detail
Eric Wieser (Jun 20 2024 at 07:50):
(the implementation detail is the fact this happens to be a projection)
François G. Dorais (Jun 20 2024 at 09:16):
Indeed, that would be better. It's a reference implementation anyhow so the fact that it's a projection is not actually significant.
Eric Wieser (Jun 20 2024 at 09:21):
I would imagine the proposed refactor is something like:
structure UInt32 where
toBitvec : Bitvec 32
@[deprecated] def UInt32.val (u : UInt32) : Fin (2 ^ 32) := u.toBitvec.toFin
in which case the val
lemmas would be replaced by toBitvec
lemmas, hopefully with little change in proof.
François G. Dorais (Jun 20 2024 at 09:59):
We'll have to see what actually happens. I think it's fine for a scalar type like this to have multiple views so long as they are backed by efficient code. Of course, only one can be chosen as the reference. Right now, the Batteries.Data.UInt file favors the toNat
/toNat_lt
view, but that doesn't mean that the goal is to ignore the toFin
and toBitvec
views.
Eric Wieser (Jun 20 2024 at 10:13):
I think it's fine for a scalar type like this to have multiple views so long as they are backed by efficient code.
Not every view needs to be efficient for computation; there is value to views that are only efficient for proofs
François G. Dorais (Jun 20 2024 at 10:17):
So long as they are marked noncomputable
so that they aren't used in code.
Last updated: May 02 2025 at 03:31 UTC