I observed that the lean-regex library slowed down by 10-30% for regexes with alternations after upgrading to v4.22.0-rc3. I wonder if the new compiler introduced a regression. I'd appreciate any comments!
According to perf, this SparseSet.insert function seems to spend more time in v4.22.0-rc3. Upon inspecing the generated C code, v4.22.0-rc3 emits more lean_dec in the exclusive branch (one per SparseSet field), which might have caused the regression.
Traces
I collected perf samples for the two versions in Pop!_OS 22.04 (Ubuntu-like distribution). You can see the stack traces below:
Checkout v4.21.0-benchmark and v4.22.0-rc3-benchmark branches from the lean-regex repository. In the regex directory, execute lake exe Bench -e 'def|have|push|wf|nfa' -n 1000 Regex/Backtracker/Basic.lean to see how long each regex execution takes. In my environment, v4.22.0-rc3 takes roughly 20% more time.
Yes, it's very likely that this is due to the new compiler. We're also experiencing some regressions of a similar order of magnitude in the standard library. Thanks for preparing these benchmarks and for investigating! cc @Cameron Zwarich
Thanks for reporting this! From your description that this specifically affects alt patterns and specifically involves ref-counting operations, this seems very likely to be due to a known regression with the new compiler involving the increased use of join points (essentially non-escaping local functions called in tail position from two distinct places).
The new compiler generates more of these, which is generally speaking more optimal, but the ref-counting passes at the back end of the compiler (which did not change) are a bit pessimistic in the face of join points.
Apologies for the temporary regression. I will probably use lean-regex to generate test cases when I fix this issue.
Thanks. I hope the codegen improves in the newer versions!
I tried to isolate the affected component, but it shows only a few percentage of slowdown and has a higher variance. It also depends on a bit long proofs, so it might be only useful as a starting point.
(note: this reproduction is quadratic in -n. I used -n 30000, but you might need to find a good -n that works in your environment)
Reproducer
-- Minimal theory about the bijection between `Fin n`.namespaceRegex.Data.SparseSet.Bijectiondefinj{αβ}(f:α→β):=∀xy,fx=fy→x=ydefsurj{αβ}(f:α→β):=∀y,∃x,fx=ydefbij{αβ}(f:α→β):=injf∧surjftheorem_root_.Fin.eq_of_ge{n}{i:Fin(n+1)}(h:i≥n):i=⟨n,Nat.lt_succ_selfn⟩:=byapplyFin.eq_of_val_eqexactNat.le_antisymm(Nat.le_of_succ_le_succi.isLt)htheorem_root_.Fin.eq_of_not_lt{n}{i:Fin(n+1)}(h:¬i<n):i=⟨n,Nat.lt_succ_selfn⟩:=Fin.eq_of_ge(Nat.ge_of_not_lth)theoremsurj_of_inj{n}(f:Finn→Finn)(h:injf):surjf:=byinductionnwith|zero=>introyhave:y.val<0:=y.isLtcontradiction|succnih=>letn':Fin(n+1):=⟨n,Nat.lt_succ_selfn⟩letf'(x:Finn):Finn:=letx':Fin(n+1):=⟨x.val,Nat.lt_transx.isLt(Nat.lt_succ_selfn)⟩lety:=fx'ifisLt:y.val<nthen⟨y.val,isLt⟩elsehaveeqx:fx'=n':=Fin.eq_of_not_ltisLthaveisLt:fn'<n':=byrefineDecidable.byContradictionfunnlt=>?_haveeqn:fn'=n':=Fin.eq_of_not_ltnlthave:fx'=fn':=byrw[eqx,eqn]have:x'=n':=h__thishave:x.val=n:=byhave:x'.val=n'.val:=byrw[this]exactthisexactNat.lt_irrefl_(this▸x.isLt)⟨f⟨n,Nat.lt_succ_selfn⟩,isLt⟩have:injf':=byintroxyeqletx':Fin(n+1):=⟨x.val,Nat.lt_transx.isLt(Nat.lt_succ_selfn)⟩lety':Fin(n+1):=⟨y.val,Nat.lt_transy.isLt(Nat.lt_succ_selfn)⟩ifhx:fx'<nthenifhy:fy'<nthensimp[f',hx,hy,x',y']ateqhave:=h__(Fin.eq_of_val_eqeq)simpatthisexactFin.eq_of_val_eqthiselsesimp[f',hx,hy,x',y']ateqhave:=h__(Fin.eq_of_val_eqeq)simpatthisexactabsurd(this▸x.isLt)(Nat.lt_irrefl_)elseifhy:fy'<nthensimp[f',hx,hy,x',y']ateqhave:=h__(Fin.eq_of_val_eqeq)simpatthisexactabsurd(this.symm▸y.isLt)(Nat.lt_irrefl_)elsehaveeqx:=Fin.eq_of_not_lthxhaveeqy:=Fin.eq_of_not_lthyhave:fx'=fy':=byrw[eqx,eqy]have:x'=y':=h__thishave:x.val=y.val:=byhave:x'.val=y'.val:=byrw[this]exactthisexactFin.eq_of_val_eqthishavesurj:surjf':=ih_thisintroyhave:y.val≤n:=Nat.le_of_succ_le_succy.isLtifisLt:y.val<nthenlet⟨x,eq⟩:=surj⟨y.val,isLt⟩simp[f']ateqsplitateqcaseisTrue=>simpateqexact⟨⟨x.val,Nat.lt_transx.isLt(Nat.lt_succ_selfn)⟩,Fin.eq_of_val_eqeq⟩caseisFalse=>simpateqexact⟨n',Fin.eq_of_val_eqeq⟩elsehave:=Fin.eq_of_not_ltisLtsimp[this]ifisLt':fn'<nthenlet⟨x,eq⟩:=surj⟨(fn').val,isLt'⟩simp[f']ateqsplitateqcaseisTrue=>simpateqhave:=Fin.val_eq_of_eq(h__(Fin.eq_of_val_eqeq))simpatthisexactabsurd(this▸x.isLt)(Nat.lt_irrefl_)caseisFalsenlt=>have:=Fin.eq_of_not_ltnltexact⟨⟨x.val,Nat.lt_transx.isLt(Nat.lt_succ_selfn)⟩,this⟩elseexact⟨n',Fin.eq_of_not_ltisLt'⟩endRegex.Data.SparseSet.BijectionnamespaceRegex.DatastructureSparseSet(n:Nat)wherecount:Natdense:Vector(Finn)nsparse:Vector(Finn)nsparse_dense:∀i:Finn,i<count→sparse[dense[i.val].val]=ile_count:count≤nnamespaceSparseSetvariable{n:Nat}{s:SparseSetn}{ij:Finn}openBijectiondefempty{n:Nat}:SparseSetn:=letv:=Vector.ofFn(funx:Finn=>⟨0,x.pos⟩)⟨0,v,v,fun__=>bycontradiction,Nat.zero_le_⟩theoremsparse_dense_fin(h:i<s.count):s.sparse[s.dense[i]]=i:=s.sparse_denseih@[inline]defmem(s:SparseSetn)(i:Finn):Bool:=s.sparse[i]<s.count&&s.dense[s.sparse[i]]==iinstance:Membership(Finn)(SparseSetn)wherememsi:=s.memi@[simp]theoremmem_mem:i∈s↔s.memi:=Iff.rfl@[inline]instance:Decidable(i∈s):=matchh:s.memiwith|true=>isTrueh|false=>isFalse(bysimp[h])theoremmem_dense_of_lt(h:i<s.count):s.dense[i]∈s:=bysimp[mem,sparse_dense,h]theoremdense_inj(hi:i<s.count)(hj:j<s.count)(eq:s.dense[i]=s.dense[j]):i=j:=byhave:s.sparse[s.dense[i]]=s.sparse[s.dense[j]]:=byrw[eq]simp[s.sparse_denseihi,s.sparse_densejhj]atthisexactthistheoremdense_surj(h:j∈s):∃i:Finn,i<s.count∧s.dense[i]=j:=bysimp[mem]athexistss.sparse[j]theoremdense_sparse_of_full(h:n≤s.count):s.dense[s.sparse[j]]=j:=byletf(i:Finn):Finn:=s.dense[i]haveinj:injf:=byintroxyeqexactdense_inj(Nat.lt_of_lt_of_lex.isLth)(Nat.lt_of_lt_of_ley.isLth)eqhavesurj:surjf:=surj_of_inj_injhave⟨i,eq⟩:=surjjsimp[f,←eq,s.sparse_densei(Nat.lt_of_lt_of_lei.isLth)]theoremlt_of_mem(i:Finn)(h:¬i∈s):s.count<n:=bysimp[SparseSet.mem]athrefineDecidable.byContradictionfunnlt=>?_havege:=Nat.le_of_not_ltnltapplyh(Nat.lt_of_lt_of_les.sparse[i].isLtge)exactdense_sparse_of_fullgedefinsert(s:SparseSetn)(i:Finn):SparseSetn:=ifmem:i∈sthenselselet⟨count,dense,sparse,sparse_dense,_⟩:=shaveisLt:count<n:=lt_of_memimemletdense':=dense.setcountiletsparse':=sparse.seti⟨count,isLt⟩havesparse_dense'(j:Finn)(h:j<count+1):sparse'[dense'[j]]=j:=byhave:j≤count:=Nat.le_of_succ_le_succhcasesNat.eq_or_lt_of_lethiswith|inleq=>simp[dense',sparse',eq,Vector.getElem_set_self]exactFin.eq_of_val_eqeq.symm|inrlt=>have:dense'[j]=dense[j]:=bysimp[dense']rw[Vector.getElem_set_neisLt(byomega)(byomega)]simp[sparse',this]rw[Vector.getElem_set]splitcaseisTrueeq=>simp[SparseSet.mem]atmemhave:sparse[i.val]=j:=bysimp[eq,sparse_densejlt]simp[this,lt]atmemexactabsurd(Fin.eq_of_val_eqeq.symm)memcaseisFalse=>exactsparse_densejlt⟨count+1,dense',sparse',sparse_dense',isLt⟩endRegex.Data.SparseSetopenRegex.Data(SparseSet)@[noinline]definsertMany(n:Nat)(s:SparseSetn):IO(SparseSetn):=doletmuts:=sforh:iin[:n]dos:=s.insert⟨i,bysimp[Membership.mem]ath;omega⟩returnsdefbenchmark(iterations:Nat):IOUnit:=doletmuttotalTime:=0fornin[:iterations]doletstartTime←IO.monoNanosNowlet_←insertManyn.emptyletendTime←IO.monoNanosNowtotalTime:=totalTime+(endTime-startTime)lettotalTimeMs:=totalTime.toFloat/1_000_000IO.printlns!"Ran {iterations} iterations in {totalTimeMs}ms"IO.printlns!"Average time: {totalTimeMs / iterations.toFloat}ms per iteration"defparseArgs(args:ListString):ExceptStringNat:=domatchargswith|"-n"::n::_=>doletn←n.toNat?.getDM(throw"Iterations must be a number")ifn<0thenthrow"Iterations must be a positive number"elsereturnn|[]=>return1000|_::args=>parseArgsargsdefmain(args:ListString):IOUInt32:=domatchparseArgsargswith|.okn=>benchmarknreturn0|.errorerr=>IO.eprintlnerrreturn1