Zulip Chat Archive

Stream: CSLib

Topic: Query model for Algorithms complexity : updates

Shreyas Srinivas (Jan 22 2026 at 23:22):

I thought it might be easier to use a single thread to post updates about our query model for systematically formalising algorithmic complexity in a lightweight, yet very structured way.

Currently it is WIP in PR cslib#275 . This is essentially a continuation of cslib#201 in a fresh PR for github permissions reasons

It already supports creating custom query types whose costs depend on input size of each query.
I just added support for custom cost structures. There are some examples in the QueryModel.lean file.
Currently, please ignore the Mergesort.lean file

Shreyas Srinivas (Jan 22 2026 at 23:23):

What remains to be done :

Implement a notion of reduction.
Implement some non-trivial examples such as sorting algorithms

Shreyas Srinivas (Jan 22 2026 at 23:24):

CC : @Tanner Duve started the basic idea by implementing mergesort in a specific query model.

Shreyas Srinivas (Jan 22 2026 at 23:26):

One limitation, that I expect any shallow DSL to have is the ability to sneak in pure computations cost free. But it improves on the TimeM model in several ways:

No explicit ticks needed, so the chances of making mistakes like forgetting to count specific calls is removed.
The operations which are counted are made explicit as an inductive type and a model of this type.
The model of a query type accepts custom cost structures so long as you can define addition and a 0 operation on them (basically monoids).

Shreyas Srinivas (Jan 22 2026 at 23:57):

Next : I am getting a very strange error on running CI. The editor shows no error, only CI. Not evenlake build. See the error message here :

https://github.com/leanprover/cslib/actions/runs/21269168918/job/61215304680?pr=275

 error: Cslib/Algorithms/QueryModel.lean:284:0: Could not find native implementation of external declaration 'Cslib.FreeM.lift._redArg' (symbols 'lp_cslib_Cslib_FreeM_lift___redArg___boxed' or 'lp_cslib_Cslib_FreeM_lift___redArg').
  For declarations from `Init`, `Std`, or `Lean`, you need to set `supportInterpreter := true` in the relevant `lean_exe` statement in your `lakefile.lean`.
  error: Cslib/Algorithms/QueryModel.lean:285:0: Could not find native implementation of external declaration 'Cslib.FreeM.lift._redArg' (symbols 'lp_cslib_Cslib_FreeM_lift___redArg___boxed' or 'lp_cslib_Cslib_FreeM_lift___redArg').
  For declarations from `Init`, `Std`, or `Lean`, you need to set `supportInterpreter := true` in the relevant `lean_exe` statement in your `lakefile.lean`.
  error: Cslib/Algorithms/QueryModel.lean:286:0: Could not find native implementation of external declaration 'Cslib.FreeM.lift._redArg' (symbols 'lp_cslib_Cslib_FreeM_lift___redArg___boxed' or 'lp_cslib_Cslib_FreeM_lift___redArg').
  For declarations from `Init`, `Std`, or `Lean`, you need to set `supportInterpreter := true` in the relevant `lean_exe` statement in your `lakefile.lean`

Shreyas Srinivas (Jan 23 2026 at 00:15):

Never mind. Removing #evals fixed it

Eric Wieser (Jan 23 2026 at 00:17):

I left a few comments; I think you should be making much more use of FreeM.liftM, which is ultimately the eval function you are defining there

Shreyas Srinivas (Jan 23 2026 at 00:18):

Copied from DM : Likewise the evalQuery is defined that way because I expect that there will be proofs like evalQuery Model_1 and reduce Model_1 Model_2 gives an equivalent evaluation in model 2

Shreyas Srinivas (Jan 23 2026 at 00:18):

and the proof will be an induction on Prog

Shreyas Srinivas (Jan 23 2026 at 00:47):

Anyway, my point is if TimeM is just CostM, then there is no reason to not just call it a Model

Shreyas Srinivas (Jan 23 2026 at 00:47):

If TimeM is just WriterT id (Multiplicative Nat), then there is no point in using it either

Shreyas Srinivas (Jan 23 2026 at 00:47):

I changed Model into a structure btw

Shreyas Srinivas (Jan 23 2026 at 00:49):

Sorry these messages were in drafts before my DMs with Eric. But basically TimeM is an entirely extraneous definition, It is just WriterT Id (Multiplicative Nat) and we can extend it to Write Id Cost for a suitable cost type

Shreyas Srinivas (Jan 23 2026 at 00:59):

EDIT : Eric says the default monoid instance on Nat is multiplication and Multiplicative Nat makes the multiplication operation behave as addition.

Eric Wieser (Jan 23 2026 at 01:05):

I think WriterT Id (Multiplicative Nat) probably ends up being pretty unergonomic; I'd perhaps suggest we should have a separate CostT which is a to_additiveized version of WriterT.

Eric Wieser (Jan 23 2026 at 01:06):

(or we should change WriterT to accumulate with + instead of *, since so far I think this is the only use of WriterT, and the symbol is inconvenient)

Shreyas Srinivas (Jan 23 2026 at 01:09):

It's valid, but if TimeM is redundant, I don't see the value of introducing two layers of monads for running Prog monad

Eric Wieser (Jan 23 2026 at 01:10):

If they're abbrevs then they don't really count as layers

Shreyas Srinivas (Jan 23 2026 at 01:10):

Also I vote for WriterT using AddMonoid by default. This is the natural interpretation of writer monads in haskell. Consider a list of log strings. One doesn't construct a new list of tupled log strings in the writer monad. One appends new entries into the log (that's addition operation for lists)

Eric Wieser (Jan 23 2026 at 01:11):

But Strings aren't AddMonoids or Monoids anyway, so that's not a great argument

Shreyas Srinivas (Jan 23 2026 at 01:11):

Logs are lists of strings. not strings

Shreyas Srinivas (Jan 23 2026 at 01:11):

and strings are monoids.

Eric Wieser (Jan 23 2026 at 01:11):

Shreyas Srinivas said:

and strings are monoids.

but

Eric Wieser said:

Strings aren't AddMonoids or Monoids

according to Mathlib

Does haskell allow "foo" + "bar"? Remember, Lean doesn't

Eric Wieser (Jan 23 2026 at 01:12):

Shreyas Srinivas said:

Also I vote for WriterT using AddMonoid by default.

This is a topic for #mathlib4

Eric Wieser (Jan 23 2026 at 01:13):

Let's make a new thread with a summary, rather than dumping the entire argument in mathlib4

Shreyas Srinivas (Jan 23 2026 at 01:14):

Okay but I want to leave this thread for Prog. The writer monad's monoid issue is independent

Eric Wieser (Jan 23 2026 at 01:14):

Ah apologies, I missed that you didn't move it to mathlib4

Shreyas Srinivas (Jan 23 2026 at 01:14):

I can't

Shreyas Srinivas (Jan 23 2026 at 01:14):

No permissions

Shreyas Srinivas (Jan 23 2026 at 01:17):

Anyway, to return to Prog. If TimeM is redundant, I don't see any point in interpreting Query models and Prog to TimeM either. QueryModel.lean can just use WriterT. I am not too comfortable unfolding these monad transformers and do_lifts. Induction proofs are simpler. which is why I have written the current definitions. Once I am comfortable with the reduction business between models, I'll see how far I can use liftM

Shreyas Srinivas (Jan 23 2026 at 01:23):

As it exists currently (with only Nat time complexity), it makes no sense to lift Prog to TimeM. Prog in this PR handles more general cost structures. Ideally that should be a separate PR. (see how I use two different cost structures for the same program here : https://github.com/leanprover/cslib/pull/275/changes#diff-c53c80c7c59623eecc9954c581f434662ffdfcf9278f48f637934332728fe29fR270)

Shreyas Srinivas (Jan 23 2026 at 01:52):

Another update : I set up a monadic version of linear search and ran into a forIn proof. mvcgen won't work on this monad yet (cc : @Sebastian Graf could we get it to?). I'd be grateful for any help in teaching me how to handle these kinds of proofs. The theorem statements are at the end. of the file QueryModel.lean. Examples help

Eric Wieser (Jan 23 2026 at 02:15):

Shreyas Srinivas said:

One limitation, that I expect any shallow DSL to have is the ability to sneak in pure computations cost free.

Right, you have to be very careful to expose no information exception the ones captured by the query; so for instance, for a sorting algorithm you must not provide the algorithm with a ~~decidable~~ comparison operator, so it has no choice but to act through the queries

Shreyas Srinivas (Jan 23 2026 at 02:53):

Eric Wieser said:

Shreyas Srinivas said:

One limitation, that I expect any shallow DSL to have is the ability to sneak in pure computations cost free.

Right, you have to be very careful to expose no information exception the ones captured by the query; so for instance, for a sorting algorithm you must not provide the algorithm with a ~~decidable~~ comparison operator, so it has no choice but to act through the queries

We only need to add typeclass instances that could potentially provide the comparison operator in the model. In the program itself, we don’t need to and we shouldn’t provide any instances. Then the only way to get comparisons will be through queries.

Shreyas Srinivas (Jan 23 2026 at 02:54):

But this is not a fool proof method if someone decides to write an algorithm for a specific type like Nat.

Sebastian Graf (Jan 23 2026 at 09:00):

Shreyas Srinivas said:

Another update : I set up a monadic version of linear search and ran into a forIn proof. mvcgen won't work on this monad yet (cc : Sebastian Graf could we get it to?). I'd be grateful for any help in teaching me how to handle these kinds of proofs. The theorem statements are at the end. of the file QueryModel.lean. Examples help

In order to use mvgen, you need to define a WP instance for your use of Prog. I would try and focus on a particular instantiation first before doing the parameterized free monad (if that is even possible). You can read up in the reference manual how to define such as instance. The logging example is instructive; it's basically Writer String and thus more or less the same as TimeM.

I hypothesize that you need two kinds of lemmas: One for .eval (e.g., (read c i).evaldoes something to c) and one for .cost (e.g., (read c i).cost increases cost by x). Do that for all the operations, define an adequacy lemma and you should be able to use mvcgen to prove

lemma linearSearch_correct_true [DecidableEq α] (v : Vector α n) :
∀ x : α, x ∈ v → (linearSearch v x).eval VecSearch_Nat = true := sorry

lemma linearSearch_time_complexity [DecidableEq α] (v : Vector α n) :
  ∀ x : α, (linearSearch v x).time VecSearch_Nat ≤ n := sorry

Shreyas Srinivas (Jan 23 2026 at 13:00):

I just tried to write a tail recursive version of linear search to avoid using for loops. But ofc we are still programming in a monad. I need to learn how to fill the kind of goals I have in the the lines with done

Shreyas Srinivas (Jan 24 2026 at 11:22):

Another update :

I was thinking about the issue that any lightweight monadic DSL must necessarily suffer: sneaking in computations inside pure. Circumventing it with another layer of translation from queries which need the model to acess important data structures makes things too complex. It will also make it impossible to specify invariants in-situ with the mvcgen framework later, because the program spec cannot depend on the model which might contain the relevant data structures.

So I added a PureCosts typeclass, which any cost type must have an instance of. In principle this allows us to track (maybe, depending on the instance, even log) each pure operation. As a reviewer that's a good sanity check to have. See lines 222 and 401 for examples

Shreyas Srinivas (Jan 24 2026 at 15:40):

Another design thought : compiling Prog to Time allows the following : the compiled version of Prog can be composed with plain TimeM programs. So an AI that is tasked with discovering an efficient algorithm can use incorrectly marked purely TimeM sub procedures to save on time complexity. Checking for this involves extra steps. Without TimeM we only need to check for sneaky pure operations, which, as I show in the PR, can be logged or counted

Eric Wieser (Jan 24 2026 at 15:53):

I don't understand what you think you are protecting against there, if you are required to produce a term or type Prog then you can't cheat by embedding a TimeM, because the conversion only goes in the other direction.

Shreyas Srinivas (Jan 24 2026 at 16:21):

Okay fair. I guess we only need to check the top level theorem statement and Prog procedure

Shreyas Srinivas (Jan 25 2026 at 02:29):

Uploaded a linear search example on lists. The proofs were much simpler.

Shreyas Srinivas (Jan 26 2026 at 01:39):

Another update, but in another thread this time, we can elegantly express circuits and prove stuff about circuit complexity within this framework. I am still working out how to hide the IDs from prog spec and have them generated for every monadic call to a new circuit combinator, but this is already looking good. See more : #CSLib > Parity is not in AC0 @ 💬

Shreyas Srinivas (Jan 26 2026 at 01:45):

I think at this point I am going to try writing WP instances, and see if I can't knock out the sorries in some of the monadic search procedures.

Shreyas Srinivas (Jan 26 2026 at 01:48):

Another thing I would like to do is to add attributes which tag an algorithm (Prog) and its correctness and complexity theorems with an algorithm_ID and labels like correctness and complexity for the theorems, and then write some metaprogs to query all relevant declarations of an by the algorithm_id

Shreyas Srinivas (Jan 28 2026 at 15:27):

Another update to the query model : By way of reduction from another query model called CircuitQuery, I have eliminated the need to manually annotate circuit IDs. Cross-posted for relevance from here : #CSLib > Parity is not in AC0 @ 💬

Shreyas Srinivas (Jan 28 2026 at 15:31):

If someone is up for it, I would like to recommend creating a Query model for Turing machines. The input type of each query is the alphabet of the TM (both tape and input) and whose output is a TMConfig.

Shreyas Srinivas (Jan 28 2026 at 15:32):

the resultant Prog acts as the transition function of a Turing machine, so eval query must be implemented accordingly. Each query trivially costs (1 : Nat).

Jakub Nowak (Jan 28 2026 at 15:40):

I don't understand what would the query cost represent with approach you described? Wouldn't it just be some measure of the size of the outputed Turing Machine? I though, that we want to somehow measure the time complexity of the resulting Turing Machine?

Shreyas Srinivas (Jan 28 2026 at 15:40):

cost is a generic word for complexity

Shreyas Srinivas (Jan 28 2026 at 15:41):

In the examples, you can see how I have multiple ways of accounting costs, sometimes for the same model. In circuits, I count the size and depth. In some examples, I only count a subset of the queries. In some examples, I count some queries separately. I can also optionally keep track of uses of pure operations, trivially by counting them (but more advanced options are possible).

Jakub Nowak (Jan 28 2026 at 15:42):

I understand that. But the notion of complexity defined by the cost you just described doesn't seem particulary interesting to be honest. At least it is not related in any way to standard notions of time or size complexity for turing machines (or do you claim it is?).

Shreyas Srinivas (Jan 28 2026 at 15:42):

I don't see how it doesn't?

Shreyas Srinivas (Jan 28 2026 at 15:43):

You could count

The number of calls to queries that read or write

Shreyas Srinivas (Jan 28 2026 at 15:43):

That's time complexity

Shreyas Srinivas (Jan 28 2026 at 15:43):

The number of empty cells you write to.

That's space complexity

Jakub Nowak (Jan 28 2026 at 15:43):

Ah, sorry, than I misunderstood the approach you described.

Shreyas Srinivas (Jan 28 2026 at 15:44):

When writing a Model you describe the costs of queries.

Jakub Nowak (Jan 28 2026 at 15:44):

Although, in that case, I don't know how to implement it I think.

Shreyas Srinivas (Jan 28 2026 at 15:44):

The monad extends that to the entire program

Jakub Nowak (Jan 28 2026 at 15:46):

It is not clear to me, how to you want to "compile" monad to TMConfig.

Shreyas Srinivas (Jan 28 2026 at 15:46):

Model.eval

Shreyas Srinivas (Jan 28 2026 at 15:47):

I recommend checking out the examples

Shreyas Srinivas (Jan 28 2026 at 15:47):

I think I completed the linearSearch example on lists

Bolton Bailey (Jan 28 2026 at 16:39):

I am also not sure what is being asked here, but I would like to see TMs and query model based models of computation linked if possible. I have a PR that defines a kind of TM at cslib#269, but I treat the tape alphabet as being slightly different from the input alphabet. Is it possible to integrate this with the query model you are describing?

Shreyas Srinivas (Jan 28 2026 at 16:40):

Yes definitely.

Bolton Bailey (Jan 28 2026 at 16:41):

In terms of the decls you define in cslib#275 and the decls I define in cslib#269, what is the type of the term you are looking to see defined?

Shreyas Srinivas (Jan 28 2026 at 16:41):

I’ll have to do some trial and error on that. If 269 gets merged I will work on that

Shreyas Srinivas (Jan 28 2026 at 16:43):

I think the first parameter to the query type would be the TM itself. The index type will be the configuration type of the TM

Shreyas Srinivas (Jan 28 2026 at 16:44):

Each query will produce a configuration of the TM

Shreyas Srinivas (Jan 28 2026 at 16:45):

evalQuery will be roughly identical to step

Shreyas Srinivas (Jan 28 2026 at 16:46):

there will be queries for initialisation, stepping, and termination.

Shreyas Srinivas (Jan 28 2026 at 16:46):

For non deterministic TMs, we use Classical.choice to pick the next step, the next state set is Nonempty.

Shreyas Srinivas (Feb 04 2026 at 16:04):

I will do my best to wrap this PR up asap. Only the merge sort example remains. I will however be tied with work until the middle of next week. I already have a linear search and a sorted insertion example in there. I will move the circuit examples to a separate PR with more results for later.

In the meantime @Chris Henson mentioned that he would prefer to have one model for lightweight algorithms verification. Since this model subsumes TimeM and TimeM is already just WriterT (Multiplicative Nat), should I delete TimeM within this PR?

Whatever this discussion throws up, I will implement it next week.
CC @Fabrizio Montesi.

Last updated: Feb 28 2026 at 14:05 UTC