Zulip Chat Archive

Stream: mathlib4

Topic: RePred


Martin Dvořák (Jul 11 2023 at 15:20):

We have docs4#RePred that characterizes recursively enumerable sets using partial recursive functions.
Do we have a characterization of recursively enumerable sets using Turing machines as well?

Martin Dvořák (Jul 12 2023 at 07:10):

We have Turing machines that compute stuff in the sense that the machine changes symbols on the tape (or stacks).
Do we also have the notion of a Turing machine that can accept or reject the input?

James Gallicchio (Jul 12 2023 at 16:54):

I don’t think so. Encoding data to TM tape is a bit more complicated than encoding to Nat, and I’m not even sure we have a notion of this encoding.

James Gallicchio (Jul 12 2023 at 16:57):

I was working for a while on a register machine formalization that could be used as an intermediary to transport results with PRFs to results with TMs. So I would be interested in helping build a notion of TM tape encodings and computability

James Gallicchio (Jul 12 2023 at 17:00):

Oh, we do have tm encodings: https://leanprover-community.github.io/mathlib_docs/computability/encoding.html

James Gallicchio (Jul 12 2023 at 17:02):

Is there a reason you want to characterize RE sets in terms of TMs? I think PRFs are almost always nicer to work with, no?

Martin Dvořák (Jul 12 2023 at 17:05):

I think it will be easier to connect Grammars to Turing machines than to connect Grammars to PRFs.
https://github.com/madvorak/chomsky

Martin Dvořák (Jul 12 2023 at 17:06):

Ultimately, I believe we will want to have all three versions of computability theory (TMs, PRFs, grammars) and possibly more.

James Gallicchio (Jul 12 2023 at 17:07):

Totally. When I get home I’ll push my RM stuff to a repository in case anyone wants to pick it up again.

Martin Dvořák (Jul 12 2023 at 17:08):

Martin Dvořák said:

Do we also have the notion of a Turing machine that can accept or reject the input?

We don't have this, do we?

Martin Dvořák (Jul 12 2023 at 17:20):

Please, prove me wrong; it seems that docs#Language is connected to neither PRFs nor TMs.

Mario Carneiro (Jul 13 2023 at 05:14):

Martin Dvořák said:

Martin Dvořák said:

Do we also have the notion of a Turing machine that can accept or reject the input?

We don't have this, do we?

well you can just leave a 0 or 1 on the tape, or depending on the situation you can also use halting/non-halting to signal acceptance

James Gallicchio (Jul 13 2023 at 05:57):

Maybe it's worth adding definitions for deciders & semideciders

James Gallicchio (Jul 13 2023 at 05:58):

where a decider always halts and we interpret 0/1 as reject/accept

Martin Dvořák (Jul 13 2023 at 07:03):

We should definitely add semideciders. I hope the API will be in terms of a docs#Language recognized by a Turing machine.

James Gallicchio (Jul 13 2023 at 15:25):

I'm not thrilled by the idea of fixing it to TMs when they are so much harder to work with formally... but having TM-specific definitions of "turing machine decider" and "TM semidecider" seems reasonable

Martin Dvořák (Jul 13 2023 at 21:04):

James Gallicchio said:

I'm not thrilled by the idea of fixing it to TMs

Fixing what?

I would like docs#Language to be the central notion about which we speak in computability theory (and later also in complexity theory). Turing machines would be just one of the ways to characterize languages.

James Gallicchio (Jul 14 2023 at 15:19):

Sorry, I meant fixing (the definition of deciding/semideciding a language) to TMs

James Gallicchio (Jul 14 2023 at 15:19):

Or I guess I mean it should be in the turing machine namespace if it is specialized to TMs

James Gallicchio (Jul 14 2023 at 15:21):

It seems reasonable though to specialize the notion of deciding/semideciding a language to each computational model, since they each have differences in terms of input encoding/output decoding

James Gallicchio (Jul 14 2023 at 15:22):

and then eventually we will have constructions for turning a TM decider on L to a PRF decider on L

Martin Dvořák (Jul 14 2023 at 16:42):

It sounds like we have conflated two decisions (both of which are important):
(1) What is the object we want to characterize?
(2) What is the computational model we want to use for the characterization?

Possibilities for (1) include:
(a) α → Prop where α is docs#Primcodable
(b) List α → Prop where α is docs#Fintype
‣ perhaps expressed as Language α
(c) ℕ → Prop
‣ perhaps expressed as Set ℕ

Possibilities for (2) include:
(a) Partial recursive functions
(b) Turing machine variants
(c) RAM-like programs

We will eventually have several versions of (1) and (2).
However, we should decide what is the version we want to see in top-level theorems.
The existence of docs#RePred suggests (1a) and (2a).
There are not many theorems about docs#RePred currently, so we can still change direction if we want to.

In question (1), I have a strong preference towards (1b).
In question (2), I don't have any significant preference.

Please, let's talk about both (1) and (2) without conflating the issues.

James Gallicchio (Jul 14 2023 at 17:31):

Thank you for splitting out the issues. I'm not convinced Language A is the right choice for anything other than TM as the computational model, right?

Do we have to select a version for "top-level theorems?" Do we want to generalize across computational models yet?

Martin Dvořák (Jul 14 2023 at 21:41):

James Gallicchio said:

I'm not convinced Language A is the right choice for anything other than TM as the computational model, right?

I think that Language α is perfect for TMs, RAMs, and grammars alike!

James Gallicchio (Jul 14 2023 at 22:25):

But for PRFs you have to encode Nat as List Bool or something

Martin Dvořák (Jul 17 2023 at 07:57):

Yes. Each version comes with a price.

However, there is an important advantage of Language over other (1). In addition to the above, you can use Language to characterize models of computation that are not Turing complete, including those that are too weak to perform necessary encoding. If we define "recursively enumerable" in terms of Language over finite alphabet, we can make meaningful theorems like "every regular language is recursively enumerable" (in fact, there would be several intermediate inclusions).

James Gallicchio (Jul 17 2023 at 15:47):

It definitely makes that one definition nicer. But I'm wondering if there's a generalization of computational models that would let us use any of these definitions to characterize RE sets.

Like, say you have an arbitrary computational model over some state type σ, and you have a set s : S -> Prop where S is computably encodable in σ (computable encoding can be generalized). Then s is RE in terms of this model iff there exists a program that decides s.

And then maybe we pick one model as the "base" computational model and expect these computational models to be proven equivalent to that base.

That way we can characterize Languages as RE, or subsets of Nat, or whatever else we want to characterize, without having to go through a single specific definition.

I don't think we have fleshed out the computational models enough at this point to make such a generalization, but I think it would be reasonable to make a RE definition in terms of Language and TM, without presenting it as "the" definition of recursive enumerability in mathlib.

Martin Dvořák (Jul 17 2023 at 17:30):

I don't think we need to make it an exercise on abstract programming. Note that the relationship between various (1) is independent of which (2) you use.

Let me illustrate my idea on a small example with some pseudocode. For simplicity, consider now only (1b) and (1c) with (2a) and (2b).

We make two model-independent conversion functions for (1):
SetN_of_Lang : Language Bool → Set ℕ
Lang_of_SetN : Set ℕ → Language Bool
We establish they are inverse of each other.

We define these two notions manually:
TmReLang : Language Bool → Prop using Turing machines
PrfReSetN : Set ℕ → Prop using Partial recursive functions

We get these two notions for free:
TmReSetN : Set ℕ → Prop := TmReLang ∘ Lang_of_SetN
PrfReLang : Language Bool → Prop := PrfReSetN ∘ SetN_of_Lang

Now a hard work comes:
∀ L, TmReLang L ↔ PrfReLang L

Then we trivially prove:
∀ S, PrfReSetN S ↔ TmReSetN S

Martin Dvořák (Jul 17 2023 at 17:52):

The advantage becomes important when broadening our library. In order to have N versions of (1) and M versions of (2), we have to manually:

  • write N conversion functions
  • write M definitions of RE
  • prove M implications about a computational model simulating a different computational model

Having done that, we can easily have:

  • all M*N definitions of RE
  • all (M choose 2) theorems about equivalence of computational models (à la Church-Turing thesis)

Martin Dvořák (Jul 17 2023 at 17:58):

PS: We don't have to flood Mathlib with all M*N definitions of RE. We can settle for those M definitions, like:
TmReLang : Language Bool → Prop using Turing machines
PrfReSetN : Set ℕ → Prop using Partial recursive functions

Then the theorems will be like:
∀ L, PrfReSetN (SetN_of_Lang L) ↔ TmReLang L

James Gallicchio (Jul 17 2023 at 18:06):

I agree completely. It seems like an apt place to have a typeclass to manage the conversions to whatever base computational model we choose. For now I'd support adding TmReLang without necessarily worrying about all the conversions.

Martin Dvořák (Jul 17 2023 at 18:08):

Unfortunately, I don't understand typeclasses enough. Nevertheless, adding TmReLang is the logical step to do now.

Mario Carneiro (Jul 17 2023 at 18:09):

I think that as elsewhere in mathlib we should be using one definition and N equivalences, not N definitions and N^2 equivalences

Martin Dvořák (Jul 17 2023 at 18:14):

Would you agree with the one definition to be in terms of Language Bool using PRFs?

James Gallicchio (Jul 17 2023 at 18:14):

I really think it's easiest for it to be Set Nat with PRFs...

James Gallicchio (Jul 17 2023 at 18:15):

but I haven't tried it, so am conjecturing

James Gallicchio (Jul 17 2023 at 18:15):

just that the Set Nat definition would be a single line in today-mathlib

Martin Dvořák (Jul 17 2023 at 18:16):

Martin Dvořák said:

It sounds like we have conflated two decisions (both of which are important):
(1) What is the object we want to characterize?
(2) What is the computational model we want to use for the characterization?

Possibilities for (1) include:
(a) α → Prop where α is docs#Primcodable
(b) List α → Prop where α is docs#Fintype
‣ perhaps expressed as Language α
(c) ℕ → Prop
‣ perhaps expressed as Set ℕ

Possibilities for (2) include:
(a) Partial recursive functions
(b) Turing machine variants
(c) RAM-like programs

We will eventually have several versions of (1) and (2).
However, we should decide what is the version we want to see in top-level theorems.
The existence of docs#RePred suggests (1a) and (2a).
There are not many theorems about docs#RePred currently, so we can still change direction if we want to.

In question (1), I have a strong preference towards (1b).
In question (2), I don't have any significant preference.

Please, let's talk about both (1) and (2) without conflating the issues.

We should return here. We want one definition. Which of them will be the one?

James Gallicchio (Jul 17 2023 at 18:16):

The reason I don't want to get into this yet is cuz we haven't proven the equivalence of TMs and PRFs yet, so if you're gonna be working in TMs there's no reason for you to introduce a definition in terms of PRFs.

Martin Dvořák (Jul 17 2023 at 18:18):

I'll just repeat that I have strong interest in being able to state theorems like "every regular language is recursively enumerable" smoothly (i.e., the theorem statement is as simple as possible (having conversion in a statement is usually not nice), even if it makes the proofs longer).

James Gallicchio (Jul 17 2023 at 18:22):

RE is a property of sets. Language A = Set (List A). So long as the elements are encodable to Nat (which List A is for any encodable A, which includes any FinType A). So with a very slight generalization of the Set Nat definition it would be a very smooth theorem statement.

James Gallicchio (Jul 17 2023 at 18:22):

(the generalization being to make it over Set A for any primcodable A)

James Gallicchio (Jul 17 2023 at 18:23):

so I guess 1(a)/2(a)

Martin Dvořák (Jul 17 2023 at 18:28):

It seems that you two (James and Mario) prefer to keep docs#RePred as the definition of RE. Then the theorem would be something like the following:

 p, ( M : TM, M.semidecides (Lang_of_Pred p))  RePred p

Is it what you want? For the sake of not introducing new definitions?

James Gallicchio (Jul 17 2023 at 18:32):

or just ∀ L, (∃ M : TM, M.semidecides L) ↔ RePred L since languages are already sets of primcodable objects.

Martin Dvořák (Jul 17 2023 at 18:32):

(possibly left-hand side Skolemized)

James Gallicchio (Jul 17 2023 at 18:33):

Now, I have no clue how we're going to prove that theorem any time soon :joy: so we'll have to live with a sorry for now

James Gallicchio (Jul 17 2023 at 18:34):

the forward direction is at least probably provable with sufficient effort right now

Martin Dvořák (Jul 17 2023 at 18:48):

If we don't see ourselves proving equivalence between two concepts anytime soon, shouldn't we rather write two definitions?

Martin Dvořák (Jul 17 2023 at 19:17):

James Gallicchio said:

For now I'd support adding TmReLang without necessarily worrying about all the conversions.

If @Mario Carneiro doesn't want to see two equivalent definitions in Mathlib, we should perhaps forget about TmReLang now and define TM.semidecides only.

James Gallicchio (Jul 17 2023 at 19:28):

Yeah, I think \exists M, M.semidecides L is already quite clean

Martin Dvořák (Jul 17 2023 at 21:04):

You mean for ∃ M, M.semidecides L to appear in a theorem, or in a definition?

James Gallicchio (Jul 17 2023 at 21:35):

The former. But that's up to you.

Martin Dvořák (Jul 17 2023 at 22:01):

If it is really up to me, I'd rather have a definition TmReLang wrapping it. However, I don't think @Mario Carneiro would approve.

Mario Carneiro (Jul 17 2023 at 22:02):

I think in most cases you can have more precise theorems that trivially imply the desired equivalences

Martin Dvořák (Jul 17 2023 at 22:05):

You mean, imply in our head (an interpretation of what it means)?

Martin Dvořák (Jul 18 2023 at 11:04):

Martin Dvořák said:

James Gallicchio said:

For now I'd support adding TmReLang without necessarily worrying about all the conversions.

define TM.semidecides

In textbooks, there is often a definition of a TM that can accept, reject, or go on forever. The language of the TM is the set of words for which it accepts.

I suppose we don't want it because it [having three behaviours] is harder for formalization. The desired definition TM.semidecides for Mathlib is that it either halts or goes on forever. The language of the TM is the set of words for which it halts. Do we all agree on this approach?

Martin Dvořák (Jul 19 2023 at 09:28):

On the other hand, the three-way definition of semideciders has a big advantage:

In this framework, a decider is (a special case of) a semidecider. You don't have to convert a decider to a semidecider by making it loop instead of rejecting a word.

James Gallicchio (Jul 19 2023 at 15:59):

Yep. I think you want a definition with three outcomes.

James Gallicchio (Jul 19 2023 at 16:00):

I think it's also probably nice to have both the "yes-or-halt" and "no-or-halt" options accessible, just because people use both pretty interchangeably

Martin Dvořák (Jul 19 2023 at 16:46):

What do you mean by "yes-or-halt" and "no-or-halt" please?

James Gallicchio (Jul 19 2023 at 16:47):

err, yes-or-doesn't-halt :joy: words are hard

Martin Dvořák (Jul 21 2023 at 11:03):

Should I implement TM.semidecides as follows?

When the machine halts (i.e., when step returns none), if the tape symbol under the head is default Γ then the word [what was written on the tape before the computation started] gets accepted (regardless of Cfg.q a.k.a. the state of the head). We will have a machine that can accept, reject, or go on forever.


Last updated: Dec 20 2023 at 11:08 UTC