Zulip Chat Archive

Stream: Is there code for X?

Topic: computable breadth first traversal

Paige Thomas (May 07 2024 at 04:13):

Is there code for a computable breadth first traversal of a directed graph, ie for computing the epsilon closure of a non deterministic finite state automaton?

Johan Commelin (May 07 2024 at 04:43):

You mean for mathlib graphs? I'm not aware of this

Paige Thomas (May 07 2024 at 04:51):

Ok. Thank you.

Shreyas Srinivas (May 07 2024 at 05:30):

The mathlib simple graph is not defined in terms of any computable graph representation

Shreyas Srinivas (May 07 2024 at 05:33):

The translation to and from an adjacency matrix is also defined using Props

Shreyas Srinivas (May 07 2024 at 05:39):

Edit: the closest i could find after a more recent search now was this: https://leanprover-community.github.io/mathlib4_docs/Mathlib/Computability/EpsilonNFA.html#%CE%B5NFA.stepSet

Shreyas Srinivas (May 07 2024 at 05:39):

(deleted)

Shreyas Srinivas (May 07 2024 at 05:40):

The step function is defined as a set

Paige Thomas (May 07 2024 at 05:45):

Just what I am looking for. Thank you!

Paige Thomas (May 07 2024 at 06:03):

Am I right in that that is not computable because εNFA.εClosure is not computable? Or is εClosure computable?

Shreyas Srinivas (May 07 2024 at 06:15):

It is an inductive Prop

Shreyas Srinivas (May 07 2024 at 06:15):

Not computable in any meaningful sense

Paige Thomas (May 07 2024 at 06:18):

I see. I guess that means the NFA computed from the εNFA is not computable either. Which means that a regular expression translated to a εNFA is also not going to be computable.

Shreyas Srinivas (May 07 2024 at 06:19):

The mathlib docs for NFA say that for actual NFAs, fintype instances must be provided for the state and alphabet type

Shreyas Srinivas (May 07 2024 at 06:20):

Fintype implies that there is an explicit list of elements of the type

Shreyas Srinivas (May 07 2024 at 06:20):

That being said, I don't think it provides a representation that we would normally work with in CS.

Shreyas Srinivas (May 07 2024 at 06:20):

https://leanprover-community.github.io/mathlib4_docs/Mathlib/Computability/NFA.html

Paige Thomas (May 07 2024 at 06:22):

That would be ok, except that the only way I currently (and just recently learned) to compute a regexp is to first translate it to a εNFA.

Shreyas Srinivas (May 07 2024 at 06:24):

There is an old algorithm that skips the epsilon nfa step and directly computes a dfa. Let me see if I can recall the name. Basically it constructs a tree from the regexp respecting the priority rules for operators and computes states as subsets of the tree's leaf nodes

Shreyas Srinivas (May 07 2024 at 06:26):

Please check this, but I think it is called the McNaughton Yamada construction

Paige Thomas (May 07 2024 at 06:26):

The DFA version has the potential for an exponential expansion of the number of states though, right? The NFA can be computed in O(m), where m is the length of the string to match, at the expense of greater mem usage?

Paige Thomas (May 07 2024 at 06:28):

That is the one I learned, but it goes to a εNFA first I think.

Shreyas Srinivas (May 07 2024 at 06:28):

The McNaughton yamada construction is often confused with the Thomson construction but is different.

Shreyas Srinivas (May 07 2024 at 06:28):

The epsilon NFA construction is Thompson's

Paige Thomas (May 07 2024 at 06:30):

Hmm. Ok.

Shreyas Srinivas (May 07 2024 at 06:30):

NFAs can be much more succinct that DFAs at the cost of computation.

Shreyas Srinivas (May 07 2024 at 06:30):

I mean, when evaluating the possible outcomes of the NFA, one is practically re doing the powerset construction

Paige Thomas (May 07 2024 at 06:32):

The NFA will run faster though right, at the cost of mem usage? (my reference: https://swtch.com/~rsc/regexp/regexp1.html)

Paige Thomas (May 07 2024 at 06:34):

Or maybe that is what you meant.

Paige Thomas (May 07 2024 at 06:36):

Maybe what they are doing there is running what is essentially the NFA converted to a DFA.

Shreyas Srinivas (May 07 2024 at 06:36):

Basically with an NFA, in order to test acceptance, you have to try out every non deterministic choice when the automaton is run

Paige Thomas (May 07 2024 at 06:37):

Yes.

Shreyas Srinivas (May 07 2024 at 06:40):

There are two practical factors to consider:

how many queries are you making against this automaton ? The dfa construction precomputes the reachable state set, whereas with an NFA one is potentially repeating this computation with each query.
Quite often the reachable state set in the dfa won't be exponentially larger in cardinality than the nfa

Paige Thomas (May 07 2024 at 06:42):

Yes, I see, the DFA would probably be better for the use case I have in mind.

Shreyas Srinivas (May 07 2024 at 06:45):

Anyway, to return to your original question, I would suggest: ~~Construct a fintype based epsilon nfa~~ (add a fintype constraint to the state type and alphabet type) and define a computable BFS based epsilon closure function, and then there are two options:

Prove that the definition is equivalent to the existing inductive prop definition and get the downstream theorems from mathlib for free. But proving this equivalence might be hard. Perhaps the reflection feature of LeanSSR is useful for this.
Freshly prove those downstream theorems directly from definition. This might be more legwork, but the proofs could be more "natural".

Paige Thomas (May 07 2024 at 06:47):

Sounds reasonable. What is LeanSSR?

Shreyas Srinivas (May 07 2024 at 06:48):

It is a new library from Vladimir Goldstein and Ilya Sergey, which provides a library for small scale reflection tactics

Shreyas Srinivas (May 07 2024 at 06:49):

This tactic language is used in Coq's mathcomp

Shreyas Srinivas (May 07 2024 at 06:49):

LeanSSR brings a version with better UX to Lean

Paige Thomas (May 07 2024 at 06:49):

Ok, I'm not familiar with reflection tactics.

Paige Thomas (May 07 2024 at 06:50):

I'll have to look it up.

Paige Thomas (May 07 2024 at 06:50):

Thank you.

Shreyas Srinivas (May 07 2024 at 06:51):

I don't have deep familiarity myself. This library is very new

Shreyas Srinivas (May 07 2024 at 06:51):

But the authors are on this zulip, and might be able to provide more help with that

Eric Wieser (May 07 2024 at 17:45):

Arguably that reflection tactic is largely the decide that mathlib already has

Shreyas Srinivas (May 07 2024 at 18:00):

I am not going to argue that point. @Eric Wieser the better UX claim is not as spicy as it sounds. It is mentioned in the paper (points 1,2, and 3 in page 2). This was also mentioned in zulip

Eric Wieser (May 07 2024 at 18:19):

Indeed, let's not rehash the discussion in that thread any further

Shreyas Srinivas (May 08 2024 at 22:04):

It might be nice, performance-wise, to compare a lazily unfolded epsilon closure using Thunks, with an eagerly precomputed one. It would be harder to prove anything about it though.

Paige Thomas (May 09 2024 at 05:33):

This is an attempt at code to do a breadth first traversal of a graph, beginning at a given vertex. I'm wondering if anyone might know a more efficient/clearer/simpler way to do it. I didn't find a lot of resources online that I could easily follow for how to do it functionally/recursively. I also haven't figured out how to prove that it terminates yet.

partial
def breadth_first_traversal_aux
  {Vertex : Type}
  [DecidableEq Vertex]
  (outside : List Vertex)
  (boundary : List Vertex)
  (inside : List Vertex)
  (edges : List (Vertex × Vertex)) :
  List Vertex :=
  -- All of the vertices that are outside and adjacent to at least one vertex in the boundary.
  let next : List Vertex := List.filter (fun (out : Vertex) => ∃ (b : Vertex), b ∈ boundary ∧ (b, out) ∈ edges) outside
  if next = []
  then inside ∪ boundary
  else breadth_first_traversal_aux (outside \ next) next (inside ∪ next) edges

/--
  `breadth_first_traversal V E start` := All of the vertices in `V` that are reachable from the vertex `start` by following a sequence of zero or more edges in `E`.
-/
def breadth_first_traversal
  {Vertex : Type}
  [DecidableEq Vertex]
  (vertices : List Vertex)
  (edges : List (Vertex × Vertex))
  (start : Vertex) :
  List Vertex :=
  breadth_first_traversal_aux vertices [start] [start] edges

Shreyas Srinivas (May 09 2024 at 06:57):

You don't need a separate inside and boundary set within a bfs function. The batteries (formerly std) library would almost certainly have a queue data structure

Shreyas Srinivas (May 09 2024 at 06:58):

@loogle Std.Queue

loogle (May 09 2024 at 06:58):

:search: Std.Queue, Std.Queue.empty, and 17 more

Shreyas Srinivas (May 09 2024 at 07:01):

You can implement a bfs in the usual way with a queue. Since you would add at least one element each iteration to the list of visited nodes, the termination parameter could be (no. Of vertices - length of visited nodes list)

Shreyas Srinivas (May 09 2024 at 07:05):

@loogle Batteries.Queue

loogle (May 09 2024 at 07:05):

:exclamation: unknown identifier 'Batteries.Queue'
Did you mean "Batteries.Queue"?

Notification Bot (May 09 2024 at 07:40):

4 messages were moved from this topic to #Is there code for X? > loogle on newly renamed Batteries by Shreyas Srinivas.

Shreyas Srinivas (May 09 2024 at 07:51):

With the exisiting implementation, the recursive call is only made when the cardinality of outside strictly decreases

Shreyas Srinivas (May 09 2024 at 07:51):

so something like the following might work:

import Mathlib

def breadth_first_traversal_aux
  {Vertex : Type}
  [DecidableEq Vertex]
  (outside : List Vertex)
  (boundary : List Vertex)
  (inside : List Vertex)
  (edges : List (Vertex × Vertex)) :
  List Vertex :=
  -- All of the vertices that are outside and adjacent to at least one vertex in the boundary.
  let next : List Vertex := List.filter (fun (out : Vertex) => ∃ (b : Vertex), b ∈ boundary ∧ (b, out) ∈ edges) outside
  if h : next = []
  then inside ∪ boundary
  else breadth_first_traversal_aux (outside \ next) next (inside ∪ next) edges
termination_by outside.length
decreasing_by
  simp_wf
  sorry

Shreyas Srinivas (May 09 2024 at 07:53):

But it will be complicated to show this, because you also have to prove that the next set atually came from outside using lemmas for List.filter

Paige Thomas (May 10 2024 at 06:12):

Thank you. Like this?

partial
def breadth_first_traversal_aux
  {Vertex : Type}
  [DecidableEq Vertex]
  (edges : List (Vertex × Vertex))
  (visited : List Vertex)
  (queue : Std.Queue Vertex) :
  List Vertex :=
  match queue.dequeue? with
  | Option.some (current, queue) =>
    let frontier := (edges.filter (fun (e : Vertex × Vertex) => e.fst = current ∧ e.snd ∉ visited)).map Prod.snd

    breadth_first_traversal_aux edges (visited ++ frontier) (queue.enqueueAll frontier)
  | Option.none => visited


/--
  `breadth_first_traversal E start` := All of the vertices that are reachable from the vertex `start` by following a sequence of zero or more edges in `E`.
-/
def breadth_first_traversal
  {Vertex : Type}
  [DecidableEq Vertex]
  (edges : List (Vertex × Vertex))
  (start : Vertex) :
  List Vertex :=
  breadth_first_traversal_aux edges [start] (Std.Queue.empty.enqueue start)


#eval breadth_first_traversal [] 0 = [0]
#eval breadth_first_traversal [(0, 1)] 0 = [0, 1]
#eval breadth_first_traversal [(0, 1), (1, 0)] 0 = [0, 1]
#eval breadth_first_traversal [(0, 1), (1, 2)] 0 = [0, 1, 2]
#eval breadth_first_traversal [(0, 1), (1, 2), (2, 0)] 0 = [0, 1, 2]

I'm not sure how to set it up to prove that it terminates. I would guess it would be on the number of elements in the queue, but I'm not sure how to say that.

Paige Thomas (May 10 2024 at 06:45):

I guess the size of the queue is not strictly decreasing.

Shreyas Srinivas (May 10 2024 at 07:45):

The number of non-visited vertices decreases

Shreyas Srinivas (May 10 2024 at 08:09):

As long as the graph is connected it reaches zero

Paige Thomas (May 10 2024 at 18:37):

We don't want to assume it is completely connected though right? If that was the case, wouldn't the result always just be the set of all of the vertices in the graph?

Paige Thomas (May 10 2024 at 19:53):

This is almost there, except where the number of edges is 0.

def breadth_first_traversal_aux
  {Vertex : Type}
  [DecidableEq Vertex]
  (edges : List (Vertex × Vertex))
  (visited : List Vertex)
  (queue : Std.Queue Vertex) :
  List Vertex :=
  match queue.dequeue? with
  | Option.some (current, next) =>
    let frontier := (edges.filter (fun (e : Vertex × Vertex) => e.fst = current ∧ e.snd ∉ visited)).map Prod.snd

    breadth_first_traversal_aux edges (visited ++ frontier) (next.enqueueAll frontier)
  | Option.none => visited
  termination_by edges.length - visited.length
  decreasing_by
    simp_wf

Vertex : Type
inst✝ : DecidableEq Vertex
edges : List (Vertex × Vertex)
visited : List Vertex
queue : Std.Queue Vertex
current : Vertex
next : Std.Queue Vertex
frontier : List Vertex := List.map Prod.snd (List.filter (fun e => decide (e.1 = current ∧ e.2 ∉ visited)) edges)
⊢ edges.length -
    (visited.length + (List.filter (fun e => decide (e.1 = current) && !decide (e.2 ∈ visited)) edges).length) <
  edges.length - visited.length

Paige Thomas (May 10 2024 at 20:25):

In truth I don't think I need to use a breadth first traversal of the graph per se. For the epsilon closure I just need to compute the subset of the vertices in the graph that are connected to a given vertex. I am assuming that there may be, and likely are, parts of the graph that are disjoint.

Paige Thomas (May 11 2024 at 23:51):

Got it for depth first traversal using this paper and help from @Mario Carneiro .
https://leanprover.zulipchat.com/#narrow/stream/348111-batteries/topic/depth.20first.20search/near/439481437

Yuyang Zhao (May 23 2024 at 13:59):

I have some different DFS implementations (two of them have proofs) at https://negiizhao.github.io/Algorithm/Algorithm/Graph/DFS.html, one of which may be adapted to a more generalized form in the future to support BFS and other vertex visit orders.

Yuyang Zhao (May 23 2024 at 14:04):

(They are built on some heavier infrastructure... But probably could be modified to a lighter version)

Last updated: May 02 2025 at 03:31 UTC