Zulip Chat Archive

Stream: lean4

Topic: query API


Arthur Paulino (Nov 25 2021 at 15:16):

Hi! I'm trying to build a query API à la Spark for Scala (and I am learning a lot in the process). So far, this is where I'm at:

abbrev Col := Lean.Name

inductive ColProp
  | EqE (c : Col) (e : Entry)
  | EqC (c : Col) (c' : Col)
  | NeqE (c : Col) (e : Entry)
  | NeqC (c : Col) (c' : Col)
  | And (cp : ColProp) (cp' : ColProp)
  | Or (cp : ColProp) (cp' : ColProp)

-- constant colEqEntry (c : Nat) (e : Entry) : Type

infix:50 " =e " => ColProp.EqE
infix:50 " =c " => ColProp.EqC
infix:50 " ≠e " => ColProp.NeqE
infix:50 " ≠c " => ColProp.NeqC
infix:75 " ∧ " => ColProp.And
infix:75 " ∨ " => ColProp.Or

inductive QueryStep
  | table (n : String)
  | select (l : List Col)
  | filter (cp : ColProp)
  | joinₗ (t : String) (on : List Col)

structure Query where
  steps : List QueryStep
  deriving Inhabited

constant table (n : String) : Query :=
⟨[QueryStep.table n]⟩

namespace Query

constant select (q : Query) (l : List Col) : Query :=
q.steps.concat (QueryStep.select l)⟩

constant filter (q : Query) (cp : ColProp) : Query :=
q.steps.concat (QueryStep.filter cp)⟩

constant joinₗ (q : Query) (t : String) (on : List Col) : Query :=
q.steps.concat (QueryStep.joinₗ t on)⟩

end Query
#check (((table "cars").select [`id, `name]).joinₗ "prices" [`id]).filter (`price e 500)
-- Query.filter (Query.joinₗ (Query.select (table "cars") [`id, `name]) "prices" [`id]) (`price ≠e 500) : Query

Two things:

  • Is it possible to avoid the need for this process of going back and adding parentheses everytime I want to add a new transformation?
  • I'd like the joinₗ query step to receive another Query in the first parameter, but I got into some kind of definition chicken/egg race. Is it possible to avoid this problem?

Mario Carneiro (Nov 25 2021 at 16:53):

you can use mutual inductive, although you will have to convert the structure to an inductive like inductive Query | mk (steps : List QueryStep)

Arthur Paulino (Nov 25 2021 at 16:57):

Thanks! Let me try that :thinking:

Arthur Paulino (Nov 25 2021 at 17:00):

Is this updated? It's a bit hard to read

Mario Carneiro (Nov 25 2021 at 17:03):

doesn't look like it

Mario Carneiro (Nov 25 2021 at 17:04):

the cliff notes version is that to write a mutual inductive you say

mutual

inductive A
| mk : B -> A

inductive B
| mk : A -> B

end

Arthur Paulino (Nov 25 2021 at 17:08):

Thanks!! This is looking clean:

mutual
  inductive Query
    | mk (steps : List QueryStep)
  inductive QueryStep
    | table (n : String)
    | select (l : List Col)
    | filter (cp : ColProp)
    | joinₗ (q : Query) (on : List Col)
end

Arthur Paulino (Nov 25 2021 at 17:16):

But how do I access steps of a certain q : Query? q.steps is causing

invalid field 'steps', the environment does not contain 'Query.steps'
  q
has type
  Query

Henrik Böving (Nov 25 2021 at 17:32):

Since its an inductive now you would pattern match on it, you could also define a utility function like

namespace Query
  def steps : Query  List QueryStep
  | mk steps => steps
end Query

Arthur Paulino (Nov 25 2021 at 17:38):

It's looking good!

#check table "cars"  select [`id, `name]  joinₗ (table "prices") [`id]
-- table "cars" ↠ select [`id, `name] ↠ joinₗ (table "prices") [`id] : Query

Henrik Böving (Nov 25 2021 at 17:40):

If you are feeling particulary fancy you could maybe even define join with \bowtie like in relational algebra (https://en.wikipedia.org/wiki/Relational_algebra#Natural_join_(%E2%8B%88)) which would spare you the need of typing ↠ joinₗ every time you want to join i guess.

Arthur Paulino (Nov 25 2021 at 17:48):

Hm, I was planning to use joinᵢ joinₗ joinᵣ joinₒ to differentiate inner, left, right and outer joins. If I use I think it will be a bit hard to remember the code for all variations

Arthur Paulino (Nov 25 2021 at 17:51):

Or I could just accept a third parameter (how) like most APIs do :thinking:

Mac (Nov 25 2021 at 21:46):

@Arthur Paulino fyi, you could avoid the mutual and use a structure by doing the following:

structure BaseQuery (Step : Type) where
  steps : List Step

inductive QueryStep
| table (n : String)
| select (l : List Col)
| filter (cp : ColProp)
| joinₗ (q : BaseQuery QueryStep) (on : List Col)

abbrev Query := BaseQuery QueryStep

Arthur Paulino (Nov 28 2021 at 03:20):

Small victory today:

import Ffi

def main : IO Unit := do
  let mysql  MySQL.mk
  mysql.login "localhost" "root" "root"
  mysql.useDB "test_db"
  mysql.querySQL "select car.id, car.name, price.price from car join price on car.id = price.id"
  IO.println <| mysql.getQueryResult.toString

Outputs:

id|name|price|
1|'A'|3|
2|'B'|2|
3|'C'|1|

Arthur Paulino (Nov 28 2021 at 12:40):

@Tomas Skrivan Wanna join forces? This work of yours is closely related. My representation of a query result is the bottleneck of my pipeline.

My first approach, just to get the funnel working, was to return the query result as a beautifully formatted string and parse it in Lean. But something more similar to numpy should be way faster

Tomas Skrivan (Nov 28 2021 at 13:00):

@Arthur Paulino I do not see the immediate relation between your project and mine, could be due to my complete ignorance of databases. I have no clue what SQL is, what exactly database is and so on.

The connection I can see is that I want to work with tensors/multidimensional arrays and you work with databases that are also multidimensional arrays of sorts. Not sure if there is a big overlap of what we want to do with these objects. Probably a common and hard one is slicing.

Apart of that, right now I do not need too fancy stuff with tensors and my current focus is still on automatic/symbolic differentiation.

Arthur Paulino (Nov 28 2021 at 13:01):

How are you storing data?

Tomas Skrivan (Nov 28 2021 at 13:06):

That is currently not a big concert to me, but I will last long with FloatArray which I just wrap into column major matrix/tensor. I do not see the need for anything much more complicated in a near future. In a long run, I see the need for writing wrappers for Eigen matrices, Cuda arrays and OpenVDB(sparse voxel grid).

Arthur Paulino (Nov 28 2021 at 13:09):

Indeed, I need something more versatile for the context of query results. Something that supports different types (including strings) and the presence of NULL values (or absence of values)

Tomas Skrivan (Nov 28 2021 at 13:09):

First, I want to stabilize the interface on top of which all of this will be build. That is where I believe Lean can shine and if I do not manage to get a nice, versatile and powerful interface then there is no point in my project. So currently I'm really not concerned with run-time performance, it is still in the back of my mind but not in focus.

Tomas Skrivan (Nov 28 2021 at 13:12):

Yeah, I think our needs are very different. Looking at other libraries, is there a library that is advertised as database library and numerical linear algebra package? I'm not aware of it, the uses are way too different.

Arthur Paulino (Nov 28 2021 at 13:15):

In Python, numpy is used to implement pandas, which is largely used in pyspark to return results of queries. But numpy was primarily a package for heavy matricial operations

Tomas Skrivan (Nov 28 2021 at 13:23):

Ahh ok, that just reveals my complete lack of understanding databases.

Tomas Skrivan (Nov 28 2021 at 13:29):

In my recent thread , I have discussed interface for multi dimensional arrays. @Kyle Miller shared some interesting ideas how to approach it. I will be playing around with it a bit and maybe coming up with something that might be of use to you.

Arthur Paulino (Nov 28 2021 at 23:54):

Expanded the API:

def main : IO Unit := do
  let mysql  MySQL.mk
  IO.println $ mysql.version -- queries mysql version
  mysql.login "localhost" "root" "root"
  mysql.close -- closes connection with mysql
  mysql.login "localhost" "root" "root"

  mysql.createDB "test_db" -- create database
  mysql.dropDB "test_db" -- drop database
  mysql.createDB "test_db" -- create it again

  mysql.useDB "test_db" -- use database

  mysql.createTable "job" [
    "id", "INT PRIMARY KEY"⟩,
    "name", "VARCHAR(255)"
  ]
  mysql.insertIntoTable "job" [1, "Computer Scientist"]
  mysql.insertIntoTable "job" [2, "Mathematician"]

  mysql.createTable "person" [
      "id", "INT PRIMARY KEY"⟩,
      "name", "VARCHAR(255)"⟩,
      "age", "INT"⟩,
      "job_id", "INT"
    ]

  -- inserting data
  mysql.insertIntoTable "person" [1, "Alice", 20, 1]
  mysql.insertIntoTable "person" [2, "Bod", 21, 2]
  mysql.insertIntoTable "person" [3, "Craig", 22, NULL]

  mysql.querySQL ("select person.name, person.age, job.name " ++
    "from person left join job on person.job_id = job.id")
  mysql.close
  -- connection is closed, but the query result is still available in memory
  IO.println $ mysql.getQueryResult

Prints out:

8.0.27
name|age|name|
'Alice'|20|'Computer Scientist'|
'Bod'|21|'Mathematician'|
'Craig'|22|NULL|

Last updated: Dec 20 2023 at 11:08 UTC