Zulip Chat Archive
Stream: lean4
Topic: query API
Arthur Paulino (Nov 25 2021 at 15:16):
Hi! I'm trying to build a query API à la Spark for Scala (and I am learning a lot in the process). So far, this is where I'm at:
abbrev Col := Lean.Name
inductive ColProp
| EqE (c : Col) (e : Entry)
| EqC (c : Col) (c' : Col)
| NeqE (c : Col) (e : Entry)
| NeqC (c : Col) (c' : Col)
| And (cp : ColProp) (cp' : ColProp)
| Or (cp : ColProp) (cp' : ColProp)
-- constant colEqEntry (c : Nat) (e : Entry) : Type
infix:50 " =e " => ColProp.EqE
infix:50 " =c " => ColProp.EqC
infix:50 " ≠e " => ColProp.NeqE
infix:50 " ≠c " => ColProp.NeqC
infix:75 " ∧ " => ColProp.And
infix:75 " ∨ " => ColProp.Or
inductive QueryStep
| table (n : String)
| select (l : List Col)
| filter (cp : ColProp)
| joinₗ (t : String) (on : List Col)
structure Query where
steps : List QueryStep
deriving Inhabited
constant table (n : String) : Query :=
⟨[QueryStep.table n]⟩
namespace Query
constant select (q : Query) (l : List Col) : Query :=
⟨q.steps.concat (QueryStep.select l)⟩
constant filter (q : Query) (cp : ColProp) : Query :=
⟨q.steps.concat (QueryStep.filter cp)⟩
constant joinₗ (q : Query) (t : String) (on : List Col) : Query :=
⟨q.steps.concat (QueryStep.joinₗ t on)⟩
end Query
#check (((table "cars").select [`id, `name]).joinₗ "prices" [`id]).filter (`price ≠e 500)
-- Query.filter (Query.joinₗ (Query.select (table "cars") [`id, `name]) "prices" [`id]) (`price ≠e 500) : Query
Two things:
- Is it possible to avoid the need for this process of going back and adding parentheses everytime I want to add a new transformation?
- I'd like the
joinₗ
query step to receive anotherQuery
in the first parameter, but I got into some kind of definition chicken/egg race. Is it possible to avoid this problem?
Mario Carneiro (Nov 25 2021 at 16:53):
you can use mutual inductive
, although you will have to convert the structure to an inductive like inductive Query | mk (steps : List QueryStep)
Arthur Paulino (Nov 25 2021 at 16:57):
Thanks! Let me try that :thinking:
Arthur Paulino (Nov 25 2021 at 17:00):
Is this updated? It's a bit hard to read
Mario Carneiro (Nov 25 2021 at 17:03):
doesn't look like it
Mario Carneiro (Nov 25 2021 at 17:04):
the cliff notes version is that to write a mutual inductive you say
mutual
inductive A
| mk : B -> A
inductive B
| mk : A -> B
end
Arthur Paulino (Nov 25 2021 at 17:08):
Thanks!! This is looking clean:
mutual
inductive Query
| mk (steps : List QueryStep)
inductive QueryStep
| table (n : String)
| select (l : List Col)
| filter (cp : ColProp)
| joinₗ (q : Query) (on : List Col)
end
Arthur Paulino (Nov 25 2021 at 17:16):
But how do I access steps
of a certain q : Query
? q.steps
is causing
invalid field 'steps', the environment does not contain 'Query.steps'
q
has type
Query
Henrik Böving (Nov 25 2021 at 17:32):
Since its an inductive now you would pattern match on it, you could also define a utility function like
namespace Query
def steps : Query → List QueryStep
| mk steps => steps
end Query
Arthur Paulino (Nov 25 2021 at 17:38):
It's looking good!
#check table "cars" ↠ select [`id, `name] ↠ joinₗ (table "prices") [`id]
-- table "cars" ↠ select [`id, `name] ↠ joinₗ (table "prices") [`id] : Query
Henrik Böving (Nov 25 2021 at 17:40):
If you are feeling particulary fancy you could maybe even define join with \bowtie like in relational algebra (https://en.wikipedia.org/wiki/Relational_algebra#Natural_join_(%E2%8B%88)) which would spare you the need of typing ↠ joinₗ every time you want to join i guess.
Arthur Paulino (Nov 25 2021 at 17:48):
Hm, I was planning to use joinᵢ joinₗ joinᵣ joinₒ
to differentiate inner, left, right and outer joins. If I use ⋈
I think it will be a bit hard to remember the code for all variations
Arthur Paulino (Nov 25 2021 at 17:51):
Or I could just accept a third parameter (how
) like most APIs do :thinking:
Mac (Nov 25 2021 at 21:46):
@Arthur Paulino fyi, you could avoid the mutual
and use a structure by doing the following:
structure BaseQuery (Step : Type) where
steps : List Step
inductive QueryStep
| table (n : String)
| select (l : List Col)
| filter (cp : ColProp)
| joinₗ (q : BaseQuery QueryStep) (on : List Col)
abbrev Query := BaseQuery QueryStep
Arthur Paulino (Nov 28 2021 at 03:20):
Small victory today:
import Ffi
def main : IO Unit := do
let mysql ← MySQL.mk
mysql.login "localhost" "root" "root"
mysql.useDB "test_db"
mysql.querySQL "select car.id, car.name, price.price from car join price on car.id = price.id"
IO.println <| mysql.getQueryResult.toString
Outputs:
id|name|price|
1|'A'|3|
2|'B'|2|
3|'C'|1|
Arthur Paulino (Nov 28 2021 at 12:40):
@Tomas Skrivan Wanna join forces? This work of yours is closely related. My representation of a query result is the bottleneck of my pipeline.
My first approach, just to get the funnel working, was to return the query result as a beautifully formatted string and parse it in Lean. But something more similar to numpy
should be way faster
Tomas Skrivan (Nov 28 2021 at 13:00):
@Arthur Paulino I do not see the immediate relation between your project and mine, could be due to my complete ignorance of databases. I have no clue what SQL is, what exactly database is and so on.
The connection I can see is that I want to work with tensors/multidimensional arrays and you work with databases that are also multidimensional arrays of sorts. Not sure if there is a big overlap of what we want to do with these objects. Probably a common and hard one is slicing.
Apart of that, right now I do not need too fancy stuff with tensors and my current focus is still on automatic/symbolic differentiation.
Arthur Paulino (Nov 28 2021 at 13:01):
How are you storing data?
Tomas Skrivan (Nov 28 2021 at 13:06):
That is currently not a big concert to me, but I will last long with FloatArray
which I just wrap into column major matrix/tensor. I do not see the need for anything much more complicated in a near future. In a long run, I see the need for writing wrappers for Eigen matrices, Cuda arrays and OpenVDB(sparse voxel grid).
Arthur Paulino (Nov 28 2021 at 13:09):
Indeed, I need something more versatile for the context of query results. Something that supports different types (including strings) and the presence of NULL values (or absence of values)
Tomas Skrivan (Nov 28 2021 at 13:09):
First, I want to stabilize the interface on top of which all of this will be build. That is where I believe Lean can shine and if I do not manage to get a nice, versatile and powerful interface then there is no point in my project. So currently I'm really not concerned with run-time performance, it is still in the back of my mind but not in focus.
Tomas Skrivan (Nov 28 2021 at 13:12):
Yeah, I think our needs are very different. Looking at other libraries, is there a library that is advertised as database library and numerical linear algebra package? I'm not aware of it, the uses are way too different.
Arthur Paulino (Nov 28 2021 at 13:15):
In Python, numpy
is used to implement pandas
, which is largely used in pyspark
to return results of queries. But numpy
was primarily a package for heavy matricial operations
Tomas Skrivan (Nov 28 2021 at 13:23):
Ahh ok, that just reveals my complete lack of understanding databases.
Tomas Skrivan (Nov 28 2021 at 13:29):
In my recent thread , I have discussed interface for multi dimensional arrays. @Kyle Miller shared some interesting ideas how to approach it. I will be playing around with it a bit and maybe coming up with something that might be of use to you.
Arthur Paulino (Nov 28 2021 at 23:54):
Expanded the API:
def main : IO Unit := do
let mysql ← MySQL.mk
IO.println $ mysql.version -- queries mysql version
mysql.login "localhost" "root" "root"
mysql.close -- closes connection with mysql
mysql.login "localhost" "root" "root"
mysql.createDB "test_db" -- create database
mysql.dropDB "test_db" -- drop database
mysql.createDB "test_db" -- create it again
mysql.useDB "test_db" -- use database
mysql.createTable "job" [
⟨"id", "INT PRIMARY KEY"⟩,
⟨"name", "VARCHAR(255)"⟩
]
mysql.insertIntoTable "job" [1, "Computer Scientist"]
mysql.insertIntoTable "job" [2, "Mathematician"]
mysql.createTable "person" [
⟨"id", "INT PRIMARY KEY"⟩,
⟨"name", "VARCHAR(255)"⟩,
⟨"age", "INT"⟩,
⟨"job_id", "INT"⟩
]
-- inserting data
mysql.insertIntoTable "person" [1, "Alice", 20, 1]
mysql.insertIntoTable "person" [2, "Bod", 21, 2]
mysql.insertIntoTable "person" [3, "Craig", 22, NULL]
mysql.querySQL ("select person.name, person.age, job.name " ++
"from person left join job on person.job_id = job.id")
mysql.close
-- connection is closed, but the query result is still available in memory
IO.println $ mysql.getQueryResult
Prints out:
8.0.27
name|age|name|
'Alice'|20|'Computer Scientist'|
'Bod'|21|'Mathematician'|
'Craig'|22|NULL|
Last updated: Dec 20 2023 at 11:08 UTC