Zulip Chat Archive

Stream: Equational

Topic: Network insights


Michael Bucko (Oct 23 2024 at 10:50):

I calculated the average degree in the graph (free Neo4j instance, approx. 10k nodes):

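MATCH (n)
OPTIONAL MATCH (n)--(m)
// COUNT(m) skips the null row OPTIONAL MATCH yields for a node with no
// relationships, so isolated nodes get degree 0 (COUNT(*) would give them 1)
WITH n, COUNT(m) AS degree
WITH avg(degree) AS avgDegree
RETURN avgDegree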

It's 24.
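If you don't have a Neo4j instance, you can sketch the same average-degree computation in Python on a plain edge list. This is a minimal sketch under a few assumptions: the graph is undirected, it fits in memory as a list of (u, v) pairs, and the function names and toy data are hypothetical. Isolated nodes keep degree 0. Every edge adds 1 to each of its endpoints, so the average degree always equals 2|E| / |V|.

```python
from collections import Counter


def degrees(nodes, edges):
    """Undirected degree of every node; isolated nodes keep degree 0."""
    deg = Counter({n: 0 for n in nodes})
    for u, v in edges:
        deg[u] += 1  # each edge adds 1 to both endpoints,
        deg[v] += 1  # so sum(deg.values()) == 2 * len(edges)
    return deg


def average_degree(nodes, edges):
    deg = degrees(nodes, edges)
    return sum(deg.values()) / len(deg)


# Hypothetical toy graph: a triangle a-b-c plus an isolated node d.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("b", "c")]
print(average_degree(nodes, edges))  # 1.5
```

On the toy graph, d has no edges, so the average is 6 / 4 = 1.5.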

Then I found the nodes with more than twice the average degree (53 in that dataset; it's a good approximation, but the free instance may have limitations):

MATCH (n)
OPTIONAL MATCH (n)--(m)
WITH n, COUNT(m) AS degree
WITH avg(degree) AS avgDegree

MATCH (n)
OPTIONAL MATCH (n)--(m)
WITH n, COUNT(m) AS degree, avgDegree
WHERE degree > 2 * avgDegree
RETURN n, degree
ORDER BY degree DESC
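The "more than twice the average" filter can be sketched the same way. This is again a hypothetical helper working on an in-memory undirected edge list. `factor` is an assumed parameter that makes the 2× threshold adjustable. As with `ORDER BY degree DESC`, the result is sorted by degree in descending order.

```python
from collections import Counter


def high_degree_nodes(nodes, edges, factor=2.0):
    """(node, degree) pairs whose degree exceeds factor * average degree,
    sorted by degree, highest first."""
    deg = Counter({n: 0 for n in nodes})
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    avg = sum(deg.values()) / len(deg)
    hits = [(n, d) for n, d in deg.items() if d > factor * avg]
    return sorted(hits, key=lambda nd: nd[1], reverse=True)


# Hypothetical star graph: hub "h" linked to five leaves.
nodes = ["h", "l1", "l2", "l3", "l4", "l5"]
edges = [("h", f"l{i}") for i in range(1, 6)]
print(high_degree_nodes(nodes, edges))  # [('h', 5)]
```

Here the average degree is 10 / 6 ≈ 1.67. The threshold is therefore about 3.33, and only the hub exceeds it.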

Then I used the standard deviation and found the nodes whose degree is more than two standard deviations from the mean (36):

MATCH (n)
OPTIONAL MATCH (n)--(m)
WITH n, COUNT(m) AS degree
WITH avg(degree) AS mean, stdev(degree) AS sd

MATCH (n)
OPTIONAL MATCH (n)--(m)
WITH n, COUNT(m) AS degree, mean, sd
WHERE abs(degree - mean) > 2 * sd
RETURN n, degree, mean, sd
ORDER BY degree DESC
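Here is a corresponding sketch of the two-standard-deviation filter, under the same assumptions (hypothetical names, in-memory undirected edge list). It uses `statistics.stdev`, the sample (n - 1) standard deviation, which is what Cypher's `stDev()` computes. `statistics.pstdev` would give the population version, corresponding to `stDevP()`. The test uses `abs(...)`, so it flags unusually low-degree nodes as well as hubs.

```python
import statistics
from collections import Counter


def degree_outliers(nodes, edges, k=2.0):
    """Nodes whose degree lies more than k sample standard deviations
    from the mean degree; returns (hits, mean, sd) like the Cypher RETURN."""
    deg = Counter({n: 0 for n in nodes})
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    values = list(deg.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample stdev; needs at least two nodes
    hits = [(n, d) for n, d in deg.items() if abs(d - mean) > k * sd]
    return sorted(hits, key=lambda nd: nd[1], reverse=True), mean, sd


# Hypothetical star graph: hub "h" linked to ten leaves.
nodes = ["h"] + [f"l{i}" for i in range(10)]
edges = [("h", f"l{i}") for i in range(10)]
hits, mean, sd = degree_outliers(nodes, edges)
print(hits)  # [('h', 10)]
```

On this star, the mean degree is 20 / 11 ≈ 1.82 and the sample standard deviation is about 2.71. Only the hub (degree 10) falls outside two standard deviations.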

If we had a db, we could turn this into a single script that computes such insights and helps us better understand these networks.
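One possible shape for such a single script is sketched below in Python. It works on an in-memory edge list rather than a database, computes the degree statistics from the three queries above, and returns them together. The function name `network_report`, its dictionary keys, and the toy data are hypothetical.

```python
import statistics
from collections import Counter


def network_report(nodes, edges):
    """Summarize the degree statistics of an undirected graph in one dict."""
    deg = Counter({n: 0 for n in nodes})
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    values = list(deg.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values) if len(values) > 1 else 0.0
    return {
        "nodes": len(deg),
        "edges": len(edges),
        "avg_degree": mean,
        "stdev_degree": sd,
        "max_degree": max(values),
        "isolated": sum(1 for d in values if d == 0),
        "above_2x_avg": sorted(n for n, d in deg.items() if d > 2 * mean),
        "beyond_2sd": sorted(n for n, d in deg.items() if abs(d - mean) > 2 * sd),
    }


# Hypothetical toy graph: a star (hub "h", ten leaves) plus an isolated node "z".
nodes = ["h"] + [f"l{i}" for i in range(10)] + ["z"]
edges = [("h", f"l{i}") for i in range(10)]
report = network_report(nodes, edges)
print(report["beyond_2sd"])  # ['h']
```

In practice, an edge list exported from the database (or read from a CSV file) would replace the hard-coded lists.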

Eric Taucher (Oct 23 2024 at 11:05):

Michael Bucko said:

If we had a db

Many people do not think of Prolog as a database, and Prolog is known for being very slow at numeric math. However, Prolog is quite useful as a prototyping language: it can hold the information about the nodes and edges and generate the JSON that can then be displayed as visual graphs. I personally find Prolog the most expressive query language for such problems. SQL and Cypher are nice, but they have limits that I often run into. So when Prolog is used for prototyping, once the prototype works well, the application can often be rewritten with better tooling such as Python.
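The node/edge-to-JSON step described here does not depend on the language. The sketch below uses Python to keep the examples in one language; it is not the Prolog approach from the message. Two parts of it are assumptions rather than a format the thread specifies: the node-link layout with `nodes` and `links` keys (a shape common in D3 examples) and the extra `degree` field, which a viewer could use to size nodes.

```python
import json
from collections import Counter


def to_node_link_json(nodes, edges):
    """Serialize an undirected graph as node-link JSON for a graph viewer."""
    deg = Counter({n: 0 for n in nodes})
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    doc = {
        "nodes": [{"id": n, "degree": deg[n]} for n in nodes],
        "links": [{"source": u, "target": v} for u, v in edges],
    }
    return json.dumps(doc, indent=2)


# Hypothetical path graph a-b-c.
print(to_node_link_json(["a", "b", "c"], [("a", "b"), ("b", "c")]))
```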

Michael Bucko (Oct 23 2024 at 11:42):

I'm not concerned with any particular technology (there are many options available).
Instead, I'm trying to find the fastest possible path to insights, and I believe in Lean, ATPs, egg, and transformers.
In this case, the goal would simply be a script that extracts network-analysis and statistical information from the graph.


Last updated: May 02 2025 at 03:31 UTC