Zulip Chat Archive

Stream: Equational

Topic: Magma equations and fractal tree matching


Michael Bucko (Oct 28 2024 at 20:58):

I picked those three magmas and visualized them as fractal trees:

  • 24, "Green": implies many and is implied by 2 and 4
    E24

  • 1076, "Blue": implied only by 2, implies the (beautiful) 1020
    E1076

  • 2531, "Strange": implied only by 2, and implies the (beautiful) 2441
    E2531

Alex Meiburg (Oct 28 2024 at 21:43):

What's the relation here between the equation and the tree? Is this some image of the free magma with the law?

Michael Bucko (Oct 28 2024 at 21:53):

Alex Meiburg schrieb:

What's the relation here between the equation and the tree? Is this some image of the free magma with the law?

I just got the graph from Equation Explorer, generated heatmaps, and then asked the LLM to interpret them as fractal trees. So I don't have the exact (ML) features.

Well, saliency could be measured through relative changes. I.e., the biggest learning for me here is that E24 is somehow supersymmetrical, and E2531 is very different somehow (the right branch is turning, and it's quite heavy too; also, it's clear that there are 3 major clusters of branches).

Shreyas Srinivas (Oct 29 2024 at 01:05):

Okay, but does each node in the tree point to an equation? How is it acyclic? I thought we get a DAG.

Michael Bucko (Oct 29 2024 at 05:51):

Short: The fractal tree is only the LLM's way of seeing things based on the data. Because of how such NNs work, it's hard for me to know if each node in the tree points to an equation.

My intuition was: let's make fractals out of those magmas. And that's what I'm doing.

I start from what I, too, think is a DAG from our tool:

Graph

Then get a heatmap.

Heatmap
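(For concreteness, a heatmap like this can be drawn straight from a magma's operation table with matplotlib; the 3-element table below is invented purely for illustration, not taken from Equation Explorer.)

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical 3-element magma operation table (rows: x, cols: y, entry: x ◇ y).
op_table = np.array([
    [0, 2, 1],
    [2, 1, 0],
    [1, 0, 2],
])

fig, ax = plt.subplots()
im = ax.imshow(op_table, cmap='viridis')  # one colored cell per (x, y) pair
ax.set_xlabel('y')
ax.set_ylabel('x')
fig.colorbar(im, ax=ax, label='x ◇ y')
ax.set_title('Heatmap of a (made-up) magma operation table')
plt.show()
```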

And then ask o1 to give me a tree. It essentially has the task of approximating and picking the parameters that "match". Only that. Plus, it's an LLM.

It picks the parameters for this function:

import matplotlib.pyplot as plt
import numpy as np

def draw_nested_magma_tree(ax, x, y, length, angle, depth, max_depth):
    if depth > max_depth:
        return

    new_x = x + length * np.cos(angle)
    new_y = y + length * np.sin(angle)

    ax.plot([x, new_x], [y, new_y], color='purple', lw=1.5)

    new_length = length * 0.7

    draw_nested_magma_tree(ax, new_x, new_y, new_length, angle + np.pi / 6, depth + 1, max_depth)  # First branch
    draw_nested_magma_tree(ax, new_x, new_y, new_length, angle - np.pi / 3, depth + 1, max_depth)  # Second branch
    draw_nested_magma_tree(ax, new_x, new_y, new_length, angle + np.pi / 12, depth + 1, max_depth)  # Additional branch for complexity

fig, ax = plt.subplots(figsize=(10, 10))
ax.set_axis_off()
ax.set_aspect('equal')

initial_x = 0
initial_y = 0
initial_length = 1.0
initial_angle = np.pi / 2
max_depth = 7

# Draw the fractal tree for the nested magma operation
draw_nested_magma_tree(ax, initial_x, initial_y, initial_length, initial_angle, 0, max_depth)

plt.title('Fractal Magma Tree for Nested Operation x = (y ◇ ((y ◇ x) ◇ x)) ◇ y')
plt.show()

I think what the LLM does internally is turn the abstract syntax tree (AST) into what we see. But I don't know for sure.
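(One way to make the AST idea concrete, though it's only a sketch of what o1 might be doing, not a claim about its internals: parse the equation's right-hand side into a binary tree and let its depth drive the fractal's recursion depth. The toy parser below uses `*` in place of ◇.)

```python
def parse(expr):
    """Parse a fully parenthesized magma term like '(y*((y*x)*x))*y'
    into nested tuples; single letters are leaf variables."""
    expr = expr.replace(' ', '')
    pos = 0

    def term():
        nonlocal pos
        if expr[pos] == '(':
            pos += 1            # consume '('
            left = term()
            pos += 1            # consume '*'
            right = term()
            pos += 1            # consume ')'
            return (left, right)
        leaf = expr[pos]
        pos += 1
        return leaf

    left = term()
    if pos < len(expr) and expr[pos] == '*':
        pos += 1
        return (left, term())
    return left

def depth(t):
    """Nesting depth of the term: 0 for a variable, 1 + max over children."""
    if isinstance(t, str):
        return 0
    return 1 + max(depth(t[0]), depth(t[1]))

tree = parse('(y*((y*x)*x))*y')
print(tree)         # (('y', (('y', 'x'), 'x')), 'y')
print(depth(tree))  # 4
```

The resulting depth (4 here) could then be fed to `draw_nested_magma_tree` as `max_depth`, tying at least one fractal parameter to the equation itself rather than to the LLM's taste.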

The LLM assistant sees it like this:

Interpretation

Amir Livne Bar-on (Oct 29 2024 at 06:51):

If it's o1 there should be some justification for the parameters, right? Otherwise it might (likely IMO) just emit some values that would make pretty logos, without a relation to the operation table.

Michael Bucko (Oct 29 2024 at 06:54):

Amir Livne Bar-on schrieb:

If it's o1 there should be some justification for the parameters, right? Otherwise it might (likely IMO) just emit some values that would make pretty logos, without a relation to the operation table.

So it's graph -> heatmap -> fractal. It has lots of information. As I mentioned, I think it probably thinks about the AST (it's one of the main factors).

Asked O1,

  • The angles and length reduction reflect the asymmetry and complexity of the magma operation, with each branch representing a recursive call or sub-operation in the equation.
  • The initial length and depth capture the overall scale of the operation and provide a sense of diminishing influence as recursion proceeds, which mirrors the depth and complexity of nested computations in the given equation.
  • The asymmetry in angles was inspired by the heatmap and DAG, which demonstrated non-linear, varied results for different inputs. This motivates a fractal pattern that is not uniformly symmetric but instead exhibits complexity and variability, much like the magma operation itself.
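(A more grounded, if simplistic, way to tie branch angles to the equation rather than leaving the choice to the LLM: split a fixed angular spread between the two subterms in proportion to their sizes. This is one illustrative rule I'm proposing here, not o1's actual procedure.)

```python
import numpy as np

def size(t):
    """Number of variable leaves in a nested-tuple term."""
    if isinstance(t, str):
        return 1
    return size(t[0]) + size(t[1])

def branch_angles(t, spread=np.pi / 3):
    """Split a total angular spread between the left and right subterms
    in proportion to their leaf counts (illustrative rule only)."""
    left, right = size(t[0]), size(t[1])
    total = left + right
    return spread * left / total, -spread * right / total

# The term (y ◇ ((y ◇ x) ◇ x)) ◇ y, written as nested tuples:
term = (('y', (('y', 'x'), 'x')), 'y')
print(branch_angles(term))  # heavy left subterm -> wide left angle
```

With this rule the asymmetry of the drawing follows directly from the asymmetry of the term, which would make the "branch heaviness" observations checkable.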

Michael Bucko (Oct 29 2024 at 14:50):

1076 - an Asterix-like equation, a variant of 65, looks really weird too. I used the same color as in 2531 (implies 2441), because in some ways they are similar, and in some ways very different.
1076

Shreyas Srinivas (Oct 29 2024 at 14:52):

I still find it hard to make sense of how this relates to the equation graph.

Shreyas Srinivas (Oct 29 2024 at 14:53):

For example, what does "overall scale of the operation" mean?

Shreyas Srinivas (Oct 29 2024 at 14:53):

And what proof do I have that the nodes indeed correspond to equations?

Shreyas Srinivas (Oct 29 2024 at 14:54):

If I present this at, say, a scientific venue, how do I establish the correspondence between this visual and the equation graph?

Michael Bucko (Oct 29 2024 at 14:56):

Asked O1,

the "overall scale of the operation" refers to how the magnitude or size of the branches evolves as we recursively apply the operation and draw the tree structure.
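("Overall scale" does have one concrete reading in the posted code: with the 0.7 shrink factor, branch lengths along any root-to-tip path form a geometric series, so the total drawn length is bounded regardless of depth.)

```python
# With the 0.7 shrink factor from the posted code, branch lengths along any
# root-to-tip path are 1, 0.7, 0.7**2, ... -- a geometric series.
initial_length, ratio, max_depth = 1.0, 0.7, 7

path_length = sum(initial_length * ratio**k for k in range(max_depth + 1))
limit = initial_length / (1 - ratio)  # infinite-depth limit of the series

print(round(path_length, 4))  # 3.1412
print(round(limit, 4))        # 3.3333
```

So "diminishing influence as recursion proceeds" is literally true of the geometry: each extra level adds at most 30% of what remains to the tree's extent.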

Michael Bucko (Oct 29 2024 at 15:00):

I don't think we have a proof that there's a simple correspondence. It's like this:

LLM(graph, heatmap, insights) generating fractals

For me, it's a means to visualize certain patterns from the <graph,heatmap> perspective using a language model. I am basically trying to learn some hidden similarities by working with language models and code.

Amir Livne Bar-on (Oct 29 2024 at 15:15):

When you write 'insights' inside the parens, does it mean that they were in the input buffer when the 'fractals' were output? That is, were they part of a chain of thought leading to the coefficients?

Michael Bucko (Oct 29 2024 at 15:53):

Amir Livne Bar-on schrieb:

When you write 'insights' inside the parens, does it mean that they were in the input buffer when the 'fractals' were output? That is, were they part of a chain of thought leading to the coefficients?

Yes. The input is a bit like (oversimplified) Transformer([graph, heatmap, insights, prompt]) -> Fractal. (Not only Fractal, since the context is known and can be used for further investigation.)

Yes, CoT is the way to break a task into sub-tasks. So when the task is known, intermediate steps can be generated for that task.

Other lower-level things involve: token-level analysis, CoT (as you mentioned), attention (incl. attention maps), the model's internal representation of concepts, the story (incl. the order of statements), and so on. Currently, it feels like talking to an assistant, but eventually it's gonna feel like Legos that are quite intelligent, can move, are proactive, and use intelligent tactics.

I'm gonna write a short blog post about how intelligent tactics (that involve LLMs, type matching, rules, preferences, normalization strategies, and so on) and not-so-easy-to-interpret LLMs can lead to a new, exciting way of doing mathematics.

Michael Bucko (Oct 29 2024 at 15:57):

I actually asked O1 how it made certain decisions around the final transformation, and it shared some insights (posted in one of the previous posts). They are quite general, but I guess we'll need to embrace AI soon.

Terence Tao (Oct 29 2024 at 16:04):

Experimental applications of AI tools can be fun and interesting, but ultimately to have scientific value, they are going to have to produce output that can be externally verified in some replicable fashion. For instance, if we had some pre-existing classification of equations into different types, and were able to use these AI-generated images to improve the accuracy of predicting this classification through some objective scientific experiment, that would be a reportable data point.

Michael Bucko (Oct 29 2024 at 16:25):

Absolutely. For now, it's only an experiment. I'll generate more fractals like this, try to cluster (categorize) them, and then see if I can get the LLM to explain the features it uses to come up with the hyperparameters.
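(The clustering step could be done on the fractal parameter vectors themselves, e.g. (angle1, angle2, shrink factor), without asking the LLM to explain anything. The vectors below are made up for illustration; a tiny k-means keeps the sketch dependency-free.)

```python
import numpy as np

# Made-up fractal parameter vectors (angle1, angle2, shrink factor) for four
# equations -- purely illustrative, not real Equation Explorer output.
params = np.array([
    [0.52, -1.05, 0.70],
    [0.50, -1.00, 0.72],
    [0.26, -0.26, 0.60],
    [0.25, -0.30, 0.58],
])

def kmeans(X, k, iters=50, seed=0):
    """Minimal k-means: enough to group parameter vectors by similarity."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest center
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(axis=2), axis=1)
        # move each center to the mean of its assigned points
        centers = np.array([X[labels == i].mean(axis=0) for i in range(k)])
    return labels

print(kmeans(params, 2))
```

The interesting test would then be whether such clusters line up with any pre-existing classification of the equations.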

Terence Tao (Oct 29 2024 at 16:43):

I think using AI to explain AI-generated categories is not grounded enough. One has to demonstrate that AI tools can say something non-trivial about pre-existing categories of interest that were not generated through AI tools.

Michael Bucko (Oct 29 2024 at 16:56):

Terence Tao schrieb:

One has to demonstrate that AI tools can say something non-trivial about pre-existing categories of interest that were not generated through AI tools.

That's a better (bigger) challenge!

Here we have multiple options (that I can think of):

  • @Adam Topaz 's pretrained transformer,
  • a fine-tune of an existing LLM (for example, 4o-mini requires chat-formatted data, so we'd need to work on the dataset), esp. a fine-tune of theorem-proving models, like DeepSeek-Prover
  • experiments with LeanDojo
  • experiments with diffusion models for proof generation (compute would be necessary here)
  • experiments with RL agents (inspired by You)

Failed so far:

  • my DeepSeek attempts to generate something non-trivial about pre-existing categories
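(For the 4o-mini fine-tune route, the chat-formatted dataset would be JSONL along these lines. The prompt, label, and filename here are invented for illustration; the real records would come from the project's equation data.)

```python
import json

# Invented example record in the chat format a fine-tune expects; the
# assistant label "implies 1020" is a placeholder, not verified data.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You classify magma equations."},
            {"role": "user", "content": "x = y \u25c7 ((y \u25c7 x) \u25c7 x)"},
            {"role": "assistant", "content": "implies 1020"},
        ]
    }
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```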

Michael Bucko (Oct 29 2024 at 17:54):

As for the pretrained model, we'd also be able to:

  • create a CLI tool for other SW attempts (so that anyone can benefit from the tool in their research)
  • create an HF space

Adam Topaz (Oct 29 2024 at 17:56):

I should clarify that one shouldn't think of "pretraining" here in the same way as one does for LLMs. It just provides a good initialization of the parameters for the classification step. In fact, it seems that this is not even really necessary, as training the classification model from scratch ends up with essentially the same accuracy (around 98%).

Michael Bucko (Oct 29 2024 at 17:57):

Btw, this is from o1, for reference. That's how it currently sees the categories.

Categories Derived for the Equations

  1. Basic Operations (Simple Identity and Commutativity)

    • Equations: Equation1 to Equation2

    • Description: These are the simplest forms of operations where either a variable is set to itself or another variable. This category includes:

      • Identity: x=x
      • Assignment: x=y

    • Category Name: Identity and Simple Assignment

  2. Binary Operations (Single Level of ◇)

    • Equations: Equation3 to Equation7

    • Description: These equations involve a single use of the operation ◇. This is the fundamental binary operation between two elements. Examples include:

      • Self-operation: x=x◇x
      • Cross-operation: x=x◇y, x=y◇z

    • Category Name: Basic Binary Magma Operations

  3. Nested Operations with Two Levels (Double ◇)

    • Equations: Equation8 to Equation22

    • Description: These equations introduce a second level of nesting. The operation ◇ is applied twice, and the result of one binary operation is used in another. This category includes:

      • Left Nested: x=x◇(x◇y)
      • Right Nested: x=y◇(z◇w)

    • Category Name: Double-Nested Magma Operations

  4. Associative-Like Chains (Triple or More Levels of ◇)

    • Equations: Equation23 to Equation47

    • Description: These are more complex operations where multiple levels of ◇ are chained together. This mimics associativity-like properties by forming longer chains of operations. Examples include:

      • Repeated Self-Nesting: x=(x◇x)◇x
      • Mixed Terms: x=(x◇y)◇z

    • Category Name: Associative-Like Chained Operations

  5. Balanced and Imbalanced Multilevel Operations

    • Equations: Equation48 to Equation150

    • Description: In these equations, the operations are multilevel and involve several nested terms that create either balanced or imbalanced tree-like structures. Examples include:

      • Balanced Nesting: x=x◇(x◇(y◇y))
      • Imbalanced Growth: x=y◇(y◇(z◇w))

    • Category Name: Multilevel Magma Trees

  6. Symmetric and Anti-Symmetric Relations

    • Equations: Equation151 to Equation200

    • Description: These operations explore symmetric and anti-symmetric relationships between terms using the ◇ operation. Examples include:

      • Symmetric Self-Combination: x◇x=y◇x
      • Cross-Term Relations: x◇y=z◇y

    • Category Name: Symmetric and Anti-Symmetric Relations

  7. Deeply Nested and Complex Dependencies

    • Equations: From Equation201 onwards (e.g., Equation300, Equation500, etc.)

    • Description: These equations are deeply nested with a high level of complexity and dependency between the terms. They often form intricate relationships involving multiple different variables and multiple uses of ◇.

    • These types of equations typically correspond to the deeper parts of the fractal magma tree visualizations.

    • Category Name: Deep Nested Dependencies and Complex Structures
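(The depth criterion behind these categories can at least be checked mechanically. The sketch below counts ◇ occurrences and parenthesis nesting depth in an equation's right-hand side; it only formalizes the depth cut-offs, and says nothing about whether o1's Equation-number ranges are correct.)

```python
def stats(rhs):
    """Count ◇ occurrences and max parenthesis nesting depth
    in the right-hand side of a magma equation."""
    ops = rhs.count('\u25c7')  # '◇'
    depth = cur = 0
    for ch in rhs:
        if ch == '(':
            cur += 1
            depth = max(depth, cur)
        elif ch == ')':
            cur -= 1
    return ops, depth

# Examples matching the categories above:
print(stats('x'))            # (0, 0) -> category 1
print(stats('x ◇ y'))        # (1, 0) -> category 2
print(stats('x ◇ (x ◇ y)'))  # (2, 1) -> category 3
```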


Last updated: May 02 2025 at 03:31 UTC