Zulip Chat Archive
Stream: new members
Topic: alok singh
Alok Singh (Mar 24 2025 at 00:39):
just a log for my lean work to start
Alok Singh (Mar 24 2025 at 00:40):
never thought i'd say it but so far i like lakefile.lean > lakefile.toml bc AI agents can use the type info to fix their broken builds
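For reference, a minimal lakefile.lean might look like the sketch below (the package and library names are placeholders, not from an actual project). Since it is ordinary Lean, a mistyped field or target shows up as a type/elaboration error with a location, which is the kind of feedback an agent can act on.

```lean
import Lake
open Lake DSL

-- Placeholder package/library names, for illustration only.
package llm

@[default_target]
lean_lib LLM
```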
Alok Singh (Mar 24 2025 at 07:42):
deriving Repr can be slow
/-!
# GPT2 Types
This module defines the core types and structures for the GPT-2 model,
ported from the C implementation in llm.c by Andrej Karpathy.
-/
namespace LLM.GPT2
/--
Configuration for a GPT-2 model.
-/
structure Config where
  /-- Max sequence length, e.g. 1024 -/
  maxSeqLen : Nat
  /-- Vocab size, e.g. 50257 -/
  vocabSize : Nat
  /-- Padded vocab size, e.g. 50304 -/
  paddedVocabSize : Nat
  /-- Number of layers, e.g. 12 -/
  numLayers : Nat
  /-- Number of heads in attention, e.g. 12 -/
  numHeads : Nat
  /-- Number of channels, e.g. 768 -/
  channels : Nat
  deriving Repr
/--
Parameter tensors for a GPT-2 model.
-/
structure ParameterTensors where
  /-- Token embeddings. Shape: `(V, C)` -/
  wte : Array Float
  /-- Position embeddings. Shape: `(maxT, C)` -/
  wpe : Array Float
  /-- Layer norm 1 weights. Shape: `(L, C)` -/
  ln1w : Array Float
  /-- Layer norm 1 biases. Shape: `(L, C)` -/
  ln1b : Array Float
  /-- QKV projection weights. Shape: `(L, 3*C, C)` -/
  qkvw : Array Float
  /-- QKV projection biases. Shape: `(L, 3*C)` -/
  qkvb : Array Float
  /-- Attention projection weights. Shape: `(L, C, C)` -/
  attprojw : Array Float
  /-- Attention projection biases. Shape: `(L, C)` -/
  attprojb : Array Float
  /-- Layer norm 2 weights. Shape: `(L, C)` -/
  ln2w : Array Float
  /-- Layer norm 2 biases. Shape: `(L, C)` -/
  ln2b : Array Float
  /-- Fully connected weights. Shape: `(L, 4*C, C)` -/
  fcw : Array Float
  /-- Fully connected biases. Shape: `(L, 4*C)` -/
  fcb : Array Float
  /-- Fully connected projection weights. Shape: `(L, C, 4*C)` -/
  fcprojw : Array Float
  /-- Fully connected projection biases. Shape: `(L, C)` -/
  fcprojb : Array Float
  /-- Final layer norm weights. Shape: `(C)` -/
  lnfw : Array Float
  /-- Final layer norm biases. Shape: `(C)` -/
  lnfb : Array Float
  deriving Repr
/--
Activation tensors for a forward pass of the GPT-2 model.
-/
structure ActivationTensors where
  /-- Encoded tokens. Shape: `(B, T, C)` -/
  encoded : Array Float
  /-- Layer norm 1. Shape: `(L, B, T, C)` -/
  ln1 : Array Float
  /-- Layer norm 1 mean. Shape: `(L, B, T)` -/
  ln1Mean : Array Float
  /-- Layer norm 1 rstd. Shape: `(L, B, T)` -/
  ln1Rstd : Array Float
  /-- QKV. Shape: `(L, B, T, 3*C)` -/
  qkv : Array Float
  /-- Attention scores before softmax. Shape: `(L, B, NH, T, T)` -/
  preatt : Array Float
  /-- Attention weights after softmax. Shape: `(L, B, NH, T, T)` -/
  att : Array Float
  /-- Attention projection. Shape: `(L, B, T, C)` -/
  attproj : Array Float
  /-- Residual 2. Shape: `(L, B, T, C)` -/
  residual2 : Array Float
  /-- Layer norm 2. Shape: `(L, B, T, C)` -/
  ln2 : Array Float
  /-- Layer norm 2 mean. Shape: `(L, B, T)` -/
  ln2Mean : Array Float
  /-- Layer norm 2 rstd. Shape: `(L, B, T)` -/
  ln2Rstd : Array Float
  /-- Fully connected. Shape: `(L, B, T, 4*C)` -/
  fch : Array Float
  /-- Fully connected gelu. Shape: `(L, B, T, 4*C)` -/
  fchGelu : Array Float
  /-- Fully connected projection. Shape: `(L, B, T, C)` -/
  fcproj : Array Float
  /-- Residual 3. Shape: `(L, B, T, C)` -/
  residual3 : Array Float
  /-- Final layer norm. Shape: `(B, T, C)` -/
  lnf : Array Float
  /-- Final layer norm mean. Shape: `(B, T)` -/
  lnfMean : Array Float
  /-- Final layer norm rstd. Shape: `(B, T)` -/
  lnfRstd : Array Float
  /-- Logits. Shape: `(B, T, V)` -/
  logits : Array Float
  /-- Probabilities. Shape: `(B, T, V)` -/
  probs : Array Float
  /-- Losses. Shape: `(B, T)` -/
  losses : Array Float
  deriving Repr
/--
The full GPT-2 model including configuration, parameters, activations, and gradients.
-/
structure GPT2 where
  /-- Model configuration. -/
  config : Config
  /-- Model parameters. -/
  params : ParameterTensors
  /-- Sizes of each parameter tensor. -/
  paramSizes : Array Nat
  /-- Contiguous memory for all parameters. -/
  paramsMemory : Array Float
  /-- Total number of parameters. -/
  numParameters : Nat
  /-- Gradients of parameters. -/
  grads : ParameterTensors
  /-- Contiguous memory for all gradients. -/
  gradsMemory : Array Float
  /-- `AdamW` optimizer `m` buffer. -/
  mMemory : Array Float
  /-- `AdamW` optimizer `v` buffer. -/
  vMemory : Array Float
  /-- Model activations. -/
  acts : ActivationTensors
  /-- Sizes of each activation tensor. -/
  actSizes : Array Nat
  /-- Contiguous memory for all activations. -/
  actsMemory : Array Float
  /-- Total number of activations. -/
  numActivations : Nat
  /-- Gradients of activations. -/
  gradsActs : ActivationTensors
  /-- Contiguous memory for all activation gradients. -/
  gradsActsMemory : Array Float
  /-- Batch size (B) of current forward pass. -/
  batchSize : Nat
  /-- Sequence length (T) of current forward pass. -/
  seqLen : Nat
  /-- Input tokens for current forward pass. -/
  inputs : Array Int
  /-- Target tokens for current forward pass. -/
  targets : Array Int
  /-- Mean loss after forward pass with targets. -/
  meanLoss : Float
  deriving Repr
end LLM.GPT2
If I comment out the deriving clauses this loads in the infoview instantly, but with deriving it takes long enough for me to type this out
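One workaround, as a rough sketch against the structures above: keep `deriving Repr` on the small `Config`, drop it from the large tensor structures, and hand-write a cheap summary instance instead (the formatting below is made up for illustration). `set_option profiler true` can confirm where the elaboration time is going.

```lean
-- Sketch: instead of `deriving Repr` on ParameterTensors, report only a few
-- tensor sizes. This keeps #eval/infoview output useful while avoiding the
-- cost of deriving a full structural Repr for very wide structures.
instance : Repr ParameterTensors where
  reprPrec p _ :=
    .text s!"ParameterTensors (wte: {p.wte.size} floats, wpe: {p.wpe.size} floats, ...)"
```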
Alok Singh (Mar 27 2025 at 07:51):
no syntax highlighting for numeric literals with underscores in them
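For reference, the kind of literal in question; underscores are valid digit separators in Lean:

```lean
def vocabSize : Nat := 50_257        -- same value as 50257; the underscore is only a separator
def paddedVocabSize : Nat := 50_304
```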
Alok Singh (May 01 2025 at 19:11):
I feel missing `by`s should have a clearer error message than unknown identifier: 'tacticName'. A minor papercut for beginners
Isak Colboubrani (May 01 2025 at 20:51):
@Alok Singh Like in `example : 1 + 1 = 2 := simp`? I've thought about that minor paper cut too. Would the suggested logic be that if `x` in, say, `example : 1 + 1 = 2 := x` is a tactic, then a custom error message would be written? If so, what would an appropriate error message be?
Alok Singh (May 01 2025 at 22:04):
Checking namespaces to see if it's a tactic would be good. Maybe a good error message (with range info) would be: "unknown identifier (..). Hint: a tactic named `tacticName` was found. To use it, put it in a `by` block."
There's some burden in explaining what a `by` block is, but I think that's more manageable, and it could link to documentation about them.
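Concretely, a sketch of the papercut and of the fix the hint would point at:

```lean
-- What a beginner tends to write; today this fails with "unknown identifier 'simp'":
-- example : 1 + 1 = 2 := simp

-- What the proposed hint would steer them toward:
example : 1 + 1 = 2 := by simp
```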
Last updated: May 02 2025 at 03:31 UTC