Zulip Chat Archive

Stream: new members

Topic: alok singh

Alok Singh (Mar 24 2025 at 00:39):

just a log for my lean work to start

Alok Singh (Mar 24 2025 at 00:40):

never thought i'd say it but so far i like lakefile.lean > lakefile.toml bc AI agents can use the type info to fix their broken builds

Alok Singh (Mar 24 2025 at 07:42):

deriving Repr can be slow

/-!
# GPT2 Types

This module defines the core types and structures for the GPT-2 model,
ported from the C implementation in llm.c by Andrej Karpathy.
-/

namespace LLM.GPT2

/--
Configuration for a GPT-2 model.
-/
structure Config where
  /-- Max sequence length, e.g. 1024 -/
  maxSeqLen : Nat
  /-- Vocab size, e.g. 50257 -/
  vocabSize : Nat
  /-- Padded vocab size, e.g. 50304 -/
  paddedVocabSize : Nat
  /-- Number of layers, e.g. 12 -/
  numLayers : Nat
  /-- Number of heads in attention, e.g. 12 -/
  numHeads : Nat
  /-- Number of channels, e.g. 768 -/
  channels : Nat
  deriving Repr

/--
Parameter tensors for a GPT-2 model.
-/
structure ParameterTensors where
  /-- Token embeddings. Shape: `(V, C)` -/
  wte : Array Float
  /-- Position embeddings. Shape: `(maxT, C)` -/
  wpe : Array Float
  /-- Layer norm 1 weights. Shape: `(L, C)` -/
  ln1w : Array Float
  /-- Layer norm 1 biases. Shape: `(L, C)` -/
  ln1b : Array Float
  /-- QKV projection weights. Shape: `(L, 3*C, C)` -/
  qkvw : Array Float
  /-- QKV projection biases. Shape: `(L, 3*C)` -/
  qkvb : Array Float
  /-- Attention projection weights. Shape: `(L, C, C)` -/
  attprojw : Array Float
  /-- Attention projection biases. Shape: `(L, C)` -/
  attprojb : Array Float
  /-- Layer norm 2 weights. Shape: `(L, C)` -/
  ln2w : Array Float
  /-- Layer norm 2 biases. Shape: `(L, C)` -/
  ln2b : Array Float
  /-- Fully connected weights. Shape: `(L, 4*C, C)` -/
  fcw : Array Float
  /-- Fully connected biases. Shape: `(L, 4*C)` -/
  fcb : Array Float
  /-- Fully connected projection weights. Shape: `(L, C, 4*C)` -/
  fcprojw : Array Float
  /-- Fully connected projection biases. Shape: `(L, C)` -/
  fcprojb : Array Float
  /-- Final layer norm weights. Shape: `(C)` -/
  lnfw : Array Float
  /-- Final layer norm biases. Shape: `(C)` -/
  lnfb : Array Float
  deriving Repr

/--
Activation tensors for a forward pass of the GPT-2 model.
-/
structure ActivationTensors where
  /-- encoded tokens. Shape: `(B, T, C)` -/
  encoded : Array Float
  /-- layer norm 1: Shape: `(L, B, T, C)` -/
  ln1 : Array Float
  /-- Layer norm 1 mean. Shape: `(L, B, T)` -/
  ln1Mean : Array Float
  /-- Layer norm 1 rstd. Shape: `(L, B, T)` -/
  ln1Rstd : Array Float
  /-- QKV. Shape: `(L, B, T, 3*C)` -/
  qkv : Array Float
  /-- Attention. Shape: `(L, B, NH, T, T)` -/
  preatt : Array Float
  /-- Attention. Shape: `(L, B, NH, T, T)` -/
  att : Array Float
  /-- Attention projection. Shape: `(L, B, T, C)` -/
  attproj : Array Float
  /-- Residual 2. Shape: `(L, B, T, C)` -/
  residual2 : Array Float
  /-- Layer norm 2. Shape: `(L, B, T, C)` -/
  ln2 : Array Float
  /-- Layer norm 2 mean. Shape: `(L, B, T)` -/
  ln2Mean : Array Float
  /-- Layer norm 2 rstd. Shape: `(L, B, T)` -/
  ln2Rstd : Array Float
  /-- Fully connected. Shape: `(L, B, T, 4*C)` -/
  fch : Array Float
  /-- Fully connected gelu. Shape: `(L, B, T, 4*C)` -/
  fchGelu : Array Float
  /-- Fully connected projection. Shape: `(L, B, T, C)` -/
  fcproj : Array Float
  /-- Residual 3. Shape: `(L, B, T, C)` -/
  residual3 : Array Float
  /-- Final layer norm. Shape: `(B, T, C)` -/
  lnf : Array Float
  /-- Final layer norm mean. Shape: `(B, T)` -/
  lnfMean : Array Float
  /-- Final layer norm rstd. Shape: `(B, T)` -/
  lnfRstd : Array Float
  /-- Logits. Shape: `(B, T, V)` -/
  logits : Array Float
  /-- Probabilities. Shape: `(B, T, V)` -/
  probs : Array Float
  /-- Losses. Shape: `(B, T)` -/
  losses : Array Float
  deriving Repr

/--
The full GPT-2 model including configuration, parameters, activations, and gradients.
-/
structure GPT2 where
  /-- Model configuration. -/
  config : Config
  /-- Model parameters. -/
  params : ParameterTensors
  /-- Sizes of each parameter tensor. -/
  paramSizes : Array Nat
  /-- Contiguous memory for all parameters. -/
  paramsMemory : Array Float
  /-- Total number of parameters. -/
  numParameters : Nat
  /-- Gradients of parameters. -/
  grads : ParameterTensors
  /-- Contiguous memory for all gradients. -/
  gradsMemory : Array Float
  /-- `AdamW` optimizer `m` buffer. -/
  mMemory : Array Float
  /-- `AdamW` optimizer `v` buffer. -/
  vMemory : Array Float
  /-- Model activations. -/
  acts : ActivationTensors
  /-- Sizes of each activation tensor. -/
  actSizes : Array Nat
  /-- Contiguous memory for all activations. -/
  actsMemory : Array Float
  /-- Total number of activations. -/
  numActivations : Nat
  /-- Gradients of activations. -/
  gradsActs : ActivationTensors
  /-- Contiguous memory for all activation gradients. -/
  gradsActsMemory : Array Float
  /-- Batch size (B) of current forward pass. -/
  batchSize : Nat
  /-- sequence length (T) of current forward pass -/
  seqLen : Nat
  /-- Input tokens for current forward pass. -/
  inputs : Array Int
  /-- Target tokens for current forward pass. -/
  targets : Array Int
  /-- Mean loss after forward pass with targets. -/
  meanLoss : Float
  deriving Repr

end LLM.GPT2

If I comment out the deriving this loads in infoview instantly, but with deriving it's going long enough for me to type this out

Alok Singh (Mar 27 2025 at 07:51):

2025-03-27-00-51-29.png

no highlights for numerals with underscores in them

Alok Singh (May 01 2025 at 19:11):

I feel missing bys should have a clearer error message than unknown identifier: 'tacticName'. A minor papercut for beginners

Isak Colboubrani (May 01 2025 at 20:51):

@Alok Singh Like in example : 1 + 1 = 2 := simp? I've thought about that minor paper cut too. Would the suggested logic be that if x in say example : 1 + 1 = 2 := x is a tactic, then a custom error message would be written? If so, what would an appropriate error message be?

Alok Singh (May 01 2025 at 22:04):

Checking namespaces to see if it's a tactic would be good, maybe a good error message is (with range info) "unknown identifier (..). Hint: a tactic named tacticName was found. To use it, put it in a by block".

theres some burden of explaining what a by block is, but i think that's more manageable, and it could link to documentation about them.

Last updated: May 02 2025 at 03:31 UTC