Zulip Chat Archive

Stream: new members

Topic: alok singh


Alok Singh (Mar 24 2025 at 00:39):

just a log for my lean work to start

Alok Singh (Mar 24 2025 at 00:40):

never thought i'd say it but so far i like lakefile.lean > lakefile.toml bc AI agents can use the type info to fix their broken builds

Alok Singh (Mar 24 2025 at 07:42):

deriving Repr can be slow

/-!
# GPT2 Types

This module defines the core types and structures for the GPT-2 model,
ported from the C implementation in llm.c by Andrej Karpathy.
-/

namespace LLM.GPT2

/--
Configuration for a GPT-2 model.
-/
structure Config where
  /-- Max sequence length, e.g. 1024 -/
  maxSeqLen : Nat
  /-- Vocab size, e.g. 50257 -/
  vocabSize : Nat
  /-- Padded vocab size, e.g. 50304 -/
  paddedVocabSize : Nat
  /-- Number of layers, e.g. 12 -/
  numLayers : Nat
  /-- Number of heads in attention, e.g. 12 -/
  numHeads : Nat
  /-- Number of channels, e.g. 768 -/
  channels : Nat
  deriving Repr

/--
Parameter tensors for a GPT-2 model.
-/
structure ParameterTensors where
  /-- Token embeddings. Shape: `(V, C)` -/
  wte : Array Float
  /-- Position embeddings. Shape: `(maxT, C)` -/
  wpe : Array Float
  /-- Layer norm 1 weights. Shape: `(L, C)` -/
  ln1w : Array Float
  /-- Layer norm 1 biases. Shape: `(L, C)` -/
  ln1b : Array Float
  /-- QKV projection weights. Shape: `(L, 3*C, C)` -/
  qkvw : Array Float
  /-- QKV projection biases. Shape: `(L, 3*C)` -/
  qkvb : Array Float
  /-- Attention projection weights. Shape: `(L, C, C)` -/
  attprojw : Array Float
  /-- Attention projection biases. Shape: `(L, C)` -/
  attprojb : Array Float
  /-- Layer norm 2 weights. Shape: `(L, C)` -/
  ln2w : Array Float
  /-- Layer norm 2 biases. Shape: `(L, C)` -/
  ln2b : Array Float
  /-- Fully connected weights. Shape: `(L, 4*C, C)` -/
  fcw : Array Float
  /-- Fully connected biases. Shape: `(L, 4*C)` -/
  fcb : Array Float
  /-- Fully connected projection weights. Shape: `(L, C, 4*C)` -/
  fcprojw : Array Float
  /-- Fully connected projection biases. Shape: `(L, C)` -/
  fcprojb : Array Float
  /-- Final layer norm weights. Shape: `(C)` -/
  lnfw : Array Float
  /-- Final layer norm biases. Shape: `(C)` -/
  lnfb : Array Float
  deriving Repr

/--
Activation tensors for a forward pass of the GPT-2 model.
-/
structure ActivationTensors where
  /-- encoded tokens. Shape: `(B, T, C)` -/
  encoded : Array Float
  /-- layer norm 1: Shape: `(L, B, T, C)` -/
  ln1 : Array Float
  /-- Layer norm 1 mean. Shape: `(L, B, T)` -/
  ln1Mean : Array Float
  /-- Layer norm 1 rstd. Shape: `(L, B, T)` -/
  ln1Rstd : Array Float
  /-- QKV. Shape: `(L, B, T, 3*C)` -/
  qkv : Array Float
  /-- Attention. Shape: `(L, B, NH, T, T)` -/
  preatt : Array Float
  /-- Attention. Shape: `(L, B, NH, T, T)` -/
  att : Array Float
  /-- Attention projection. Shape: `(L, B, T, C)` -/
  attproj : Array Float
  /-- Residual 2. Shape: `(L, B, T, C)` -/
  residual2 : Array Float
  /-- Layer norm 2. Shape: `(L, B, T, C)` -/
  ln2 : Array Float
  /-- Layer norm 2 mean. Shape: `(L, B, T)` -/
  ln2Mean : Array Float
  /-- Layer norm 2 rstd. Shape: `(L, B, T)` -/
  ln2Rstd : Array Float
  /-- Fully connected. Shape: `(L, B, T, 4*C)` -/
  fch : Array Float
  /-- Fully connected gelu. Shape: `(L, B, T, 4*C)` -/
  fchGelu : Array Float
  /-- Fully connected projection. Shape: `(L, B, T, C)` -/
  fcproj : Array Float
  /-- Residual 3. Shape: `(L, B, T, C)` -/
  residual3 : Array Float
  /-- Final layer norm. Shape: `(B, T, C)` -/
  lnf : Array Float
  /-- Final layer norm mean. Shape: `(B, T)` -/
  lnfMean : Array Float
  /-- Final layer norm rstd. Shape: `(B, T)` -/
  lnfRstd : Array Float
  /-- Logits. Shape: `(B, T, V)` -/
  logits : Array Float
  /-- Probabilities. Shape: `(B, T, V)` -/
  probs : Array Float
  /-- Losses. Shape: `(B, T)` -/
  losses : Array Float
  deriving Repr

/--
The full GPT-2 model including configuration, parameters, activations, and gradients.
-/
structure GPT2 where
  /-- Model configuration. -/
  config : Config
  /-- Model parameters. -/
  params : ParameterTensors
  /-- Sizes of each parameter tensor. -/
  paramSizes : Array Nat
  /-- Contiguous memory for all parameters. -/
  paramsMemory : Array Float
  /-- Total number of parameters. -/
  numParameters : Nat
  /-- Gradients of parameters. -/
  grads : ParameterTensors
  /-- Contiguous memory for all gradients. -/
  gradsMemory : Array Float
  /-- `AdamW` optimizer `m` buffer. -/
  mMemory : Array Float
  /-- `AdamW` optimizer `v` buffer. -/
  vMemory : Array Float
  /-- Model activations. -/
  acts : ActivationTensors
  /-- Sizes of each activation tensor. -/
  actSizes : Array Nat
  /-- Contiguous memory for all activations. -/
  actsMemory : Array Float
  /-- Total number of activations. -/
  numActivations : Nat
  /-- Gradients of activations. -/
  gradsActs : ActivationTensors
  /-- Contiguous memory for all activation gradients. -/
  gradsActsMemory : Array Float
  /-- Batch size (B) of current forward pass. -/
  batchSize : Nat
  /-- sequence length (T) of current forward pass -/
  seqLen : Nat
  /-- Input tokens for current forward pass. -/
  inputs : Array Int
  /-- Target tokens for current forward pass. -/
  targets : Array Int
  /-- Mean loss after forward pass with targets. -/
  meanLoss : Float
  deriving Repr

end LLM.GPT2

If I comment out the deriving this loads in infoview instantly, but with deriving it's going long enough for me to type this out

Alok Singh (Mar 27 2025 at 07:51):

2025-03-27-00-51-29.png

no highlights for numerals with underscores in them

Alok Singh (May 01 2025 at 19:11):

I feel missing bys should have a clearer error message than unknown identifier: 'tacticName'. A minor papercut for beginners

Isak Colboubrani (May 01 2025 at 20:51):

@Alok Singh Like in example : 1 + 1 = 2 := simp? I've thought about that minor paper cut too. Would the suggested logic be that if x in say example : 1 + 1 = 2 := x is a tactic, then a custom error message would be written? If so, what would an appropriate error message be?

Alok Singh (May 01 2025 at 22:04):

Checking namespaces to see if it's a tactic would be good, maybe a good error message is (with range info) "unknown identifier (..). Hint: a tactic named tacticName was found. To use it, put it in a by block".

theres some burden of explaining what a by block is, but i think that's more manageable, and it could link to documentation about them.

GasStationManager (May 03 2025 at 13:53):

@Alok Singh I'm running into a similar slow down that is possibly related to deriving Repr. What is making it slow? I'm trying to use Plausible's #sample so the type would need Repr; could I do a custom Repr? But would it be faster?

Alok Singh (May 12 2025 at 14:57):

ByteArray doesn't have decidable eq so i impl that, but in doing so i found an unexpected wart in the syntax. the := works but not where. I can see why but if i was starting out no way i'd have guessed that

def ByteArray.decEq (x y : ByteArray) : Decidable (Eq x y) :=
  match x, y with
  | n, m =>
    dite (Eq n m)
      (fun h => isTrue (h  rfl))
      (fun h => isFalse (fun h' => ByteArray.noConfusion h' (fun h' => absurd h' h)))


/-- This is fine-/
instance : DecidableEq ByteArray := ByteArray.decEq

/--invalid {...} notation, expected type is not of the form (C ...)
  DecidableEq ByteArray -/
instance : DecidableEq ByteArray where
  decide a b := ByteArray.decEq a b

Edward van de Meent (May 12 2025 at 15:19):

what do you mean? that it doesn't work in def ByteArray.decEq?

Edward van de Meent (May 12 2025 at 15:20):

what would you even want to write there?

Alok Singh (May 12 2025 at 15:20):

i added doccomments

Edward van de Meent (May 12 2025 at 15:21):

also, it seems to me like this instance exists in batteries already (docs#ByteArray.instDecidableEq_batteries)

Niels Voss (May 13 2025 at 03:05):

where is not allowed because DecidableEq ByteArray is a function type, and where usually only works on structure types

Alok Singh (May 13 2025 at 06:31):

I understood the reason before but damn would it have been esoteric to me a year ago

Tomas Skrivan (May 13 2025 at 07:31):

For a similar reason you can't use extends DecitableEq when defining a new structure. Yeah it is a bit confusing

Alok Singh (May 13 2025 at 19:05):

I hadn’t realized yesterday but this really bothered me, judging by my dreams last night. I think because using := is deprecated for structures but works in both cases. So if I just use something that will solve my problem it’s discouraged. I realized I don’t care at all about the principle behind it because spending such time mentally between where and := just sucks. principles that get me here sucked.

Alok Singh (May 23 2025 at 08:13):

/--a-/
require LSpec from git
  "https://github.com/argumentcomputer/LSpec.git" @ "main"


/-- Main package configuration with Lean options for unicode functions, auto implicits, and documentation requirements. -/
package sqlite where
  leanOptions := #[
    `pp.unicode.fun, true,
    `autoImplicit, true,
    `relaxedAutoImplicit, false,
    `linter.missingDocs, true
  ]

/-- Library configuration that includes all Sqlite modules, Examples, and Tests for comprehensive testing and examples. -/
lean_lib Sqlite where
  globs := #[Lake.Glob.andSubmodules `Sqlite, Lake.Glob.andSubmodules `Examples, Lake.Glob.andSubmodules `Tests]

/-- Main executable entry point with SQLite3 linking and Homebrew library paths for macOS. -/
@[default_target]
lean_exe sqlite where
  root := `Main
  supportInterpreter := true
  moreLinkArgs := #["-lsqlite3", "-L/opt/homebrew/lib", "-L/opt/homebrew/opt/sqlite/lib"]

/-- Test executable specifically for SQLite functionality with interpreter support and SQLite3 linking. -/
lean_exe Tests.Sqlite where
  supportInterpreter := true
  moreLinkArgs := #["-lsqlite3", "-L/opt/homebrew/lib", "-L/opt/homebrew/opt/sqlite/lib"]

/-- Core test suite executable with SQLite3 integration and interpreter support. -/
lean_exe Tests.Core where
  root := `Tests.Core

the fact that i can put doc comments on all these and nothing seems to happen but it compiles makes me want to put them on namespaces and imports and opens just to make ai happy


Last updated: Dec 20 2025 at 21:32 UTC