Info about my latest training run 2025-12-14

Hello there! I’m trying to train a custom LLM similar to Andrej Karpathy’s nanogpt and nanochat tutorials. My issue is that training loss and gradient norms go to nearly zero after around a hundred steps. I’m using the MLX framework on an M1 Max.

Code, raw logs, graphs of the training loss, validation loss, and gradient norms, and the raw CSV data are all available in this GitHub gist: https://gist.github.com/iankronquist/68bc7e51178aef47dd225074e5310814#file-trainingruninfo-md

I have a rather Llama-like architecture with RoPE. Unlike Llama, I am using GELU (like GPT-2) instead of SwiGLU in the MLP to save a few parameters on the gate matrices. I’m using an embedding dimension of 768, 12 layers, an MLP up-projection ratio of 4, and grouped-query attention with a key-value head ratio of 4 (all like GPT-2 small and Llama). I’m using the GPT-2 tokenizer with a vocabulary size of 50304. This comes out to around 114M parameters and seems like I’m on the beaten path f
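As a sanity check on the parameter count quoted above, here is a minimal sketch that tallies the weights implied by those hyperparameters. It assumes tied input/output embeddings, no bias terms, and two RMSNorm weight vectors per layer plus a final one. None of these details are confirmed by the post, and the function name `count_params` is hypothetical.

```python
def count_params(d_model=768, n_layers=12, mlp_ratio=4, kv_ratio=4,
                 vocab_size=50304, tied_embeddings=True):
    """Rough parameter tally for a Llama-like decoder with a GELU MLP.

    Assumptions (not stated in the post): no biases, RMSNorm weights only,
    and, by default, the output head shares the embedding matrix.
    """
    embedding = vocab_size * d_model

    # Grouped-query attention: full-width Q and output projections,
    # K and V projections shrunk by the key/value head ratio.
    kv_dim = d_model // kv_ratio
    attention = 2 * d_model * d_model + 2 * d_model * kv_dim

    # GELU MLP has only an up and a down projection (no gate matrix).
    mlp = 2 * d_model * (mlp_ratio * d_model)

    # One norm before attention and one before the MLP.
    norms = 2 * d_model

    per_layer = attention + mlp + norms
    total = embedding + n_layers * per_layer + d_model  # + final norm

    if not tied_embeddings:
        total += vocab_size * d_model  # separate output head
    return total


if __name__ == "__main__":
    print(f"tied:   {count_params():,}")
    print(f"untied: {count_params(tied_embeddings=False):,}")
```

Under these assumptions the tied variant comes to 112,970,496 parameters, roughly 113M, which is in the neighborhood of the figure quoted. The exact number depends on details the post does not state, such as biases and weight tying.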

@iankronquist
iankronquist / fineweb_data_loader.py
Created December 15, 2025 00:04
igptv4 llm training 2025-12-14
'''
We don't have enough disk to unpack tokenized copies of the fineweb dataset, so tokenize as we go.
'''
import os
import time
import tiktoken
import random
.intel_syntax noprefix
# GDT:
# 0x00 NULL
# 0x10 32 bit code
# 0x18 32 bit data
# 0x20 16 bit code ; 64kb limit
# 0x28 16 bit data ; 64kb limit
.extern BootDrive
.extern halt
iankronquist / elf.rs
Created November 7, 2020 07:09
dumb elf loader
#![feature(asm)]
#![allow(unused)]
const EI_CLASS: u8 = 4;
const EI_NIDENT: usize = 16;
#[derive(Default,Debug, Copy, Clone)]
#[repr(C,packed)]
struct Elf64Header {
ident: [u8;EI_NIDENT],
type_: u16,
#define MPP_MACHINE (0b11 << 11)
#define MPP_SUPERVISOR (0b01 << 11)
#define SPP_SUPERVISOR (1 << 8)
#define MPIE_YES (1 << 7)
#define SPIE_YES (1 << 5)
#define MIE_YES (1 << 3)
#define SIE_YES (1 << 1)
/* Machine external interrupt enable */
#define MIE_MEIE (1 << 11)
iankronquist / i3toswayarchmigration.md
Created October 20, 2017 18:16
Migrating from i3 to Sway on Arch Linux

Migrating from i3 to Sway on Arch Linux

Refer to the Arch Wiki: https://wiki.archlinux.org/index.php/Sway

  1. Install packages: pacman -S sway weston
  2. Copy configuration:
     mkdir -p ~/.config/sway
     cp ~/.i3/config ~/.config/sway/config
  3. When you log in, start sway:
iankronquist / 0-Intro.md
Last active May 7, 2017 00:04
A Young Lady's C++ Primer

A Young Lady's C++ Primer

(I have been enjoying The Diamond Age, thank you)

C++ was developed in 198X by Bjarne Stroustrup. It is an improved version of the venerable C programming language. C is excellent at describing low level details in a way which is portable across computers. It is the most influential language of our lifetimes, but unless you're writing an operating system, a hypervisor (AKA Virtual Machine Monitor), or working on an embedded system on a tiny ass microcontroller, it's probably not the right tool for the job.

iankronquist / 0-Programming-Paradigms.md
Last active December 8, 2025 18:09
The Fundamentals of Programming

Programming Paradigms

In programming, a paradigm is an abstract way to understand and solve a problem. A paradigm is like a perspective, a high point from which you can survey the terrain and try to decide the path your journey will take.

Today, there are three major programming paradigms:

  1. Imperative Programming.
  2. Object Oriented Programming (OOP).
  3. Functional Programming (FP).

In principle any language can be used to program in any paradigm, but in practice certain languages tend to favor certain paradigms.
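As a rough illustration of one language accommodating all three paradigms, the sketch below solves the same small task (summing the squares of the even numbers in a list) in Python three ways. The task and every name in it are invented for illustration and do not come from the original text.

```python
from functools import reduce


# Imperative: spell out each step and mutate a running total.
def sum_even_squares_imperative(numbers):
    total = 0
    for n in numbers:
        if n % 2 == 0:
            total += n * n
    return total


# Object oriented: bundle the state and the behavior into an object.
class EvenSquareAccumulator:
    def __init__(self):
        self.total = 0

    def add(self, n):
        if n % 2 == 0:
            self.total += n * n


def sum_even_squares_oop(numbers):
    accumulator = EvenSquareAccumulator()
    for n in numbers:
        accumulator.add(n)
    return accumulator.total


# Functional: compose filter, map, and reduce without mutating anything.
def sum_even_squares_functional(numbers):
    evens = filter(lambda n: n % 2 == 0, numbers)
    squares = map(lambda n: n * n, evens)
    return reduce(lambda a, b: a + b, squares, 0)


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    print(sum_even_squares_imperative(data))
    print(sum_even_squares_oop(data))
    print(sum_even_squares_functional(data))
```

All three functions return the same result for the same input. What differs is where the state lives and how the steps are expressed.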

iankronquist / page fault.txt
Created November 1, 2016 22:46
qemu mapping failure
CPU Reset (CPU 0)
EAX=00000000 EBX=00000000 ECX=00000000 EDX=00000000
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=00000000 EFL=00000000 [-------] CPL=0 II=0 A20=0 SMM=0 HLT=0
ES =0000 00000000 00000000 00000000
CS =0000 00000000 00000000 00000000
SS =0000 00000000 00000000 00000000
DS =0000 00000000 00000000 00000000
FS =0000 00000000 00000000 00000000
iankronquist / 0_typing.md
Last active February 26, 2022 17:45
An Introduction to Python

Typing

When programmers talk about typing, most of the time they aren't talking about the odious task of pressing keys on a keyboard (watch any programmer and see how little of their time they spend actually typing out code; what you'll see instead is a lot of frowning and staring at the screen with an expression of great consternation as they think "why the hell didn't my code do what I thought?"). Instead they're talking about the types of variables. Now you're probably familiar with the idea that there are numbers and strings and
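To make the idea of variable types concrete, here is a small sketch that asks Python what type each value has, then shows what happens when two incompatible types are combined. The helper names `type_names` and `try_add` are hypothetical and used only for this illustration.

```python
def type_names(values):
    """Return the name of each value's type, e.g. 'int' or 'str'."""
    return [type(v).__name__ for v in values]


def try_add(a, b):
    """Add two values, reporting a TypeError instead of crashing."""
    try:
        return a + b
    except TypeError as err:
        return f"TypeError: {err}"


if __name__ == "__main__":
    print(type_names([42, 3.14, "hello", True]))
    print(try_add(1, 2))      # two numbers add without complaint
    print(try_add("1", 1))    # a string plus a number is rejected
```

Python tracks the type of every value at runtime, which is why adding a string to a number raises an error rather than silently guessing what was meant.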