A complete hands-on implementation of a GPT-style transformer LLM — trained entirely from scratch on Apple Silicon M1. Built as a learning project to understand how large language models actually work under the hood.
Installing dependencies failed immediately — the `yaml` module was missing. The PyPI package is named `pyyaml`, but it imports as `yaml` — a common gotcha.
```
ModuleNotFoundError: No module named 'yaml'
```

```bash
# Fix
pip3 install pyyaml
# Note: the package is 'pyyaml' but you import it as 'import yaml'

# Full dependency install
python3 -m pip install -q --break-system-packages \
    torch transformers datasets tokenizers sentencepiece \
    tqdm pyyaml numpy safetensors
```
A full audit of all pipeline scripts revealed four bugs. All were fixed before training began.
`actual_vocab_size` undefined: the variable was referenced in `GPTConfig` before it was ever assigned. The tokenizer has to be loaded first to get the real vocab size.
```python
from tokenizers import Tokenizer as _Tokenizer

# Load the trained tokenizer first, then read its vocab size
_tok = _Tokenizer.from_file(str(_tok_dir / 'tokenizer.json'))
actual_vocab_size = _tok.get_vocab_size()
config = GPTConfig(vocab_size=actual_vocab_size, ...)  # ✅ now defined
```
`openwebtext` was listed as a valid CLI option but had no implementation. Added `download_openwebtext()`, which streams the dataset to disk instead of loading it all into memory.
`bos_token_id` and `eos_token_id` were hardcoded to GPT-2's value of 50256, but our custom tokenizer has a different vocab size.
```python
actual_vocab_size = cfg.get('vocab_size', 50257)
eos_token_id = actual_vocab_size - 1  # last token = endoftext
```
The try block imported `RobertaProcessing`, which was never used; the fallback path was missing the import entirely, causing an ImportError on first install.
```python
# AFTER ✅
from tokenizers import Tokenizer, models, trainers, pre_tokenizers, decoders
```
```bash
python3 run_pipeline.py --config configs/tiny.yaml
```
```yaml
# Model architecture (tiny config)
n_layer: 4        # transformer blocks
n_head: 4         # attention heads
n_embd: 256       # embedding dimension
epochs: 2
batch_size: 8
context_len: 512
```
| Item | Size |
|---|---|
| Training set | 47,500,000 characters |
| Validation set | 2,500,000 characters |
| Tokenizer vocab | 20,712 tokens (custom BPE) |
| Total parameters | 30,142,848 |
| Metric | Meaning |
|---|---|
| `loss` | Cross-entropy loss — lower is better; below 2.0 means coherent text generation |
| `ppl` | Perplexity — how "surprised" the model is by the next token |
| `lr` | Current learning rate (decays over time via scheduler) |
| `step/s` | Training throughput in steps per second |
| `eta` | Estimated time remaining |
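Assuming the standard definition used by most training loops, perplexity is just the exponential of the cross-entropy loss, so the two columns always move together:

```python
import math

def perplexity(loss: float) -> float:
    """Perplexity = exp(cross-entropy loss): roughly, the number of
    tokens the model is 'choosing between' at each step."""
    return math.exp(loss)

# A loss of 2.0 corresponds to a perplexity of about 7.4
print(round(perplexity(2.0), 2))  # → 7.39
```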
```bash
# Run only step 3 (tokenizer training)
python3 run_pipeline.py --config configs/tiny.yaml --only-step 3

# Run steps 4–11 sequentially
for step in 4 5 6 7 8 9 10 11; do
    python3 run_pipeline.py --config configs/tiny.yaml --only-step $step
done
```
| File | Purpose |
|---|---|
| `server.py` | Flask server — loads the model and serves the UI |
| `templates/index.html` | Dark terminal-styled chat interface |
| `finetune_from_logs.py` | Retrains the model on logged conversations |
```bash
pip3 install flask --break-system-packages
cd llm_chat
python3 server.py --checkpoint ../BuildYourLLM_fixed/output/tiny/checkpoints/best.pt
# Open → http://localhost:5000
```
```json
{
  "timestamp": "2026-03-06T14:30:00",
  "session_id": "sess_abc123",
  "prompt": "Once upon a time there was",
  "response": "a little fox who lived in the forest..."
}
```
```bash
cd llm_chat
python3 finetune_from_logs.py \
    --checkpoint ../BuildYourLLM_fixed/output/tiny/checkpoints/best.pt \
    --logs logs/conversations.jsonl --epochs 1

# Serve the fine-tuned checkpoint
python3 server.py --checkpoint .../finetuned_chat/finetuned_best.pt
```
Fine-tuning uses LoRA (Low-Rank Adaptation): only ~1% of the weights are updated, while the base weights stay frozen, which avoids catastrophic forgetting.
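The core idea can be sketched in a few lines of NumPy (sizes here are toy values, not the real model's; the actual fine-tuning runs in PyTorch):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=1.0):
    """Effective weight = frozen W plus low-rank update alpha * (B @ A).
    During fine-tuning only A and B receive gradients."""
    return x @ (W + alpha * (B @ A)).T

d, r = 8, 2                       # toy sizes; rank r << d
W = np.random.randn(d, d)         # frozen base weight
A = np.random.randn(r, d) * 0.01  # trainable down-projection
B = np.zeros((d, r))              # trainable up-projection, zero-init
x = np.random.randn(1, d)

# Zero-initializing B makes the adapted layer start out identical to
# the base layer, so training begins from the pretrained behaviour.
assert np.allclose(lora_forward(x, W, A, B), x @ W.T)

# Trainable fraction: 2*r*d adapter params vs d*d base params
print(2 * r * d / (d * d))  # → 0.5 here; far smaller at n_embd=256
```

With the real embedding dimension of 256 and a small rank, the adapter fraction drops to roughly the ~1% quoted above.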
| Approach | How It Works | Best For |
|---|---|---|
| Periodic LoRA | Fine-tune on conversation logs every N days | Most practical; low cost |
| Instruction tuning | Train on curated prompt/response JSONL pairs | Teaching Q&A behaviour |
| Full fine-tuning | Update all weights on new data | Maximum quality; high cost |
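For instruction tuning, a hypothetical training record in the same JSONL shape as the conversation logs might look like this (field names are illustrative):

```json
{"prompt": "Tell me a story about a brave rabbit.", "response": "Once there was a rabbit named Clover who..."}
```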
| ✅ Can Do | ❌ Cannot Do |
|---|---|
| Continue a story from a prompt | Answer factual questions accurately |
| Generate fluent, grammatical text | Follow complex instructions |
| Produce children's story-style prose | Hold a coherent multi-turn conversation |
| Run entirely on-device (M1 Mac) | Reason or plan like a large model |
```bash
# Full pipeline
python3 run_pipeline.py --config configs/tiny.yaml

# Skip training, use existing checkpoint
python3 run_pipeline.py --config configs/tiny.yaml --skip-train

# Start chat server
cd llm_chat && python3 server.py \
    --checkpoint ../BuildYourLLM_fixed/output/tiny/checkpoints/best.pt

# Fine-tune on logged conversations
python3 finetune_from_logs.py \
    --checkpoint .../best.pt \
    --logs logs/conversations.jsonl --epochs 1
```