How LLMs Work: Interactive Visual Guide Based on Karpathy's Lecture

🧠 How Do LLMs Like ChatGPT Actually Work?

This interactive guide, based on Andrej Karpathy’s technical deep dive, walks you through the entire process: from downloading the internet to building a conversational assistant.

🌐 Data Collection — Billions of web pages are crawled, filtered, and cleaned down to ~44 TB of high-quality text (FineWeb).

🔤 Tokenization — Text is split into sub-word chunks (tokens). GPT-4 uses a vocabulary of about 100K tokens (100,277) built with the byte-pair encoding (BPE) algorithm.
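
To make the BPE idea concrete, here is a minimal sketch of byte-pair merging on a toy string. The helper names (`most_frequent_pair`, `merge_pair`, `train_bpe`) and the sample text are illustrative assumptions, not GPT-4's actual tokenizer, which is trained on a far larger corpus and applies extra pre-splitting rules.

```python
from collections import Counter

def most_frequent_pair(ids):
    """Return the most common adjacent pair of token ids, or None if too short."""
    pairs = Counter(zip(ids, ids[1:]))
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train_bpe(text, num_merges):
    """Start from raw UTF-8 bytes (ids 0..255) and learn up to `num_merges` merges."""
    ids = list(text.encode("utf-8"))
    merges = {}
    for step in range(num_merges):
        pair = most_frequent_pair(ids)
        if pair is None:
            break
        new_id = 256 + step  # new tokens get ids above the byte range
        ids = merge_pair(ids, pair, new_id)
        merges[pair] = new_id
    return ids, merges

text = "aaabdaaabac"  # toy input, chosen only for illustration
ids, merges = train_bpe(text, num_merges=3)
print(len(text.encode("utf-8")), "bytes ->", len(ids), "tokens")
```

Each merge adds one entry to the vocabulary and shortens the sequence, which is the trade-off a tokenizer makes: a larger vocabulary in exchange for fewer tokens per text.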

⚙️ Pre-training — A Transformer neural network learns to predict the next token, tuning billions of parameters over months of compute.

🤖 Base Model — The result is a text simulator: it completes sequences in a sophisticated way, but it is not yet an assistant.

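🎓 Post-training (SFT + RLHF) — In supervised fine-tuning (SFT), human labelers write ideal conversations for the model to imitate. In reinforcement learning from human feedback (RLHF), they rank candidate responses and the model is pushed toward the preferred ones. The model learns to imitate the best possible human labeler.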

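🌀 LLM Psychology — It hallucinates because its training examples almost always show confident answers, so it imitates that confident tone even when it does not know the answer. It has no persistent memory between conversations: every conversation starts fresh.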

📚 RAG — For up-to-date data, relevant documents are retrieved and injected into context before generating the answer.
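
The retrieve-then-inject step can be sketched in a few lines. This is a toy illustration under stated assumptions: the documents are made up, the function names (`retrieve`, `build_prompt`) are hypothetical, and relevance is scored with simple word-count cosine similarity, whereas real systems typically use learned embedding vectors. No model is called; the sketch stops at the prompt that would be sent to one.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(documents,
                    key=lambda d: cosine(q, Counter(tokenize(d))),
                    reverse=True)
    return ranked[:k]

def build_prompt(query, documents, k=1):
    """Inject the retrieved documents into the context ahead of the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return (f"Use the context to answer.\n\nContext:\n{context}\n\n"
            f"Question: {query}\nAnswer:")

# Made-up knowledge base, for illustration only.
docs = [
    "The library opens at 9am on weekdays.",
    "The cafeteria serves lunch from noon to 2pm.",
    "Parking permits are renewed every January.",
]
prompt = build_prompt("When does the library open?", docs)
print(prompt)
```

Because the retrieved text sits directly in the context window, the model can read the answer rather than recall it from its parameters, which is why this helps with recent or private data.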

💡 Explanation in a nutshell

An LLM is essentially a system that learned to predict the next token (roughly, the next piece of a word) by reading trillions of tokens of internet text. It was then fine-tuned on human-written examples and feedback to behave like a helpful assistant. Every response it generates is like flipping a very informed, weighted coin: at each step it samples the next token from its predicted probabilities, usually landing on a likely one, and repeats until the answer is complete.
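
The coin-flip picture can be sketched as sampling from a softmax distribution. The vocabulary and scores below are invented for illustration, and `sample_next_token` is a hypothetical helper: a real model produces a score for every token in its vocabulary at each step.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(vocab, logits, temperature=1.0, rng=random):
    """Draw one token at random, weighted by its probability."""
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Made-up candidates and scores for the prompt "The cat sat on the ..."
vocab = ["mat", "sofa", "moon"]
logits = [2.0, 1.0, -1.0]

rng = random.Random(0)  # seeded so repeated runs draw the same samples
probs = softmax(logits)
samples = [sample_next_token(vocab, logits, rng=rng) for _ in range(1000)]
print({word: samples.count(word) for word in vocab})
```

The most likely token wins most of the time but not always, which is why the same prompt can produce different answers. Lowering `temperature` toward zero makes the choice nearly deterministic.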

How LLMs Work — A Visual Deep Dive

ynarwal.github.io ↗

Also published on LinkedIn.

Author

Juan Pedro Bretti Mandarano
