The Little Book of Maths for LLMs
A Friendly Guide from Numbers to Neural Networks
Version 0.1.0
- Download PDF - print-ready
- Download EPUB - e-reader friendly
- View LaTeX - .tex source
- Source code (GitHub) - Markdown source
- Read on GitHub Pages - view online
Licensed under CC BY-NC-SA 4.0.
Contents

Chapter 1. Numbers and Meanings (1–10)
- What Does a Number Mean in a Model
- Counting, Tokens, and Frequency
- Integers, Floats, and Precision
- Scaling and Normalization
- Vectors as Collections of Numbers
- Coordinate Systems and Representations
- Measuring Distance and Similarity
- Mean, Variance, and Standard Deviation
- Distributions of Token Frequencies
- Why Numbers Alone Aren’t Enough
Chapter 2. Algebra and Structure (11–20)
- Variables, Symbols, and Parameters
- Linear Equations and Model Layers
- Matrix Multiplication and Transformations
- Systems of Equations in Neural Layers
- Vector Spaces and Linear Independence
- Basis, Span, and Dimensionality
- Rank, Null Space, and Information
- Linear Maps as Functions
- Eigenvalues and Directions of Change
- Why Linear Algebra Is Everywhere in LLMs
Chapter 3. Calculus of Change (21–30)
- Functions and Flows of Information
- Limits and Continuity in Models
- Derivatives and Sensitivity
- Partial Derivatives in Multi-Input Models
- Gradient as Direction of Learning
- Chain Rule and Backpropagation
- Integration and Accumulation of Signals
- Surfaces, Slopes, and Loss Landscapes
- Second Derivatives and Curvature
- Why Calculus Powers Optimization
Chapter 4. Probability and Uncertainty (31–40)
- What Is Probability for a Model
- Random Variables and Token Sampling
- Probability Distributions and Softmax
- Expectation and Average Predictions
- Variance and Entropy
- Bayes’ Rule and Conditional Meaning
- Joint and Marginal Distributions
- Independence and Correlation
- Information Gain and Surprise
- Why Uncertainty Matters in LLMs
Chapter 5. Statistics and Estimation (41–50)
- Sampling and Datasets
- Histograms and Frequency Counts
- Mean, Median, and Robustness
- Variance, Bias, and Noise
- Estimators and Consistency
- Confidence and Significance
- Hypothesis Testing for Model Validation
- Regression as Pattern Fitting
- Correlation vs Causation
- Why Statistics Grounds Model Evaluation
Chapter 6. Geometry of Thought (51–60)
- Points, Spaces, and Embeddings
- Inner Products and Angles
- Orthogonality and Independence
- Norms and Lengths in Vector Space
- Projections and Attention
- Manifolds and Curved Spaces
- Clusters and Concept Neighborhoods
- High-Dimensional Geometry
- Similarity Search and Vector Databases
- Why Geometry Reveals Meaning
Chapter 7. Optimization and Learning (61–70)
- What Is Optimization
- Objective Functions and Loss
- Gradient Descent and Its Variants
- Momentum, RMSProp, Adam
- Learning Rates and Convergence
- Local Minima and Saddle Points
- Regularization and Generalization
- Overfitting and Bias-Variance Trade-off
- Stochastic vs Batch Training
- Why Learning Is an Optimization Journey
Chapter 8. Discrete Math and Graphs (71–80)
- Sets, Relations, and Mappings
- Combinatorics and Counting Tokens
- Graphs and Networks
- Trees, DAGs, and Computation Graphs
- Paths and Connectivity
- Dynamic Programming Intuitions
- Sequences and Recurrence
- Finite Automata and Token Flows
- Graph Attention and Dependencies
- Why Discrete Math Shapes Transformers
Chapter 9. Information and Entropy (81–90)
- What Is Information
- Bits and Shannon Entropy
- Cross-Entropy Loss Explained
- Mutual Information and Alignment
- Compression and Language Modeling
- Perplexity as Predictive Power
- KL Divergence and Distillation
- Coding Theory and Efficiency
- Noisy Channels and Robustness
- Why Information Theory Guides Training
Chapter 10. Advanced Math for Modern Models (91–100)
- Linear Operators and Functional Spaces
- Spectral Theory and Decompositions
- Singular Value Decomposition (SVD)
- Fourier Transforms and Attention
- Convolutions and Signal Processing
- Differential Equations in Dynamics
- Tensor Algebra and Multi-Mode Data
- Manifold Learning and Representation
- Category Theory and Compositionality
- The Mathematical Heart of LLMs