The Little Book of Maths for LLMs

Version 0.1.0

Author

Duc-Tam Nguyen

Published

October 4, 2025

Contents

A Friendly Guide from Numbers to Neural Networks

Licensed under CC BY-NC-SA 4.0.

Chapter 1. Numbers and Meanings (1–10)

  1. What Does a Number Mean in a Model?
  2. Counting, Tokens, and Frequency
  3. Integers, Floats, and Precision
  4. Scaling and Normalization
  5. Vectors as Collections of Numbers
  6. Coordinate Systems and Representations
  7. Measuring Distance and Similarity
  8. Mean, Variance, and Standard Deviation
  9. Distributions of Token Frequencies
  10. Why Numbers Alone Aren’t Enough

Chapter 2. Algebra and Structure (11–20)

  1. Variables, Symbols, and Parameters
  2. Linear Equations and Model Layers
  3. Matrix Multiplication and Transformations
  4. Systems of Equations in Neural Layers
  5. Vector Spaces and Linear Independence
  6. Basis, Span, and Dimensionality
  7. Rank, Null Space, and Information
  8. Linear Maps as Functions
  9. Eigenvalues and Directions of Change
  10. Why Linear Algebra Is Everywhere in LLMs

Chapter 3. Calculus of Change (21–30)

  1. Functions and Flows of Information
  2. Limits and Continuity in Models
  3. Derivatives and Sensitivity
  4. Partial Derivatives in Multi-Input Models
  5. Gradient as Direction of Learning
  6. Chain Rule and Backpropagation
  7. Integration and Accumulation of Signals
  8. Surfaces, Slopes, and Loss Landscapes
  9. Second Derivatives and Curvature
  10. Why Calculus Powers Optimization

Chapter 4. Probability and Uncertainty (31–40)

  1. What Is Probability for a Model?
  2. Random Variables and Token Sampling
  3. Probability Distributions and Softmax
  4. Expectation and Average Predictions
  5. Variance and Entropy
  6. Bayes’ Rule and Conditional Meaning
  7. Joint and Marginal Distributions
  8. Independence and Correlation
  9. Information Gain and Surprise
  10. Why Uncertainty Matters in LLMs

Chapter 5. Statistics and Estimation (41–50)

  1. Sampling and Datasets
  2. Histograms and Frequency Counts
  3. Mean, Median, and Robustness
  4. Variance, Bias, and Noise
  5. Estimators and Consistency
  6. Confidence and Significance
  7. Hypothesis Testing for Model Validation
  8. Regression as Pattern Fitting
  9. Correlation vs Causation
  10. Why Statistics Grounds Model Evaluation

Chapter 6. Geometry of Thought (51–60)

  1. Points, Spaces, and Embeddings
  2. Inner Products and Angles
  3. Orthogonality and Independence
  4. Norms and Lengths in Vector Space
  5. Projections and Attention
  6. Manifolds and Curved Spaces
  7. Clusters and Concept Neighborhoods
  8. High-Dimensional Geometry
  9. Similarity Search and Vector Databases
  10. Why Geometry Reveals Meaning

Chapter 7. Optimization and Learning (61–70)

  1. What Is Optimization?
  2. Objective Functions and Loss
  3. Gradient Descent and Its Variants
  4. Momentum, RMSProp, Adam
  5. Learning Rates and Convergence
  6. Local Minima and Saddle Points
  7. Regularization and Generalization
  8. Overfitting and Bias-Variance Trade-off
  9. Stochastic vs Batch Training
  10. Why Learning Is an Optimization Journey

Chapter 8. Discrete Math and Graphs (71–80)

  1. Sets, Relations, and Mappings
  2. Combinatorics and Counting Tokens
  3. Graphs and Networks
  4. Trees, DAGs, and Computation Graphs
  5. Paths and Connectivity
  6. Dynamic Programming Intuitions
  7. Sequences and Recurrence
  8. Finite Automata and Token Flows
  9. Graph Attention and Dependencies
  10. Why Discrete Math Shapes Transformers

Chapter 9. Information and Entropy (81–90)

  1. What Is Information?
  2. Bits and Shannon Entropy
  3. Cross-Entropy Loss Explained
  4. Mutual Information and Alignment
  5. Compression and Language Modeling
  6. Perplexity as Predictive Power
  7. KL Divergence and Distillation
  8. Coding Theory and Efficiency
  9. Noisy Channels and Robustness
  10. Why Information Theory Guides Training

Chapter 10. Advanced Math for Modern Models (91–100)

  1. Linear Operators and Functional Spaces
  2. Spectral Theory and Decompositions
  3. Singular Value Decomposition (SVD)
  4. Fourier Transforms and Attention
  5. Convolutions and Signal Processing
  6. Differential Equations in Dynamics
  7. Tensor Algebra and Multi-Mode Data
  8. Manifold Learning and Representation
  9. Category Theory and Compositionality
  10. The Mathematical Heart of LLMs