The Little Book of Maths for LLMs

Version 0.1.0

Author

Duc-Tam Nguyen

Published

October 4, 2025

Contents

A Friendly Guide from Numbers to Neural Networks

Licensed under CC BY-NC-SA 4.0.

Chapter 1. Numbers and Meanings (1–10)

  1. What Does a Number Mean in a Model?
  2. Counting, Tokens, and Frequency
  3. Integers, Floats, and Precision
  4. Scaling and Normalization
  5. Vectors as Collections of Numbers
  6. Coordinate Systems and Representations
  7. Measuring Distance and Similarity
  8. Mean, Variance, and Standard Deviation
  9. Distributions of Token Frequencies
  10. Why Numbers Alone Aren’t Enough

Chapter 2. Algebra and Structure (11–20)

  1. Variables, Symbols, and Parameters
  2. Linear Equations and Model Layers
  3. Matrix Multiplication and Transformations
  4. Systems of Equations in Neural Layers
  5. Vector Spaces and Linear Independence
  6. Basis, Span, and Dimensionality
  7. Rank, Null Space, and Information
  8. Linear Maps as Functions
  9. Eigenvalues and Directions of Change
  10. Why Linear Algebra Is Everywhere in LLMs

Chapter 3. Calculus of Change (21–30)

  1. Functions and Flows of Information
  2. Limits and Continuity in Models
  3. Derivatives and Sensitivity
  4. Partial Derivatives in Multi-Input Models
  5. Gradient as Direction of Learning
  6. Chain Rule and Backpropagation
  7. Integration and Accumulation of Signals
  8. Surfaces, Slopes, and Loss Landscapes
  9. Second Derivatives and Curvature
  10. Why Calculus Powers Optimization

Chapter 4. Probability and Uncertainty (31–40)

  1. What Is Probability for a Model?
  2. Random Variables and Token Sampling
  3. Probability Distributions and Softmax
  4. Expectation and Average Predictions
  5. Variance and Entropy
  6. Bayes’ Rule and Conditional Meaning
  7. Joint and Marginal Distributions
  8. Independence and Correlation
  9. Information Gain and Surprise
  10. Why Uncertainty Matters in LLMs

Chapter 5. Statistics and Estimation (41–50)

  1. Sampling and Datasets
  2. Histograms and Frequency Counts
  3. Mean, Median, and Robustness
  4. Variance, Bias, and Noise
  5. Estimators and Consistency
  6. Confidence and Significance
  7. Hypothesis Testing for Model Validation
  8. Regression as Pattern Fitting
  9. Correlation vs Causation
  10. Why Statistics Grounds Model Evaluation

Chapter 6. Geometry of Thought (51–60)

  1. Points, Spaces, and Embeddings
  2. Inner Products and Angles
  3. Orthogonality and Independence
  4. Norms and Lengths in Vector Space
  5. Projections and Attention
  6. Manifolds and Curved Spaces
  7. Clusters and Concept Neighborhoods
  8. High-Dimensional Geometry
  9. Similarity Search and Vector Databases
  10. Why Geometry Reveals Meaning

Chapter 7. Optimization and Learning (61–70)

  1. What Is Optimization?
  2. Objective Functions and Loss
  3. Gradient Descent and Its Variants
  4. Momentum, RMSProp, Adam
  5. Learning Rates and Convergence
  6. Local Minima and Saddle Points
  7. Regularization and Generalization
  8. Overfitting and Bias-Variance Trade-off
  9. Stochastic vs Batch Training
  10. Why Learning Is an Optimization Journey

Chapter 8. Discrete Math and Graphs (71–80)

  1. Sets, Relations, and Mappings
  2. Combinatorics and Counting Tokens
  3. Graphs and Networks
  4. Trees, DAGs, and Computation Graphs
  5. Paths and Connectivity
  6. Dynamic Programming Intuitions
  7. Sequences and Recurrence
  8. Finite Automata and Token Flows
  9. Graph Attention and Dependencies
  10. Why Discrete Math Shapes Transformers

Chapter 9. Information and Entropy (81–90)

  1. What Is Information?
  2. Bits and Shannon Entropy
  3. Cross-Entropy Loss Explained
  4. Mutual Information and Alignment
  5. Compression and Language Modeling
  6. Perplexity as Predictive Power
  7. KL Divergence and Distillation
  8. Coding Theory and Efficiency
  9. Noisy Channels and Robustness
  10. Why Information Theory Guides Training

Chapter 10. Advanced Math for Modern Models (91–100)

  1. Linear Operators and Functional Spaces
  2. Spectral Theory and Decompositions
  3. Singular Value Decomposition (SVD)
  4. Fourier Transforms and Attention
  5. Convolutions and Signal Processing
  6. Differential Equations in Dynamics
  7. Tensor Algebra and Multi-Mode Data
  8. Manifold Learning and Representation
  9. Category Theory and Compositionality
  10. The Mathematical Heart of LLMs