Licensed under CC BY-NC-SA 4.0.
Chapter 1. Foundations of Algorithms
1. What Is an Algorithm?
Let’s start at the beginning. Before code, data, or performance, we need a clear idea of what an algorithm really is.
An algorithm is a clear, step-by-step procedure to solve a problem. Think of it like a recipe: you have inputs (ingredients), a series of steps (instructions), and an output (the finished dish).
At its core, an algorithm should be:
- Precise: every step is well defined and unambiguous
- Finite: it finishes after a limited number of steps
- Effective: each step is simple enough to carry out
- Deterministic (usually): the same input gives the same output
When you write an algorithm, you are describing how to get from question to answer, not just what the answer is.
Example: Sum from 1 to \(n\)
Suppose you want the sum of the numbers from 1 to \(n\).
Natural language steps
- Set total = 0
- For each i from 1 to n, add i to total
- Return total
Pseudocode
Algorithm SumToN(n):
total ← 0
for i ← 1 to n:
total ← total + i
return total
C code
int sum_to_n(int n) {
int total = 0;
for (int i = 1; i <= n; i++) {
total += i;
}
return total;
}

Tiny Code
Try a quick run by hand with \(n = 5\):
- start: total = 0
- add 1 → total = 1
- add 2 → total = 3
- add 3 → total = 6
- add 4 → total = 10
- add 5 → total = 15
Output is 15.
You will also see this closed-form formula soon:
\[ 1 + 2 + 3 + \dots + n = \frac{n(n+1)}{2} \]
Why It Matters
Algorithms are the blueprints of computation. Every program, from a calculator to an AI model, is built from algorithms. Computers are fast at following instructions. Algorithms give those instructions structure and purpose.
Algorithms are the language of problem solving.
Try It Yourself
- Write an algorithm to find the maximum number in a list
- Write an algorithm to reverse a string
- Describe your morning routine as an algorithm: list the inputs, the steps, and the final output
Tip: the best way to learn is to think in small, clear steps. Break a problem into simple actions you can execute one by one.
2. Measuring Time and Space
Now that you know what an algorithm is, it’s time to ask a deeper question:
How do we know if one algorithm is better than another?
It’s not enough for an algorithm to be correct. It should also be efficient. We measure efficiency in two key ways: time and space.
Time Complexity
Time measures how long an algorithm takes to run, relative to its input size. We don’t measure in seconds, because hardware speed varies. Instead, we count steps or operations.
Example:
for (int i = 0; i < n; i++) {
printf("Hi\n");
}

This loop runs \(n\) times, so it has time complexity \(O(n)\). The time grows linearly with input size.
Another example:
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
printf("*");

This runs \(n \times n = n^2\) times, so it has \(O(n^2)\) time complexity.
These Big-O symbols describe how runtime grows as the input grows.
Space Complexity
Space measures how much memory an algorithm uses.
Example:
int sum = 0; // O(1) space

This uses a constant amount of memory, regardless of input size.
But if we allocate an array:
int arr[n]; // O(n) space

This uses space proportional to \(n\).
Often, we trade time for space:
- Using a hash table speeds up lookups (more memory, less time)
- Using a streaming algorithm saves memory (less space, more time)
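One concrete illustration of this trade-off (an example added here, not mentioned above) is a precomputed prefix-sum table: spend \(O(n)\) extra memory once so that every later range-sum query costs \(O(1)\) instead of \(O(n)\).

```c
#include <stdio.h>

// Trade space for time: build a prefix-sum table once (O(n) extra memory),
// then answer any range-sum query in O(1) instead of re-scanning the range.
int main(void) {
    int arr[] = {3, 1, 4, 1, 5, 9, 2, 6};
    int n = 8;

    int prefix[9];                 // prefix[i] = sum of the first i elements
    prefix[0] = 0;
    for (int i = 0; i < n; i++)
        prefix[i + 1] = prefix[i] + arr[i];

    int l = 2, r = 5;              // query: sum of arr[2..5]
    printf("%d\n", prefix[r + 1] - prefix[l]);   // 19, computed in O(1)
    return 0;
}
```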
Tiny Code
Compare two ways to compute the sum from 1 to \(n\):
Method 1: Loop
int sum_loop(int n) {
int total = 0;
for (int i = 1; i <= n; i++) total += i;
return total;
}

Time: \(O(n)\), Space: \(O(1)\)
Method 2: Formula
int sum_formula(int n) {
return n * (n + 1) / 2;
}

Time: \(O(1)\), Space: \(O(1)\)
Both are correct, but one is faster. Analyzing time and space helps you understand why.
Why It Matters
When data grows huge (millions or billions), small inefficiencies explode.
An algorithm that takes \(O(n^2)\) time might feel fine for 10 elements, but impossible for 1,000,000.
Measuring time and space helps you:
- Predict performance
- Compare different solutions
- Optimize intelligently
It’s your compass for navigating complexity.
Try It Yourself
- Write a simple algorithm to find the minimum in an array. Estimate its time and space complexity.
- Compare two algorithms that solve the same problem. Which one scales better?
- Think of a daily task that feels like \(O(n)\). Can you imagine one that’s \(O(1)\)?
Understanding these measurements early makes every future algorithm more meaningful.
3. Big-O, Big-Theta, Big-Omega
Now that you can measure time and space, let’s learn the language used to describe those measurements.
When we say an algorithm is \(O(n)\), we’re using asymptotic notation, a way to describe how an algorithm’s running time or memory grows as input size \(n\) increases.
It’s not about exact steps, but about how the cost scales for very large \(n\).
The Big-O (Upper Bound)
Big-O answers the question: “How bad can it get?” It gives an upper bound on growth, the worst-case scenario.
If an algorithm takes at most \(5n + 20\) steps, we write \(O(n)\). We drop constants and lower-order terms because they don’t matter at scale.
Common Big-O notations:
| Name | Notation | Growth | Example |
|---|---|---|---|
| Constant | \(O(1)\) | Flat | Accessing array element |
| Logarithmic | \(O(\log n)\) | Very slow growth | Binary search |
| Linear | \(O(n)\) | Proportional | Single loop |
| Quadratic | \(O(n^2)\) | Grows quickly | Double loop |
| Exponential | \(O(2^n)\) | Explodes | Recursive subset generation |
If your algorithm is \(O(n)\), doubling input size roughly doubles runtime. If it’s \(O(n^2)\), doubling input size makes it about four times slower.
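To see where the factor of four comes from, suppose the running time is \(T(n) = c \cdot n^2\) for some constant \(c\). Then
\[ T(2n) = c(2n)^2 = 4cn^2 = 4\,T(n) \]
so doubling the input quadruples the work.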
The Big-Theta (Tight Bound)
Big-Theta (\(\Theta\)) gives a tight bound, when you know the algorithm’s growth from above and below.
If runtime is roughly \(3n + 2\), then \(T(n) = \Theta(n)\). That means it’s both \(O(n)\) and \(\Omega(n)\).
The Big-Omega (Lower Bound)
Big-Omega (\(\Omega\)) answers: “How fast can it possibly be?” It’s the best-case growth, the lower limit.
Example:
- Linear search: \(\Omega(1)\) if the element is at the start
- \(O(n)\) in the worst case if it’s at the end
So we might say:
\[ T(n) = \Omega(1),\quad T(n) = O(n) \]
Tiny Code
Let’s see Big-O in action.
int sum_pairs(int n) {
int total = 0;
for (int i = 0; i < n; i++) // O(n)
for (int j = 0; j < n; j++) // O(n)
total += i + j; // O(1)
return total;
}

Total steps ≈ \(n \times n = n^2\). So \(T(n) = O(n^2)\).
If we added a constant-time operation before or after the loops, it wouldn’t matter. Constants vanish in asymptotic notation.
Why It Matters
Big-O, Big-Theta, and Big-Omega let you talk precisely about performance. They are the grammar of efficiency.
When you can write:
Algorithm A runs in \(O(n \log n)\) time, \(O(n)\) space
you’ve captured its essence clearly and compared it meaningfully.
They help you:
- Predict behavior at scale
- Choose better data structures
- Communicate efficiency in interviews and papers
It’s not about exact timing, it’s about growth.
Try It Yourself
Analyze this code:
for (int i = 1; i <= n; i *= 2) printf("%d", i);

What's the time complexity?
Write an algorithm that’s \(O(n \log n)\) (hint: merge sort).
Identify the best, worst, and average-case complexities for linear search and binary search.
Learning Big-O is like learning a new language, once you’re fluent, you can see how code grows before you even run it.
4. Algorithmic Paradigms (Greedy, Divide and Conquer, DP)
Once you can measure performance, it’s time to explore how algorithms are designed. Behind every clever solution is a guiding paradigm, a way of thinking about problems.
Three of the most powerful are:
- Greedy Algorithms
- Divide and Conquer
- Dynamic Programming (DP)
Each represents a different mindset for problem solving.
1. Greedy Algorithms
A greedy algorithm makes the best local choice at each step, hoping it leads to a global optimum.
Think of it like:
“Take what looks best right now, and don’t worry about the future.”
They are fast and simple, but not always correct. They only work when the greedy choice property holds.
Example: Coin Change (Greedy version) Suppose you want to make 63 cents using US coins (25, 10, 5, 1). The greedy approach:
- Take 25 → 38 left
- Take 25 → 13 left
- Take 10 → 3 left
- Take 1 × 3
This works here, but not always (try coins 1, 3, 4 for amount 6). Simple, but not guaranteed optimal.
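Here is a minimal sketch of that greedy strategy in C (added for illustration; it assumes the denominations are given in descending order):

```c
#include <stdio.h>

// Greedy coin change: always take the largest coin that still fits.
// Works for US denominations, but, as noted above, not for every coin system.
int greedy_change(int amount, const int coins[], int num_coins) {
    int count = 0;
    for (int i = 0; i < num_coins; i++) {   // coins sorted in descending order
        while (amount >= coins[i]) {
            amount -= coins[i];
            count++;
        }
    }
    return count;                            // number of coins used
}

int main(void) {
    int us[] = {25, 10, 5, 1};
    printf("%d\n", greedy_change(63, us, 4));   // 25+25+10+1+1+1 -> 6
    return 0;
}
```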
Common greedy algorithms:
- Kruskal’s Minimum Spanning Tree
- Prim’s Minimum Spanning Tree
- Dijkstra’s Shortest Path (non-negative weights)
- Huffman Coding
2. Divide and Conquer
This is a classic paradigm. You break the problem into smaller subproblems, solve each recursively, and then combine the results.
It’s like splitting a task among friends, then merging their answers.
Formally:
\[ T(n) = aT\left(\frac{n}{b}\right) + f(n) \]
Examples:
- Merge Sort: divide the array, sort halves, merge
- Quick Sort: partition around a pivot
- Binary Search: halve the range each step
Elegant and powerful, but recursion overhead can add cost if poorly structured.
3. Dynamic Programming (DP)
DP is for problems with overlapping subproblems and optimal substructure. You solve smaller subproblems once and store the results to avoid recomputation.
It’s like divide and conquer with memory.
Example: Fibonacci Naive recursion is exponential. DP with memoization is linear.
int fib(int n) {
if (n <= 1) return n;
static int memo[1000] = {0};
if (memo[n]) return memo[n];
memo[n] = fib(n-1) + fib(n-2);
return memo[n];
}

Efficient reuse, but requires insight into subproblem structure.
Tiny Code
Quick comparison using Fibonacci:
Naive (Divide and Conquer)
int fib_dc(int n) {
if (n <= 1) return n;
return fib_dc(n-1) + fib_dc(n-2); // exponential
}

DP (Memoization)
int fib_dp(int n, int memo[]) {
if (n <= 1) return n;
if (memo[n]) return memo[n];
return memo[n] = fib_dp(n-1, memo) + fib_dp(n-2, memo);
}

Why It Matters
Algorithmic paradigms give you patterns for design:
- Greedy: when local choices lead to a global optimum
- Divide and Conquer: when the problem splits naturally
- Dynamic Programming: when subproblems overlap
Once you recognize a problem’s structure, you’ll instantly know which mindset fits best.
Think of paradigms as templates for reasoning, not just techniques but philosophies.
Try It Yourself
- Write a greedy algorithm to make change using coins [1, 3, 4] for amount 6. Does it work?
- Implement merge sort using divide and conquer.
- Solve Fibonacci both ways (naive vs DP) and compare speeds.
- Think of a real-life task you solve greedily.
Learning paradigms is like learning styles of thought. Once you know them, every problem starts to look familiar.
5. Recurrence Relations
Every time you break a problem into smaller subproblems, you create a recurrence, a mathematical way to describe how the total cost grows.
Recurrence relations are the backbone of analyzing recursive algorithms. They tell us how much time or space an algorithm uses, based on the cost of its subproblems.
What Is a Recurrence?
A recurrence relation expresses \(T(n)\), the total cost for input size \(n\), in terms of smaller instances.
Example (Merge Sort):
\[ T(n) = 2T(n/2) + O(n) \]
That means:
- It divides the problem into 2 halves (\(2T(n/2)\))
- Merges results in \(O(n)\) time
You will often see recurrences like:
- \(T(n) = T(n - 1) + O(1)\)
- \(T(n) = 2T(n/2) + O(n)\)
- \(T(n) = T(n/2) + O(1)\)
Each one represents a different structure of recursion.
Example 1: Simple Linear Recurrence
Consider this code:
int count_down(int n) {
if (n == 0) return 0;
return 1 + count_down(n - 1);
}

This calls itself once for each smaller input:
\[ T(n) = T(n - 1) + O(1) \]
Solve it:
\[ T(n) = O(n) \]
Because it runs once per level.
Example 2: Binary Recurrence
For binary recursion:
int sum_tree(int n) {
if (n == 1) return 1;
return sum_tree(n/2) + sum_tree(n/2) + 1;
}

Here we do two subcalls on \(n/2\) and a constant amount of extra work:
\[ T(n) = 2T(n/2) + O(1) \]
Solve it: \(T(n) = O(n)\)
Why? Each level doubles the number of calls but halves the size. There are \(\log n\) levels, and total work adds up to \(O(n)\).
Solving Recurrences
There are several ways to solve them:
Substitution Method Guess the solution, then prove it by induction.
Recursion Tree Method Expand the recurrence into a tree and sum the cost per level.
Master Theorem Use a formula when the recurrence matches:
\[ T(n) = aT(n/b) + f(n) \]
Master Theorem (Quick Summary)
If \(T(n) = aT(n/b) + f(n)\), then:
- If \(f(n) = O(n^{\log_b a - \epsilon})\), then \(T(n) = \Theta(n^{\log_b a})\)
- If \(f(n) = \Theta(n^{\log_b a})\), then \(T(n) = \Theta(n^{\log_b a} \log n)\)
- If \(f(n) = \Omega(n^{\log_b a + \epsilon})\), and the regularity condition holds, then \(T(n) = \Theta(f(n))\)
Example (Merge Sort): \(a = 2\), \(b = 2\), \(f(n) = O(n)\)
\[ T(n) = 2T(n/2) + O(n) = O(n \log n) \]
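Binary search fits the same template: \(a = 1\), \(b = 2\), \(f(n) = O(1)\), so \(n^{\log_b a} = n^0 = 1\) and the second case gives
\[ T(n) = T(n/2) + O(1) = O(\log n) \]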
Tiny Code
Let’s write a quick recursive sum:
int sum_array(int arr[], int l, int r) {
if (l == r) return arr[l];
int mid = (l + r) / 2;
return sum_array(arr, l, mid) + sum_array(arr, mid+1, r);
}

Recurrence:
\[ T(n) = 2T(n/2) + O(1) \]
→ \(O(n)\)
If you added merging (like in merge sort), you would get \(+O(n)\):
→ \(O(n \log n)\)
Why It Matters
Recurrence relations let you predict the cost of recursive solutions.
Without them, recursion feels like magic. With them, you can quantify efficiency.
They are key to understanding:
- Divide and Conquer
- Dynamic Programming
- Backtracking
Once you can set up a recurrence, solving it becomes a game of algebra and logic.
Try It Yourself
- Write a recurrence for binary search. Solve it.
- Write a recurrence for merge sort. Solve it.
- Analyze this function:
void fun(int n) { if (n <= 1) return; fun(n/2); fun(n/3); fun(n/6); }
What's the recurrence? Approximate the complexity.
- Expand \(T(n) = T(n-1) + 1\) into its explicit sum.
Learning recurrences helps you see inside recursion. They turn code into equations.
6. Searching Basics
Before we sort or optimize, we need a way to find things. Searching is one of the most fundamental actions in computing, whether it’s looking up a name, finding a key, or checking if something exists.
A search algorithm takes a collection (array, list, tree, etc.) and a target, and returns whether the target is present (and often its position).
Let’s begin with two foundational techniques: Linear Search and Binary Search.
1. Linear Search
Linear search is the simplest method:
- Start at the beginning
- Check each element in turn
- Stop if you find the target
It works on any list, sorted or not, but can be slow for large data.
int linear_search(int arr[], int n, int key) {
for (int i = 0; i < n; i++) {
if (arr[i] == key) return i;
}
return -1;
}

Example: If arr = [2, 4, 6, 8, 10] and key = 6, it finds it at index 2.
Complexity:
- Time: \(O(n)\)
- Space: \(O(1)\)
Linear search is simple and guaranteed to find the target if it exists, but slow when lists are large.
2. Binary Search
When the list is sorted, we can do much better. Binary search repeatedly divides the search space in half.
Steps:
- Check the middle element
- If it matches, you’re done
- If target < mid, search the left half
- Else, search the right half
int binary_search(int arr[], int n, int key) {
int low = 0, high = n - 1;
while (low <= high) {
int mid = (low + high) / 2;
if (arr[mid] == key) return mid;
else if (arr[mid] < key) low = mid + 1;
else high = mid - 1;
}
return -1;
}

Example: arr = [2, 4, 6, 8, 10], key = 8
- mid = 6 → key > mid → search right half
- mid = 8 → found
Complexity:
- Time: \(O(\log n)\)
- Space: \(O(1)\)
Binary search is a massive improvement; doubling input only adds one extra step.
3. Recursive Binary Search
Binary search can also be written recursively:
int binary_search_rec(int arr[], int low, int high, int key) {
if (low > high) return -1;
int mid = (low + high) / 2;
if (arr[mid] == key) return mid;
else if (arr[mid] > key) return binary_search_rec(arr, low, mid - 1, key);
else return binary_search_rec(arr, mid + 1, high, key);
}

Same logic, different structure. Both iterative and recursive forms are equally efficient.
4. Choosing Between Them
| Method | Works On | Time | Space | Needs Sorting |
|---|---|---|---|---|
| Linear Search | Any list | O(n) | O(1) | No |
| Binary Search | Sorted list | O(log n) | O(1) | Yes |
If data is unsorted or very small, linear search is fine. If data is sorted and large, binary search is far superior.
Tiny Code
Compare the steps: For \(n = 16\):
- Linear search → up to 16 comparisons
- Binary search → \(\log_2 16 = 4\) comparisons
That’s a huge difference.
Why It Matters
Searching is the core of information retrieval. Every database, compiler, and system relies on it.
Understanding simple searches prepares you for:
- Hash tables (constant-time lookups)
- Tree searches (ordered structures)
- Graph traversals (structured exploration)
It’s not just about finding values; it’s about learning how data structure and algorithm design fit together.
Try It Yourself
- Write a linear search that returns all indices where a target appears.
- Modify binary search to return the first occurrence of a target in a sorted array.
- Compare runtime on arrays of size 10, 100, 1000.
- What happens if you run binary search on an unsorted list?
Search is the foundation. Once you master it, you’ll recognize its patterns everywhere.
7. Sorting Basics
Sorting is one of the most studied problems in computer science. Why? Because order matters. It makes searching faster, patterns clearer, and data easier to manage.
A sorting algorithm arranges elements in a specific order (usually ascending or descending). Once sorted, many operations (like binary search, merging, or deduplication) become much simpler.
Let’s explore the foundational sorting methods and the principles behind them.
1. What Makes a Sort Algorithm
A sorting algorithm should define:
- Input: a sequence of elements
- Output: the same elements, in sorted order
- Stability: keeps equal elements in the same order (important for multi-key sorts)
- In-place: uses only a constant amount of extra space
Different algorithms balance speed, memory, and simplicity.
2. Bubble Sort
Idea: repeatedly “bubble up” the largest element to the end by swapping adjacent pairs.
void bubble_sort(int arr[], int n) {
for (int i = 0; i < n - 1; i++) {
for (int j = 0; j < n - i - 1; j++) {
if (arr[j] > arr[j + 1]) {
int temp = arr[j];
arr[j] = arr[j + 1];
arr[j + 1] = temp;
}
}
}
}

Each pass moves the largest remaining item to its final position.
- Time: \(O(n^2)\)
- Space: \(O(1)\)
- Stable: Yes
Simple but inefficient for large data.
3. Selection Sort
Idea: repeatedly select the smallest element and put it in the correct position.
void selection_sort(int arr[], int n) {
for (int i = 0; i < n - 1; i++) {
int min_idx = i;
for (int j = i + 1; j < n; j++) {
if (arr[j] < arr[min_idx]) min_idx = j;
}
int temp = arr[i];
arr[i] = arr[min_idx];
arr[min_idx] = temp;
}
}

- Time: \(O(n^2)\)
- Space: \(O(1)\)
- Stable: No
Fewer swaps, but still quadratic in time.
4. Insertion Sort
Idea: build the sorted list one item at a time, inserting each new item in the right place.
void insertion_sort(int arr[], int n) {
for (int i = 1; i < n; i++) {
int key = arr[i];
int j = i - 1;
while (j >= 0 && arr[j] > key) {
arr[j + 1] = arr[j];
j--;
}
arr[j + 1] = key;
}
}

- Time: \(O(n^2)\) (best case \(O(n)\) when nearly sorted)
- Space: \(O(1)\)
- Stable: Yes
Insertion sort is great for small or nearly sorted datasets. It is often used as a base in hybrid sorts like Timsort.
5. Comparing the Basics
| Algorithm | Best Case | Average Case | Worst Case | Stable | In-place |
|---|---|---|---|---|---|
| Bubble Sort | O(n) | O(n²) | O(n²) | Yes | Yes |
| Selection Sort | O(n²) | O(n²) | O(n²) | No | Yes |
| Insertion Sort | O(n) | O(n²) | O(n²) | Yes | Yes |
All three are quadratic in time, but Insertion Sort performs best on small or partially sorted data.
Tiny Code
Quick check with arr = [5, 3, 4, 1, 2]:
Insertion Sort (step by step)
- Insert 3 before 5 → [3, 5, 4, 1, 2]
- Insert 4 → [3, 4, 5, 1, 2]
- Insert 1 → [1, 3, 4, 5, 2]
- Insert 2 → [1, 2, 3, 4, 5]
Sorted!
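To watch the same trace in code, a small driver like this (an added sketch that inlines the logic of insertion_sort above so it can print after every pass) reproduces the steps:

```c
#include <stdio.h>

// Repeat the hand trace: insertion sort on {5, 3, 4, 1, 2},
// printing the array after each insertion.
int main(void) {
    int arr[] = {5, 3, 4, 1, 2};
    int n = 5;
    for (int i = 1; i < n; i++) {
        int key = arr[i];
        int j = i - 1;
        while (j >= 0 && arr[j] > key) {
            arr[j + 1] = arr[j];
            j--;
        }
        arr[j + 1] = key;
        for (int k = 0; k < n; k++) printf("%d ", arr[k]);  // state after this pass
        printf("\n");
    }
    return 0;
}
```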
Why It Matters
Sorting is a gateway algorithm. It teaches you about iteration, swapping, and optimization.
Efficient sorting is critical for:
- Preprocessing data for binary search
- Organizing data for analysis
- Building indexes and ranking systems
It’s the first step toward deeper concepts like divide and conquer and hybrid optimization.
Try It Yourself
- Implement all three: bubble, selection, insertion
- Test them on arrays of size 10, 100, 1000, and note timing differences
- Try sorting an array that’s already sorted. Which one adapts best?
- Modify insertion sort to sort in descending order
Sorting may seem simple, but it’s a cornerstone. Mastering it will shape your intuition for almost every other algorithm.
8. Data Structures Overview
Algorithms and data structures are two sides of the same coin. An algorithm is how you solve a problem. A data structure is where you store and organize data so that your algorithm can work efficiently.
You can think of data structures as containers, each one shaped for specific access patterns, trade-offs, and performance needs. Choosing the right one is often the key to designing a fast algorithm.
1. Why Data Structures Matter
Imagine you want to find a book quickly.
- If all books are piled randomly → you must scan every one (\(O(n)\))
- If they’re sorted on a shelf → you can use binary search (\(O(\log n)\))
- If you have an index or catalog → you can find it instantly (\(O(1)\))
Different structures unlock different efficiencies.
2. The Core Data Structures
Let’s walk through the most essential ones:
| Type | Description | Key Operations | Typical Use |
|---|---|---|---|
| Array | Fixed-size contiguous memory | Access (\(O(1)\)), Insert/Delete (\(O(n)\)) | Fast index access |
| Linked List | Sequence of nodes with pointers | Insert/Delete (\(O(1)\)), Access (\(O(n)\)) | Dynamic sequences |
| Stack | LIFO (last-in, first-out) | push(), pop() in \(O(1)\) | Undo, recursion |
| Queue | FIFO (first-in, first-out) | enqueue(), dequeue() in \(O(1)\) | Scheduling, buffers |
| Hash Table | Key-value pairs via hashing | Average \(O(1)\), Worst \(O(n)\) | Lookup, caching |
| Heap | Partially ordered tree | Insert \(O(\log n)\), Extract-Min \(O(\log n)\) | Priority queues |
| Tree | Hierarchical structure | Access \(O(\log n)\) (balanced) | Sorted storage |
| Graph | Nodes + edges | Traversal \(O(V+E)\) | Networks, paths |
| Set / Map | Unique keys or key-value pairs | \(O(\log n)\) or \(O(1)\) | Membership tests |
Each comes with trade-offs. Arrays are fast but rigid, linked lists are flexible but slower to access, and hash tables are lightning-fast but unordered.
3. Abstract Data Types (ADTs)
An ADT defines what operations you can do, not how they’re implemented. For example, a Stack ADT promises:
- push(x)
- pop()
- peek()
It can be implemented with arrays or linked lists, the behavior stays the same.
Common ADTs:
- Stack
- Queue
- Deque
- Priority Queue
- Map / Dictionary
This separation of interface and implementation helps design flexible systems.
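As a small illustration, here is one possible array-backed implementation of the Stack ADT in C (a sketch added here; overflow and underflow checks are omitted for brevity):

```c
#include <stdio.h>

#define MAX 100

// A minimal array-backed Stack ADT. The same push/pop/peek interface
// could equally be backed by a linked list.
typedef struct {
    int data[MAX];
    int top;                    // index of the top element, -1 when empty
} Stack;

void push(Stack *s, int x) { s->data[++s->top] = x; }
int  pop(Stack *s)         { return s->data[s->top--]; }
int  peek(const Stack *s)  { return s->data[s->top]; }

int main(void) {
    Stack s = { .top = -1 };
    push(&s, 10);
    push(&s, 20);
    printf("%d\n", peek(&s));   // 20
    printf("%d\n", pop(&s));    // 20
    printf("%d\n", pop(&s));    // 10
    return 0;
}
```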
4. The Right Tool for the Job
Choosing the correct data structure often decides the performance of your algorithm:
| Problem | Good Choice | Reason |
|---|---|---|
| Undo feature | Stack | LIFO fits history |
| Scheduling tasks | Queue | FIFO order |
| Dijkstra’s algorithm | Priority Queue | Extract smallest distance |
| Counting frequencies | Hash Map | Fast key lookup |
| Dynamic median | Heap + Heap | Balance two halves |
| Search by prefix | Trie | Fast prefix lookups |
Good programmers don’t just write code. They pick the right structure.
Tiny Code
Example: comparing array vs linked list
Array:
int arr[5] = {1, 2, 3, 4, 5};
printf("%d", arr[3]); // O(1)

Linked List:
struct Node { int val; struct Node* next; };

To get the 4th element, you must traverse → \(O(n)\)
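A sketch of that traversal (added for illustration; it uses the struct Node declared above, and head is assumed to point at the first node):

```c
// Reaching index k in a linked list means following k pointers: O(n) worst case.
int get_at(struct Node* head, int k) {
    struct Node* cur = head;
    for (int i = 0; i < k && cur != NULL; i++)
        cur = cur->next;                 // one hop per index
    return cur != NULL ? cur->val : -1;  // -1 used here to signal "out of range"
}
```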
Different structures, different access costs.
Why It Matters
Every efficient algorithm depends on the right data structure.
- Searching, sorting, and storing all rely on structure
- Memory layout affects cache performance
- The wrong choice can turn \(O(1)\) into \(O(n^2)\)
Understanding these structures is like knowing the tools in a workshop. Once you recognize their shapes, you’ll instinctively know which to grab.
Try It Yourself
- Implement a stack using an array. Then implement it using a linked list.
- Write a queue using two stacks.
- Try storing key-value pairs in a hash table (hint: mod by table size).
- Compare access times for arrays vs linked lists experimentally.
Data structures aren’t just storage. They are the skeletons your algorithms stand on.
9. Graphs and Trees Overview
Now that you’ve seen linear structures like arrays and linked lists, it’s time to explore nonlinear structures, graphs and trees. These are the shapes behind networks, hierarchies, and relationships.
They appear everywhere: family trees, file systems, maps, social networks, and knowledge graphs all rely on them.
1. Trees
A tree is a connected structure with no cycles. It’s a hierarchy, and every node (except the root) has one parent.
- Root: the top node
- Child: a node directly connected below
- Leaf: a node with no children
- Height: the longest path from root to a leaf
A binary tree is one where each node has at most two children. A binary search tree (BST) keeps elements ordered:
- Left child < parent < right child
Basic operations:
- Insert
- Search
- Delete
- Traverse (preorder, inorder, postorder, level-order)
Example:
struct Node {
int val;
struct Node *left, *right;
};

Insert in BST:
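The routine below allocates nodes through a newNode helper that the text doesn't show; a minimal, hypothetical version might look like this:

```c
#include <stdlib.h>

// Hypothetical helper assumed by insert below: allocate and initialize a node.
struct Node* newNode(int val) {
    struct Node* node = malloc(sizeof(struct Node));
    node->val = val;
    node->left = node->right = NULL;
    return node;
}
```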
struct Node* insert(struct Node* root, int val) {
if (!root) return newNode(val);
if (val < root->val) root->left = insert(root->left, val);
else root->right = insert(root->right, val);
return root;
}

5. Common Tree Types
| Type | Description | Use Case |
|---|---|---|
| Binary Tree | Each node has ≤ 2 children | General hierarchy |
| Binary Search Tree | Left < Root < Right | Ordered data |
| AVL / Red-Black Tree | Self-balancing BST | Fast search/insert |
| Heap | Complete binary tree, parent ≥ or ≤ children | Priority queues |
| Trie | Tree of characters | Prefix search |
| Segment Tree | Tree over ranges | Range queries |
| Fenwick Tree | Tree with prefix sums | Efficient updates |
Balanced trees keep height \(O(\log n)\), guaranteeing fast operations.
3. Graphs
A graph generalizes the idea of trees. In graphs, nodes (vertices) can connect freely.
A graph is a set of vertices (\(V\)) and edges (\(E\)):
\[ G = (V, E) \]
Directed vs Undirected:
- Directed: edges have direction (A → B)
- Undirected: edges connect both ways (A, B)
Weighted vs Unweighted:
- Weighted: each edge has a cost
- Unweighted: all edges are equal
Representation:
- Adjacency Matrix: \(n \times n\) matrix; entry \((i, j) = 1\) if edge exists
- Adjacency List: array of lists; each vertex stores its neighbors
Example adjacency list:
vector<int> graph[n];
graph[0].push_back(1);
graph[0].push_back(2);

4. Common Graph Types
| Graph Type | Description | Example |
|---|---|---|
| Undirected | Edges without direction | Friendship network |
| Directed | Arrows indicate direction | Web links |
| Weighted | Edges have costs | Road network |
| Cyclic | Contains loops | Task dependencies |
| Acyclic | No loops | Family tree |
| DAG (Directed Acyclic) | Directed, no cycles | Scheduling, compilers |
| Complete | All pairs connected | Dense networks |
| Sparse | Few edges | Real-world graphs |
5. Basic Graph Operations
- Add Vertex / Edge
- Traversal: Depth-First Search (DFS), Breadth-First Search (BFS)
- Path Finding: Dijkstra, Bellman-Ford
- Connectivity: Union-Find, Tarjan (SCC)
- Spanning Trees: Kruskal, Prim
Each graph problem has its own flavor, from finding shortest paths to detecting cycles.
Tiny Code
Breadth-first search (BFS):
void bfs(int start, vector<int> graph[], int n) {
vector<bool> visited(n, false);  // avoids a non-standard variable-length array
queue<int> q;
visited[start] = true;
q.push(start);
while (!q.empty()) {
int node = q.front(); q.pop();
printf("%d ", node);
for (int neighbor : graph[node]) {
if (!visited[neighbor]) {
visited[neighbor] = true;
q.push(neighbor);
}
}
}
}

This explores level by level, perfect for shortest paths in unweighted graphs.
Why It Matters
Trees and graphs model relationships and connections, not just sequences. They are essential for:
- Search engines (web graph)
- Compilers (syntax trees, dependency DAGs)
- AI (state spaces, decision trees)
- Databases (indexes, joins, relationships)
Understanding them unlocks an entire world of algorithms, from DFS and BFS to Dijkstra, Kruskal, and beyond.
Try It Yourself
- Build a simple binary search tree and implement inorder traversal.
- Represent a graph with adjacency lists and print all edges.
- Write a DFS and BFS for a small graph.
- Draw a directed graph with a cycle and detect it manually.
Graphs and trees move you beyond linear thinking. They let you explore connections, not just collections.
10. Algorithm Design Patterns
By now, you’ve seen what algorithms are and how they’re analyzed. You’ve explored searches, sorts, structures, and recursion. The next step is learning patterns, reusable strategies that guide how you build new algorithms from scratch.
Just like design patterns in software architecture, algorithmic design patterns give structure to your thinking. Once you recognize them, many problems suddenly feel familiar.
1. Brute Force
Start simple. Try every possibility and pick the best result. Brute force is often your baseline, clear but inefficient.
Example: Find the maximum subarray sum by checking all subarrays.
- Time: \(O(n^2)\)
- Advantage: easy to reason about
- Disadvantage: explodes for large input
Sometimes, brute force helps you see the structure needed for a better approach.
2. Divide and Conquer
Split the problem into smaller parts, solve each, and combine. Ideal for problems with self-similarity.
Classic examples:
- Merge Sort → split and merge
- Binary Search → halve the search space
- Quick Sort → partition and sort
General form:
\[ T(n) = aT(n/b) + f(n) \]
Use recurrence relations and the Master Theorem to analyze them.
3. Greedy
Make the best local decision at each step. Works only when local optimal choices lead to a global optimum.
Examples:
- Activity Selection
- Huffman Coding
- Dijkstra (for non-negative weights)
Greedy algorithms are simple and fast when they fit.
4. Dynamic Programming (DP)
When subproblems overlap, store results and reuse them. Think recursion plus memory.
Two main styles:
- Top-Down (Memoization): recursive with caching
- Bottom-Up (Tabulation): iterative filling table
Used in:
- Fibonacci numbers
- Knapsack
- Longest Increasing Subsequence (LIS)
- Matrix Chain Multiplication
DP transforms exponential recursion into polynomial time.
5. Backtracking
Explore all possibilities, but prune when constraints fail. It is brute force with early exits.
Perfect for:
- N-Queens
- Sudoku
- Permutation generation
- Subset sums
Backtracking builds solutions incrementally, abandoning paths that cannot lead to a valid result.
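Here is a compact sketch for subset sum, one of the problems listed above (added for illustration; it assumes non-negative values so that exceeding the target is a safe reason to prune):

```c
#include <stdio.h>

// Backtracking for subset sum with non-negative values: try including or
// skipping each element, and prune any branch whose partial sum already
// exceeds the target.
int subset_sum(const int a[], int n, int i, int sum, int target) {
    if (sum == target) return 1;            // found a valid subset
    if (i == n || sum > target) return 0;   // dead end: prune
    return subset_sum(a, n, i + 1, sum + a[i], target)   // include a[i]
        || subset_sum(a, n, i + 1, sum, target);         // skip a[i]
}

int main(void) {
    int a[] = {3, 34, 4, 12, 5, 2};
    printf("%d\n", subset_sum(a, 6, 0, 0, 9));   // 1, since 4 + 5 = 9
    return 0;
}
```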
6. Two Pointers
Move two indices through a sequence to find patterns or meet conditions.
Common use:
- Sorted arrays (sum pairs, partitions)
- String problems (palindromes, sliding windows)
- Linked lists (slow and fast pointers)
Simple, but surprisingly powerful.
7. Sliding Window
Maintain a window over data, expand or shrink it as needed.
Used for:
- Maximum sum subarray (Kadane’s algorithm)
- Substrings of length \(k\)
- Longest substring without repeating characters
Helps reduce \(O(n^2)\) to \(O(n)\) in sequence problems.
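A sketch of the fixed-size variant (added here): the maximum sum over any window of length \(k\), maintained by adding the element that enters the window and subtracting the one that leaves:

```c
#include <stdio.h>

// Sliding window: maximum sum over any contiguous window of length k.
// Each step adds the entering element and removes the leaving one,
// so the whole scan is O(n) instead of recomputing every window.
int max_window_sum(const int arr[], int n, int k) {
    int sum = 0;
    for (int i = 0; i < k; i++) sum += arr[i];   // first window
    int best = sum;
    for (int i = k; i < n; i++) {
        sum += arr[i] - arr[i - k];              // slide right by one
        if (sum > best) best = sum;
    }
    return best;
}

int main(void) {
    int arr[] = {2, 1, 5, 1, 3, 2};
    printf("%d\n", max_window_sum(arr, 6, 3));   // 9 (5 + 1 + 3)
    return 0;
}
```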
8. Binary Search on Answer
Sometimes, the input is not sorted, but the answer space is. If you can define a function check(mid) that is monotonic (true or false changes once), you can apply binary search on possible answers.
Examples:
- Minimum capacity to ship packages in D days
- Smallest feasible value satisfying a constraint
Powerful for optimization under monotonic conditions.
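A sketch for the shipping example above (the helper names and sample data are illustrative): can_ship is the monotonic check, and we binary-search the smallest capacity for which it succeeds.

```c
#include <stdio.h>

// Monotonic check: with this capacity, can we ship all packages within `days`?
int can_ship(const int w[], int n, int days, int cap) {
    int used = 1, load = 0;
    for (int i = 0; i < n; i++) {
        if (w[i] > cap) return 0;                  // one package exceeds capacity
        if (load + w[i] > cap) { used++; load = 0; }
        load += w[i];
    }
    return used <= days;
}

// Binary search over the answer space [1, sum of weights].
int min_capacity(const int w[], int n, int days) {
    int lo = 1, hi = 0;
    for (int i = 0; i < n; i++) hi += w[i];
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (can_ship(w, n, days, mid)) hi = mid;   // feasible: try smaller
        else lo = mid + 1;                          // infeasible: go larger
    }
    return lo;
}

int main(void) {
    int w[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    printf("%d\n", min_capacity(w, 10, 5));   // 15
    return 0;
}
```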
9. Graph-Based
Think in terms of nodes and edges, paths and flows.
Patterns include:
- BFS and DFS (exploration)
- Topological Sort (ordering)
- Dijkstra and Bellman-Ford (shortest paths)
- Union-Find (connectivity)
- Kruskal and Prim (spanning trees)
Graphs often reveal relationships hidden in data.
10. Meet in the Middle
Split the problem into two halves, compute all possibilities for each, and combine efficiently. Used in problems where brute force \(O(2^n)\) is too large but \(O(2^{n/2})\) is manageable.
Examples:
- Subset sum (divide into two halves)
- Search problems in combinatorics
A clever compromise between brute force and efficiency.
Tiny Code
Example: Two Pointers to find a pair sum
int find_pair_sum(int arr[], int n, int target) {
int i = 0, j = n - 1;
while (i < j) {
int sum = arr[i] + arr[j];
if (sum == target) return 1;
else if (sum < target) i++;
else j--;
}
return 0;
}

Works in \(O(n)\) for sorted arrays, elegant and fast.
Why It Matters
Patterns are mental shortcuts. They turn “blank page” problems into “I’ve seen this shape before.”
Once you recognize the structure, you can choose a suitable pattern and adapt it. This is how top coders solve complex problems under time pressure, not by memorizing algorithms, but by seeing patterns.
Try It Yourself
- Write a brute-force and a divide-and-conquer solution for maximum subarray sum. Compare speed.
- Solve the coin change problem using both greedy and DP.
- Implement N-Queens with backtracking.
- Use two pointers to find the smallest window with a given sum.
- Pick a problem you’ve solved before. Can you reframe it using a different design pattern?
The more patterns you practice, the faster you will map new problems to known strategies, and the more powerful your algorithmic intuition will become.
Chapter 2. Sorting and Searching
11. Elementary Sorting (Bubble, Insertion, Selection)
Before diving into advanced sorts like mergesort or heapsort, it's important to understand the elementary sorting algorithms, the building blocks. They're simple, intuitive, and great for learning how sorting works under the hood.
In this section, we’ll cover three classics:
- Bubble Sort: swap adjacent out-of-order pairs
- Selection Sort: select the smallest element each time
- Insertion Sort: insert elements one by one in order

These algorithms share \(O(n^2)\) time complexity but differ in behavior and stability.
1. Bubble Sort
Idea: Compare adjacent pairs and swap if they’re out of order. Repeat until the array is sorted. Each pass “bubbles” the largest element to the end.
Steps:
- Compare arr[j] and arr[j+1]
- Swap if arr[j] > arr[j+1]
- Continue passes until no swaps are needed
Code:
void bubble_sort(int arr[], int n) {
for (int i = 0; i < n - 1; i++) {
int swapped = 0;
for (int j = 0; j < n - i - 1; j++) {
if (arr[j] > arr[j + 1]) {
int temp = arr[j];
arr[j] = arr[j + 1];
arr[j + 1] = temp;
swapped = 1;
}
}
if (!swapped) break;
}
}

Complexity:
- Best: \(O(n)\) (already sorted)
- Worst: \(O(n^2)\)
- Space: \(O(1)\)
- Stable: Yes

Intuition: Imagine bubbles rising; after each pass, the largest "bubble" settles at the top.
2. Selection Sort
Idea: Find the smallest element and place it at the front.
Steps:
- For each position i, find the smallest element in the remainder of the array
- Swap it with arr[i]
Code:
void selection_sort(int arr[], int n) {
for (int i = 0; i < n - 1; i++) {
int min_idx = i;
for (int j = i + 1; j < n; j++) {
if (arr[j] < arr[min_idx])
min_idx = j;
}
int temp = arr[i];
arr[i] = arr[min_idx];
arr[min_idx] = temp;
}
}

Complexity:
- Best: \(O(n^2)\)
- Worst: \(O(n^2)\)
- Space: \(O(1)\)
- Stable: No

Intuition: Selection sort "selects" the next correct element and fixes it. It minimizes swaps but still scans all elements.
3. Insertion Sort
Idea: Build a sorted array one element at a time by inserting each new element into its correct position.
Steps:
- Start from index 1
- Compare with previous elements
- Shift elements greater than key to the right
- Insert key into the correct place
Code:
void insertion_sort(int arr[], int n) {
for (int i = 1; i < n; i++) {
int key = arr[i];
int j = i - 1;
while (j >= 0 && arr[j] > key) {
arr[j + 1] = arr[j];
j--;
}
arr[j + 1] = key;
}
}

Complexity:
- Best: \(O(n)\) (nearly sorted)
- Worst: \(O(n^2)\)
- Space: \(O(1)\)
- Stable: Yes

Intuition: It's like sorting cards in your hand: take the next card and slide it into the right place.
4. Comparing the Three
| Algorithm | Best Case | Average Case | Worst Case | Stable | In-Place | Notes |
|---|---|---|---|---|---|---|
| Bubble Sort | O(n) | O(n²) | O(n²) | Yes | Yes | Early exit possible |
| Selection Sort | O(n²) | O(n²) | O(n²) | No | Yes | Few swaps |
| Insertion Sort | O(n) | O(n²) | O(n²) | Yes | Yes | Great on small or nearly sorted data |
Tiny Code
Let’s see how insertion sort works on [5, 3, 4, 1, 2]:
- Start with 3 → insert before 5 → [3, 5, 4, 1, 2]
- Insert 4 → [3, 4, 5, 1, 2]
- Insert 1 → [1, 3, 4, 5, 2]
- Insert 2 → [1, 2, 3, 4, 5]

Sorted after four insertions.
Why It Matters
Elementary sorts teach you:
- How comparisons and swaps drive order
- The trade-off between simplicity and efficiency
- How to reason about stability and adaptability

While these aren't used for large datasets in practice, they're used inside hybrid algorithms like Timsort and IntroSort, which switch to insertion sort for small chunks.
Try It Yourself
- Implement all three and print the array after each pass.
- Test on arrays: already sorted, reversed, random, partially sorted.
- Modify bubble sort to sort descending.
- Try insertion sort on 10,000 elements and note its behavior.
- Can you detect when the list is already sorted and stop early?
Start simple. Master these patterns. They’ll be your foundation for everything from merge sort to radix sort.
12. Divide-and-Conquer Sorting (Merge, Quick, Heap)
Elementary sorts are great for learning, but their \(O(n^2)\) runtime quickly becomes a bottleneck. To scale beyond small arrays, we need algorithms that divide problems into smaller parts, sort them independently, and combine the results.
This is the essence of divide and conquer: break it down, solve subproblems, merge solutions. In sorting, this approach yields some of the fastest general-purpose algorithms: Merge Sort, Quick Sort, and Heap Sort.
1. Merge Sort
Idea: Split the array in half, sort each half recursively, then merge the two sorted halves.
Merge sort is stable, works well with linked lists, and guarantees \(O(n \log n)\) time.
Steps:
- Divide the array into halves
- Recursively sort each half
- Merge two sorted halves into one
Code:
void merge(int arr[], int l, int m, int r) {
int n1 = m - l + 1;
int n2 = r - m;
int L[n1], R[n2];
for (int i = 0; i < n1; i++) L[i] = arr[l + i];
for (int j = 0; j < n2; j++) R[j] = arr[m + 1 + j];
int i = 0, j = 0, k = l;
while (i < n1 && j < n2) {
if (L[i] <= R[j]) arr[k++] = L[i++];
else arr[k++] = R[j++];
}
while (i < n1) arr[k++] = L[i++];
while (j < n2) arr[k++] = R[j++];
}
void merge_sort(int arr[], int l, int r) {
if (l < r) {
int m = (l + r) / 2;
merge_sort(arr, l, m);
merge_sort(arr, m + 1, r);
merge(arr, l, m, r);
}
}

Complexity:
- Time: \(O(n \log n)\) (always)
- Space: \(O(n)\) (temporary arrays)
- Stable: Yes

Merge sort is predictable, making it ideal for external sorting (like sorting data on disk).
2. Quick Sort
Idea: Pick a pivot, partition the array so smaller elements go left and larger go right, then recursively sort both sides.
Quick sort is usually the fastest in practice due to good cache locality and low constant factors.
Steps:
- Choose a pivot (often middle or random)
- Partition: move smaller elements to left, larger to right
- Recursively sort the two partitions
Code:
int partition(int arr[], int low, int high) {
int pivot = arr[high];
int i = low - 1;
for (int j = low; j < high; j++) {
if (arr[j] < pivot) {
i++;
int tmp = arr[i]; arr[i] = arr[j]; arr[j] = tmp;
}
}
int tmp = arr[i + 1]; arr[i + 1] = arr[high]; arr[high] = tmp;
return i + 1;
}
void quick_sort(int arr[], int low, int high) {
if (low < high) {
int pi = partition(arr, low, high);
quick_sort(arr, low, pi - 1);
quick_sort(arr, pi + 1, high);
}
}

Complexity:
- Best / Average: \(O(n \log n)\)
- Worst: \(O(n^2)\) (bad pivot, e.g. sorted input with a naive pivot)
- Space: \(O(\log n)\) (recursion)
- Stable: No (unless modified)

Quick sort is often used in standard libraries due to its efficiency in real-world workloads.
3. Heap Sort
Idea: Turn the array into a heap, repeatedly extract the largest element, and place it at the end.
A heap is a binary tree where every parent is ≥ its children (max-heap).
Steps:
- Build a max-heap
- Swap the root (max) with the last element
- Reduce heap size, re-heapify
- Repeat until sorted
Code:
void heapify(int arr[], int n, int i) {
int largest = i;
int l = 2 * i + 1;
int r = 2 * i + 2;
if (l < n && arr[l] > arr[largest]) largest = l;
if (r < n && arr[r] > arr[largest]) largest = r;
if (largest != i) {
int tmp = arr[i]; arr[i] = arr[largest]; arr[largest] = tmp;
heapify(arr, n, largest);
}
}
void heap_sort(int arr[], int n) {
for (int i = n / 2 - 1; i >= 0; i--)
heapify(arr, n, i);
for (int i = n - 1; i > 0; i--) {
int tmp = arr[0]; arr[0] = arr[i]; arr[i] = tmp;
heapify(arr, i, 0);
}
}

Complexity:
- Time: \(O(n \log n)\)
- Space: \(O(1)\)
- Stable: No

Heap sort is reliable and space-efficient but less cache-friendly than quicksort.
4. Comparison
| Algorithm | Best Case | Average Case | Worst Case | Space | Stable | Notes |
|---|---|---|---|---|---|---|
| Merge Sort | O(n log n) | O(n log n) | O(n log n) | O(n) | Yes | Predictable, stable |
| Quick Sort | O(n log n) | O(n log n) | O(n²) | O(log n) | No | Fast in practice |
| Heap Sort | O(n log n) | O(n log n) | O(n log n) | O(1) | No | In-place, robust |
Each one fits a niche:
- Merge Sort → stability and guarantees
- Quick Sort → speed and cache performance
- Heap Sort → low memory usage and simplicity
Tiny Code
Try sorting [5, 1, 4, 2, 8] with merge sort:
- Split → [5,1,4], [2,8]
- Sort each → [1,4,5], [2,8]
- Merge → [1,2,4,5,8]

Each recursive split halves the problem, yielding \(O(\log n)\) depth with \(O(n)\) work per level.
Why It Matters
Divide-and-conquer sorting is the foundation for efficient order processing. It introduces ideas you’ll reuse in:
- Binary search (halving)
- Matrix multiplication
- Fast Fourier Transform
- Dynamic programming

These sorts teach how recursion, partitioning, and merging combine into scalable solutions.
Try It Yourself
- Implement merge sort, quick sort, and heap sort.
- Test all three on the same random array. Compare runtime.
- Modify quick sort to use a random pivot.
- Build a stable version of heap sort.
- Visualize merge sort’s recursion tree and merging process.
Mastering these sorts gives you a template for solving any divide-and-conquer problem efficiently.
13. Counting and Distribution Sorts (Counting, Radix, Bucket)
So far, we've seen comparison-based sorts like merge sort and quicksort. These rely on comparing elements and are bounded by the \(O(n \log n)\) lower limit for comparisons.
But what if you don't need to compare elements directly? What if they're integers or values from a limited range?
That’s where counting and distribution sorts come in. They exploit structure, not just order, to achieve linear-time sorting in the right conditions.
1. Counting Sort
Idea: If your elements are integers in a known range \([0, k]\), you can count occurrences of each value, then reconstruct the sorted output.
Counting sort doesn't compare; it counts.
Steps:
- Find the range of input (max value \(k\))
- Count occurrences in a frequency array
- Convert counts to cumulative counts
- Place elements into their sorted positions
Code:
void counting_sort(int arr[], int n, int k) {
int count[k + 1];
int output[n];
for (int i = 0; i <= k; i++) count[i] = 0;
for (int i = 0; i < n; i++) count[arr[i]]++;
for (int i = 1; i <= k; i++) count[i] += count[i - 1];
for (int i = n - 1; i >= 0; i--) {
output[count[arr[i]] - 1] = arr[i];
count[arr[i]]--;
}
for (int i = 0; i < n; i++) arr[i] = output[i];
}

Example: arr = [4, 2, 2, 8, 3, 3, 1], k = 8 → count = [0,1,2,2,1,0,0,0,1] → cumulative = [0,1,3,5,6,6,6,6,7] → sorted = [1,2,2,3,3,4,8]
Complexity:
- Time: \(O(n + k)\)
- Space: \(O(k)\)
- Stable: Yes

When to use:
- Input is integers
- Range \(k\) not much larger than \(n\)
2. Radix Sort
Idea: Sort digits one at a time, from least significant (LSD) or most significant (MSD), using a stable sub-sort like counting sort.
Radix sort works best when all elements have fixed-length representations (e.g., integers, strings of equal length).
Steps (LSD method):
- For each digit position (from rightmost to leftmost)
- Sort all elements by that digit using a stable sort (like counting sort)
Code:
int get_max(int arr[], int n) {
int mx = arr[0];
for (int i = 1; i < n; i++)
if (arr[i] > mx) mx = arr[i];
return mx;
}
void counting_sort_digit(int arr[], int n, int exp) {
int output[n];
int count[10] = {0};
for (int i = 0; i < n; i++)
count[(arr[i] / exp) % 10]++;
for (int i = 1; i < 10; i++)
count[i] += count[i - 1];
for (int i = n - 1; i >= 0; i--) {
int digit = (arr[i] / exp) % 10;
output[count[digit] - 1] = arr[i];
count[digit]--;
}
for (int i = 0; i < n; i++)
arr[i] = output[i];
}
void radix_sort(int arr[], int n) {
int m = get_max(arr, n);
for (int exp = 1; m / exp > 0; exp *= 10)
counting_sort_digit(arr, n, exp);
}

Example: arr = [170, 45, 75, 90, 802, 24, 2, 66] → sort by 1s → 10s → 100s → final = [2, 24, 45, 66, 75, 90, 170, 802]
Complexity:
- Time: \(O(d(n + b))\), where \(d\) is the number of digits and \(b\) is the base (10 for decimal)
- Space: \(O(n + b)\)
- Stable: Yes

When to use:
- Fixed-length numbers
- Bounded digits (e.g., base 10 or 2)
3. Bucket Sort
Idea: Divide elements into buckets based on value ranges, sort each bucket individually, then concatenate.
Works best when data is uniformly distributed in a known interval.
Steps:
- Create (k) buckets for value ranges
- Distribute elements into buckets
- Sort each bucket (often using insertion sort)
- Merge buckets
Code:
void bucket_sort(float arr[], int n) {
vector<float> buckets[n];
for (int i = 0; i < n; i++) {
int idx = n * arr[i]; // assuming 0 <= arr[i] < 1
buckets[idx].push_back(arr[i]);
}
for (int i = 0; i < n; i++)
sort(buckets[i].begin(), buckets[i].end());
int idx = 0;
for (int i = 0; i < n; i++)
for (float val : buckets[i])
arr[idx++] = val;
}

Complexity:
- Average: \(O(n + k)\)
- Worst: \(O(n^2)\) (if all fall in one bucket)
- Space: \(O(n + k)\)
- Stable: Depends on the sort used within buckets

When to use:
- Real numbers uniformly distributed in \([0, 1)\)
4. Comparison
| Algorithm | Time | Space | Stable | Type | Best Use |
|---|---|---|---|---|---|
| Counting Sort | O(n + k) | O(k) | Yes | Non-comparison | Small integer range |
| Radix Sort | O(d(n + b)) | O(n + b) | Yes | Non-comparison | Fixed-length numbers |
| Bucket Sort | O(n + k) avg | O(n + k) | Often | Distribution-based | Uniform floats |
These algorithms achieve \(O(n)\) behavior when assumptions hold; they're specialized but incredibly fast when applicable.
Tiny Code
Let’s walk counting sort on arr = [4, 2, 2, 8, 3, 3, 1]:
- Count occurrences → [1,2,2,1,0,0,0,1]
- Cumulative count → positions
- Place elements → [1,2,2,3,3,4,8]

Sorted, no comparisons.
Why It Matters
Distribution sorts teach a key insight:
If you know the structure of your data, you can sort faster than comparison allows.
They show how data properties (range, distribution, digit length) can drive algorithm design.
You'll meet these ideas again in:
- Hashing (bucketing)
- Indexing (range partitioning)
- Machine learning (binning, histogramming)
Try It Yourself
- Implement counting sort for integers from 0 to 100.
- Extend radix sort to sort strings by character.
- Visualize bucket sort for values between 0 and 1.
- What happens if you use counting sort on negative numbers? Fix it.
- Compare counting vs quick sort on small integer arrays.
These are the first glimpses of linear-time sorting: harnessing knowledge about data to break the \(O(n \log n)\) barrier.
14. Hybrid Sorts (IntroSort, Timsort)
In practice, no single sorting algorithm is perfect for all cases. Some are fast on average but fail in worst cases (like Quick Sort). Others are consistent but slow due to overhead (like Merge Sort). Hybrid sorting algorithms combine multiple techniques to get the best of all worlds: practical speed, stability, and guaranteed performance.
Two of the most widely used hybrids in modern systems are IntroSort and Timsort; both power the sorting functions in major programming languages.
1. The Idea Behind Hybrid Sorting
Real-world data is messy: sometimes nearly sorted, sometimes random, sometimes pathological. A smart sorting algorithm should adapt to the data.
Hybrids switch between different strategies based on:
- Input size- Recursion depth- Degree of order- Performance thresholds So, the algorithm “introspects” or “adapts” while running.
2. IntroSort
IntroSort (short for introspective sort) begins like Quick Sort, but when recursion gets too deep, which means Quick Sort's worst case may be coming, it switches to Heap Sort to guarantee \(O(n \log n)\) time.
Steps:
- Use Quick Sort as long as recursion depth < \(2 \log n\)
- If depth exceeds limit → switch to Heap Sort
- For very small subarrays → switch to Insertion Sort
This triple combo ensures:
- Fast average case (Quick Sort)- Guaranteed upper bound (Heap Sort)- Efficiency on small arrays (Insertion Sort) Code Sketch:
void intro_sort(int arr[], int n) {
int depth_limit = 2 * log(n);
intro_sort_util(arr, 0, n - 1, depth_limit);
}
void intro_sort_util(int arr[], int begin, int end, int depth_limit) {
int size = end - begin + 1;
if (size < 16) {
insertion_sort(arr + begin, size);
return;
}
if (depth_limit == 0) {
heap_sort_range(arr, begin, end);
return;
}
int pivot = partition(arr, begin, end);
intro_sort_util(arr, begin, pivot - 1, depth_limit - 1);
intro_sort_util(arr, pivot + 1, end, depth_limit - 1);
}

Complexity:
- Average: \(O(n \log n)\)
- Worst: \(O(n \log n)\)
- Space: \(O(\log n)\)
- Stable: No (depends on partition scheme)

Used in:
- C++ STL's std::sort
- Many systems where performance guarantees matter
3. Timsort
Timsort is a stable hybrid combining Insertion Sort and Merge Sort. It was designed to handle real-world data, which often has runs (already sorted segments).
Developed by Tim Peters (Python core dev), Timsort is now used in:
- Python's sorted() and .sort()
- Java's Arrays.sort() for objects

Idea:
- Identify runs, segments that are already ascending or descending
- Reverse descending runs (to make them ascending)
- Sort small runs with Insertion Sort
- Merge runs with Merge Sort

Timsort adapts beautifully to partially ordered data.
Steps:
- Scan array, detect runs (sequences already sorted)
- Push runs to a stack
- Merge runs using a carefully balanced merge strategy
Pseudocode (simplified):
def timsort(arr):
    RUN = 32
    n = len(arr)
    # Step 1: sort small chunks
    for i in range(0, n, RUN):
        insertion_sort(arr, i, min((i + RUN - 1), n - 1))
    # Step 2: merge sorted runs
    size = RUN
    while size < n:
        for start in range(0, n, size * 2):
            mid = start + size - 1
            end = min(start + size * 2 - 1, n - 1)
            merge(arr, start, mid, end)
        size *= 2

Complexity:
- Best: \(O(n)\) (already sorted data)
- Average: \(O(n \log n)\)
- Worst: \(O(n \log n)\)
- Space: \(O(n)\)
- Stable: Yes

Key Strengths:
- Excellent for real-world, partially sorted data
- Stable (keeps equal keys in order)
- Optimized merges (adaptive merging)
4. Comparison
| Algorithm | Base Methods | Stability | Best | Average | Worst | Real Use |
|---|---|---|---|---|---|---|
| IntroSort | Quick + Heap + Insertion | No | O(n log n) | O(n log n) | O(n log n) | C++ STL |
| Timsort | Merge + Insertion | Yes | O(n) | O(n log n) | O(n log n) | Python, Java |
IntroSort prioritizes performance guarantees. Timsort prioritizes adaptivity and stability.
Both show that "one size fits all" sorting doesn't exist; great systems detect what's going on and adapt.
Tiny Code
Suppose we run Timsort on [1, 2, 3, 7, 6, 5, 8, 9]:
- Detect runs: [1,2,3], [7,6,5], [8,9]
- Reverse [7,6,5] → [5,6,7]
- Merge runs → [1,2,3,5,6,7,8,9]

Efficient because it leverages the existing order.
Why It Matters
Hybrid sorts are the real-world heroes: they combine theory with practice. They teach an important principle:
When one algorithm’s weakness shows up, switch to another’s strength.
These are not academic curiosities; they're in your compiler, your browser, your OS, your database. Understanding them means you understand how modern languages optimize fundamental operations.
Try It Yourself
- Implement IntroSort and test on random, sorted, and reverse-sorted arrays.
- Simulate Timsort’s run detection on nearly sorted input.
- Compare sorting speed of Insertion Sort vs Timsort for small arrays.
- Add counters to Quick Sort and see when IntroSort should switch.
- Explore Python's sorted() with different input shapes; guess when it uses merge vs insertion.
Hybrid sorts remind us: good algorithms adapt; they're not rigid, they're smart.
15. Special Sorts (Cycle, Gnome, Comb, Pancake)
Not all sorting algorithms follow the mainstream divide-and-conquer or distribution paradigms. Some were designed to solve niche problems, to illustrate elegant ideas, or simply to experiment with different mechanisms of ordering.
These special sorts, Cycle Sort, Gnome Sort, Comb Sort, and Pancake Sort, are fascinating not because they’re the fastest, but because they reveal creative ways to think about permutation, local order, and in-place operations.
1. Cycle Sort
Idea: Minimize the number of writes. Cycle sort rearranges elements into cycles, placing each value directly in its correct position. It performs exactly as many writes as there are misplaced elements, making it ideal for flash memory or systems where writes are expensive.
Steps:
- For each position i, find where arr[i] belongs (its rank).
- If it's not already there, swap it into position.
- Continue the cycle until the current position is correct.
- Move to the next index.
Code:
void cycle_sort(int arr[], int n) {
for (int cycle_start = 0; cycle_start < n - 1; cycle_start++) {
int item = arr[cycle_start];
int pos = cycle_start;
for (int i = cycle_start + 1; i < n; i++)
if (arr[i] < item) pos++;
if (pos == cycle_start) continue;
while (item == arr[pos]) pos++;
int temp = arr[pos];
arr[pos] = item;
item = temp;
while (pos != cycle_start) {
pos = cycle_start;
for (int i = cycle_start + 1; i < n; i++)
if (arr[i] < item) pos++;
while (item == arr[pos]) pos++;
temp = arr[pos];
arr[pos] = item;
item = temp;
}
}
}

Complexity:
- Time: \(O(n^2)\)
- Writes: minimal (exactly \(n - c\), where \(c\) is the number of cycles)
- Stable: No

Use Case: When minimizing writes is more important than runtime.
2. Gnome Sort
Idea: A simpler variation of insertion sort. Gnome sort moves back and forth like a “gnome” tidying flower pots: if two adjacent pots are out of order, swap and step back; otherwise, move forward.
Steps:
- Start at index 1
- If arr[i] >= arr[i-1], move forward
- Repeat until the end
Code:
void gnome_sort(int arr[], int n) {
int i = 1;
while (i < n) {
if (i == 0 || arr[i] >= arr[i - 1]) i++;
else {
int temp = arr[i]; arr[i] = arr[i - 1]; arr[i - 1] = temp;
i--;
}
}
}

Complexity:
- Time: \(O(n^2)\)
- Space: \(O(1)\)
- Stable: Yes

Use Case: Educational simplicity. It's a readable form of insertion logic without nested loops.
3. Comb Sort
Idea: An improvement over Bubble Sort by introducing a gap between compared elements, shrinking it gradually. By jumping farther apart early, Comb Sort helps eliminate small elements that are “stuck” near the end.
Steps:
- Start with gap = n
- On each pass, shrink gap = gap / 1.3
- Compare and swap items gap apart
Code:
void comb_sort(int arr[], int n) {
int gap = n;
int swapped = 1;
while (gap > 1 || swapped) {
gap = (gap * 10) / 13;
if (gap == 9 || gap == 10) gap = 11;
if (gap < 1) gap = 1;
swapped = 0;
for (int i = 0; i + gap < n; i++) {
if (arr[i] > arr[i + gap]) {
int temp = arr[i]; arr[i] = arr[i + gap]; arr[i + gap] = temp;
swapped = 1;
}
}
}
}

Complexity:
- Average: \(O(n \log n)\)
- Worst: \(O(n^2)\)
- Space: \(O(1)\)
- Stable: No

Use Case: When a simple, in-place, nearly linear-time alternative to bubble sort is desired.
4. Pancake Sort
Idea: Sort an array using only one operation: flip (reversing a prefix). It’s like sorting pancakes on a plate, flip the stack so the largest pancake goes to the bottom, then repeat for the rest.
Steps:
- Find the maximum unsorted element
- Flip it to the front
- Flip it again to its correct position
- Reduce the unsorted portion by one
Code:
void flip(int arr[], int i) {
int start = 0;
while (start < i) {
int temp = arr[start];
arr[start] = arr[i];
arr[i] = temp;
start++;
i--;
}
}
void pancake_sort(int arr[], int n) {
for (int curr_size = n; curr_size > 1; curr_size--) {
int mi = 0;
for (int i = 1; i < curr_size; i++)
if (arr[i] > arr[mi]) mi = i;
if (mi != curr_size - 1) {
flip(arr, mi);
flip(arr, curr_size - 1);
}
}
}Complexity:
- Time: \(O(n^2)\)
- Space: \(O(1)\)
- Stable: No

Fun Fact: Pancake sort is the only known algorithm whose operations mimic a kitchen utensil, and it inspired the Burnt Pancake Problem in combinatorics and genome rearrangement theory.
5. Comparison
| Algorithm | Time | Space | Stable | Distinctive Trait |
|---|---|---|---|---|
| Cycle Sort | O(n²) | O(1) | No | Minimal writes |
| Gnome Sort | O(n²) | O(1) | Yes | Simple insertion-like behavior |
| Comb Sort | O(n log n) avg | O(1) | No | Shrinking gap, improved bubble |
| Pancake Sort | O(n²) | O(1) | No | Prefix reversals only |
Each highlights a different design goal:
- Cycle: minimize writes
- Gnome: simplify logic
- Comb: optimize comparisons
- Pancake: restrict operations
Tiny Code
Example (Pancake Sort on [3, 6, 1, 9]):
- Max = 9 is already at the last index → no flips needed (9 fixed)
- Max of the remaining [3, 6, 1] = 6 at index 1 → flip(1) → [6, 3, 1, 9]
- flip(2) → [1, 3, 6, 9]
Sorted using only flips.
Why It Matters
Special sorts show there’s more than one way to think about ordering. They’re laboratories for exploring new ideas: minimizing swaps, limiting operations, or optimizing stability. Even if they’re not the go-to in production, they deepen your intuition about sorting mechanics.
Try It Yourself
- Implement each algorithm and visualize their operations step-by-step.
- Measure how many writes Cycle Sort performs vs. others.
- Compare Gnome and Insertion sort on nearly sorted arrays.
- Modify Comb Sort’s shrink factor, how does performance change?
- Write Pancake Sort with printouts of every flip to see the “stack” in motion.
These quirky algorithms prove that sorting isn’t just science, it’s also art and experimentation.
16. Linear and Binary Search
Searching is the process of finding a target value within a collection of data. Depending on whether the data is sorted or unsorted, you’ll use different strategies.
In this section, we revisit two of the most fundamental searching methods , Linear Search and Binary Search , and see how they underpin many higher-level algorithms and data structures.
1. Linear Search
Idea: Check each element one by one until you find the target. This is the simplest possible search and works on unsorted data.
Steps:
- Start from index 0
- Compare
arr[i]with the target - If match, return index
- If end reached, return -1
Code:
int linear_search(int arr[], int n, int key) {
for (int i = 0; i < n; i++) {
if (arr[i] == key) return i;
}
return -1;
}Example: arr = [7, 2, 4, 9, 1], key = 9
Compare 7, 2, 4, then 9 → found at index 3

Complexity:
- Time: \(O(n)\)
- Space: \(O(1)\)
- Best case: \(O(1)\) (first element)
- Worst case: \(O(n)\)

Pros:
- Works on any data (sorted or unsorted)
- Simple to implement

Cons:
- Inefficient on large arrays

Use it when data is small or unsorted, or when simplicity matters more than speed.
2. Binary Search
Idea: If the array is sorted, you can repeatedly halve the search space. Compare the middle element to the target , if it’s greater, search left; if smaller, search right.
Steps:
- Find the midpoint
- If
arr[mid] == key, done - If
arr[mid] > key, search left - If
arr[mid] < key, search right - Repeat until range is empty
Iterative Version:
int binary_search(int arr[], int n, int key) {
int low = 0, high = n - 1;
while (low <= high) {
int mid = (low + high) / 2;
if (arr[mid] == key) return mid;
else if (arr[mid] < key) low = mid + 1;
else high = mid - 1;
}
return -1;
}Recursive Version:
int binary_search_rec(int arr[], int low, int high, int key) {
if (low > high) return -1;
int mid = (low + high) / 2;
if (arr[mid] == key) return mid;
else if (arr[mid] > key)
return binary_search_rec(arr, low, mid - 1, key);
else
return binary_search_rec(arr, mid + 1, high, key);
}Example: arr = [1, 3, 5, 7, 9, 11], key = 7
- mid value 5 → key > 5 → search right
- mid value 9 → key < 9 → search left
- mid value 7 → found

Complexity:
- Time: \(O(\log n)\)
- Space: \(O(1)\) (iterative) or \(O(\log n)\) (recursive)
- Best case: \(O(1)\) (middle element)

Requirements:
- Must be sorted
- Must have random access (array, not linked list)

Pros:
- Very fast for large sorted arrays
- Foundation for advanced searches (e.g. interpolation, exponential)

Cons:
- Needs sorted data
- Doesn't adapt to frequent insertions/deletions
3. Binary Search Variants
Binary search is a pattern as much as a single algorithm. You can tweak it to find:
- First occurrence: keep moving left when arr[mid] == key
- Last occurrence: keep moving right when arr[mid] == key
- Lower bound: first index ≥ key
- Upper bound: first index > key

Example (Lower Bound):
int lower_bound(int arr[], int n, int key) {
int low = 0, high = n;
while (low < high) {
int mid = (low + high) / 2;
if (arr[mid] < key) low = mid + 1;
else high = mid;
}
return low;
}Usage: These variants power functions like std::lower_bound() in C++ and binary search trees’ lookup logic.
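The same pattern gives the matching Upper Bound. A minimal sketch (upper_bound_idx is an illustrative name, not a standard function): it returns the first index whose value is strictly greater than key, so the count of elements equal to key is upper_bound_idx(...) minus lower_bound(...).

int upper_bound_idx(int arr[], int n, int key) {
    int low = 0, high = n;                   // search window is [low, high)
    while (low < high) {
        int mid = (low + high) / 2;
        if (arr[mid] <= key) low = mid + 1;  // mid is not strictly greater
        else high = mid;
    }
    return low;                              // first index with arr[index] > key
}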
4. Comparison
| Algorithm | Works On | Time | Space | Sorted Data Needed | Notes |
|---|---|---|---|---|---|
| Linear Search | Any | O(n) | O(1) | No | Best for small/unsorted |
| Binary Search | Sorted | O(log n) | O(1) | Yes | Fastest on ordered arrays |
Binary search trades simplicity for power , once your data is sorted, you unlock sublinear search.
Tiny Code
Compare on array [2, 4, 6, 8, 10], key = 8:
- Linear: 4 steps
- Binary: 2 steps

This gap grows huge with size: for \(n = 10^6\), linear takes up to a million steps, binary about 20.
Why It Matters
These two searches form the foundation of retrieval. Linear search shows brute-force iteration; binary search shows how structure (sorted order) leads to exponential improvement.
From databases to compiler symbol tables to tree lookups, this principle , divide to search faster , is everywhere.
Try It Yourself
- Implement linear and binary search.
- Count comparisons for ( n = 10, 100, 1000 ).
- Modify binary search to return the first occurrence of a duplicate.
- Try binary search on unsorted data , what happens?
- Combine with sorting: sort array, then search.
Mastering these searches builds intuition for all lookup operations , they are the gateway to efficient data retrieval.
17. Interpolation and Exponential Search
Linear and binary search work well across many scenarios, but they don’t take into account how data is distributed. When values are uniformly distributed, we can estimate where the target lies, instead of always splitting the range in half. This leads to Interpolation Search, which “jumps” close to where the value should be.
For unbounded or infinite lists, we can’t even know the size of the array up front , that’s where Exponential Search shines, by quickly expanding its search window before switching to binary search.
Let’s dive into both.
1. Interpolation Search
Idea: If data is sorted and uniformly distributed, you can predict where a key might be using linear interpolation. Instead of splitting at the middle, estimate the position based on the value’s proportion in the range.
Formula: \[ \text{pos} = \text{low} + \frac{(key - arr[low]) \times (high - low)}{arr[high] - arr[low]} \]
This “guesses” where the key lies. If arr[pos] == key, we’re done. Otherwise, adjust low or high and repeat.
Steps:
- Compute estimated position
pos - Compare
arr[pos]withkey - Narrow range accordingly
- Repeat while
low <= highandkeywithin range
Code:
int interpolation_search(int arr[], int n, int key) {
int low = 0, high = n - 1;
while (low <= high && key >= arr[low] && key <= arr[high]) {
if (low == high) {
if (arr[low] == key) return low;
return -1;
}
int pos = low + ((double)(key - arr[low]) * (high - low)) / (arr[high] - arr[low]);
if (arr[pos] == key)
return pos;
if (arr[pos] < key)
low = pos + 1;
else
high = pos - 1;
}
return -1;
}Example: arr = [10, 20, 30, 40, 50], key = 40 pos = 0 + ((40 - 10) * (4 - 0)) / (50 - 10) = 3 → found at index 3
Complexity:
- Best: \(O(1)\)
- Average: \(O(\log \log n)\) (uniform data)
- Worst: \(O(n)\) (non-uniform or skewed data)
- Space: \(O(1)\)

When to Use:
- Data is sorted and nearly uniform
- Numeric data where values grow steadily

Note: Interpolation search is adaptive: faster when data is predictable, slower when data is irregular.
2. Exponential Search
Idea: When you don’t know the array size (e.g., infinite streams, linked data, files), you can’t just binary search from 0 to n-1. Exponential search finds a search range dynamically by doubling its step size until it overshoots the target, then does binary search within that range.
Steps:
- If
arr[0] == key, return 0 - Find a range
[bound/2, bound]such thatarr[bound] >= key - Perform binary search in that range
Code:
int exponential_search(int arr[], int n, int key) {
if (arr[0] == key) return 0;
int bound = 1;
while (bound < n && arr[bound] < key)
bound *= 2;
int low = bound / 2;
int high = (bound < n) ? bound : n - 1;
// Binary search in [low, high]
while (low <= high) {
int mid = (low + high) / 2;
if (arr[mid] == key) return mid;
else if (arr[mid] < key) low = mid + 1;
else high = mid - 1;
}
return -1;
}Example: arr = [2, 4, 6, 8, 10, 12, 14, 16], key = 10
Steps: bound = 1 (4), 2 (6), 4 (10 ≥ key) → binary search in [2, 4] → found

Complexity:
- Time: \(O(\log i)\), where \(i\) is the index of the target
- Space: \(O(1)\)
- Best: \(O(1)\)

When to Use:
- Unbounded or streamed data
- Unknown array size but sorted order
3. Comparison
| Algorithm | Best Case | Average Case | Worst Case | Data Requirement | Notes |
|---|---|---|---|---|---|
| Linear Search | O(1) | O(n) | O(n) | Unsorted | Works everywhere |
| Binary Search | O(1) | O(log n) | O(log n) | Sorted | Predictable halving |
| Interpolation Search | O(1) | O(log log n) | O(n) | Sorted + Uniform | Adaptive, fast on uniform data |
| Exponential Search | O(1) | O(log n) | O(log n) | Sorted | Great for unknown size |
Interpolation improves on binary if data is smooth. Exponential shines when size is unknown.
Tiny Code
Interpolation intuition: If your data is evenly spaced (10, 20, 30, 40, 50), the value 40 should be roughly 75% along. Instead of halving every time, we jump right near it. It’s data-aware searching.
Exponential intuition: When size is unknown, “expand until you find the wall,” then search within.
Why It Matters
These two searches show how context shapes algorithm design:
- Distribution (Interpolation Search)
- Boundaries (Exponential Search)

They teach that performance depends not only on structure (sortedness) but also on metadata: how much you know about data spacing or limits.
These principles resurface in skip lists, search trees, and probabilistic indexing.
Try It Yourself
- Test interpolation search on [10, 20, 30, 40, 50] , note how few steps it takes.
- Try the same on [1, 2, 4, 8, 16, 32, 64] , note slowdown.
- Implement exponential search and simulate an “infinite” array by stopping at
n. - Compare binary vs interpolation search on random vs uniform data.
- Extend exponential search to linked lists , how does complexity change?
Understanding these searches helps you tailor lookups to the shape of your data , a key skill in algorithmic thinking.
18. Selection Algorithms (Quickselect, Median of Medians)
Sometimes you don’t need to sort an entire array , you just want the k-th smallest (or largest) element. Sorting everything is overkill when you only need one specific rank. Selection algorithms solve this problem efficiently, often in linear time.
They’re the backbone of algorithms for median finding, percentiles, and order statistics, and they underpin operations like pivot selection in Quick Sort.
1. The Selection Problem
Given an unsorted array of ( n ) elements and a number ( k ), find the element that would be at position ( k ) if the array were sorted.
For example: arr = [7, 2, 9, 4, 6], (k = 3) → Sorted = [2, 4, 6, 7, 9] → 3rd smallest = 6
We can solve this without sorting everything.
2. Quickselect
Idea: Quickselect is a selection variant of Quick Sort. It partitions the array around a pivot, but recurses only on the side that contains the k-th element.
It has average-case O(n) time because each partition roughly halves the search space.
Steps:
- Choose a pivot (random or last element)
- Partition array into elements < pivot and > pivot
- Let
posbe the pivot’s index after partition - If
pos == k-1→ done - If
pos > k-1→ recurse left - If
pos < k-1→ recurse right
Code:
int partition(int arr[], int low, int high) {
int pivot = arr[high];
int i = low;
for (int j = low; j < high; j++) {
if (arr[j] < pivot) {
int temp = arr[i]; arr[i] = arr[j]; arr[j] = temp;
i++;
}
}
int temp = arr[i]; arr[i] = arr[high]; arr[high] = temp;
return i;
}
int quickselect(int arr[], int low, int high, int k) {
if (low == high) return arr[low];
int pos = partition(arr, low, high);
int rank = pos - low + 1;
if (rank == k) return arr[pos];
if (rank > k) return quickselect(arr, low, pos - 1, k);
return quickselect(arr, pos + 1, high, k - rank);
}Example: arr = [7, 2, 9, 4, 6], ( k = 3 )
- Pivot = 6
- Partition → [2, 4, 6, 7, 9], pos = 2
- rank = 3 → found (6)

Complexity:
- Average: \(O(n)\)
- Worst: \(O(n^2)\) (bad pivots)
- Space: \(O(1)\), in-place

When to Use:
- Fast average case
- You don't need full sorting

Quickselect is used in C++'s nth_element() and many median-finding implementations.
3. Median of Medians
Idea: Guarantee worst-case ( O(n) ) time by choosing a good pivot deterministically.
This method ensures the pivot divides the array into reasonably balanced parts every time.
Steps:
- Divide array into groups of 5
- Find the median of each group (using insertion sort)
- Recursively find the median of these medians → pivot
- Partition array around this pivot
- Recurse into the side containing the k-th element
This guarantees at least 30% of elements are eliminated each step → linear time in worst case.
Code Sketch:
int select_pivot(int arr[], int low, int high) {
int n = high - low + 1;
if (n <= 5) {
insertion_sort(arr + low, n);
return low + n / 2;
}
int medians[(n + 4) / 5];
int i;
for (i = 0; i < n / 5; i++) {
insertion_sort(arr + low + i * 5, 5);
medians[i] = arr[low + i * 5 + 2];
}
if (i * 5 < n) {
insertion_sort(arr + low + i * 5, n % 5);
medians[i] = arr[low + i * 5 + (n % 5) / 2];
i++;
}
    // note: this returns an index into medians; a full implementation would
    // locate that median value's position back in arr[low..high] and return it
    return select_pivot(medians, 0, i - 1);
}You’d then partition around pivot and recurse just like Quickselect.
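The sketch above assumes a small insertion_sort(int a[], int len) helper for sorting each group of at most five elements. A minimal version could look like this:

void insertion_sort(int a[], int len) {
    for (int i = 1; i < len; i++) {
        int key = a[i], j = i - 1;
        while (j >= 0 && a[j] > key) {   // shift larger elements right
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = key;
    }
}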
Complexity:
- Worst: \(O(n)\)
- Space: \(O(1)\) (in-place version)
- Stable: No (doesn't preserve order)

Why It Matters: Median of Medians is slower in practice than Quickselect but provides theoretical guarantees, vital in real-time or critical systems.
4. Special Cases
- Min / Max: trivial, just scan once (\(O(n)\))
- Median: \(k = \lceil n/2 \rceil\), can use Quickselect or Median of Medians
- Top-k Elements: use partial selection or heaps (k smallest/largest)

Example: To get the top 5 scores from a million entries, use Quickselect to find the 5th largest, then filter ≥ that threshold (see the sketch below).
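A rough sketch of that top-k idea, assuming the partition and quickselect routines shown earlier in this section (print_top_k is an illustrative name, and ties at the threshold may print a few extra values):

void print_top_k(int arr[], int n, int k) {
    // k-th largest = (n - k + 1)-th smallest; quickselect partially reorders arr
    int threshold = quickselect(arr, 0, n - 1, n - k + 1);
    for (int i = 0; i < n; i++)
        if (arr[i] >= threshold)
            printf("%d ", arr[i]);   // the top-k values (order not guaranteed)
}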
5. Comparison
| Algorithm | Best | Average | Worst | Stable | In-Place | Notes |
|---|---|---|---|---|---|---|
| Quickselect | O(n) | O(n) | O(n²) | No | Yes | Fast in practice |
| Median of Medians | O(n) | O(n) | O(n) | No | Yes | Deterministic |
| Sorting | O(n log n) | O(n log n) | O(n log n) | Depends | Depends | Overkill for single element |
Quickselect is fast and simple; Median of Medians is safe and predictable.
Tiny Code
Find 4th smallest in [9, 7, 2, 5, 4, 3]:
- Pivot = 4 → partition → [2, 3, 4, 9, 7, 5]
- 4 at position 2 → rank = 3 < 4 → recurse right
- New range [9, 7, 5], \(k = 1\) → smallest = 5

Result: 5
Why It Matters
Selection algorithms reveal a key insight:
Sometimes you don’t need everything , just what matters.
They form the basis for:
- Median filters in signal processing
- Partitioning steps in sorting
- k-th order statistics
- Robust statistics and quantile computation

They embody a "partial work, full answer" philosophy: do exactly enough.
Try It Yourself
- Implement Quickselect and find k-th smallest for various k.
- Compare runtime vs full sorting.
- Modify Quickselect to find k-th largest.
- Implement Median of Medians pivot selection.
- Use Quickselect to find median of 1,000 random elements.
Mastering selection algorithms helps you reason about efficiency , you’ll learn when to stop sorting and start selecting.
19. Range Searching and Nearest Neighbor
Searching isn’t always about finding a single key. Often, you need to find all elements within a given range , or the closest match to a query point.
These problems are central to databases, computational geometry, and machine learning (like k-NN classification). This section introduces algorithms for range queries (e.g. find all values between L and R) and nearest neighbor searches (e.g. find the point closest to query q).
1. Range Searching
Idea: Given a set of data points (1D or multidimensional), quickly report all points within a specified range.
In 1D (simple arrays), range queries can be handled by binary search and prefix sums. In higher dimensions, we need trees designed for efficient spatial querying.
A. 1D Range Query (Sorted Array)
Goal: Find all elements in [L, R].
Steps:
- Use lower bound to find first element ≥ L
- Use upper bound to find first element > R
- Output all elements in between
Code (C++-style pseudo):
int l = lower_bound(arr, arr + n, L) - arr;
int r = upper_bound(arr, arr + n, R) - arr;
for (int i = l; i < r; i++)
printf("%d ", arr[i]);Time Complexity:
- Binary search bounds: \(O(\log n)\)
- Reporting results: \(O(k)\), where \(k\) = number of elements in range

Total: \(O(\log n + k)\)
B. Prefix Sum Range Query (For sums)
If you just need the sum (not the actual elements), use prefix sums:
\[ \text{prefix}[i] = a_0 + a_1 + \ldots + a_i \]
Then range sum: \[ \text{sum}(L, R) = \text{prefix}[R] - \text{prefix}[L - 1] \]
Code:
int prefix[n];
prefix[0] = arr[0];
for (int i = 1; i < n; i++)
prefix[i] = prefix[i - 1] + arr[i];
int range_sum(int L, int R) {
return prefix[R] - (L > 0 ? prefix[L - 1] : 0);
}Time: (O(1)) per query after (O(n)) preprocessing.
Used in:
- Databases for fast range aggregation
- Fenwick trees, segment trees
C. 2D Range Queries (Rectangular Regions)
For points ((x, y)), queries like:
“Find all points where \(L_x ≤ x ≤ R_x\) and \(L_y ≤ y ≤ R_y\)”
Use specialized structures:
- Range Trees (balanced BSTs per dimension)
- Fenwick Trees / Segment Trees (for 2D arrays)
- KD-Trees (spatial decomposition)

Time: \(O(\log^2 n + k)\) typical for 2D
Space: \(O(n \log n)\)
2. Nearest Neighbor Search
Idea: Given a set of points, find the one closest to query (q). Distance is often Euclidean, but can be any metric.
Brute Force: Check all points → (O(n)) per query. Too slow for large datasets.
We need structures that let us prune far regions fast.
A. KD-Tree
KD-tree = K-dimensional binary tree. Each level splits points by one coordinate, alternating axes. Used for efficient nearest neighbor search in low dimensions (2D-10D).
Construction:
- Choose axis = depth % k
- Sort points by axis
- Pick median → root
- Recursively build left and right
Query (Nearest Neighbor):
- Traverse down tree based on query position
- Backtrack , check whether hypersphere crosses splitting plane
- Keep track of best (closest) distance
Complexity:
- Build: \(O(n \log n)\)
- Query: \(O(\log n)\) average, \(O(n)\) worst

Use Cases:
- Nearest city lookup
- Image / feature vector matching
- Game AI spatial queries

Code Sketch (2D Example):
struct Point { double x, y; };
double dist(Point a, Point b) {
return sqrt((a.x - b.x)*(a.x - b.x) + (a.y - b.y)*(a.y - b.y));
}(Full KD-tree implementation omitted for brevity , idea is recursive partitioning.)
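As a baseline (and a handy correctness check for any tree-based version), brute-force nearest neighbor is just a linear scan with the dist function above:

// O(n) per query; useful as a reference when testing a KD-tree
int nearest_brute(Point pts[], int n, Point q) {
    int best = 0;
    double best_d = dist(pts[0], q);
    for (int i = 1; i < n; i++) {
        double d = dist(pts[i], q);
        if (d < best_d) { best_d = d; best = i; }
    }
    return best;   // index of the closest point
}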
B. Ball Tree / VP-Tree
For high-dimensional data, KD-trees degrade. Alternatives like Ball Trees (split by hyperspheres) or VP-Trees (Vantage Point Trees) perform better.
They split based on distance metrics, not coordinate axes.
C. Approximate Nearest Neighbor (ANN)
For large-scale, high-dimensional data (e.g. embeddings, vectors):
- Locality Sensitive Hashing (LSH)
- HNSW (Hierarchical Navigable Small World Graphs)

These trade exactness for speed, common in:
- Vector databases
- Recommendation systems
- AI model retrieval
3. Summary
| Problem | Brute Force | Optimized | Time (Query) | Notes |
|---|---|---|---|---|
| 1D Range Query | Scan O(n) | Binary Search | O(log n + k) | Sorted data |
| Range Sum | O(n) | Prefix Sum | O(1) | Static data |
| 2D Range Query | O(n) | Range Tree | O(log² n + k) | Spatial filtering |
| Nearest Neighbor | O(n) | KD-Tree | O(log n) avg | Exact, low-dim |
| Nearest Neighbor (high-dim) | O(n) | HNSW / LSH | ~O(1) | Approximate |
Tiny Code
Simple 1D range query:
int arr[] = {1, 3, 5, 7, 9, 11};
int L = 4, R = 10;
int l = lower_bound(arr, arr + 6, L) - arr;
int r = upper_bound(arr, arr + 6, R) - arr;
for (int i = l; i < r; i++)
printf("%d ", arr[i]); // 5 7 9Output: 5 7 9
Why It Matters
Range and nearest-neighbor queries power:
- Databases (SQL range filters, BETWEEN)
- Search engines (spatial indexing)
- ML (k-NN classifiers, vector similarity)
- Graphics / Games (collision detection, spatial queries)

These are not just searches, they're geometric lookups, linking algorithms to spatial reasoning.
Try It Yourself
- Write a function to return all numbers in
[L, R]using binary search. - Build a prefix sum array and answer 5 range-sum queries in O(1).
- Implement a KD-tree for 2D points and query nearest neighbor.
- Compare brute-force vs KD-tree search on 1,000 random points.
- Explore Python’s
scipy.spatial.KDTreeorsklearn.neighbors.
These algorithms bridge searching with geometry and analytics, forming the backbone of spatial computation.
20. Search Optimizations and Variants
We’ve explored the main search families , linear, binary, interpolation, exponential , each fitting a different data shape or constraint. Now let’s move one step further: optimizing search for performance and adapting it to specialized scenarios.
This section introduces practical variants and enhancements used in real systems, databases, and competitive programming, including jump search, fibonacci search, ternary search, and exponential + binary combinations.
1. Jump Search
Idea: If data is sorted, we can “jump” ahead by fixed steps instead of scanning linearly. It’s like hopping through the array in blocks , when you overshoot the target, you step back and linearly search that block.
It strikes a balance between linear and binary search , fewer comparisons without the recursion or halving of binary search.
Steps:
- Choose jump size = \(\sqrt{n}\)
- Jump by blocks until
arr[step] > key - Linear search in previous block
Code:
int jump_search(int arr[], int n, int key) {
    int step = (int)sqrt(n);                   // block size ≈ √n (needs <math.h>)
    int prev = 0;
    int end = step < n ? step : n;             // C has no min(), so clamp by hand
    while (arr[end - 1] < key) {
        prev = end;
        step += (int)sqrt(n);
        end = step < n ? step : n;
        if (prev >= n) return -1;
    }
    for (int i = prev; i < end; i++) {         // linear scan inside the block
        if (arr[i] == key) return i;
    }
    return -1;
}Example: arr = [1, 3, 5, 7, 9, 11, 13, 15], key = 11
- step = 2 (≈ √8)
- Probe 3, 7, 11 → 11 ≥ key → scan the previous block → found at index 5

Complexity:
- Time: \(O(\sqrt{n})\)
- Space: \(O(1)\)
- Works on sorted data

When to Use: For moderately sized sorted lists when you want fewer comparisons but minimal overhead.
2. Fibonacci Search
Idea: Similar to binary search, but it splits the array based on Fibonacci numbers instead of midpoints. This allows using only addition and subtraction (no division), useful on hardware where division is costly.
Like binary search, it shrinks the search space by a roughly constant factor each iteration (governed by the golden ratio rather than exact halving).
Steps:
- Find the smallest Fibonacci number ≥ n
- Use it to compute probe index
- Compare and move interval accordingly
Code (Sketch):
int fibonacci_search(int arr[], int n, int key) {
int fibMMm2 = 0; // (m-2)'th Fibonacci
int fibMMm1 = 1; // (m-1)'th Fibonacci
int fibM = fibMMm2 + fibMMm1; // m'th Fibonacci
while (fibM < n) {
fibMMm2 = fibMMm1;
fibMMm1 = fibM;
fibM = fibMMm2 + fibMMm1;
}
int offset = -1;
while (fibM > 1) {
        int i = (offset + fibMMm2 < n - 1) ? offset + fibMMm2 : n - 1;  // min(offset + fibMMm2, n - 1)
if (arr[i] < key) {
fibM = fibMMm1;
fibMMm1 = fibMMm2;
fibMMm2 = fibM - fibMMm1;
offset = i;
} else if (arr[i] > key) {
fibM = fibMMm2;
fibMMm1 = fibMMm1 - fibMMm2;
fibMMm2 = fibM - fibMMm1;
} else return i;
}
if (fibMMm1 && arr[offset + 1] == key)
return offset + 1;
return -1;
}Complexity:
- Time: \(O(\log n)\)
- Space: \(O(1)\)
- Sorted input required

Fun Fact: Fibonacci search was originally designed for tape drives, where random access is expensive and predictable jumps matter.
3. Ternary Search
Idea: When the function or sequence is unimodal (it rises to a single peak, or falls to a single valley), you can locate the maximum or minimum by splitting the range into three parts instead of two.
Used not for discrete lookup but for optimization on sorted functions.
Steps:
- Divide range into thirds
- Evaluate at two midpoints
m1,m2 - Eliminate one-third based on comparison
- Repeat until range is small
Code:
double ternary_search(double low, double high, double (*f)(double)) {
for (int i = 0; i < 100; i++) {
double m1 = low + (high - low) / 3;
double m2 = high - (high - low) / 3;
        if (f(m1) < f(m2))
            high = m2;   // minimum lies in [low, m2]
        else
            low = m1;    // minimum lies in [m1, high]
}
return (low + high) / 2;
}Example: Find the minimum of \(f(x) = (x-3)^2\) on [0, 10]. After the iterations, it converges to \(x \approx 3\).
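A tiny driver for the code above, using \(f(x) = (x - 3)^2\) as the test function (the name f is just for illustration; it needs stdio.h and the ternary_search function in the same file):

double f(double x) { return (x - 3) * (x - 3); }   // valley with minimum at x = 3

int main(void) {
    double x = ternary_search(0.0, 10.0, f);
    printf("minimum near x = %.6f\n", x);          // prints a value close to 3
    return 0;
}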
Complexity:
- Time: \(O(\log\text{range})\)
- Space: \(O(1)\)
- Works for unimodal functions
Used in:
- Mathematical optimization
- Search-based tuning
- Game AI decision models
4. Binary Search Variants (Review)
Binary search can be tailored to answer richer queries:
- Lower Bound: first index ≥ key
- Upper Bound: first index > key
- Equal Range: range of all equal elements
- Rotated Arrays: find an element in a rotated sorted array
- Infinite Arrays: use exponential expansion

Rotated Example: arr = [6,7,9,1,3,4], key = 3 → find the pivot, then binary search the correct side (a single-pass sketch follows below).
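One common single-pass way to handle the rotated case is sketched below, assuming no duplicate values (the two-step find-pivot-then-search approach also works). At each step, one half of the window is guaranteed to be sorted; check whether the key lies in that half and discard the other.

int search_rotated(int arr[], int n, int key) {
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = (low + high) / 2;
        if (arr[mid] == key) return mid;
        if (arr[low] <= arr[mid]) {                        // left half sorted
            if (arr[low] <= key && key < arr[mid]) high = mid - 1;
            else low = mid + 1;
        } else {                                           // right half sorted
            if (arr[mid] < key && key <= arr[high]) low = mid + 1;
            else high = mid - 1;
        }
    }
    return -1;
}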
5. Combined Searches
Real systems often chain algorithms:
- Exponential + Binary Search → when bounds are unknown
- Interpolation + Linear Search → when near the target
- Jump + Linear Search → hybrid iteration

These hybrids use context switching: pick a fast search, then fall back to a simple scan in a narrowed window.
6. Summary
| Algorithm | Time | Space | Data Requirement | Special Strength |
|---|---|---|---|---|
| Jump Search | O(√n) | O(1) | Sorted | Fewer comparisons |
| Fibonacci Search | O(log n) | O(1) | Sorted | Division-free |
| Ternary Search | O(log range) | O(1) | Unimodal | Optimization |
| Binary Variants | O(log n) | O(1) | Sorted | Bound finding |
| Combined Searches | Adaptive | O(1) | Mixed | Practical hybrids |
Tiny Code
Jump Search intuition:
// Blocks of size sqrt(n)
[1, 3, 5, 7, 9, 11, 13, 15]
Step: 3 → 7 > 6 → search previous blockJumps reduce comparisons dramatically vs linear scan.
Why It Matters
Search optimization is about adapting structure to context. You don’t always need a fancy data structure , sometimes a tweak like fixed-step jumping or Fibonacci spacing yields massive gains.
These ideas influence:
- Indexing in databases
- Compilers' symbol resolution
- Embedded systems with low-level constraints

They embody the principle: search smarter, not harder.
Try It Yourself
- Implement Jump Search and test vs Binary Search on 1M elements.
- Write a Fibonacci Search , compare steps taken.
- Use Ternary Search to find min of a convex function.
- Modify binary search to find element in rotated array.
- Combine Jump + Linear , how does it behave for small n?
Understanding these variants arms you with flexibility , the heart of algorithmic mastery.
Chapter 3. Data Structures in Actions
21. Arrays, Linked Lists, Stacks, Queues
Every data structure is built on top of a few core foundations , the ones that teach you how data is stored, accessed, and moved. In this section, we’ll revisit the essentials: arrays, linked lists, stacks, and queues.
They’re simple, but they show you the most important design trade-offs in algorithms:
- Contiguity vs. flexibility
- Speed vs. dynamic growth
- Last-in-first-out vs. first-in-first-out access
1. Arrays
Idea: A contiguous block of memory storing elements of the same type. Access by index in O(1) time , that’s their superpower.
Operations:
- Access arr[i]: \(O(1)\)
- Update arr[i]: \(O(1)\)
- Insert at end: \(O(1)\) (amortized for dynamic arrays)
- Insert in middle: \(O(n)\)
- Delete: \(O(n)\)

Example:
int arr[5] = {10, 20, 30, 40, 50};
printf("%d", arr[2]); // 30Strengths:
- Fast random access
- Cache-friendly (contiguous memory)
- Simple, predictable

Weaknesses:
- Fixed size (unless using a dynamic array)
- Costly inserts/deletes

Dynamic Arrays: Languages provide resizable arrays (like vector in C++ or ArrayList in Java) using a doubling strategy: when full, allocate a new array twice as big and copy. This gives amortized \(O(1)\) insertion at the end (a minimal sketch follows below).
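A minimal C sketch of that doubling strategy (the Vec type and vec_* names are illustrative, not a standard library; needs stdlib.h):

typedef struct {
    int *data;
    int size;
    int capacity;
} Vec;

void vec_init(Vec *v) {
    v->capacity = 4;
    v->size = 0;
    v->data = malloc(v->capacity * sizeof(int));
}

void vec_push(Vec *v, int x) {
    if (v->size == v->capacity) {                      // full: double capacity
        v->capacity *= 2;
        v->data = realloc(v->data, v->capacity * sizeof(int));
    }
    v->data[v->size++] = x;                            // amortized O(1)
}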
2. Linked Lists
Idea: A chain of nodes, where each node stores a value and a pointer to the next. No contiguous memory required.
Operations:
- Access: \(O(n)\)
- Insert/Delete at head: \(O(1)\)
- Search: \(O(n)\)

Example:
typedef struct Node {
int data;
struct Node* next;
} Node;
Node* head = NULL;Types:
- Singly Linked List: one pointer (next)
- Doubly Linked List: two pointers (next, prev)
- Circular Linked List: last node points back to first

Strengths:
- Dynamic size
- Fast insert/delete (no shifting)

Weaknesses:
- Slow access
- Extra memory for pointers
- Poor cache locality

Linked lists shine when memory is fragmented or frequent insertions/deletions are needed (a minimal insert-at-head sketch follows below).
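A minimal insert-at-head sketch using the Node type from the example above (insert_front is an illustrative name; needs stdlib.h for malloc):

Node* insert_front(Node* head, int value) {
    Node* node = malloc(sizeof(Node));
    node->data = value;
    node->next = head;    // new node points at the old head
    return node;          // new node becomes the head: O(1)
}

// Usage: head = insert_front(head, 42);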
3. Stack
Idea: A Last-In-First-Out (LIFO) structure , the most recently added element is the first to be removed.
Used in:
- Function call stacks
- Expression evaluation
- Undo operations

Operations:
- push(x): add element on top
- pop(): remove top element
- peek(): view top element

Example (Array-based Stack):
#define MAX 100
int stack[MAX], top = -1;
void push(int x) { stack[++top] = x; }
int pop() { return stack[top--]; }
int peek() { return stack[top]; }Complexity: All (O(1)): push, pop, peek
Variants:
- Linked-list-based stack (no fixed size)
- Min-stack (tracks minimums)

Stacks also appear implicitly, in recursion and backtracking algorithms.
4. Queue
Idea: A First-In-First-Out (FIFO) structure , the first added element leaves first.
Used in:
- Task scheduling
- BFS traversal
- Producer-consumer pipelines

Operations:
- enqueue(x): add to rear
- dequeue(): remove from front
- front(): view front

Example (Array-based Queue):
#define MAX 100
int queue[MAX], front = 0, rear = 0;
void enqueue(int x) { queue[rear++] = x; }
int dequeue() { return queue[front++]; }This simple implementation can waste space. A circular queue fixes that by wrapping indices modulo MAX:
rear = (rear + 1) % MAX;Complexity: All (O(1)): enqueue, dequeue
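A slightly fuller circular-queue sketch, with its own illustrative names (cq_*) so it doesn't clash with the simple version above; one slot is kept empty to tell "full" apart from "empty":

#define CQ_MAX 100
int cq[CQ_MAX], cq_front = 0, cq_rear = 0;

int cq_empty(void) { return cq_front == cq_rear; }
int cq_full(void)  { return (cq_rear + 1) % CQ_MAX == cq_front; }

int cq_enqueue(int x) {
    if (cq_full()) return 0;              // reject when full
    cq[cq_rear] = x;
    cq_rear = (cq_rear + 1) % CQ_MAX;
    return 1;
}

int cq_dequeue(void) {
    int x = cq[cq_front];                 // caller should check cq_empty() first
    cq_front = (cq_front + 1) % CQ_MAX;
    return x;
}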
Variants:
- Deque (double-ended queue): push/pop from both ends
- Priority Queue: dequeue highest priority (not strictly FIFO)
5. Comparison
| Structure | Access | Insert | Delete | Order | Memory | Notes |
|---|---|---|---|---|---|---|
| Array | O(1) | O(n) | O(n) | Indexed | Contiguous | Fast access |
| Linked List | O(n) | O(1)* | O(1)* | Sequential | Pointers | Flexible size |
| Stack | O(1) | O(1) | O(1) | LIFO | Minimal | Call stack, parsing |
| Queue | O(1) | O(1) | O(1) | FIFO | Minimal | Scheduling, BFS |
(* at head or tail with pointer)
Tiny Code
Simple stack example:
push(10);
push(20);
printf("%d", pop()); // 20Simple queue example:
enqueue(5);
enqueue(8);
printf("%d", dequeue()); // 5These short routines appear in almost every algorithm , from recursion stacks to graph traversals.
Why It Matters
These four structures form the spine of data structures:
- Arrays teach indexing and memory
- Linked lists teach pointers and dynamic allocation
- Stacks teach recursion and reversal
- Queues teach scheduling and order maintenance

Every complex structure (trees, heaps, graphs) builds on these.
Master them, and every algorithm will feel more natural.
Try It Yourself
- Implement a linked list with
insert_frontanddelete_value. - Build a stack and use it to reverse an array.
- Implement a queue for a round-robin scheduler.
- Convert infix expression to postfix using a stack.
- Compare time taken to insert 1000 elements in array vs linked list.
Understanding these foundations gives you the vocabulary of structure , the way algorithms organize their thoughts in memory.
22. Hash Tables and Variants (Cuckoo, Robin Hood, Consistent)
When you need lightning-fast lookups, insertions, and deletions, few data structures match the raw efficiency of a hash table. They’re everywhere , from symbol tables and caches to compilers and databases , powering average-case O(1) access.
In this section, we’ll unpack how hash tables work, their collision strategies, and explore modern variants like Cuckoo Hashing, Robin Hood Hashing, and Consistent Hashing, each designed to handle different real-world needs.
1. The Core Idea
A hash table maps keys to values using a hash function that transforms the key into an index in an array.
\[ \text{index} = h(\text{key}) \bmod \text{table\_size} \]
If no two keys hash to the same index, all operations are (O(1)). But in practice, collisions happen , two keys may map to the same slot , and we must handle them smartly.
2. Collision Resolution Strategies
A. Separate Chaining Each table slot holds a linked list (or dynamic array) of entries with the same hash.
Pros: Simple, handles load factor > 1 Cons: Extra pointers, memory overhead
Code Sketch:
typedef struct Node {
int key, value;
struct Node* next;
} Node;
Node* table[SIZE];
int hash(int key) { return key % SIZE; }
void insert(int key, int value) {
int idx = hash(key);
Node* node = malloc(sizeof(Node));
node->key = key; node->value = value;
node->next = table[idx];
table[idx] = node;
}B. Open Addressing All keys live directly in the table. On collision, find another slot.
Three main strategies:
- Linear probing: try the next slot (+1)
- Quadratic probing: step size increases quadratically
- Double hashing: a second hash decides the step size

Example (Linear Probing):
int hash(int key) { return key % SIZE; }
int insert(int key, int value) {
int idx = hash(key);
while (table[idx].used)
idx = (idx + 1) % SIZE;
table[idx] = (Entry){key, value, 1};
}Load Factor \(\alpha = \frac{n}{m}\) affects performance , when too high, rehash to larger size.
3. Modern Variants
Classic hash tables can degrade under heavy collisions. Modern variants reduce probe chains and balance load more evenly.
A. Cuckoo Hashing
Idea: Each key has two possible locations , if both full, evict one (“kick out the cuckoo”) and reinsert. Ensures constant lookup , at most two probes.
Steps:
- Compute two hashes (h_1(key)), (h_2(key))
- If slot 1 empty → place
- Else evict occupant, reinsert it using alternate hash
- Repeat until placed or cycle detected (rehash if needed)
Code Sketch (Conceptual):
int h1(int key) { return key % SIZE; }
int h2(int key) { return (key / SIZE) % SIZE; }
void insert(int key) {
int pos1 = h1(key);
if (!table1[pos1]) { table1[pos1] = key; return; }
int displaced = table1[pos1]; table1[pos1] = key;
int pos2 = h2(displaced);
if (!table2[pos2]) { table2[pos2] = displaced; return; }
// continue evicting if needed
}Pros:
- Worst-case O(1) lookup (constant probes)
- Predictable latency

Cons:
- Rehash needed on insertion failure
- More complex logic

Used in high-performance caches and real-time systems.
B. Robin Hood Hashing
Idea: Steal slots from richer (closer) keys to ensure fairness. When inserting, if you find someone with smaller probe distance, swap , “steal from the rich, give to the poor.”
This balances probe lengths and improves variance and average lookup time.
Key Principle: \[ \text{If new\_probe\_distance} > \text{existing\_probe\_distance} \Rightarrow \text{swap} \]
Code Sketch:
int insert(int key) {
int idx = hash(key);
int dist = 0;
while (table[idx].used) {
if (table[idx].dist < dist) {
// swap entries
Entry tmp = table[idx];
table[idx] = (Entry){key, dist, 1};
key = tmp.key;
dist = tmp.dist;
}
idx = (idx + 1) % SIZE;
dist++;
}
table[idx] = (Entry){key, dist, 1};
}Pros:
- Reduced variance
- Better performance under high load

Cons:
- Slightly slower insertion

Used in modern languages like Rust (hashbrown) and Swift.
C. Consistent Hashing
Idea: When distributing keys across multiple nodes, you want minimal movement when adding/removing a node. Consistent hashing maps both keys and nodes onto a circular hash ring.
Steps:
- Hash nodes into a ring
- Hash keys into same ring
- Each key belongs to the next node clockwise
When a node is added or removed, only nearby keys move.
Used in:
- Distributed caches (Memcached, DynamoDB)
- Load balancing
- Sharding in databases

Code (Conceptual):
Ring: 0 -------------------------------- 2^32
Nodes: N1 at hash("A"), N2 at hash("B")
Key: hash("User42") → assign to next node clockwise
Pros:
- Minimal rebalancing
- Scalable

Cons:
- More complex setup
- Requires virtual nodes for even distribution

A tiny C sketch of the ring-lookup step follows below.
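A minimal sketch of the ring lookup, illustrative only (toy hash, no virtual nodes): node positions on the ring are kept sorted, and a key goes to the first node position clockwise from its own hash.

#include <stdio.h>

#define RING 1024                      // small ring size for illustration

unsigned hash_val(unsigned x) {        // toy hash, not production quality
    x ^= x >> 4;
    return (x * 2654435761u) % RING;
}

// node_pos must be sorted ascending; returns index of the owning node
int find_node(unsigned node_pos[], int num_nodes, unsigned key_hash) {
    for (int i = 0; i < num_nodes; i++)
        if (key_hash <= node_pos[i]) return i;
    return 0;                          // wrap around the ring
}

int main(void) {
    unsigned node_pos[] = {100, 400, 900};   // three nodes on the ring
    unsigned kh = hash_val(42);              // hash of some key
    printf("key hash %u -> node %d\n", kh, find_node(node_pos, 3, kh));
    return 0;
}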
4. Complexity Overview
| Variant | Insert | Search | Delete | Memory | Notes |
|---|---|---|---|---|---|
| Chaining | O(1) avg | O(1) avg | O(1) avg | High | Simple, dynamic |
| Linear Probing | O(1) avg | O(1) avg | O(1) avg | Low | Clustering risk |
| Cuckoo | O(1) | O(1) | O(1) | Medium | Two tables, predictable |
| Robin Hood | O(1) | O(1) | O(1) | Low | Balanced probes |
| Consistent | O(log n) | O(log n) | O(log n) | Depends | Distributed keys |
Tiny Code
Simple hash table with linear probing:
#define SIZE 10
int keys[SIZE], values[SIZE], used[SIZE];
int hash(int key) { return key % SIZE; }
void insert(int key, int value) {
int idx = hash(key);
while (used[idx]) idx = (idx + 1) % SIZE;
keys[idx] = key; values[idx] = value; used[idx] = 1;
}Lookup:
int get(int key) {
int idx = hash(key);
while (used[idx]) {
if (keys[idx] == key) return values[idx];
idx = (idx + 1) % SIZE;
}
return -1;
}Why It Matters
Hash tables show how structure and randomness combine for speed. They embody the idea that a good hash function + smart collision handling = near-constant performance.
Variants like Cuckoo and Robin Hood are examples of modern engineering trade-offs , balancing performance, memory, and predictability. Consistent hashing extends these ideas to distributed systems.
Try It Yourself
- Implement a hash table with chaining and test collision handling.
- Modify it to use linear probing , measure probe lengths.
- Simulate Cuckoo hashing with random inserts.
- Implement Robin Hood swapping logic , observe fairness.
- Draw a consistent hash ring with 3 nodes and 10 keys , track movement when adding one node.
Once you master these, you’ll see hashing everywhere , from dictionaries to distributed databases.
23. Heaps (Binary, Fibonacci, Pairing)
Heaps are priority-driven data structures , they always give you fast access to the most important element, typically the minimum or maximum. They’re essential for priority queues, scheduling, graph algorithms (like Dijkstra), and streaming analytics.
In this section, we’ll start from the basic binary heap and then explore more advanced ones like Fibonacci and pairing heaps, which trade off simplicity, speed, and amortized guarantees.
1. The Heap Property
A heap is a tree-based structure (often represented as an array) that satisfies:
- Min-Heap: every node ≤ its children
- Max-Heap: every node ≥ its children

This ensures the root always holds the smallest (or largest) element.
Complete Binary Tree: All levels filled except possibly the last, which is filled left to right.
Example (Min-Heap):
2
/ \
4 5
/ \ /
9 10 15
Here, the smallest element (2) is at the root.
2. Binary Heap
Storage: Stored compactly in an array. For index i (0-based):
- Parent = (i - 1) / 2
- Left child = 2i + 1
- Right child = 2i + 2

Operations:
| Operation | Description | Time |
|---|---|---|
| push(x) | Insert element | O(log n) |
| pop() | Remove root | O(log n) |
| peek() | Get root | O(1) |
| heapify() | Build heap | O(n) |
A. Insertion (Push)
Insert at the end, then bubble up until heap property is restored.
Code:
void push(int heap[], int *n, int x) {
int i = (*n)++;
heap[i] = x;
while (i > 0 && heap[(i - 1)/2] > heap[i]) {
int tmp = heap[i];
heap[i] = heap[(i - 1)/2];
heap[(i - 1)/2] = tmp;
i = (i - 1) / 2;
}
}B. Removal (Pop)
Replace root with last element, then bubble down (heapify).
Code:
void heapify(int heap[], int n, int i) {
int smallest = i, l = 2*i + 1, r = 2*i + 2;
if (l < n && heap[l] < heap[smallest]) smallest = l;
if (r < n && heap[r] < heap[smallest]) smallest = r;
if (smallest != i) {
int tmp = heap[i]; heap[i] = heap[smallest]; heap[smallest] = tmp;
heapify(heap, n, smallest);
}
}Pop:
int pop(int heap[], int *n) {
int root = heap[0];
heap[0] = heap[--(*n)];
heapify(heap, *n, 0);
return root;
}C. Building a Heap
Heapify bottom-up from last non-leaf: (O(n))
for (int i = n/2 - 1; i >= 0; i--)
heapify(heap, n, i);D. Applications
- Heapsort: repeatedly pop the min, \(O(n \log n)\) total (see the sketch below)
- Priority Queue: fast access to smallest/largest
- Graph Algorithms: Dijkstra, Prim
- Streaming: median finding using two heaps
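A minimal heapsort sketch built on the push and pop routines above: push everything into the min-heap, then pop repeatedly to read the values back in ascending order (the fixed-size heap array here assumes len ≤ 100 for simplicity):

void heap_sort(int arr[], int len) {
    int heap[100], n = 0;
    for (int i = 0; i < len; i++)
        push(heap, &n, arr[i]);      // O(log n) per insert
    for (int i = 0; i < len; i++)
        arr[i] = pop(heap, &n);      // smallest remaining element each time
}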
3. Fibonacci Heap
Idea: A heap optimized for algorithms that do many decrease-key operations (like Dijkstra’s). It stores a collection of trees with lazy merging, giving amortized bounds:
| Operation | Amortized Time |
|---|---|
| Insert | O(1) |
| Find-Min | O(1) |
| Extract-Min | O(log n) |
| Decrease-Key | O(1) |
| Merge | O(1) |
It achieves this by delaying structural fixes until absolutely necessary (using potential method in amortized analysis).
Structure:
- A circular linked list of roots
- Each node can have multiple children
- Consolidation on extract-min ensures minimal degree duplication

Used in theoretical optimizations where asymptotic complexity matters (e.g. Dijkstra in \(O(E + V \log V)\) vs \(O(E \log V)\)).
4. Pairing Heap
Idea: A simpler, practical alternative to Fibonacci heaps. Self-adjusting structure using a tree with multiple children.
Operations:
- Insert: \(O(1)\)
- Extract-Min: \(O(\log n)\) amortized
- Decrease-Key: \(O(\log n)\) amortized

Steps:
- merge two heaps: make the heap with the larger root a child of the other
- extract-min: remove the root, merge its children in pairs, then merge all the results

Why It's Popular:
- Easier to implement
- Great real-world performance
- Used in functional programming and priority schedulers

A small C sketch of merge and insert follows below.
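A minimal min pairing-heap sketch covering merge and insert only (extract-min's two-pass pairing is left out; the PNode name and helpers are illustrative, and malloc needs stdlib.h). Each node keeps its leftmost child and its next sibling:

typedef struct PNode {
    int key;
    struct PNode *child;     // leftmost child
    struct PNode *sibling;   // next sibling in the child list
} PNode;

// Merge two heaps: the root with the larger key becomes a child of the other.
PNode* pairing_merge(PNode* a, PNode* b) {
    if (!a) return b;
    if (!b) return a;
    if (b->key < a->key) { PNode* t = a; a = b; b = t; }  // a keeps the smaller root
    b->sibling = a->child;
    a->child = b;
    return a;
}

// Insert is just a merge with a one-node heap: O(1).
PNode* pairing_insert(PNode* root, int key) {
    PNode* node = malloc(sizeof(PNode));
    node->key = key;
    node->child = node->sibling = NULL;
    return pairing_merge(root, node);
}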
5. Comparison
| Heap Type | Insert | Extract-Min | Decrease-Key | Merge | Simplicity | Use Case |
|---|---|---|---|---|---|---|
| Binary Heap | O(log n) | O(log n) | O(log n) | O(n) | Easy | General-purpose |
| Fibonacci Heap | O(1) | O(log n) | O(1) | O(1) | Complex | Theoretical optimality |
| Pairing Heap | O(1) | O(log n) | O(log n) | O(1) | Moderate | Practical alternative |
Tiny Code
Binary Heap Demo:
int heap[100], n = 0;
push(heap, &n, 10);
push(heap, &n, 4);
push(heap, &n, 7);
printf("%d ", pop(heap, &n)); // 4Output: 4
Why It Matters
Heaps show how to prioritize elements dynamically. From sorting to scheduling, they’re the backbone of many “choose the best next” algorithms. Variants like Fibonacci and Pairing Heaps demonstrate how amortized analysis can unlock deeper efficiency , crucial in graph theory and large-scale optimization.
Try It Yourself
- Implement a binary min-heap with
push,pop, andpeek. - Use a heap to sort a list (Heapsort).
- Build a priority queue for task scheduling.
- Study how Dijkstra changes when replacing arrays with heaps.
- Explore Fibonacci heap pseudo-code , trace
decrease-key.
Mastering heaps gives you a deep sense of priority-driven design , how to keep “the best” element always within reach.
24. Balanced Trees (AVL, Red-Black, Splay, Treap)
Unbalanced trees can degrade into linear lists, turning your beautiful \(O(\log n)\) search into a sad \(O(n)\) crawl. Balanced trees solve this: they keep the height logarithmic, guaranteeing fast lookups, insertions, and deletions.
In this section, you’ll learn how different balancing philosophies work , AVL (strict balance), Red-Black (relaxed balance), Splay (self-adjusting), and Treap (randomized balance).
1. The Idea of Balance
For a binary search tree (BST):
\[ \text{height} = O(\log n) \]
only if it’s balanced , meaning the number of nodes in left and right subtrees differ by a small factor.
Unbalanced BST (bad):
1
\
2
\
3
Balanced BST (good):
2
/ \
1 3
Balance ensures efficient:
- search(x) → \(O(\log n)\)
- insert(x) → \(O(\log n)\)
- delete(x) → \(O(\log n)\)
2. AVL Tree (Adelson-Velsky & Landis)
Invented in 1962, AVL is the first self-balancing BST. It enforces strict balance: \[ | \text{height(left)} - \text{height(right)} | \le 1 \]
Whenever this condition breaks, rotations fix it.
Rotations:
- LL (Right Rotation): imbalance on left-left
- RR (Left Rotation): imbalance on right-right
- LR / RL: double rotation cases

Code (Rotation Example):
Node* rotateRight(Node* y) {
Node* x = y->left;
Node* T = x->right;
x->right = y;
y->left = T;
return x;
}Height & Balance Factor:
int height(Node* n) { return n ? n->h : 0; }
int balance(Node* n) { return height(n->left) - height(n->right); }Properties:
- Strict height bound: \(O(\log n)\)
- More rotations (slower insertions)
- Excellent lookup speed

Used when lookups > updates (databases, indexing).
3. Red-Black Tree
Idea: A slightly looser balance for faster insertions. Each node has a color (Red/Black) with these rules:
- Root is black
- Red node’s children are black
- Every path has same number of black nodes
- Null nodes are black
Balance through color flips + rotations
Compared to AVL:
- Fewer rotations (faster insert/delete)
- Slightly taller (slower lookup)
- Simpler amortized balance

Used in:
- C++ std::map, std::set
- Java TreeMap
- Linux scheduler

Complexity: All major operations \(O(\log n)\)
4. Splay Tree
Idea: Bring recently accessed node to root via splaying (rotations). It adapts to access patterns , the more you access a key, the faster it becomes.
Splaying Steps:
- Zig: one rotation (root child)
- Zig-Zig: two rotations (same side)
- Zig-Zag: two rotations (different sides)

Code (Conceptual):
Node* splay(Node* root, int key) {
if (!root || root->key == key) return root;
if (key < root->key) {
if (!root->left) return root;
// splay in left subtree
if (key < root->left->key)
root->left->left = splay(root->left->left, key),
root = rotateRight(root);
else if (key > root->left->key)
root->left->right = splay(root->left->right, key),
root->left = rotateLeft(root->left);
return rotateRight(root);
} else {
if (!root->right) return root;
// symmetric
}
}Why It’s Cool: No strict balance, but amortized \(O(\log n)\). Frequently accessed elements stay near the top.
Used in self-adjusting caches, rope data structures, memory allocators.
5. Treap (Tree + Heap)
Idea: Each node has two keys:
- BST key → order property
- Priority → heap property

Insertion = normal BST insert + heap fix via rotation.
Balance comes from randomization: random priorities ensure an expected \(O(\log n)\) height.
Code Sketch:
typedef struct Node {
int key, priority;
struct Node *left, *right;
} Node;
Node* insert(Node* root, int key) {
if (!root) return newNode(key, rand());
if (key < root->key) root->left = insert(root->left, key);
else root->right = insert(root->right, key);
if (root->left && root->left->priority > root->priority)
root = rotateRight(root);
if (root->right && root->right->priority > root->priority)
root = rotateLeft(root);
return root;
}Advantages:
- Simple logic
- Random balancing
- Expected \(O(\log n)\) height

Used in randomized algorithms and functional programming.
6. Comparison
| Tree | Balance Type | Rotations | Height | Insert/Delete | Lookup | Notes |
|---|---|---|---|---|---|---|
| AVL | Strict | More | O(log n) | Medium | Fast | Lookup-heavy |
| Red-Black | Relaxed | Fewer | O(log n) | Fast | Medium | Library std |
| Splay | Adaptive | Variable | Amortized O(log n) | Fast | Fast (amortized) | Access patterns |
| Treap | Randomized | Avg few | O(log n) expected | Simple | Simple | Probabilistic |
Tiny Code
AVL Insert (Skeleton):
Node* insert(Node* root, int key) {
if (!root) return newNode(key);
if (key < root->key) root->left = insert(root->left, key);
else root->right = insert(root->right, key);
root->h = 1 + max(height(root->left), height(root->right));
int b = balance(root);
if (b > 1 && key < root->left->key) return rotateRight(root);
if (b < -1 && key > root->right->key) return rotateLeft(root);
// other cases...
return root;
}Why It Matters
Balanced trees guarantee predictable performance under dynamic updates. Each variant represents a philosophy:
- AVL: precision
- Red-Black: practicality
- Splay: adaptability
- Treap: randomness

Together, they teach one core idea: keep height in check, no matter the operations.
Try It Yourself
- Implement an AVL tree and visualize rotations.
- Insert keys [10, 20, 30, 40, 50] and trace Red-Black color changes.
- Splay after each access , see which keys stay near top.
- Build a Treap with random priorities , measure average height.
- Compare performance of BST vs AVL on sorted input.
Balanced trees are the architects of order , always keeping chaos one rotation away.
25. Segment Trees and Fenwick Trees
When you need to answer range queries quickly (like sum, min, max) and support updates to individual elements, simple prefix sums won’t cut it anymore.
You need something smarter , data structures that can divide and conquer over ranges, updating and combining results efficiently.
That’s exactly what Segment Trees and Fenwick Trees (Binary Indexed Trees) do:
- Query over a range in \(O(\log n)\)
- Update elements in \(O(\log n)\)

They're the backbone of competitive programming, signal processing, and database analytics.
1. The Problem
Given an array A[0..n-1], support:
- update(i, x) → change A[i] to x
- query(L, R) → compute the sum (or min, max) of A[L..R]
Naive approach:
- Update: \(O(1)\)
- Query: \(O(n)\)

Prefix sums fix one but not both. Segment and Fenwick trees fix both.
2. Segment Tree
Idea: Divide the array into segments (intervals) recursively. Each node stores an aggregate (sum, min, max) of its range. You can combine child nodes to get any range result.
Structure (Sum Example):
[0,7] sum=36
/ \
[0,3]=10 [4,7]=26
/ \ / \
[0,1]=3 [2,3]=7 [4,5]=11 [6,7]=15
Each node represents a range [L,R]. Leaf nodes = single elements.
A. Build
Recursive Construction: Time: (O(n))
void build(int node, int L, int R) {
if (L == R) tree[node] = arr[L];
else {
int mid = (L + R) / 2;
build(2*node, L, mid);
build(2*node+1, mid+1, R);
tree[node] = tree[2*node] + tree[2*node+1];
}
}B. Query (Range Sum)
Query [l, r] recursively:
- If the current range [L, R] is fully inside [l, r], return the node value
- If disjoint, return 0
- Else combine the children
int query(int node, int L, int R, int l, int r) {
if (r < L || R < l) return 0;
if (l <= L && R <= r) return tree[node];
int mid = (L + R) / 2;
return query(2*node, L, mid, l, r)
+ query(2*node+1, mid+1, R, l, r);
}C. Update
Change arr[i] = x and update tree nodes covering i.
void update(int node, int L, int R, int i, int x) {
if (L == R) tree[node] = x;
else {
int mid = (L + R)/2;
if (i <= mid) update(2*node, L, mid, i, x);
else update(2*node+1, mid+1, R, i, x);
tree[node] = tree[2*node] + tree[2*node+1];
}
}Complexities:
- Build: \(O(n)\)
- Query: \(O(\log n)\)
- Update: \(O(\log n)\)
- Space: \(O(4n)\)
D. Variants
Segment trees are flexible:
- Range minimum/maximum
- Range GCD
- Lazy propagation → range updates (a sketch follows below)
- 2D segment tree for grids
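A minimal lazy-propagation sketch, written as an assumed extension of the sum segment tree above: range add on [l, r] plus range-sum queries. Here lazy[node] holds a pending "add to every element" amount for that node's whole range, and the tree2/lazy names are chosen so they don't clash with the tree array used earlier:

long long tree2[4 * 100000], lazy[4 * 100000];

void push_down(int node, int L, int R) {
    if (lazy[node] != 0) {
        int mid = (L + R) / 2;
        int left = 2 * node, right = 2 * node + 1;
        tree2[left]  += lazy[node] * (mid - L + 1);   // apply pending add to child sums
        tree2[right] += lazy[node] * (R - mid);
        lazy[left]  += lazy[node];                    // defer further pushing
        lazy[right] += lazy[node];
        lazy[node] = 0;
    }
}

void range_add(int node, int L, int R, int l, int r, long long val) {
    if (r < L || R < l) return;                       // disjoint
    if (l <= L && R <= r) {                           // fully covered
        tree2[node] += val * (R - L + 1);
        lazy[node] += val;
        return;
    }
    push_down(node, L, R);
    int mid = (L + R) / 2;
    range_add(2 * node, L, mid, l, r, val);
    range_add(2 * node + 1, mid + 1, R, l, r, val);
    tree2[node] = tree2[2 * node] + tree2[2 * node + 1];
}

long long range_query(int node, int L, int R, int l, int r) {
    if (r < L || R < l) return 0;
    if (l <= L && R <= r) return tree2[node];
    push_down(node, L, R);
    int mid = (L + R) / 2;
    return range_query(2 * node, L, mid, l, r)
         + range_query(2 * node + 1, mid + 1, R, l, r);
}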
3. Fenwick Tree (Binary Indexed Tree)
Idea: Stores cumulative frequencies using bit manipulation. Each node covers a range size = LSB(index).
Simpler, smaller, but supports only associative ops (sum, xor, etc.)
Indexing:
- Parent:
i + (i & -i)- Child:i - (i & -i)Build: Initialize with zero, then add elements one by one.
Add / Update:
void add(int i, int x) {
for (; i <= n; i += i & -i)
bit[i] += x;
}Prefix Sum:
int sum(int i) {
int res = 0;
for (; i > 0; i -= i & -i)
res += bit[i];
return res;
}Range Sum [L, R]: \[ \text{sum}(R) - \text{sum}(L-1) \]
Complexities:
- Build: \(O(n \log n)\)
- Query: \(O(\log n)\)
- Update: \(O(\log n)\)
- Space: \(O(n)\)
4. Comparison
| Feature | Segment Tree | Fenwick Tree |
|---|---|---|
| Space | O(4n) | O(n) |
| Build | O(n) | O(n log n) |
| Query | O(log n) | O(log n) |
| Update | O(log n) | O(log n) |
| Range Update | With Lazy | Tricky |
| Range Query | Flexible | Sum/XOR only |
| Implementation | Moderate | Simple |
5. Applications
- Sum / Min / Max / XOR queries
- Frequency counts
- Counting inversions
- Order statistics
- Online problems where the array updates over time

Used in:
- Competitive programming
- Databases (analytics on changing data)
- Time series queries
- Games (damage/range updates)
Tiny Code
Fenwick Tree Example:
int bit[1001], n;
void update(int i, int val) {
for (; i <= n; i += i & -i)
bit[i] += val;
}
int query(int i) {
int res = 0;
for (; i > 0; i -= i & -i)
res += bit[i];
return res;
}
// range sum
int range_sum(int L, int R) { return query(R) - query(L - 1); }Why It Matters
Segment and Fenwick trees embody divide-and-conquer over data , balancing dynamic updates with range queries. They’re how modern systems aggregate live data efficiently.
They teach a powerful mindset:
“If you can split a problem, you can solve it fast.”
Try It Yourself
- Build a segment tree for sum queries.
- Add range minimum queries (RMQ).
- Implement a Fenwick tree , test with prefix sums.
- Solve: number of inversions in array using Fenwick tree.
- Add lazy propagation to segment tree for range updates.
Once you master these, range queries will never scare you again , you’ll slice through them in logarithmic time.
26. Disjoint Set Union (Union-Find)
Many problems involve grouping elements into sets and efficiently checking whether two elements belong to the same group , like connected components in a graph, network connectivity, Kruskal’s MST, or even social network clustering.
For these, the go-to structure is the Disjoint Set Union (DSU), also called Union-Find. It efficiently supports two operations:
find(x)→ which set doesxbelong to?union(x, y)→ merge the sets containingxandy.
With path compression and union by rank, both operations run in near-constant time, specifically \(O(\alpha(n))\), where \(\alpha\) is the inverse Ackermann function (practically ≤ 4).
1. The Problem
Suppose you have (n) elements initially in separate sets. Over time, you want to:
- Merge two sets
- Check if two elements share the same set

Example:
Sets: {1}, {2}, {3}, {4}, {5}
Union(1,2) → {1,2}, {3}, {4}, {5}
Union(3,4) → {1,2}, {3,4}, {5}
Find(2) == Find(1)? Yes
Find(5) == Find(3)? No
2. Basic Implementation
Each element has a parent pointer. Initially, every node is its own parent.
Parent array representation:
int parent[N];
void make_set(int v) {
parent[v] = v;
}
int find(int v) {
if (v == parent[v]) return v;
return find(parent[v]);
}
void union_sets(int a, int b) {
a = find(a);
b = find(b);
if (a != b)
parent[b] = a;
}This works, but deep trees can form , making find slow. We fix that with path compression.
3. Path Compression
Every time we call find(v), we make all nodes along the path point directly to the root. This flattens the tree dramatically.
Optimized Find:
int find(int v) {
if (v == parent[v]) return v;
return parent[v] = find(parent[v]);
}So next time, lookups will be (O(1)) for those nodes.
4. Union by Rank / Size
When merging, always attach the smaller tree to the larger to keep depth small.
Union by Rank:
int parent[N], rank[N];
void make_set(int v) {
parent[v] = v;
rank[v] = 0;
}
void union_sets(int a, int b) {
a = find(a);
b = find(b);
if (a != b) {
if (rank[a] < rank[b])
swap(a, b);
parent[b] = a;
if (rank[a] == rank[b])
rank[a]++;
}
}Union by Size (Alternative): Track size of each set and attach smaller to larger.
int size[N];
void union_sets(int a, int b) {
a = find(a);
b = find(b);
if (a != b) {
if (size[a] < size[b]) swap(a, b);
parent[b] = a;
size[a] += size[b];
}
}5. Complexity
With both path compression and union by rank, all operations are effectively constant time: \[ O(\alpha(n)) \approx O(1) \]
For all practical \(n\), \(\alpha(n) \le 4\).
| Operation | Time |
|---|---|
| Make set | O(1) |
| Find | O(α(n)) |
| Union | O(α(n)) |
6. Applications
- Graph Connectivity: determine connected components
- Kruskal's MST: add edges, avoid cycles
- Dynamic connectivity
- Image segmentation
- Network clustering
- Cycle detection in undirected graphs

Example: Kruskal's Algorithm
sort(edges.begin(), edges.end());
for (edge e : edges)
if (find(e.u) != find(e.v)) {
union_sets(e.u, e.v);
mst_weight += e.w;
}7. Example
int parent[6], rank[6];
void init() {
for (int i = 1; i <= 5; i++) {
parent[i] = i;
rank[i] = 0;
}
}
int main() {
init();
union_sets(1, 2);
union_sets(3, 4);
union_sets(2, 3);
printf("%d\n", find(4)); // prints representative of {1,2,3,4}
}Result: {1,2,3,4}, {5}
8. Visualization
Before compression:
1
\
2
\
3
After compression:
1
├─2
└─3
Every find call makes future queries faster.
9. Comparison
| Variant | Find | Union | Notes |
|---|---|---|---|
| Basic | O(n) | O(n) | Deep trees |
| Path Compression | O(α(n)) | O(α(n)) | Very fast |
| + Rank / Size | O(α(n)) | O(α(n)) | Balanced |
| Persistent DSU | O(log n) | O(log n) | Undo/rollback support |
Tiny Code
Full DSU with path compression + rank:
int parent[1000], rank[1000];
void make_set(int v) {
parent[v] = v;
rank[v] = 0;
}
int find(int v) {
if (v != parent[v])
parent[v] = find(parent[v]);
return parent[v];
}
void union_sets(int a, int b) {
a = find(a);
b = find(b);
if (a != b) {
if (rank[a] < rank[b]) swap(a, b);
parent[b] = a;
if (rank[a] == rank[b])
rank[a]++;
}
}

Why It Matters
Union-Find embodies structural sharing and lazy optimization: you don't balance eagerly, only just enough. It's one of the most elegant demonstrations of how near-constant-time operations are possible through clever organization.
It teaches a key algorithmic lesson:
“Work only when necessary, and fix structure as you go.”
Try It Yourself
- Implement DSU and test `find`/`union`.
- Build a program that counts connected components.
- Solve Kruskal's MST using DSU.
- Add `get_size(v)` to return component size.
- Try rollback DSU (keep a stack of changes).

Union-Find is the quiet powerhouse behind many graph and connectivity algorithms: simple, fast, and deeply elegant.
27. Probabilistic Data Structures (Bloom, Count-Min, HyperLogLog)
When you work with massive data streams , billions of elements, too big for memory , you can’t store everything. But what if you don’t need perfect answers, just fast and tiny approximate ones?
That’s where probabilistic data structures shine. They trade a bit of accuracy for huge space savings and constant-time operations.
In this section, we’ll explore three of the most famous:
- Bloom Filters → membership queries
- Count-Min Sketch → frequency estimation
- HyperLogLog → cardinality estimation

Each of them answers "How likely is X?" or "How many?" efficiently, perfect for modern analytics, caching, and streaming systems.
1. Bloom Filter: "Is this element probably in the set?"
A Bloom filter answers:
“Is
xin the set?” with either maybe yes or definitely no.
No false negatives, but some false positives.
A. Idea
Use an array of bits (size m), all initialized to 0. Use k different hash functions.
To insert an element:
- Compute \(k\) hashes: \(h_1(x), h_2(x), \dots, h_k(x)\)
- Set the bit at each position \(h_i(x)\) to 1
To query an element:
- Compute same k hashes
- If all bits are 1 → maybe yes
- If any bit is 0 → definitely no
B. Example
Insert dog:
- \(h_1(\text{dog}) = 2\), \(h_2(\text{dog}) = 5\), \(h_3(\text{dog}) = 9\) → set bits 2, 5, 9 to 1
Check cat:
- If any hash bit = 0 → not present
C. Complexity
| Operation | Time | Space | Accuracy |
|---|---|---|---|
| Insert | O(k) | O(m) | Tunable |
| Query | O(k) | O(m) | False positives |
False positive rate ≈ \(\left(1 - e^{-kn/m}\right)^k\)
Choose m and k based on expected n and acceptable error.
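As a quick aside, the standard sizing formulas are \(m = -n \ln p / (\ln 2)^2\) and \(k = (m/n)\ln 2\). The snippet below is a minimal, illustrative calculation (variable names are our own) that plugs in an expected \(n\) and a target rate \(p\) and checks the resulting false positive estimate:

```cpp
#include <cmath>
#include <cstdio>

int main() {
    double n = 10000;   // expected number of inserted elements
    double p = 0.01;    // acceptable false positive rate
    double m = -n * log(p) / (log(2) * log(2));   // bits required
    double k = (m / n) * log(2);                  // optimal number of hash functions
    printf("m ~= %.0f bits, k ~= %.1f hash functions\n", m, k);
    double rate = pow(1.0 - exp(-k * n / m), k);  // (1 - e^{-kn/m})^k
    printf("expected false positive rate ~= %.4f\n", rate);
    return 0;
}
```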
D. Code
#define M 1000
int bitset[M];
int hash1(int x) { return (x * 17) % M; }
int hash2(int x) { return (x * 31 + 7) % M; }
void add(int x) {
bitset[hash1(x)] = 1;
bitset[hash2(x)] = 1;
}
bool contains(int x) {
return bitset[hash1(x)] && bitset[hash2(x)];
}

Used in:

- Caches (check before disk lookup)
- Spam filters
- Databases (join filtering)
- Blockchain and peer-to-peer networks
2. Count-Min Sketch: "How often has this appeared?"
Tracks frequency counts in a stream, using sub-linear memory.
Instead of a full table, it uses a 2D array of counters, each row hashed with a different hash function.
A. Insert
For each row i:
- Compute hash \(h_i(x)\)
- Increment `count[i][h_i(x)]`

B. Query
For element x:
- Compute all \(h_i(x)\)
- Take `min(count[i][h_i(x)])` across rows → gives an upper-bounded estimate of the true frequency
C. Code
#define W 1000
#define D 5
int count[D][W];
int hash(int i, int x) {
return (x * (17*i + 3)) % W;
}
void add(int x) {
for (int i = 0; i < D; i++)
count[i][hash(i, x)]++;
}
int query(int x) {
int res = INT_MAX;
for (int i = 0; i < D; i++)
res = min(res, count[i][hash(i, x)]);
return res;
}

D. Complexity
| Operation | Time | Space |
|---|---|---|
| Insert | O(D) | O(W×D) |
| Query | O(D) | O(W×D) |
Error controlled by: \[ \varepsilon = \frac{1}{W}, \quad \delta = 1 - e^{-D} \]
Used in:
- Frequency counting in streams
- Hot-key detection
- Network flow analysis
- Trending topics
3. HyperLogLog: "How many unique items?"
Estimates cardinality (number of distinct elements) with very small memory (~1.5 KB for millions).
A. Idea
Hash each element uniformly → 32-bit value. Split hash into:
- Prefix bits → bucket index
- Suffix bits → count leading zeros

Each bucket stores the maximum leading-zero count seen. At the end, use the harmonic mean of the bucket counts to estimate the number of distinct values.
B. Formula
\[ E = \alpha_m \cdot m^2 \cdot \Big(\sum_{i=1}^m 2^{-M[i]}\Big)^{-1} \]
where M[i] is the zero count in bucket i, and \(\alpha_m\) is a correction constant.
Accuracy: ~1.04 / √m
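Putting the pieces together, here is a tiny, illustrative sketch of the bucket update and the raw estimate (no small- or large-range corrections). The mixer `hash32`, the bucket count \(m = 16\), and the constant \(\alpha_{16} \approx 0.673\) are illustrative choices, and `__builtin_clz` assumes a GCC/Clang compiler:

```cpp
#include <cstdint>
#include <cstdio>
#include <cmath>

const int B = 4;          // bits used for the bucket index
const int M = 1 << B;     // m = 16 buckets (tiny, for illustration)
uint8_t bucket[M];

// Illustrative 32-bit mixer standing in for a real hash function.
uint32_t hash32(uint32_t x) {
    x ^= x >> 16; x *= 0x45d9f3b; x ^= x >> 16; x *= 0x45d9f3b; x ^= x >> 16;
    return x;
}

void add(uint32_t value) {
    uint32_t h = hash32(value);
    uint32_t idx = h >> (32 - B);            // prefix bits -> bucket index
    uint32_t rest = h << B;                  // remaining bits
    int rank = rest ? __builtin_clz(rest) + 1 : 32 - B + 1;  // leading zeros + 1
    if (rank > bucket[idx]) bucket[idx] = (uint8_t)rank;     // keep the max per bucket
}

double estimate() {
    double sum = 0;
    for (int i = 0; i < M; i++) sum += pow(2.0, -bucket[i]);
    double alpha = 0.673;                    // correction constant for m = 16
    return alpha * M * M / sum;              // raw HyperLogLog estimate
}
```

Real implementations (Redis, PostgreSQL) use many more buckets and apply bias corrections for small cardinalities.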
C. Complexity
| Operation | Time | Space | Error |
|---|---|---|---|
| Add | O(1) | O(m) | ~1.04/√m |
| Merge | O(m) | O(m) | , |
Used in:
- Web analytics (unique visitors)
- Databases (`COUNT DISTINCT`)
- Distributed systems (mergeable estimates)
4. Comparison
| Structure | Purpose | Query | Memory | Error | Notes |
|---|---|---|---|---|---|
| Bloom | Membership | O(k) | Tiny | False positives | No deletions |
| Count-Min | Frequency | O(D) | Small | Overestimate | Streaming counts |
| HyperLogLog | Cardinality | O(1) | Very small | ~1% | Mergeable |
Tiny Code
Bloom Filter Demo:
add(42);
add(17);
printf("%d\n", contains(42)); // 1 (maybe yes)
printf("%d\n", contains(99)); // 0 (definitely no)Why It Matters
Probabilistic data structures show how approximation beats impossibility when resources are tight. They make it feasible to process massive streams in real time, when storing everything is impossible.
They teach a deeper algorithmic truth:
“A bit of uncertainty can buy you a world of scalability.”
Try It Yourself
- Implement a Bloom filter with 3 hash functions.
- Measure false positive rate for 10K elements.
- Build a Count-Min Sketch and test frequency estimation.
- Approximate unique elements using HyperLogLog logic.
- Explore real-world systems: Redis (Bloom/CM Sketch), PostgreSQL (HyperLogLog).
These tiny probabilistic tools are how big data becomes tractable.
28. Skip Lists and B-Trees
When you want fast search, insert, and delete but need a structure that’s easier to code than trees or optimized for disk and memory blocks, two clever ideas step in:
- Skip Lists → randomized, layered linked lists that behave like balanced BSTs
- B-Trees → multi-way trees that minimize disk I/O and organize large data blocks

Both guarantee \(O(\log n)\) operations, but they shine in very different environments: Skip Lists in memory, B-Trees on disk.
1. Skip Lists
Invented by: William Pugh (1990) Goal: Simulate binary search using linked lists with probabilistic shortcuts.
A. Idea
A skip list is a stack of linked lists, each level skipping over more elements.
Example:
Level 3: ┌───────> 50 ───────┐
Level 2: ┌──> 10 ─────> 30 ─────> 50 ───┐
Level 1: 5 ──> 10 ──> 20 ──> 30 ──> 40 ──> 50
Higher levels are sparser and let you “skip” large chunks of the list.
You search top-down:
- Move right while the next key ≤ target
- Drop down a level when you can't go further

This mimics binary search: logarithmic layers, logarithmic hops.
B. Construction
Each inserted element is given a random height, with geometric distribution:
- Level 1 (base) always exists
- Level 2 with probability ½
- Level 3 with ¼, and so on

Expected total nodes = \(2n\), expected height = \(O(\log n)\).
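A minimal sketch of the node layout and the geometric level generator described above; `MAX_LEVEL` and the coin flip via `rand()` are illustrative choices:

```cpp
#include <cstdlib>

const int MAX_LEVEL = 16;          // enough headroom for ~2^16 elements

struct Node {
    int key;
    Node* forward[MAX_LEVEL];      // forward[i] = next node on level i
};

// Flip a fair coin until it comes up tails: level 1 with prob 1/2,
// level 2 with prob 1/4, and so on (capped at MAX_LEVEL).
int random_level() {
    int lvl = 1;
    while ((rand() & 1) && lvl < MAX_LEVEL) lvl++;
    return lvl;
}
```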
C. Operations
| Operation | Time | Space | Notes |
|---|---|---|---|
| Search | O(log n) | O(n) | Randomized balance |
| Insert | O(log n) | O(n) | Rebuild towers |
| Delete | O(log n) | O(n) | Rewire pointers |
Search Algorithm:
Node* search(SkipList* sl, int key) {
Node* cur = sl->head;
for (int lvl = sl->level; lvl >= 0; lvl--) {
while (cur->forward[lvl] && cur->forward[lvl]->key < key)
cur = cur->forward[lvl];
}
cur = cur->forward[0];
if (cur && cur->key == key) return cur;
return NULL;
}

Skip Lists are simple, fast, and probabilistically balanced: no rotations, no rebalancing.
D. Why Use Skip Lists?
- Easier to implement than balanced trees
- Support concurrent access well
- Randomized, not deterministic, but highly reliable

Used in:

- Redis (sorted sets)
- LevelDB / RocksDB internals
- Concurrent maps
2. B-Trees
Invented by: Rudolf Bayer & Ed McCreight (1972) Goal: Reduce disk access by grouping data in blocks.
A B-Tree is a generalization of a BST:
- Each node holds multiple keys and children
- Keys are kept sorted
- Child subtrees span the ranges between keys
A. Structure
A B-Tree of order m:
- Each node has at most `m` children
- An internal node with `k` children holds `k-1` keys
- All leaves are at the same depth

Example (order 3):
[17 | 35]
/ | \
[5 10] [20 25 30] [40 45 50]
B. Operations
Search
- Traverse from the root
- Binary search in each node's key array
- Follow the appropriate child → \(O(\log_m n)\)
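A sketch of that search, assuming a node layout like the skeleton in the Tiny Code section below plus a `leaf` flag (added here for illustration):

```cpp
// Order m = 4: up to m-1 keys and m children per node.
struct BNode {
    int keys[3];
    BNode* child[4];
    int n;          // number of keys currently stored
    bool leaf;
};

BNode* btree_search(BNode* node, int key) {
    if (!node) return nullptr;
    int i = 0;
    while (i < node->n && key > node->keys[i]) i++;   // scan keys (binary search also works)
    if (i < node->n && node->keys[i] == key) return node;
    if (node->leaf) return nullptr;                   // reached a leaf without the key
    return btree_search(node->child[i], key);         // descend into the matching subtree
}
```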
Insert
- Insert in a leaf
- If overflow → split the node
- Promote the median key to the parent
Delete
- Borrow from a sibling or merge if the node underflows

Each split or merge keeps the height minimal.
C. Complexity
| Operation | Time | Disk Accesses | Notes |
|---|---|---|---|
| Search | O(log_m n) | O(log_m n) | m = branching factor |
| Insert | O(log_m n) | O(1) splits | Balanced |
| Delete | O(log_m n) | O(1) merges | Balanced |

Height ≈ \(\log_m n\) → very shallow when \(m\) is large (e.g. 100).
D. B+ Tree Variant
In B+ Trees:
- All data in leaves (internal nodes = indexes)
- Leaves linked → efficient range queries

Used in:

- Databases (MySQL, PostgreSQL)
- File systems (NTFS, HFS+)
- Key-value stores
E. Example Flow
Insert 25:
[10 | 20 | 30] → overflow
Split → [10] [30]
Promote 20
Root: [20]
3. Comparison
| Feature | Skip List | B-Tree |
|---|---|---|
| Balancing | Randomized | Deterministic |
| Fanout | 2 (linked) | m-way |
| Environment | In-memory | Disk-based |
| Search | O(log n) | O(log_m n) |
| Insert/Delete | O(log n) | O(log_m n) |
| Concurrency | Easy | Complex |
| Range Queries | Sequential scan | Linked leaves (B+) |
Tiny Code
Skip List Search (Conceptual):
Node* search(SkipList* list, int key) {
Node* cur = list->head;
for (int lvl = list->level; lvl >= 0; lvl--) {
while (cur->next[lvl] && cur->next[lvl]->key < key)
cur = cur->next[lvl];
}
cur = cur->next[0];
return (cur && cur->key == key) ? cur : NULL;
}

B-Tree Node (Skeleton):
#define M 4
typedef struct Node {
int keys[M-1];
struct Node* child[M];
int n;
} Node;

Why It Matters
Skip Lists and B-Trees show two paths to balance:
- Randomized simplicity (Skip List)
- Block-based order (B-Tree)

Both offer logarithmic guarantees, but one optimizes pointer chasing, the other I/O.
They’re fundamental to:
- In-memory caches (Skip List)
- On-disk indexes (B-Tree, B+ Tree)
- Sorted data structures across systems
Try It Yourself
- Build a basic skip list and insert random keys.
- Trace a search path across levels.
- Implement B-Tree insert and split logic.
- Compare height of BST vs B-Tree for 1,000 keys.
- Explore how Redis and MySQL use these internally.
Together, they form the bridge between linked lists and balanced trees, uniting speed, structure, and scalability.
29. Persistent and Functional Data Structures
Most data structures are ephemeral , when you update them, the old version disappears. But sometimes, you want to keep all past versions, so you can go back in time, undo operations, or run concurrent reads safely.
That’s the magic of persistent data structures: every update creates a new version while sharing most of the old structure.
This section introduces the idea of persistence, explores how to make classic structures like arrays and trees persistent, and explains why functional programming loves them.
1. What Is Persistence?
A persistent data structure preserves previous versions after updates. You can access any version , past or present , without side effects.
Three levels:
| Type | Description | Example |
|---|---|---|
| Partial | Can access past versions, but only modify the latest | Undo stack |
| Full | Can access and modify any version | Immutable map |
| Confluent | Can combine different versions | Git-like merges |
This is essential in functional programming, undo systems, version control, persistent segment trees, and immutable databases.
2. Ephemeral vs Persistent
Ephemeral:
arr[2] = 7; // old value lost forever

Persistent:
new_arr = update(arr, 2, 7); // old_arr still exists

Persistent structures use structural sharing: unchanged parts are reused, not copied.
3. Persistent Linked List
Easiest example: each update creates a new head, reusing the tail.
struct Node { int val; Node* next; };
Node* push(Node* head, int x) {
Node* newHead = malloc(sizeof(Node));
newHead->val = x;
newHead->next = head;
return newHead;
}

Now both old_head and new_head coexist. Each version is immutable: you never change existing nodes.
Access: old and new lists share most of their structure:
v0: 1 → 2 → 3
v1: 0 → 1 → 2 → 3
Only one new node was created.
4. Persistent Binary Tree
For trees, updates create new paths from the root to the modified node, reusing the rest.
typedef struct Node {
int key;
struct Node *left, *right;
} Node;
Node* update(Node* root, int pos, int val) {
if (!root) return newNode(val);
Node* node = malloc(sizeof(Node));
*node = *root; // copy
if (pos < root->key) node->left = update(root->left, pos, val);
else node->right = update(root->right, pos, val);
return node;
}

Each update creates a new version, with only \(O(\log n)\) new nodes per change.
This is the core of persistent segment trees used in competitive programming.
5. Persistent Array (Functional Trick)
Arrays are trickier because of random access. Solutions:
- Use balanced binary trees as array replacements
- Each update replaces one node
- Persistent vector = tree of small arrays (used in Clojure, Scala)

This gives:

- Access: \(O(\log n)\)
- Update: \(O(\log n)\)
- Space: \(O(\log n)\) per update
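A minimal sketch of the tree-backed approach: array indices become leaves of a binary tree, and each update copies only the \(O(\log n)\) nodes along one root-to-leaf path (the helper names are illustrative):

```cpp
#include <cstdlib>

struct PNode {
    int val;                 // meaningful at leaves
    PNode *left, *right;     // children at internal nodes
};

PNode* make(int val, PNode* l, PNode* r) {
    PNode* n = (PNode*)malloc(sizeof(PNode));
    n->val = val; n->left = l; n->right = r;
    return n;
}

// Build a tree over indices [lo, hi], all values zero.
PNode* build(int lo, int hi) {
    if (lo == hi) return make(0, NULL, NULL);
    int mid = (lo + hi) / 2;
    return make(0, build(lo, mid), build(mid + 1, hi));
}

// Return a NEW version with a[pos] = val; the old root stays valid.
PNode* update(PNode* node, int lo, int hi, int pos, int val) {
    if (lo == hi) return make(val, NULL, NULL);
    int mid = (lo + hi) / 2;
    if (pos <= mid) return make(0, update(node->left, lo, mid, pos, val), node->right);
    return make(0, node->left, update(node->right, mid + 1, hi, pos, val));
}

int query(PNode* node, int lo, int hi, int pos) {
    if (lo == hi) return node->val;
    int mid = (lo + hi) / 2;
    return pos <= mid ? query(node->left, lo, mid, pos)
                      : query(node->right, mid + 1, hi, pos);
}
```

Both the old and the new root remain valid handles, so every version of the array can still be read.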
6. Persistent Segment Tree
Used for versioned range queries:
- Each update = new root
- Each version = snapshot of history

Example: track how an array changes over time, then query "sum in range [L, R] at version t".
Build:
Node* build(int L, int R) {
if (L == R) return newNode(arr[L]);
int mid = (L+R)/2;
return newNode(
build(L, mid),
build(mid+1, R),
sum
);
}

Update: only \(O(\log n)\) new nodes
Node* update(Node* prev, int L, int R, int pos, int val) {
if (L == R) return newNode(val);
int mid = (L+R)/2;
if (pos <= mid)
return newNode(update(prev->left, L, mid, pos, val), prev->right);
else
return newNode(prev->left, update(prev->right, mid+1, R, pos, val));
}

Each version = new root; old ones remain valid.
7. Functional Perspective
In functional programming, data is immutable by default. Instead of mutating, you create a new version.
This allows:
- Thread-safety (no races)
- Time-travel debugging
- Undo/redo systems
- Concurrency without locks

Languages like Haskell, Clojure, and Elm build everything this way.
For example, Clojure's persistent vector uses path copying and a branching factor of 32 for \(O(\log_{32} n)\) access.
8. Applications
- Undo / redo stacks (text editors, IDEs)
- Version control (Git trees)
- Immutable databases (Datomic)
- Segment trees over time (competitive programming)
- Snapshots in memory allocators or games
9. Complexity
| Structure | Update | Access | Space per Update | Notes |
|---|---|---|---|---|
| Persistent Linked List | O(1) | O(1) | O(1) | Simple sharing |
| Persistent Tree | O(log n) | O(log n) | O(log n) | Path copying |
| Persistent Array | O(log n) | O(log n) | O(log n) | Tree-backed |
| Persistent Segment Tree | O(log n) | O(log n) | O(log n) | Versioned queries |
Tiny Code
Persistent Linked List Example:
Node* v0 = NULL;
v0 = push(v0, 3);
v0 = push(v0, 2);
Node* v1 = push(v0, 1);
// v0 = [2,3], v1 = [1,2,3]

Why It Matters
Persistence is about time as a first-class citizen. It lets you:
- Roll back
- Compare versions
- Work immutably and safely

It's the algorithmic foundation behind functional programming, time-travel debugging, and immutable data systems.
It teaches this powerful idea:
“Never destroy , always build upon what was.”
Try It Yourself
- Implement a persistent stack using linked lists.
- Write a persistent segment tree for range sums.
- Track array versions after each update and query old states.
- Compare space/time with an ephemeral one.
- Explore persistent structures in Clojure (`conj`, `assoc`) or Rust (the `im` crate).

Persistence transforms data from fleeting state into a history you can navigate: a timeline of structure and meaning.
30. Advanced Trees and Range Queries
So far, you’ve seen balanced trees (AVL, Red-Black, Treap) and segment-based structures (Segment Trees, Fenwick Trees). Now it’s time to combine those ideas and step into advanced trees , data structures that handle dynamic sets, order statistics, intervals, ranges, and geometry-like queries in logarithmic time.
This chapter is about trees that go beyond search , they store order, track ranges, and answer complex queries efficiently.
We’ll explore:
- Order Statistic Trees (k-th element, rank queries)
- Interval Trees (range overlaps)
- Range Trees (multi-dimensional search)
- KD-Trees (spatial partitioning)
- Merge Sort Trees (offline range queries)
1. Order Statistic Tree
Goal: find the k-th smallest element, or the rank of an element, in \(O(\log n)\).
Built on top of a balanced BST (e.g. Red-Black) by storing subtree sizes.
A. Augmented Tree Nodes
Each node keeps:
- `key`: element value
- `left`, `right`: children
- `size`: number of nodes in the subtree
typedef struct Node {
int key, size;
struct Node *left, *right;
} Node;

Whenever you rotate or insert, update size:
int get_size(Node* n) { return n ? n->size : 0; }
void update_size(Node* n) {
if (n) n->size = get_size(n->left) + get_size(n->right) + 1;
}

B. Find k-th Element
Recursively use subtree sizes:
Node* kth(Node* root, int k) {
int left = get_size(root->left);
if (k == left + 1) return root;
else if (k <= left) return kth(root->left, k);
else return kth(root->right, k - left - 1);
}

Time: \(O(\log n)\)
C. Find Rank
Find position of a key (number of smaller elements):
int rank(Node* root, int key) {
if (!root) return 0;
if (key < root->key) return rank(root->left, key);
if (key > root->key) return get_size(root->left) + 1 + rank(root->right, key);
return get_size(root->left) + 1;
}

Used in:

- Databases (ORDER BY, pagination)
- Quantile queries
- Online median maintenance
2. Interval Tree
Goal: find all intervals overlapping with a given point or range.
Used in computational geometry, scheduling, and genomic data.
A. Structure
BST ordered by interval low endpoint. Each node stores:
- `low`, `high`: interval bounds
- `max`: maximum `high` in its subtree
typedef struct Node {
int low, high, max;
struct Node *left, *right;
} Node;

B. Query Overlap
Check if x overlaps node->interval: If not, go left or right based on max values.
typedef struct { int low, high; } Interval;
bool overlap(Interval a, Interval b) {
return a.low <= b.high && b.low <= a.high;
}
Node* overlap_search(Node* root, Interval q) {
if (!root) return NULL;
Interval cur = {root->low, root->high};
if (overlap(cur, q)) return root;
if (root->left && root->left->max >= q.low)
return overlap_search(root->left, q);
return overlap_search(root->right, q);
}

Time: \(O(\log n)\) average
C. Use Cases
- Calendar / schedule conflict detection
- Collision detection
- Genome region lookup
- Segment intersection
3. Range Tree
Goal: multi-dimensional queries like
“How many points fall inside rectangle [x1, x2] × [y1, y2]?”
Structure:
- Primary BST on x
- Each node stores a secondary BST on y

Query time: \(O(\log^2 n)\)
Space: \(O(n \log n)\)
Used in:
- 2D search
- Computational geometry
- Databases (spatial joins)
4. KD-Tree
Goal: efficiently search points in k-dimensional space.
Alternate splitting dimensions at each level:
- Level 0 → split by x
- Level 1 → split by y
- Level 2 → split by z

Each node stores:

- Point (vector)
- Split axis

Used for:

- Nearest neighbor search
- Range queries
- ML (k-NN classifiers)

Time:

- Build: \(O(n \log n)\)
- Query: \(O(\sqrt{n})\) average in 2D
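A small sketch of building a balanced 2D KD-tree by alternating the split axis; `nth_element` places the median point at each level (the type and function names are illustrative):

```cpp
#include <algorithm>
#include <vector>
using namespace std;

struct Point { double x[2]; };

struct KDNode {
    Point p;
    KDNode *left = nullptr, *right = nullptr;
};

// Build a balanced 2D KD-tree over pts[lo, hi), splitting on axis = depth % 2.
KDNode* build(vector<Point>& pts, int lo, int hi, int depth) {
    if (lo >= hi) return nullptr;
    int axis = depth % 2;
    int mid = (lo + hi) / 2;
    nth_element(pts.begin() + lo, pts.begin() + mid, pts.begin() + hi,
                [axis](const Point& a, const Point& b) { return a.x[axis] < b.x[axis]; });
    KDNode* node = new KDNode;
    node->p = pts[mid];                         // median point becomes this node
    node->left  = build(pts, lo, mid, depth + 1);
    node->right = build(pts, mid + 1, hi, depth + 1);
    return node;
}
```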
5. Merge Sort Tree
Goal: query “number of elements ≤ k in range [L, R]”
Built like a segment tree, but each node stores a sorted list of its range.
Build: merge the children's sorted lists.
Query: binary search in each covering node's list.

Time:

- Build: \(O(n \log n)\)
- Query: \(O(\log^2 n)\)

Used in offline queries and order statistics over ranges.
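A compact sketch, assuming `tree` is resized to `4 * n` before calling `build(1, 0, n - 1)`; `query` counts elements ≤ k in `a[ql..qr]` by binary searching each covering node's sorted list:

```cpp
#include <algorithm>
#include <iterator>
#include <vector>
using namespace std;

int n;
vector<int> a;                 // input array of size n
vector<vector<int>> tree;      // tree[node] = sorted copy of that node's range

void build(int node, int L, int R) {
    if (L == R) { tree[node] = {a[L]}; return; }
    int mid = (L + R) / 2;
    build(2 * node, L, mid);
    build(2 * node + 1, mid + 1, R);
    merge(tree[2 * node].begin(), tree[2 * node].end(),
          tree[2 * node + 1].begin(), tree[2 * node + 1].end(),
          back_inserter(tree[node]));                       // sorted merge of children
}

// Count elements <= k in a[ql..qr].
int query(int node, int L, int R, int ql, int qr, int k) {
    if (qr < L || R < ql) return 0;
    if (ql <= L && R <= qr)
        return upper_bound(tree[node].begin(), tree[node].end(), k) - tree[node].begin();
    int mid = (L + R) / 2;
    return query(2 * node, L, mid, ql, qr, k) +
           query(2 * node + 1, mid + 1, R, ql, qr, k);
}
```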
6. Comparison
| Tree Type | Use Case | Query | Update | Notes |
|---|---|---|---|---|
| Order Statistic | k-th, rank | O(log n) | O(log n) | Augmented BST |
| Interval | Overlaps | O(log n + k) | O(log n) | Store intervals |
| Range Tree | 2D range | O(log² n + k) | O(log² n) | Multi-dim |
| KD-Tree | Spatial | O(√n) avg | O(log n) | Nearest neighbor |
| Merge Sort Tree | Offline rank | O(log² n) | Static | Built from sorted segments |
Tiny Code
Order Statistic Example:
Node* root = NULL;
root = insert(root, 10);
root = insert(root, 20);
root = insert(root, 30);
printf("%d", kth(root, 2)->key); // 20Interval Query:
Interval q = {15, 17};
Node* res = overlap_search(root, q);
if (res) printf("Overlap: [%d, %d]\n", res->low, res->high);

Why It Matters
These trees extend balance into dimensions and ranges. They let you query ordered data efficiently: “How many?”, “Which overlaps?”, “Where is k-th smallest?”.
They teach a deeper design principle:
“Augment structure with knowledge , balance plus metadata equals power.”
Try It Yourself
- Implement an order statistic tree , test rank/k-th queries.
- Insert intervals and test overlap detection.
- Build a simple KD-tree for 2D points.
- Solve rectangle counting with a range tree.
- Precompute a merge sort tree for offline queries.
These advanced trees form the final evolution of structured queries , blending geometry, order, and logarithmic precision.
Chapter 4. Graph Algorithms
31. Traversals (DFS, BFS, Iterative Deepening)
Graphs are everywhere , maps, networks, dependencies, state spaces. Before you can analyze them, you need a way to visit their vertices , systematically, without getting lost or looping forever.
That’s where graph traversals come in. They’re the foundation for everything that follows: connected components, shortest paths, spanning trees, topological sorts, and more.
This section walks through the three pillars:
- DFS (Depth-First Search): explore deeply before backtracking
- BFS (Breadth-First Search): explore level by level
- Iterative Deepening: a memory-friendly hybrid
1. Representing Graphs
Before traversal, you need a good structure.
Adjacency List (most common):
#define MAX 1000
vector<int> adj[MAX];

Add edges:
void add_edge(int u, int v) {
adj[u].push_back(v);
adj[v].push_back(u); // omit if directed
}

Track visited vertices:

bool visited[MAX];

2. Depth-First Search (DFS)
DFS dives deep, following one branch fully before exploring others. It’s recursive, like exploring a maze by always turning left until you hit a wall.
A. Recursive Form
void dfs(int u) {
visited[u] = true;
for (int v : adj[u]) {
if (!visited[v])
dfs(v);
}
}

Start it:

dfs(start_node);

B. Iterative Form (with Stack)
void dfs_iter(int start) {
stack<int> s;
s.push(start);
while (!s.empty()) {
int u = s.top(); s.pop();
if (visited[u]) continue;
visited[u] = true;
for (int v : adj[u]) s.push(v);
}
}

C. Complexity
| Graph Type | Time | Space |
|---|---|---|
| Adjacency List | O(V + E) | O(V) |
DFS is used in:
- Connected components (see the sketch below)
- Cycle detection
- Topological sort
- Backtracking & search
- Articulation points / bridges
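For instance, counting connected components is a direct application of the `dfs` above. A minimal sketch, reusing `adj` and `visited` from this section:

```cpp
// Count connected components by running dfs() from every unvisited vertex.
int count_components(int n) {
    int components = 0;
    for (int u = 1; u <= n; u++) {
        if (!visited[u]) {
            dfs(u);          // marks everything reachable from u
            components++;
        }
    }
    return components;
}
```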
3. Breadth-First Search (BFS)
BFS explores neighbors first , it’s like expanding in waves. This guarantees shortest path in unweighted graphs.
A. BFS with Queue
void bfs(int start) {
queue<int> q;
q.push(start);
visited[start] = true;
while (!q.empty()) {
int u = q.front(); q.pop();
for (int v : adj[u]) {
if (!visited[v]) {
visited[v] = true;
q.push(v);
}
}
}
}

B. Track Distance
int dist[MAX];
void bfs_dist(int s) {
fill(dist, dist + MAX, -1);
dist[s] = 0;
queue<int> q; q.push(s);
while (!q.empty()) {
int u = q.front(); q.pop();
for (int v : adj[u]) {
if (dist[v] == -1) {
dist[v] = dist[u] + 1;
q.push(v);
}
}
}
}

Now dist[v] gives the shortest distance from s.
C. Complexity
Same as DFS:
| Time | Space |
|---|---|
| O(V + E) | O(V) |
Used in:
- Shortest paths (unweighted)
- Level-order traversal
- Bipartite check
- Connected components
4. Iterative Deepening Search (IDS)
DFS is memory-light but might go too deep. BFS is optimal but can use a lot of memory. Iterative Deepening Search (IDS) combines both.
It performs DFS with increasing depth limits:
bool dls(int u, int target, int depth) {
if (u == target) return true;
if (depth == 0) return false;
for (int v : adj[u])
if (dls(v, target, depth - 1)) return true;
return false;
}
bool ids(int start, int target, int max_depth) {
for (int d = 0; d <= max_depth; d++)
if (dls(start, target, d)) return true;
return false;
}

Used in:

- AI search problems (state spaces)
- Game trees (chess, puzzles)
5. Traversal Order Examples
For a graph:
1 - 2 - 3
| |
4 - 5
DFS (starting at 1): 1 → 2 → 3 → 5 → 4 BFS (starting at 1): 1 → 2 → 4 → 3 → 5
6. Directed vs Undirected
- Undirected: mark both directions
- Directed: follow edge direction only

DFS on directed graphs is core to:

- SCC (Strongly Connected Components)
- Topological sorting
- Reachability analysis
7. Traversal Trees
Each traversal implicitly builds a spanning tree:
- DFS tree: based on recursion
- BFS tree: based on levels

Use them to:

- Detect cross edges and back edges
- Classify edges (important for algorithms like Tarjan's)
8. Comparison
| Aspect | DFS | BFS |
|---|---|---|
| Strategy | Deep first | Level-wise |
| Space | O(V) (stack) | O(V) (queue) |
| Path Optimality | Not guaranteed | Yes (unweighted) |
| Applications | Cycle detection, backtracking | Shortest path, level order |
Tiny Code
DFS + BFS Combo:
void traverse(int n) {
for (int i = 1; i <= n; i++) visited[i] = false;
dfs(1);
for (int i = 1; i <= n; i++) visited[i] = false;
bfs(1);
}

Why It Matters
DFS and BFS are the roots of graph theory in practice. Every algorithm you’ll meet later , shortest paths, flows, SCCs , builds upon them.
They teach you how to navigate structure, how to systematically explore unknowns, and how search lies at the heart of computation.
Try It Yourself
- Build an adjacency list and run DFS/BFS from vertex 1.
- Track discovery and finish times in DFS.
- Use BFS to compute shortest paths in an unweighted graph.
- Modify DFS to count connected components.
- Implement IDS for a puzzle like the 8-puzzle.
Graph traversal is the art of exploration , once you master it, the rest of graph theory falls into place.
32. Strongly Connected Components (Tarjan, Kosaraju)
In directed graphs, edges have direction, so connectivity gets tricky. It’s not enough for vertices to be reachable , you need mutual reachability.
That’s the essence of a strongly connected component (SCC):
A set of vertices where every vertex can reach every other vertex.
Think of SCCs as islands of mutual connectivity , inside, you can go anywhere; outside, you can’t. They’re the building blocks for simplifying directed graphs into condensation DAGs (no cycles).
We’ll explore two classic algorithms:
- Kosaraju's Algorithm: clean, intuitive, two-pass
- Tarjan's Algorithm: one-pass, stack-based elegance
1. Definition
A Strongly Connected Component (SCC) in a directed graph \(G = (V, E)\) is a maximal subset of vertices \(C \subseteq V\) such that for every pair \(u, v \in C\): \(u \to v\) and \(v \to u\).
In other words, every node in (C) is reachable from every other node in (C).
Example:
1 → 2 → 3 → 1 forms an SCC
4 → 5 separate SCCs
2. Applications
- Condensation DAG: compress SCCs into single nodes, so no cycles remain.
- Component-based reasoning: topological sorting on the DAG of SCCs.
- Program analysis: detecting cycles and dependencies.
- Web graphs: find clusters of mutually linked pages.
- Control flow: loops and strongly connected subroutines.
3. Kosaraju’s Algorithm
A simple two-pass algorithm using DFS and graph reversal.
Steps:
- Run DFS and push nodes onto a stack in finish-time order.
- Reverse the graph (edges flipped).
- Pop nodes from stack; DFS on reversed graph; each DFS = one SCC.
A. Implementation
vector<int> adj[MAX], rev[MAX];
bool visited[MAX];
stack<int> st;
vector<vector<int>> sccs;
void dfs1(int u) {
visited[u] = true;
for (int v : adj[u])
if (!visited[v])
dfs1(v);
st.push(u);
}
void dfs2(int u, vector<int>& comp) {
visited[u] = true;
comp.push_back(u);
for (int v : rev[u])
if (!visited[v])
dfs2(v, comp);
}
void kosaraju(int n) {
// Pass 1: order by finish time
for (int i = 1; i <= n; i++)
if (!visited[i]) dfs1(i);
// Reverse graph
for (int u = 1; u <= n; u++)
for (int v : adj[u])
rev[v].push_back(u);
// Pass 2: collect SCCs
fill(visited, visited + n + 1, false);
while (!st.empty()) {
int u = st.top(); st.pop();
if (!visited[u]) {
vector<int> comp;
dfs2(u, comp);
sccs.push_back(comp);
}
}
}

Time Complexity: \(O(V + E)\), two DFS passes.
Space Complexity: \(O(V + E)\)
B. Example
Graph:
1 → 2 → 3
↑ ↓ ↓
5 ← 4 ← 6
SCCs:
- {1, 2, 4, 5}
- {3, 6}
4. Tarjan’s Algorithm
More elegant: one DFS pass, no reversal, stack-based. It uses discovery times and low-link values to detect SCC roots.
A. Idea
- `disc[u]`: discovery time of node `u`
- `low[u]`: smallest discovery time reachable from `u`
- A node is the root of an SCC if `disc[u] == low[u]`

Maintain a stack of active nodes (the current DFS path).
B. Implementation
vector<int> adj[MAX];
int disc[MAX], low[MAX], timer;
bool inStack[MAX];
stack<int> st;
vector<vector<int>> sccs;
void dfs_tarjan(int u) {
disc[u] = low[u] = ++timer;
st.push(u);
inStack[u] = true;
for (int v : adj[u]) {
if (!disc[v]) {
dfs_tarjan(v);
low[u] = min(low[u], low[v]);
} else if (inStack[v]) {
low[u] = min(low[u], disc[v]);
}
}
if (disc[u] == low[u]) {
vector<int> comp;
while (true) {
int v = st.top(); st.pop();
inStack[v] = false;
comp.push_back(v);
if (v == u) break;
}
sccs.push_back(comp);
}
}
void tarjan(int n) {
for (int i = 1; i <= n; i++)
if (!disc[i])
dfs_tarjan(i);
}

Time Complexity: \(O(V + E)\)
Space Complexity: \(O(V)\)
C. Walkthrough
Graph:
1 → 2 → 3
↑ ↓ ↓
5 ← 4 ← 6
DFS visits nodes in order; when it finds a node whose disc == low, it pops from the stack to form an SCC.
Result:
SCC1: 1 2 4 5
SCC2: 3 6
5. Comparison
| Feature | Kosaraju | Tarjan |
|---|---|---|
| DFS Passes | 2 | 1 |
| Reversal Needed | Yes | No |
| Stack | Yes (finish order) | Yes (active path) |
| Implementation | Simple conceptually | Compact, efficient |
| Time | O(V + E) | O(V + E) |
6. Condensation Graph
Once SCCs are found, you can build a DAG: Each SCC becomes a node, edges represent cross-SCC connections. Topological sorting now applies.
Used in:
- Dependency analysis
- Strong component compression
- DAG dynamic programming
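A sketch of building that condensation, assuming the `sccs`, `adj`, and `MAX` definitions from the Tarjan code above (the `comp` and `dag` names are our own):

```cpp
#include <set>   // in addition to the headers used above

// comp[v] = index of the SCC containing v; dag[c] = outgoing SCC edges (deduplicated)
vector<int> comp(MAX);
vector<set<int>> dag;

void build_condensation(int n) {
    for (int i = 0; i < (int)sccs.size(); i++)
        for (int v : sccs[i])
            comp[v] = i;
    dag.assign(sccs.size(), {});
    for (int u = 1; u <= n; u++)
        for (int v : adj[u])
            if (comp[u] != comp[v])
                dag[comp[u]].insert(comp[v]);   // edge between different components
}
```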
Tiny Code
Print SCCs (Tarjan):
tarjan(n);
for (auto &comp : sccs) {
for (int x : comp) printf("%d ", x);
printf("\n");
}

Why It Matters
SCC algorithms turn chaotic directed graphs into structured DAGs. They’re the key to reasoning about cycles, dependencies, and modularity.
Understanding them reveals a powerful truth:
“Every complex graph can be reduced to a simple hierarchy , once you find its strongly connected core.”
Try It Yourself
- Implement both Kosaraju and Tarjan , verify they match.
- Build SCC DAG and run topological sort on it.
- Detect cycles via SCC size > 1.
- Use SCCs to solve 2-SAT (boolean satisfiability).
- Visualize condensation of a graph with 6 nodes.
Once you can find SCCs, you can tame directionality , transforming messy networks into ordered systems.
33. Shortest Paths (Dijkstra, Bellman-Ford, A*, Johnson)
Once you can traverse a graph, the next natural question is:
“What is the shortest path between two vertices?”
Shortest path algorithms are the heart of routing, navigation, planning, and optimization. They compute minimal cost paths , whether distance, time, or weight , and adapt to different edge conditions (non-negative, negative, heuristic).
This section covers the most essential algorithms:
- Dijkstra's Algorithm: efficient for non-negative weights
- Bellman-Ford Algorithm: handles negative edges
- A*: best-first search with heuristics
- Johnson's Algorithm: all-pairs shortest paths in sparse graphs
1. The Shortest Path Problem
Given a weighted graph \(G = (V, E)\) and a source \(s\), find \(\text{dist}[v]\), the minimum total weight to reach every vertex \(v\).
Variants:
- Single-source shortest path (SSSP): one source to all
- Single-pair: one source to one target
- All-pairs shortest path (APSP): every pair
- Dynamic shortest path: with updates
2. Dijkstra’s Algorithm
Best for non-negative weights. Idea: explore vertices in increasing distance order, like water spreading.
A. Steps
- Initialize all distances to infinity.
- Set source distance = 0.
- Use a priority queue to always pick the node with smallest tentative distance.
- Relax all outgoing edges.
B. Implementation (Adjacency List)
#include <bits/stdc++.h>
using namespace std;
const int INF = 1e9;
vector<pair<int,int>> adj[1000]; // (neighbor, weight)
int dist[1000];
void dijkstra(int n, int s) {
fill(dist, dist + n + 1, INF);
dist[s] = 0;
priority_queue<pair<int,int>, vector<pair<int,int>>, greater<>> pq;
pq.push({0, s});
while (!pq.empty()) {
auto [d, u] = pq.top(); pq.pop();
if (d != dist[u]) continue;
for (auto [v, w] : adj[u]) {
if (dist[v] > dist[u] + w) {
dist[v] = dist[u] + w;
pq.push({dist[v], v});
}
}
}
}

Complexity:
- Using priority queue (binary heap): \(O((V + E)\log V)\)
- Space: \(O(V + E)\)
C. Example
Graph:
1 →(2) 2 →(3) 3
↓(4) ↑(1)
4 →(2)─────┘
Calling dijkstra(n, 1) gives the shortest distances:
dist[1] = 0
dist[2] = 2
dist[3] = 5
dist[4] = 4
D. Properties
- Works only if all edges satisfy \(w \ge 0\)
- Can reconstruct the path via `parent[v]`
- Used in:
  - GPS and routing systems
  - Network optimization
  - Scheduling with positive costs
3. Bellman-Ford Algorithm
Handles negative edge weights, and detects negative cycles.
A. Idea
Relax all edges \(V-1\) times. If on the \(V\)-th iteration you can still relax an edge, a negative cycle exists.
B. Implementation
struct Edge { int u, v, w; };
vector<Edge> edges;
int dist[1000];
bool bellman_ford(int n, int s) {
fill(dist, dist + n + 1, INF);
dist[s] = 0;
for (int i = 1; i <= n - 1; i++) {
for (auto e : edges) {
if (dist[e.u] + e.w < dist[e.v])
dist[e.v] = dist[e.u] + e.w;
}
}
// Check for negative cycle
for (auto e : edges)
if (dist[e.u] + e.w < dist[e.v])
return false; // negative cycle
return true;
}

Complexity: \(O(VE)\). Works even when \(w < 0\).
C. Example
Graph:
1 →(2) 2 →(-5) 3 →(2) 4
Bellman-Ford finds path 1→2→3→4 with total cost (-1).
If a cycle reduces total weight indefinitely, algorithm detects it.
D. Use Cases
- Currency exchange arbitrage
- Game graphs with penalties
- Detecting impossible constraints
4. A* Search Algorithm
Heuristic-guided shortest path, perfect for pathfinding (AI, maps, games).
It combines actual cost and estimated cost: \[ f(v) = g(v) + h(v) \] where
- \(g(v)\): known cost so far
- \(h(v)\): heuristic estimate (must be admissible)
A. Pseudocode
priority_queue<pair<int,int>, vector<pair<int,int>>, greater<>> pq;
g[start] = 0;
pq.push({h[start], start});
while (!pq.empty()) {
auto [f, u] = pq.top(); pq.pop();
if (u == goal) break;
for (auto [v, w] : adj[u]) {
int new_g = g[u] + w;
if (new_g < g[v]) {
g[v] = new_g;
pq.push({g[v] + h[v], v});
}
}
}

Heuristic Example:

- Euclidean distance (for grids)
- Manhattan distance (for 4-direction movement)
B. Use Cases
- Game AI (pathfinding)
- Robot motion planning
- Map navigation

Complexity: \(O(E)\) in the best case; depends on heuristic quality.
5. Johnson’s Algorithm
Goal: All-Pairs Shortest Path in sparse graphs with negative edges (no negative cycles).
Idea:
- Add a new vertex `q` connected to all others with edge weight 0
- Run Bellman-Ford from `q` to get potentials `h(v)`
- Reweight edges: \(w'(u, v) = w(u, v) + h(u) - h(v)\) (now all weights ≥ 0)
- Run Dijkstra from each vertex
Complexity: \(O(VE + V^2 \log V)\)
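A sketch of steps 1-3 under those assumptions (a self-contained potential computation; the struct and function names are illustrative). Dijkstra is then run per vertex on the reweighted edges exactly as in the Dijkstra section above, and true distances are recovered as \(d'(u,v) - h(u) + h(v)\):

```cpp
#include <bits/stdc++.h>
using namespace std;

struct E { int u, v, w; };
const int INF = 1e9;

// Edges over vertices 1..n; returns potentials h[0..n], or {} if a negative cycle exists.
vector<int> johnson_potentials(int n, vector<E> edges) {
    for (int v = 1; v <= n; v++) edges.push_back({0, v, 0});   // virtual source 0
    vector<int> h(n + 1, INF);
    h[0] = 0;
    // Bellman-Ford from the virtual source: n rounds for n+1 vertices.
    for (int i = 0; i < n; i++)
        for (auto& e : edges)
            if (h[e.u] < INF && h[e.u] + e.w < h[e.v])
                h[e.v] = h[e.u] + e.w;
    for (auto& e : edges)                                      // negative cycle check
        if (h[e.u] < INF && h[e.u] + e.w < h[e.v])
            return {};
    return h;
}
// Reweight each edge as w'(u, v) = w(u, v) + h[u] - h[v]; all weights become >= 0.
```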
6. Summary
| Algorithm | Handles Negative Weights | Detects Negative Cycle | Heuristic | Complexity | Use Case |
|---|---|---|---|---|---|
| Dijkstra | No | No | No | O((V+E) log V) | Non-negative weights |
| Bellman-Ford | Yes | Yes | No | O(VE) | Negative edges |
| A* | No (unless careful) | No | Yes | Depends | Pathfinding |
| Johnson | Yes (no neg. cycles) | Yes | No | O(VE + V log V) | All-pairs, sparse |
Tiny Code
Dijkstra Example:
dijkstra(n, 1);
for (int i = 1; i <= n; i++)
printf("dist[%d] = %d\n", i, dist[i]);Why It Matters
Shortest paths are the essence of optimization , not just in graphs, but in reasoning: finding minimal cost, minimal distance, minimal risk.
These algorithms teach:
“The path to a goal isn’t random , it’s guided by structure, weight, and knowledge.”
Try It Yourself
- Build a weighted graph and compare Dijkstra vs Bellman-Ford.
- Introduce a negative edge and observe Bellman-Ford detecting it.
- Implement A* on a grid with obstacles.
- Use Dijkstra to plan routes in a city map dataset.
- Try Johnson’s algorithm for all-pairs shortest paths.
Master these, and you master direction + cost = intelligence in motion.
34. Shortest Path Variants (0-1 BFS, Bidirectional, Heuristic A*)
Sometimes the classic shortest path algorithms aren’t enough. You might have special edge weights (only 0 or 1), a need for faster searches, or extra structure you can exploit.
That’s where shortest path variants come in , they’re optimized adaptations of the big three (BFS, Dijkstra, A*) for specific scenarios.
In this section, we’ll explore:
- 0-1 BFS → when edge weights are only 0 or 1
- Bidirectional Search → meet-in-the-middle for speed
- Heuristic A* → smarter exploration guided by estimates

Each shows how structure in your problem can yield speed-ups.
1. 0-1 BFS
If all edge weights are either 0 or 1, you don’t need a priority queue. A deque (double-ended queue) is enough for (O(V + E)) time.
Why? Because edges with weight 0 should be processed immediately, while edges with weight 1 can wait one step longer.
A. Algorithm
Use a deque.
- When relaxing an edge with weight 0, push to the front.
- When relaxing an edge with weight 1, push to the back.
const int INF = 1e9;
vector<pair<int,int>> adj[1000]; // (v, w)
int dist[1000];
void zero_one_bfs(int n, int s) {
fill(dist, dist + n + 1, INF);
deque<int> dq;
dist[s] = 0;
dq.push_front(s);
while (!dq.empty()) {
int u = dq.front(); dq.pop_front();
for (auto [v, w] : adj[u]) {
if (dist[v] > dist[u] + w) {
dist[v] = dist[u] + w;
if (w == 0) dq.push_front(v);
else dq.push_back(v);
}
}
}
}

B. Example
Graph:
1 -0-> 2 -1-> 3
| ^
1 |
+--------------+
Shortest path from 1 to 3 = 1 (via edge 1-2-3). Deque ensures weight-0 edges don’t get delayed.
C. Complexity
| Time | Space | Notes |
|---|---|---|
| O(V + E) | O(V) | Optimal for binary weights |
Used in:
- Layered BFS
- Grid problems with binary costs
- BFS with teleportation (weight-0 edges)
2. Bidirectional Search
Sometimes you just need one path , from source to target , in an unweighted graph. Instead of expanding from one side, expand from both ends and stop when they meet.
This reduces the search cost from \(O(b^d)\) to \(O(b^{d/2})\), a huge gain for large graphs.
A. Idea
Run BFS from both source and target simultaneously. When their frontiers intersect, you’ve found the shortest path.
B. Implementation
bool visited_from_s[MAX], visited_from_t[MAX];
queue<int> qs, qt;
bool step(queue<int>& q, bool vis[], bool other[]);   // forward declaration (defined below)
int bidirectional_bfs(int s, int t) {
qs.push(s); visited_from_s[s] = true;
qt.push(t); visited_from_t[t] = true;
while (!qs.empty() && !qt.empty()) {
if (step(qs, visited_from_s, visited_from_t)) return 1;
if (step(qt, visited_from_t, visited_from_s)) return 1;
}
return 0;
}
bool step(queue<int>& q, bool vis[], bool other[]) {
int size = q.size();
while (size--) {
int u = q.front(); q.pop();
if (other[u]) return true;
for (int v : adj[u]) {
if (!vis[v]) {
vis[v] = true;
q.push(v);
}
}
}
return false;
}

C. Complexity
| Time | Space | Notes |
|---|---|---|
| O(b^{d/2}) | O(b^{d/2}) | Much faster in practice |
Used in:
- Maze solvers
- Shortest paths in large sparse graphs
- Social network "degrees of separation"
3. Heuristic A* (Revisited)
A* generalizes Dijkstra with goal-directed search using heuristics. We revisit it here to show how heuristics change exploration order.
A. Cost Function
\[ f(v) = g(v) + h(v) \]
- \(g(v)\): cost so far
- \(h(v)\): estimated cost to the goal
- \(h(v)\) must be admissible: \(h(v) \le\) the true remaining cost
B. Implementation
struct Node {
int v; int f, g;
bool operator>(const Node& o) const { return f > o.f; }
};
priority_queue<Node, vector<Node>, greater<Node>> pq;
void astar(int start, int goal) {
g[start] = 0;
h[start] = heuristic(start, goal);
pq.push({start, g[start] + h[start], g[start]});
while (!pq.empty()) {
auto [u, f_u, g_u] = pq.top(); pq.pop();
if (u == goal) break;
for (auto [v, w] : adj[u]) {
int new_g = g[u] + w;
if (new_g < g[v]) {
g[v] = new_g;
int f_v = new_g + heuristic(v, goal);
pq.push({v, f_v, new_g});
}
}
}
}

C. Example Heuristics
- Grid map: Manhattan distance \(h(x, y) = |x - x_g| + |y - y_g|\)
- Navigation: straight-line (Euclidean) distance
- Game tree: evaluation function
D. Performance
| Heuristic | Effect |
|---|---|
| Perfect (h = true cost) | Optimal, visits minimal nodes |
| Admissible but weak | Still correct, more nodes |
| Overestimate | May fail (non-admissible) |
4. Comparison
| Algorithm | Weight Type | Strategy | Time | Space | Notes |
|---|---|---|---|---|---|
| 0-1 BFS | 0 or 1 | Deque-based | O(V+E) | O(V) | No heap |
| Bidirectional BFS | Unweighted | Two-way search | O(b^{d/2}) | O(b^{d/2}) | Meets in middle |
| A* | Non-negative | Heuristic search | Depends | O(V) | Guided |
5. Example Scenario
| Problem | Variant |
|---|---|
| Grid with teleport (cost 0) | 0-1 BFS |
| Huge social graph (find shortest chain) | Bidirectional BFS |
| Game AI pathfinding | A* with Manhattan heuristic |
Tiny Code
0-1 BFS Quick Demo:
add_edge(1, 2, 0);
add_edge(2, 3, 1);
zero_one_bfs(3, 1);
printf("%d\n", dist[3]); // shortest = 1Why It Matters
Special cases deserve special tools. These variants show that understanding structure (like edge weights or symmetry) can yield huge gains.
They embody a principle:
“Don’t just run faster , run smarter, guided by what you know.”
Try It Yourself
- Implement 0-1 BFS for a grid with cost 0 teleports.
- Compare BFS vs Bidirectional BFS on a large maze.
- Write A* for an 8x8 chessboard knight’s move puzzle.
- Tune heuristics , see how overestimating breaks A*.
- Combine A* and 0-1 BFS for hybrid search.
With these in hand, you can bend shortest path search to the shape of your problem , efficient, elegant, and exact.
35. Minimum Spanning Trees (Kruskal, Prim, Borůvka)
When a graph connects multiple points with weighted edges, sometimes you don’t want the shortest path, but the cheapest network that connects everything.
That’s the Minimum Spanning Tree (MST) problem:
Given a connected, weighted, undirected graph, find a subset of edges that connects all vertices with minimum total weight and no cycles.
MSTs are everywhere , from building networks and designing circuits to clustering and approximation algorithms.
Three cornerstone algorithms solve it beautifully:
- Kruskal's: edge-based, union-find
- Prim's: vertex-based, greedy expansion
- Borůvka's: component merging in parallel
1. What Is a Spanning Tree?
A spanning tree connects all vertices with exactly (V-1) edges. Among all spanning trees, the one with minimum total weight is the MST.
Properties:
- Contains no cycles
- Connects all vertices
- Edge count = \(V - 1\)
- Unique if all weights are distinct
2. MST Applications
- Network design (roads, cables, pipelines)
- Clustering (e.g., hierarchical clustering)
- Image segmentation
- Approximation (e.g., TSP ≈ 2 × MST)
- Graph simplification
3. Kruskal’s Algorithm
Build the MST edge-by-edge, in order of increasing weight. Use Union-Find (Disjoint Set Union) to avoid cycles.
A. Steps
Sort all edges by weight.
Initialize each vertex as its own component.
For each edge (u, v):
- If `u` and `v` are in different components → include the edge
- Union their sets

Stop when \(V-1\) edges are chosen.
B. Implementation
struct Edge { int u, v, w; };
vector<Edge> edges;
int parent[MAX], rank_[MAX];
int find(int x) {
return parent[x] == x ? x : parent[x] = find(parent[x]);
}
bool unite(int a, int b) {
a = find(a); b = find(b);
if (a == b) return false;
if (rank_[a] < rank_[b]) swap(a, b);
parent[b] = a;
if (rank_[a] == rank_[b]) rank_[a]++;
return true;
}
int kruskal(int n) {
iota(parent, parent + n + 1, 0);
sort(edges.begin(), edges.end(), [](Edge a, Edge b){ return a.w < b.w; });
int total = 0;
for (auto &e : edges)
if (unite(e.u, e.v))
total += e.w;
return total;
}

Complexity:
- Sorting edges: \(O(E \log E)\)
- Union-Find operations: \(O(\alpha(V))\) (almost constant)
- Total: \(O(E \log E)\)
C. Example
Graph:
1 -4- 2
| |
2 3
\-1-/
Edges sorted by weight: (1-3,1), (2-3,3), (1-2,4)
Pick 1-3, 2-3 → MST weight = 1 + 3 = 4
4. Prim’s Algorithm
Grow MST from a starting vertex, adding the smallest outgoing edge each step.
Similar to Dijkstra , but pick edges, not distances.
A. Steps
- Start with one vertex, mark as visited.
- Use priority queue for candidate edges.
- Pick smallest edge that connects to an unvisited vertex.
- Add vertex to MST, repeat until all visited.
B. Implementation
vector<pair<int,int>> adj[MAX]; // (v, w)
bool used[MAX];
int prim(int n, int start) {
priority_queue<pair<int,int>, vector<pair<int,int>>, greater<>> pq;
pq.push({0, start});
int total = 0;
while (!pq.empty()) {
auto [w, u] = pq.top(); pq.pop();
if (used[u]) continue;
used[u] = true;
total += w;
for (auto [v, w2] : adj[u])
if (!used[v]) pq.push({w2, v});
}
return total;
}Complexity:
- \(O((V+E) \log V)\) with binary heap
Used when:
- Graph is dense
- Easier to grow tree than sort all edges
C. Example
Graph:
1 -2- 2
| |
4 1
\-3-/
Start at 1 → choose (1-2), (1-3) → MST weight = 2 + 3 = 5
5. Borůvka’s Algorithm
Less famous, but elegant , merges cheapest outgoing edge per component in parallel.
Each component picks one cheapest outgoing edge, adds it, merges components. Repeat until one component left.
Complexity: \(O(E \log V)\)
Used in parallel/distributed MST computations.
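One possible implementation, reusing the `Edge` list and the `find`/`unite` helpers from the Kruskal section above (a sequential sketch, not a tuned parallel version):

```cpp
int boruvka(int n) {
    iota(parent, parent + n + 1, 0);
    int total = 0, components = n;
    while (components > 1) {
        vector<int> cheapest(n + 1, -1);          // cheapest outgoing edge per component
        for (int i = 0; i < (int)edges.size(); i++) {
            int a = find(edges[i].u), b = find(edges[i].v);
            if (a == b) continue;                 // edge stays inside one component
            if (cheapest[a] == -1 || edges[i].w < edges[cheapest[a]].w) cheapest[a] = i;
            if (cheapest[b] == -1 || edges[i].w < edges[cheapest[b]].w) cheapest[b] = i;
        }
        bool merged = false;
        for (int c = 1; c <= n; c++) {
            int i = cheapest[c];
            if (i != -1 && unite(edges[i].u, edges[i].v)) {
                total += edges[i].w;              // add this component's cheapest edge
                components--;
                merged = true;
            }
        }
        if (!merged) break;                       // graph is disconnected
    }
    return total;
}
```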
6. Comparison
| Algorithm | Strategy | Time | Space | Best For |
|---|---|---|---|---|
| Kruskal | Edge-based, sort all edges | O(E log E) | O(E) | Sparse graphs |
| Prim | Vertex-based, grow tree | O(E log V) | O(V+E) | Dense graphs |
| Borůvka | Component merging | O(E log V) | O(E) | Parallel MST |
7. MST Properties
- Cut Property: for any cut, the smallest crossing edge is in the MST.
- Cycle Property: for any cycle, the largest edge is not in the MST.
- The MST may not be unique if there are equal weights.
8. Building the Tree
Store MST edges:
vector<Edge> mst_edges;
if (unite(e.u, e.v)) mst_edges.push_back(e);

Then use the MST for:

- Path queries
- Clustering (remove the largest edge)
- Approximating TSP (preorder traversal)
Tiny Code
Kruskal Example:
edges.push_back({1,2,4});
edges.push_back({1,3,1});
edges.push_back({2,3,3});
printf("MST = %d\n", kruskal(3)); // 4Why It Matters
MSTs model connection without redundancy. They’re about efficiency , connecting everything at minimal cost, a principle that appears in infrastructure, data, and even ideas.
They teach:
“You can connect the whole with less , if you choose wisely.”
Try It Yourself
- Implement Kruskal’s algorithm using union-find.
- Run Prim’s algorithm and compare output.
- Build MST on random weighted graph , visualize tree.
- Remove heaviest edge from MST to form two clusters.
- Explore Borůvka for parallel execution.
MSTs are how you span complexity with minimal effort , a tree of balance, economy, and order.
36. Flows (Ford-Fulkerson, Edmonds-Karp, Dinic)
Some graphs don’t just connect , they carry something. Imagine water flowing through pipes, traffic through roads, data through a network. Each edge has a capacity, and you want to know:
“How much can I send from source to sink before the system clogs?”
That’s the Maximum Flow problem , a cornerstone of combinatorial optimization, powering algorithms for matching, cuts, scheduling, and more.
This section covers the big three:
- Ford-Fulkerson , the primal idea- Edmonds-Karp , BFS-based implementation- Dinic’s Algorithm , layered speed
1. Problem Definition
Given a directed graph \(G = (V, E)\), each edge \((u, v)\) has a capacity \(c(u, v)\).
We have:
- Source \(s\)
- Sink \(t\)

We want the maximum flow from \(s\) to \(t\): a function \(f(u, v)\) that satisfies:

- Capacity constraint: \(0 \le f(u, v) \le c(u, v)\)
- Flow conservation: for every vertex \(v \neq s, t\), \(\sum_u f(u, v) = \sum_w f(v, w)\)

Total flow = \(\sum_v f(s, v)\)
2. The Big Picture
Max Flow - Min Cut Theorem:
The value of the maximum flow equals the capacity of the minimum cut.
So finding a max flow is equivalent to finding the bottleneck.
3. Ford-Fulkerson Method
The idea:
- While there exists a path from \(s\) to \(t\) with available capacity, push flow along it.
Each step:
- Find augmenting path
- Send flow = min residual capacity along it
- Update residual capacities
Repeat until no augmenting path.
A. Residual Graph
Residual capacity: \[ r(u, v) = c(u, v) - f(u, v) \] If \(f(u, v) > 0\), add a reverse edge \((v, u)\) with capacity \(f(u, v)\).
This allows undoing flow if needed.
B. Implementation (DFS-style)
const int INF = 1e9;
vector<pair<int,int>> adj[MAX];
int cap[MAX][MAX];
int dfs(int u, int t, int flow, vector<int>& vis) {
if (u == t) return flow;
vis[u] = 1;
for (auto [v, _] : adj[u]) {
if (!vis[v] && cap[u][v] > 0) {
int pushed = dfs(v, t, min(flow, cap[u][v]), vis);
if (pushed > 0) {
cap[u][v] -= pushed;
cap[v][u] += pushed;
return pushed;
}
}
}
return 0;
}
int ford_fulkerson(int s, int t, int n) {
int flow = 0;
while (true) {
vector<int> vis(n + 1, 0);
int pushed = dfs(s, t, INF, vis);
if (pushed == 0) break;
flow += pushed;
}
return flow;
}

Complexity: \(O(E \cdot \text{max flow})\), which depends on the flow magnitude.
4. Edmonds-Karp Algorithm
A refinement:
Always choose shortest augmenting path (by edges) using BFS.
Guarantees polynomial time.
A. Implementation (BFS + parent tracking)
int bfs(int s, int t, vector<int>& parent, int n) {
fill(parent.begin(), parent.end(), -1);
queue<pair<int,int>> q;
q.push({s, INF});
parent[s] = -2;
while (!q.empty()) {
auto [u, flow] = q.front(); q.pop();
for (auto [v, _] : adj[u]) {
if (parent[v] == -1 && cap[u][v] > 0) {
int new_flow = min(flow, cap[u][v]);
parent[v] = u;
if (v == t) return new_flow;
q.push({v, new_flow});
}
}
}
return 0;
}
int edmonds_karp(int s, int t, int n) {
int flow = 0;
vector<int> parent(n + 1);
int new_flow;
while ((new_flow = bfs(s, t, parent, n))) {
flow += new_flow;
int v = t;
while (v != s) {
int u = parent[v];
cap[u][v] -= new_flow;
cap[v][u] += new_flow;
v = u;
}
}
return flow;
}

Complexity: \(O(VE^2)\). Always terminates (no dependence on flow values).
5. Dinic’s Algorithm
A modern classic , uses BFS to build level graph, and DFS to send blocking flow.
It works layer-by-layer, avoiding useless exploration.
A. Steps
- Build level graph via BFS (assign levels to reachable nodes).
- DFS sends flow along level-respecting paths.
- Repeat until no path remains.
B. Implementation
vector<int> level, ptr;
bool bfs_level(int s, int t, int n) {
fill(level.begin(), level.end(), -1);
queue<int> q;
q.push(s);
level[s] = 0;
while (!q.empty()) {
int u = q.front(); q.pop();
for (auto [v, _] : adj[u])
if (level[v] == -1 && cap[u][v] > 0) {
level[v] = level[u] + 1;
q.push(v);
}
}
return level[t] != -1;
}
int dfs_flow(int u, int t, int pushed) {
if (u == t || pushed == 0) return pushed;
for (int &cid = ptr[u]; cid < (int)adj[u].size(); cid++) {
int v = adj[u][cid].first;
if (level[v] == level[u] + 1 && cap[u][v] > 0) {
int tr = dfs_flow(v, t, min(pushed, cap[u][v]));
if (tr > 0) {
cap[u][v] -= tr;
cap[v][u] += tr;
return tr;
}
}
}
return 0;
}
int dinic(int s, int t, int n) {
int flow = 0;
level.resize(n + 1);
ptr.resize(n + 1);
while (bfs_level(s, t, n)) {
fill(ptr.begin(), ptr.end(), 0);
while (int pushed = dfs_flow(s, t, INF))
flow += pushed;
}
return flow;
}

Complexity: \(O(V^2E)\) in the worst case; \(O(E\sqrt{V})\) on unit-capacity graphs, and fast in practice.
6. Comparison
| Algorithm | Strategy | Handles | Time | Notes |
|---|---|---|---|---|
| Ford-Fulkerson | DFS augmenting paths | Integral capacities | O(E × max_flow) | Simple, may loop on reals |
| Edmonds-Karp | BFS augmenting paths | All capacities | O(VE²) | Always terminates |
| Dinic | Level graph + DFS | All capacities | O(V²E) | Fast in practice |
7. Applications
- Network routing
- Bipartite matching
- Task assignment (flows = people → jobs)
- Image segmentation (min-cut)
- Circulation with demands
- Data pipelines, max-throughput systems
Tiny Code
Ford-Fulkerson Example:
add_edge(1, 2, 3);
add_edge(1, 3, 2);
add_edge(2, 3, 5);
add_edge(2, 4, 2);
add_edge(3, 4, 3);
printf("Max flow = %d\n", ford_fulkerson(1, 4, 4)); // 5Why It Matters
Flow algorithms transform capacity constraints into solvable systems. They reveal the deep unity between optimization and structure: every maximum flow defines a minimum bottleneck cut.
They embody a timeless truth:
“To understand limits, follow the flow.”
Try It Yourself
- Implement Ford-Fulkerson using DFS.
- Switch to Edmonds-Karp and observe performance gain.
- Build Dinic’s level graph and visualize layers.
- Model job assignment as bipartite flow.
- Verify Max Flow = Min Cut on small examples.
Once you master flows, you’ll see them hidden in everything that moves , from data to decisions.
37. Cuts (Stoer-Wagner, Karger, Gomory-Hu)
Where flow problems ask “How much can we send?”, cut problems ask “Where does it break?”
A cut splits a graph into two disjoint sets. The minimum cut is the smallest set of edges whose removal disconnects the graph , the tightest “bottleneck” holding it together.
This chapter explores three major algorithms:
- Stoer-Wagner: deterministic min-cut for undirected graphs
- Karger's Randomized Algorithm: fast, probabilistic
- Gomory-Hu Tree: compress all-pairs min-cuts into one tree

Cuts reveal hidden structure, clusters, vulnerabilities, boundaries, and form the dual to flows via the Max-Flow Min-Cut Theorem.
1. The Min-Cut Problem
Given a weighted undirected graph \(G = (V, E)\): find the minimum total weight of edges whose removal disconnects the graph.

Equivalent to:

The smallest sum of edge weights crossing any partition \((S, V \setminus S)\).
For directed graphs, you use max-flow methods; For undirected graphs, specialized algorithms exist.
2. Applications
- Network reliability: weakest-link detection
- Clustering: partition a graph by minimal interconnection
- Circuit design: splitting components
- Image segmentation: separating regions
- Community detection: sparse connections between groups
3. Stoer-Wagner Algorithm (Deterministic)
A clean, deterministic method for global minimum cut in undirected graphs.
A. Idea
Start with the full vertex set \(V\).
Repeatedly run Maximum Adjacency Search:

- Start from a vertex
- Grow a set by adding the most tightly connected vertex
- The last added vertex defines a cut

Contract the last two added vertices into one.
Keep track of smallest cut seen.
Repeat until one vertex remains.
B. Implementation (Adjacency Matrix)
const int INF = 1e9;
int g[MAX][MAX], w[MAX];
bool added[MAX], exist[MAX];
int stoer_wagner(int n) {
int best = INF;
vector<int> v(n);
iota(v.begin(), v.end(), 0);
while (n > 1) {
fill(w, w + n, 0);
fill(added, added + n, false);
int prev = 0;
for (int i = 0; i < n; i++) {
int sel = -1;
for (int j = 0; j < n; j++)
if (!added[j] && (sel == -1 || w[j] > w[sel])) sel = j;
if (i == n - 1) {
best = min(best, w[sel]);
for (int j = 0; j < n; j++)
g[prev][j] = g[j][prev] += g[sel][j];
v.erase(v.begin() + sel);
n--;
break;
}
added[sel] = true;
for (int j = 0; j < n; j++) w[j] += g[sel][j];
prev = sel;
}
}
return best;
}

Complexity: \(O(V^3)\), or \(O(VE + V^2 \log V)\) with heaps
Input: weighted undirected graph
Output: global min cut value
C. Example
Graph:
1 -3- 2
| |
4 2
\-5-/
Cuts:
- {1,2} | {3} → 7
- {1,3} | {2} → 5

Min cut = 5
4. Karger’s Algorithm (Randomized)
A simple, elegant probabilistic method. Repeatedly contract random edges until two vertices remain; the remaining crossing edges form a cut.
Run multiple times → high probability of finding min cut.
A. Algorithm
While \(|V| > 2\):

- Choose a random edge \((u, v)\)
- Contract \(u, v\) into one node
- Remove self-loops

Return the number of edges between the two remaining nodes.

Repeat \(O(n^2 \log n)\) times for high confidence.
B. Implementation Sketch
struct Edge { int u, v; };
vector<Edge> edges;
int parent[MAX];
int find(int x) { return parent[x] == x ? x : parent[x] = find(parent[x]); }
void unite(int a, int b) { parent[find(b)] = find(a); }
int karger(int n) {
int m = edges.size();
iota(parent, parent + n, 0);
int vertices = n;
while (vertices > 2) {
int i = rand() % m;
int u = find(edges[i].u), v = find(edges[i].v);
if (u == v) continue;
unite(u, v);
vertices--;
}
int cuts = 0;
for (auto e : edges)
if (find(e.u) != find(e.v)) cuts++;
return cuts;
}

Expected time: \(O(n^2)\) per run
Probability of success: at least \(2 / (n(n-1))\) per run
Run multiple trials and take the minimum.
C. Use Case
Great for large sparse graphs, or when approximate solutions are acceptable. Intuitive: the min cut survives random contractions if chosen carefully enough.
5. Gomory-Hu Tree
A compact way to store all-pairs min-cuts. It compresses (O\(V^2\)) flow computations into V-1 cuts.
A. Idea
- Build a tree where the min cut between any two vertices = the minimum edge weight on their path in the tree.
B. Algorithm
Pick a vertex \(s\).

For each vertex \(t \neq s\):

- Run max flow to find the min cut between \(s\) and \(t\).
- Partition vertices accordingly.

Connect the partitions to form a tree.

Result: Gomory-Hu tree (\(V-1\) edges).

Now any pair’s min cut = smallest edge on the path between them.

Complexity: \(O(V)\) max-flow runs.
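A minimal Gusfield-style sketch of this construction is shown below. It assumes a `max_flow(s, t)` routine that returns the s-t min-cut value and a `reachable(s, u)` test on the residual graph left by that flow; both helpers are assumptions here, not shown in this chapter.

```cpp
// Hypothetical helpers (assumed, not shown): max_flow(s, t), reachable(s, u).
int tree_parent[MAX];       // tree edge (i, tree_parent[i])
long long tree_cut[MAX];    // its capacity = min cut between i and tree_parent[i]

void gomory_hu(int n) {
    for (int i = 1; i <= n; i++) tree_parent[i] = 1;
    for (int i = 2; i <= n; i++) {
        // min cut between i and its current parent
        tree_cut[i] = max_flow(i, tree_parent[i]);
        // vertices on i's side of that cut get re-parented to i
        for (int j = i + 1; j <= n; j++)
            if (tree_parent[j] == tree_parent[i] && reachable(i, j))
                tree_parent[j] = i;
    }
}
```

Any pair’s min cut is then the smallest `tree_cut` value on the path between them in this tree.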
C. Uses
- Quickly answer all-pairs cut queries
- Network reliability
- Hierarchical clustering
6. Comparison
| Algorithm | Type | Randomized | Graph | Complexity | Output |
|---|---|---|---|---|---|
| Stoer-Wagner | Deterministic | No | Undirected | O(V³) | Global min cut |
| Karger | Randomized | Yes | Undirected | O(n² log n) (multi-run) | Probabilistic min cut |
| Gomory-Hu | Deterministic | No | Undirected | O(V × MaxFlow) | All-pairs min cuts |
7. Relationship to Flows
By Max-Flow Min-Cut, min-cut capacity = max-flow value.
So you can find:
- s-t min cut , via max flow
- global min cut , min over all (s, t) pairs

Specialized algorithms just make it faster.
Tiny Code
Stoer-Wagner Example:
printf("Global Min Cut = %d\n", stoer_wagner(n));Karger Multi-Run:
int ans = INF;
for (int i = 0; i < 100; i++)
ans = min(ans, karger(n));
printf("Approx Min Cut = %d\n", ans);Why It Matters
Cuts show you fragility , the weak seams of connection. While flows tell you how much can pass, cuts reveal where it breaks first.
They teach:
“To understand strength, study what happens when you pull things apart.”
Try It Yourself
- Implement Stoer-Wagner and test on small graphs.
- Run Karger 100 times and track success rate.
- Build a Gomory-Hu tree and answer random pair queries.
- Verify Max-Flow = Min-Cut equivalence on examples.
- Use cuts for community detection in social graphs.
Mastering cuts gives you both grip and insight , where systems hold, and where they give way.
38. Matchings (Hopcroft-Karp, Hungarian, Blossom)
In many problems, we need to pair up elements efficiently: students to schools, jobs to workers, tasks to machines.
These are matching problems , find sets of edges with no shared endpoints that maximize cardinality or weight.
Depending on graph type, different algorithms apply:
- Hopcroft-Karp , fast matching in bipartite graphs
- Hungarian Algorithm , optimal weighted assignment
- Edmonds’ Blossom Algorithm , general graphs (non-bipartite)

Matching is a fundamental combinatorial structure, appearing in scheduling, flow networks, and resource allocation.
1. Terminology
- Matching: set of edges with no shared vertices
- Maximum Matching: matching with the largest number of edges
- Perfect Matching: covers all vertices (each vertex matched once)
- Maximum Weight Matching: matching with the largest total edge weight

Graph Types:

- Bipartite: vertices split into two sets (L, R); edges only between sets
- General: arbitrary connections (may contain odd cycles)
2. Applications
- Job assignment
- Network flows
- Resource allocation
- Student-project pairing
- Stable marriages (with preferences)
- Computer vision (feature correspondence)
3. Hopcroft-Karp Algorithm (Bipartite Matching)
A highly efficient algorithm for maximum cardinality matching in bipartite graphs.
It uses layered BFS + DFS to find multiple augmenting paths simultaneously.
A. Idea
Initialize matching empty.
While augmenting paths exist:
- BFS builds the layer graph (shortest augmenting paths).
- DFS finds all augmenting paths along those layers.

Each phase increases the matching size significantly.
B. Complexity
\[ O(E \sqrt{V}) \]
Much faster than augmenting one path at a time (like Ford-Fulkerson).
C. Implementation
Let pairU[u] = matched vertex in R (or 0 if unmatched) and pairV[v] = matched vertex in L (or 0 if unmatched).
vector<int> adjL[MAX];
int pairU[MAX], pairV[MAX], dist[MAX];
int nL, nR;
bool bfs() {
queue<int> q;
for (int u = 1; u <= nL; u++) {
if (!pairU[u]) dist[u] = 0, q.push(u);
else dist[u] = INF;
}
int found = INF;
while (!q.empty()) {
int u = q.front(); q.pop();
if (dist[u] < found) {
for (int v : adjL[u]) {
if (!pairV[v]) found = dist[u] + 1;
else if (dist[pairV[v]] == INF) {
dist[pairV[v]] = dist[u] + 1;
q.push(pairV[v]);
}
}
}
}
return found != INF;
}
bool dfs(int u) {
for (int v : adjL[u]) {
if (!pairV[v] || (dist[pairV[v]] == dist[u] + 1 && dfs(pairV[v]))) {
pairU[u] = v;
pairV[v] = u;
return true;
}
}
dist[u] = INF;
return false;
}
int hopcroft_karp() {
int matching = 0;
while (bfs()) {
for (int u = 1; u <= nL; u++)
if (!pairU[u] && dfs(u)) matching++;
}
return matching;
}

D. Example
Graph:
U = {1,2,3}, V = {a,b}
Edges: 1–a, 2–a, 3–b
Matching: {1-a, 3-b} (size 2)
4. Hungarian Algorithm (Weighted Bipartite Matching)
Solves assignment problem , given cost matrix \(c_{ij}\), assign each (i) to one (j) minimizing total cost (or maximizing profit).
A. Idea
Subtract minimums row- and column-wise → expose zeros → find minimal zero-cover → adjust matrix → repeat.
Equivalent to solving min-cost perfect matching on a bipartite graph.
B. Complexity
\[ O(V^3) \]
Works for dense graphs, moderate sizes.
C. Implementation Sketch (Matrix Form)
int hungarian(const vector<vector<int>>& cost) {
int n = cost.size();
vector<int> u(n+1), v(n+1), p(n+1), way(n+1);
for (int i = 1; i <= n; i++) {
p[0] = i; int j0 = 0;
vector<int> minv(n+1, INF);
vector<char> used(n+1, false);
do {
used[j0] = true;
int i0 = p[j0], delta = INF, j1;
for (int j = 1; j <= n; j++) if (!used[j]) {
int cur = cost[i0-1][j-1] - u[i0] - v[j];
if (cur < minv[j]) minv[j] = cur, way[j] = j0;
if (minv[j] < delta) delta = minv[j], j1 = j;
}
for (int j = 0; j <= n; j++)
if (used[j]) u[p[j]] += delta, v[j] -= delta;
else minv[j] -= delta;
j0 = j1;
} while (p[j0]);
do { int j1 = way[j0]; p[j0] = p[j1]; j0 = j1; } while (j0);
}
return -v[0]; // minimal cost
}

D. Example
Cost matrix:
a b c
1 3 2 1
2 2 3 2
3 3 2 3
Optimal assignment = 1-c, 2-a, 3-b Cost = 1 + 2 + 2 = 5
5. Edmonds’ Blossom Algorithm (General Graphs)
For non-bipartite graphs, simple augmenting path logic breaks down (odd cycles). Blossom algorithm handles this via contraction of blossoms (odd cycles).
A. Idea
- Find augmenting paths
- When an odd cycle (blossom) is encountered, shrink it into one vertex
- Continue the search
- Expand blossoms at the end
B. Complexity
\[ O(V^3) \]
Though complex to implement, it’s the general-purpose solution for matchings.
C. Use Cases
- Non-bipartite job/task assignments
- General pairing problems
- Network design
6. Comparison
| Algorithm | Graph Type | Weighted | Complexity | Output |
|---|---|---|---|---|
| Hopcroft-Karp | Bipartite | No | O(E√V) | Max cardinality |
| Hungarian | Bipartite | Yes | O(V³) | Min/Max cost matching |
| Blossom | General | Yes | O(V³) | Max cardinality or weight |
7. Relation to Flows
Bipartite matching = max flow on network:
- Source → each left vertex (capacity 1)
- Each right vertex → sink (capacity 1)
- Left → right edges (capacity 1)

Matching size = flow value
Tiny Code
Hopcroft-Karp Demo:
nL = 3; nR = 2;
adjL[1] = {1};
adjL[2] = {1};
adjL[3] = {2};
printf("Max Matching = %d\n", hopcroft_karp()); // 2Why It Matters
Matchings are the language of pairing and assignment. They express cooperation without overlap , a structure of balance.
They reveal a deep duality:
“Every match is a flow, every assignment an optimization.”
Try It Yourself
- Build a bipartite graph and run Hopcroft-Karp.
- Solve an assignment problem with Hungarian algorithm.
- Explore Blossom’s contraction idea conceptually.
- Compare max-flow vs matching approach.
- Use matching to model scheduling (people ↔︎ tasks).
Matching teaches how to pair without conflict, a lesson both mathematical and universal.
39. Tree Algorithms (LCA, HLD, Centroid Decomposition)
Trees are the backbone of many algorithms , they are connected, acyclic, and wonderfully structured.
Because of their simplicity, they allow elegant divide-and-conquer, dynamic programming, and query techniques. This section covers three fundamental patterns:
- Lowest Common Ancestor (LCA) , answer ancestor queries fast
- Heavy-Light Decomposition (HLD) , break trees into chains for segment trees / path queries
- Centroid Decomposition , recursively split the tree by balance for divide-and-conquer

Each reveals a different way to reason about trees , by depth, by chains, or by balance.
1. Lowest Common Ancestor (LCA)
Given a tree, two nodes (u, v). The LCA is the lowest node (farthest from root) that is an ancestor of both.
Applications:
- Distance queries
- Path decomposition
- RMQ / binary lifting
- Tree DP and rerooting
A. Naive Approach
Climb ancestors until they meet. But this is \(O(n)\) per query , too slow for many queries.
B. Binary Lifting
Precompute ancestors at powers of 2. Then jump up by powers to align depths.
Preprocessing:
- DFS to record depth
- up[v][k] = 2^k-th ancestor of v

Answering a query:

- Lift the deeper node up to the same depth
- Lift both together while up[u][k] != up[v][k]
- Return the parent
Code:
const int LOG = 20;
vector<int> adj[MAX];
int up[MAX][LOG], depth[MAX];
void dfs(int u, int p) {
up[u][0] = p;
for (int k = 1; k < LOG; k++)
up[u][k] = up[up[u][k-1]][k-1];
for (int v : adj[u]) if (v != p) {
depth[v] = depth[u] + 1;
dfs(v, u);
}
}
int lca(int u, int v) {
if (depth[u] < depth[v]) swap(u, v);
int diff = depth[u] - depth[v];
for (int k = 0; k < LOG; k++)
if (diff & (1 << k)) u = up[u][k];
if (u == v) return u;
for (int k = LOG-1; k >= 0; k--)
if (up[u][k] != up[v][k])
u = up[u][k], v = up[v][k];
return up[u][0];
}

Complexity:

- Preprocess: \(O(n \log n)\)
- Query: \(O(\log n)\)
C. Example
Tree:
    1
   / \
  2   3
 / \
4   5

- LCA(4,5) = 2
- LCA(4,3) = 1
2. Heavy-Light Decomposition (HLD)
When you need to query paths (sum, max, min, etc.) on trees efficiently, you can use Heavy-Light Decomposition.
A. Idea
Decompose the tree into chains:
- Heavy edge = edge to the child with the largest subtree
- Light edges = all others

Result: every path from root to leaf crosses at most \(O(\log n)\) light edges.

So a path query can be broken into \(O(\log^2 n)\) segment tree queries.
B. Steps
- DFS to compute subtree sizes and identify heavy child
- Decompose into chains
- Assign IDs for segment tree
- Use Segment Tree / BIT on linearized array
Key functions:
- dfs_sz(u) → compute subtree sizes
- decompose(u, head) → assign chain heads

Code (core):
int parent[MAX], depth[MAX], heavy[MAX], head[MAX], pos[MAX];
int cur_pos = 0;
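// assumption: heavy[] is initialized to -1 (meaning "no heavy child") before dfs_sz and decompose run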
int dfs_sz(int u) {
int size = 1, max_sz = 0;
for (int v : adj[u]) if (v != parent[u]) {
parent[v] = u;
depth[v] = depth[u] + 1;
int sz = dfs_sz(v);
if (sz > max_sz) max_sz = sz, heavy[u] = v;
size += sz;
}
return size;
}
void decompose(int u, int h) {
head[u] = h;
pos[u] = cur_pos++;
if (heavy[u] != -1) decompose(heavy[u], h);
for (int v : adj[u])
if (v != parent[u] && v != heavy[u])
decompose(v, v);
}

Query path(u, v):
- While the chain heads differ, move up chain by chain
- Query the segment tree in [pos[head[u]], pos[u]]
- When both are in the same chain, query the segment [pos[v], pos[u]]

Complexity:

- Build: \(O(n)\)
- Query/Update: \(O(\log^2 n)\)
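A minimal sketch of that loop, assuming a `seg_query(l, r)` helper (hypothetical here) that aggregates values over positions l..r of the linearized array:

```cpp
// assumes seg_query(l, r) returns the sum over [l, r] in pos[] order (hypothetical helper)
long long query_path(int u, int v) {
    long long res = 0;
    while (head[u] != head[v]) {
        // always lift the vertex whose chain head is deeper
        if (depth[head[u]] < depth[head[v]]) swap(u, v);
        res += seg_query(pos[head[u]], pos[u]);
        u = parent[head[u]];
    }
    if (depth[u] > depth[v]) swap(u, v);   // now both are in the same chain
    res += seg_query(pos[u], pos[v]);
    return res;
}
```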
C. Use Cases
- Path sums- Path maximums- Edge updates- Subtree queries
3. Centroid Decomposition
Centroid = node that splits tree into subtrees ≤ n/2 each. By removing centroids recursively, we form a centroid tree.
Used for divide-and-conquer on trees.
A. Steps
Find the centroid:

- DFS to compute subtree sizes
- Choose the node whose largest remaining subtree is ≤ n/2

Decompose:

- Remove the centroid
- Recurse on the subtrees

Code (core):
int subtree[MAX];
bool removed[MAX];
vector<int> adj[MAX];
int dfs_size(int u, int p) {
subtree[u] = 1;
for (int v : adj[u])
if (v != p && !removed[v])
subtree[u] += dfs_size(v, u);
return subtree[u];
}
int find_centroid(int u, int p, int n) {
for (int v : adj[u])
if (v != p && !removed[v])
if (subtree[v] > n / 2)
return find_centroid(v, u, n);
return u;
}
void decompose(int u, int p) {
int n = dfs_size(u, -1);
int c = find_centroid(u, -1, n);
removed[c] = true;
// process centroid here
for (int v : adj[c])
if (!removed[v])
decompose(v, c);
}

Complexity: \(O(n \log n)\)
B. Applications
- Distance queries (decompose + store distance to centroid)
- Tree problems solvable by divide-and-conquer
- Dynamic queries (add/remove nodes)
4. Comparison
| Algorithm | Purpose | Query | Preprocess | Notes |
|---|---|---|---|---|
| LCA | Ancestor query | O(log n) | O(n log n) | Fast ancestor lookup |
| HLD | Path queries | O(log² n) | O(n) | Segment tree-friendly |
| Centroid Decomposition | Divide tree | - | O(n log n) | Balanced splits |
5. Interconnections
- HLD often uses LCA internally.
- Centroid decomposition may use distance to ancestor (via LCA).
- All exploit tree structure to achieve sublinear queries.
Tiny Code
LCA(4,5):
dfs(1,1);
printf("%d\n", lca(4,5)); // 2HLD Path Sum: Build segment tree on pos[u] order, query along chains.
Centroid: decompose(1, -1);
Why It Matters
Tree algorithms show how structure unlocks efficiency. They transform naive traversals into fast, layered, or recursive solutions.
To master data structures, you must learn to “climb” and “cut” trees intelligently.
“Every rooted path hides a logarithm.”
Try It Yourself
- Implement binary lifting LCA and test queries.
- Add segment tree over HLD and run path sums.
- Decompose tree by centroid and count nodes at distance k.
- Combine LCA + HLD for path min/max.
- Draw centroid tree of a simple graph.
Master these, and trees will stop being “just graphs” , they’ll become tools.
40. Advanced Graph Algorithms and Tricks
By now you’ve seen the big families , traversals, shortest paths, flows, matchings, cuts, and trees. But real-world graphs often bring extra constraints: dynamic updates, multiple sources, layered structures, or special properties (planar, DAG, sparse).
This section gathers powerful advanced graph techniques , tricks and patterns that appear across problems once you’ve mastered the basics.
We’ll explore:
- Topological Sorting & DAG DP
- Strongly Connected Components (Condensation Graphs)
- Articulation Points & Bridges (2-Edge/Vertex Connectivity)
- Eulerian & Hamiltonian Paths
- Graph Coloring & Bipartiteness Tests
- Cycle Detection & Directed Acyclic Reasoning
- Small-to-Large Merging, DSU on Tree, Mo’s Algorithm on Trees
- Bitmask DP on Graphs
- Dynamic Graphs (Incremental/Decremental BFS/DFS)
- Special Graphs (Planar, Sparse, Dense)

These aren’t just algorithms , they’re patterns that let you attack harder graph problems with insight.
1. Topological Sorting & DAG DP
In a DAG (Directed Acyclic Graph), edges always point forward. This makes it possible to order vertices linearly so all edges go from left to right , a topological order.
Use cases:
- Task scheduling
- Dependency resolution
- DP on DAG (longest/shortest path, counting paths)

Algorithm (Kahn’s):
vector<int> topo_sort(int n) {
vector<int> indeg(n+1), res;
queue<int> q;
for (int u = 1; u <= n; u++)
for (int v : adj[u]) indeg[v]++;
for (int u = 1; u <= n; u++)
if (!indeg[u]) q.push(u);
while (!q.empty()) {
int u = q.front(); q.pop();
res.push_back(u);
for (int v : adj[u])
if (--indeg[v] == 0) q.push(v);
}
return res;
}

DAG DP:
vector<int> dp(n+1, 0);
for (int u : topo_order)
for (int v : adj[u])
dp[v] = max(dp[v], dp[u] + weight(u,v));

Complexity: O(V + E)
2. Strongly Connected Components (Condensation)
In directed graphs, vertices may form SCCs (mutually reachable components). Condensing SCCs yields a DAG, often easier to reason about.
Use:
- Component compression
- Meta-graph reasoning
- Cycle condensation

Tarjan’s Algorithm: DFS with low-link values, single pass.
Kosaraju’s Algorithm: Two passes , DFS on graph and reversed graph.
Complexity: O(V + E)
Once SCCs are built, you can run DP or topological sort on the condensed DAG.
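For reference, a minimal Kosaraju-style sketch, under the assumptions of 1-indexed vertices and a prebuilt reversed adjacency list `radj` (not defined in the text above):

```cpp
vector<int> adj[MAX], radj[MAX];   // radj holds every edge reversed
int comp[MAX];                     // comp[u] = SCC id, -1 while unassigned
bool vis[MAX];
vector<int> order_;                // vertices in order of finishing time

void dfs1(int u) {
    vis[u] = true;
    for (int v : adj[u]) if (!vis[v]) dfs1(v);
    order_.push_back(u);
}
void dfs2(int u, int c) {
    comp[u] = c;
    for (int v : radj[u]) if (comp[v] == -1) dfs2(v, c);
}
int scc_count(int n) {
    fill(comp + 1, comp + n + 1, -1);
    for (int u = 1; u <= n; u++) if (!vis[u]) dfs1(u);
    int c = 0;
    for (int i = (int)order_.size() - 1; i >= 0; i--)   // reverse finishing order
        if (comp[order_[i]] == -1) dfs2(order_[i], c++);
    return c;                                           // number of SCCs
}
```

`comp[]` then labels the condensation: edges between different components form the condensed DAG.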
3. Articulation Points & Bridges
Find critical vertices/edges whose removal disconnects the graph.
- Articulation point: vertex whose removal increases the component count
- Bridge: edge whose removal increases the component count

Algorithm: Tarjan’s DFS. Track discovery time tin[u] and the lowest reachable ancestor low[u].
void dfs(int u, int p) {
tin[u] = low[u] = ++timer;
for (int v : adj[u]) {
if (v == p) continue;
if (!tin[v]) {
dfs(v, u);
low[u] = min(low[u], low[v]);
if (low[v] > tin[u]) bridge(u, v);
if (low[v] >= tin[u] && p != -1) cut_vertex(u);
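            // note: the DFS root (p == -1) is an articulation point only if it has two or more DFS children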
} else low[u] = min(low[u], tin[v]);
}
}

Applications:
- Network reliability
- Biconnected components
- 2-edge/vertex connectivity tests
4. Eulerian & Hamiltonian Paths
- Eulerian Path: visits every edge exactly once; exists if the graph is connected and 0 or 2 vertices have odd degree
- Hamiltonian Path: visits every vertex exactly once (NP-hard)

Euler Tour Construction: Hierholzer’s algorithm, O(E)
Applications:
- Route reconstruction (e.g., word chains)
- Postman problems
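A minimal sketch of Hierholzer’s construction for an Eulerian circuit, assuming an undirected graph stored as (neighbor, edge id) pairs and that a circuit exists; the names used here are illustrative only.

```cpp
vector<pair<int,int>> g_adj[MAX];   // (to, edge id)
int edge_ptr[MAX];                  // next unexplored edge per vertex
bool edge_used[MAXE];

vector<int> euler_circuit(int start) {
    vector<int> circuit;
    stack<int> st;
    st.push(start);
    while (!st.empty()) {
        int u = st.top();
        if (edge_ptr[u] == (int)g_adj[u].size()) {   // no unused edges left at u
            circuit.push_back(u);
            st.pop();
        } else {
            auto [v, id] = g_adj[u][edge_ptr[u]++];
            if (edge_used[id]) continue;             // skip edges consumed from the other side
            edge_used[id] = true;
            st.push(v);
        }
    }
    reverse(circuit.begin(), circuit.end());
    return circuit;                                  // vertices along the circuit
}
```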
5. Graph Coloring & Bipartiteness
Bipartite Check: DFS/BFS with alternating colors. Fails if an odd cycle is found.
bool bipartite(int n) {
vector<int> color(n+1, -1);
for (int i = 1; i <= n; i++) if (color[i] == -1) {
queue<int> q; q.push(i); color[i] = 0;
while (!q.empty()) {
int u = q.front(); q.pop();
for (int v : adj[u]) {
if (color[v] == -1)
color[v] = color[u] ^ 1, q.push(v);
else if (color[v] == color[u])
return false;
}
}
}
return true;
}

Applications:
- 2-SAT reduction
- Planar graph coloring
- Conflict-free assignment
6. Cycle Detection
- DFS + recursion stack for directed graphs
- Union-Find for undirected graphs

Used to test acyclicity, detect back edges, or find cycles for rollback or consistency checks.
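For the directed case, a minimal DFS sketch with a recursion-stack marker (0 = unvisited, 1 = on stack, 2 = done):

```cpp
vector<int> dg_adj[MAX];
int state_[MAX];                       // 0 = unvisited, 1 = on recursion stack, 2 = finished

bool dfs_cycle(int u) {
    state_[u] = 1;
    for (int v : dg_adj[u]) {
        if (state_[v] == 1) return true;            // back edge -> cycle
        if (state_[v] == 0 && dfs_cycle(v)) return true;
    }
    state_[u] = 2;
    return false;
}
bool has_cycle(int n) {
    for (int u = 1; u <= n; u++)
        if (state_[u] == 0 && dfs_cycle(u)) return true;
    return false;
}
```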
7. DSU on Tree (Small-to-Large Merging)
For queries like “count distinct colors in subtree,” merge results from smaller to larger subtrees to maintain O(n log n).
Pattern:
- DFS through children
- Keep large child’s data structure
- Merge small child’s data in
Applications:
- Offline subtree queries
- Heavy subproblem caching
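A small-to-large sketch for the "distinct colors in each subtree" query, assuming a color[] array and adjacency lists; with std::set the total cost comes to roughly O(n log² n), and the names here are illustrative.

```cpp
vector<int> t_adj[MAX];
int color_[MAX], distinct_[MAX];
set<int>* bag[MAX];

void dsu_dfs(int u, int p) {
    int big = -1;
    for (int v : t_adj[u]) if (v != p) {
        dsu_dfs(v, u);
        if (big == -1 || bag[v]->size() > bag[big]->size()) big = v;
    }
    bag[u] = (big == -1) ? new set<int>() : bag[big];   // keep the largest child's set
    bag[u]->insert(color_[u]);
    for (int v : t_adj[u]) if (v != p && v != big)
        for (int c : *bag[v]) bag[u]->insert(c);        // merge smaller sets into it
    distinct_[u] = bag[u]->size();
}
```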
8. Mo’s Algorithm on Trees
Offline algorithm to answer path queries efficiently:
- Convert path queries to ranges via an Euler Tour
- Use Mo’s ordering to process in O((N + Q)√N)

Useful when online updates aren’t required.
9. Bitmask DP on Graphs
For small graphs (n ≤ 20): State = subset of vertices e.g., Traveling Salesman Problem (TSP)
dp[mask][u] = min cost to visit mask, end at u

Transition:

dp[mask | (1<<v)][v] = min(dp[mask][u] + cost[u][v])

Complexity: O(n² 2ⁿ)
10. Dynamic Graphs
Graphs that change:
- Incremental BFS: maintain distances as edges are added
- Decremental connectivity: union-find rollback or dynamic trees

Used in online queries, evolving networks, or real-time systems.
11. Special Graph Classes
- Planar graphs: E ≤ 3V - 6; use face counting
- Sparse graphs: adjacency lists are best
- Dense graphs: adjacency matrix / bitset

Optimizations often hinge on density.
Tiny Code
Topological Order:
auto order = topo_sort(n);
for (int u : order) printf("%d ", u);

Bridge Check: if (low[v] > tin[u]) the edge is a bridge.
Euler Path Check: Count odd-degree nodes == 0 or 2.
Why It Matters
These advanced techniques complete your toolkit. They’re not isolated , they combine to solve real-world puzzles: dependency graphs, robust networks, optimized paths, compressed states.
They teach a mindset:
“Graphs are not obstacles , they’re shapes of possibility.”
Try It Yourself
- Implement topological sort and DAG DP.
- Find SCCs and build condensation graph.
- Detect articulation points and bridges.
- Check Euler path conditions on random graphs.
- Try DSU on tree for subtree statistics.
- Solve TSP via bitmask DP for n ≤ 15.
Once you can mix and match these tools, you’re no longer just navigating graphs , you’re shaping them.
Chapter 5. Dynamic Programming
41. DP Basics and State Transitions
Dynamic Programming (DP) is one of the most powerful ideas in algorithm design. It’s about breaking a big problem into smaller overlapping subproblems, solving each once, and reusing their answers.
When brute force explodes exponentially, DP brings it back under control. This section introduces the mindset, the mechanics, and the math behind DP.
1. The Core Idea
Many problems have two key properties:
Overlapping subproblems: The same smaller computations repeat many times.
Optimal substructure: The optimal solution to a problem can be built from optimal solutions to its subproblems.
DP solves each subproblem once, stores the result, and reuses it. This saves exponential time , often reducing \(O(2^n)\) to \(O(n^2)\) or \(O(n)\).
2. The Recipe
When approaching a DP problem, follow this pattern:
1. Define the state. Decide what subproblems you’ll solve. Example: dp[i] = best answer for first i elements.
2. Write the recurrence. Express each state in terms of smaller ones. Example: dp[i] = dp[i-1] + cost(i)
3. Set the base cases. Where does the recursion start? Example: dp[0] = 0
4. Decide the order. Bottom-up (iterative) or top-down (recursive with memoization).
5. Return the final answer. Often dp[n] or max(dp[i]).
3. Example: Fibonacci Numbers
Let’s begin with a classic , the nth Fibonacci number ( F(n) = F(n-1) + F(n-2) ).
Recursive (slow):
int fib(int n) {
if (n <= 1) return n;
return fib(n - 1) + fib(n - 2);
}

This recomputes the same values over and over , exponential time.
Top-Down DP (Memoization):
int dp[MAX];
int fib(int n) {
if (n <= 1) return n;
if (dp[n] != -1) return dp[n];
return dp[n] = fib(n-1) + fib(n-2);
}

Bottom-Up DP (Tabulation):
int fib(int n) {
int dp[n+1];
dp[0] = 0; dp[1] = 1;
for (int i = 2; i <= n; i++)
dp[i] = dp[i-1] + dp[i-2];
return dp[n];
}

Space Optimized:
int fib(int n) {
int a = 0, b = 1, c;
for (int i = 2; i <= n; i++) {
c = a + b;
a = b;
b = c;
}
return b;
}

4. States, Transitions, and Dependencies
A DP table is a map from states to answers. Each state depends on others via a transition function.
Think of it like a graph , each edge represents a recurrence relation.
Example:
- State: dp[i] = number of ways to reach step i
- Transition: dp[i] = dp[i-1] + dp[i-2] (like stairs)
- Base: dp[0] = 1
5. Common DP Patterns
1D Linear DP
- Problems like Fibonacci, climbing stairs, LIS.
2D DP
- Grids, sequences, or combinations (LCS, knapsack).
Bitmask DP
- Subsets, TSP, combinatorial optimization.
DP on Trees
- Subtree computations (sum, diameter).
Digit DP
- Counting numbers with properties in a range.
Segment DP
- Matrix chain multiplication, interval merges.
6. Top-Down vs Bottom-Up
| Approach | Method | Pros | Cons |
|---|---|---|---|
| Top-Down | Recursion + Memoization | Easy to write, intuitive | Stack overhead, needs memo |
| Bottom-Up | Iteration | Fast, space-optimizable | Harder to derive order |
When dependencies are simple and acyclic, bottom-up shines. When they’re complex, top-down is easier.
7. Example 2: Climbing Stairs
You can climb 1 or 2 steps at a time. How many distinct ways to reach step ( n )?
State: dp[i] = ways to reach step i
Transition: dp[i] = dp[i-1] + dp[i-2]
Base: dp[0] = 1, dp[1] = 1
Code:
int climb(int n) {
int dp[n+1];
dp[0] = dp[1] = 1;
for (int i = 2; i <= n; i++)
dp[i] = dp[i-1] + dp[i-2];
return dp[n];
}

8. Debugging DP
To debug DP:
- Print intermediate states.
- Visualize the table (especially 2D).
- Check base cases.
- Trace one small example by hand.
9. Complexity
Most DP algorithms are linear or quadratic in number of states:
- Time = (#states) × (work per transition)
- Space = (#states)

Examples: Fibonacci: \(O(n)\) time, \(O(1)\) space. Knapsack: \(O(n \times W)\). LCS: \(O(n \times m)\).
Tiny Code
Fibonacci (tabulated):
int dp[100];
dp[0] = 0; dp[1] = 1;
for (int i = 2; i <= n; i++)
dp[i] = dp[i-1] + dp[i-2];
printf("%d", dp[n]);Why It Matters
DP is the art of remembering. It transforms recursion into iteration, chaos into order.
From optimization to counting, from paths to sequences , once you see substructure, DP becomes your hammer.
“Every repetition hides a recurrence.”
Try It Yourself
- Write top-down and bottom-up Fibonacci.
- Count ways to climb stairs with steps {1,2,3}.
- Compute number of paths in an n×m grid.
- Try to spot state, recurrence, base in each problem.
- Draw dependency graphs to visualize transitions.
DP isn’t a formula , it’s a mindset: break problems into parts, remember the past, and build from it.
42. Classic Problems (Knapsack, Subset Sum, Coin Change)
Now that you know what dynamic programming is, let’s dive into the classic trio , problems that every programmer meets early on:
- Knapsack (maximize value under a weight constraint)
- Subset Sum (can we form a given sum?)
- Coin Change (how many ways, or fewest coins, to reach a total)

These are the training grounds of DP: each shows how to define states, transitions, and base cases clearly.
1. 0/1 Knapsack Problem
Problem: You have n items, each with weight w[i] and value v[i]. A knapsack with capacity W. Pick items (each at most once) to maximize total value, without exceeding weight.
A. State
dp[i][w] = max value using first i items with capacity w
B. Recurrence
For item i:
- If we don’t take it: dp[i-1][w]
- If we take it (if w[i] ≤ w): dp[i-1][w - w[i]] + v[i]

So, \[ dp[i][w] = \max(dp[i-1][w], dp[i-1][w - w[i]] + v[i]) \]
C. Base Case
dp[0][w] = 0 for all w (no items = no value)
D. Implementation
int knapsack(int n, int W, int w[], int v[]) {
int dp[n+1][W+1];
for (int i = 0; i <= n; i++) {
for (int j = 0; j <= W; j++) {
if (i == 0 || j == 0) dp[i][j] = 0;
else if (w[i-1] <= j)
dp[i][j] = max(dp[i-1][j], dp[i-1][j - w[i-1]] + v[i-1]);
else
dp[i][j] = dp[i-1][j];
}
}
return dp[n][W];
}

Complexity: Time \(O(nW)\), Space \(O(nW)\) (can be optimized to 1D, \(O(W)\))
E. Space Optimization (1D DP)
int dp[W+1] = {0};
for (int i = 0; i < n; i++)
for (int w = W; w >= weight[i]; w--)
dp[w] = max(dp[w], dp[w - weight[i]] + value[i]);

F. Example
Items:
w = [2, 3, 4, 5]
v = [3, 4, 5, 6]
W = 5
Best: take items 1 + 2 → value 7
2. Subset Sum
Problem: Given a set S of integers, can we pick some to sum to target?
A. State
dp[i][sum] = true if we can form sum sum using first i elements.
B. Recurrence
- Don’t take: dp[i-1][sum]
- Take (if a[i] ≤ sum): dp[i-1][sum - a[i]]

So, \[ dp[i][sum] = dp[i-1][sum] \;\lor\; dp[i-1][sum - a[i]] \]
C. Base Case
dp[0][0] = true (sum 0 possible with no elements) dp[0][sum] = false for sum > 0
D. Implementation
bool subset_sum(int a[], int n, int target) {
bool dp[n+1][target+1];
for (int i = 0; i <= n; i++) dp[i][0] = true;
for (int j = 1; j <= target; j++) dp[0][j] = false;
for (int i = 1; i <= n; i++) {
for (int j = 1; j <= target; j++) {
if (a[i-1] > j) dp[i][j] = dp[i-1][j];
else dp[i][j] = dp[i-1][j] || dp[i-1][j - a[i-1]];
}
}
return dp[n][target];
}

Complexity: Time \(O(n \cdot target)\)
E. Example
S = [3, 34, 4, 12, 5, 2], target = 9 Yes → 4 + 5
3. Coin Change
Two variants:
(a) Count Ways (Unbounded Coins)
“How many ways to make total T with coins c[]?”
Order doesn’t matter.
State: dp[i][t] = ways using first i coins for total t
Recurrence:
- Skip coin: dp[i-1][t]
- Take coin (unlimited): dp[i][t - c[i]]

\[ dp[i][t] = dp[i-1][t] + dp[i][t - c[i]] \]
Base: dp[0][0] = 1
1D Simplified:
int dp[T+1] = {0};
dp[0] = 1;
for (int coin : coins)
for (int t = coin; t <= T; t++)
dp[t] += dp[t - coin];

(b) Min Coins (Fewest Coins to Reach Total)
State: dp[t] = min coins to reach t
Recurrence: \[ dp[t] = \min_{c_i \le t}(dp[t - c_i] + 1) \]
Base: dp[0] = 0, rest = INF
int dp[T+1];
fill(dp, dp+T+1, INF);
dp[0] = 0;
for (int t = 1; t <= T; t++)
for (int c : coins)
if (t >= c) dp[t] = min(dp[t], dp[t - c] + 1);

Example
Coins = [1,2,5], Total = 5
- Ways: 4 (5; 2+2+1; 2+1+1+1; 1+1+1+1+1)
- Min Coins: 1 (5)
4. Summary
| Problem | Type | State | Transition | Complexity |
|---|---|---|---|---|
| 0/1 Knapsack | Max value | dp[i][w] | max(take, skip) | O(nW) |
| Subset Sum | Feasibility | dp[i][sum] | OR of include/exclude | O(n * sum) |
| Coin Change (ways) | Counting | dp[t] | dp[t] + dp[t - coin] | O(nT) |
| Coin Change (min) | Optimization | dp[t] | min(dp[t - coin] + 1) | O(nT) |
Tiny Code
Min Coin Change (1D):
int dp[T+1];
fill(dp, dp+T+1, INF);
dp[0] = 0;
for (int c : coins)
for (int t = c; t <= T; t++)
dp[t] = min(dp[t], dp[t - c] + 1);
printf("%d\n", dp[T]);Why It Matters
These three are archetypes:
- Knapsack: optimize under a constraint
- Subset Sum: decide feasibility
- Coin Change: count or minimize

Once you master them, you can spot their patterns in harder problems , from resource allocation to pathfinding.
“Every constraint hides a choice; every choice hides a state.”
Try It Yourself
- Implement 0/1 Knapsack (2D and 1D).
- Solve Subset Sum for target 30 with random list.
- Count coin combinations for amount 10.
- Compare “min coins” vs “ways to form.”
- Write down state-transition diagram for each.
These three form your DP foundation , the grammar for building more complex algorithms.
43. Sequence Problems (LIS, LCS, Edit Distance)
Sequence problems form the heart of dynamic programming. They appear in strings, arrays, genomes, text comparison, and version control. Their power comes from comparing prefixes , building large answers from aligned smaller ones.
This section explores three cornerstones:
- LIS (Longest Increasing Subsequence)
- LCS (Longest Common Subsequence)
- Edit Distance (Levenshtein Distance)

Each teaches a new way to think about subproblems, transitions, and structure.
1. Longest Increasing Subsequence (LIS)
Problem: Given an array, find the length of the longest subsequence that is strictly increasing.
A subsequence isn’t necessarily contiguous , you can skip elements.
Example: [10, 9, 2, 5, 3, 7, 101, 18] → LIS is [2, 3, 7, 18] → length 4
A. State
dp[i] = length of LIS ending at index i
B. Recurrence
\[ dp[i] = 1 + \max_{j < i \land a[j] < a[i]} dp[j] \]
If no smaller a[j], then dp[i] = 1.
C. Base
dp[i] = 1 for all i (each element alone is an LIS)
D. Implementation
int lis(int a[], int n) {
int dp[n], best = 0;
for (int i = 0; i < n; i++) {
dp[i] = 1;
for (int j = 0; j < i; j++)
if (a[j] < a[i])
dp[i] = max(dp[i], dp[j] + 1);
best = max(best, dp[i]);
}
return best;
}

Complexity: \(O(n^2)\)
E. Binary Search Optimization
Use a tail array: tail[len] = minimum possible ending value of an LIS of length len.

For each x: find the first position in tail with value ≥ x (via lower_bound) and replace it with x; if there is none, append x.
int lis_fast(vector<int>& a) {
vector<int> tail;
for (int x : a) {
auto it = lower_bound(tail.begin(), tail.end(), x);
if (it == tail.end()) tail.push_back(x);
else *it = x;
}
return tail.size();
}

Complexity: \(O(n \log n)\)
2. Longest Common Subsequence (LCS)
Problem: Given two strings, find the longest subsequence present in both.
Example: s1 = "ABCBDAB", s2 = "BDCABA" LCS = “BCBA” → length 4
A. State
dp[i][j] = LCS length between s1[0..i-1] and s2[0..j-1]
B. Recurrence
\[ dp[i][j] = \begin{cases} dp[i-1][j-1] + 1, & \text{if } s_1[i-1] = s_2[j-1], \\ \max(dp[i-1][j],\, dp[i][j-1]), & \text{otherwise.} \end{cases} \]
C. Base
dp[0][*] = dp[*][0] = 0 (empty string)
D. Implementation
int lcs(string a, string b) {
int n = a.size(), m = b.size();
int dp[n+1][m+1];
for (int i = 0; i <= n; i++)
for (int j = 0; j <= m; j++)
if (i == 0 || j == 0) dp[i][j] = 0;
else if (a[i-1] == b[j-1])
dp[i][j] = dp[i-1][j-1] + 1;
else
dp[i][j] = max(dp[i-1][j], dp[i][j-1]);
return dp[n][m];
}

Complexity: \(O(nm)\)
E. Reconstruct LCS
Trace back from dp[n][m]:
- If the characters are equal → take it and move diagonally
- Else move toward the larger neighbor
F. Example
a = “AGGTAB”, b = “GXTXAYB” LCS = “GTAB” → 4
3. Edit Distance (Levenshtein Distance)
Problem: Minimum operations (insert, delete, replace) to convert string a → b.
Example: kitten → sitting = 3 (replace k→s, insert i, insert g)
A. State
dp[i][j] = min edits to convert a[0..i-1] → b[0..j-1]
B. Recurrence
If a[i-1] == b[j-1]: \[
dp[i][j] = dp[i-1][j-1]
\]
Else: \[ dp[i][j] = 1 + \min(dp[i-1][j], dp[i][j-1], dp[i-1][j-1]) \] (Delete, Insert, Replace)
C. Base
- dp[0][j] = j (insert all)
- dp[i][0] = i (delete all)
D. Implementation
int edit_distance(string a, string b) {
int n = a.size(), m = b.size();
int dp[n+1][m+1];
for (int i = 0; i <= n; i++)
for (int j = 0; j <= m; j++) {
if (i == 0) dp[i][j] = j;
else if (j == 0) dp[i][j] = i;
else if (a[i-1] == b[j-1])
dp[i][j] = dp[i-1][j-1];
else
dp[i][j] = 1 + min({dp[i-1][j], dp[i][j-1], dp[i-1][j-1]});
}
return dp[n][m];
}

Complexity: \(O(nm)\)
E. Example
a = “horse”, b = “ros”
- replace h→r, delete r, delete e → 3
4. Summary
| Problem | Type | State | Transition | Complexity |
|---|---|---|---|---|
| LIS | Single seq | dp[i] | 1 + max(dp[j]) | O(n²) / O(n log n) |
| LCS | Two seqs | dp[i][j] | if match +1 else max | O(nm) |
| Edit Distance | Two seqs | dp[i][j] | if match 0 else 1 + min | O(nm) |
5. Common Insights
- LIS builds upward , from smaller sequences.
- LCS aligns two sequences , compare prefixes.
- Edit Distance quantifies difference , minimal edits.

They’re templates for bioinformatics, text diffing, version control, and more.
Tiny Code
LCS:
if (a[i-1] == b[j-1])
dp[i][j] = dp[i-1][j-1] + 1;
else
dp[i][j] = max(dp[i-1][j], dp[i][j-1]);

Why It Matters
Sequence DPs teach you how to compare progressions , how structure and similarity evolve over time.
They transform vague “compare these” tasks into crisp recurrence relations.
“To align is to understand.”
Try It Yourself
- Implement LIS (O(n²) and O(n log n))
- Find LCS of two given strings
- Compute edit distance between “intention” and “execution”
- Modify LCS to print one valid subsequence
- Try to unify LCS and Edit Distance in a single table
Master these, and you can handle any DP on sequences , the DNA of algorithmic thinking.
44. Matrix and Chain Problems
Dynamic programming shines when a problem involves choices over intervals , which order, which split, which parenthesis. This chapter explores a class of problems built on chains and matrices, where order matters and substructure is defined by intervals.
We’ll study:
- Matrix Chain Multiplication (MCM) - optimal parenthesization
- Polygon Triangulation - divide a shape into minimal-cost triangles
- Optimal BST / Merge Patterns - weighted merging decisions

These problems teach interval DP, where each state represents a segment \([i, j]\).
1. Matrix Chain Multiplication (MCM)
Problem: Given matrices \(A_1, A_2, ..., A_n\), find the parenthesization that minimizes total scalar multiplications.
Matrix \(A_i\) has dimensions \(p[i-1] \times p[i]\). We can multiply \(A_i \cdot A_{i+1}\) only if inner dimensions match.
Goal: Minimize operations: \[ \text{cost}(i, j) = \min_k \big(\text{cost}(i, k) + \text{cost}(k+1, j) + p[i-1] \cdot p[k] \cdot p[j]\big) \]
A. State
dp[i][j] = min multiplications to compute \(A_i...A_j\)
B. Base
dp[i][i] = 0 (single matrix needs no multiplication)
C. Recurrence
\[ dp[i][j] = \min_{i \le k < j} { dp[i][k] + dp[k+1][j] + p[i-1] \times p[k] \times p[j] } \]
D. Implementation
int matrix_chain(int p[], int n) {
int dp[n][n];
for (int i = 1; i < n; i++) dp[i][i] = 0;
for (int len = 2; len < n; len++) {
for (int i = 1; i + len - 1 < n; i++) {
int j = i + len - 1;
dp[i][j] = INT_MAX;
for (int k = i; k < j; k++)
dp[i][j] = min(dp[i][j],
dp[i][k] + dp[k+1][j] + p[i-1]*p[k]*p[j]);
}
}
return dp[1][n-1];
}

Complexity: \(O(n^3)\) time, \(O(n^2)\) space
E. Example
p = [10, 20, 30, 40, 30] Optimal order: ((A1A2)A3)A4 → cost 30000
2. Polygon Triangulation
Given a convex polygon with n vertices, connect non-intersecting diagonals to minimize total cost. Cost of a triangle = perimeter or product of side weights.
This is the same structure as MCM , divide polygon by diagonals.
A. State
dp[i][j] = min triangulation cost for polygon vertices from i to j.
B. Recurrence
\[ dp[i][j] = \min_{i < k < j} (dp[i][k] + dp[k][j] + cost(i, j, k)) \]
Base: dp[i][i+1] = 0 (fewer than 3 points)
C. Implementation
double polygon_triangulation(vector<Point> &p) {
int n = p.size();
double dp[n][n];
for (int i = 0; i < n; i++) for (int j = 0; j < n; j++) dp[i][j] = 0;
for (int len = 2; len < n; len++) {
for (int i = 0; i + len < n; i++) {
int j = i + len;
dp[i][j] = 1e18;
for (int k = i+1; k < j; k++)
dp[i][j] = min(dp[i][j],
dp[i][k] + dp[k][j] + dist(p[i],p[k])+dist(p[k],p[j])+dist(p[j],p[i]));
}
}
return dp[0][n-1];
}

Complexity: \(O(n^3)\)
3. Optimal Binary Search Tree (OBST)
Given sorted keys \(k_1 < k_2 < \dots < k_n\) with search frequencies ( f[i] ), construct a BST with minimal expected search cost.
The more frequently accessed nodes should be nearer the root.
A. State
dp[i][j] = min cost to build BST from keys i..j sum[i][j] = sum of frequencies from i to j (precomputed)
B. Recurrence
\[ dp[i][j] = \min_{k=i}^{j} (dp[i][k-1] + dp[k+1][j] + sum[i][j]) \]
Each root adds one to depth of its subtrees → extra cost = sum[i][j]
C. Implementation
int optimal_bst(int freq[], int n) {
int dp[n][n], sum[n][n];
for (int i = 0; i < n; i++) {
dp[i][i] = freq[i];
sum[i][i] = freq[i];
for (int j = i+1; j < n; j++)
sum[i][j] = sum[i][j-1] + freq[j];
}
for (int len = 2; len <= n; len++) {
for (int i = 0; i+len-1 < n; i++) {
int j = i + len - 1;
dp[i][j] = INT_MAX;
for (int r = i; r <= j; r++) {
int left = (r > i) ? dp[i][r-1] : 0;
int right = (r < j) ? dp[r+1][j] : 0;
dp[i][j] = min(dp[i][j], left + right + sum[i][j]);
}
}
}
return dp[0][n-1];
}

Complexity: \(O(n^3)\)
4. Merge Pattern Problems
Many problems , merging files, joining ropes, Huffman coding , involve repeatedly combining elements with minimal total cost.
All follow this template: \[ dp[i][j] = \min_{k} (dp[i][k] + dp[k+1][j] + \text{merge cost}) \]
Same structure as MCM.
5. Key Pattern: Interval DP
State: dp[i][j] = best answer for subarray [i..j] Transition: Try all splits k between i and j
Template:
for (len = 2; len <= n; len++)
for (i = 0; i + len - 1 < n; i++) {
j = i + len - 1;
dp[i][j] = INF;
for (k = i; k < j; k++)
dp[i][j] = min(dp[i][j], dp[i][k] + dp[k+1][j] + cost(i,j,k));
}

6. Summary
| Problem | State | Recurrence | Complexity |
|---|---|---|---|
| MCM | dp[i][j] | min(dp[i][k]+dp[k+1][j]+p[i-1]p[k]p[j]) | O(n³) |
| Polygon Triangulation | dp[i][j] | min(dp[i][k]+dp[k][j]+cost) | O(n³) |
| OBST | dp[i][j] | min(dp[i][k-1]+dp[k+1][j]+sum[i][j]) | O(n³) |
| Merge Problems | dp[i][j] | min(dp[i][k]+dp[k+1][j]+merge cost) | O(n³) |
Tiny Code
Matrix Chain (Compact):
for (len = 2; len < n; len++)
for (i = 1; i + len - 1 < n; i++) {
j = i + len - 1; dp[i][j] = INF;
for (k = i; k < j; k++)
dp[i][j] = min(dp[i][j], dp[i][k] + dp[k+1][j] + p[i-1]*p[k]*p[j]);
}

Why It Matters
These problems are DP in 2D , reasoning over intervals and splits. They train your ability to “cut the problem” at every possible point.
“Between every start and end lies a choice of where to divide.”
Try It Yourself
- Implement MCM and print parenthesization.
- Solve polygon triangulation with edge weights.
- Build OBST for frequencies [34, 8, 50].
- Visualize DP table diagonally.
- Generalize to merging k segments at a time.
Master these, and you’ll see interval DP patterns hiding in parsing, merging, and even AI planning.
45. Bitmask DP and Traveling Salesman
Some dynamic programming problems require you to track which items have been used, or which subset of elements is active at a given point. This is where Bitmask DP shines. It encodes subsets as binary masks, allowing you to represent state space efficiently.
This technique is a must-know for:
- Traveling Salesman Problem (TSP)
- Subset covering / visiting problems
- Permutations and combinations of sets
- Game states and toggles
1. The Idea of Bitmask DP
A bitmask is an integer whose binary representation encodes a subset.
For ( n ) elements:
- There are \(2^n\) subsets.
- A subset is represented by a mask from 0 to (1 << n) - 1.

Example for n = 4:
| Subset | Mask (binary) | Mask (decimal) |
|---|---|---|
| ∅ | 0000 | 0 |
| {0} | 0001 | 1 |
| {1} | 0010 | 2 |
| {0,1,3} | 1011 | 11 |
- Check membership: mask & (1 << i) → whether element i is in the subset
- Add an element: mask | (1 << i)
- Remove an element: mask & ~(1 << i)
2. Example: Traveling Salesman Problem (TSP)
Problem: Given n cities and cost matrix cost[i][j], find the minimum cost Hamiltonian cycle visiting all cities exactly once and returning to start.
A. State
dp[mask][i] = minimum cost to reach city i having visited subset mask
- mask → set of visited cities
- i → current city
B. Base Case
dp[1<<0][0] = 0 (start at city 0, only 0 visited)
C. Transition
For each subset mask and city i in mask, try moving from i to j not in mask:
\[ dp[mask \cup (1 << j)][j] = \min \big(dp[mask \cup (1 << j)][j], dp[mask][i] + cost[i][j]\big) \]
D. Implementation
int tsp(int n, int cost[20][20]) {
int N = 1 << n;
const int INF = 1e9;
int dp[N][n];
for (int m = 0; m < N; m++)
for (int i = 0; i < n; i++)
dp[m][i] = INF;
dp[1][0] = 0; // start at city 0
for (int mask = 1; mask < N; mask++) {
for (int i = 0; i < n; i++) {
if (!(mask & (1 << i))) continue;
for (int j = 0; j < n; j++) {
if (mask & (1 << j)) continue;
int next = mask | (1 << j);
dp[next][j] = min(dp[next][j], dp[mask][i] + cost[i][j]);
}
}
}
int ans = INF;
for (int i = 1; i < n; i++)
ans = min(ans, dp[N-1][i] + cost[i][0]);
return ans;
}

Complexity:

- States: \(O(n \cdot 2^n)\)
- Transitions: \(O(n)\)
- Total: \(O(n^2 \cdot 2^n)\)
E. Example
n = 4
cost = {
{0, 10, 15, 20},
{10, 0, 35, 25},
{15, 35, 0, 30},
{20, 25, 30, 0}
}
Optimal path: 0 → 1 → 3 → 2 → 0 Cost = 80
3. Other Common Bitmask DP Patterns
Subset Sum / Partition

dp[mask] = true if the subset represented by mask satisfies the property

Counting Set Bits

__builtin_popcount(mask) gives the number of elements in the subset.

Iterating Over Submasks

for (int sub = mask; sub; sub = (sub - 1) & mask)
    // handle subset sub

Assigning Tasks (Assignment Problem)

- Each mask represents the set of tasks already assigned (workers are taken in order).
- State: dp[mask] = min cost for the assigned tasks.
for (int mask = 0; mask < (1 << n); mask++)
    for (int task = 0; task < n; task++)
        if (!(mask & (1 << task)))
            dp[mask | (1 << task)] = min(dp[mask | (1 << task)],
                dp[mask] + cost[__builtin_popcount(mask)][task]);
- If only the previous masks are needed, use rolling arrays: dp[next][j] = ...; swap(dp, next_dp)
- Compress dimensions: \(O(2^n)\) memory for small n
5. Summary
| Problem | State | Transition | Complexity |
|---|---|---|---|
| TSP | dp[mask][i] | min(dp[mask][i] + cost[i][j]) | O(n²·2ⁿ) |
| Assignment | dp[mask] | add one new element | O(n²·2ⁿ) |
| Subset Sum | dp[mask] | union of valid subsets | O(2ⁿ·n) |
Tiny Code
Core Transition:
for (int mask = 0; mask < (1 << n); mask++)
    for (int i = 0; i < n; i++)
        if (mask & (1 << i))
            for (int j = 0; j < n; j++)
                if (!(mask & (1 << j)))
                    dp[mask | (1 << j)][j] = min(dp[mask | (1 << j)][j], dp[mask][i] + cost[i][j]);

Why It Matters
Bitmask DP is how you enumerate subsets efficiently. It bridges combinatorics and optimization, solving exponential problems with manageable constants.
“Every subset is a story, and bits are its alphabet.”
Try It Yourself
- Solve TSP with 4 cities (hand-trace the table).
- Implement Assignment Problem using bitmask DP.
- Count subsets with even sum.
- Use bitmask DP to find maximum compatible set of tasks.
- Explore how to optimize memory with bit tricks.
Bitmask DP unlocks the world of subset-based reasoning , the foundation of combinatorial optimization.
46. Digit DP and SOS DP
In some problems, you don’t iterate over indices or subsets , you iterate over digits or masks to count or optimize over structured states. Two major flavors stand out:
- Digit DP - counting numbers with certain properties (e.g. digit sum, constraints)
- SOS DP (Sum Over Subsets) - efficiently computing functions over all subsets

These are essential techniques when brute force would require enumerating every number or subset, which quickly becomes impossible.
1. Digit DP (Counting with Constraints)
Digit DP is used to count or sum over all numbers ≤ N that satisfy a condition, such as:
- The sum of digits equals a target.
- The number doesn’t contain a forbidden digit.
- The number has certain parity or divisibility.

Instead of iterating over all numbers (up to 10¹⁸!), we iterate digit-by-digit.
A. State Design
Typical DP state:
dp[pos][sum][tight][leading_zero]
- pos: current digit index (from most significant to least)
- sum: property tracker (e.g. sum of digits, remainder)
- tight: whether we’re still restricted by N’s prefix
- leading_zero: whether we’re still in the leading-zero prefix (no nonzero digit placed yet)
B. Transition
At each digit position, we choose a digit d:
limit = tight ? (digit at pos in N) : 9
for (d = 0; d <= limit; d++) {
new_tight = tight && (d == limit)
new_sum = sum + d
// or new_mod = (mod * 10 + d) % M
}

The transition accumulates results across the valid choices.
C. Base Case
When pos == len(N) (end of digits):
- Return 1 if the condition holds (e.g. sum == target), else 0
D. Example: Count numbers ≤ N with digit sum = S
long long dp[20][200][2];
long long solve(string s, int pos, int sum, bool tight) {
if (pos == s.size()) return sum == 0;
if (sum < 0) return 0;
if (dp[pos][sum][tight] != -1) return dp[pos][sum][tight];
int limit = tight ? (s[pos] - '0') : 9;
long long res = 0;
for (int d = 0; d <= limit; d++)
res += solve(s, pos+1, sum-d, tight && (d==limit));
return dp[pos][sum][tight] = res;
}

Usage:
string N = "12345";
int S = 9;
memset(dp, -1, sizeof dp);
cout << solve(N, 0, S, 1);

Complexity: O(number of digits × sum × 2) → typically O(20 × 200 × 2)
E. Example Variants
- Count numbers divisible by 3 → track the remainder: new_rem = (rem*10 + d) % 3
- Count numbers without consecutive equal digits → add last_digit to the state
- Count beautiful numbers (like palindromes, no repeated digits) → track a bitmask of used digits
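For instance, the divisibility variant only swaps the tracked quantity. A minimal sketch (the modulus K and the memo bounds are assumptions):

```cpp
long long memo2[20][100][2];   // pos, remainder (assumes K < 100), tight
int K;                         // target modulus, e.g. 3

long long count_div(const string& s, int pos, int rem, bool tight) {
    if (pos == (int)s.size()) return rem == 0;
    long long &res = memo2[pos][rem][tight];
    if (res != -1) return res;
    res = 0;
    int limit = tight ? s[pos] - '0' : 9;
    for (int d = 0; d <= limit; d++)
        res += count_div(s, pos + 1, (rem * 10 + d) % K, tight && d == limit);
    return res;
}
// usage: K = 3; memset(memo2, -1, sizeof memo2); count_div("12345", 0, 0, true);
```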
F. Summary
| Problem | State | Transition | Complexity |
|---|---|---|---|
| Sum of digits = S | dp[pos][sum][tight] | sum-d | O(len·S) |
| Divisible by k | dp[pos][rem][tight] | (rem*10+d)%k | O(len·k) |
| No repeated digits | dp[pos][mask][tight] | mask | O(len·2¹⁰) |
Tiny Code
for (int d = 0; d <= limit; d++)
res += solve(pos+1, sum-d, tight && (d==limit));

2. SOS DP (Sum Over Subsets)
When dealing with functions on subsets, we sometimes need to compute:
\[ f(S) = \sum_{T \subseteq S} g(T) \]
Naively O(3ⁿ). SOS DP reduces it to O(n·2ⁿ).
A. Setup
Let f[mask] = g[mask] initially. For each bit i:
for (mask = 0; mask < (1<<n); mask++)
if (mask & (1<<i))
f[mask] += f[mask^(1<<i)];

After this, f[mask] = sum of g[sub] for all sub ⊆ mask.
B. Example
Given array a[mask], compute sum[mask] = sum_{sub ⊆ mask} a[sub]
int n = 3;
int N = 1 << n;
int f[N], a[N];
// initialize a[]
for (int mask = 0; mask < N; mask++) f[mask] = a[mask];
for (int i = 0; i < n; i++)
for (int mask = 0; mask < N; mask++)
if (mask & (1 << i))
f[mask] += f[mask ^ (1 << i)];

C. Why It Works
Each iteration adds contributions from subsets differing by one bit. By processing all bits, every subset’s contribution propagates upward.
D. Variants
- Sum over supersets: reverse the direction.
- Max instead of sum: replace += with max=.
- XOR convolution: combine values under the XOR subset relation.
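For example, the superset variant just flips the bit test; a minimal sketch using the same f[] and n as above:

```cpp
// after this loop, f[mask] = sum of the original values over all supersets of mask
for (int i = 0; i < n; i++)
    for (int mask = 0; mask < (1 << n); mask++)
        if (!(mask & (1 << i)))
            f[mask] += f[mask | (1 << i)];
```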
E. Applications
- Inclusion-exclusion acceleration
- Precomputing subset statistics
- DP over masks with subset transitions
F. Complexity
| Problem | Naive | SOS DP |
|---|---|---|
| Subset sum | O(3ⁿ) | O(n·2ⁿ) |
| Superset sum | O(3ⁿ) | O(n·2ⁿ) |
Why It Matters
Digit DP teaches counting under constraints , thinking digit by digit. SOS DP teaches subset propagation , spreading information efficiently.
Together, they show how to tame exponential state spaces with structure.
“When the search space explodes, symmetry and structure are your compass.”
Try It Yourself
- Count numbers ≤ 10⁹ whose digit sum = 10.
- Count numbers ≤ 10⁶ without repeated digits.
- Compute f[mask] = sum_{sub⊆mask} a[sub] for n=4.
- Use SOS DP to find how many subsets of bits have even sum.
- Modify Digit DP to handle leading zeros explicitly.
Master these, and you can handle structured exponential problems with elegance and speed.
47. DP Optimizations (Divide & Conquer, Convex Hull Trick, Knuth)
Dynamic Programming often starts with a simple recurrence, but naïve implementations can be too slow (e.g., \(O(n^2)\) or worse). When the recurrence has special structure , such as monotonicity or convexity , we can exploit it to reduce time complexity drastically.
This chapter introduces three powerful optimization families:
- Divide and Conquer DP
- Convex Hull Trick (CHT)
- Knuth Optimization
Each one is based on discovering order or geometry hidden inside transitions.
1. Divide and Conquer Optimization
If you have a recurrence like: \[ dp[i] = \min_{k < i} { dp[k] + C(k, i) } \]
and the optimal k for dp[i] ≤ optimal k for dp[i+1], you can use divide & conquer to compute dp in \(O(n \log n)\) or \(O(n \log^2 n)\).
This property is called monotonicity of argmin.
A. Conditions
Let ( C(k, i) ) be the cost to transition from ( k ) to ( i ). Divide and conquer optimization applies if:
\[ opt(i) \le opt(i+1) \]
and ( C ) satisfies quadrangle inequality (or similar convex structure).
B. Template
void compute(int l, int r, int optL, int optR) {
if (l > r) return;
int mid = (l + r) / 2;
pair<long long,int> best = {INF, -1};
for (int k = optL; k <= min(mid, optR); k++) {
long long val = dp_prev[k] + cost(k, mid);
if (val < best.first) best = {val, k};
}
dp[mid] = best.first;
int opt = best.second;
compute(l, mid-1, optL, opt);
compute(mid+1, r, opt, optR);
}

You call it as:

compute(1, n, 0, n-1);

C. Example: Divide Array into K Segments
Given array a[1..n], divide into k parts to minimize \[
dp[i][k] = \min_{j < i} dp[j][k-1] + cost(j+1, i)
\]

If the cost satisfies the quadrangle inequality, you can optimize each layer in \(O(n \log n)\).
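A minimal sketch of the layered driver around `compute()` above, assuming `dp` and `dp_prev` are plain arrays of long long, `cost(l, r)` is the segment cost, and `MAXN` is an assumed bound:

```cpp
long long dp[MAXN], dp_prev[MAXN];

void solve_layers(int n, int K) {
    fill(dp_prev, dp_prev + n + 1, (long long)INF);
    dp_prev[0] = 0;                      // zero elements, zero segments
    for (int layer = 1; layer <= K; layer++) {
        fill(dp, dp + n + 1, (long long)INF);
        compute(1, n, 0, n - 1);         // fills dp[1..n] from dp_prev[] and cost()
        swap(dp, dp_prev);               // element-wise swap: new layer becomes "previous"
    }
    // answer for K segments is dp_prev[n]
}
```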
D. Complexity
Naive: \(O(n^2)\) → Optimized: \(O(n \log n)\)
2. Convex Hull Trick (CHT)
Applies when DP recurrence is linear in i and k: \[ dp[i] = \min_{k < i} (m_k \cdot x_i + b_k) \]
where:
- \(m_k\) is the slope (depends on k)
- \(b_k = dp[k] + c(k)\)
- \(x_i\) is known (monotonic)

You can maintain the lines \(y = m_k x + b_k\) in a convex hull and query the minimum efficiently.
A. Conditions
- Slopes \(m_k\) are monotonic (either increasing or decreasing)
- Query points \(x_i\) are sorted

If both are monotonic, we can use a pointer walk in O(1) amortized per query. Otherwise, use a Li Chao Tree (O(log n)).
B. Implementation (Monotonic Slopes)
struct Line { long long m, b; };
deque<Line> hull;
bool bad(Line l1, Line l2, Line l3) {
return (l3.b - l1.b)*(l1.m - l2.m) <= (l2.b - l1.b)*(l1.m - l3.m);
}
void add(long long m, long long b) {
Line l = {m, b};
while (hull.size() >= 2 && bad(hull[hull.size()-2], hull.back(), l))
hull.pop_back();
hull.push_back(l);
}
long long query(long long x) {
while (hull.size() >= 2 &&
hull[0].m*x + hull[0].b >= hull[1].m*x + hull[1].b)
hull.pop_front();
return hull.front().m*x + hull.front().b;
}

C. Example: DP for Line-Based Recurrence
\[ dp[i] = a_i^2 + \min_{j < i} (dp[j] + b_j \cdot a_i) \]

Here the slope is \(m_j = b_j\), the query point is \(x_i = a_i\), and the intercept is \(dp[j]\).
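A minimal sketch wiring this recurrence into the hull above, under the assumptions that a[] is increasing and the slopes b[] arrive in an order accepted by add(); treat it as a template rather than a drop-in:

```cpp
// dp[i] = a[i]^2 + min_{j < i} (dp[j] + b[j] * a[i])
vector<long long> dp(n);
dp[0] = 0;                         // assumed base case
add(b[0], dp[0]);                  // insert line y = b[0] * x + dp[0]
for (int i = 1; i < n; i++) {
    dp[i] = a[i] * a[i] + query(a[i]);
    add(b[i], dp[i]);              // make dp[i]'s line available to later states
}
```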
D. Complexity
- Naive: \(O(n^2)\)
- CHT: \(O(n)\) or \(O(n \log n)\)
3. Knuth Optimization
Used in interval DP problems (like Matrix Chain, Merging Stones).
If:
- \(dp[i][j] = \min_{k=i}^{j-1} (dp[i][k] + dp[k+1][j] + w(i,j))\)
- The cost \(w(i,j)\) satisfies the quadrangle inequality: \[ w(a,c) + w(b,d) \le w(a,d) + w(b,c) \]
- And the monotonicity condition: \[ opt[i][j-1] \le opt[i][j] \le opt[i+1][j] \]
Then you can shrink the search range per cell so much that the work is amortized \(O(1)\) per cell, making the total complexity \(O(n^2)\) instead of \(O(n^3)\).
A. Implementation
for (int len = 2; len <= n; len++) {
for (int i = 1; i + len - 1 <= n; i++) {
int j = i + len - 1;
dp[i][j] = INF;
for (int k = opt[i][j-1]; k <= opt[i+1][j]; k++) {
long long val = dp[i][k] + dp[k+1][j] + cost(i,j);
if (val < dp[i][j]) {
dp[i][j] = val;
opt[i][j] = k;
}
}
}
}

B. Example

Optimal Binary Search Tree or Merging Stones (with additive cost). Typical improvement: \(O(n^3)\) → \(O(n^2)\)
4. Summary
| Technique | Applies To | Key Property | Complexity |
|---|---|---|---|
| Divide & Conquer DP | 1D transitions | Monotonic argmin | O(n log n) |
| Convex Hull Trick | Linear transitions | Monotonic slopes | O(n) / O(n log n) |
| Knuth Optimization | Interval DP | Quadrangle + Monotonicity | O(n²) |
Tiny Code
Divide & Conquer Template
void compute(int l, int r, int optL, int optR);

CHT Query

while (hull.size() >= 2 && hull[1].m*x + hull[1].b <= hull[0].m*x + hull[0].b) hull.pop_front();

Why It Matters
These optimizations show that DP isn’t just brute force with memory , it’s mathematical reasoning on structure.
Once you spot monotonicity or linearity, you can shrink a quadratic solution into near-linear time.
“Optimization is the art of seeing simplicity hiding in structure.”
Try It Yourself
- Optimize Matrix Chain DP using Knuth.
- Apply Divide & Conquer on dp[i] = min_{k<i}(dp[k] + (i-k)^2).
- Compare runtime vs naive DP on random data.
- Derive conditions for opt monotonicity in your custom recurrence.
Master these techniques, and you’ll turn your DPs from slow prototypes into lightning-fast solutions.
48. Tree DP and Rerooting
Dynamic Programming on trees is one of the most elegant and powerful patterns in algorithm design. Unlike linear arrays or grids, trees form hierarchical structures, where each node depends on its children or parent. Tree DP teaches you how to aggregate results up and down the tree, handling problems where subtrees interact.
In this section, we’ll cover:
- Basic Tree DP (rooted trees)
- DP over children (bottom-up aggregation)
- Rerooting technique (top-down propagation)
- Common applications and examples
1. Basic Tree DP: The Idea
We define dp[u] to represent some property of the subtree rooted at u. Then we combine children’s results to compute dp[u].
This bottom-up approach is like postorder traversal.
Example structure:
function dfs(u, parent):
dp[u] = base_value
for v in adj[u]:
if v == parent: continue
dfs(v, u)
dp[u] = combine(dp[u], dp[v])
Example 1: Size of Subtree
Let dp[u] = number of nodes in subtree rooted at u
void dfs(int u, int p) {
dp[u] = 1;
for (int v : adj[u]) {
if (v == p) continue;
dfs(v, u);
dp[u] += dp[v];
}
}

Key idea: combine children’s sizes to get the parent size. Complexity: \(O(n)\)
Example 2: Height of Tree
Let dp[u] = height of subtree rooted at u
void dfs(int u, int p) {
dp[u] = 0;
for (int v : adj[u]) {
if (v == p) continue;
dfs(v, u);
dp[u] = max(dp[u], 1 + dp[v]);
}
}

This gives the height of the subtree rooted at each node.
2. DP Over Children (Bottom-Up Aggregation)
Tree DP is all about merging children.
For example, if you want the number of ways to color or number of independent sets, you compute children’s dp and merge results at parent.
Example 3: Counting Independent Sets
In a tree, an independent set is a set of nodes with no two adjacent.
State:
- dp[u][0] = ways if u is not selected
- dp[u][1] = ways if u is selected

Recurrence:

\[ dp[u][0] = \prod_{v \in children(u)} (dp[v][0] + dp[v][1]) \]
\[ dp[u][1] = \prod_{v \in children(u)} dp[v][0] \]
Implementation:
void dfs(int u, int p) {
dp[u][0] = dp[u][1] = 1;
for (int v : adj[u]) {
if (v == p) continue;
dfs(v, u);
dp[u][0] *= (dp[v][0] + dp[v][1]);
dp[u][1] *= dp[v][0];
}
}

Final answer = dp[root][0] + dp[root][1]
Example 4: Maximum Path Sum in Tree
Let dp[u] = max path sum starting at u and going down To find best path anywhere, store a global max over child pairs.
int ans = 0;
int dfs(int u, int p) {
int best1 = 0, best2 = 0;
for (int v : adj[u]) {
if (v == p) continue;
int val = dfs(v, u) + weight(u, v);
if (val > best1) swap(best1, val);
if (val > best2) best2 = val;
}
ans = max(ans, best1 + best2);
return best1;
}

This gives the tree diameter or max path sum.
3. Rerooting Technique
Rerooting DP allows you to compute answers for every node as root, without recomputing from scratch (which would cost \(O(n^2)\)). It’s also known as DP on trees with re-rooting.
Idea
- First, compute dp_down[u] = answer for the subtree when rooted at u.
- Then, propagate info from parent to child (dp_up[u]), so each node gets info from outside its subtree.
- Combine dp_down and dp_up to get dp_all[u].
Example 5: Sum of Distances from Each Node
Let’s find ans[u] = sum of distances from u to all nodes.
- Root the tree at 0.
- Compute subtree sizes and total distance from root.
- Reroot to adjust distances using parent’s info.
Step 1: Bottom-up:
void dfs1(int u, int p) {
sz[u] = 1;
for (int v : adj[u]) {
if (v == p) continue;
dfs1(v, u);
sz[u] += sz[v];
dp[u] += dp[v] + sz[v];
}
}

Step 2: Top-down:
void dfs2(int u, int p) {
for (int v : adj[u]) {
if (v == p) continue;
dp[v] = dp[u] - sz[v] + (n - sz[v]);
dfs2(v, u);
}
}

After dfs2, dp[u] = sum of distances from node u.

Complexity: \(O(n)\)
4. General Rerooting Template
// 1. Postorder: compute dp_down[u] from children
void dfs_down(u, p):
dp_down[u] = base
for v in adj[u]:
if v != p:
dfs_down(v, u)
dp_down[u] = merge(dp_down[u], dp_down[v])
// 2. Preorder: use parent's dp_up to compute dp_all[u]
void dfs_up(u, p, dp_up_parent):
ans[u] = merge(dp_down[u], dp_up_parent)
prefix, suffix = prefix products of children
for each child v:
dp_up_v = merge(prefix[v-1], suffix[v+1], dp_up_parent)
dfs_up(v, u, dp_up_v)This template generalizes rerooting to many problems:
- Maximum distance from each node
- Number of ways to select subtrees
- Sum of subtree sizes seen from each root
5. Summary
| Pattern | Description | Complexity |
|---|---|---|
| Basic Tree DP | Combine child subresults | O(n) |
| DP Over Children | Each node depends on children | O(n) |
| Rerooting DP | Compute result for every root | O(n) |
| Multiple States | Track choices (e.g. include/exclude) | O(n·state) |
Tiny Code
Subtree Size
void dfs(int u, int p) {
dp[u] = 1;
for (int v: adj[u]) if (v != p) {
dfs(v,u);
dp[u] += dp[v];
}
}Reroot Sum Distances
dp[v] = dp[u] - sz[v] + (n - sz[v]);Why It Matters
Tree DP is how we think recursively over structure , each node’s truth emerges from its children. Rerooting expands this idea globally, giving every node its own perspective.
“In the forest of states, each root sees a different world , yet all follow the same law.”
Try It Yourself
- Count number of nodes in each subtree.
- Compute sum of depths from each node.
- Find tree diameter using DP.
- Count number of independent sets modulo 1e9+7.
- Implement rerooting to find max distance from each node.
Tree DP turns recursive patterns into universal strategies for hierarchical data.
49. DP Reconstruction and Traceback
So far, we’ve focused on computing optimal values (min cost, max score, count of ways). But in most real problems, we don’t just want the number , we want to know how we got it.
That’s where reconstruction comes in: once you’ve filled your DP table, you can trace back the decisions that led to the optimal answer.
This chapter shows you how to:
- Record transitions during DP computation
- Reconstruct paths, subsets, or sequences
- Handle multiple reconstructions (paths, sets, parent links)
- Understand traceback in 1D, 2D, and graph-based DPs
1. The Core Idea
Each DP state comes from a choice. If you store which choice was best, you can walk backward from the final state to rebuild the solution.
Think of it as:
dp[i] = best over options
choice[i] = argmin or argmax option
Then:
reconstruction_path = []
i = n
while i > 0:
reconstruction_path.push(choice[i])
i = choice[i].prev
You’re not just solving , you’re remembering the path.
2. Reconstruction in 1D DP
Example: Coin Change (Minimum Coins)
Problem: Find minimum number of coins to make value n.
Recurrence: \[ dp[x] = 1 + \min_{c \in coins, c \le x} dp[x-c] \]
To reconstruct which coins were used:
int dp[MAXN], prev_coin[MAXN];
dp[0] = 0;
for (int x = 1; x <= n; x++) {
dp[x] = INF;
for (int c : coins) {
if (x >= c && dp[x-c] + 1 < dp[x]) {
dp[x] = dp[x-c] + 1;
prev_coin[x] = c;
}
}
}Reconstruction:
vector<int> used;
int cur = n;
while (cur > 0) {
used.push_back(prev_coin[cur]);
cur -= prev_coin[cur];
}Output: coins used in one optimal solution.
Example: LIS Reconstruction
You know how to find LIS length. Now reconstruct the sequence.
int dp[n], prev[n];
int best_end = 0;
for (int i = 0; i < n; i++) {
dp[i] = 1; prev[i] = -1;
for (int j = 0; j < i; j++)
if (a[j] < a[i] && dp[j] + 1 > dp[i]) {
dp[i] = dp[j] + 1;
prev[i] = j;
}
if (dp[i] > dp[best_end]) best_end = i;
}Rebuild LIS:
vector<int> lis;
for (int i = best_end; i != -1; i = prev[i])
lis.push_back(a[i]);
reverse(lis.begin(), lis.end());3. Reconstruction in 2D DP
Example: LCS (Longest Common Subsequence)
We have dp[i][j] filled using:
\[ dp[i][j] = \begin{cases} dp[i-1][j-1] + 1, & \text{if } a[i-1] = b[j-1], \\ \max(dp[i-1][j], dp[i][j-1]), & \text{otherwise.} \end{cases} \]
To reconstruct LCS:
int i = n, j = m;
string lcs = "";
while (i > 0 && j > 0) {
if (a[i-1] == b[j-1]) {
lcs.push_back(a[i-1]);
i--; j--;
}
else if (dp[i-1][j] > dp[i][j-1]) i--;
else j--;
}
reverse(lcs.begin(), lcs.end());Output: one valid LCS string.
Example: Edit Distance
Operations: insert, delete, replace.
You can store the operation:
if (a[i-1] == b[j-1]) op[i][j] = "match";
else if (dp[i][j] == dp[i-1][j-1] + 1) op[i][j] = "replace";
else if (dp[i][j] == dp[i-1][j] + 1) op[i][j] = "delete";
else op[i][j] = "insert";Then backtrack to list operations:
while (i > 0 || j > 0) {
if (op[i][j] == "match") i--, j--;
else if (op[i][j] == "replace") { print("Replace"); i--; j--; }
else if (op[i][j] == "delete") { print("Delete"); i--; }
else { print("Insert"); j--; }
}4. Reconstruction in Path Problems
When DP tracks shortest paths, you can keep parent pointers.
Example: Bellman-Ford Path Reconstruction
int dist[n], parent[n];
dist[src] = 0;
for (int k = 0; k < n-1; k++)
for (auto [u,v,w] : edges)
if (dist[u] + w < dist[v]) {
dist[v] = dist[u] + w;
parent[v] = u;
}
vector<int> path;
for (int v = dest; v != src; v = parent[v])
path.push_back(v);
path.push_back(src);
reverse(path.begin(), path.end());You now have the actual shortest path.
5. Handling Multiple Solutions
Sometimes multiple optimal paths exist. You can:
- Store all predecessors instead of one
- Backtrack recursively to enumerate all solutions
- Tie-break deterministically (e.g., lexicographically smallest)

Example:
if (new_val == dp[i]) parents[i].push_back(j);Then recursively generate all possible paths.
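For illustration, a sketch of enumerating every optimal chain once all tied predecessors are stored (types and names here are illustrative; it assumes the DP graph is acyclic and that base states have empty parent lists):

```cpp
#include <vector>
using namespace std;

// Enumerate every optimal chain ending at state i, assuming parents[i]
// holds all predecessors that achieve dp[i].
void all_paths(int i, const vector<vector<int>>& parents,
               vector<int>& cur, vector<vector<int>>& out) {
    cur.push_back(i);
    if (parents[i].empty()) {
        out.emplace_back(cur.rbegin(), cur.rend());  // reverse: base -> final state
    } else {
        for (int p : parents[i])
            all_paths(p, parents, cur, out);         // branch on every tied predecessor
    }
    cur.pop_back();
}
```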
6. Visualization
DP reconstruction often looks like following arrows in a grid or graph:
- LCS: diagonal (↖), up (↑), left (←)
- Shortest path: parent edges
- LIS: predecessor chain

You're walking through decisions, not just numbers.
7. Summary
| Type | State | Reconstruction |
|---|---|---|
| 1D DP | prev[i] | Trace chain |
| 2D DP | op[i][j] | Follow choices |
| Graph DP | parent[v] | Follow edges |
| Counting DP | optional | Recover counts / paths |
Tiny Code
General pattern:
for (state)
for (choice)
if (better) {
dp[state] = value;
parent[state] = choice;
}Then:
while (state != base) {
path.push_back(parent[state]);
state = parent[state];
}Why It Matters
Solving DP gets you the score , reconstructing shows you the story. It’s the difference between knowing the answer and understanding the reasoning.
“Numbers tell you the outcome; pointers tell you the path.”
Try It Yourself
- Reconstruct one LIS path.
- Print all LCSs for small strings.
- Show edit operations to transform “cat” → “cut”.
- Track subset used in Knapsack to reach exact weight.
- Recover optimal merge order in Matrix Chain DP.
Reconstruction turns DP from a static table into a narrative of decisions , a map back through the maze of optimal choices.
50. Meta-DP and Optimization Templates
We’ve now explored many flavors of dynamic programming , on sequences, grids, trees, graphs, subsets, and digits. This final chapter in the DP arc zooms out to the meta-level: how to see DP patterns, generalize them, and turn them into reusable templates.
If classical DP is about solving one problem, meta-DP is about recognizing families of problems that share structure. You’ll learn how to build your own DP frameworks, use common templates, and reason from first principles.
1. What Is Meta-DP?
A Meta-DP is a high-level abstraction of a dynamic programming pattern. It encodes:
- State definition pattern
- Transition pattern
- Optimization structure
- Dimensional dependencies

By learning these patterns, you can design DPs faster, reuse logic across problems, and spot optimizations early.
Think of Meta-DP as:
“Instead of memorizing 100 DPs, master 10 DP blueprints.”
2. The Four Building Blocks
Every DP has the same core ingredients:
1. State: what subproblem you're solving. Often dp[i], dp[i][j], or dp[mask]; it represents the smallest unit of progress.
2. Transition: how to build larger subproblems from smaller ones. E.g. dp[i] = min(dp[j] + cost(j, i)).
3. Base Case: known trivial answers. E.g. dp[0] = 0.
4. Order: how to fill the states. E.g. increasing i, decreasing i, or topological order.

Once you can describe a problem in these four, it is a DP.
3. Meta-Templates for Common Structures
Below are generalized templates to use and adapt.
A. Line DP (1D Sequential)
Shape: linear progression

Examples:
- Fibonacci
- Knapsack (1D capacity)
- LIS (sequential dependency)
for (int i = 1; i <= n; i++) {
dp[i] = base;
for (int j : transitions(i))
dp[i] = min(dp[i], dp[j] + cost(j, i));
}Visualization: → → → Each state depends on previous positions.
B. Grid DP (2D Spatial)
Shape: grid or matrix

Examples:
- Paths in a grid
- Edit Distance
- Counting paths with obstacles
for (i = 0; i < n; i++)
for (j = 0; j < m; j++)
dp[i][j] = combine(dp[i-1][j], dp[i][j-1]);Visualization: ⬇️ ⬇️ ➡️ Moves from top-left to bottom-right.
C. Interval DP
Shape: segments or subarrays

Examples:
- Matrix Chain Multiplication
- Optimal BST
- Merging Stones
for (len = 2; len <= n; len++)
for (i = 0; i + len - 1 < n; i++) {
j = i + len - 1;
dp[i][j] = INF;
for (k = i; k < j; k++)
dp[i][j] = min(dp[i][j], dp[i][k] + dp[k+1][j] + cost(i,j));
}Key Insight: overlapping intervals, split points.
D. Subset DP
Shape: subsets of a set

Examples:
- Traveling Salesman (TSP)
- Assignment problem
- SOS DP
for (mask = 0; mask < (1<<n); mask++)
for (sub = mask; sub; sub = (sub-1)&mask)
dp[mask] = combine(dp[mask], dp[sub]);Key Insight: use bitmasks to represent subsets.
E. Tree DP
Shape: hierarchical dependencies

Examples:
- Subtree sizes
- Independent sets
- Rerooting
void dfs(u, p):
dp[u] = base
for (v in children)
if (v != p)
dfs(v, u)
dp[u] = merge(dp[u], dp[v])
F. Graph DP (Topological Order)
Shape: DAG structure

Examples:
- Longest path in DAG
- Counting paths
- DAG shortest path
for (u in topo_order)
for (v in adj[u])
dp[v] = combine(dp[v], dp[u] + weight(u,v));Key: process nodes in topological order.
G. Digit DP
Shape: positional digits, constrained transitions

Examples:
- Count numbers satisfying digit conditions
- Divisibility / digit sum problems
dp[pos][sum][tight] = ∑ dp[next_pos][new_sum][new_tight];
H. Knuth / Divide & Conquer / Convex Hull Trick
Shape: optimization over monotone or convex transitions

Examples:
- Cost-based splits
- Line-based transitions
dp[i] = min_k (dp[k] + cost(k, i))Key: structure in opt[i] or slope.
4. Recognizing DP Type
Ask these diagnostic questions:
| Question | Clue |
|---|---|
| “Does each step depend on smaller subproblems?” | DP |
| “Do I split a segment?” | Interval DP |
| “Do I choose subsets?” | Subset / Bitmask DP |
| “Do I move along positions?” | Line DP |
| “Do I merge children?” | Tree DP |
| “Do I process in a DAG?” | Graph DP |
| “Do I track digits or constraints?” | Digit DP |
5. Optimization Layer
Once you have a working DP, ask:
- Can transitions be reduced (monotonicity)?
- Can overlapping cost be cached (prefix sums)?
- Can dimensions be compressed (rolling arrays)?
- Can you reuse solutions for each segment (Divide & Conquer / Knuth)?

This transforms your DP from conceptual to efficient.
6. Meta-DP: Transformations
- Compress dimensions: if only dp[i-1] is needed, use a 1D array.
- Invert loops: bottom-up ↔︎ top-down.
- Change base: prefix sums for range queries.
- State lifting: add a dimension for a new property (like remainder, parity, bitmask).

> "When stuck, add a dimension. When slow, remove one."
7. Common Template Snippets
Rolling 1D Knapsack:
for (c = C; c >= w[i]; c--)
dp[c] = max(dp[c], dp[c-w[i]] + val[i]);Top-Down Memoization:
int solve(state):
if (visited[state]) return dp[state];
dp[state] = combine(solve(next_states));Iterative DP:
for (state in order)
dp[state] = merge(prev_states);
8. Building Your Own DP Framework
You can design a generic DP(state, transition) class:
struct DP {
    int n;
    vector<long long> dp;
    function<long long(int,int)> cost;
    DP(int n, function<long long(int,int)> cost)
        : n(n), dp(n, INF), cost(cost) { dp[0] = 0; }   // base state
    void solve() {
        for (int i = 1; i < n; i++)
            for (int j = 0; j < i; j++)
                if (dp[j] != INF)
                    dp[i] = min(dp[i], dp[j] + cost(j, i));
    }
};
Reusable, readable, flexible.
9. Summary
| DP Type | Core State | Shape | Typical Complexity |
|---|---|---|---|
| Line DP | dp[i] | Linear | O(n²) → O(n) |
| Grid DP | dp[i][j] | Matrix | O(n·m) |
| Interval DP | dp[i][j] | Triangular | O(n³) |
| Subset DP | dp[mask] | Exponential | O(n·2ⁿ) |
| Tree DP | dp[u] | Tree | O(n) |
| Digit DP | dp[pos][sum] | Recursive | O(len·sum) |
| Graph DP | dp[v] | DAG | O(V+E) |
Tiny Code
for (state in order)
dp[state] = combine(all_prev(state));Why It Matters
Meta-DP turns your thinking from case-by-case to pattern-by-pattern. You stop memorizing formulas and start seeing shapes: lines, grids, intervals, trees, masks.
Once you can name the shape, you can write the DP.
“DP is not about filling tables. It’s about recognizing structure.”
Try It Yourself
- Classify each classic DP problem into a type.
- Write one template per pattern (Line, Grid, Tree, etc.).
- Create a dp_solve(state, transitions) function to generalize logic.
- For each pattern, practice a small example.
- Build your own “Little Book of DP Patterns” with code snippets.
This is your bridge from concrete solutions to algorithmic fluency , the foundation for mastering the next 950 algorithms ahead.
Chapter 6. Mathematics for Algorithms
51. Number Theory (GCD, Modular Arithmetic, CRT)
Number theory forms the mathematical backbone of many algorithms , from hashing and cryptography to modular combinatorics and primality testing. In algorithmic problem-solving, it’s all about working with integers, divisibility, and modular systems efficiently.
This section covers the essential toolkit:
- GCD and Extended Euclidean Algorithm
- Modular Arithmetic (addition, subtraction, multiplication, inverse)
- Modular Exponentiation
- Chinese Remainder Theorem (CRT)
1. The Greatest Common Divisor (GCD)
The GCD of two integers \(a\) and \(b\), denoted \(\gcd(a, b)\), is the largest integer that divides both. It’s the cornerstone for fraction simplification, Diophantine equations, and modular inverses.
A. Euclidean Algorithm
Based on: \[ \gcd(a, b) = \gcd(b, a \bmod b) \]
int gcd(int a, int b) {
return b == 0 ? a : gcd(b, a % b);
}Time complexity: \(O(\log \min(a,b))\)
B. Extended Euclidean Algorithm
Finds integers ( x, y ) such that: \[ ax + by = \gcd(a, b) \]
This is critical for finding modular inverses.
int ext_gcd(int a, int b, int &x, int &y) {
if (b == 0) {
x = 1; y = 0;
return a;
}
int x1, y1;
int g = ext_gcd(b, a % b, x1, y1);
x = y1;
y = x1 - (a / b) * y1;
return g;
}C. Bezout’s Identity
If \(g = \gcd(a,b)\), then \(ax + by = g\) has integer solutions. If \(g = 1\), then \(x\) is the modular inverse of \(a\) modulo \(b\).
2. Modular Arithmetic
A modular system “wraps around” after a certain value ( m ).
We write: \[ a \equiv b \pmod{m} \quad \text{if } m \mid (a - b) \]
It behaves like ordinary arithmetic, with the rule:
- \((a + b) \bmod m = ((a \bmod m) + (b \bmod m)) \bmod m\)
- \((a \cdot b) \bmod m = ((a \bmod m) \cdot (b \bmod m)) \bmod m\)
- \((a - b) \bmod m = ((a \bmod m) - (b \bmod m) + m) \bmod m\)
A. Modular Exponentiation
Compute \(a^b \bmod m\) efficiently using binary exponentiation.
long long modpow(long long a, long long b, long long m) {
long long res = 1;
a %= m;
while (b > 0) {
if (b & 1) res = (res * a) % m;
a = (a * a) % m;
b >>= 1;
}
return res;
}Complexity: ( O\(\log b\) )
B. Modular Inverse
Given ( a ), find \(a^{-1}\) such that: \[ a \cdot a^{-1} \equiv 1 \pmod{m} \]
Case 1: If ( m ) is prime, use Fermat’s Little Theorem: \[ a^{-1} \equiv a^{m-2} \pmod{m} \]
int modinv(int a, int m) {
return modpow(a, m-2, m);
}Case 2: If ( a ) and ( m ) are coprime, use Extended GCD:
int inv(int a, int m) {
int x, y;
int g = ext_gcd(a, m, x, y);
if (g != 1) return -1; // no inverse
return (x % m + m) % m;
}C. Modular Division
To divide \(a / b \bmod m\): \[ a / b \equiv a \cdot b^{-1} \pmod{m} \]
So compute the inverse and multiply.
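A minimal sketch, reusing modinv from above (the Fermat inverse assumes \(m\) is prime):

```cpp
// Modular division: a / b (mod m), assuming gcd(b, m) = 1.
long long moddiv(long long a, long long b, long long m) {
    return a % m * modinv(b % m, m) % m;
}
```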
3. Chinese Remainder Theorem (CRT)
The CRT solves systems of congruences: \[ x \equiv a_1 \pmod{m_1} \]
\[ x \equiv a_2 \pmod{m_2} \] If moduli \(m_1, m_2, \dots, m_k\) are pairwise coprime, there exists a unique solution modulo \(M = m_1 m_2 \dots m_k\).
A. 2-Equation Example
Solve: \[ x \equiv a_1 \pmod{m_1}, \quad x \equiv a_2 \pmod{m_2} \]
Let:
- \(M = m_1 m_2\)
- \(M_1 = M / m_1\)
- \(M_2 = M / m_2\)

Find inverses \(inv_1 = M_1^{-1} \bmod m_1\), \(inv_2 = M_2^{-1} \bmod m_2\)
Then: \[ x = (a_1 \cdot M_1 \cdot inv_1 + a_2 \cdot M_2 \cdot inv_2) \bmod M \]
B. Implementation
long long crt(vector<int> a, vector<int> m) {
    long long M = 1;
    for (int mod : m) M *= mod;
    long long res = 0;
    for (size_t i = 0; i < a.size(); i++) {
        long long Mi = M / m[i];
        // the moduli only need to be pairwise coprime, so use the
        // extended-gcd inverse rather than the Fermat inverse
        long long invMi = inv(Mi % m[i], m[i]);
        res = (res + a[i] * Mi % M * invMi % M) % M;
    }
    return (res % M + M) % M;
}
C. Example
Solve:
x ≡ 2 (mod 3)
x ≡ 3 (mod 5)
x ≡ 2 (mod 7)
Solution: ( x = 23 ) (mod 105)
Check:
- \(23 \bmod 3 = 2\)
- \(23 \bmod 5 = 3\)
- \(23 \bmod 7 = 2\)
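A quick check of this system using the crt routine above:

```cpp
vector<int> a = {2, 3, 2}, m = {3, 5, 7};
printf("%lld\n", crt(a, m));   // prints 23
```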
4. Tiny Code
GCD
int gcd(int a, int b) { return b ? gcd(b, a % b) : a; }Modular Power
modpow(a, b, m)Modular Inverse
modinv(a, m)CRT
crt(a[], m[])5. Summary
| Concept | Formula | Purpose |
|---|---|---|
| GCD | \(\gcd(a,b) = \gcd(b, a \bmod b)\) | Simplify ratios |
| Extended GCD | \(ax + by = \gcd(a,b)\) | Find modular inverse |
| Modular Inverse | \(a^{-1} \equiv a^{m-2} \pmod{m}\) | Solve modular equations |
| Modular Exp | \(a^b \bmod m\) | Fast exponentiation |
| CRT | Combine congruences | Multi-mod system |
Why It Matters
Number theory lets algorithms speak the language of integers , turning huge computations into modular games. From hashing to RSA, from combinatorics to cryptography, it’s everywhere.
“When numbers wrap around, math becomes modular , and algorithms become magical.”
Try It Yourself
- Compute gcd(48, 180).
- Find inverse of 7 mod 13.
- Solve \(x ≡ 1 \pmod{2}, x ≡ 2 \pmod{3}, x ≡ 3 \pmod{5}\).
- Implement modular division \(a / b \bmod m\).
- Use modpow to compute \(3^{200} \bmod 13\).
These basics unlock higher algorithms in cryptography, combinatorics, and beyond.
52. Primality and Factorization (Miller-Rabin, Pollard Rho)
Primality and factorization are core to number theory, cryptography, and competitive programming. Many modern systems (RSA, ECC) rely on the hardness of factoring large numbers. Here, we learn how to test if a number is prime and break it into factors efficiently.
We’ll cover:
- Trial Division
- Sieve of Eratosthenes (for precomputation)
- Probabilistic Primality Test (Miller-Rabin)
- Integer Factorization (Pollard Rho)
1. Trial Division
The simplest way to test primality is by dividing by all integers up to √n.
bool is_prime(long long n) {
if (n < 2) return false;
if (n % 2 == 0) return n == 2;
for (long long i = 3; i * i <= n; i += 2)
if (n % i == 0) return false;
return true;
}Time Complexity: ( O\(\sqrt{n}\) ) Good for \(n \le 10^6\), impractical for large ( n ).
2. Sieve of Eratosthenes
For checking many numbers at once, use a sieve.
Idea: Mark all multiples of each prime starting from 2.
vector<bool> sieve(int n) {
vector<bool> is_prime(n+1, true);
is_prime[0] = is_prime[1] = false;
for (int i = 2; i * i <= n; i++)
if (is_prime[i])
for (int j = i * i; j <= n; j += i)
is_prime[j] = false;
return is_prime;
}Time Complexity: ( O\(n \log \log n\) )
Useful for generating primes up to \(10^7\).
3. Modular Multiplication
Before we do probabilistic tests or factorization, we need safe modular multiplication for large numbers.
long long modmul(long long a, long long b, long long m) {
__int128 res = (__int128)a * b % m;
return (long long)res;
}Avoid overflow for \(n \approx 10^{18}\).
4. Miller-Rabin Primality Test
A probabilistic test that can check if ( n ) is prime or composite in ( O\(k \log^3 n\) ).
Idea: For a prime ( n ): \[ a^{n-1} \equiv 1 \pmod{n} \] But for composites, most ( a ) fail this.
We write \(n - 1 = 2^s \cdot d\), ( d ) odd.
For each base ( a ):
- Compute \(x = a^d \bmod n\)
- If \(x = 1\) or \(x = n - 1\), pass
- Else, square \(s-1\) times
- If none equal \(n - 1\), composite
bool miller_rabin(long long n) {
if (n < 2) return false;
for (long long p : {2,3,5,7,11,13,17,19,23,29,31,37})
if (n % p == 0) return n == p;
long long d = n - 1, s = 0;
while ((d & 1) == 0) d >>= 1, s++;
auto modpow = [&](long long a, long long b) {
long long r = 1;
while (b) {
if (b & 1) r = modmul(r, a, n);
a = modmul(a, a, n);
b >>= 1;
}
return r;
};
for (long long a : {2, 325, 9375, 28178, 450775, 9780504, 1795265022}) {
if (a % n == 0) continue;
long long x = modpow(a, d);
if (x == 1 || x == n - 1) continue;
bool composite = true;
for (int r = 1; r < s; r++) {
x = modmul(x, x, n);
if (x == n - 1) {
composite = false;
break;
}
}
if (composite) return false;
}
return true;
}Deterministic for:
- \(n < 2^{64}\) with bases above. Complexity: ( O\(k \log^3 n\) )
5. Pollard Rho Factorization
Efficient for finding nontrivial factors of large composites. Based on Floyd’s cycle detection (Tortoise and Hare).
Idea: Define a pseudo-random function: \[ f(x) = (x^2 + c) \bmod n \] Then find \(\gcd(|x - y|, n)\) where \(x, y\) move at different speeds.
long long pollard_rho(long long n) {
if (n % 2 == 0) return 2;
auto f = [&](long long x, long long c) {
return (modmul(x, x, n) + c) % n;
};
while (true) {
long long x = rand() % (n - 2) + 2;
long long y = x;
long long c = rand() % (n - 1) + 1;
long long d = 1;
while (d == 1) {
x = f(x, c);
y = f(f(y, c), c);
d = gcd(abs(x - y), n);
}
if (d != n) return d;
}
}Use:
- Check if ( n ) is prime (Miller-Rabin)
- If not, find a factor using Pollard Rho
- Recurse on factors
Complexity: ~ ( O\(n^{1/4}\) ) average
6. Example
Factorize ( n = 8051 ):
- Miller-Rabin → composite
- Pollard Rho → factor 83
- ( 8051 / 83 = 97 )
- Both primes ⇒ ( 8051 = 83 × 97 )
7. Tiny Code
void factor(long long n, vector<long long> &f) {
if (n == 1) return;
if (miller_rabin(n)) {
f.push_back(n);
return;
}
long long d = pollard_rho(n);
factor(d, f);
factor(n / d, f);
}Call factor(n, f) to get prime factors.
8. Summary
| Algorithm | Purpose | Complexity | Type |
|---|---|---|---|
| Trial Division | Small primes | \(O(\sqrt{n})\) | Deterministic |
| Sieve | Precompute primes | \(O(n \log \log n)\) | Deterministic |
| Miller-Rabin | Primality test | \(O(k \log^3 n)\) | Probabilistic |
| Pollard Rho | Factorization | \(O(n^{1/4})\) | Probabilistic |
Why It Matters
Modern security, number theory problems, and many algorithmic puzzles depend on knowing when a number is prime and factoring it quickly when it isn’t. These tools are the entry point to RSA, modular combinatorics, and advanced cryptography.
Try It Yourself
- Check if 97 is prime using trial division and Miller-Rabin.
- Factorize 5959 (should yield 59 × 101).
- Generate all primes ≤ 100 using a sieve.
- Write a recursive factorizer using Pollard Rho + Miller-Rabin.
- Measure performance difference between \(\sqrt{n}\) trial and Pollard Rho for \(n \approx 10^{12}\).
These techniques make huge numbers approachable , one factor at a time.
53. Combinatorics (Permutations, Combinations, Subsets)
Combinatorics is the art of counting structures , how many ways can we arrange, select, or group things? In algorithms, it’s everywhere: DP transitions, counting paths, bitmask enumeration, and probabilistic reasoning. Here we’ll build a toolkit for computing factorials, nCr, nPr, and subset counts, both exactly and under a modulus.
1. Factorials and Precomputation
Most combinatorial formulas rely on factorials: \[ n! = 1 \times 2 \times 3 \times \dots \times n \]
We can precompute them modulo ( m ) (often \(10^9+7\)) for efficiency.
const int MOD = 1e9 + 7;
const int MAXN = 1e6;
long long fact[MAXN + 1], invfact[MAXN + 1];
long long modpow(long long a, long long b) {
long long res = 1;
while (b > 0) {
if (b & 1) res = res * a % MOD;
a = a * a % MOD;
b >>= 1;
}
return res;
}
void init_factorials() {
fact[0] = 1;
for (int i = 1; i <= MAXN; i++)
fact[i] = fact[i - 1] * i % MOD;
invfact[MAXN] = modpow(fact[MAXN], MOD - 2);
for (int i = MAXN - 1; i >= 0; i--)
invfact[i] = invfact[i + 1] * (i + 1) % MOD;
}Now you can compute ( nCr ) and ( nPr ) in ( O(1) ) time.
2. Combinations ( nCr )
The number of ways to choose r items from ( n ) items: \[ C(n, r) = \frac{n!}{r!(n-r)!} \]
long long nCr(int n, int r) {
if (r < 0 || r > n) return 0;
return fact[n] * invfact[r] % MOD * invfact[n - r] % MOD;
}Properties:
- \(C(n, 0) = 1\), \(C(n, n) = 1\)
- \(C(n, r) = C(n, n - r)\)
- Pascal’s Rule: \(C(n, r) = C(n - 1, r - 1) + C(n - 1, r)\)
Example
( C(5, 2) = 10 ) There are 10 ways to pick 2 elements from a 5-element set.
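Pascal's rule also gives a table-based way to compute small binomials with no factorials or inverses; a minimal sketch (sizes are illustrative):

```cpp
// Pascal's triangle: C[i][j] = C[i-1][j-1] + C[i-1][j].
// Small n only: values overflow 64 bits past roughly C(62, 31).
long long C[64][64];
void build_pascal(int n) {
    for (int i = 0; i <= n; i++) {
        C[i][0] = 1;
        for (int j = 1; j <= i; j++)
            C[i][j] = C[i-1][j-1] + C[i-1][j];
    }
}
// build_pascal(5);  C[5][2] == 10
```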
3. Permutations ( nPr )
Number of ways to arrange r elements chosen from ( n ): \[ P(n, r) = \frac{n!}{(n-r)!} \]
long long nPr(int n, int r) {
if (r < 0 || r > n) return 0;
return fact[n] * invfact[n - r] % MOD;
}Example
( P(5, 2) = 20 ) Choosing 2 out of 5 elements and arranging them yields 20 orders.
4. Subsets and Power Set
Each element has 2 choices: include or exclude. Hence, number of subsets: \[ 2^n \]
long long subsets_count(int n) {
return modpow(2, n);
}Enumerating subsets using bitmasks:
for (int mask = 0; mask < (1 << n); mask++) {
for (int i = 0; i < n; i++)
if (mask & (1 << i))
; // include element i
}Total: \(2^n\) subsets, ( O\(n2^n\) ) time to enumerate.
5. Multisets and Repetition
Number of ways to choose ( r ) items from ( n ) with repetition: \[ C(n + r - 1, r) \]
For example, number of ways to give 5 candies to 3 kids (each can get 0): ( C(3+5-1, 5) = C(7,5) = 21 )
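This is a one-liner on top of nCr (a sketch assuming init_factorials() has run and \(n + r - 1 \le\) MAXN):

```cpp
// Combinations with repetition ("stars and bars"): C(n + r - 1, r).
long long nCr_rep(int n, int r) { return nCr(n + r - 1, r); }
// nCr_rep(3, 5) == C(7, 5) == 21   (5 candies among 3 kids)
```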
6. Modular Combinatorics
When working modulo a prime \(p\):
- Use the modular inverse for division.
- \(C(n, r) \bmod p = fact[n] \cdot invfact[r] \cdot invfact[n - r] \bmod p\)

When \(n \ge p\), use Lucas' Theorem: \[ C(n, r) \bmod p = C\left(\lfloor n/p \rfloor, \lfloor r/p \rfloor\right) \cdot C(n \bmod p, r \bmod p) \bmod p \]
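A hedged sketch of Lucas' theorem for a prime \(p\); small_nCr(n, r, p) is a hypothetical helper for arguments below \(p\) (for example, a Pascal triangle taken mod \(p\)):

```cpp
// Lucas' theorem: peel off base-p digits of n and r.
// small_nCr(n, r, p) is assumed to handle n, r < p.
long long lucas(long long n, long long r, long long p) {
    if (r == 0) return 1;
    return lucas(n / p, r / p, p) * small_nCr(n % p, r % p, p) % p;
}
```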
7. Stirling and Bell Numbers (Advanced)
- Stirling Numbers of 2nd Kind: ways to partition ( n ) items into ( k ) non-empty subsets \[ S(n,k) = k \cdot S(n-1,k) + S(n-1,k-1) \]
- Bell Numbers: total number of partitions \[ B(n) = \sum_{k=0}^{n} S(n,k) \]
Used in set partition and grouping problems.
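A minimal sketch of both recurrences with plain 2D arrays (sizes are illustrative; take values modulo \(10^9+7\) for larger \(n\)):

```cpp
// Stirling numbers of the 2nd kind and Bell numbers via the recurrences above.
long long S[30][30], B[30];
void stirling_bell(int n) {
    S[0][0] = 1;
    for (int i = 1; i <= n; i++)
        for (int k = 1; k <= i; k++)
            S[i][k] = k * S[i-1][k] + S[i-1][k-1];
    for (int i = 0; i <= n; i++) {
        B[i] = 0;
        for (int k = 0; k <= i; k++)
            B[i] += S[i][k];            // Bell number = row sum
    }
}
// stirling_bell(5);  S[5][2] == 15, B[5] == 52
```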
8. Tiny Code
init_factorials();
printf("%lld\n", nCr(10, 3)); // 120
printf("%lld\n", nPr(10, 3)); // 720
printf("%lld\n", subsets_count(5)); // 329. Summary
| Concept | Formula | Meaning | Example |
|---|---|---|---|
| Factorial | \(n!\) | All arrangements | \(5! = 120\) |
| Combination | \(C(n, r) = \frac{n!}{r!(n - r)!}\) | Choose | \(C(5, 2) = 10\) |
| Permutation | \(P(n, r) = \frac{n!}{(n - r)!}\) | Arrange | \(P(5, 2) = 20\) |
| Subsets | \(2^n\) | All combinations | \(2^3 = 8\) |
| Multisets | \(C(n + r - 1, r)\) | Repetition allowed | \(C(4, 2) = 6\) |
Why It Matters
Combinatorics underlies probability, DP counting, and modular problems. You can’t master competitive programming or algorithm design without counting possibilities correctly. It teaches how structure emerges from choice , and how to count it efficiently.
Try It Yourself
- Compute \(C(1000, 500) \bmod (10^9 + 7)\).
- Count the number of 5-bit subsets with exactly 3 bits on, i.e. \(C(5, 3)\).
- Write a loop to print all subsets of
{a, b, c, d}. - Use Lucas’ theorem for \(C(10^6, 1000) \bmod 13\).
- Implement Stirling recursion and print \(S(5, 2)\).
Every algorithmic counting trick , from Pascal’s triangle to binomial theorem , starts right here.
54. Probability and Randomized Algorithms
Probability introduces controlled randomness into computation. Instead of deterministic steps, randomized algorithms use random choices to achieve speed, simplicity, or robustness. This section bridges probability theory and algorithm design , teaching how to model, analyze, and exploit randomness.
We’ll cover:
- Probability Basics
- Expected Value
- Monte Carlo and Las Vegas Algorithms
- Randomized Data Structures and Algorithms
1. Probability Basics
Every event has a probability between 0 and 1.
If a sample space has \(n\) equally likely outcomes and \(k\) of them satisfy a condition, then
\[ P(E) = \frac{k}{n} \]
Examples
- Rolling a fair die: \(P(\text{even}) = \frac{3}{6} = \frac{1}{2}\)
- Drawing an ace from a deck: \(P(\text{ace}) = \frac{4}{52} = \frac{1}{13}\)
Key Rules
- Complement: \(P(\bar{E}) = 1 - P(E)\)
- Addition: \(P(A \cup B) = P(A) + P(B) - P(A \cap B)\)
- Multiplication: \(P(A \cap B) = P(A) \cdot P(B \mid A)\)
2. Expected Value
The expected value is the weighted average of outcomes.
\[ E[X] = \sum_{i} P(x_i) \cdot x_i \]
Example: Expected value of a die: \[ E[X] = \frac{1+2+3+4+5+6}{6} = 3.5 \]
Properties:
- Linearity: \(E[aX + bY] = aE[X] + bE[Y]\)
- Useful for average-case analysis
In algorithms:
- Expected number of comparisons in QuickSort is \(O(n \log n)\)
- Expected time for hash table lookup is \(O(1)\)
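A quick simulation of the die example (a sketch; with enough trials the sample mean settles near 3.5):

```cpp
#include <stdio.h>
#include <stdlib.h>

// Estimate E[X] for one fair die by sampling.
int main(void) {
    const int trials = 100000;
    long long sum = 0;
    for (int i = 0; i < trials; i++)
        sum += rand() % 6 + 1;                 // uniform roll in {1, ..., 6}
    printf("estimated E[X] = %.3f\n", (double)sum / trials);
    return 0;
}
```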
3. Monte Carlo vs Las Vegas
Randomized algorithms are broadly two types:
| Type | Output | Runtime | Example |
|---|---|---|---|
| Monte Carlo | May be wrong (probabilistically) | Fixed | Miller-Rabin Primality |
| Las Vegas | Always correct | Random runtime | Randomized QuickSort |
Monte Carlo:
- Faster, approximate
- You can control error probability
- E.g. primality test returns “probably prime”
Las Vegas:
- Output guaranteed correct
- Runtime varies by luck
- E.g. QuickSort with random pivot
4. Randomization in Algorithms
Randomization helps break worst-case patterns.
A. Randomized QuickSort
Pick a random pivot instead of first element. Expected time becomes ( O\(n \log n\) ) regardless of input order.
int partition(int a[], int l, int r) {
int pivot = a[l + rand() % (r - l + 1)];
// move pivot to end, then normal partition
}This avoids adversarial inputs like sorted arrays.
B. Randomized Hashing
Hash collisions can be exploited by adversaries. Using random coefficients in hash functions makes attacks infeasible.
long long hash(long long x, long long a, long long b, long long p) {
return (a * x + b) % p;
}Pick random ( a, b ) for robustness.
C. Randomized Data Structures
Skip List: uses random levels for nodes Expected ( O\(\log n\) ) search/insert/delete
Treap: randomized heap priority + BST order Maintains balance in expectation
struct Node {
int key, priority;
Node *l, *r;
};Randomized balancing gives good average performance without rotation logic.
D. Random Sampling
Pick random elements efficiently:
- Reservoir Sampling: sample \(k\) items from a stream of unknown size
- Shuffle: Fisher-Yates Algorithm
for (int i = n - 1; i > 0; i--) {
int j = rand() % (i + 1);
swap(a[i], a[j]);
}5. Probabilistic Guarantees
Randomized algorithms often use Chernoff bounds and Markov’s inequality to bound errors:
- Markov: \(P(X \ge kE[X]) \le \frac{1}{k}\)
- Chebyshev: \(P(|X - E[X]| \ge c\sigma) \le \frac{1}{c^2}\)
- Chernoff: Exponentially small tail bounds
These ensure “with high probability” (\(1 - \frac{1}{n^c}\)) guarantees.
6. Tiny Code
Randomized QuickSort:
int partition(int arr[], int low, int high) {
int pivotIdx = low + rand() % (high - low + 1);
swap(arr[pivotIdx], arr[high]);
int pivot = arr[high], i = low;
for (int j = low; j < high; j++) {
if (arr[j] < pivot) swap(arr[i++], arr[j]);
}
swap(arr[i], arr[high]);
return i;
}
void quicksort(int arr[], int low, int high) {
if (low < high) {
int pi = partition(arr, low, high);
quicksort(arr, low, pi - 1);
quicksort(arr, pi + 1, high);
}
}7. Summary
| Concept | Key Idea | Use Case |
|---|---|---|
| Expected Value | Weighted average outcome | Analyze average case |
| Monte Carlo | Probabilistic correctness | Primality test |
| Las Vegas | Probabilistic runtime | QuickSort |
| Random Pivot | Break worst-case | Sorting |
| Skip List / Treap | Random balancing | Data Structures |
| Reservoir Sampling | Stream selection | Large data |
Why It Matters
Randomization is not “luck” , it’s a design principle. It transforms rigid algorithms into adaptive, robust systems. In complexity theory, randomness helps achieve bounds impossible deterministically.
“A bit of randomness turns worst cases into best friends.”
Try It Yourself
- Simulate rolling two dice and compute expected sum.
- Implement randomized QuickSort and measure average runtime.
- Write a Monte Carlo primality checker.
- Create a random hash function for integers.
- Implement reservoir sampling for a large input stream.
These experiments show how uncertainty can become a powerful ally in algorithm design.
55. Sieve Methods and Modular Math
Sieve methods are essential tools in number theory for generating prime numbers, prime factors, and function values (φ, μ) efficiently. Combined with modular arithmetic, they form the backbone of algorithms in cryptography, combinatorics, and competitive programming.
This section introduces:
- Sieve of Eratosthenes
- Optimized Linear Sieve
- Sieve for Smallest Prime Factor (SPF)
- Euler's Totient Function (φ)
- Modular Applications
1. The Sieve of Eratosthenes
The classic algorithm to find all primes ≤ ( n ).
Idea: Start from 2, mark all multiples as composite. Continue to √n.
vector<int> sieve(int n) {
vector<int> primes;
vector<bool> is_prime(n + 1, true);
is_prime[0] = is_prime[1] = false;
for (int i = 2; i * i <= n; i++)
if (is_prime[i])
for (int j = i * i; j <= n; j += i)
is_prime[j] = false;
for (int i = 2; i <= n; i++)
if (is_prime[i]) primes.push_back(i);
return primes;
}Time Complexity: ( O\(n \log \log n\) )
Space: ( O(n) )
Example: Primes up to 20 → 2, 3, 5, 7, 11, 13, 17, 19
2. Linear Sieve (O(n))
Unlike the basic sieve, each number is marked exactly once by its smallest prime factor (SPF).
Idea:
- For each prime \(p\), mark \(p \times i\) only once.
- Use spf[i] to store the smallest prime factor.
const int N = 1e6;
int spf[N + 1];
vector<int> primes;
void linear_sieve() {
for (int i = 2; i <= N; i++) {
if (!spf[i]) {
spf[i] = i;
primes.push_back(i);
}
for (int p : primes) {
if (p > spf[i] || 1LL * i * p > N) break;
spf[i * p] = p;
}
}
}Benefits:
- Get primes, SPF, and factorizations in \(O(n)\).
- Ideal for problems needing many factorizations.
3. Smallest Prime Factor (SPF) Table
With spf[], factorization becomes ( O\(\log n\) ).
vector<int> factorize(int x) {
vector<int> f;
while (x != 1) {
f.push_back(spf[x]);
x /= spf[x];
}
return f;
}Example: spf[12] = 2 → factors = [2, 2, 3]
4. Euler’s Totient Function \(\varphi(n)\)
The number of integers ≤ ( n ) that are coprime with ( n ).
Formula: \[ \varphi(n) = n \prod_{p|n} \left(1 - \frac{1}{p}\right) \]
Properties:
- \(\varphi(p) = p - 1\) if \(p\) is prime
- Multiplicative: if \(\gcd(a, b) = 1\), then \(\varphi(ab) = \varphi(a)\varphi(b)\)
Implementation (Linear Sieve):
const int N = 1e6;
int phi[N + 1];
bool is_comp[N + 1];
vector<int> primes;
void phi_sieve() {
phi[1] = 1;
for (int i = 2; i <= N; i++) {
if (!is_comp[i]) {
primes.push_back(i);
phi[i] = i - 1;
}
for (int p : primes) {
if (1LL * i * p > N) break;
is_comp[i * p] = true;
if (i % p == 0) {
phi[i * p] = phi[i] * p;
break;
} else {
phi[i * p] = phi[i] * (p - 1);
}
}
}
}Example:
- \(\varphi(6) = 6(1 - \frac{1}{2})(1 - \frac{1}{3}) = 2\)
- Numbers coprime with 6: 1, 5
5. Modular Math Applications
Once we have primes and totients, we can do many modular computations.
A. Fermat’s Little Theorem
If ( p ) is prime, \[ a^{p-1} \equiv 1 \pmod{p} \] Hence, \[ a^{-1} \equiv a^{p-2} \pmod{p} \]
Used in: modular inverses, combinatorics.
B. Euler’s Theorem
If \(\gcd(a, n) = 1\), then
\[ a^{\varphi(n)} \equiv 1 \pmod{n} \]
Generalizes Fermat’s theorem to composite moduli.
C. Modular Exponentiation with Totient Reduction
For very large powers:
\[ a^b \bmod n = a^{b \bmod \varphi(n)} \bmod n \]
(when \(a\) and \(n\) are coprime)
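A sketch under the stated assumption \(\gcd(a, n) = 1\), reusing modpow from Section 51 and the phi[] table above:

```cpp
// Euler reduction: a^b mod n for a huge exponent b, valid when gcd(a, n) == 1.
long long pow_with_totient(long long a, long long b, long long n) {
    return modpow(a % n, b % phi[n], n);   // reduce the exponent mod phi(n)
}
```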
6. Tiny Code
Primes up to n:
auto primes = sieve(100);Totients up to n:
phi_sieve();
cout << phi[10]; // 4Factorization:
auto f = factorize(60); // [2, 2, 3, 5]7. Summary
| Concept | Description | Time | Use |
|---|---|---|---|
| Eratosthenes | Mark multiples | \(O(n \log \log n)\) | Simple prime gen |
| Linear Sieve | Mark once | \(O(n)\) | Prime + SPF |
| SPF Table | Smallest prime factor | \(O(1)\) query | Fast factorization |
| φ(n) | Coprime count | \(O(n)\) | Modular exponent |
| Fermat / Euler | Inverses, reduction | \(O(\log n)\) | Modular arithmetic |
Why It Matters
Sieve methods are the fastest way to preprocess arithmetic information. They unlock efficient solutions to problems involving primes, divisors, modular equations, and cryptography.
“Before you can reason about numbers, you must first sieve them clean.”
Try It Yourself
- Generate all primes \(\le 10^6\) using a linear sieve.
- Factorize \(840\) using the SPF array.
- Compute \(\varphi(n)\) for \(n = 1..20\).
- Verify \(a^{\varphi(n)} \equiv 1 \pmod{n}\) for \(a = 3\), \(n = 10\).
- Solve \(a^b \bmod n\) with \(b\) very large using \(\varphi(n)\).
Sieve once, and modular math becomes effortless forever after.
56. Linear Algebra (Gaussian Elimination, LU, SVD)
Linear algebra gives algorithms their mathematical backbone. From solving equations to powering ML models, it’s the hidden engine behind optimization, geometry, and numerical computation.
In this section, we’ll focus on the algorithmic toolkit:
- Gaussian Elimination (solve systems, invert matrices)
- LU Decomposition (efficient repeated solving)
- SVD (Singular Value Decomposition) overview
You’ll see how algebra becomes code , step by step.
1. Systems of Linear Equations
We want to solve: \[ A \cdot x = b \] where ( A ) is an \(n \times n\) matrix, and ( x, b ) are vectors.
For example: \[\begin{cases} 2x + 3y = 8 \\ x + 2y = 5 \end{cases}\]
The solution is the intersection of two lines. In general, \(A^{-1}b\) gives ( x ), but we usually solve it more directly using Gaussian elimination.
2. Gaussian Elimination (Row Reduction)
Idea: Transform ( [A|b] ) (augmented matrix) into upper-triangular form, then back-substitute.
Steps:
- For each row, select a pivot (non-zero leading element).
- Eliminate below it using row operations.
- After all pivots, back-substitute to get the solution.
A. Implementation (C)
const double EPS = 1e-9;
vector<double> gauss(vector<vector<double>> A, vector<double> b) {
int n = A.size();
for (int i = 0; i < n; i++) {
// 1. Find pivot
int pivot = i;
for (int j = i + 1; j < n; j++)
if (fabs(A[j][i]) > fabs(A[pivot][i]))
pivot = j;
swap(A[i], A[pivot]);
swap(b[i], b[pivot]);
// 2. Normalize pivot row
double div = A[i][i];
if (fabs(div) < EPS) continue;
for (int k = i; k < n; k++) A[i][k] /= div;
b[i] /= div;
// 3. Eliminate below
for (int j = i + 1; j < n; j++) {
double factor = A[j][i];
for (int k = i; k < n; k++) A[j][k] -= factor * A[i][k];
b[j] -= factor * b[i];
}
}
// 4. Back substitution
vector<double> x(n);
for (int i = n - 1; i >= 0; i--) {
x[i] = b[i];
for (int j = i + 1; j < n; j++)
x[i] -= A[i][j] * x[j];
}
return x;
}Time complexity: ( O\(n^3\) )
B. Example
Solve: \[\begin{cases} 2x + 3y = 8 \\ x + 2y = 5 \end{cases}\]
Augmented matrix: \[\begin{bmatrix} 2 & 3 & | & 8 \\ 1 & 2 & | & 5 \end{bmatrix}\]
Reduce:
- Row2 ← Row2 − ½ Row1 → \([1, 2 \mid 5] \to [0, 0.5 \mid 1]\)
- Back substitute → \(y = 2,\ x = 1\)
3. LU Decomposition
LU factorization expresses: \[ A = L \cdot U \] where ( L ) is lower-triangular (1s on diagonal), ( U ) is upper-triangular.
This allows solving ( A x = b ) in two triangular solves:
- Solve ( L y = b )
- Solve ( U x = y )
Efficient when solving for multiple b’s (same A).
A. Decomposition Algorithm
void lu_decompose(vector<vector<double>>& A, vector<vector<double>>& L, vector<vector<double>>& U) {
int n = A.size();
L.assign(n, vector<double>(n, 0));
U.assign(n, vector<double>(n, 0));
for (int i = 0; i < n; i++) {
// Upper
for (int k = i; k < n; k++) {
double sum = 0;
for (int j = 0; j < i; j++)
sum += L[i][j] * U[j][k];
U[i][k] = A[i][k] - sum;
}
// Lower
for (int k = i; k < n; k++) {
if (i == k) L[i][i] = 1;
else {
double sum = 0;
for (int j = 0; j < i; j++)
sum += L[k][j] * U[j][i];
L[k][i] = (A[k][i] - sum) / U[i][i];
}
}
}
}Solve with forward + backward substitution.
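A sketch of the two triangular solves, assuming \(L\) has a unit diagonal and \(L, U\) come from lu_decompose above:

```cpp
// Solve A x = b given A = L * U: first L y = b (forward), then U x = y (backward).
vector<double> lu_solve(const vector<vector<double>>& L,
                        const vector<vector<double>>& U,
                        const vector<double>& b) {
    int n = b.size();
    vector<double> y(n), x(n);
    for (int i = 0; i < n; i++) {              // forward substitution (L has 1s on diagonal)
        y[i] = b[i];
        for (int j = 0; j < i; j++) y[i] -= L[i][j] * y[j];
    }
    for (int i = n - 1; i >= 0; i--) {         // backward substitution
        x[i] = y[i];
        for (int j = i + 1; j < n; j++) x[i] -= U[i][j] * x[j];
        x[i] /= U[i][i];
    }
    return x;
}
```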
4. Singular Value Decomposition (SVD)
SVD generalizes diagonalization for non-square matrices: \[ A = U \Sigma V^T \]
Where:
- \(U\): left singular vectors (orthogonal)
- \(\Sigma\): diagonal of singular values
- \(V^T\): right singular vectors

Applications:
- Data compression (PCA)
- Noise reduction
- Rank estimation
- Pseudoinverse \(A^+ = V \Sigma^{-1} U^T\)

In practice, use libraries (e.g. LAPACK, Eigen).
5. Numerical Stability and Pivoting
In floating-point math:
- Always pick the largest pivot (partial pivoting)
- Avoid dividing by small numbers
- Use an EPS = 1e-9 threshold

Small numerical errors can amplify quickly , stability is key.
6. Tiny Code
vector<vector<double>> A = {{2, 3}, {1, 2}};
vector<double> b = {8, 5};
auto x = gauss(A, b);
// Output: x = [1, 2]7. Summary
| Algorithm | Purpose | Complexity | Notes |
|---|---|---|---|
| Gaussian Elimination | Solve Ax=b | \(O(n^3)\) | Direct method |
| LU Decomposition | Repeated solves | \(O(n^3)\) | Triangular factorization |
| SVD | General decomposition | \(O(n^3)\) | Robust, versatile |
Why It Matters
Linear algebra is the language of algorithms , it solves equations, optimizes functions, and projects data. Whether building solvers or neural networks, these methods are your foundation.
“Every algorithm lives in a vector space , it just needs a basis to express itself.”
Try It Yourself
- Solve a 3×3 linear system with Gaussian elimination.
- Implement LU decomposition and test \(L \cdot U = A\).
- Use LU to solve multiple ( b ) vectors.
- Explore SVD using a math library; compute singular values of a 2×2 matrix.
- Compare results between naive and pivoted elimination for unstable systems.
Start with row operations , and you’ll see how geometry and algebra merge into code.
57. FFT and NTT (Fast Transforms)
The Fast Fourier Transform (FFT) is one of the most beautiful and practical algorithms ever invented. It converts data between time (or coefficient) domain and frequency (or point) domain efficiently. The Number Theoretic Transform (NTT) is its modular counterpart for integer arithmetic , ideal for polynomial multiplication under a modulus.
This section covers:
- Why we need transforms
- Discrete Fourier Transform (DFT)
- Cooley-Tukey FFT (complex numbers)
- NTT (modular version)
- Applications (polynomial multiplication, convolution)
1. Motivation
Suppose you want to multiply two polynomials: \[ A(x) = a_0 + a_1x + a_2x^2 \]
\[ B(x) = b_0 + b_1x + b_2x^2 \]
Their product has coefficients: \[ c_k = \sum_{i+j=k} a_i \cdot b_j \]
This is convolution: \[ C = A * B \] Naively, this takes ( O\(n^2\) ). FFT reduces it to ( O\(n \log n\) ).
2. Discrete Fourier Transform (DFT)
The DFT maps coefficients \(a_0, a_1, \ldots, a_{n-1}\) to evaluations at ( n )-th roots of unity:
\[ A_k = \sum_{j=0}^{n-1} a_j \cdot e^{-2\pi i \cdot jk / n} \]
and the inverse transform recovers \(a_j\) from \(A_k\).
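For contrast, a direct \(O(n^2)\) DFT is only a few lines (a sketch using std::complex; the FFT below computes the same values much faster):

```cpp
#include <complex>
#include <vector>
#include <cmath>
using namespace std;

// Naive DFT: A_k = sum_j a_j * exp(-2*pi*i*j*k/n), O(n^2) overall.
vector<complex<double>> naive_dft(const vector<complex<double>>& a) {
    int n = a.size();
    vector<complex<double>> A(n);
    const double PI = acos(-1.0);
    for (int k = 0; k < n; k++)
        for (int j = 0; j < n; j++)
            A[k] += a[j] * polar(1.0, -2 * PI * j * k / n);
    return A;
}
```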
3. Cooley-Tukey FFT
Key idea: recursively split the sum into even and odd parts:
\[ A_k = A_{even}(w_n^2) + w_n^k \cdot A_{odd}(w_n^2) \]
Where \(w_n = e^{-2\pi i / n}\) is an ( n )-th root of unity.
Implementation (C++)
#include <complex>
#include <vector>
#include <cmath>
using namespace std;
using cd = complex<double>;
const double PI = acos(-1);
void fft(vector<cd> &a, bool invert) {
int n = a.size();
for (int i = 1, j = 0; i < n; i++) {
int bit = n >> 1;
for (; j & bit; bit >>= 1) j ^= bit;
j ^= bit;
if (i < j) swap(a[i], a[j]);
}
for (int len = 2; len <= n; len <<= 1) {
double ang = 2 * PI / len * (invert ? -1 : 1);
cd wlen(cos(ang), sin(ang));
for (int i = 0; i < n; i += len) {
cd w(1);
for (int j = 0; j < len / 2; j++) {
cd u = a[i + j], v = a[i + j + len / 2] * w;
a[i + j] = u + v;
a[i + j + len / 2] = u - v;
w *= wlen;
}
}
}
if (invert) {
for (cd &x : a) x /= n;
}
}Polynomial Multiplication with FFT
vector<long long> multiply(vector<int> const& a, vector<int> const& b) {
vector<cd> fa(a.begin(), a.end()), fb(b.begin(), b.end());
int n = 1;
while (n < (int)a.size() + (int)b.size()) n <<= 1;
fa.resize(n);
fb.resize(n);
fft(fa, false);
fft(fb, false);
for (int i = 0; i < n; i++) fa[i] *= fb[i];
fft(fa, true);
vector<long long> result(n);
for (int i = 0; i < n; i++)
result[i] = llround(fa[i].real());
return result;
}Complexity: ( O\(n \log n\) )
4. Number Theoretic Transform (NTT)
FFT uses complex numbers , NTT uses modular arithmetic with roots of unity mod p. We need a prime ( p ) such that: \[ p = c \cdot 2^k + 1 \] so a primitive root ( g ) exists.
Popular choices:
- \(p = 998244353,\ g = 3\)
- \(p = 7340033,\ g = 3\)
Implementation (NTT)
const int MOD = 998244353;
const int G = 3;
int modpow(int a, int b) {
long long res = 1;
while (b) {
if (b & 1) res = res * a % MOD;
a = 1LL * a * a % MOD;
b >>= 1;
}
return res;
}
void ntt(vector<int> &a, bool invert) {
int n = a.size();
for (int i = 1, j = 0; i < n; i++) {
int bit = n >> 1;
for (; j & bit; bit >>= 1) j ^= bit;
j ^= bit;
if (i < j) swap(a[i], a[j]);
}
for (int len = 2; len <= n; len <<= 1) {
int wlen = modpow(G, (MOD - 1) / len);
if (invert) wlen = modpow(wlen, MOD - 2);
for (int i = 0; i < n; i += len) {
long long w = 1;
for (int j = 0; j < len / 2; j++) {
int u = a[i + j];
int v = (int)(a[i + j + len / 2] * w % MOD);
a[i + j] = u + v < MOD ? u + v : u + v - MOD;
a[i + j + len / 2] = u - v >= 0 ? u - v : u - v + MOD;
w = w * wlen % MOD;
}
}
}
if (invert) {
int inv_n = modpow(n, MOD - 2);
for (int &x : a) x = 1LL * x * inv_n % MOD;
}
}5. Applications
- Polynomial Multiplication: ( O\(n \log n\) )
- Convolution: digital signal processing
- Big Integer Multiplication (Karatsuba, FFT)
- Subset Convolution and combinatorial transforms
- Number-theoretic sums (NTT-friendly modulus)
6. Tiny Code
vector<int> A = {1, 2, 3};
vector<int> B = {4, 5, 6};
// Result = {4, 13, 28, 27, 18}
auto C = multiply(A, B);7. Summary
| Algorithm | Domain | Complexity | Type |
|---|---|---|---|
| DFT | Complex | \(O(n^2)\) | Naive |
| FFT | Complex | \(O(n \log n)\) | Recursive |
| NTT | Modular | \(O(n \log n)\) | Integer arithmetic |
Why It Matters
FFT and NTT bring polynomial algebra to life. They turn slow convolutions into lightning-fast transforms. From multiplying huge integers to compressing signals, they’re the ultimate divide-and-conquer on structure.
“To multiply polynomials fast, you first turn them into music , then back again.”
Try It Yourself
- Multiply (\(1 + 2x + 3x^2\)) and (\(4 + 5x + 6x^2\)) using FFT.
- Implement NTT over 998244353 and verify results mod p.
- Compare ( O\(n^2\) ) vs FFT performance for n = 1024.
- Experiment with inverse FFT (invert = true).
- Explore circular convolution for signal data.
Once you master FFT/NTT, you hold the power of speed in algebraic computation.
58. Numerical Methods (Newton, Simpson, Runge-Kutta)
Numerical methods let us approximate solutions when exact algebraic answers are hard or impossible. They are the foundation of scientific computing, simulation, and optimization , bridging the gap between continuous math and discrete computation.
In this section, we’ll explore three classics:
- Newton-Raphson: root finding- Simpson’s Rule: numerical integration- Runge-Kutta (RK4): solving differential equations These algorithms showcase how iteration, approximation, and convergence build powerful tools.
1. Newton-Raphson Method
Used to find a root of ( f(x) = 0 ). Starting from a guess \(x_0\), iteratively refine:
\[ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} \]
Convergence is quadratic if ( f ) is smooth and \(x_0\) is close enough.
A. Example
Solve ( f(x) = x^2 - 2 = 0 ) We know root = \(\sqrt{2}\)
Start \(x_0 = 1\)
| Iter | \(x_n\) | \(f(x_n)\) | \(f'(x_n)\) | \(x_{n+1}\) |
|---|---|---|---|---|
| 0 | 1.000 | -1.000 | 2.000 | 1.500 |
| 1 | 1.500 | 0.250 | 3.000 | 1.417 |
| 2 | 1.417 | 0.006 | 2.834 | 1.414 |
Converged: \(1.414 \approx \sqrt{2}\)
B. Implementation
#include <math.h>
#include <stdio.h>
double f(double x) { return x * x - 2; }
double df(double x) { return 2 * x; }
double newton(double x0) {
for (int i = 0; i < 20; i++) {
double fx = f(x0);
double dfx = df(x0);
if (fabs(fx) < 1e-9) break;
x0 = x0 - fx / dfx;
}
return x0;
}
int main() {
printf("Root: %.6f\n", newton(1.0)); // 1.414214
}Time Complexity: ( O(k) ) iterations, each ( O(1) )
2. Simpson’s Rule (Numerical Integration)
When you can’t integrate ( f(x) ) analytically, approximate the area under the curve.
Divide interval ([a, b]) into even ( n ) subintervals (width ( h )).
\[ I \approx \frac{h}{3} \Big( f(a) + 4 \sum f(x_{odd}) + 2 \sum f(x_{even}) + f(b) \Big) \]
A. Implementation
#include <math.h>
#include <stdio.h>
double f(double x) { return x * x; } // integrate x^2
double simpson(double a, double b, int n) {
double h = (b - a) / n;
double s = f(a) + f(b);
for (int i = 1; i < n; i++) {
double x = a + i * h;
s += f(x) * (i % 2 == 0 ? 2 : 4);
}
return s * h / 3;
}
int main() {
printf("∫₀¹ x² dx ≈ %.6f\n", simpson(0, 1, 100)); // ~0.333333
}Accuracy: ( O\(h^4\) ) Note: ( n ) must be even.
B. Example
\[ \int_0^1 x^2 dx = \frac{1}{3} \] With ( n = 100 ), Simpson gives ( 0.333333 ).
3. Runge-Kutta (RK4)
Used to solve first-order ODEs: \[ y' = f(x, y), \quad y(x_0) = y_0 \]
RK4 Formula:
\[\begin{aligned} k_1 &= f(x_n, y_n) \\ k_2 &= f\!\left(x_n + \tfrac{h}{2},\ y_n + \tfrac{h}{2}k_1\right) \\ k_3 &= f\!\left(x_n + \tfrac{h}{2},\ y_n + \tfrac{h}{2}k_2\right) \\ k_4 &= f(x_n + h,\ y_n + hk_3) \\ y_{n+1} &= y_n + \tfrac{h}{6}(k_1 + 2k_2 + 2k_3 + k_4) \end{aligned}\]
Accuracy: ( O\(h^4\) )
A. Example
Solve ( y’ = x + y ), ( y(0) = 1 ), step ( h = 0.1 ).
Each iteration refines ( y ) with weighted slope average.
B. Implementation
#include <stdio.h>
double f(double x, double y) {
return x + y;
}
double runge_kutta(double x0, double y0, double h, double xn) {
double x = x0, y = y0;
while (x < xn) {
double k1 = f(x, y);
double k2 = f(x + h / 2, y + h * k1 / 2);
double k3 = f(x + h / 2, y + h * k2 / 2);
double k4 = f(x + h, y + h * k3);
y += h * (k1 + 2*k2 + 2*k3 + k4) / 6;
x += h;
}
return y;
}
int main() {
printf("y(0.1) ≈ %.6f\n", runge_kutta(0, 1, 0.1, 0.1));
}4. Tiny Code Summary
| Method | Purpose | Formula | Accuracy | Type |
|---|---|---|---|---|
| Newton-Raphson | Root finding | \(x_{n+1}=x_n-\frac{f(x_n)}{f'(x_n)}\) | Quadratic | Iterative |
| Simpson's Rule | Integration | \(\frac{h}{3}(\dots)\) | \(O(h^4)\) | Deterministic |
| Runge-Kutta (RK4) | ODEs | Weighted slope avg | \(O(h^4)\) | Iterative |
5. Numerical Stability
- Small step \(h\): better accuracy, more cost
- Large \(h\): faster, less stable
- Always check convergence (\(|x_{n+1} - x_n| < \varepsilon\))
- Avoid division by small derivatives in Newton's method
Why It Matters
Numerical methods let computers simulate the continuous world. From physics to AI training, they solve what calculus cannot symbolically.
“When equations won’t talk, iterate , and they’ll whisper their answers.”
Try It Yourself
- Use Newton’s method for \(\cos x - x = 0\).
- Approximate \(\displaystyle \int_0^{\pi/2} \sin x\,dx\) with Simpson’s rule.
- Solve \(y' = y - x^2 + 1,\ y(0) = 0.5\) using RK4.
- Compare RK4 with Euler’s method for the same ODE.
- Experiment with step sizes \(h \in \{0.1, 0.01, 0.001\}\) and observe convergence.
Numerical thinking turns continuous problems into iterative algorithms , precise enough to power every simulation and solver you’ll ever write.
59. Mathematical Optimization (Simplex, Gradient, Convex)
Mathematical optimization is about finding the best solution , smallest cost, largest profit, shortest path , under given constraints. It’s the heart of machine learning, operations research, and engineering design.
In this section, we’ll explore three pillars:
- Simplex Method , for linear programs
- Gradient Descent , for continuous optimization
- Convex Optimization , the theory ensuring global optima
1. What Is Optimization?
A general optimization problem looks like:
\[ \min_x \ f(x) \] subject to constraints: \[ g_i(x) \le 0, \quad h_j(x) = 0 \]
When ( f ) and \(g_i, h_j\) are linear, it’s a Linear Program (LP). When ( f ) is differentiable, we can use gradients. When ( f ) is convex, every local minimum is global , the ideal world.
2. The Simplex Method (Linear Programming)
A linear program has the form:
\[ \max \ c^T x \] subject to \[ A x \le b, \quad x \ge 0 \]
Geometrically, each constraint forms a half-space. The feasible region is a convex polytope, and the optimum lies at a vertex.
A. Example
Maximize \(z = 3x + 2y\) subject to \[\begin{cases} 2x + y \le 18 \\ 2x + 3y \le 42 \\ x, y \ge 0 \end{cases}\]
Solution: \(x = 3,\ y = 12\), \(z = 33\) (at the vertex where the two constraint lines intersect)
B. Algorithm (Sketch)
1. Convert inequalities to equalities by adding slack variables.
2. Initialize at a vertex (basic feasible solution).
3. At each step:
   - Choose the entering variable (most negative coefficient in the objective row).
   - Choose the leaving variable (min ratio test).
   - Pivot to the new vertex.
4. Repeat until optimal.
C. Implementation (Simplified Pseudocode)
// Basic simplex-like outline
while (exists negative coefficient in objective row) {
choose entering column j;
choose leaving row i (min b[i]/a[i][j]);
pivot(i, j);
}Libraries (like GLPK or Eigen) handle full implementations.
Time Complexity: worst ( O\(2^n\) ), but fast in practice.
3. Gradient Descent
For differentiable ( f(x) ), we move opposite the gradient:
\[ x_{k+1} = x_k - \eta \nabla f(x_k) \]
where \(\eta\) is the learning rate.
Intuition: the gradient \(\nabla f(x)\) points uphill, so step opposite it.
A. Example
Minimize ( f(x) = (x-3)^2 )
\[ f'(x) = 2(x-3) \]
Start \(x_0 = 0\), \(\eta = 0.1\)
| Iter | \(x_k\) | \(f(x_k)\) | Gradient | New \(x\) |
|---|---|---|---|---|
| 0 | 0 | 9 | -6 | 0.6 |
| 1 | 0.6 | 5.76 | -4.8 | 1.08 |
| 2 | 1.08 | 3.69 | -3.84 | 1.46 |
| … | →3 | →0 | →0 | →3 |
Converges to ( x = 3 )
B. Implementation
#include <math.h>
#include <stdio.h>
double f(double x) { return (x - 3) * (x - 3); }
double df(double x) { return 2 * (x - 3); }
double gradient_descent(double x0, double lr) {
for (int i = 0; i < 100; i++) {
double g = df(x0);
if (fabs(g) < 1e-6) break;
x0 -= lr * g;
}
return x0;
}
int main() {
printf("Min at x = %.6f\n", gradient_descent(0, 0.1));
}C. Variants
- Momentum: \(v \leftarrow \beta v + (1-\beta)\nabla f(x)\), then \(x \leftarrow x - \eta v\)
- Adam: adaptive learning rates
- Stochastic Gradient Descent (SGD): random subset of data

All are used heavily in machine learning.
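A hedged sketch of the momentum variant for the same 1D example above (the hyperparameters are illustrative, not prescriptive):

```cpp
// Gradient descent with momentum on f(x) = (x - 3)^2, reusing df from above.
double gd_momentum(double x, double lr, double beta) {
    double v = 0.0;
    for (int i = 0; i < 200; i++) {
        v = beta * v + (1 - beta) * df(x);   // exponentially averaged gradient
        x -= lr * v;
    }
    return x;   // gd_momentum(0, 0.1, 0.9) converges toward 3
}
```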
4. Convex Optimization
A function ( f ) is convex if: \[ f(\lambda x + (1-\lambda)y) \le \lambda f(x) + (1-\lambda)f(y) \]
This means any local minimum is global.
Examples:
- \(f(x) = x^2\) (convex)
- \(f(x) = x^3\) (not convex)

For convex functions with linear constraints, gradient-based methods always converge to the global optimum.
A. Checking Convexity
- 1D: \(f''(x) \ge 0\)
- Multivariate: the Hessian \(\nabla^2 f(x)\) is positive semidefinite
5. Applications
- Linear Programming (Simplex): logistics, scheduling
- Quadratic Programming: portfolio optimization
- Gradient Methods: ML, curve fitting
- Convex Programs: control systems, regularization
6. Tiny Code
Simple gradient descent to minimize \(f(x, y) = x^2 + y^2\):
double f(double x, double y) { return x*x + y*y; }
void grad(double x, double y, double *gx, double *gy) {
*gx = 2*x; *gy = 2*y;
}
void optimize() {
double x=5, y=3, lr=0.1;
for(int i=0; i<100; i++){
double gx, gy;
grad(x, y, &gx, &gy);
x -= lr * gx;
y -= lr * gy;
}
printf("Min at (%.3f, %.3f)\n", x, y);
}7. Summary
| Algorithm | Domain | Complexity | Notes |
|---|---|---|---|
| Simplex | Linear | Polynomial (average case) | LP solver |
| Gradient Descent | Continuous | \(O(k)\) | Needs step size |
| Convex Methods | Convex | \(O(k \log \frac{1}{\varepsilon})\) | Global optima guaranteed |
Why It Matters
Optimization turns math into decisions. From fitting curves to planning resources, it formalizes trade-offs and efficiency. It’s where computation meets purpose , finding the best in all possible worlds.
“Every algorithm is, at heart, an optimizer , searching for something better.”
Try It Yourself
- Solve a linear program with 2 constraints manually via Simplex.
- Implement gradient descent for \(f(x) = (x - 5)^2 + 2\).
- Add momentum to your gradient descent loop.
- Check convexity by plotting \(f(x) = x^4 - 3x^2\).
- Experiment with learning rates: too small leads to slow convergence; too large can diverge.
Mastering optimization means mastering how algorithms improve themselves , step by step, iteration by iteration.
60. Algebraic Tricks and Transform Techniques
In algorithm design, algebra isn’t just theory , it’s a toolbox for transforming problems. By expressing computations algebraically, we can simplify, accelerate, or generalize solutions. This section surveys common algebraic techniques that turn hard problems into manageable ones.
We’ll explore:
- Algebraic identities and factorizations
- Generating functions and transforms
- Convolution tricks
- Polynomial methods and FFT applications
- Matrix and linear transforms for acceleration
1. Algebraic Identities
These let you rewrite or decompose expressions to reveal structure or reduce complexity.
Classic Forms:
- Difference of squares: \[ a^2 - b^2 = (a-b)(a+b) \]
- Sum of cubes: \[ a^3 + b^3 = (a+b)(a^2 - ab + b^2) \]
- Square of sum: \[ (a+b)^2 = a^2 + 2ab + b^2 \]
Used in dynamic programming, geometry, and optimization when simplifying recurrence terms or constraints.
Example: Transforming \((x+y)^2\) lets you compute both \(x^2 + y^2\) and cross terms efficiently.
2. Generating Functions
A generating function encodes a sequence \(a_0, a_1, a_2, \ldots\) into a formal power series:
\[ G(x) = a_0 + a_1x + a_2x^2 + \ldots \]
They turn recurrence relations and counting problems into algebraic equations.
Example: Fibonacci sequence \[ F(x) = F_0 + F_1x + F_2x^2 + \ldots \] with recurrence \(F_n = F_{n-1} + F_{n-2}\)
Solve algebraically: \[ F(x) = \frac{x}{1 - x - x^2} \]
Applications: combinatorics, probability, counting partitions.
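As a tiny illustration (coin values and amount are made up for the example), multiplying out the generating function \((1 + x^c + x^{2c} + \dots)\) for each coin \(c\) and reading off one coefficient is exactly the classic coin-change count:

```cpp
#include <vector>
#include <cstdio>
using namespace std;

// Coefficient of x^amount in prod_c (1 + x^c + x^{2c} + ...) = # ways to make amount.
long long coin_change_ways(const vector<int>& coins, int amount) {
    vector<long long> coef(amount + 1, 0);
    coef[0] = 1;                               // empty product contributes x^0
    for (int c : coins)
        for (int v = c; v <= amount; v++)
            coef[v] += coef[v - c];            // multiply in the series for coin c
    return coef[amount];
}

int main() {
    printf("%lld\n", coin_change_ways({1, 2, 5}, 10));   // prints 10
}
```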
3. Convolution Tricks
Convolution arises in combining sequences: \[ c_n = (a * b)_n = \sum_{i=0}^{n} a_i b_{n-i} \]
Naive computation: ( O\(n^2\) ) Using Fast Fourier Transform (FFT): ( O\(n \log n\) )
Example: Polynomial multiplication Let \[ A(x) = a_0 + a_1x + a_2x^2, \quad B(x) = b_0 + b_1x + b_2x^2 \] Then ( C(x) = A(x)B(x) ) gives coefficients by convolution.
This trick is used in:
- Large integer multiplication
- Pattern matching (cross-correlation)
- Subset sum acceleration
4. Polynomial Methods
Many algorithmic problems can be represented as polynomials, where coefficients encode combinatorial structure.
A. Polynomial Interpolation
Given ( n+1 ) points, there’s a unique degree-( n ) polynomial passing through them.
Used in error correction, FFT-based reconstruction, and number-theoretic transforms.
Lagrange Interpolation: \[ P(x) = \sum_i y_i \prod_{j \ne i} \frac{x - x_j}{x_i - x_j} \]
B. Root Representation
Solve equations or check identities by working modulo a polynomial. Used in finite fields and coding theory (e.g., Reed-Solomon).
5. Transform Techniques
Transforms convert problems to simpler domains where operations become efficient.
| Transform | Converts | Key Property | Used In |
|---|---|---|---|
| FFT / NTT | Time ↔︎ Frequency | Convolution → Multiplication | Signal, polynomial mult |
| Z-Transform | Sequence ↔︎ Function | Recurrence solving | DSP, control |
| Laplace Transform | Function ↔︎ Algebraic | Diff. eq. → Algebraic eq. | Continuous systems |
| Walsh-Hadamard Transform | Boolean vectors | XOR convolution | Subset sum, SOS DP |
Example: Subset Convolution via FWT
For all subsets ( S ): \[ f'(S) = \sum_{T \subseteq S} f(T) \]
Compute it with the sum-over-subsets (zeta) transform, a close relative of the Fast Walsh-Hadamard Transform (FWHT), in \(O(n \cdot 2^n)\) instead of the naive \(O(3^n)\).
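A minimal C sketch of that sum-over-subsets (zeta) transform, with `f` indexed by subset bitmasks (the function name is illustrative):

```c
// In-place sum-over-subsets: afterwards f[S] = sum of the original f[T] over all T ⊆ S.
// Runs in O(n * 2^n) instead of enumerating all subset pairs in O(3^n).
void sum_over_subsets(long long f[], int n) {
    for (int bit = 0; bit < n; bit++)
        for (int S = 0; S < (1 << n); S++)
            if (S & (1 << bit))
                f[S] += f[S ^ (1 << bit)];
}
```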
6. Matrix Tricks
Matrix algebra enables transformations and compact formulations.
- Matrix exponentiation: solve recurrences in \(O(\log n)\)
- Diagonalization: \(A = P D P^{-1}\), then \(A^k = P D^k P^{-1}\)
- Fast power: speeds up Fibonacci, linear recurrences, Markov chains
Example: Fibonacci
\[ \begin{bmatrix} F_{n+1} \\ F_n \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}^n \begin{bmatrix} 1 \\ 0 \end{bmatrix} \]
7. Tiny Code
Polynomial Multiplication via FFT (Pseudo-C):
// Outline using complex FFT library
fft(A, false);
fft(B, false);
for (int i = 0; i < n; i++)
C[i] = A[i] * B[i];
fft(C, true); // inverse
Matrix Power (Fibonacci):
void matmul(long long A[2][2], long long B[2][2]) {
long long C[2][2] = {{0}};
for (int i=0;i<2;i++)
for (int j=0;j<2;j++)
for (int k=0;k<2;k++)
C[i][j] += A[i][k]*B[k][j];
memcpy(A, C, sizeof(C));
}
void matpow(long long A[2][2], int n) {
long long R[2][2] = {{1,0},{0,1}};
while(n){
if(n&1) matmul(R,A);
matmul(A,A);
n>>=1;
}
memcpy(A, R, sizeof(R));
}
8. Summary
| Technique | Purpose | Speedup |
|---|---|---|
| Algebraic Identities | Simplify expressions | Constant factor |
| Generating Functions | Solve recurrences | Conceptual |
| FFT / Convolution | Combine sequences fast | \(O(n^2) \to O(n \log n)\) |
| Polynomial Interpolation | Reconstruction | \(O(n^2) \to O(n \log^2 n)\) |
| Matrix Tricks | Accelerate recurrences | \(O(n) \to O(\log n)\) |
Why It Matters
Algebra turns computation into structure. By rewriting problems in algebraic form, you reveal hidden symmetries, exploit fast transforms, and find elegant solutions. It’s not magic , it’s the math beneath performance.
“The smartest code is often the one that solves itself on paper first.”
Try It Yourself
- Multiply two polynomials using FFT.
- Represent Fibonacci as a matrix and compute \(F_{100}\).
- Use generating functions to count coin change ways.
- Implement subset sum via Walsh-Hadamard transform.
- Derive a recurrence and solve it algebraically.
Understanding algebraic tricks makes you not just a coder, but a mathematical engineer , bending structure to will.
Chapter 7. Strings and Text Algorithms
61. String Matching (KMP, Z, Rabin-Karp, Boyer-Moore)
String matching is one of the oldest and most fundamental problems in computer science: given a text ( T ) of length ( n ) and a pattern ( P ) of length ( m ), find all positions where ( P ) appears in ( T ).
This section walks you through both naive and efficient algorithms , from the straightforward brute-force method to elegant linear-time solutions like KMP and Z-algorithm, and clever heuristics like Boyer-Moore and Rabin-Karp.
1. Problem Setup
We’re given:
- Text: \(T = t_1 t_2 \ldots t_n\)
- Pattern: \(P = p_1 p_2 \ldots p_m\)

Goal: find all \(i\) such that \[ T[i \ldots i+m-1] = P[1 \ldots m] \]
Naive solution: compare \(P\) with every substring of \(T\). Time complexity: \(O(nm)\).
We’ll now see how to reduce it to \(O(n + m)\) or close.
2. Knuth-Morris-Pratt (KMP)
KMP avoids rechecking characters by precomputing overlaps within the pattern.
It builds a prefix-function (also called failure function), which tells how much to shift when a mismatch happens.
A. Prefix Function
For each position ( i ), compute \(\pi[i]\) = length of longest prefix that’s also a suffix of ( P[1..i] ).
Example: Pattern ababc
| i | P[i] | π[i] |
|---|---|---|
| 1 | a | 0 |
| 2 | b | 0 |
| 3 | a | 1 |
| 4 | b | 2 |
| 5 | c | 0 |
B. Search Phase
Use \(\pi[]\) to skip mismatched prefixes in the text.
Time Complexity: \(O(n + m)\). Space: \(O(m)\).
Tiny Code (C)
void compute_pi(char *p, int m, int pi[]) {
pi[0] = 0;
for (int i = 1, k = 0; i < m; i++) {
while (k > 0 && p[k] != p[i]) k = pi[k-1];
if (p[k] == p[i]) k++;
pi[i] = k;
}
}
void kmp_search(char *t, char *p) {
int n = strlen(t), m = strlen(p);
int pi[m]; compute_pi(p, m, pi);
for (int i = 0, k = 0; i < n; i++) {
while (k > 0 && p[k] != t[i]) k = pi[k-1];
if (p[k] == t[i]) k++;
if (k == m) {
printf("Found at %d\n", i - m + 1);
k = pi[k-1];
}
}
}
3. Z-Algorithm
Z-algorithm computes the Z-array,
where \(Z[i]\) = length of the longest substring starting at \(i\) that matches the prefix of \(P\).
To match \(P\) in \(T\), build the string:
\[ S = P + \# + T \]
Then every \(i\) where \(Z[i] = |P|\) corresponds to a match.
Time: \(O(n + m)\)
Simple and elegant.
Example:
P = "aba", T = "ababa"
S = "aba#ababa"
Z = [0,0,1,0,3,0,3,0,1]
Match at index 0, 2
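The other matchers in this section come with code, so here is a comparable sketch of the standard Z-array computation (the name `z_array` is illustrative, not from the text); running it on `aba#ababa` reproduces the array above:

```c
// z[i] = length of the longest substring starting at i that matches a prefix of s.
// z[0] is left as 0 by convention here (some texts define it as n).
void z_array(const char *s, int n, int z[]) {
    z[0] = 0;
    for (int i = 1, l = 0, r = 0; i < n; i++) {
        z[i] = 0;
        if (i < r) z[i] = (r - i < z[i - l]) ? (r - i) : z[i - l];  // reuse the [l, r) box
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) z[i]++;      // extend explicitly
        if (i + z[i] > r) { l = i; r = i + z[i]; }
    }
}
```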
4. Rabin-Karp (Rolling Hash)
Instead of comparing strings character-by-character, compute a hash for each window in ( T ), and compare hashes.
\[ h(s_1s_2\ldots s_m) = (s_1b^{m-1} + s_2b^{m-2} + \ldots + s_m) \bmod M \]
Use a rolling hash to update in ( O(1) ) per shift.
Time: average \(O(n + m)\), worst \(O(nm)\). Good for multiple-pattern search.
Tiny Code (Rolling Hash)
#define B 256
#define M 101
void rabin_karp(char *t, char *p) {
int n = strlen(t), m = strlen(p);
int h = 1, pHash = 0, tHash = 0;
for (int i = 0; i < m-1; i++) h = (h*B) % M;
for (int i = 0; i < m; i++) {
pHash = (B*pHash + p[i]) % M;
tHash = (B*tHash + t[i]) % M;
}
for (int i = 0; i <= n-m; i++) {
if (pHash == tHash && strncmp(&t[i], p, m) == 0)
printf("Found at %d\n", i);
if (i < n-m)
tHash = (B*(tHash - t[i]*h) + t[i+m]) % M;
if (tHash < 0) tHash += M;
}
}
5. Boyer-Moore (Heuristic Skipping)
Boyer-Moore compares from right to left and uses two heuristics:
Bad Character Rule: when a mismatch occurs at text character \(T[i]\), shift the pattern so that the next occurrence of \(T[i]\) in \(P\) aligns with it (or skip past it entirely if it does not occur in \(P\)).
Good Suffix Rule: shift the pattern so that a suffix of the matched portion aligns with another occurrence of that suffix inside \(P\).
Time: \(O(n/m)\) on average. Practical and fast, especially for English text.
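A full Boyer-Moore implementation needs both rules; as a simpler hedged sketch, here is the Horspool variant, which keeps only a bad-character table and already achieves the skipping behavior described above:

```c
#include <stdio.h>
#include <string.h>

/* Boyer-Moore-Horspool: right-to-left compare, shift by the last window character. */
void horspool(const char *t, const char *p) {
    int n = strlen(t), m = strlen(p);
    if (m == 0 || m > n) return;
    int shift[256];
    for (int c = 0; c < 256; c++) shift[c] = m;                 /* default: skip whole pattern */
    for (int i = 0; i < m - 1; i++) shift[(unsigned char)p[i]] = m - 1 - i;
    int i = 0;
    while (i <= n - m) {
        int j = m - 1;
        while (j >= 0 && p[j] == t[i + j]) j--;                 /* compare right to left */
        if (j < 0) printf("Found at %d\n", i);
        i += shift[(unsigned char)t[i + m - 1]];                /* bad-character skip */
    }
}
```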
6. Summary
| Algorithm | Time | Space | Idea | Best For |
|---|---|---|---|---|
| Naive | \(O(nm)\) | \(O(1)\) | Direct compare | Simple cases |
| KMP | \(O(n+m)\) | \(O(m)\) | Prefix overlap | General use |
| Z | \(O(n+m)\) | \(O(n+m)\) | Prefix matching | Pattern prep |
| Rabin-Karp | \(O(n+m)\) avg | \(O(1)\) | Hashing | Multi-pattern |
| Boyer-Moore | \(O(n/m)\) avg | \(O(m+\sigma)\) | Right-to-left skip | Long texts |
Why It Matters
String matching powers text editors, DNA search, spam filters, and search engines. These algorithms show how structure and clever preprocessing turn brute force into elegance.
“To find is human, to match efficiently is divine.”
Try It Yourself
- Implement KMP and print all matches in a sentence.
- Use Rabin-Karp to find multiple keywords.
- Compare running times on large text files.
- Modify KMP for case-insensitive matching.
- Visualize prefix function computation step-by-step.
By mastering these, you’ll wield the foundation of pattern discovery , the art of finding order in streams of symbols.
62. Multi-Pattern Search (Aho-Corasick)
So far, we’ve matched one pattern against a text. But what if we have many patterns , say, a dictionary of keywords , and we want to find all occurrences of all patterns in a single pass?
That’s where the Aho-Corasick algorithm shines. It builds a trie with failure links, turning multiple patterns into one efficient automaton. Think of it as “KMP for many words at once.”
1. Problem Setup
Given:
- A text \(T\) of length \(n\)
- A set of patterns \(\{P_1, P_2, \ldots, P_k\}\) with total length \(m = \sum |P_i|\)
Goal: find all occurrences of every \(P_i\) in ( T ).
Naive solution: Run KMP for each pattern , ( O(kn) )
Better idea: Merge all patterns into a trie, and use failure links to transition on mismatches.
Aho-Corasick achieves O(n + m + z), where ( z ) = number of matches reported.
2. Trie Construction
Each pattern is inserted into a trie node-by-node.
Example Patterns:
he, she, his, hers
Trie:
(root)
├─ h ─ e*
│ └─ r ─ s*
├─ s ─ h ─ e*
└─ h ─ i ─ s*
Each node may mark an output (end of pattern).
3. Failure Links
Failure link of a node points to the longest proper suffix that’s also a prefix in the trie.
These links let us “fall back” like KMP.
When mismatch happens, follow failure link to find next possible match.
Building Failure Links (BFS)
Root’s failure = null
Children of root → failure = root
BFS over nodes:
- For each edge \((u, c) \to v\): follow failure links from \(u\) until you reach a node \(f\) that has an outgoing edge labeled \(c\); then set \(v.\text{fail} = f.\text{child}[c]\) (or the root if no such node exists).
Example
For “he”, “she”, “his”, “hers”:
fail("he") = root-fail("hers") = "rs"path invalid → fallback to"s"if exists So failure links connect partial suffixes.
4. Matching Phase
Now we can process the text in one pass:
state = root
for each character c in text:
while state has no child c and state != root:
state = state.fail
if state has child c:
state = state.child[c]
else:
state = root
if state.output:
report matches at this position
Each transition costs O(1) amortized. No backtracking , fully linear time.
5. Example Walkthrough
Patterns: he, she, his, hers Text: ahishers
At each character:
a → root (no match)
h → go to h
i → go to hi
s → go to his → output "his"
h → fail link to s, then read h → sh
e → she → output "she" (and "he" via its failure link)
r → fail link to he, then read r → her
s → hers → output "hers"
Outputs: "his", "she", "he", "hers"
6. Tiny Code (C Implementation Sketch)
#define ALPHA 26
typedef struct Node {
struct Node *next[ALPHA];
struct Node *fail;
int out;
} Node;
Node* newNode() {
Node *n = calloc(1, sizeof(Node));
return n;
}
void insert(Node *root, char *p) {
for (int i = 0; p[i]; i++) {
int c = p[i] - 'a';
if (!root->next[c]) root->next[c] = newNode();
root = root->next[c];
}
root->out = 1;
}
void build_failures(Node *root) {
Node *q[10000];
int front=0, back=0;
root->fail = root;
q[back++] = root;
while (front < back) {
Node *u = q[front++];
for (int c=0; c<ALPHA; c++) {
Node *v = u->next[c];
if (!v) continue;
Node *f = u->fail;
while (f != root && !f->next[c]) f = f->fail;
if (f->next[c] && f->next[c] != v) v->fail = f->next[c];
else v->fail = root;
if (v->fail->out) v->out = 1;
q[back++] = v;
}
}
}
7. Complexity
| Phase | Time | Space |
|---|---|---|
| Trie Build | ( O(m) ) | ( O(m) ) |
| Failure Links | ( O(m) ) | ( O(m) ) |
| Search | ( O(n + z) ) | ( O(1) ) |
Total: O(n + m + z)
8. Summary
| Step | Purpose |
|---|---|
| Trie | Merge patterns |
| Fail Links | Handle mismatches |
| Outputs | Collect matches |
| BFS | Build efficiently |
| One Pass | Match all patterns |
Why It Matters
Aho-Corasick is the core of:
- Spam filters
- Intrusion detection (e.g., Snort IDS)
- Keyword search in compilers
- DNA sequence scanners

It’s a masterclass in blending automata theory with practical efficiency.
“Why search one word at a time when your algorithm can read the whole dictionary?”
Try It Yourself
- Build an automaton for words {“he”, “she”, “hers”} and trace it manually.
- Modify code for uppercase letters.
- Extend to report overlapping matches.
- Measure runtime vs. naive multi-search.
- Visualize the failure links in a graph.
Once you grasp Aho-Corasick, you’ll see pattern search not as a loop , but as a machine that reads and recognizes.
63. Suffix Structures (Suffix Array, Suffix Tree, LCP)
Suffix-based data structures are among the most powerful tools in string algorithms. They enable fast searching, substring queries, pattern matching, and lexicographic operations , all from one fundamental idea:
Represent all suffixes of a string in a structured form.
In this section, we explore three key constructs:
- Suffix Array (SA): lexicographically sorted suffix indices
- Longest Common Prefix (LCP) array: shared prefix lengths between neighbors
- Suffix Tree: compressed trie of all suffixes

Together, they power many advanced algorithms in text processing, bioinformatics, and compression.
1. Suffix Array (SA)
A suffix array stores all suffixes of a string in lexicographic order, represented by their starting indices.
Example: String banana$ All suffixes:
| Index | Suffix |
|---|---|
| 0 | banana$ |
| 1 | anana$ |
| 2 | nana$ |
| 3 | ana$ |
| 4 | na$ |
| 5 | a$ |
| 6 | $ |
Sort them:
| Sorted Order | Suffix | Index |
|---|---|---|
| 0 | $ | 6 |
| 1 | a$ | 5 |
| 2 | ana$ | 3 |
| 3 | anana$ | 1 |
| 4 | banana$ | 0 |
| 5 | na$ | 4 |
| 6 | nana$ | 2 |
Suffix Array: [6, 5, 3, 1, 0, 4, 2]
Construction (Prefix Doubling)
We iteratively sort suffixes by first 2ⁱ characters, using radix sort on pairs of ranks.
Steps:
- Assign initial rank by character.
- Sort by (rank[i], rank[i+k]).
- Repeat doubling \(k \leftarrow 2k\) until all ranks distinct.
Time Complexity: \(O(n \log n)\). Space: \(O(n)\).
Tiny Code (C, Sketch)
typedef struct { int idx, rank[2]; } Suffix;
int cmp(const void *a, const void *b) {
    const Suffix *x = a, *y = b;
    return (x->rank[0] == y->rank[0]) ? (x->rank[1] - y->rank[1])
                                      : (x->rank[0] - y->rank[0]);
}
void buildSA(char *s, int n, int sa[]) {
    Suffix suf[n];
    for (int i = 0; i < n; i++) {
        suf[i].idx = i;
        suf[i].rank[0] = s[i];
        suf[i].rank[1] = (i + 1 < n) ? s[i+1] : -1;
    }
    for (int k = 2; k < 2*n; k *= 2) {
        qsort(suf, n, sizeof(Suffix), cmp);      // now sorted by the first k characters
        int r = 0, rank[n];
        rank[suf[0].idx] = 0;
        for (int i = 1; i < n; i++) {
            if (suf[i].rank[0] != suf[i-1].rank[0] || suf[i].rank[1] != suf[i-1].rank[1]) r++;
            rank[suf[i].idx] = r;
        }
        for (int i = 0; i < n; i++) {
            suf[i].rank[0] = rank[suf[i].idx];
            // pair with the rank k positions ahead, ready for the next doubling round
            suf[i].rank[1] = (suf[i].idx + k < n) ? rank[suf[i].idx + k] : -1;
        }
    }
    for (int i = 0; i < n; i++) sa[i] = suf[i].idx;
}
2. Longest Common Prefix (LCP)
The LCP array stores the length of the longest common prefix between consecutive suffixes in SA order.
Example: banana$
| SA | Suffix | LCP |
|---|---|---|
| 6 | $ | 0 |
| 5 | a$ | 0 |
| 3 | ana$ | 1 |
| 1 | anana$ | 3 |
| 0 | banana$ | 0 |
| 4 | na$ | 0 |
| 2 | nana$ | 2 |
So LCP = [0,0,1,3,0,0,2]
Kasai’s Algorithm (Build in O(n))
We compute LCP in one pass using inverse SA:
void buildLCP(char *s, int n, int sa[], int lcp[]) {
int rank[n];
for (int i=0;i<n;i++) rank[sa[i]]=i;
int k=0;
for (int i=0;i<n;i++) {
if (rank[i]==n-1) { k=0; continue; }
int j = sa[rank[i]+1];
while (i+k<n && j+k<n && s[i+k]==s[j+k]) k++;
lcp[rank[i]]=k;
if (k>0) k--;
}
}
Time Complexity: \(O(n)\)
3. Suffix Tree
A suffix tree is a compressed trie of all suffixes.
Each edge holds a substring interval, not individual characters. This gives:
- Construction in \(O(n)\) (Ukkonen’s algorithm)
- Pattern search in \(O(m)\)
- Many advanced uses (e.g., longest repeated substring)

Example: String: banana$

Suffix tree edges:
(root)
├─ b[0:0] → ...
├─ a[1:1] → ...
├─ n[2:2] → ...
Edges compress consecutive letters into intervals like [start:end].
Comparison
| Structure | Space | Build Time | Search |
|---|---|---|---|
| Suffix Array | \(O(n)\) | \(O(n \log n)\) | \(O(m \log n)\) |
| LCP Array | \(O(n)\) | \(O(n)\) | Range queries |
| Suffix Tree | \(O(n)\) | \(O(n)\) | \(O(m)\) |
Suffix Array + LCP ≈ compact Suffix Tree.
4. Applications
- Substring search - binary search in SA
- Longest repeated substring - max(LCP)
- Lexicographic order - direct from SA
- Distinct substrings count = \(\frac{n(n+1)}{2} - \sum_i \text{LCP}[i]\) (see the sketch after this list)
- Pattern frequency - range query in SA using LCP
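A minimal sketch of the distinct-substring count from the list above, assuming `lcp[0..n-2]` holds the overlaps between consecutive suffixes in SA order:

```c
// Every suffix contributes (n - sa[i]) substrings; consecutive suffixes in sorted order
// share lcp[i] leading characters, which would otherwise be counted twice.
long long distinct_substrings(int n, const int lcp[]) {
    long long total = (long long)n * (n + 1) / 2;   // all substrings, with duplicates
    for (int i = 0; i + 1 < n; i++) total -= lcp[i]; // subtract repeated prefixes
    return total;
}
```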
5. Tiny Code (Search via SA)
int searchSA(char *t, int n, char *p, int sa[]) {
int l=0, r=n-1, m=strlen(p);
while (l <= r) {
int mid = (l+r)/2;
int cmp = strncmp(t+sa[mid], p, m);
if (cmp==0) return sa[mid];
else if (cmp<0) l=mid+1;
else r=mid-1;
}
return -1;
}
6. Summary
| Concept | Purpose | Complexity |
|---|---|---|
| Suffix Array | Sorted suffix indices | \(O(n \log n)\) |
| LCP Array | Adjacent suffix overlap | ( O(n) ) |
| Suffix Tree | Compressed trie of suffixes | ( O(n) ) |
Together they form the core of advanced string algorithms.
Why It Matters
Suffix structures reveal hidden order in strings. They turn raw text into searchable, analyzable data , ideal for compression, search engines, and DNA analysis.
“All suffixes, perfectly sorted , the DNA of text.”
Try It Yourself
- Build suffix array for banana$ by hand.
- Search multiple patterns using binary search on SA.
- Count distinct substrings from SA + LCP.
- Compare SA-based vs. tree-based search performance.
Mastering suffix structures equips you to tackle problems that were once “too big” for brute force , now solvable with elegance and order.
64. Palindromes and Periodicity (Manacher)
Palindromes are symmetric strings that read the same forwards and backwards , like “level”, “racecar”, or “madam”. They arise naturally in text analysis, bioinformatics, and even in data compression.
This section introduces efficient algorithms to detect and analyze palindromic structure and periodicity in strings, including the legendary Manacher’s Algorithm, which finds all palindromic substrings in linear time.
1. What Is a Palindrome?
A string ( S ) is a palindrome if: \[ S[i] = S[n - i + 1] \quad \text{for all } i \]
Examples:
"abba"is even-length palindrome-"aba"is odd-length palindrome A string may contain many palindromic substrings , our goal is to find all centers efficiently.
2. Naive Approach
For each center (between characters or at characters), expand outward while characters match.
for each center c:
    expand left, right while S[l] == S[r]

Complexity: \(O(n^2)\), too slow for large strings.
We need something faster , that’s where Manacher’s Algorithm steps in.
3. Manacher’s Algorithm (O(n))
Manacher’s Algorithm finds the radius of the longest palindrome centered at each position in linear time.
It cleverly reuses previous computations using mirror symmetry and a current right boundary.
Step-by-Step
Preprocess string to handle even-length palindromes: Insert
#between characters.Example:
S = "abba" T = "^#a#b#b#a#$"(
^and$are sentinels)Maintain:
C: center of rightmost palindrome -R: right boundary -P[i]: palindrome radius ati
For each position
i:- mirror position
mirror = 2*C - i- initializeP[i] = min(R - i, P[mirror])- expand aroundiwhile characters match - if new palindrome extends pastR, updateCandR
- mirror position
The maximum value of
P[i]gives the longest palindrome.
Example
S = "abba"
T = "^#a#b#b#a#$"
P = [0,0,1,0,1,4,1,0,1,0,0]
Longest radius = 4 → "abba"
Tiny Code (C Implementation)
int manacher(char *s) {
int n = strlen(s);
char t[2*n + 4];   // ^, 2n+1 transformed characters, $, and the terminating '\0'
int p[2*n + 3];
int m = 0;
t[m++] = '^';
for (int i=0;i<n;i++) {
t[m++] = '#';
t[m++] = s[i];
}
t[m++] = '#'; t[m++] = '$';
t[m] = '\0';
int c = 0, r = 0, maxLen = 0;
for (int i=1; i<m-1; i++) {
int mirror = 2*c - i;
if (i < r)
p[i] = (r - i < p[mirror]) ? (r - i) : p[mirror];
else p[i] = 0;
while (t[i + 1 + p[i]] == t[i - 1 - p[i]])
p[i]++;
if (i + p[i] > r) {
c = i;
r = i + p[i];
}
if (p[i] > maxLen) maxLen = p[i];
}
return maxLen;
}
Time Complexity: \(O(n)\). Space: \(O(n)\).
4. Periodicity and Repetition
A string ( S ) has a period ( p ) if: \[ S[i] = S[i + p] \text{ for all valid } i \]
Example: abcabcabc has period 3 (abc).
Checking Periodicity:
- Build prefix function (as in KMP).
- Let ( n = |S| ), \(p = n - \pi[n-1]\).
- If \(n \mod p = 0\), period = ( p ).
Example:
S = "ababab"
π = [0,0,1,2,3,4]
p = 6 - 4 = 2
6 mod 2 = 0 → periodic
Tiny Code (Check Periodicity)
int period(char *s) {
int n = strlen(s), pi[n];
pi[0]=0;
for(int i=1,k=0;i<n;i++){
while(k>0 && s[k]!=s[i]) k=pi[k-1];
if(s[k]==s[i]) k++;
pi[i]=k;
}
int p = n - pi[n-1];
return (n % p == 0) ? p : n;
}
5. Applications
- Palindrome Queries: is substring \(S[l:r]\) a palindrome? (precompute radii)
- Longest Palindromic Substring
- DNA Symmetry Analysis
- Pattern Compression / Period Detection
- String Regularity Tests
6. Summary
| Concept | Purpose | Time |
|---|---|---|
| Naive Expand | Simple palindrome check | \(O(n^2)\) |
| Manacher | Longest palindromic substring | ( O(n) ) |
| KMP Prefix | Period detection | ( O(n) ) |
Why It Matters
Palindromes reveal hidden symmetries. Manacher’s algorithm is a gem , a linear-time mirror-based solution to a quadratic problem.
“In every word, there may hide a reflection.”
Try It Yourself
- Run Manacher’s algorithm on "abacdfgdcaba".
- Use prefix function to find smallest period.
- Combine both to find palindromic periodic substrings.
- Compare runtime vs. naive expand method.
Understanding palindromes and periodicity teaches how structure emerges from repetition , a central theme in all of algorithmic design.
65. Edit Distance and Alignment
Edit distance measures how different two strings are , the minimal number of operations needed to turn one into the other. It’s a cornerstone of spell checking, DNA sequence alignment, plagiarism detection, and fuzzy search.
The most common form is the Levenshtein distance, using:
- Insertion (add a character)
- Deletion (remove a character)
- Substitution (replace a character)

We’ll also touch on alignment, which generalizes this idea with custom scoring and penalties.
1. Problem Definition
Given two strings ( A ) and ( B ), find the minimum number of edits to convert \(A \to B\).
For example, take \(A\) = “kitten” and \(B\) = “sitting”.
One optimal sequence:
kitten → sitten (substitute 'k'→'s')
sitten → sittin (substitute 'e'→'i')
sittin → sitting (insert 'g')
So edit distance = 3.
2. Dynamic Programming Solution
Let \(dp[i][j]\) be the minimum edits to convert \(A[0..i-1] \to B[0..j-1]\).
Recurrence: \[ dp[i][j] = \begin{cases} dp[i-1][j-1], & \text{if } A[i-1] = B[j-1], \\ 1 + \min\big(dp[i-1][j],\, dp[i][j-1],\, dp[i-1][j-1]\big), & \text{otherwise} \end{cases} \]
Where:
- \(dp[i-1][j]\): delete from \(A\)
- \(dp[i][j-1]\): insert into \(A\)
- \(dp[i-1][j-1]\): substitute
Base cases: \[ dp[0][j] = j,\quad dp[i][0] = i \]
Time complexity: \(O(|A||B|)\)
Example
A = kitten, B = sitting
| “” | s | i | t | t | i | n | g | |
|---|---|---|---|---|---|---|---|---|
| “” | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| k | 1 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| i | 2 | 2 | 1 | 2 | 3 | 4 | 5 | 6 |
| t | 3 | 3 | 2 | 1 | 2 | 3 | 4 | 5 |
| t | 4 | 4 | 3 | 2 | 1 | 2 | 3 | 4 |
| e | 5 | 5 | 4 | 3 | 2 | 2 | 3 | 4 |
| n | 6 | 6 | 5 | 4 | 3 | 3 | 2 | 3 |
Edit distance = 3
Tiny Code (C)
#include <stdio.h>
#include <string.h>
#define MIN3(a,b,c) ((a<b)?((a<c)?a:c):((b<c)?b:c))
int edit_distance(char *A, char *B) {
int n = strlen(A), m = strlen(B);
int dp[n+1][m+1];
for (int i=0;i<=n;i++) dp[i][0]=i;
for (int j=0;j<=m;j++) dp[0][j]=j;
for (int i=1;i<=n;i++)
for (int j=1;j<=m;j++)
if (A[i-1]==B[j-1])
dp[i][j]=dp[i-1][j-1];
else
dp[i][j]=1+MIN3(dp[i-1][j], dp[i][j-1], dp[i-1][j-1]);
return dp[n][m];
}
int main() {
printf("%d\n", edit_distance("kitten","sitting")); // 3
}
3. Space Optimization
We only need the previous row to compute the current row.
So,
Space complexity: \(O(\min(|A|, |B|))\)
int edit_distance_opt(char *A, char *B) {
int n=strlen(A), m=strlen(B);
int prev[m+1], curr[m+1];
for(int j=0;j<=m;j++) prev[j]=j;
for(int i=1;i<=n;i++){
curr[0]=i;
for(int j=1;j<=m;j++){
if(A[i-1]==B[j-1]) curr[j]=prev[j-1];
else curr[j]=1+MIN3(prev[j], curr[j-1], prev[j-1]);
}
memcpy(prev,curr,sizeof(curr));
}
return prev[m];
}
4. Alignment
Alignment shows which characters correspond between two strings. Used in bioinformatics (e.g., DNA sequence alignment).
Each operation has a cost:
- Match: 0
- Mismatch: 1
- Gap (insert/delete): 1

We fill the DP table similarly, but track choices to trace back the alignment.
Example Alignment
A: kitten-
B: sitting
We can visualize the transformation path by backtracking dp table.
Scoring Alignment (General Form)
We can generalize:
\[
dp[i][j] = \min
\begin{cases}
dp[i-1][j-1] + \text{cost}(A_i, B_j) \\
dp[i-1][j] + \text{gap} \\
dp[i][j-1] + \text{gap}
\end{cases}
\]
Used in Needleman-Wunsch (global alignment) and Smith-Waterman (local alignment).
5. Variants
- Damerau-Levenshtein: adds transposition (swap adjacent characters)
- Hamming Distance: only substitutions, equal-length strings
- Weighted Distance: different operation costs
- Local Alignment: only best matching substrings
6. Summary
| Method | Operations | Time | Use |
|---|---|---|---|
| Levenshtein | insert, delete, replace | (O(nm)) | Spell check, fuzzy search |
| Hamming | substitution only | (O(n)) | DNA, binary strings |
| Alignment (Needleman-Wunsch) | with scoring | (O(nm)) | Bioinformatics |
| Local Alignment (Smith-Waterman) | best substring | (O(nm)) | DNA regions |
Why It Matters
Edit distance transforms “difference” into data. It quantifies how far apart two strings are, enabling flexible, robust comparisons.
“Similarity isn’t perfection , it’s the cost of becoming alike.”
Try It Yourself
- Compute edit distance between “intention” and “execution”.
- Trace back operations to show alignment.
- Modify costs (insertion=2, deletion=1, substitution=2) and compare results.
- Implement Hamming distance for equal-length strings.
- Explore Smith-Waterman for longest common substring.
Once you master edit distance, you can build tools that understand typos, align genomes, and search imperfectly , perfectly.
66. Compression (Huffman, Arithmetic, LZ77, BWT)
Compression algorithms let us encode information efficiently, reducing storage or transmission cost without losing meaning. They turn patterns and redundancy into shorter representations , the essence of data compression.
This section introduces the key families of lossless compression algorithms that form the backbone of formats like ZIP, PNG, and GZIP.
We’ll explore:
- Huffman Coding (prefix-free variable-length codes)
- Arithmetic Coding (fractional interval encoding)
- LZ77 / LZ78 (dictionary-based methods)
- Burrows-Wheeler Transform (BWT) (reversible sorting transform)
1. Huffman Coding
Huffman coding assigns shorter codes to frequent symbols, and longer codes to rare ones , achieving optimal compression among prefix-free codes.
A. Algorithm
1. Count frequencies of all symbols.
2. Build a min-heap of nodes (symbol, freq).
3. While the heap has more than one node:
   - Extract the two smallest nodes a, b.
   - Create a new node with freq = a.freq + b.freq and children a, b.
   - Push it back into the heap.
4. Assign 0 to left edges and 1 to right edges.
5. Traverse the tree and collect the codes.
Each symbol gets a unique prefix code (no code is prefix of another).
B. Example
Text: ABRACADABRA
Frequencies:
| Symbol | Count |
|---|---|
| A | 5 |
| B | 2 |
| R | 2 |
| C | 1 |
| D | 1 |
Building tree gives codes like:
A: 0
B: 101
R: 100
C: 1110
D: 1111
Encoded text: 0 101 100 0 1110 0 1111 0 101 100 0 Compression achieved!
Tiny Code (C, Sketch)
typedef struct Node {
char ch;
int freq;
struct Node *left, *right;
} Node;
Use a min-heap (priority queue) to build the tree, then traverse it recursively to print the codewords.
Complexity: \(O(n \log n)\)
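To make the sketch concrete, here is a small self-contained version that repeatedly merges the two lowest-frequency nodes (a simple \(O(k^2)\) scan instead of a heap, for brevity); the exact codewords may differ from the table above depending on tie-breaking:

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct Node { char ch; long freq; struct Node *left, *right; } Node;

static Node *make(char ch, long freq, Node *l, Node *r) {
    Node *n = malloc(sizeof(Node));
    n->ch = ch; n->freq = freq; n->left = l; n->right = r;
    return n;
}

/* Repeatedly merge the two smallest nodes; a min-heap would make this O(k log k). */
static Node *build_tree(Node *nodes[], int k) {
    while (k > 1) {
        int a = 0, b = 1;
        if (nodes[b]->freq < nodes[a]->freq) { a = 1; b = 0; }
        for (int i = 2; i < k; i++) {
            if (nodes[i]->freq < nodes[a]->freq) { b = a; a = i; }
            else if (nodes[i]->freq < nodes[b]->freq) b = i;
        }
        Node *merged = make(0, nodes[a]->freq + nodes[b]->freq, nodes[a], nodes[b]);
        nodes[a] = merged;          /* replace the smaller slot with the merged node */
        nodes[b] = nodes[k - 1];    /* move the last node into the other slot */
        k--;
    }
    return nodes[0];
}

static void print_codes(Node *t, char *buf, int depth) {
    if (!t->left && !t->right) { buf[depth] = '\0'; printf("%c: %s\n", t->ch, buf); return; }
    buf[depth] = '0'; print_codes(t->left, buf, depth + 1);
    buf[depth] = '1'; print_codes(t->right, buf, depth + 1);
}

int main(void) {
    /* frequencies for ABRACADABRA */
    Node *nodes[5] = { make('A',5,0,0), make('B',2,0,0), make('R',2,0,0),
                       make('C',1,0,0), make('D',1,0,0) };
    char buf[32];
    print_codes(build_tree(nodes, 5), buf, 0);
    return 0;
}
```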
2. Arithmetic Coding
Instead of mapping symbols to bit strings, arithmetic coding maps the entire message to a single number in [0,1).
We start with interval ([0,1)), and iteratively narrow it based on symbol probabilities.
Example
Symbols: {A: 0.5, B: 0.3, C: 0.2} Message: ABC
Intervals:
Start: [0, 1)
A → [0, 0.5)
B → [0.25, 0.4)
C → [0.37, 0.4)
Final code = any number in [0.37, 0.4) (e.g. 0.38)
Decoding reverses this process.
Advantage: achieves near-optimal entropy compression. Used in: JPEG2000, H.264
Time Complexity: ( O(n) )
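A tiny sketch of the interval narrowing for this three-symbol alphabet (cumulative ranges hard-coded; real coders also renormalize to avoid running out of precision):

```c
#include <stdio.h>

/* Toy alphabet A, B, C with probabilities 0.5, 0.3, 0.2 (cumulative lows 0, 0.5, 0.8). */
void encode_interval(const char *msg) {
    double cumlow[] = {0.0, 0.5, 0.8}, prob[] = {0.5, 0.3, 0.2};
    double low = 0.0, high = 1.0;
    for (int i = 0; msg[i]; i++) {
        int s = msg[i] - 'A';                   /* works for this toy alphabet only */
        double range = high - low;
        high = low + range * (cumlow[s] + prob[s]);
        low  = low + range * cumlow[s];
        printf("%c -> [%.4f, %.4f)\n", msg[i], low, high);
    }
    printf("any number in [%.4f, %.4f) encodes \"%s\"\n", low, high, msg);
}

int main(void) { encode_interval("ABC"); return 0; }   /* ends at [0.37, 0.40) */
```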
3. LZ77 (Sliding Window Compression)
LZ77 replaces repeated substrings with back-references (offset, length, next_char) pointing into a sliding window.
Example
Text: abcabcabcx
Window slides; when abc repeats:
(0,0,'a'), (0,0,'b'), (0,0,'c'),
(3,6,'x')   // copy 6 chars starting 3 back (the copy may overlap itself), then 'x'
So sequence is compressed as references to earlier substrings.
Used in: DEFLATE (ZIP, GZIP), PNG
Time: ( O(n) ), Space: proportional to window size.
Tiny Code (Conceptual)
struct Token { int offset, length; char next; };
Search the previous window for the longest match before emitting each token.
4. LZ78 (Dictionary-Based)
Instead of sliding window, LZ78 builds an explicit dictionary of substrings.
Algorithm:
- Start with an empty dictionary.
- Read the input, find the longest prefix already in the dictionary.
- Output (index, next_char) and insert the new entry.

Example:

Input: ABAABABAABAB
Output: (0,A), (0,B), (1,A), (2,A), (4,A), (4,B)
Used in: LZW (GIF, TIFF)
5. Burrows-Wheeler Transform (BWT)
BWT is not compression itself , it permutes text to cluster similar characters, making it more compressible by run-length or Huffman coding.
Steps
- Generate all rotations of string.
- Sort them lexicographically.
- Take last column as output.
Example: banana$
| Rotations | Sorted |
|---|---|
| banana$ | $banana |
| anana$b | a$banan |
| nana$ba | ana$ban |
| ana$ban | anana$b |
| na$bana | banana$ |
| a$banan | na$bana |
| $banana | nana$ba |

Last column: annb$aa, so BWT("banana$") = "annb$aa"
Reversible with index of original row.
Used in: bzip2, FM-index (bioinformatics)
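A naive sketch that computes the transform exactly as described, by sorting rotation start indices (fine for small inputs; production implementations derive BWT from a suffix array):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static const char *G;   /* text shared with the comparator (qsort has no context arg) */
static int N;

static int cmp_rot(const void *a, const void *b) {
    int i = *(const int *)a, j = *(const int *)b;
    for (int k = 0; k < N; k++) {                 /* compare rotations character by character */
        char ci = G[(i + k) % N], cj = G[(j + k) % N];
        if (ci != cj) return ci - cj;
    }
    return 0;
}

void bwt(const char *s, char *out) {
    N = strlen(s); G = s;
    int *rot = malloc(N * sizeof(int));
    for (int i = 0; i < N; i++) rot[i] = i;
    qsort(rot, N, sizeof(int), cmp_rot);          /* sort rotation start indices */
    for (int i = 0; i < N; i++)
        out[i] = s[(rot[i] + N - 1) % N];          /* last column = char preceding each rotation */
    out[N] = '\0';
    free(rot);
}

int main(void) {
    char out[16];
    bwt("banana$", out);
    printf("%s\n", out);    /* prints annb$aa */
    return 0;
}
```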
6. Summary
| Algorithm | Idea | Complexity | Use |
|---|---|---|---|
| Huffman | Variable-length prefix codes | \(O(n \log n)\) | General compression |
| Arithmetic | Interval encoding | \(O(n)\) | Near-optimal entropy |
| LZ77 | Sliding window matches | \(O(n)\) | ZIP, PNG |
| LZ78 | Dictionary building | \(O(n)\) | GIF, TIFF |
| BWT | Permute for clustering | \(O(n \log n)\) | bzip2 |
Why It Matters
Compression algorithms reveal structure in data , they exploit patterns that humans can’t see. They’re also a window into information theory, showing how close we can get to the entropy limit.
“To compress is to understand , every bit saved is a pattern found.”
Try It Yourself
- Build a Huffman tree for MISSISSIPPI.
- Apply BWT and observe clustering of symbols.
- Compare Huffman and Arithmetic outputs on same input.
- Explore DEFLATE format combining LZ77 + Huffman.
Understanding compression means learning to see redundancy , the key to efficient storage, transmission, and understanding itself.
67. Cryptographic Hashes and Checksums
In algorithms, hashing helps us map data to fixed-size values. But when used for security and verification, hashing becomes a cryptographic tool. This section explores cryptographic hashes and checksums , algorithms that verify integrity, detect corruption, and secure data.
We’ll look at:
- Simple checksums (parity, CRC)
- Cryptographic hash functions (MD5, SHA family, BLAKE3)
- Properties like collision resistance and preimage resistance
- Practical uses in verification, signing, and storage
1. Checksums
Checksums are lightweight methods to detect accidental errors in data (not secure against attackers). They’re used in filesystems, networking, and storage to verify integrity.
A. Parity Bit
Adds one bit to make total 1s even or odd. Used in memory or communication to detect single-bit errors.
Example: Data = 1011 → has three 1s. Add parity bit 1 to make total 4 (even parity).
Limitation: Only detects odd number of bit errors.
B. Modular Sum (Simple Checksum)
Sum all bytes (mod 256 or 65536).
Tiny Code (C)
uint8_t checksum(uint8_t *data, int n) {
uint32_t sum = 0;
for (int i = 0; i < n; i++) sum += data[i];
return (uint8_t)(sum % 256);
}
Use: Simple file or packet validation.
C. CRC (Cyclic Redundancy Check)
CRCs treat bits as coefficients of a polynomial. Divide by a generator polynomial, remainder = CRC code.
Used in Ethernet, ZIP, and PNG.
Example: CRC-32, CRC-16.
Fast hardware and table-driven implementations available.
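As a concrete sketch, here is the bitwise form of CRC-32 with the reflected IEEE polynomial 0xEDB88320 (the variant used by ZIP and PNG); table-driven versions are faster but compute the same value:

```c
#include <stdint.h>
#include <stddef.h>

uint32_t crc32(const uint8_t *data, size_t n) {
    uint32_t crc = 0xFFFFFFFFu;                       /* initial value */
    for (size_t i = 0; i < n; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++)                   /* one bit of polynomial division per step */
            crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320u : 0u);
    }
    return ~crc;                                      /* final XOR */
}
```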
Key Property:
- Detects most burst errors
- Not cryptographically secure
2. Cryptographic Hash Functions
A cryptographic hash function ( h(x) ) maps any input to a fixed-size output such that:
- Deterministic: same input → same output
- Fast computation
- Preimage resistance: hard to find ( x ) given ( h(x) )
- Second-preimage resistance: hard to find \(x' \neq x\) with \(h(x') = h(x)\)
- Collision resistance: hard to find any two distinct inputs with same hash
| Algorithm | Output (bits) | Notes |
|---|---|---|
| MD5 | 128 | Broken (collisions found) |
| SHA-1 | 160 | Deprecated |
| SHA-256 | 256 | Standard (SHA-2 family) |
| SHA-3 | 256 | Keccak-based sponge |
| BLAKE3 | 256 | Fast, parallel, modern |
Example
h("hello") = 2cf24dba5fb0a... (SHA-256)
Change one letter, hash changes completely (avalanche effect):
h("Hello") = 185f8db32271f...
Even small changes → big differences.
Tiny Code (C, using pseudo-interface)
#include <openssl/sha.h>
unsigned char hash[SHA256_DIGEST_LENGTH];
SHA256((unsigned char*)"hello", 5, hash);
Print the hash as a hex string to verify.
3. Applications
- Data integrity: verify files (e.g., SHA256SUM)
- Digital signatures: sign hashes, not raw data
- Password storage: store hashes, not plaintext
- Deduplication: detect identical files via hashes
- Blockchain: link blocks with hash pointers
- Git: stores objects via SHA-1 identifiers
4. Hash Collisions
A collision occurs when ( h(x) = h(y) ) for \(x \neq y\). Good cryptographic hashes make this computationally infeasible.
By the birthday paradox, collisions appear after \(2^{n/2}\) operations for an ( n )-bit hash.
Hence, SHA-256 → ~\(2^{128}\) effort to collide.
5. Checksums vs Hashes
| Feature | Checksum | Cryptographic Hash |
|---|---|---|
| Goal | Detect errors | Ensure integrity and authenticity |
| Resistance | Low | High |
| Output Size | Small | 128-512 bits |
| Performance | Very fast | Fast but secure |
| Example | CRC32 | SHA-256, BLAKE3 |
Why It Matters
Checksums catch accidental corruption, hashes protect against malicious tampering. Together, they guard the trustworthiness of data , the foundation of secure systems.
“Integrity is invisible , until it’s lost.”
Try It Yourself
- Compute CRC32 of a text file, flip one bit, and recompute.
- Use sha256sum to verify file integrity.
- Compare performance of SHA-256 and BLAKE3.
- Research how Git uses SHA-1 to track versions.
By learning hashes, you master one of the pillars of security , proof that something hasn’t changed, even when everything else does.
68. Approximate and Streaming Matching
Exact string matching (like KMP or Boyer-Moore) demands perfect alignment between pattern and text. But what if errors, noise, or incomplete data exist?
That’s where approximate matching and streaming matching come in. These algorithms let you search efficiently even when:
- The pattern might contain typos or mutations
- The text arrives in a stream (too large to store entirely)
- You want to match “close enough,” not “exactly”

They’re crucial in search engines, spell checkers, bioinformatics, and real-time monitoring systems.
1. Approximate String Matching
Approximate string matching finds occurrences of a pattern in a text allowing mismatches, insertions, or deletions , often measured by edit distance.
A. Dynamic Programming (Levenshtein Distance)
Given two strings \(A\) and \(B\), the edit distance is the minimum number of insertions, deletions, or substitutions to turn \(A\) into \(B\).
We can build a DP table \(dp[i][j]\):
- \(dp[i][0] = i\) (delete all characters)
- \(dp[0][j] = j\) (insert all characters)
- If \(A[i] = B[j]\), then \(dp[i][j] = dp[i-1][j-1]\)
- Else \(dp[i][j] = 1 + \min(dp[i-1][j],\, dp[i][j-1],\, dp[i-1][j-1])\)
Tiny Code (C)
int edit_distance(char *a, char *b) {
int n = strlen(a), m = strlen(b);
int dp[n+1][m+1];
for (int i = 0; i <= n; i++) dp[i][0] = i;
for (int j = 0; j <= m; j++) dp[0][j] = j;
for (int i = 1; i <= n; i++)
for (int j = 1; j <= m; j++)
if (a[i-1] == b[j-1]) dp[i][j] = dp[i-1][j-1];
else dp[i][j] = 1 + fmin(fmin(dp[i-1][j], dp[i][j-1]), dp[i-1][j-1]);
return dp[n][m];
}
This computes the Levenshtein distance in \(O(nm)\) time.
B. Bitap Algorithm (Shift-Or)
When pattern length is small, Bitap uses bitmasks to track mismatches. It efficiently supports up to k errors and runs in near linear time for small patterns.
Used in grep -E, ag, and fuzzy searching systems.
Idea: Maintain a bitmask where 1 = mismatch, 0 = match. Shift and OR masks as we scan text.
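A minimal sketch of the exact-match core of this idea (Shift-Or); Bitap extends it to \(k\) errors by keeping \(k+1\) such state masks:

```c
#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* Exact Shift-Or matching; pattern length must be at most 63 with a 64-bit state. */
void shift_or(const char *text, const char *pat) {
    int n = strlen(text), m = strlen(pat);
    uint64_t mask[256];
    for (int c = 0; c < 256; c++) mask[c] = ~0ULL;                 /* 1 = mismatch */
    for (int i = 0; i < m; i++)
        mask[(unsigned char)pat[i]] &= ~(1ULL << i);               /* 0 = pattern has c at i */

    uint64_t state = ~0ULL;
    for (int j = 0; j < n; j++) {
        state = (state << 1) | mask[(unsigned char)text[j]];
        if ((state & (1ULL << (m - 1))) == 0)                      /* bit m-1 clear: full match */
            printf("Found at %d\n", j - m + 1);
    }
}
```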
C. k-Approximate Matching
Find all positions where edit distance ≤ k. Efficient for small ( k ) (e.g., spell correction: edit distance ≤ 2).
Applications:
- Typo-tolerant search
- DNA sequence matching
- Autocomplete systems
2. Streaming Matching
In streaming, the text is too large or unbounded, so we must process input online. We can’t store everything , only summaries or sketches.
A. Rolling Hash (Rabin-Karp style)
Maintains a moving hash of recent characters. When new character arrives, update hash in ( O(1) ). Compare with pattern’s hash for possible match.
Good for sliding window matching.
Example:
hash = (base * (hash - old_char * base^(m-1)) + new_char) % mod;  // base^(m-1) is precomputed; ^ denotes exponentiation here
B. Fingerprinting (Karp-Rabin Fingerprint)
A compact representation of a substring. If fingerprints match, do full verification (avoid false positives). Used in streaming algorithms and chunking.
C. Sketch-Based Matching
Algorithms like Count-Min Sketch or SimHash build summaries of large data. They help approximate similarity between streams.
Applications:
- Near-duplicate detection (SimHash in Google)
- Network anomaly detection
- Real-time log matching
3. Approximate Matching in Practice
| Domain | Use Case | Algorithm |
|---|---|---|
| Spell Checking | “recieve” → “receive” | Edit Distance |
| DNA Alignment | Find similar sequences | Smith-Waterman |
| Autocomplete | Suggest close matches | Fuzzy Search |
| Logs & Streams | Online pattern alerts | Streaming Bitap, Karp-Rabin |
| Near-Duplicate | Detect similar text | SimHash, MinHash |
4. Complexity
| Algorithm | Time | Space | Notes |
|---|---|---|---|
| Levenshtein DP | (O(nm)) | (O(nm)) | Exact distance |
| Bitap | (O(n)) | (O(1)) | For small patterns |
| Rolling Hash | (O(n)) | (O(1)) | Probabilistic match |
| SimHash | (O(n)) | (O(1)) | Approximate similarity |
Why It Matters
Real-world data is messy , typos, noise, loss, corruption. Approximate matching lets you build algorithms that forgive errors and adapt to streams. It powers everything from search engines to genomics, ensuring your algorithms stay practical in an imperfect world.
Try It Yourself
- Compute edit distance between “kitten” and “sitting.”
- Implement fuzzy search that returns words with ≤1 typo.
- Use rolling hash to detect repeated substrings in a stream.
- Experiment with SimHash to compare document similarity.
- Observe how small typos affect fuzzy vs exact search.
69. Bioinformatics Alignment (Needleman-Wunsch, Smith-Waterman)
In bioinformatics, comparing DNA, RNA, or protein sequences is like comparing strings , but with biological meaning. Each sequence is made of letters (A, C, G, T for DNA; amino acids for proteins). To analyze similarity, scientists use sequence alignment algorithms that handle insertions, deletions, and substitutions.
Two fundamental methods dominate:
- Needleman-Wunsch for global alignment
- Smith-Waterman for local alignment
1. Sequence Alignment
Alignment means placing two sequences side by side to maximize matches and minimize gaps or mismatches.
For example:
A C G T G A
| | | | |
A C G A G A
Here, mismatches and gaps may occur, but the alignment finds the best possible match under a scoring system.
Scoring System
Alignment uses scores instead of just counts. Typical scheme:
- Match: +1
- Mismatch: -1
- Gap (insertion or deletion): -2

You can adjust weights depending on the biological context.
2. Needleman-Wunsch (Global Alignment)
Used when you want to align entire sequences , from start to end.
It uses dynamic programming to build a score table ( dp[i][j] ), where each cell represents the best score for aligning prefixes ( A[1..i] ) and ( B[1..j] ).
Recurrence:
\[
dp[i][j] = \max
\begin{cases}
dp[i-1][j-1] + \text{score}(A_i, B_j) \\
dp[i-1][j] + \text{gap penalty} \\
dp[i][j-1] + \text{gap penalty}
\end{cases}
\]
Base cases: \[ dp[0][j] = j \times \text{gap penalty}, \quad dp[i][0] = i \times \text{gap penalty} \]
Tiny Code (C)
int max3(int a, int b, int c) {
return a > b ? (a > c ? a : c) : (b > c ? b : c);
}
int needleman_wunsch(char *A, char *B, int match, int mismatch, int gap) {
int n = strlen(A), m = strlen(B);
int dp[n+1][m+1];
for (int i = 0; i <= n; i++) dp[i][0] = i * gap;
for (int j = 0; j <= m; j++) dp[0][j] = j * gap;
for (int i = 1; i <= n; i++) {
for (int j = 1; j <= m; j++) {
int s = (A[i-1] == B[j-1]) ? match : mismatch;
dp[i][j] = max3(dp[i-1][j-1] + s, dp[i-1][j] + gap, dp[i][j-1] + gap);
}
}
return dp[n][m];
}
Example:
A = "ACGT"
B = "AGT"
match = +1, mismatch = -1, gap = -2
Produces optimal alignment:
A C G T
A - G T
3. Smith-Waterman (Local Alignment)
Used when sequences may have similar segments, not full-length similarity. Perfect for finding motifs or conserved regions.
Recurrence is similar, but with local reset to zero:
\[
dp[i][j] = \max
\begin{cases}
0 \\
dp[i-1][j-1] + \text{score}(A_i, B_j) \\
dp[i-1][j] + \text{gap penalty} \\
dp[i][j-1] + \text{gap penalty}
\end{cases}
\]
Final answer = maximum value in the table (not necessarily at the end).
It finds the best substring alignment.
Example
A = "ACGTTG"
B = "CGT"
Smith-Waterman finds best local match:
A C G T
| | |
C G T
Unlike global alignment, extra prefixes or suffixes are ignored.
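A minimal local-alignment score sketch, mirroring the Needleman-Wunsch code above but clamping every cell at zero and tracking the best cell anywhere in the table:

```c
#include <string.h>

static int max2(int a, int b) { return a > b ? a : b; }

int smith_waterman(const char *A, const char *B, int match, int mismatch, int gap) {
    int n = strlen(A), m = strlen(B), best = 0;
    int dp[n + 1][m + 1];
    memset(dp, 0, sizeof(dp));                 /* row 0 and column 0 stay 0: free start */
    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= m; j++) {
            int s = (A[i-1] == B[j-1]) ? match : mismatch;
            int v = max2(0, dp[i-1][j-1] + s); /* 0 means "restart the local alignment here" */
            v = max2(v, dp[i-1][j] + gap);
            v = max2(v, dp[i][j-1] + gap);
            dp[i][j] = v;
            if (v > best) best = v;
        }
    return best;                               /* score of the best local alignment */
}
```

With match = +1, mismatch = -1, gap = -2, `smith_waterman("ACGTTG", "CGT", 1, -1, -2)` returns 3, the score of the exact CGT match shown above.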
4. Variants and Extensions
| Algorithm | Type | Notes |
|---|---|---|
| Needleman-Wunsch | Global | Aligns full sequences |
| Smith-Waterman | Local | Finds similar subsequences |
| Gotoh Algorithm | Global | Uses affine gap penalty (opening + extension) |
| BLAST | Heuristic | Speeds up search for large databases |
BLAST (Basic Local Alignment Search Tool) uses word seeds and extension, trading exactness for speed , essential for large genome databases.
5. Complexity
Both Needleman-Wunsch and Smith-Waterman run in:
- Time: \(O(nm)\)
- Space: \(O(nm)\)

But optimized versions use banded DP or Hirschberg’s algorithm to cut memory to \(O(n + m)\).
Why It Matters
Sequence alignment bridges computer science and biology. It’s how we:
- Compare species
- Identify genes
- Detect mutations
- Trace ancestry
- Build phylogenetic trees

The idea of “minimum edit cost” echoes everywhere, from spell checkers to DNA analysis.
“In biology, similarity is a story. Alignment is how we read it.”
Try It Yourself
- Implement Needleman-Wunsch for short DNA sequences.
- Change gap penalties , see how alignment shifts.
- Compare outputs from global and local alignment.
- Use real sequences from GenBank to test.
- Explore BLAST online and compare to exact alignment results.
70. Text Indexing and Search Structures
When text becomes large , think books, databases, or the entire web , searching naively for patterns (O(nm)) is far too slow. We need indexing structures that let us search fast, often in O(m) or O(log n) time.
This section covers the backbone of search engines and string processing:
- Suffix Arrays
- Suffix Trees
- Inverted Indexes
- Tries and Prefix Trees
- Compressed Indexes like FM-Index (Burrows-Wheeler)
1. Why Index?
A text index is like a table of contents , it doesn’t store the book, but lets you jump straight to what you want.
If you have a text of length \(n\) and you’ll run many queries, it’s worth building an index (even if it costs \(O(n \log n)\) to build).
Without indexing, each query takes \(O(nm)\). With indexing, each query can take \(O(m)\) or less.
2. Suffix Array
A suffix array is a sorted array of all suffixes of a string.
For text "banana", suffixes are:
0: banana
1: anana
2: nana
3: ana
4: na
5: a
Sorted lexicographically:
5: a
3: ana
1: anana
0: banana
4: na
2: nana
Suffix Array = [5, 3, 1, 0, 4, 2]
To search, binary search over the suffix array using your pattern: \(O(m \log n)\).
Tiny Code (C) (naive construction)
int cmp(const void *a, const void *b, void *txt) {
int i = *(int*)a, j = *(int*)b;
return strcmp((char*)txt + i, (char*)txt + j);
}
void build_suffix_array(char *txt, int n, int sa[]) {
for (int i = 0; i < n; i++) sa[i] = i;
qsort_r(sa, n, sizeof(int), cmp, txt);
}Modern methods like prefix doubling or radix sort build it in ( O\(n \log n\) ).
Applications:
- Fast substring search
- Longest common prefix (LCP) array
- Pattern matching in DNA sequences
- Plagiarism detection
3. Suffix Tree
A suffix tree is a compressed trie of all suffixes , each edge stores multiple characters.
For "banana", you’d build a tree where each leaf corresponds to a suffix index.
Advantages:
- Pattern search in \(O(m)\)
- Space \(O(n)\) (with compression)

Built using Ukkonen’s algorithm in \(O(n)\).
Use Suffix Array + LCP as a space-efficient alternative.
4. FM-Index (Burrows-Wheeler Transform)
Used in compressed full-text search (e.g., Bowtie, BWA). Combines:
- Burrows-Wheeler Transform (BWT)
- Rank/select bitvectors

Supports pattern search in O(m) time with very low memory.
Idea: transform text so similar substrings cluster together, enabling compression and backward search.
Applications:
- DNA alignment
- Large text archives
- Memory-constrained search
5. Inverted Index
Used in search engines. Instead of suffixes, it indexes words.
For example, text corpus:
doc1: quick brown fox
doc2: quick red fox
Inverted index:
"quick" → [doc1, doc2]
"brown" → [doc1]
"red" → [doc2]
"fox" → [doc1, doc2]
Now searching “quick fox” becomes set intersection of lists.
Used with ranking functions (TF-IDF, BM25).
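A tiny sketch of the core query operation, intersecting two sorted posting lists (the document IDs are illustrative):

```c
#include <stdio.h>

/* Intersect two sorted posting lists of document IDs; returns how many IDs matched. */
int intersect_postings(const int *a, int na, const int *b, int nb, int *out) {
    int i = 0, j = 0, k = 0;
    while (i < na && j < nb) {
        if (a[i] == b[j]) { out[k++] = a[i]; i++; j++; }
        else if (a[i] < b[j]) i++;
        else j++;
    }
    return k;
}

int main(void) {
    int quick[] = {1, 2}, fox[] = {1, 2}, out[2];      /* doc1 = 1, doc2 = 2 */
    int k = intersect_postings(quick, 2, fox, 2, out);
    for (int i = 0; i < k; i++) printf("doc%d\n", out[i]);
    return 0;
}
```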
6. Tries and Prefix Trees
A trie stores strings character by character. Each node = prefix.
typedef struct Node {
struct Node *child[26];
int end;
} Node;
Perfect for:
- Autocomplete
- Prefix search
- Spell checkers

Search: O(m), where m = pattern length.
Compressed tries (Patricia trees) reduce space.
7. Comparing Structures
| Structure | Search Time | Build Time | Space | Notes |
|---|---|---|---|---|
| Trie | O(m) | O(n) | High | Prefix queries |
| Suffix Array | O(m log n) | O(n log n) | Medium | Sorted suffixes |
| Suffix Tree | O(m) | O(n) | High | Rich structure |
| FM-Index | O(m) | O(n) | Low | Compressed |
| Inverted Index | O(k) | O(N) | Medium | Word-based |
Why It Matters
Text indexing is the backbone of search engines, DNA alignment, and autocomplete systems. Without it, Google searches, code lookups, or genome scans would take minutes, not milliseconds.
“Indexing turns oceans of text into navigable maps.”
Try It Yourself
- Build a suffix array for “banana” and perform binary search for “ana.”
- Construct a trie for a dictionary and query prefixes.
- Write a simple inverted index for a few documents.
- Compare memory usage of suffix tree vs suffix array.
- Experiment with FM-index using an online demo (like BWT explorer).
Chapter 8. Geometry, Graphics, and Spatial Algorithms
71. Convex Hull (Graham, Andrew, Chan)
In computational geometry, the convex hull of a set of points is the smallest convex polygon that contains all the points. Intuitively, imagine stretching a rubber band around a set of nails on a board , the shape the band takes is the convex hull.
Convex hulls are foundational for many geometric algorithms, like closest pair, Voronoi diagrams, and collision detection.
In this section, we’ll explore three classical algorithms:
- Graham Scan: elegant and simple, O(n log n)
- Andrew’s Monotone Chain: robust and practical, O(n log n)
- Chan’s Algorithm: advanced and optimal, O(n log h), where h = number of hull points
1. Definition
Given a set of points \(P = \{p_1, p_2, \ldots, p_n\}\), the convex hull \(\text{CH}(P)\) is the smallest convex polygon enclosing all points.
Formally: \[ \text{CH}(P) = \bigcap \{\, C \subseteq \mathbb{R}^2 \mid C \text{ is convex and } P \subseteq C \,\} \]
A polygon is convex if every line segment between two points of the polygon lies entirely inside it.
2. Orientation Test
All convex hull algorithms rely on an orientation test using cross product: Given three points ( a, b, c ):
\[ \text{cross}(a,b,c) = (b_x - a_x)(c_y - a_y) - (b_y - a_y)(c_x - a_x) \]
- \(> 0\) → counter-clockwise turn
- \(< 0\) → clockwise turn
- \(= 0\) → collinear
3. Graham Scan
One of the earliest convex hull algorithms.
Idea:
1. Pick the lowest point (and leftmost on ties).
2. Sort all other points by polar angle with respect to it.
3. Traverse the points while maintaining a stack:
   - Push the current point.
   - While the last three points make a right turn, pop the middle one.
4. The remaining points form the convex hull in CCW order.
Tiny Code (C)
typedef struct { double x, y; } Point;
Point pivot;   // lowest point, shared with the angle comparator
double cross(Point a, Point b, Point c) {
    return (b.x - a.x)*(c.y - a.y) - (b.y - a.y)*(c.x - a.x);
}
int cmp(const void *p1, const void *p2) {
    const Point *a = p1, *b = p2;
    double c = cross(pivot, *a, *b);
    if (c > 0) return -1;                  // a has the smaller polar angle
    if (c < 0) return 1;
    // collinear with the pivot: put the closer point first
    double da = (a->x - pivot.x)*(a->x - pivot.x) + (a->y - pivot.y)*(a->y - pivot.y);
    double db = (b->x - pivot.x)*(b->x - pivot.x) + (b->y - pivot.y)*(b->y - pivot.y);
    return (da < db) ? -1 : 1;
}
int graham_scan(Point pts[], int n, Point hull[]) {
    // 1. pick the lowest point (leftmost on ties) and move it to the front
    int low = 0;
    for (int i = 1; i < n; i++)
        if (pts[i].y < pts[low].y ||
           (pts[i].y == pts[low].y && pts[i].x < pts[low].x)) low = i;
    Point tmp = pts[0]; pts[0] = pts[low]; pts[low] = tmp;
    pivot = pts[0];
    // 2. sort the rest by polar angle around the pivot
    qsort(pts + 1, n - 1, sizeof(Point), cmp);
    // 3. scan, popping points that would create a clockwise (right) turn
    int top = 0;
    for (int i = 0; i < n; i++) {
        while (top >= 2 && cross(hull[top-2], hull[top-1], pts[i]) <= 0)
            top--;                         // collinear points are popped too
        hull[top++] = pts[i];
    }
    return top; // number of hull points, in CCW order
}
Complexity:
- Sorting: \(O(n \log n)\)
- Scanning: \(O(n)\)
- Total: O(n log n)
Example
Input:
(0, 0), (1, 1), (2, 2), (2, 0), (0, 2)
Hull (CCW):
(0,0) → (2,0) → (2,2) → (0,2)
4. Andrew’s Monotone Chain
Simpler and more robust for floating-point coordinates. Builds lower and upper hulls separately.
Steps:
- Sort points lexicographically (x, then y).
- Build lower hull (left-to-right)
- Build upper hull (right-to-left)
- Concatenate (excluding duplicates)
Tiny Code (C)
int cmp_xy(const void *p1, const void *p2) {
    const Point *a = p1, *b = p2;
    if (a->x != b->x) return (a->x < b->x) ? -1 : 1;
    if (a->y != b->y) return (a->y < b->y) ? -1 : 1;
    return 0;
}
int monotone_chain(Point pts[], int n, Point hull[]) {
    qsort(pts, n, sizeof(Point), cmp_xy);   // lexicographic: x, then y
    int k = 0;
    // Lower hull
    for (int i = 0; i < n; i++) {
        while (k >= 2 && cross(hull[k-2], hull[k-1], pts[i]) <= 0) k--;
        hull[k++] = pts[i];
    }
    // Upper hull
    for (int i = n-2, t = k+1; i >= 0; i--) {
        while (k >= t && cross(hull[k-2], hull[k-1], pts[i]) <= 0) k--;
        hull[k++] = pts[i];
    }
    return k-1; // last point equals the first
}
Time Complexity: \(O(n \log n)\)
5. Chan’s Algorithm
When \(h \ll n\), Chan’s method achieves \(O(n \log h)\):
- Partition points into groups of size ( m ).
- Compute hulls for each group (Graham).
- Merge hulls with Jarvis March (gift wrapping).
- Choose \(m\) cleverly (\(m = 2^k\)) to ensure \(O(n \log h)\).
Used in: large-scale geometric processing.
6. Applications
| Domain | Use |
|---|---|
| Computer Graphics | Shape boundary, hitboxes |
| GIS / Mapping | Region boundaries |
| Robotics | Obstacle envelopes |
| Clustering | Outlier detection |
| Data Analysis | Minimal bounding shape |
7. Complexity Summary
| Algorithm | Time | Space | Notes |
|---|---|---|---|
| Graham Scan | \(O(n \log n)\) | \(O(n)\) | Simple, classic |
| Monotone Chain | \(O(n \log n)\) | \(O(n)\) | Stable, robust |
| Chan’s Algorithm | \(O(n \log h)\) | \(O(n)\) | Best asymptotic |
Why It Matters
Convex hulls are one of the cornerstones of computational geometry. They teach sorting, cross products, and geometric reasoning , and form the basis for many spatial algorithms.
“Every scattered set hides a simple shape. The convex hull is that hidden simplicity.”
Try It Yourself
- Implement Graham Scan for 10 random points.
- Plot the points and verify the hull.
- Compare results with Andrew’s Monotone Chain.
- Test with collinear and duplicate points.
- Explore 3D convex hulls (QuickHull, Gift Wrapping) next.
72. Closest Pair and Segment Intersection
Geometric problems often ask: what’s the shortest distance between two points? or do these segments cross? These are classic building blocks in computational geometry , essential for collision detection, graphics, clustering, and path planning.
This section covers two foundational problems:
- Closest Pair of Points: find two points with minimum Euclidean distance
- Segment Intersection: determine if (and where) two line segments intersect
1. Closest Pair of Points
Given \(n\) points in 2D, find a pair with the smallest distance. The brute force solution is \(O(n^2)\), but using Divide and Conquer, we can solve it in O(n log n).
A. Divide and Conquer Algorithm
Idea:
- Sort points by x-coordinate.
- Split into left and right halves.
- Recursively find closest pairs in each half (distance = ( d )).
- Merge step: check pairs across the split line within ( d ).
In merge step, we only need to check at most 6 neighbors per point (by geometric packing).
Tiny Code (C, Sketch)
#include <math.h>
typedef struct { double x, y; } Point;
double dist(Point a, Point b) {
double dx = a.x - b.x, dy = a.y - b.y;
return sqrt(dx*dx + dy*dy);
}
double brute_force(Point pts[], int n) {
double d = 1e9;
for (int i = 0; i < n; i++)
for (int j = i + 1; j < n; j++)
d = fmin(d, dist(pts[i], pts[j]));
return d;
}
Recursive divide and merge:
double closest_pair(Point pts[], int n) {
if (n <= 3) return brute_force(pts, n);
int mid = n / 2;
double d = fmin(closest_pair(pts, mid),
closest_pair(pts + mid, n - mid));
// merge step: check strip points within distance d
// sort by y, check neighbors
return d;
}
Time Complexity: \(O(n \log n)\)
Example:
Points:
(2,3), (12,30), (40,50), (5,1), (12,10), (3,4)
Closest pair: (2,3) and (3,4), distance = √2
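To complete the sketch above, here is a self-contained version including the strip check; for brevity it re-sorts the strip by y at every level, which gives \(O(n \log^2 n)\) rather than the fully tuned \(O(n \log n)\):

```c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

typedef struct { double x, y; } P;

static int cmpx(const void *a, const void *b) {
    double d = ((const P*)a)->x - ((const P*)b)->x; return (d > 0) - (d < 0);
}
static int cmpy(const void *a, const void *b) {
    double d = ((const P*)a)->y - ((const P*)b)->y; return (d > 0) - (d < 0);
}
static double dist(P a, P b) { return hypot(a.x - b.x, a.y - b.y); }

/* pts must be sorted by x; strip is scratch space of size n */
static double solve(P *pts, int n, P *strip) {
    if (n <= 3) {                                   /* brute force small cases */
        double d = 1e308;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                if (dist(pts[i], pts[j]) < d) d = dist(pts[i], pts[j]);
        return d;
    }
    int mid = n / 2;
    double midx = pts[mid].x;
    double dl = solve(pts, mid, strip);
    double dr = solve(pts + mid, n - mid, strip);
    double d = dl < dr ? dl : dr;

    int m = 0;                                      /* points within d of the dividing line */
    for (int i = 0; i < n; i++)
        if (fabs(pts[i].x - midx) < d) strip[m++] = pts[i];
    qsort(strip, m, sizeof(P), cmpy);

    for (int i = 0; i < m; i++)                     /* only a constant number of y-neighbors matter */
        for (int j = i + 1; j < m && strip[j].y - strip[i].y < d; j++)
            if (dist(strip[i], strip[j]) < d) d = dist(strip[i], strip[j]);
    return d;
}

double closest_pair_full(P *pts, int n) {
    P *strip = malloc(n * sizeof(P));
    qsort(pts, n, sizeof(P), cmpx);
    double d = solve(pts, n, strip);
    free(strip);
    return d;
}

int main(void) {
    P pts[] = {{2,3},{12,30},{40,50},{5,1},{12,10},{3,4}};
    printf("%.6f\n", closest_pair_full(pts, 6));    /* prints 1.414214 (sqrt 2) */
    return 0;
}
```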
B. Sweep Line Variant
Another method uses a line sweep and a balanced tree to keep active points. As you move from left to right, maintain a window of recent points within ( d ).
Used in large-scale spatial systems.
Applications
| Domain | Use |
|---|---|
| Clustering | Find nearest neighbors |
| Robotics | Avoid collisions |
| GIS | Nearest city search |
| Networking | Sensor proximity |
2. Segment Intersection
Given two segments ( AB ) and ( CD ), determine whether they intersect. It’s the core of geometry engines and vector graphics systems.
A. Orientation Test
We use the cross product (orientation) test again. Two segments ( AB ) and ( CD ) intersect if and only if:
1. The segments straddle each other:
\[ \text{orient}(A, B, C) \neq \text{orient}(A, B, D) \quad \text{and} \quad \text{orient}(C, D, A) \neq \text{orient}(C, D, B) \]
2. Special cases for collinear points (check bounding boxes).
Tiny Code (C)
double cross(Point a, Point b, Point c) {
return (b.x - a.x)*(c.y - a.y) - (b.y - a.y)*(c.x - a.x);
}
int on_segment(Point a, Point b, Point c) {
return fmin(a.x, b.x) <= c.x && c.x <= fmax(a.x, b.x) &&
fmin(a.y, b.y) <= c.y && c.y <= fmax(a.y, b.y);
}
int intersect(Point a, Point b, Point c, Point d) {
double o1 = cross(a, b, c);
double o2 = cross(a, b, d);
double o3 = cross(c, d, a);
double o4 = cross(c, d, b);
if (o1*o2 < 0 && o3*o4 < 0) return 1; // general case
if (o1 == 0 && on_segment(a,b,c)) return 1;
if (o2 == 0 && on_segment(a,b,d)) return 1;
if (o3 == 0 && on_segment(c,d,a)) return 1;
if (o4 == 0 && on_segment(c,d,b)) return 1;
return 0;
}
B. Line Sweep Algorithm (Bentley-Ottmann)
For multiple segments, check all intersections efficiently. Algorithm:
- Sort all endpoints by x-coordinate.
- Sweep from left to right.
- Maintain active set (balanced BST).
- Check neighboring segments for intersections.
Time complexity: \(O((n + k) \log n)\), where \(k\) is the number of intersections.
Used in CAD, map rendering, and collision systems.
3. Complexity Summary
| Problem | Naive | Optimal | Technique |
|---|---|---|---|
| Closest Pair | \(O(n^2)\) | \(O(n \log n)\) | Divide & Conquer |
| Segment Intersection | \(O(n^2)\) | \(O((n + k) \log n)\) | Sweep Line |
Why It Matters
Geometric algorithms like these teach how to reason spatially , blending math, sorting, and logic. They power real-world systems where precision matters: from self-driving cars to game engines.
“Every point has a neighbor; every path may cross another , geometry finds the truth in space.”
Try It Yourself
- Implement the closest pair algorithm using divide and conquer.
- Visualize all pairwise distances , see which pairs are minimal.
- Test segment intersection on random pairs.
- Modify for 3D line segments using vector cross products.
- Try building a line sweep visualizer to catch intersections step-by-step.
73. Line Sweep and Plane Sweep Algorithms
The sweep line (or plane sweep) technique is one of the most powerful paradigms in computational geometry. It transforms complex spatial problems into manageable one-dimensional events , by sweeping a line (or plane) across the input and maintaining a dynamic set of active elements.
This method underlies many geometric algorithms:
- Event sorting → handle things in order
- Active set maintenance → track current structure
- Updates and queries → respond as the sweep progresses

Used for intersection detection, closest pair, rectangle union, and computational geometry in graphics and GIS.
1. The Core Idea
Imagine a vertical line sweeping from left to right across the plane. At each “event” (like a point or segment endpoint), we update the set of objects the line currently touches , the active set.
Each event may trigger queries, insertions, or removals.
This approach works because geometry problems often depend only on local relationships between nearby elements as the sweep advances.
A. Sweep Line Template
A general structure looks like this:
struct Event { double x; int type; Object *obj; };
sort(events.begin(), events.end());
ActiveSet S;
for (Event e : events) {
if (e.type == START) S.insert(e.obj);
else if (e.type == END) S.erase(e.obj);
else if (e.type == QUERY) handle_query(S, e.obj);
}Sorting ensures events are processed in order of increasing x (or another dimension).
2. Classic Applications
Let’s explore three foundational problems solvable by sweep techniques.
A. Segment Intersection (Bentley-Ottmann)
Goal: detect all intersections among ( n ) line segments.
Steps:
- Sort endpoints by x-coordinate.
- Sweep from left to right.
- Maintain an ordered set of active segments (sorted by y).
- When a new segment starts, check intersection with neighbors above and below.
- When segments intersect, record intersection and insert a new event at the x-coordinate of intersection.
Complexity: \(O((n + k)\log n)\), where \(k\) is the number of intersections.
B. Closest Pair of Points
Sweep line version sorts by x, then slides a vertical line while maintaining active points within a strip of width \(d\) (the current minimum distance). Only a handful of nearby points in the strip (at most 6-8) need to be checked.
Complexity: \(O(n \log n)\)
C. Rectangle Union Area
Given axis-aligned rectangles, compute total area covered.
Idea:
- Treat vertical edges as events (entering/exiting rectangles).
- Sweep line moves along the x-axis.
- Maintain y-intervals in the active set (using a segment tree or interval tree).
- At each step, multiply current width × height of the union of active intervals.
Complexity: \(O(n \log n)\)
Tiny Code Sketch (C)
typedef struct { double x, y1, y2; int type; } Event;
Event events[MAX];
int n_events;
qsort(events, n_events, sizeof(Event), cmp_by_x);
double prev_x = events[0].x, area = 0;
SegmentTree T;
for (int i = 0; i < n_events; i++) {
double dx = events[i].x - prev_x;
area += dx * T.total_length(); // current union height
if (events[i].type == START)
T.insert(events[i].y1, events[i].y2);
else
T.remove(events[i].y1, events[i].y2);
prev_x = events[i].x;
}3. Other Applications
| Problem | Description | Time |
|---|---|---|
| K-closest points | Maintain top \(k\) in active set | \(O(n \log n)\) |
| Union of rectangles | Compute covered area | \(O(n \log n)\) |
| Point location | Locate point in planar subdivision | \(O(\log n)\) |
| Visibility graph | Track visible edges | \(O(n \log n)\) |
4. Plane Sweep Extensions
While line sweep moves in one dimension (x), plane sweep handles 2D or higher-dimensional spaces, where:
- Events are 2D cells or regions.
- Sweep front is a plane instead of a line.
Used in 3D collision detection, computational topology, and CAD systems.
Conceptual Visualization
- Sort events by one axis (say, x).
- Maintain structure (set, tree, or heap) of intersecting or active elements.
- Update at each event and record desired output (intersection, union, coverage).
The key is the locality principle: only neighbors in the sweep structure can change outcomes.
5. Complexity
| Phase | Complexity |
|---|---|
| Sorting events | \(O(n \log n)\) |
| Processing events | \(O(n \log n)\) |
| Total | \(O(n \log n)\) (typical) |
Why It Matters
The sweep line method transforms geometric chaos into order , turning spatial relationships into sorted sequences. It’s the bridge between geometry and algorithms, blending structure with motion.
“A sweep line sees everything , not all at once, but just in time.”
Try It Yourself
- Implement a sweep-line segment intersection finder.
- Compute the union area of 3 rectangles with overlaps.
- Animate the sweep line to visualize event processing.
- Modify for circular or polygonal objects.
- Explore how sweep-line logic applies to time-based events in scheduling.
74. Delaunay and Voronoi Diagrams
In geometry and spatial computing, Delaunay triangulations and Voronoi diagrams are duals , elegant structures that capture proximity, territory, and connectivity among points.
They’re used everywhere: from mesh generation, pathfinding, geospatial analysis, to computational biology. This section introduces both, their relationship, and algorithms to construct them efficiently.
1. Voronoi Diagram
Given a set of sites (points) \(P = \{p_1, p_2, \ldots, p_n\}\), the Voronoi diagram partitions the plane into regions, one per point, so that every location in a region is closer to its site than to any other.
Formally, the Voronoi cell for \(p_i\) is: \[ V(p_i) = \{x \in \mathbb{R}^2 \mid d(x, p_i) \le d(x, p_j), \forall j \neq i \} \]
Each region is convex, and boundaries are formed by perpendicular bisectors.
Example
For points \(A, B, C\):
- Draw bisectors between each pair.
- Intersection points define Voronoi vertices.
- The resulting polygons cover the plane, one per site.
Used to model nearest neighbor regions: “which tower serves which area?”
Properties
- Every cell is convex.
- Neighboring cells share edges.
- The diagram’s vertices are centers of circumcircles through three sites.
- Dual graph = Delaunay triangulation.
2. Delaunay Triangulation
The Delaunay triangulation (DT) connects points so that no point lies inside the circumcircle of any triangle.
Equivalently, it’s the dual graph of the Voronoi diagram.
It tends to avoid skinny triangles , maximizing minimum angles, creating well-shaped meshes.
Formal Definition
A triangulation \(T\) of \(P\) is Delaunay if for every triangle \(\triangle abc \in T\), no point \(p \in P \setminus \{a,b,c\}\) lies inside the circumcircle of \(\triangle abc\).
Why It Matters:
- Avoids sliver triangles.
- Used in finite element meshes, terrain modeling, and path planning.
- Leads to natural neighbor interpolation and smooth surfaces.
3. Relationship
Voronoi and Delaunay are geometric duals:
| Voronoi | Delaunay |
|---|---|
| Regions = proximity zones | Triangles = neighbor connections |
| Edges = bisectors | Edges = neighbor pairs |
| Vertices = circumcenters | Faces = circumcircles |
Connecting neighboring Voronoi cells gives Delaunay edges.
4. Algorithms
Several algorithms can build these diagrams efficiently.
A. Incremental Insertion
- Start with a super-triangle enclosing all points.
- Insert points one by one.
- Remove triangles whose circumcircle contains the point.
- Re-triangulate the resulting polygonal hole.
Time Complexity: \(O(n^2)\), improved to \(O(n \log n)\) with randomization.
B. Divide and Conquer
- Sort points by x.
- Recursively build DT for left and right halves.
- Merge by finding common tangents.
Time Complexity: \(O(n \log n)\). Elegant, structured, and deterministic.
C. Fortune’s Sweep Line Algorithm
For Voronoi diagrams, Fortune’s algorithm sweeps a line from top to bottom. Maintains a beach line of parabolic arcs and event queue.
Each event (site or circle) updates the structure , building Voronoi edges incrementally.
Time Complexity: \(O(n \log n)\)
D. Bowyer-Watson (Delaunay via Circumcircle Test)
A practical incremental version widely used in graphics and simulation.
Steps:
- Start with a supertriangle enclosing all points
- Insert a point
- Remove all triangles whose circumcircle contains the point
- Reconnect the resulting cavity
Tiny Code (Conceptual)
typedef struct { double x, y; } Point;
typedef struct { Point a, b, c; } Triangle;
bool in_circle(Point a, Point b, Point c, Point p) {
double A[3][3] = {
{a.x - p.x, a.y - p.y, (a.x*a.x + a.y*a.y) - (p.x*p.x + p.y*p.y)},
{b.x - p.x, b.y - p.y, (b.x*b.x + b.y*b.y) - (p.x*p.x + p.y*p.y)},
{c.x - p.x, c.y - p.y, (c.x*c.x + c.y*c.y) - (p.x*p.x + p.y*p.y)}
};
return determinant(A) > 0;
}This test ensures Delaunay property.
5. Applications
| Domain | Application |
|---|---|
| GIS | Nearest facility, region partition |
| Mesh Generation | Finite element methods |
| Robotics | Visibility graphs, navigation |
| Computer Graphics | Terrain triangulation |
| Clustering | Spatial neighbor structure |
6. Complexity Summary
| Algorithm | Type | Time | Notes |
|---|---|---|---|
| Fortune | Voronoi | \(O(n \log n)\) | Sweep line |
| Bowyer-Watson | Delaunay | \(O(n \log n)\) | Incremental |
| Divide & Conquer | Delaunay | \(O(n \log n)\) | Recursive |
Why It Matters
Voronoi and Delaunay diagrams reveal natural structure in point sets. They convert distance into geometry, showing how space is divided and connected. If geometry is the shape of space, these diagrams are its skeleton.
“Every point claims its territory; every territory shapes its network.”
Try It Yourself
- Draw Voronoi regions for 5 random points by hand.
- Build Delaunay triangles (connect neighboring sites).
- Verify the empty circumcircle property.
- Use a library (CGAL / SciPy) to visualize both structures.
- Explore how adding new points reshapes the diagrams.
75. Point in Polygon and Polygon Triangulation
Geometry often asks two fundamental questions:
- Is a point inside or outside a polygon?
- How can a complex polygon be broken into triangles for computation?
These are the building blocks of spatial analysis, computer graphics, and computational geometry.
1. Point in Polygon (PIP)
Given a polygon defined by vertices \((x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\) and a test point \((x, y)\), we want to determine if the point lies inside, on the boundary, or outside the polygon.
Methods
A. Ray Casting Algorithm
Shoot a ray horizontally to the right of the point. Count how many times it intersects polygon edges.
- Odd count → Inside
- Even count → Outside
This is based on the even-odd rule.
Tiny Code (Ray Casting in C)
bool point_in_polygon(Point p, Point poly[], int n) {
bool inside = false;
for (int i = 0, j = n - 1; i < n; j = i++) {
if (((poly[i].y > p.y) != (poly[j].y > p.y)) &&
(p.x < (poly[j].x - poly[i].x) *
(p.y - poly[i].y) /
(poly[j].y - poly[i].y) + poly[i].x))
inside = !inside;
}
return inside;
}This toggles inside every time a crossing is found.
B. Winding Number Algorithm
Counts how many times the polygon winds around the point.
- Nonzero winding number → Inside
- Zero → Outside
More robust for complex polygons with holes or self-intersections.
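A minimal winding-number sketch in C, using a signed left-of-edge test; the function and struct here are self-contained illustrations, not a fixed library API.
typedef struct { double x, y; } Point;
// signed test: > 0 if c lies to the left of the directed line a->b
static double is_left(Point a, Point b, Point c) {
    return (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
}
int winding_number(Point p, Point poly[], int n) {
    int wn = 0;
    for (int i = 0; i < n; i++) {
        Point a = poly[i], b = poly[(i + 1) % n];
        if (a.y <= p.y) {
            if (b.y > p.y && is_left(a, b, p) > 0) wn++;   // upward crossing, p to the left
        } else {
            if (b.y <= p.y && is_left(a, b, p) < 0) wn--;  // downward crossing, p to the right
        }
    }
    return wn;   // nonzero winding number means the point is inside
}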
| Method | Time Complexity | Robustness |
|---|---|---|
| Ray Casting | \(O(n)\) | Simple, may fail on edge cases |
| Winding Number | \(O(n)\) | More accurate for complex shapes |
Edge Cases
Handle:
- Points on edges or vertices
- Horizontal edges (special treatment to avoid double counting)
Numerical precision is key.
Applications
- Hit testing in computer graphics
- GIS spatial queries
- Collision detection
2. Polygon Triangulation
A polygon triangulation divides a polygon into non-overlapping triangles whose union equals the polygon.
Why triangulate?
- Triangles are simple, stable, and efficient for rendering and computation.
- Used in graphics pipelines, area computation, physics, and mesh generation.
A. Triangulation Basics
For a simple polygon with \(n\) vertices:
- A triangulation is always possible
- It always yields \(n - 2\) triangles
Goal: find a triangulation efficiently and stably.
B. Ear Clipping Algorithm
An intuitive and widely used method for triangulation.
Idea
- Find an ear: a triangle formed by three consecutive vertices \(v_{i-1}, v_i, v_{i+1}\) that is convex and contains no other vertex inside
- Clip the ear (remove vertex \(v_i\))
- Repeat until only one triangle remains
Time Complexity: \(O(n^2)\)
Tiny Code (Ear Clipping Sketch)
while (n > 3) {
for (i = 0; i < n; i++) {
if (is_ear(i)) {
add_triangle((i - 1 + n) % n, i, (i + 1) % n);   // wrap indices around the polygon
remove_vertex(i);
break;
}
}
}Helper is_ear() checks convexity and emptiness.
C. Dynamic Programming for Convex Polygons
If the polygon is convex, use DP triangulation:
\[ dp[i][j] = \min_{k \in (i,j)} dp[i][k] + dp[k][j] + cost(i, j, k) \]
Cost: perimeter or area (for minimum-weight triangulation)
Time Complexity: \(O(n^3)\). Space: \(O(n^2)\)
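A minimal sketch of this DP for a convex polygon with vertices numbered \(0 \ldots n-1\); the caller supplies a `cost(i, j, k)` function (for example, the perimeter of triangle \(v_i v_k v_j\)), and `MAXV` is an assumed size limit for illustration.
#include <float.h>
#define MAXV 128
double dp[MAXV][MAXV];
// dp[i][j]: minimum cost of triangulating the sub-polygon v_i ... v_j
double min_weight_triangulation(int n, double (*cost)(int, int, int)) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            dp[i][j] = 0.0;                          // fewer than 3 vertices: nothing to triangulate
    for (int len = 2; len < n; len++) {              // gap between i and j
        for (int i = 0; i + len < n; i++) {
            int j = i + len;
            dp[i][j] = DBL_MAX;
            for (int k = i + 1; k < j; k++) {        // choose the triangle (i, k, j)
                double c = dp[i][k] + dp[k][j] + cost(i, j, k);
                if (c < dp[i][j]) dp[i][j] = c;
            }
        }
    }
    return dp[0][n - 1];                             // cost of triangulating the whole polygon
}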
D. Divide and Conquer
Recursively split polygon and triangulate sub-polygons. Useful for convex or near-convex shapes.
| Algorithm | Time | Notes |
|---|---|---|
| Ear Clipping | \(O(n^2)\) | Simple polygons |
| DP Triangulation | \(O(n^3)\) | Weighted cost |
| Convex Polygon | \(O(n)\) | Straightforward |
3. Applications
| Domain | Usage |
|---|---|
| Computer Graphics | Rendering, rasterization |
| Computational Geometry | Area computation, integration |
| Finite Element Analysis | Mesh subdivision |
| Robotics | Path planning, map decomposition |
Why It Matters
Point-in-polygon answers where you are. Triangulation tells you how space is built. Together, they form the foundation of geometric reasoning.
“From a single point to a thousand triangles, geometry turns space into structure.”
Try It Yourself
- Draw a non-convex polygon and test random points using the ray casting rule.
- Implement the ear clipping algorithm for a simple polygon.
- Visualize how each step removes an ear and simplifies the shape.
- Compare triangulation results for convex vs concave shapes.
76. Spatial Data Structures (KD, R-tree)
When working with geometric data, points, rectangles, or polygons, efficient lookup and organization are crucial. Spatial data structures are designed to answer queries like:
- Which objects are near a given point?
- Which shapes intersect a region?
- What’s the nearest neighbor?
They form the backbone of computational geometry, computer graphics, GIS, and search systems.
1. Motivation
Brute force approaches that check every object have \(O(n)\) or worse performance. Spatial indexing structures, like KD-Trees and R-Trees, enable efficient range queries, nearest neighbor searches, and spatial joins.
2. KD-Tree (k-dimensional tree)
A KD-tree is a binary tree that recursively partitions space using axis-aligned hyperplanes.
Each node splits the data by one coordinate axis (x, y, z, …).
Structure
- Each node represents a point.
- Each level splits by a different axis (x, y, x, y, …).
- The left child contains points with a smaller coordinate on that axis.
- The right child contains points with a larger coordinate.
Tiny Code (KD-tree Construction in 2D)
typedef struct {
double x, y;
} Point;
int axis; // 0 for x, 1 for y
KDNode* build(Point points[], int n, int depth) {
if (n == 0) return NULL;
axis = depth % 2;
int mid = n / 2;
nth_element(points, points + mid, points + n, compare_by_axis);  // C++ STL: places the median on the current axis at position mid
KDNode* node = new_node(points[mid]);
node->left = build(points, mid, depth + 1);
node->right = build(points + mid + 1, n - mid - 1, depth + 1);
return node;
}Search Complexity:
- Average: \(O(\log n)\)
- Worst case: \(O(n)\)
Queries
- Range query: find points in a region.
- Nearest neighbor: search only branches that might contain closer points.
- K-nearest neighbors: use priority queues.
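A minimal nearest-neighbor search sketch, reusing the Point type from the build code above; the KDNode layout (pt, left, right) is an assumption for illustration. It descends toward the query first, then backtracks into the other subtree only when the splitting plane could still hide a closer point.
typedef struct KDNode {
    Point pt;
    struct KDNode *left, *right;
} KDNode;
static double dist2(Point a, Point b) {
    double dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;
}
// best / best_d2 carry the closest point (and its squared distance) found so far
void nn_search(KDNode *node, Point q, int depth, Point *best, double *best_d2) {
    if (!node) return;
    double d2 = dist2(node->pt, q);
    if (d2 < *best_d2) { *best_d2 = d2; *best = node->pt; }
    int ax = depth % 2;                                       // 0: split on x, 1: split on y
    double diff = (ax == 0) ? q.x - node->pt.x : q.y - node->pt.y;
    KDNode *near_side = diff < 0 ? node->left : node->right;
    KDNode *far_side  = diff < 0 ? node->right : node->left;
    nn_search(near_side, q, depth + 1, best, best_d2);        // explore the side containing q first
    if (diff * diff < *best_d2)                               // the other side may still hold a closer point
        nn_search(far_side, q, depth + 1, best, best_d2);
}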
Pros & Cons
| Pros | Cons |
|---|---|
| Efficient for static data | Costly updates |
| Good for low dimensions | Degrades with high dimensions |
Applications
- Nearest neighbor search in ML
- Collision detection
- Clustering (e.g., k-means acceleration)
3. R-Tree (Rectangle Tree)
An R-tree is a height-balanced tree for rectangular bounding boxes. It’s the spatial analog of a B-tree.
Idea
- Store objects or bounding boxes in leaf nodes.
- Internal nodes store MBRs (Minimum Bounding Rectangles) that cover their child boxes.
- Query by traversing overlapping MBRs.
Tiny Code (R-Tree Node Sketch)
typedef struct {
Rectangle mbr;
Node* children[MAX_CHILDREN];
int count;
} Node;Insertion chooses the child whose MBR expands least to accommodate the new entry.
Operations
- Insert: choose subtree → insert → adjust MBRs
- Search: descend into nodes whose MBR intersects the query
- Split: when a node is full, use heuristics (linear, quadratic, R*-Tree)
Complexity:
- Query: \(O(\log n)\)
- Insert/Delete: \(O(\log n)\) average
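A minimal choose-subtree sketch for the least-enlargement rule, using the Node layout above; the Rectangle fields (xmin, ymin, xmax, ymax) and helper names are assumptions for illustration.
typedef struct { double xmin, ymin, xmax, ymax; } Rectangle;
static double rect_area(Rectangle r) {
    return (r.xmax - r.xmin) * (r.ymax - r.ymin);
}
// smallest rectangle covering both a and b
static Rectangle rect_union(Rectangle a, Rectangle b) {
    Rectangle r;
    r.xmin = a.xmin < b.xmin ? a.xmin : b.xmin;
    r.ymin = a.ymin < b.ymin ? a.ymin : b.ymin;
    r.xmax = a.xmax > b.xmax ? a.xmax : b.xmax;
    r.ymax = a.ymax > b.ymax ? a.ymax : b.ymax;
    return r;
}
// index of the child whose MBR grows the least when absorbing `entry`
int choose_subtree(Node *node, Rectangle entry) {
    int best = 0;
    double best_growth = 1e300;
    for (int i = 0; i < node->count; i++) {
        Rectangle mbr = node->children[i]->mbr;
        double growth = rect_area(rect_union(mbr, entry)) - rect_area(mbr);
        if (growth < best_growth) { best_growth = growth; best = i; }
    }
    return best;
}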
Pros & Cons
| Pros | Cons |
|---|---|
| Supports dynamic data | Overlaps can degrade performance |
| Ideal for rectangles | Complex split rules |
Variants
- R*-Tree: optimized reinsertion, better packing
- R+ Tree: non-overlapping partitions
- Hilbert R-Tree: uses space-filling curves
4. Comparison
| Feature | KD-Tree | R-Tree |
|---|---|---|
| Data Type | Points | Rectangles / Regions |
| Dimensionality | Low (2-10) | Medium |
| Use Case | NN, range queries | Spatial joins, overlap queries |
| Updates | Expensive | Dynamic-friendly |
| Balance | Recursive median | B-tree-like |
5. Other Spatial Structures
| Structure | Description |
|---|---|
| Quadtree | Recursive 2D subdivision into 4 quadrants |
| Octree | 3D analog of quadtree |
| BSP Tree | Binary partition using arbitrary planes |
| Grid Index | Divide space into uniform grid cells |
6. Applications
| Domain | Usage |
|---|---|
| GIS | Region queries, map intersections |
| Graphics | Ray tracing acceleration |
| Robotics | Collision and path planning |
| ML | Nearest neighbor search |
| Databases | Spatial indexing |
Why It Matters
Spatial structures turn geometry into searchable data. They enable efficient algorithms for where and what’s near, vital for real-time systems.
“Divide space wisely, and queries become whispers instead of shouts.”
Try It Yourself
- Build a KD-tree for 10 random 2D points.
- Implement nearest neighbor search.
- Insert rectangles into a simple R-tree and query intersection with a bounding box.
- Compare query time vs brute force.
77. Rasterization and Scanline Techniques
When you draw shapes on a screen, triangles, polygons, circles, they must be converted into pixels. This conversion is called rasterization. It’s the bridge between geometric math and visible images.
Rasterization and scanline algorithms are foundational to computer graphics, game engines, and rendering pipelines.
1. What Is Rasterization?
Rasterization transforms vector shapes (continuous lines and surfaces) into discrete pixels on a grid.
For example, a triangle defined by vertices (x1, y1), (x2, y2), (x3, y3) must be filled pixel by pixel.
2. Core Idea
Each shape (line, polygon, circle) is sampled over a grid. The algorithm decides which pixels are inside, on, or outside the shape.
A rasterizer answers:
- Which pixels should be lit?
- What color or depth should each pixel have?
3. Line Rasterization (Bresenham’s Algorithm)
A classic method for drawing straight lines with integer arithmetic.
Key Idea: Move from one pixel to the next, choosing the pixel closest to the true line path.
void draw_line(int x0, int y0, int x1, int y1) {
int dx = abs(x1 - x0), dy = abs(y1 - y0);
int sx = (x0 < x1) ? 1 : -1;
int sy = (y0 < y1) ? 1 : -1;
int err = dx - dy;
while (true) {
plot(x0, y0); // draw pixel
if (x0 == x1 && y0 == y1) break;
int e2 = 2 * err;
if (e2 > -dy) { err -= dy; x0 += sx; }
if (e2 < dx) { err += dx; y0 += sy; }
}
}Why it works: Bresenham avoids floating-point math and keeps the line visually continuous.
4. Polygon Rasterization
To fill shapes, we need scanline algorithms, they sweep a horizontal line (y-axis) across the shape and fill pixels in between edges.
Scanline Fill Steps
- Sort edges by their y-coordinates.
- Scan each line (y).
- Find intersections with polygon edges.
- Fill between intersection pairs.
This guarantees correct filling for convex and concave polygons.
Example (Simple Triangle Rasterization)
for (int y = y_min; y <= y_max; y++) {
find all x-intersections with polygon edges;
sort x-intersections;
for (int i = 0; i < count; i += 2)
draw_line(x[i], y, x[i+1], y);
}5. Circle Rasterization (Midpoint Algorithm)
Use symmetry, a circle is symmetric in 8 octants.
Each step calculates the error term to decide whether to move horizontally or diagonally.
void draw_circle(int xc, int yc, int r) {
int x = 0, y = r, d = 3 - 2 * r;
while (y >= x) {
plot_circle_points(xc, yc, x, y);
x++;
if (d > 0) { y--; d += 4 * (x - y) + 10; }
else d += 4 * x + 6;
}
}6. Depth and Shading
In 3D graphics, rasterization includes depth testing (Z-buffer) and color interpolation. Each pixel stores its depth; new pixels overwrite only if closer.
Interpolated shading (Gouraud, Phong) computes smooth color transitions across polygons.
7. Hardware Rasterization
Modern GPUs perform rasterization in parallel:
- Vertex Shader → Projection
- Rasterizer → Pixel Grid
- Fragment Shader → Color & Depth
Each pixel is processed in fragment shaders for lighting, texture, and effects.
8. Optimizations
| Technique | Purpose |
|---|---|
| Bounding Box Clipping | Skip off-screen regions |
| Early Z-Culling | Discard hidden pixels early |
| Edge Functions | Fast inside-test for triangles |
| Barycentric Coordinates | Interpolate depth/color smoothly |
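A minimal edge-function rasterizer sketch tying these two optimizations together, assuming #include <math.h> and the plot() routine used in the Bresenham sketch above. A pixel is inside the triangle when all three edge functions agree in sign, and the same values, divided by the triangle area, give barycentric weights for interpolation.
// signed area of (a, b, p): > 0 when p is to the left of edge a->b
static double edge_fn(double ax, double ay, double bx, double by, double px, double py) {
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax);
}
void fill_triangle(double x0, double y0, double x1, double y1, double x2, double y2) {
    int minx = (int)fmin(x0, fmin(x1, x2)), maxx = (int)ceil(fmax(x0, fmax(x1, x2)));
    int miny = (int)fmin(y0, fmin(y1, y2)), maxy = (int)ceil(fmax(y0, fmax(y1, y2)));
    double area = edge_fn(x0, y0, x1, y1, x2, y2);   // twice the signed triangle area
    if (area == 0) return;                           // degenerate triangle
    for (int y = miny; y <= maxy; y++) {             // bounding-box clipping: visit only candidates
        for (int x = minx; x <= maxx; x++) {
            double w0 = edge_fn(x1, y1, x2, y2, x, y);
            double w1 = edge_fn(x2, y2, x0, y0, x, y);
            double w2 = edge_fn(x0, y0, x1, y1, x, y);
            // inside when all edge functions share the triangle's orientation
            if ((area > 0 && w0 >= 0 && w1 >= 0 && w2 >= 0) ||
                (area < 0 && w0 <= 0 && w1 <= 0 && w2 <= 0))
                plot(x, y);   // w0/area, w1/area, w2/area are the barycentric weights
        }
    }
}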
9. Why It Matters
Rasterization turns math into imagery. It’s the foundation of all visual computing, renderers, CAD, games, and GUIs. Even with ray tracing rising, rasterization remains dominant for real-time rendering.
“Every pixel you see began as math, it’s just geometry painted by light.”
10. Try It Yourself
- Implement Bresenham’s algorithm for lines.
- Write a scanline polygon fill for triangles.
- Extend it with color interpolation using barycentric coordinates.
- Compare performance vs brute force (looping over all pixels).
78. Computer Vision (Canny, Hough, SIFT)
Computer vision is where algorithms learn to see, to extract structure, shape, and meaning from images. Behind every object detector, edge map, and keypoint matcher lies a handful of powerful geometric algorithms.
In this section, we explore three pillars of classical vision: Canny edge detection, the Hough transform, and SIFT (Scale-Invariant Feature Transform).
1. The Vision Pipeline
Most vision algorithms follow a simple pattern:
- Input: Raw pixels (grayscale or color)
- Preprocess: Smoothing or filtering
- Feature extraction: Edges, corners, blobs
- Detection or matching: Shapes, keypoints
- Interpretation: Object recognition, tracking
Canny, Hough, and SIFT live in the feature extraction and detection stages.
2. Canny Edge Detector
Edges mark places where intensity changes sharply, the outlines of objects. The Canny algorithm (1986) is one of the most robust and widely used edge detectors.
Steps
Smoothing: Apply Gaussian blur to reduce noise.
Gradient computation:
- Compute \(G_x\) and \(G_y\) via Sobel filters
- Gradient magnitude: \(G = \sqrt{G_x^2 + G_y^2}\)
- Gradient direction: \(\theta = \tan^{-1}\frac{G_y}{G_x}\)
Non-maximum suppression:
- Keep only local maxima along the gradient direction
Double thresholding:
- Strong edges (high gradient)
- Weak edges (connected to strong ones)
Edge tracking by hysteresis:
- Connect weak edges linked to strong edges
Tiny Code (Pseudocode)
Image canny(Image input) {
Image smoothed = gaussian_blur(input);
Gradient grad = sobel(smoothed);
Image suppressed = non_max_suppression(grad);
Image edges = hysteresis_threshold(suppressed, low, high);
return edges;
}Why Canny Works
Canny maximizes three criteria:
- Good detection (low false negatives)
- Good localization (edges close to true edges)
- Single response (no duplicates)
It’s a careful balance between sensitivity and stability.
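A minimal sketch of the gradient step from the pipeline above, assuming a grayscale image stored as a flat array of width w and height h (names and layout are illustrative); it applies the two 3×3 Sobel kernels and writes the gradient magnitude.
#include <math.h>
// Sobel kernels for horizontal (Gx) and vertical (Gy) intensity change
static const int KX[3][3] = { {-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1} };
static const int KY[3][3] = { {-1, -2, -1}, {0, 0, 0}, {1, 2, 1} };
void sobel_magnitude(const unsigned char *img, double *mag, int w, int h) {
    for (int y = 1; y < h - 1; y++) {
        for (int x = 1; x < w - 1; x++) {
            double gx = 0, gy = 0;
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++) {
                    int v = img[(y + dy) * w + (x + dx)];
                    gx += KX[dy + 1][dx + 1] * v;
                    gy += KY[dy + 1][dx + 1] * v;
                }
            mag[y * w + x] = sqrt(gx * gx + gy * gy);   // gradient magnitude G
        }
    }
}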
3. Hough Transform
Canny finds edge points, Hough connects them into shapes.
The Hough transform detects lines, circles, and other parametric shapes using voting in parameter space.
Line Detection
Equation of a line: \[ \rho = x\cos\theta + y\sin\theta \]
Each edge point votes for all (\(\rho, \theta\)) combinations it could belong to. Peaks in the accumulator array correspond to strong lines.
Tiny Code (Hough Transform)
for each edge point (x, y):
for theta in [0, 180):
rho = x*cos(theta) + y*sin(theta);
accumulator[rho, theta]++;Then pick (\(\rho, \theta\)) with highest votes.
Circle Detection
Use a 3D accumulator over \((\text{center}_x, \text{center}_y, \text{radius})\). Each edge pixel votes for possible circle centers.
Applications
- Lane detection in self-driving
- Shape recognition (circles, ellipses)
- Document analysis (lines, grids)
4. SIFT (Scale-Invariant Feature Transform)
SIFT finds keypoints that remain stable under scale, rotation, and illumination changes.
It’s widely used for image matching, panoramas, 3D reconstruction, and object recognition.
Steps
- Scale-space extrema detection: use Difference of Gaussians (DoG) across scales and detect maxima/minima in the space-scale neighborhood.
- Keypoint localization: refine keypoint positions and discard unstable ones.
- Orientation assignment: assign a dominant gradient direction.
- Descriptor generation: build a 128D histogram of gradient orientations in a local patch.
Tiny Code (Outline)
for each octave:
build scale-space pyramid
find DoG extrema
localize keypoints
assign orientations
compute 128D descriptorProperties
| Property | Description |
|---|---|
| Scale Invariant | Detects features at multiple scales |
| Rotation Invariant | Uses local orientation |
| Robust | Handles lighting, noise, affine transforms |
5. Comparison
| Algorithm | Purpose | Output | Robustness |
|---|---|---|---|
| Canny | Edge detection | Binary edge map | Sensitive to thresholds |
| Hough | Shape detection | Lines, circles | Needs clean edges |
| SIFT | Feature detection | Keypoints, descriptors | Very robust |
6. Applications
| Domain | Use Case |
|---|---|
| Robotics | Visual SLAM, localization |
| AR / VR | Marker tracking |
| Search | Image matching |
| Medical | Edge segmentation |
| Industry | Quality inspection |
7. Modern Successors
- ORB (FAST + BRIEF): efficient for real-time
- SURF: faster SIFT alternative
- Harris / FAST: corner detectors
- Deep features: CNN-based descriptors
Why It Matters
These algorithms gave machines their first eyes, before deep learning, they were how computers recognized structure. Even today, they’re used in preprocessing, embedded systems, and hybrid pipelines.
“Before neural nets could dream, vision began with gradients, geometry, and votes.”
Try It Yourself
- Implement Canny using Sobel and hysteresis.
- Use Hough transform to detect lines in a synthetic image.
- Try OpenCV SIFT to match keypoints between two rotated images.
- Compare edge maps before and after Gaussian blur.
79. Pathfinding in Space (A*, RRT, PRM)
When navigating a maze, driving an autonomous car, or moving a robot arm, the question is the same: How do we find a path from start to goal efficiently and safely?
Pathfinding algorithms answer this question, balancing optimality, speed, and adaptability. In this section, we explore three foundational families:
- A*: heuristic search in grids and graphs
- RRT (Rapidly-Exploring Random Tree): sampling-based exploration
- PRM (Probabilistic Roadmap): precomputed navigation networks
1. The Pathfinding Problem
Given:
- A space (grid, graph, or continuous)
- A start node and goal node
- A cost function (distance, time, energy)
- Optional obstacles
Find a collision-free, low-cost path.
2. A* (A-star) Search
A* combines Dijkstra’s algorithm with a heuristic that estimates remaining cost. It’s the most popular graph-based pathfinding algorithm.
Key Idea
Each node \(n\) has: \[ f(n) = g(n) + h(n) \]
- \(g(n)\): cost so far
- \(h(n)\): estimated cost to goal
- \(f(n)\): total estimated cost
Algorithm
- Initialize the priority queue with the start node
- While the queue is not empty:
  - Pop the node with the smallest \(f(n)\)
  - If the goal is reached → return the path
  - For each neighbor: compute new \(g\) and \(f\), and update the queue if better
Tiny Code (Grid A*)
typedef struct { int x, y; double g, f; } Node;
double heuristic(Node a, Node b) {
return fabs(a.x - b.x) + fabs(a.y - b.y); // Manhattan
}
void a_star(Node start, Node goal) {
PriorityQueue open;
push(open, start);
while (!empty(open)) {
Node cur = pop_min(open);
if (cur == goal) return reconstruct_path();
for (Node n : neighbors(cur)) {
double tentative_g = cur.g + dist(cur, n);
if (tentative_g < n.g) {
n.g = tentative_g;
n.f = n.g + heuristic(n, goal);
push(open, n);
}
}
}
}Complexity
- Time: \(O(E \log V)\)
- Space: \(O(V)\)
- Optimal if \(h(n)\) is admissible (never overestimates)
Variants
| Variant | Description |
|---|---|
| Dijkstra | A* with \(h(n) = 0\) |
| Greedy Best-First | Uses \(h(n)\) only |
| Weighted A* | Speeds up with tradeoff on optimality |
| Jump Point Search | Optimized for uniform grids |
3. RRT (Rapidly-Exploring Random Tree)
A* struggles in continuous or high-dimensional spaces (e.g. robot arms). RRT tackles this with randomized exploration.
Core Idea
- Grow a tree from the start by randomly sampling points.
- Extend the tree toward each sample (step size \(\epsilon\)).
- Stop when near the goal.
Tiny Code (RRT Sketch)
Tree T = {start};
for (int i = 0; i < MAX_ITERS; i++) {
Point q_rand = random_point();
Point q_near = nearest(T, q_rand);
Point q_new = steer(q_near, q_rand, step_size);
if (collision_free(q_near, q_new))
add_edge(T, q_near, q_new);
if (distance(q_new, goal) < eps)
return path;
}Pros & Cons
| Pros | Cons |
|---|---|
| Works in continuous space | Paths are suboptimal |
| Handles high dimensions | Randomness may miss narrow passages |
| Simple and fast | Needs post-processing (smoothing) |
Variants
| Variant | Description |
|---|---|
| RRT* | Asymptotically optimal |
| Bi-RRT | Grow from both start and goal |
| Informed RRT* | Focus on promising regions |
4. PRM (Probabilistic Roadmap)
PRM builds a graph of feasible configurations, a roadmap, then searches it.
Steps
- Sample random points in free space
- Connect nearby points with collision-free edges
- Search roadmap (e.g., with A*)
Tiny Code (PRM Sketch)
Graph G = {};
for (int i = 0; i < N; i++) {
Point p = random_free_point();
G.add_vertex(p);
}
for each p in G:
for each q near p:
if (collision_free(p, q))
G.add_edge(p, q);
path = a_star(G, start, goal);Pros & Cons
| Pros | Cons |
|---|---|
| Precomputes reusable roadmap | Needs many samples for coverage |
| Good for multiple queries | Poor for single-query planning |
| Works in high-dim spaces | May need post-smoothing |
5. Comparison
| Algorithm | Space | Nature | Optimal | Use Case |
|---|---|---|---|---|
| A* | Discrete | Deterministic | Yes | Grids, graphs |
| RRT | Continuous | Randomized | No (RRT* = Yes) | Robotics, motion planning |
| PRM | Continuous | Randomized | Approx. | Multi-query planning |
6. Applications
| Domain | Use Case |
|---|---|
| Robotics | Arm motion, mobile navigation |
| Games | NPC pathfinding, AI navigation mesh |
| Autonomous vehicles | Route planning |
| Aerospace | Drone and spacecraft trajectory |
| Logistics | Warehouse robot movement |
Why It Matters
Pathfinding is decision-making in space, it gives agents the ability to move, explore, and act purposefully. From Pac-Man to Mars rovers, every journey starts with an algorithm.
“To move with purpose, one must first see the paths that are possible.”
Try It Yourself
- Implement A* on a 2D grid with walls.
- Generate an RRT in a 2D obstacle field.
- Build a PRM for a continuous space and run A* on the roadmap.
- Compare speed and path smoothness across methods.
80. Computational Geometry Variants and Applications
Computational geometry is the study of algorithms on geometric data, points, lines, polygons, circles, and higher-dimensional shapes. By now, you’ve seen core building blocks: convex hulls, intersections, nearest neighbors, triangulations, and spatial indexing.
This final section brings them together through variants, generalizations, and real-world applications, showing how geometry quietly powers modern computing.
1. Beyond the Plane
Most examples so far assumed 2D geometry. But real systems often live in 3D or N-D spaces.
| Dimension | Example Problems | Typical Uses |
|---|---|---|
| 2D | Convex hull, polygon area, line sweep | GIS, CAD, mapping |
| 3D | Convex polyhedra, mesh intersection, visibility | Graphics, simulation |
| N-D | Voronoi in high-D, KD-trees, optimization | ML, robotics, data science |
Higher dimensions add complexity (and sometimes impossibility):
- Exact geometry is often replaced by approximations.
- Volume, distance, and intersection tests become more expensive.
2. Approximate and Robust Geometry
Real-world geometry faces numerical errors (floating point) and degenerate cases (collinear, overlapping). To handle this, algorithms adopt robustness and approximation strategies.
- Epsilon comparisons: treat values within a tolerance as equal
- Orientation tests: robustly compute turn direction via cross products
- Exact arithmetic: rational or symbolic computation
- Grid snapping: quantize space for stability
Approximate geometry accepts small error for a large speed-up, essential in graphics and machine learning.
3. Geometric Duality
A powerful tool for reasoning about problems: map points to lines, lines to points. For example:
- A point \((a, b)\) maps to the line \(y = ax - b\).
- A line \(y = mx + c\) maps to the point \((m, -c)\).
Applications:
- Transforming line intersection problems into point location problems
- Simplifying half-plane intersections
- Enabling arrangement algorithms in computational geometry
Duality is a common trick: turn geometry upside-down to make it simpler.
4. Geometric Data Structures
Recap of core spatial structures and what they’re best at:
| Structure | Stores | Queries | Use Case |
|---|---|---|---|
| KD-Tree | Points | NN, range | Low-D search |
| R-Tree | Rectangles | Overlaps | Spatial DB |
| Quad/Octree | Space partitions | Point lookup | Graphics, GIS |
| BSP Tree | Polygons | Visibility | Rendering |
| Delaunay Triangulation | Points | Neighbors | Mesh generation |
| Segment Tree | Intervals | Range | Sweep-line events |
5. Randomized Geometry
Randomness simplifies deterministic geometry:
- Randomized incremental construction (convex hulls, Delaunay)
- Random sampling for approximation (ε-nets, VC dimension)
- Monte Carlo geometry for probabilistic intersection and coverage
Example: randomized incremental convex hull builds the structure in expected \(O(n \log n)\) time, with elegant proofs.
6. Computational Topology
Beyond geometry lies shape connectivity, studied by topology. Algorithms compute connected components, holes, homology, and Betti numbers.
Applications include:
- 3D printing (watertightness)
- Data analysis (persistent homology)
- Robotics (free space topology)
Geometry meets topology in alpha-shapes, simplicial complexes, and manifold reconstruction.
7. Geometry Meets Machine Learning
Many ML methods are geometric at heart:
- Nearest neighbor → Voronoi diagram
- SVM → hyperplane separation
- K-means → Voronoi partitioning
- Manifold learning → low-dimensional geometry
- Convex optimization → geometric feasibility
Visualization tools (t-SNE, UMAP) rely on spatial embedding and distance geometry.
8. Applications Across Fields
| Field | Application | Geometric Core |
|---|---|---|
| Graphics | Rendering, collision | Triangulation, ray tracing |
| GIS | Maps, roads | Polygons, point-in-region |
| Robotics | Path planning | Obstacles, configuration space |
| Architecture | Modeling | Mesh operations |
| Vision | Object boundaries | Contours, convexity |
| AI | Clustering, similarity | Distance metrics |
| Physics | Simulation | Particle collision |
| Databases | Spatial joins | R-Trees, indexing |
Geometry underpins structure, position, and relationship, the backbone of spatial reasoning.
9. Complexity and Open Problems
Some problems still challenge efficient solutions:
- Point location in dynamic settings
- Visibility graphs in complex polygons
- Motion planning in high dimensions
- Geometric median / center problems
- Approximation guarantees in robust settings
These remain active areas in computational geometry research.
Tiny Code (Point-in-Polygon via Ray Casting)
bool inside(Point p, Polygon poly) {
int cnt = 0;
for (int i = 0; i < poly.n; i++) {
Point a = poly[i], b = poly[(i + 1) % poly.n];
if (intersect_ray(p, a, b)) cnt++;
}
return cnt % 2 == 1; // odd crossings = inside
}This small routine appears everywhere, maps, games, GUIs, and physics engines.
10. Why It Matters
Computational geometry is more than shape, it’s the mathematics of space, powering visual computing, spatial data, and intelligent systems. Everywhere something moves, collides, maps, or recognizes form, geometry is the invisible hand guiding it.
“All computation lives somewhere, and geometry is how we understand the where.”
Try It Yourself
- Implement point-in-polygon and test on convex vs concave shapes.
- Visualize a Delaunay triangulation and its Voronoi dual.
- Experiment with KD-trees for nearest neighbor queries.
- Write a small convex hull in 3D using incremental insertion.
- Sketch an RRT path over a geometric map.
Chapter 9. Systems, Databases, and Distributed Algorithms
81. Concurrency Control (2PL, MVCC, OCC)
In multi-user or multi-threaded systems, many operations want to read or write shared data at the same time. Without discipline, this leads to chaos, lost updates, dirty reads, or even inconsistent states.
Concurrency control ensures correctness under parallelism, so that the result is as if each transaction ran alone (a property called serializability).
This section introduces three foundational techniques:
- 2PL - Two-Phase Locking
- MVCC - Multi-Version Concurrency Control
- OCC - Optimistic Concurrency Control
1. The Goal: Serializability
We want transactions to behave as if executed in some serial order, even though they’re interleaved.
A schedule is serializable if it yields the same result as some serial order of transactions.
Concurrency control prevents problems like:
- Lost Update: two writes overwrite each other.
- Dirty Read: reading uncommitted data.
- Non-repeatable Read: data changes mid-transaction.
- Phantom Read: new rows appear after a query.
2. Two-Phase Locking (2PL)
Idea: Use locks to coordinate access. Each transaction has two phases:
- Growing phase: acquire locks (shared or exclusive)
- Shrinking phase: release locks (no new locks allowed after release)
This ensures conflict-serializability.
Lock Types
| Lock Type | Used For | Compatible with S? | Compatible with X? |
|---|---|---|---|
| Shared (S) | Read | Yes | No |
| Exclusive (X) | Write | No | No |
If a transaction needs to read: request S-lock. If it needs to write: request X-lock.
Tiny Code (Lock Manager Sketch)
void acquire_lock(Transaction *T, Item *X, LockType type) {
while (conflict_exists(X, type))
wait();
add_lock(X, T, type);
}
void release_all(Transaction *T) {
for (Lock *l in T->locks)
unlock(l);
}Example
T1: read(A); write(A)
T2: read(A); write(A)
Without locks → race condition. With 2PL → one must wait → consistent.
Variants
| Variant | Description |
|---|---|
| Strict 2PL | Holds exclusive (write) locks until commit → avoids cascading aborts |
| Rigorous 2PL | Holds all locks, shared and exclusive, until commit |
| Conservative 2PL | Acquires all locks before execution |
Pros & Cons
| Pros | Cons |
|---|---|
| Guarantees serializability | Can cause deadlocks |
| Simple concept | Blocking, contention under load |
3. Multi-Version Concurrency Control (MVCC)
Idea: Readers don’t block writers, and writers don’t block readers. Each write creates a new version of data with a timestamp.
Transactions read from a consistent snapshot based on their start time.
Snapshot Isolation
- Readers see the latest committed version as of transaction start.
- Writers produce new versions; conflicts are detected at commit time.
Each record stores:
- value
- created_at
- deleted_at (if applicable)
Tiny Code (Version Chain)
struct Version {
int value;
Timestamp created;
Timestamp deleted;
Version *next;
};Read finds version with created <= tx.start && deleted > tx.start.
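A minimal sketch of that visibility rule over the version chain above, assuming the chain is ordered newest-to-oldest, Timestamp is an integer-like type, and a deleted value of 0 means "still live" (all assumptions for illustration).
// return the version visible to a transaction that started at start_ts,
// or NULL if no version is visible to this snapshot
Version *visible_version(Version *head, Timestamp start_ts) {
    for (Version *v = head; v != NULL; v = v->next) {
        int created_ok  = v->created <= start_ts;                   // committed before the snapshot
        int not_deleted = (v->deleted == 0) || (v->deleted > start_ts);
        if (created_ok && not_deleted)
            return v;   // first (newest) version the snapshot can see
    }
    return NULL;
}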
Pros & Cons
| Pros | Cons |
|---|---|
| No read locks | Higher memory (multiple versions) |
| Readers never block | Write conflicts at commit |
| Great for OLTP systems | GC of old versions needed |
Used In
- PostgreSQL
- Oracle
- MySQL (InnoDB)
- Spanner
4. Optimistic Concurrency Control (OCC)
Idea: Assume conflicts are rare. Let transactions run without locks. At commit time, validate, if conflicts exist, rollback.
Phases
- Read phase - execute, read data, buffer writes.
- Validation phase - check if conflicts occurred.
- Write phase - apply changes if valid, else abort.
Tiny Code (OCC Validation)
bool validate(Transaction *T) {
for (Transaction *U in committed_since(T.start))
if (conflict(T, U))
return false;
return true;
}Pros & Cons
| Pros | Cons |
|---|---|
| No locks → no deadlocks | High abort rate under contention |
| Great for low-conflict workloads | Wasted work on abort |
Used In
- In-memory databases
- Distributed systems
- STM (Software Transactional Memory)
5. Choosing a Strategy
| System Type | Preferred Control |
|---|---|
| OLTP (many reads/writes) | MVCC |
| OLAP (read-heavy) | MVCC or OCC |
| Real-time systems | 2PL (predictable) |
| Low contention | OCC |
| High contention | 2PL / MVCC |
6. Why It Matters
Concurrency control is the backbone of consistency in databases, distributed systems, and even multi-threaded programs. It enforces correctness amid chaos, ensuring your data isn’t silently corrupted.
“Without order, parallelism is noise. Concurrency control is its conductor.”
Try It Yourself
- Simulate 2PL with two transactions updating shared data.
- Implement a toy MVCC table with version chains.
- Write an OCC validator for three concurrent transactions.
- Experiment: under high conflict, which model performs best?
82. Logging, Recovery, and Commit Protocols
No matter how elegant your algorithms or how fast your storage, failures happen. Power cuts, crashes, and network splits are inevitable. What matters is recovery, restoring the system to a consistent state without losing committed work.
Logging, recovery, and commit protocols form the backbone of reliable transactional systems, ensuring durability and correctness in the face of crashes.
1. The Problem
We need to guarantee the ACID properties:
- Atomicity - all or nothing
- Consistency - valid before and after
- Isolation - no interference
- Durability - once committed, always safe
If a crash occurs mid-transaction, how do we roll back or redo correctly?
The answer: Log everything, then replay or undo after failure.
2. Write-Ahead Logging (WAL)
The golden rule:
“Write log entries before modifying the database.”
Every action is recorded in a sequential log on disk, ensuring the system can reconstruct the state.
Log Record Format
Each log entry typically includes:
- LSN (Log Sequence Number)
- Transaction ID
- Operation (update, insert, delete)
- Before image (old value)
- After image (new value)
struct LogEntry {
int lsn;
int tx_id;
char op[10];
Value before, after;
};When a transaction commits, the system first flushes logs to disk (fsync). Only then is the commit acknowledged.
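A minimal sketch of the write-ahead rule using POSIX write and fsync, reusing the LogEntry struct above; log_fd and page_slot are illustrative names. The point is the ordering: the log record reaches stable storage before the data is modified.
#include <unistd.h>
// append the log record and force it to disk, then apply the change;
// if we crash after the fsync, recovery can redo the update from the log
int wal_update(int log_fd, struct LogEntry *e, Value *page_slot) {
    if (write(log_fd, e, sizeof(*e)) != sizeof(*e)) return -1;  // 1. append log record
    if (fsync(log_fd) != 0) return -1;                          // 2. flush log to disk first
    *page_slot = e->after;                                      // 3. only now modify the data
    return 0;
}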
3. Recovery Actions
When the system restarts, it reads logs and applies a recovery algorithm.
Three Phases (ARIES Model)
- Analysis - determine state at crash (active vs committed)
- Redo - repeat all actions from last checkpoint
- Undo - rollback incomplete transactions
ARIES (Algorithm for Recovery and Isolation Exploiting Semantics) is the most widely used approach (IBM DB2, PostgreSQL, SQL Server).
Redo Rule
If the transaction committed before the crash → redo all of its updates so the data is preserved.
Undo Rule
If the transaction did not commit → undo its changes to maintain atomicity.
Tiny Code (Simplified Recovery Sketch)
void recover(Log log) {
for (Entry e : log) {
if (e.committed)
apply(e.after);
else
apply(e.before);
}
}4. Checkpointing
Instead of replaying the entire log, systems take checkpoints, periodic snapshots marking a safe state.
| Type | Description |
|---|---|
| Sharp checkpoint | Stop all transactions briefly, flush data + log |
| Fuzzy checkpoint | Mark consistent LSN; continue running |
Checkpoints reduce recovery time: only replay after the last checkpoint.
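A small sketch of that shortcut, in the same pseudocode style as the recover() sketch above; it assumes each entry carries an lsn field and that checkpoint_lsn marks the state already safely on disk.
void recover_from_checkpoint(Log log, int checkpoint_lsn) {
    for (Entry e : log) {
        if (e.lsn <= checkpoint_lsn) continue;   // already reflected in the checkpointed state
        if (e.committed) apply(e.after);         // redo committed work
        else             apply(e.before);        // undo incomplete work
    }
}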
5. Commit Protocols
In distributed systems, multiple nodes must agree to commit or abort together. This is handled by atomic commit protocols.
Two-Phase Commit (2PC)
Goal: All participants either commit or abort in unison.
Steps:
- Prepare phase (voting): the coordinator asks all participants to “prepare”; each replies yes/no.
- Commit phase (decision): if all say yes → commit; else → abort.
Coordinator: PREPARE
Participants: VOTE YES / NO
Coordinator: COMMIT / ABORT
If the coordinator crashes after prepare, participants must wait → blocking protocol.
Tiny Code (2PC Pseudocode)
bool two_phase_commit(Participants P) {
for each p in P:
if (!p.prepare()) return abort_all();
for each p in P:
p.commit();
return true;
}Three-Phase Commit (3PC)
Improves on 2PC by adding an intermediate phase to avoid indefinite blocking. More complex, used in systems with reliable failure detection.
6. Logging in Distributed Systems
Each participant maintains its own WAL. To recover globally:
- Use coordinated checkpoints
- Maintain global commit logs
- Consensus-based protocols (Paxos Commit, Raft) can replace 2PC for high availability
7. Example Timeline
| Step | Action |
|---|---|
| T1 updates record A | WAL entry written |
| T1 updates record B | WAL entry written |
| T1 commits | WAL flush, commit record |
| Crash! | Disk may be inconsistent |
| Restart | Recovery scans log, redoes T1 |
8. Pros and Cons
| Approach | Strength | Weakness |
|---|---|---|
| WAL | Simple, durable | Write overhead |
| Checkpointing | Faster recovery | I/O spikes |
| 2PC | Global atomicity | Blocking |
| 3PC / Consensus | Non-blocking | Complex, slower |
9. Real Systems
| System | Strategy |
|---|---|
| PostgreSQL | WAL + ARIES + Checkpoint |
| MySQL (InnoDB) | WAL + Fuzzy checkpoint |
| Spanner | WAL + 2PC + TrueTime |
| Kafka | WAL for durability |
| RocksDB | WAL + LSM checkpoints |
10. Why It Matters
Logging and commit protocols make data survive crashes and stay consistent across machines. Without them, every failure risks corruption.
“Persistence is not about never failing, it’s about remembering how to stand back up.”
Try It Yourself
- Write a toy WAL system that logs before writes.
- Simulate a crash mid-transaction and replay the log.
- Implement a simple 2PC coordinator with two participants.
- Compare recovery time with vs without checkpoints.
83. Scheduling (Round Robin, EDF, Rate-Monotonic)
In operating systems and real-time systems, scheduling determines the order in which tasks or processes run. Since resources like CPU time are limited, a good scheduler aims to balance fairness, efficiency, and responsiveness.
1. The Goal of Scheduling
Every system has tasks competing for the CPU. Scheduling decides:
- Which task runs next
- How long it runs
- When it yields or preempts
Different goals apply in different domains:
| Domain | Objective |
|---|---|
| General-purpose OS | Fairness, responsiveness |
| Real-time systems | Meeting deadlines |
| Embedded systems | Predictability |
| High-performance servers | Throughput, latency balance |
A scheduler’s policy can be preemptive (interrupts tasks) or non-preemptive (waits for voluntary yield).
2. Round Robin Scheduling
Round Robin (RR) is one of the simplest preemptive schedulers. Each process gets a fixed time slice (quantum) and runs in a circular queue.
If a process doesn’t finish, it’s put back at the end of the queue.
Tiny Code: Round Robin (Pseudocode)
queue processes;
while (!empty(processes)) {
process = dequeue(processes);
run_for_quantum(process);
if (!process.finished)
enqueue(processes, process);
}Characteristics
- Fair: every process gets CPU time.
- Responsive: short tasks don’t starve.
- Downside: context switching overhead if the quantum is too small.
Example
| Process | Burst Time |
|---|---|
| P1 | 4 |
| P2 | 3 |
| P3 | 2 |
Quantum = 1 Order: P1, P2, P3, P1, P2, P3, P1, P2 → all finish fairly.
3. Priority Scheduling
Each task has a priority. The scheduler always picks the highest-priority ready task.
- Preemptive: a higher-priority task can interrupt a lower one.
- Non-preemptive: the CPU is released voluntarily.
Problems
- Starvation: low-priority tasks may never run.
- Solution: aging - gradually increase the priority of waiting tasks.
4. Earliest Deadline First (EDF)
EDF is a dynamic priority scheduler for real-time systems. Each task has a deadline, and the task with the earliest deadline runs first.
Rule
At any time, run the ready task with the closest deadline.
Example
| Task | Execution Time | Deadline |
|---|---|---|
| T1 | 1 | 3 |
| T2 | 2 | 5 |
| T3 | 1 | 2 |
Order: T3 → T1 → T2
EDF is optimal for preemptive scheduling of independent tasks on a single processor.
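A minimal EDF dispatcher sketch in C: among the ready tasks, pick the one with the earliest absolute deadline. The Task fields here are illustrative assumptions.
typedef struct {
    int id;
    double remaining;   // execution time left
    double deadline;    // absolute deadline
    int ready;          // 1 if released and not yet finished
} Task;
// return the index of the ready task with the earliest deadline, or -1 if none is ready
int edf_pick(Task tasks[], int n) {
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (!tasks[i].ready || tasks[i].remaining <= 0) continue;
        if (best < 0 || tasks[i].deadline < tasks[best].deadline)
            best = i;
    }
    return best;
}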
5. Rate-Monotonic Scheduling (RMS)
In periodic real-time systems, tasks repeat at fixed intervals. RMS assigns higher priority to tasks with shorter periods.
| Task | Period | Priority |
|---|---|---|
| T1 | 2 ms | High |
| T2 | 5 ms | Medium |
| T3 | 10 ms | Low |
It’s static (priorities don’t change) and optimal among fixed-priority schedulers.
Utilization Bound
For n tasks, RMS is guaranteed schedulable if:
\[ U = \sum_{i=1}^{n} \frac{C_i}{T_i} \le n(2^{1/n} - 1) \]
For example, for 3 tasks, \(U \le 0.78\).
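A small check of that bound, assuming arrays of computation times C[i] and periods T[i]; it returns 1 when total utilization stays within \(n(2^{1/n} - 1)\). Note this test is sufficient but not necessary.
#include <math.h>
int rms_schedulable(const double C[], const double T[], int n) {
    double U = 0.0;
    for (int i = 0; i < n; i++)
        U += C[i] / T[i];                          // utilization of task i
    double bound = n * (pow(2.0, 1.0 / n) - 1.0);  // utilization bound for n tasks
    return U <= bound;
}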
6. Shortest Job First (SJF)
Run the task with the shortest burst time first.
- Non-preemptive SJF: once started, runs to completion.
- Preemptive SJF (Shortest Remaining Time First): preempts if a shorter job arrives.
Advantage: minimizes average waiting time. Disadvantage: needs knowledge of future job lengths.
7. Multilevel Queue Scheduling
Divide processes into classes (interactive, batch, system). Each class has its own queue with own policy, e.g.:
- Queue 1: System → RR (quantum = 10 ms)
- Queue 2: Interactive → RR (quantum = 50 ms)
- Queue 3: Batch → FCFS (First-Come-First-Serve)
The CPU is assigned based on queue priority.
8. Multilevel Feedback Queue (MLFQ)
Processes move between queues based on behavior.
- CPU-bound → move down (lower priority)
- I/O-bound → move up (higher priority)
Goal: adaptive scheduling that rewards interactive tasks.
Used in modern OS kernels (Linux, Windows).
9. Scheduling Metrics
| Metric | Meaning |
|---|---|
| Turnaround Time | Completion − Arrival |
| Waiting Time | Time spent in ready queue |
| Response Time | Time from arrival to first execution |
| Throughput | Completed tasks per unit time |
| CPU Utilization | % of time CPU is busy |
Schedulers balance these based on design goals.
10. Why It Matters
Schedulers shape how responsive, efficient, and fair a system feels. In operating systems, they govern multitasking. In real-time systems, they ensure deadlines are met. In servers, they keep latency low and throughput high.
“Scheduling is not just about time. It’s about fairness, foresight, and flow.”
Try It Yourself
- Simulate Round Robin with quantum = 2, compare average waiting time.
- Implement EDF for a set of periodic tasks with deadlines.
- Check schedulability under RMS for 3 periodic tasks.
- Explore Linux CFS (Completely Fair Scheduler) source code.
- Compare SJF and RR for CPU-bound vs I/O-bound workloads.
84. Caching and Replacement (LRU, LFU, CLOCK)
Caching is the art of remembering the past to speed up the future. In computing, caches store recently used or frequently accessed data to reduce latency and load on slower storage (like disks or networks). The challenge: caches have limited capacity, so when full, we must decide what to evict. That’s where replacement policies come in.
1. The Need for Caching
Caches appear everywhere:
- CPU: L1, L2, L3 caches speed up memory access
- Databases: query results or index pages
- Web browsers / CDNs: recently fetched pages
- Operating systems: page cache for disk blocks
The principle guiding all caches is locality:
- Temporal locality: recently used items are likely to be used again soon
- Spatial locality: nearby items are likely to be needed next
2. Cache Replacement Problem
When the cache is full, which item should we remove?
We want to minimize cache misses (requests not found in cache).
Formally:
Given a sequence of accesses, find a replacement policy that minimizes misses.
Theoretical optimal policy (OPT): always evict the item used farthest in the future. But OPT requires future knowledge, so we rely on heuristics like LRU, LFU, CLOCK.
3. Least Recently Used (LRU)
LRU evicts the least recently accessed item. It assumes recently used = likely to be used again.
Implementation Approaches
- Stack (list): move item to top on access
- Hash map + doubly linked list: O(1) insert, delete, lookup
Tiny Code: LRU (Simplified)
typedef struct Node {
int key;
struct Node *prev, *next;
} Node;
HashMap cache;
List lru_list;
void access(int key) {
if (in_cache(key)) move_to_front(key);
else {
if (cache_full()) remove_lru();
insert_front(key);
}
}Pros
- Good for workloads with strong temporal locality
Cons
- Costly in hardware or massive caches (metadata overhead)
4. Least Frequently Used (LFU)
LFU evicts the least frequently accessed item.
Tracks usage count for each item:
- Increment the count on each access
- Evict the lowest-count item
Example
| Item | Accesses | Frequency |
|---|---|---|
| A | 3 | 3 |
| B | 1 | 1 |
| C | 2 | 2 |
Evict B.
Variants
- LFU with aging: gradually reduce counts to adapt to new trends
- Approximate LFU: counters in ranges (for memory efficiency)
Pros
- Great for stable, repetitive workloads
Cons
- Poor for workloads with shifting popularity (slow adaptation)
5. FIFO (First In First Out)
Simple but naive:
- Evict the oldest item, ignoring usage
Used in simple hardware caches. Good when the access pattern is cyclic, bad otherwise.
6. Random Replacement (RR)
Evict a random entry.
Surprisingly competitive in some high-concurrency systems, and trivial to implement. Used in memcached (as an option).
7. CLOCK Algorithm
A practical approximation of LRU, widely used in OS page replacement.
Each page has a reference bit (R). Pages form a circular list.
Algorithm:
- The clock hand sweeps over pages.
- If R = 0, evict the page.
- If R = 1, set R = 0 and skip.
This mimics LRU with O(1) cost and low overhead.
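A minimal CLOCK sketch in C, assuming each frame holds a page id and a reference bit R that is set elsewhere on access (NFRAMES and the Frame layout are illustrative); the hand advances circularly, clearing R bits until it finds a victim.
#define NFRAMES 64
typedef struct {
    int page;   // page currently held (-1 if empty)
    int R;      // reference bit, set to 1 on access
} Frame;
Frame frames[NFRAMES];
int hand = 0;   // clock hand position
// find a frame to evict: skip (and clear) referenced frames, evict the first with R == 0
int clock_evict(void) {
    for (;;) {
        if (frames[hand].R == 0) {
            int victim = hand;
            hand = (hand + 1) % NFRAMES;
            return victim;              // caller replaces this frame's page
        }
        frames[hand].R = 0;             // second chance: clear the bit and move on
        hand = (hand + 1) % NFRAMES;
    }
}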
8. Second-Chance and Enhanced CLOCK
Second-Chance: give recently used pages a “second chance” before eviction. Enhanced CLOCK: also uses modify bit (M) to prefer clean pages.
Used in Linux’s page replacement (with Active/Inactive lists).
9. Adaptive Algorithms
Modern systems use hybrid or adaptive policies:
- ARC (Adaptive Replacement Cache) - balances recency and frequency
- CAR (Clock with Adaptive Replacement) - CLOCK-style adaptation
- TinyLFU - frequency sketch + admission policy
- Hyperbolic caching - popularity decay for large-scale systems
These adapt dynamically to changing workloads.
10. Why It Matters
Caching is the backbone of system speed:
- The OS uses it for paging
- Databases for buffer pools
- CPUs for memory hierarchies
- CDNs for global acceleration
Choosing the right eviction policy can mean orders of magnitude improvement in latency and throughput.
“A good cache remembers what matters, and forgets what no longer does.”
Try It Yourself
- Simulate a cache of size 3 with sequence: A B C A B D A B C D Compare LRU, LFU, and FIFO miss counts.
- Implement LRU with a doubly-linked list and hash map in C.
- Try CLOCK with reference bits, simulate a sweep.
- Experiment with ARC and TinyLFU for dynamic workloads.
- Measure hit ratios for different access patterns (sequential, random, looping).
85. Networking (Routing, Congestion Control)
Networking algorithms make sure data finds its way through vast, connected systems, efficiently, reliably, and fairly. Two core pillars of network algorithms are routing (deciding where packets go) and congestion control (deciding how fast to send them).
Together, they ensure the internet functions under heavy load, dynamic topology, and unpredictable demand.
1. The Goals of Networking Algorithms
- Correctness: all destinations are reachable if paths exist
- Efficiency: use minimal resources (bandwidth, latency, hops)
- Scalability: support large, dynamic networks
- Robustness: recover from failures
- Fairness: avoid starving flows
2. Types of Routing
Routing decides paths packets should follow through a graph-like network.
Static vs Dynamic Routing
- Static: fixed routes, manual configuration (good for small networks)
- Dynamic: routes adjust automatically as topology changes (internet-scale)
Unicast, Multicast, Broadcast
- Unicast: one-to-one (most traffic)
- Multicast: one-to-many (video streaming, gaming)
- Broadcast: one-to-all (local networks)
3. Shortest Path Routing
Most routing relies on shortest path algorithms:
Dijkstra’s Algorithm
- Builds shortest paths from one source
- Complexity: O(E log V) with a priority queue
Used in:
- OSPF (Open Shortest Path First)
- IS-IS (Intermediate System to Intermediate System)
Bellman-Ford Algorithm
- Handles negative edges
- Basis for Distance-Vector routing (RIP)
Tiny Code: Dijkstra for Routing
#define INF 1e9
int dist[MAX], visited[MAX];
vector<pair<int,int>> adj[MAX];
void dijkstra(int s, int n) {
for (int i = 0; i < n; i++) dist[i] = INF;
dist[s] = 0;
priority_queue<pair<int,int>> pq;
pq.push({0, s});
while (!pq.empty()) {
int u = pq.top().second; pq.pop();
if (visited[u]) continue;
visited[u] = 1;
for (auto [v, w]: adj[u]) {
if (dist[v] > dist[u] + w) {
dist[v] = dist[u] + w;
pq.push({-dist[v], v});
}
}
}
}4. Distance-Vector vs Link-State
| Feature | Distance-Vector (RIP) | Link-State (OSPF) |
|---|---|---|
| Info Shared | Distance to neighbors | Full topology map |
| Convergence | Slower (loops possible) | Fast (SPF computation) |
| Complexity | Lower | Higher |
| Examples | RIP, BGP (conceptually) | OSPF, IS-IS |
RIP uses Bellman-Ford. OSPF floods link-state updates, runs Dijkstra at each node.
5. Hierarchical Routing
Large-scale networks (like the Internet) use hierarchical routing:
- Routers grouped into Autonomous Systems (AS)
- Intra-AS routing: OSPF, IS-IS
- Inter-AS routing: BGP (Border Gateway Protocol)

BGP exchanges reachability info, not shortest paths, and prefers policy-based routing (e.g., cost, contracts, peering).
6. Congestion Control
Even with good routes, we can’t flood links. Congestion control ensures fair and efficient use of bandwidth.
Implemented primarily at the transport layer (TCP).
TCP Congestion Control
Key components:
- Additive Increase, Multiplicative Decrease (AIMD)
- Slow Start: probe capacity
- Congestion Avoidance: grow cautiously
- Fast Retransmit / Recovery

Modern variants:

- TCP Reno: classic AIMD
- TCP Cubic: non-linear growth for high-speed networks
- BBR (Bottleneck Bandwidth + RTT): model-based control

Algorithm Sketch (AIMD)
On ACK: cwnd += 1/cwnd // increase slowly
On loss: cwnd /= 2 // halve window
7. Queue Management
Routers maintain queues. Too full? => Packet loss, latency spikes, tail drop.
Solutions:
- RED (Random Early Detection) - drop packets early
- CoDel (Controlled Delay) - monitor queue delay, drop adaptively

These prevent bufferbloat, improving latency for real-time traffic.
8. Flow Control vs Congestion Control
- Flow Control: prevent the sender from overwhelming the receiver
- Congestion Control: prevent the sender from overwhelming the network

TCP uses both: the receive window (rwnd) and the congestion window (cwnd). The effective sending window is `min(rwnd, cwnd)`.
9. Data Plane vs Control Plane
- Control Plane: decides routes (OSPF, BGP)
- Data Plane: forwards packets (fast path)

Modern networking (e.g. SDN, Software Defined Networking) separates these:

- Controller computes routes
- Switches act on flow rules
10. Why It Matters
Routing and congestion control shape the performance of:
- The Internet backbone
- Data center networks (with load balancing)
- Cloud services and microservice meshes
- Content delivery networks (CDNs)

Every packet’s journey, from your laptop to a global data center, relies on these ideas.
“Networking is not magic. It’s algorithms moving data through time and space.”
Try It Yourself
- Implement Dijkstra’s algorithm for a small network graph.
- Simulate RIP (Distance Vector): each node updates from neighbors.
- Model TCP AIMD window growth; visualize with Python.
- Try RED: drop packets when queue length > threshold.
- Compare TCP Reno, Cubic, BBR throughput in simulation.
86. Distributed Consensus (Paxos, Raft, PBFT)
In a distributed system, multiple nodes must agree on a single value, for example, the state of a log, a database entry, or a blockchain block. This agreement process is called consensus.
Consensus algorithms let distributed systems act as one reliable system, even when some nodes fail, crash, or lie (Byzantine faults).
1. Why Consensus?
Imagine a cluster managing a shared log (like in databases or Raft). Each node might:
- See different requests,
- Fail and recover,
- Communicate over unreliable links.

We need all non-faulty nodes to agree on the same order of operations.
A valid consensus algorithm must satisfy:
- Agreement: all correct nodes choose the same value
- Validity: the chosen value was proposed by a node
- Termination: every correct node eventually decides
- Fault Tolerance: works despite failures
2. The FLP Impossibility
The FLP theorem (Fischer, Lynch, Paterson, 1985) says:
In an asynchronous system with even one faulty process, no deterministic algorithm can guarantee consensus.
So practical algorithms use:
- Randomization, or
- Partial synchrony (timeouts, retries)
3. Paxos: The Classical Algorithm
Paxos, by Leslie Lamport, is the theoretical foundation for distributed consensus.
It revolves around three roles:
- Proposers: suggest values
- Acceptors: vote on proposals
- Learners: learn the final decision

Consensus proceeds in two phases.
Phase 1 (Prepare)
- The proposer picks a proposal number `n` and sends `(Prepare, n)` to acceptors.
- Acceptors respond with their highest accepted proposal (if any).
Phase 2 (Accept)
- If the proposer receives a majority of responses, it sends `(Accept, n, v)` with value `v` (the highest-numbered value seen, or a new one).
- Acceptors accept unless they have already promised a higher `n`.
When a majority accept, value v is chosen.
Guarantees
- Safety: no two different values are chosen
- Liveness: possible under stable leadership

Drawbacks

- Complex to implement correctly
- High messaging overhead

“Paxos is for theorists; Raft is for engineers.”
4. Raft: Understandable Consensus
Raft was designed to be simpler and more practical than Paxos, focusing on replicated logs.
Roles
- Leader: coordinates all changes
- Followers: replicate the leader’s log
- Candidates: compete during elections

Workflow
1. Leader Election
   - A timeout triggers a candidate election.
   - Each follower votes; the majority wins.
2. Log Replication
   - The leader appends entries and sends `AppendEntries` RPCs.
   - Followers acknowledge; the leader commits once a majority acknowledge.
3. Safety
   - Logs are consistent across a majority.
   - Followers accept only valid prefixes.

Raft ensures:

- At most one leader per term
- Committed entries are never lost
- Logs stay consistent

Pseudocode Sketch
on timeout -> become_candidate()
send RequestVote(term, id)
if majority_votes -> become_leader()
on AppendEntries(term, entries):
if term >= current_term:
append(entries)
    reply success

5. PBFT: Byzantine Fault Tolerance
Paxos and Raft assume crash faults (nodes stop, not lie). For Byzantine faults (arbitrary behavior), we use PBFT (Practical Byzantine Fault Tolerance).
Tolerates up to f faulty nodes out of 3f + 1 total.
Phases
- Pre-Prepare: Leader proposes value
- Prepare: Nodes broadcast proposal hashes
- Commit: Nodes confirm receipt by 2f+1 votes
Used in blockchains and critical systems (space, finance).
6. Quorum Concept
Consensus often relies on quorums (majorities):
- Two quorums always intersect, ensuring consistency.
- Write quorum + read quorum > total nodes (so they must overlap).

In Raft/Paxos:

- Majority = `N/2 + 1`
- Guarantees overlap even if some nodes fail.
7. Log Replication and State Machines
Consensus underlies Replicated State Machines (RSM):
- Every node applies the same commands in the same order.
- Guarantees deterministic, identical states.

This model powers:

- Databases (etcd, Spanner, TiKV)
- Coordination systems (ZooKeeper, Consul)
- Kubernetes control planes
8. Leader Election
All practical consensus systems need leaders:
- Simplifies coordination
- Reduces conflicts
- Heartbeats detect failures
- New elections restore progress

Algorithms:

- Raft Election (random timeouts)
- Bully Algorithm
- Chang-Roberts Ring Election
9. Performance and Optimization
- Batching: amortize RPC overhead
- Pipelining: parallelize appends
- Read-only optimizations: serve from followers (stale reads)
- Witness nodes: participate in quorum without full data

Advanced:

- Multi-Paxos: reuse the leader, fewer rounds
- Fast Paxos: shortcut some phases
- Viewstamped Replication: Paxos-like log replication
10. Why It Matters
Consensus is the backbone of reliability in modern distributed systems. Every consistent database, service registry, or blockchain depends on it.
Systems using consensus:
- etcd, Consul, ZooKeeper - cluster coordination
- Raft in Kubernetes - leader election
- PBFT in blockchains - fault-tolerant ledgers
- Spanner, TiDB - consistent databases

“Consensus is how machines learn to agree, and trust.”
Try It Yourself
- Implement Raft leader election in C or Python.
- Simulate Paxos on 5 nodes with message drops.
- Explore PBFT: try failing nodes and Byzantine behavior.
- Compare performance of Raft vs Paxos under load.
- Build a replicated key-value store with Raft.
87. Load Balancing and Rate Limiting
When systems scale, no single server can handle all requests alone. Load balancing distributes incoming traffic across multiple servers to improve throughput, reduce latency, and prevent overload. Meanwhile, rate limiting protects systems by controlling how often requests are allowed, ensuring fairness, stability, and security.
These two ideas, spreading the load and controlling the flow, are cornerstones of modern distributed systems and APIs.
1. Why Load Balancing Matters
Imagine a web service receiving thousands of requests per second. If every request went to one machine, it would crash. A load balancer (LB) acts as a traffic director, spreading requests across many backends.
Goals:
- Efficiency - fully utilize servers
- Reliability - no single point of failure
- Scalability - handle growing workloads
- Flexibility - add/remove servers dynamically
2. Types of Load Balancers
1. Layer 4 (Transport Layer)
Balances based on IP and port. Fast and protocol-agnostic (works for TCP/UDP).
Example: Linux IPVS, Envoy, HAProxy
2. Layer 7 (Application Layer)
Understands protocols like HTTP. Can route by URL path, headers, cookies.
Example: Nginx, Envoy, AWS ALB
3. Load Balancing Algorithms
Round Robin
Cycles through backends in order.
Req1 → ServerA
Req2 → ServerB
Req3 → ServerC
Simple, fair (if all nodes equal).
Weighted Round Robin
Assigns weights to reflect capacity. Example: ServerA(2x), ServerB(1x)
Least Connections
Send request to server with fewest active connections.
Least Response Time
Select backend with lowest latency (monitored dynamically).
Hash-Based (Consistent Hashing)
Deterministically route based on request key (like user ID).
- Keeps cache locality
- Used in CDNs, distributed caches (e.g. memcached)

Random
Pick a random backend, surprisingly effective under uniform load.
4. Consistent Hashing (In Depth)
Used for sharding and sticky sessions.
Key idea:
- Map servers to a hash ring
- A request’s key is hashed onto the ring
- Assigned to the next clockwise server

When servers join or leave, only a small fraction of keys move (see the sketch below).
Used in:
- CDNs
- Distributed caches (Redis Cluster, DynamoDB)
- Load-aware systems
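A minimal C sketch of ring lookup, assuming one hash point per server and a toy mixing hash (real systems use many virtual nodes per server and a stronger hash):

```c
#include <stdint.h>

typedef struct { uint32_t point; int server; } RingEntry;

// Toy 32-bit mixer standing in for a real hash function.
static uint32_t hash32(uint32_t x) {
    x ^= x >> 16; x *= 0x45d9f3bu; x ^= x >> 16;
    return x;
}

// ring[] must be sorted by point. Return the server owning the first
// point clockwise from hash(key), wrapping around at the end of the ring.
int ring_lookup(const RingEntry ring[], int n, uint32_t key) {
    uint32_t h = hash32(key);
    for (int i = 0; i < n; i++)
        if (ring[i].point >= h) return ring[i].server;
    return ring[0].server;   // wrap around
}
```

Adding or removing a server only remaps the keys that hashed onto the arc it owned, which is the property the list above describes.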
5. Health Checks and Failover
A smart LB monitors health of each server:
- Heartbeat pings (HTTP/TCP)
- Auto-remove unhealthy servers
- Rebalance traffic instantly

Example: If ServerB fails, remove it from rotation:
Healthy: [ServerA, ServerC]
Also supports active-passive failover: hot standby servers take over when active fails.
6. Global Load Balancing
Across regions or data centers:
- GeoDNS: route to nearest region
- Anycast: advertise the same IP globally; routing picks the nearest
- Latency-based routing: measure and pick the lowest RTT

Used by CDNs, cloud services, and multi-region apps.
7. Rate Limiting: The Other Side
If load balancing spreads the work, rate limiting keeps total work reasonable.
It prevents:
- Abuse (bots, DDoS)
- Overload (too many requests)
- Fairness issues (no user dominates resources)

Policies:

- Per-user, per-IP, per-API-key
- Global or per-endpoint
8. Rate Limiting Algorithms
Token Bucket
- Bucket holds tokens (capacity = burst limit)
- Each request consumes 1 token
- Tokens refill at a constant rate (rate limit)
- If empty → reject or delay

Good for bursty traffic.
if (tokens > 0) {
tokens--;
allow();
} else reject();

Leaky Bucket
- Requests flow into a bucket, drain at a fixed rate
- Excess = overflow = dropped

Smooths bursts; used for traffic shaping.
Fixed Window Counter
- Count requests in a fixed interval (e.g. 1s)
- Reset every window
- Simple but unfair around boundaries

Sliding Window Log / Sliding Window Counter

- Maintain timestamps of requests
- Remove old ones beyond the time window
- More accurate and fair
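A minimal sliding-window counter in C, under the assumption that the previous window’s count is weighted by how much of it still overlaps the sliding window (struct layout and names are illustrative):

```c
typedef struct {
    long start;        // start of the current fixed window (seconds)
    int  len;          // window length (seconds)
    int  limit;        // max requests per window
    int  prev, cur;    // request counts for previous and current windows
} SlidingWindow;

// Returns 1 if the request is allowed, 0 if rejected.
int sw_allow(SlidingWindow *w, long now) {
    long elapsed_windows = (now - w->start) / w->len;
    if (elapsed_windows >= 1) {                        // roll forward
        w->prev = (elapsed_windows == 1) ? w->cur : 0;
        w->cur  = 0;
        w->start += elapsed_windows * w->len;
    }
    double frac = (double)(now - w->start) / w->len;   // position inside current window
    double est  = w->prev * (1.0 - frac) + w->cur;     // weighted request estimate
    if (est >= w->limit) return 0;                     // reject
    w->cur++;
    return 1;                                          // allow
}
```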
9. Combining Both
A full system might:
- Use rate limiting per user or service
- Use load balancing across nodes
- Apply circuit breakers when overload persists

Together, they form resilient architectures that stay online even under spikes.
10. Why It Matters
These techniques make large-scale systems:
- Scalable - handle millions of users
- Stable - prevent cascading failures
- Fair - each client gets a fair share
- Resilient - recover gracefully from spikes or node loss

Used in:

- API Gateways (Kong, Envoy, Nginx)
- Cloud Load Balancers (AWS ALB, GCP LB)
- Kubernetes Ingress and Service Meshes
- Distributed Caches and Databases

“Balance keeps systems alive. Limits keep them sane.”
Try It Yourself
- Simulate Round Robin and Least Connections balancing across 3 servers.
- Implement a Token Bucket rate limiter in C or Python.
- Test burst traffic, observe drops or delays.
- Combine Consistent Hashing with Token Bucket for user-level control.
- Visualize how load balancing + rate limiting keep system latency low.
88. Search and Indexing (Inverted, BM25, WAND)
Search engines, whether web-scale like Google or local like SQLite’s FTS, rely on efficient indexing and ranking to answer queries fast. Instead of scanning all documents, they use indexes (structured lookup tables) to quickly find relevant matches.
This section explores inverted indexes, ranking algorithms (TF-IDF, BM25), and efficient retrieval techniques like WAND.
1. The Search Problem
Given:
- A corpus of documents
- A query (e.g., “machine learning algorithms”)

We want to return:

- Relevant documents
- Ranked by importance and similarity

Naive search → O(N × M) comparisons. Inverted indexes → roughly O(K log N), where K = number of query terms.
2. Inverted Index: The Heart of Search
An inverted index maps terms → documents containing them.
Example
| Term | Postings List |
|---|---|
| “data” | [1, 4, 5] |
| “algorithm” | [2, 3, 5] |
| “machine” | [1, 2] |
Each posting may include:
- docID
- term frequency (tf)
- positions (for phrase search)

Construction Steps
- Tokenize documents → words
- Normalize (lowercase, stemming, stopword removal)
- Build postings: term → [docIDs, tf, positions]
- Sort & compress for storage efficiency
Used by:
- Elasticsearch, Lucene, Whoosh, Solr
3. Boolean Retrieval
Simplest model:
- Query = Boolean expression e.g.
(machine AND learning) OR AI
Use set operations on postings:
- AND → intersection
- OR → union
- NOT → difference

Fast intersection uses a merge algorithm on sorted lists.
void intersect(int A[], int B[], int n, int m) {
int i = 0, j = 0;
while (i < n && j < m) {
if (A[i] == B[j]) { print(A[i]); i++; j++; }
else if (A[i] < B[j]) i++;
else j++;
}
}

But Boolean search doesn’t rank results, so we need scoring models.
4. Vector Space Model
Represent documents and queries as term vectors. Each dimension = term weight (tf-idf).
- tf: term frequency in the document
- idf: inverse document frequency, \(idf = \log\frac{N}{df_t}\)
Cosine similarity measures relevance: \[ \text{score}(q, d) = \frac{q \cdot d}{|q| |d|} \]
Simple, interpretable, forms basis of BM25 and modern embeddings.
5. BM25: The Classic Ranking Function
BM25 (Best Match 25) is the de facto standard in information retrieval.
\[ \text{score}(q, d) = \sum_{t \in q} IDF(t) \cdot \frac{f(t, d) \cdot (k_1 + 1)}{f(t, d) + k_1 \cdot (1 - b + b \cdot \frac{|d|}{avgdl})} \]
Where:
- \(f(t, d)\): term frequency
- \(|d|\): document length
- \(avgdl\): average document length
- \(k_1, b\): tunable parameters (typically 1.2-2.0 and 0.75)

Advantages

- Balances term frequency, document length, and rarity
- Fast and effective baseline
- Still used in Elasticsearch, Lucene, OpenSearch
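A minimal C sketch of per-document BM25 scoring, assuming the term statistics were already gathered from the index; the IDF here is a common smoothed variant rather than the bare \(\log\frac{N}{df_t}\):

```c
#include <math.h>

// tf[i]: frequency of query term i in this document; df[i]: documents containing term i.
// N: total documents, dl: this document's length, avgdl: average document length.
double bm25(int nterms, const int tf[], const int df[],
            int N, double dl, double avgdl) {
    const double k1 = 1.2, b = 0.75;                    // typical parameter choices
    double score = 0.0;
    for (int i = 0; i < nterms; i++) {
        double idf  = log((N - df[i] + 0.5) / (df[i] + 0.5) + 1.0);
        double norm = tf[i] + k1 * (1.0 - b + b * dl / avgdl);
        score += idf * (tf[i] * (k1 + 1.0)) / norm;
    }
    return score;
}
```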
6. Efficiency Tricks: WAND, Block-Max WAND
Ranking involves merging multiple postings. We can skip irrelevant documents early with WAND (Weak AND).
WAND Principle
- Each term has an upper-bound score
- Maintain pointers into each posting list
- Compute the potential max score
- If max < current threshold, skip the doc

Improves latency for top-k retrieval.
Variants:
- BMW (Block-Max WAND) - uses block-level score bounds
- MaxScore - simpler thresholding
- Dynamic pruning - skip unpromising candidates
7. Index Compression
Postings lists are long, compression is crucial.
Common schemes:
- Delta encoding: store gaps between docIDs
- Variable-byte (VB) or Gamma coding
- Frame of Reference (FOR) and SIMD-BP128 for vectorized decoding

Goal: smaller storage + faster decompression.
8. Advanced Retrieval
Proximity Search
Require words appear near each other. Use positional indexes.
Phrase Search
Match exact sequences using positions: “machine learning” ≠ “learning machine”
Fuzzy / Approximate Search
Allow typos: Use Levenshtein automata, n-grams, or k-approximate matching
Fielded Search
Score each field (title, body, tags) separately, then combine the scores with field weights.
9. Learning-to-Rank and Semantic Search
Modern search adds ML-based re-ranking:
- Learning to Rank (LTR): use features (tf, idf, BM25, clicks)
- Neural re-ranking: BERT-style embeddings for semantic similarity
- Hybrid retrieval: combine BM25 + dense vectors (e.g. ColBERT, RRF)

Also: ANN (Approximate Nearest Neighbor) for vector-based search.
10. Why It Matters
Efficient search powers:
- Web search engines
- IDE symbol lookup
- Log search, code search
- Database full-text search
- AI retrieval pipelines (RAG)

It’s where algorithms meet language and scale.
“Search is how we connect meaning to memory.”
Try It Yourself
- Build a tiny inverted index in C or Python.
- Implement Boolean AND and OR queries.
- Compute TF-IDF and BM25 scores for a toy dataset.
- Add WAND pruning for top-k retrieval.
- Compare BM25 vs semantic embeddings for relevance.
89. Compression and Encoding in Systems
Compression and encoding algorithms are the quiet workhorses of computing, shrinking data to save space, bandwidth, and time. They allow systems to store more, transmit faster, and process efficiently. From files and databases to networks and logs, compression shapes nearly every layer of system design.
1. Why Compression Matters
Compression is everywhere:
- Databases - column stores, indexes, logs
- Networks - HTTP, TCP, QUIC payloads
- File systems - ZFS, NTFS, btrfs compression
- Streaming - video/audio codecs
- Logs & telemetry - reduce I/O and storage cost

Benefits:

- Smaller data = faster I/O
- Less storage = lower cost
- Less transfer = higher throughput

Trade-offs:

- CPU overhead (compression/decompression)
- Latency (especially for small data)
- Suitability (depends on entropy and structure)
2. Key Concepts
Entropy
Minimum bits needed to represent data (Shannon). High entropy → less compressible.
Redundancy
Compression exploits repetition and patterns.
Lossless vs Lossy
- Lossless: reversible (ZIP, PNG, LZ4)
- Lossy: approximate (JPEG, MP3, H.264)

In system contexts, lossless dominates.
3. Common Lossless Compression Families
Huffman Coding
- Prefix-free variable-length codes
- Frequent symbols = short codes
- Optimal under a symbol-level model

Used in: DEFLATE, JPEG, MP3
Arithmetic Coding
- Encodes a sequence as a fractional interval
- More efficient than Huffman for skewed distributions
- Used in: H.264, bzip2, AV1

Dictionary-Based (LZ77, LZ78)

- Replace repeated substrings with references
- Core of ZIP, gzip, zlib, LZMA, Snappy

LZ77 Sketch
while (not EOF) {
find longest match in sliding window;
output (offset, length, next_char);
}

Variants:

- LZ4 - fast, lower ratio
- Snappy - optimized for speed
- Zstandard (Zstd) - tunable speed/ratio, dictionary support

Burrows-Wheeler Transform (BWT)

- Reorders data to group similar symbols
- Followed by Move-To-Front + Huffman
- Used in bzip2 and BWT-based compressors

Run-Length Encoding (RLE)

- Replace consecutive repeats with (symbol, count)
- Great for structured or sparse data

Example: AAAAABBBCC → (A,5)(B,3)(C,2)
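A minimal C sketch of the encoder for the example above (the printed format is illustrative):

```c
#include <stdio.h>
#include <string.h>

// Run-length encode a string: "AAAAABBBCC" -> "(A,5)(B,3)(C,2)"
void rle_encode(const char *s) {
    size_t i = 0, n = strlen(s);
    while (i < n) {
        size_t run = 1;
        while (i + run < n && s[i + run] == s[i]) run++;   // count the run
        printf("(%c,%zu)", s[i], run);
        i += run;
    }
    printf("\n");
}
```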
4. Specialized Compression in Systems
Columnar Databases
Compress per column:
- Dictionary encoding - map strings → ints
- Run-length encoding - good for sorted columns
- Delta encoding - store differences (time series)
- Bit-packing - fixed-width integers in minimal bits

Combine multiple schemes for the best ratio.
Example (time deltas):
[100, 102, 103, 107] → [100, +2, +1, +4]
Log and Telemetry Compression
- Structured formats → fieldwise encoding
- Often Snappy or LZ4 for fast decode
- Aggregators (Fluentd, Loki, Kafka) rely heavily on them

Data Lakes and Files

- Parquet, ORC, Arrow → columnar + compressed
- Choose a codec per column: LZ4 for speed, Zstd for ratio
5. Streaming and Chunked Compression
Large data often processed in chunks:
- Enables random access and parallelism
- Needed for network streams and distributed files

Example: zlib block, Zstd frame, gzip chunk
Used in:
- HTTP chunked encoding
- Kafka log segments
- MapReduce shuffle
6. Encoding Schemes
Compression ≠ encoding. Encoding ensures safe transport.
Base64
- Maps 3 bytes → 4 chars
- ~33% overhead
- Used for binary → text (emails, JSON APIs)

URL Encoding

- Escape unsafe characters with `%xx`

Delta Encoding

- Store differences, not full values

Varint / Zigzag Encoding

- Compact integers (e.g. protobufs)
- Smaller numbers → fewer bytes

Example:
while (x >= 0x80) { emit((x & 0x7F) | 0x80); x >>= 7; }
emit(x);

7. Adaptive and Context Models
Modern compressors adapt to local patterns:
- PPM (Prediction by Partial Matching)
- Context mixing (PAQ)
- Zstd uses FSE (Finite State Entropy) coding

These balance speed, memory, and compression ratio.
8. Hardware Acceleration
Compression can be offloaded to:
- CPUs with SIMD (AVX2, SSE4.2)
- GPUs (parallel encode/decode)
- NICs / SmartNICs
- ASICs (e.g., Intel QAT)

Critical for high-throughput databases, network appliances, and storage systems.
9. Design Trade-offs
| Goal | Best Choice |
|---|---|
| Max speed | LZ4, Snappy |
| Max ratio | Zstd, LZMA |
| Balance | Zstd (tunable) |
| Column store | RLE, Delta, Dict |
| Logs / telemetry | Snappy, LZ4 |
| Archival | bzip2, xz |
| Real-time | LZ4, Brotli (fast mode) |
Choose based on CPU budget, I/O cost, latency tolerance.
10. Why It Matters
Compression is a first-class optimization:
- Saves petabytes in data centers
- Boosts throughput across networks
- Powers cloud storage (S3, BigQuery, Snowflake)
- Enables efficient analytics and ML pipelines

“Every byte saved is time earned.”
Try It Yourself
- Compress text using Huffman coding (build frequency table).
- Compare gzip, Snappy, and Zstd on a 1GB dataset.
- Implement delta encoding and RLE for numeric data.
- Try dictionary encoding on repetitive strings.
- Measure compression ratio, speed, and CPU usage trade-offs.
90. Fault Tolerance and Replication
Modern systems must survive hardware crashes, network partitions, or data loss without stopping. Fault tolerance ensures that a system continues to function, even when parts fail. Replication underpins this resilience, duplicating data or computation across multiple nodes for redundancy, performance, and consistency.
Together, they form the backbone of reliability in distributed systems.
1. Why Fault Tolerance?
No system is perfect:
- Servers crash
- Disks fail
- Networks partition
- Power goes out

The question isn’t if failure happens, but when. Fault-tolerant systems detect, contain, and recover from failure automatically.
Goals:
- Availability - keep serving requests
- Durability - never lose data
- Consistency - stay correct across replicas
2. Failure Models
Crash Faults
Node stops responding but doesn’t misbehave. Handled by restarts or replication (Raft, Paxos).
Omission Faults
Lost messages or dropped updates. Handled with retries and acknowledgments.
Byzantine Faults
Arbitrary/malicious behavior. Handled by Byzantine Fault Tolerance (PBFT), expensive but robust.
3. Redundancy: The Core Strategy
Fault tolerance = redundancy + detection + recovery
Redundancy types:
- Hardware: multiple power supplies, disks (RAID)
- Software: replicated services, retries
- Data: multiple copies, erasure codes
- Temporal: retry, or checkpoint and replay
4. Replication Models
1. Active Replication
All replicas process requests in parallel (lockstep). Results must match. Used in real-time and Byzantine systems.
2. Passive (Primary-Backup)
One leader (primary) handles requests. Backups replicate log, take over on failure. Used in Raft, ZooKeeper, PostgreSQL streaming.
3. Quorum Replication
Writes and reads contact majority of replicas. Ensures overlap → consistency. Used in Cassandra, DynamoDB, Etcd.
5. Consistency Models
Replication introduces a trade-off between consistency and availability (CAP theorem).
Strong Consistency
All clients see the same value immediately. Example: Raft, Etcd, Spanner.
Eventual Consistency
Replicas converge over time. Example: DynamoDB, Cassandra.
Causal Consistency
Preserves causal order of events. Example: Vector clocks, CRDTs.
Choice depends on workload:
- Banking → strong
- Social feeds → eventual
- Collaborative editing → causal
6. Checkpointing and Recovery
To recover after crash:
- Periodically checkpoint state
- On restart, replay the log of missed events

Examples: databases → Write-Ahead Log (WAL); stream systems → Kafka checkpoints.
1. Save state to disk
2. Record latest log position
3. On restart → reload + replay
7. Erasure Coding
Instead of full copies, store encoded fragments. With ( k ) data blocks, ( m ) parity blocks → tolerate ( m ) failures.
Example: Reed-Solomon (used in HDFS, Ceph)
| k | m | Total | Fault Tolerance |
|---|---|---|---|
| 4 | 2 | 6 | 2 failures |
Better storage efficiency than 3× replication.
8. Failure Detection
Detecting failure is tricky in distributed systems (because of latency). Common techniques:
- Heartbeats - periodic “I’m alive” messages
- Timeouts - suspect a node if no heartbeat arrives
- Gossip protocols - share failure info among peers

Used in Consul, Cassandra, and Kubernetes health checks.
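A minimal timeout-based detector in C, assuming each node’s last heartbeat timestamp is recorded as it arrives (all names are illustrative):

```c
#include <stdbool.h>

#define MAX_NODES 16

typedef struct {
    long last_heartbeat[MAX_NODES];   // last time each node said "I'm alive"
    long timeout;                     // silence threshold before suspicion
} Detector;

void on_heartbeat(Detector *d, int node, long now) {
    d->last_heartbeat[node] = now;    // record the heartbeat
}

bool is_suspected(const Detector *d, int node, long now) {
    return now - d->last_heartbeat[node] > d->timeout;
}
```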
9. Self-Healing Systems
After failure:
- Detect it
- Isolate faulty component
- Replace or restart
- Rebalance load or re-replicate data
Patterns:
- Supervisor trees (Erlang/Elixir)
- Self-healing clusters (Kubernetes)
- Rebalancing (Cassandra ring repair)

“Never trust a single machine, trust the system.”
10. Why It Matters
Fault tolerance turns fragile infrastructure into reliable services.
Used in:
- Databases (replication + WAL)
- Distributed storage (HDFS, Ceph, S3)
- Orchestration (Kubernetes controllers)
- Streaming systems (Kafka, Flink)

Without replication and fault tolerance, large-scale systems would collapse under failure.
“Resilience is built, not assumed.”
Try It Yourself
- Build a primary-backup key-value store: leader writes, follower replicates.
- Add heartbeat + timeout detection to trigger failover.
- Simulate partition: explore behavior under strong vs eventual consistency.
- Implement checkpoint + replay recovery for a small app.
- Compare 3× replication vs Reed-Solomon (4+2) for space and reliability.
Chapter 10. AI, ML and Optimization
91. Classical ML (k-means, Naive Bayes, SVM, Decision Trees)
Classical machine learning is built on interpretable mathematics and solid optimization foundations. Long before deep learning, these algorithms powered search engines, spam filters, and recommendation systems. They’re still used today, fast, explainable, and easy to deploy.
This section covers the four pillars of classical ML:
- k-means - unsupervised clustering
- Naive Bayes - probabilistic classification
- SVM - margin-based classification
- Decision Trees - rule-based learning
1. The Essence of Classical ML
Classical ML is about learning from data using statistical principles, often without huge compute. Given a dataset \( D = \{(x_i, y_i)\} \), the task is to:

- Predict \( y \) from \( x \)
- Generalize beyond seen data
- Balance bias and variance
2. k-means Clustering
Goal: partition data into ( k ) groups (clusters) such that intra-cluster distance is minimized.
Objective
\[ \min_{C} \sum_{i=1}^k \sum_{x \in C_i} |x - \mu_i|^2 \] Where \(\mu_i\) = centroid of cluster ( i ).
Algorithm
- Choose ( k ) random centroids
- Assign each point to nearest centroid
- Recompute centroids
- Repeat until stable
Tiny Code (C-style)
for (iter = 0; iter < max_iter; iter++) {
assign_points_to_clusters();
recompute_centroids();
}

Pros

- Simple, fast (\(O(nkd)\) per iteration)
- Works well for spherical clusters

Cons

- Requires \( k \) in advance
- Sensitive to initialization and outliers

Variants:

- k-means++ (better initialization)
- Mini-batch k-means (scalable)
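A minimal C sketch of one k-means iteration, restricted to 1D points for brevity (array names are illustrative):

```c
#include <float.h>

// One iteration: assign each point to its nearest centroid, then
// move each centroid to the mean of its assigned points.
void kmeans_step(const double x[], int n, double mu[], int k, int assign[]) {
    for (int i = 0; i < n; i++) {                       // assignment step
        double best = DBL_MAX;
        for (int c = 0; c < k; c++) {
            double d = (x[i] - mu[c]) * (x[i] - mu[c]);
            if (d < best) { best = d; assign[i] = c; }
        }
    }
    for (int c = 0; c < k; c++) {                       // update step
        double sum = 0.0; int count = 0;
        for (int i = 0; i < n; i++)
            if (assign[i] == c) { sum += x[i]; count++; }
        if (count > 0) mu[c] = sum / count;             // keep old centroid if empty
    }
}
```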
3. Naive Bayes Classifier
A probabilistic model using Bayes’ theorem under independence assumptions.
\[ P(y|x) \propto P(y) \prod_{i=1}^n P(x_i | y) \]
Algorithm
- Compute the prior \( P(y) \)
- Compute the likelihoods \( P(x_i \mid y) \)
- Predict class with max posterior
Types
- Multinomial NB - text (bag of words)
- Gaussian NB - continuous features
- Bernoulli NB - binary features

Example (Spam Detection)
P(spam | "win money") ∝ P(spam) * P("win"|spam) * P("money"|spam)
Pros
- Fast, works well for text
- Needs little data
- Probabilistic interpretation

Cons

- Assumes feature independence
- Poor for correlated features
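A minimal multinomial Naive Bayes scorer in C, assuming the log priors and log likelihoods were already estimated from training counts (the flattened `log_lik` layout is just for illustration):

```c
// x[j]: count of feature j in the input; log_prior[c]: log P(y=c);
// log_lik[c * nfeat + j]: log P(feature j | y=c). Returns the argmax class.
int nb_predict(int nclasses, int nfeat, const int x[],
               const double log_prior[], const double log_lik[]) {
    int best_c = 0;
    double best = -1e300;
    for (int c = 0; c < nclasses; c++) {
        double score = log_prior[c];                      // log P(y)
        for (int j = 0; j < nfeat; j++)
            score += x[j] * log_lik[c * nfeat + j];       // counts * log P(x_j | y)
        if (score > best) { best = score; best_c = c; }
    }
    return best_c;
}
```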
4. Support Vector Machines (SVM)
SVM finds the max-margin hyperplane separating classes.
Objective
Maximize margin = distance between boundary and nearest points.
\[ \min_{w, b} \frac{1}{2} |w|^2 \quad \text{s.t.} \quad y_i(w \cdot x_i + b) \ge 1 \]
Can be solved via Quadratic Programming.
Intuition
- Each data point → vector
- Hyperplane: \(w \cdot x + b = 0\)
- Support vectors = boundary points

Kernel Trick

Transform inputs via a kernel \( K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j) \):

- Linear: dot product
- Polynomial: \( (x_i \cdot x_j + c)^d \)
- RBF: \( e^{-\gamma |x_i - x_j|^2} \)

Pros

- Effective in high dimensions
- Can model nonlinear boundaries
- Few hyperparameters

Cons

- Slow on large data
- Harder to tune kernel parameters
5. Decision Trees
If-else structure for classification/regression.
At each node:
- Pick a feature \( f \) and threshold \( t \)
- Split to maximize information gain

Metrics

- Entropy: \(H = -\sum p_i \log p_i\)
- Gini: \(G = 1 - \sum p_i^2\)

Pseudocode
if (feature < threshold)
go left;
else
go right;

Build recursively until:

- Max depth
- Min samples per leaf
- Pure nodes

Pros

- Interpretable
- Handles mixed data
- No scaling needed

Cons

- Prone to overfitting
- Unstable (small data changes can change the tree)

Fixes:

- Pruning (reduce depth)
- Ensembles: Random Forests, Gradient Boosting
6. Bias-Variance Tradeoff
| Algorithm | Bias | Variance |
|---|---|---|
| k-means | High | Low |
| Naive Bayes | High | Low |
| SVM | Low | Medium |
| Decision Tree | Low | High |
Balancing both = good generalization.
7. Evaluation Metrics
For classification:
- Accuracy, Precision, Recall, F1-score
- ROC-AUC, Confusion Matrix

For clustering:

- Inertia, Silhouette Score

Always use a train/test split or cross-validation.
8. Scaling to Large Data
Techniques:
- Mini-batch training
- Online updates (SGD)
- Dimensionality reduction (PCA)
- Approximation (Random Projections)

Libraries:

- scikit-learn (Python)
- liblinear, libsvm (C/C++)
- MLlib (Spark)
9. When to Use What
| Task | Recommended Algorithm |
|---|---|
| Text classification | Naive Bayes |
| Clustering | k-means |
| Nonlinear classification | SVM (RBF) |
| Tabular data | Decision Tree |
| Quick baseline | Logistic Regression / NB |
10. Why It Matters
These algorithms are fast, interpretable, and form the theoretical foundations of modern ML. They remain the go-to choice for:

- Small to medium datasets
- Real-time classification
- Explainable AI

“Classical ML is the art of solving problems with math you can still write on a whiteboard.”
Try It Yourself
- Cluster 2D points with k-means, plot centroids.
- Train Naive Bayes on a spam/ham dataset.
- Classify linearly separable data with SVM.
- Build a Decision Tree from scratch (entropy, Gini).
- Compare models’ accuracy and interpretability.
92. Ensemble Methods (Bagging, Boosting, Random Forests)
Ensemble methods combine multiple weak learners to build a strong predictor. Instead of relying on one model, ensembles vote, average, or boost multiple models, improving stability and accuracy.
They are the bridge between classical and modern ML , simple models, combined smartly, become powerful.
1. The Core Idea
“Many weak learners, when combined, can outperform a single strong one.”
Mathematically, if \(f_1, f_2, \ldots, f_k\) are weak learners, an ensemble predictor is:
\[ F(x) = \frac{1}{k}\sum_{i=1}^k f_i(x) \]
For classification, combine via majority vote. For regression, combine via average.
2. Bagging (Bootstrap Aggregating)
Bagging reduces variance by training models on different samples.
Steps
- Draw ( B ) bootstrap samples from dataset ( D ).
- Train one model per sample.
- Aggregate predictions by averaging or voting.
\[ \hat{f}_{bag}(x) = \frac{1}{B} \sum_{b=1}^B f_b(x) \]
Each \(f_b\) is trained on a random subset (with replacement).
Example
- Base learner: Decision Tree
- Ensemble: Bagged Trees
- Famous instance: Random Forest

Tiny Code (C-style Pseudocode)
for (int b = 0; b < B; b++) {
D_b = bootstrap_sample(D);
model[b] = train_tree(D_b);
}
prediction = average_predictions(model, x);

Pros

- Reduces variance
- Works well with high-variance learners
- Parallelizable

Cons

- Increases computation
- Doesn’t reduce bias
3. Random Forest
A bagging-based ensemble of decision trees with feature randomness.
Key Ideas
- Each tree is trained on a bootstrap sample.
- At each split, consider a random subset of features.
- Final prediction = majority vote or average.

This decorrelates trees, improving generalization.
\[ F(x) = \frac{1}{B} \sum_{b=1}^{B} T_b(x) \]
Pros
- Handles large feature sets
- Low overfitting
- Good default for tabular data

Cons

- Less interpretable
- Slower on huge datasets

OOB (Out-of-Bag) error = internal validation from unused samples.
4. Boosting
Boosting focuses on reducing bias by sequentially training models, where each one corrects the errors of the previous.
Steps
- Start with weak learner ( f_1(x) )
- Train next learner ( f_2(x) ) on residuals/errors
- Combine with weighted sum
\[ F_m(x) = F_{m-1}(x) + \alpha_m f_m(x) \]
Weights \(\alpha_m\) focus on difficult examples.
Tiny Code (Conceptual)
F = 0;
for (int m = 0; m < M; m++) {
residual = y - predict(F, x);
f_m = train_weak_learner(x, residual);
F += alpha[m] * f_m;
}

5. AdaBoost (Adaptive Boosting)
AdaBoost adapts weights on samples after each iteration.
Algorithm
Initialize weights: \(w_i = \frac{1}{n}\)
Train weak classifier \(f_t\)
Compute error: \(\epsilon_t\)
Update weights: \[ w_i \leftarrow w_i \cdot e^{\alpha_t \cdot I(y_i \ne f_t(x_i))} \] where \(\alpha_t = \frac{1}{2} \ln\left(\frac{1 - \epsilon_t}{\epsilon_t}\right)\)
Normalize weights
Final classifier: \[ F(x) = \text{sign}\left( \sum_t \alpha_t f_t(x) \right) \]
Pros
- High accuracy on clean data
- Simple and interpretable weights

Cons

- Sensitive to outliers
- Sequential → not easily parallelizable
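A minimal C sketch of one boosting round for ±1 labels. It uses the equivalent \(e^{-\alpha_t y_i f_t(x_i)}\) form of the weight update, which matches the update above after normalization:

```c
#include <math.h>

// y[], pred[] hold labels and weak-learner predictions in {-1, +1};
// w[] holds the sample weights (summing to 1). Returns alpha for this round.
double adaboost_round(int n, const int y[], const int pred[], double w[]) {
    double eps = 0.0;
    for (int i = 0; i < n; i++)
        if (y[i] != pred[i]) eps += w[i];               // weighted error
    double alpha = 0.5 * log((1.0 - eps) / (eps + 1e-12));
    double z = 0.0;
    for (int i = 0; i < n; i++) {
        w[i] *= exp(-alpha * y[i] * pred[i]);           // up-weight mistakes
        z += w[i];
    }
    for (int i = 0; i < n; i++) w[i] /= z;              // renormalize
    return alpha;
}
```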
6. Gradient Boosting
A modern version of boosting using gradient descent on loss.
At each step, fit new model to negative gradient of loss function.
Objective
\[ F_m(x) = F_{m-1}(x) + \gamma_m h_m(x) \]
where \(h_m(x) \approx -\frac{\partial L(y, F(x))}{\partial F(x)}\)
Common Libraries
- XGBoost
- LightGBM
- CatBoost

Pros

- High performance on tabular data
- Flexible (custom loss)
- Handles mixed feature types

Cons

- Slower to train
- Sensitive to hyperparameters
7. Stacking (Stacked Generalization)
Combine multiple models (base learners) via a meta-model.
Steps
- Train base models (SVM, Tree, NB, etc.)
- Collect their predictions
- Train meta-model (e.g. Logistic Regression) on outputs
\[ \hat{y} = f_{meta}(f_1(x), f_2(x), \ldots, f_k(x)) \]
8. Bagging vs Boosting
| Feature | Bagging | Boosting |
|---|---|---|
| Strategy | Parallel | Sequential |
| Goal | Reduce variance | Reduce bias |
| Weighting | Uniform | Adaptive |
| Example | Random Forest | AdaBoost, XGBoost |
9. Bias-Variance Behavior
- Bagging: ↓ variance
- Boosting: ↓ bias
- Random Forest: balanced
- Stacking: flexible but complex
10. Why It Matters
Ensemble methods are the workhorses of classical ML competitions and real-world tabular problems. They blend interpretability, flexibility, and predictive power.
“One tree may fall, but a forest stands strong.”
Try It Yourself
- Train a Random Forest on the Iris dataset.
- Implement AdaBoost from scratch using decision stumps.
- Compare Bagging vs Boosting accuracy.
- Try XGBoost with different learning rates.
- Visualize feature importance across models.
93. Gradient Methods (SGD, Adam, RMSProp)
Gradient-based optimization is the heartbeat of machine learning. These methods update parameters iteratively by following the negative gradient of the loss function. They power everything from linear regression to deep neural networks.
1. The Core Idea
We want to minimize a loss function ( L\(\theta\) ). Starting from some initial parameters \(\theta_0\), we move in the opposite direction of the gradient:
\[ \theta_{t+1} = \theta_t - \eta \cdot \nabla_\theta L(\theta_t) \]
where \(\eta\) is the learning rate (step size).
The gradient tells us which way the function increases fastest, so we move the other way.
2. Batch Gradient Descent
Uses the entire dataset to compute the gradient.
\[ \nabla_\theta L(\theta) = \frac{1}{N} \sum_{i=1}^N \nabla_\theta \ell_i(\theta) \]
- Accurate but slow for large \( N \)
- Each update is expensive

Tiny Code
for (int t = 0; t < T; t++) {
grad = compute_full_gradient(data, theta);
theta = theta - eta * grad;
}

Good for: small datasets or convex problems
3. Stochastic Gradient Descent (SGD)
Instead of full data, use one random sample per step.
\[ \theta_{t+1} = \theta_t - \eta \cdot \nabla_\theta \ell_i(\theta_t) \]
- Noisy but faster updates
- Can escape local minima
- Great for online learning

Tiny Code
for each sample (x_i, y_i):
grad = grad_loss(theta, x_i, y_i);
theta -= eta * grad;

Pros

- Fast convergence
- Works on large datasets

Cons

- Noisy updates
- Requires learning rate tuning
4. Mini-Batch Gradient Descent
Compromise between batch and stochastic.
Use small subset (mini-batch) of samples:
\[ \theta_{t+1} = \theta_t - \eta \cdot \frac{1}{m} \sum_{i=1}^m \nabla_\theta \ell_i(\theta_t) \]
Usually batch size = 32 or 64. Faster, more stable updates.
5. Momentum
Adds velocity to smooth oscillations.
\[ v_t = \beta v_{t-1} + (1 - \beta) \nabla_\theta L(\theta_t) \]
\[ \theta_{t+1} = \theta_t - \eta v_t \]
This accumulates past gradients to speed movement in consistent directions.
Think of it like a heavy ball rolling down a hill.
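A minimal C sketch of the update, assuming the parameters, velocity, and gradient are plain arrays of length n:

```c
// SGD with momentum: the velocity is an exponential average of past
// gradients; beta around 0.9 is a common choice.
void sgd_momentum(double theta[], double v[], const double grad[],
                  int n, double eta, double beta) {
    for (int i = 0; i < n; i++) {
        v[i] = beta * v[i] + (1.0 - beta) * grad[i];   // accumulate velocity
        theta[i] -= eta * v[i];                        // step along the velocity
    }
}
```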
6. Nesterov Accelerated Gradient (NAG)
Improves momentum by looking ahead:
\[ v_t = \beta v_{t-1} + \eta \nabla_\theta L(\theta_t - \beta v_{t-1}) \]
It anticipates the future position before computing the gradient.
Faster convergence in convex settings.
7. RMSProp
Adjusts learning rate per parameter using exponential average of squared gradients:
\[ E[g^2]_t = \rho E[g^2]_{t-1} + (1 - \rho) g_t^2 \]
\[ \theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{E[g^2]_t + \epsilon}} g_t \]
This helps when gradients vary in magnitude.
Good for: non-stationary objectives, deep networks
8. Adam (Adaptive Moment Estimation)
Combines momentum + RMSProp:
\[ m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t \]
\[ v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2 \]
Bias-corrected estimates:
\[ \hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \quad \hat{v}_t = \frac{v_t}{1 - \beta_2^t} \]
Update rule:
\[ \theta_{t+1} = \theta_t - \frac{\eta \cdot \hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} \]
Tiny Code (Conceptual)
m = 0; v = 0;
for (int t = 1; t <= T; t++) {
g = grad(theta);
m = beta1 * m + (1 - beta1) * g;
v = beta2 * v + (1 - beta2) * g * g;
m_hat = m / (1 - pow(beta1, t));
v_hat = v / (1 - pow(beta2, t));
theta -= eta * m_hat / (sqrt(v_hat) + eps);
}

Pros

- Works well out of the box
- Adapts the learning rate
- Great for deep learning

Cons

- May not converge exactly
- Needs a decay schedule for stability
9. Learning Rate Schedules
Control \(\eta\) over time:
- Step decay: \(\eta_t = \eta_0 \cdot \gamma^{\lfloor t/s \rfloor}\)
- Exponential decay: \(\eta_t = \eta_0 e^{-\lambda t}\)
- Cosine annealing: smooth periodic decay
- Warm restarts: reset the learning rate periodically
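A minimal C sketch of two of the schedules above (the constants are illustrative):

```c
#include <math.h>

// Step decay: drop the rate by a factor gamma every s steps.
double step_decay(double eta0, double gamma, int t, int s) {
    return eta0 * pow(gamma, t / s);   // integer division gives the floor
}

// Cosine annealing from eta0 down to 0 over T steps.
double cosine_anneal(double eta0, int t, int T) {
    const double PI = 3.14159265358979323846;
    return 0.5 * eta0 * (1.0 + cos(PI * (double)t / T));
}
```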
10. Why It Matters
All modern deep learning is built on gradients. Choosing the right optimizer can mean faster training and better accuracy.
| Optimizer | Adaptive | Momentum | Common Use |
|---|---|---|---|
| SGD | No | Optional | Simple tasks |
| SGD + Momentum | No | Yes | ConvNets |
| RMSProp | Yes | No | RNNs |
| Adam | Yes | Yes | Transformers, DNNs |
“Optimization is the art of taking small steps in the right direction, many times over.”
Try It Yourself
- Implement SGD and Adam on a linear regression task.
- Compare training curves for SGD, Momentum, RMSProp, and Adam.
- Experiment with learning rate schedules.
- Visualize optimization paths on a 2D contour plot.
94. Deep Learning (Backpropagation, Dropout, Normalization)
Deep learning is about stacking layers of computation so that the network can learn hierarchical representations. From raw pixels to abstract features, deep nets build meaning through composition of functions.
At the core of this process are three ideas: backpropagation, regularization (dropout), and normalization.
1. The Essence of Deep Learning
A neural network is a chain of functions:
\[ f(x; \theta) = f_L(f_{L-1}(\cdots f_1(x))) \]
Each layer transforms its input and passes it on.
Training involves finding parameters \(\theta\) that minimize a loss \( L(f(x; \theta), y) \).
2. Backpropagation
Backpropagation is the algorithm that teaches neural networks.
It uses the chain rule of calculus to efficiently compute gradients layer by layer.
For each layer ( i ):
\[ \frac{\partial L}{\partial \theta_i} = \frac{\partial L}{\partial a_i} \cdot \frac{\partial a_i}{\partial \theta_i} \]
and propagate backward:
\[ \frac{\partial L}{\partial a_{i-1}} = \frac{\partial L}{\partial a_i} \cdot \frac{\partial a_i}{\partial a_{i-1}} \]
So every neuron learns how much it contributed to the error.
Tiny Code
// Pseudocode for 2-layer network
forward:
z1 = W1*x + b1;
a1 = relu(z1);
z2 = W2*a1 + b2;
y_hat = softmax(z2);
loss = cross_entropy(y_hat, y);
backward:
dz2 = y_hat - y;
dW2 = dz2 * a1.T;
db2 = sum(dz2);
da1 = W2.T * dz2;
dz1 = da1 * relu_grad(z1);
dW1 = dz1 * x.T;
  db1 = sum(dz1);

Each gradient is computed by local differentiation and multiplied back.
3. Activation Functions
Nonlinear activations let networks approximate nonlinear functions.
| Function | Formula | Use |
|---|---|---|
| ReLU | \(\max(0, x)\) | Default, fast |
| Sigmoid | \(\frac{1}{1 + e^{-x}}\) | Probabilities |
| Tanh | \(\tanh(x)\) | Centered activations |
| GELU | \(x \, \Phi(x)\) | Modern transformers |
Without nonlinearity, stacking layers is just one big linear transformation.
4. Dropout
Dropout is a regularization technique that prevents overfitting. During training, randomly turn off neurons:
\[ \tilde{a}_i = a_i \cdot m_i, \quad m_i \sim \text{Bernoulli}(p) \]
At inference, either scale activations by \( p \) (the keep probability), or rescale during training instead (inverted dropout, as in the sketch below) and leave inference unchanged.
It forces the network to not rely on any single path.
Tiny Code
for (int i = 0; i < n; i++) {
  if (rand_uniform() > p) a[i] = 0;   // drop with probability 1 - p (p = keep probability)
  else a[i] /= p;                     // inverted dropout: rescale kept units
}

5. Normalization
Normalization stabilizes and speeds up training by reducing internal covariate shift.
Batch Normalization
Normalize activations per batch:
\[ \hat{x} = \frac{x - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} \]
\[ y = \gamma \hat{x} + \beta \]
Learnable parameters \(\gamma, \beta\) restore flexibility.
Benefits:
- Smooth gradients
- Allows higher learning rates
- Acts as a regularizer

Layer Normalization
Used in transformers (normalizes across features, not batch).
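A minimal C sketch of layer normalization for a single feature vector, assuming learnable per-feature gamma and beta arrays:

```c
#include <math.h>

// Normalize x[] across its n features, then scale and shift into y[].
void layer_norm(const double x[], double y[], int n,
                const double gamma[], const double beta[], double eps) {
    double mean = 0.0, var = 0.0;
    for (int i = 0; i < n; i++) mean += x[i];
    mean /= n;
    for (int i = 0; i < n; i++) var += (x[i] - mean) * (x[i] - mean);
    var /= n;
    for (int i = 0; i < n; i++)
        y[i] = gamma[i] * (x[i] - mean) / sqrt(var + eps) + beta[i];
}
```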
6. Initialization
Proper initialization helps gradients flow.
| Scheme | Formula | Use |
|---|---|---|
| Xavier | \(\text{Var}(W) = \frac{1}{n_{in}}\) | Tanh |
| He | \(\text{Var}(W) = \frac{2}{n_{in}}\) | ReLU |
Poor initialization can lead to vanishing or exploding gradients.
7. Training Pipeline
- Initialize weights
- Forward pass
- Compute loss
- Backward pass (backprop)
- Update weights (e.g. with Adam)
Repeat until convergence.
8. Deep Architectures
| Model | Key Idea | Typical Use |
|---|---|---|
| MLP | Fully connected | Tabular data |
| CNN | Convolutions | Images |
| RNN | Sequential recurrence | Time series, text |
| Transformer | Self-attention | Language, vision |
Each architecture stacks linear operations and nonlinearities in different ways.
9. Overfitting and Regularization
Common fixes:
- Dropout
- Weight decay (\(L_2\) regularization)
- Data augmentation
- Early stopping

The key is to improve generalization, not just minimize training loss.
10. Why It Matters
Backpropagation turned neural networks from theory to practice. Normalization made them train faster. Dropout made them generalize better.
Together, they unlocked the deep learning revolution.
“Depth gives power, but gradients give life.”
Try It Yourself
- Implement a 2-layer network with ReLU and softmax.
- Add dropout and batch normalization.
- Visualize training with and without dropout.
- Compare performance on MNIST with and without normalization.
95. Sequence Models (Viterbi, Beam Search, CTC)
Sequence models process data where order matters, text, speech, DNA, time series. They capture dependencies across positions, predicting the next step from context.
This section explores three fundamental tools: Viterbi, Beam Search, and CTC (Connectionist Temporal Classification).
1. The Nature of Sequential Data
Sequential data has temporal or structural order. Each element \(x_t\) depends on past inputs \(x_{1:t-1}\).
Common sequence tasks:
- Tagging (POS tagging, named entity recognition)
- Transcription (speech → text)
- Decoding (translation, path reconstruction)

To handle such problems, we need models that remember.
3. The Viterbi Algorithm
Viterbi is a dynamic programming algorithm to decode the most probable path:
\[ \delta_t(i) = \max_{z_{1:t-1}} P(z_{1:t-1}, z_t = i, x_{1:t}) \]
Recurrence:
\[ \delta_t(i) = \max_j \big( \delta_{t-1}(j) \cdot A_{j,i} \big) \cdot B_i(x_t) \]
Track backpointers to reconstruct the best sequence.
Time complexity: \(O(T \cdot N^2)\),
where \(N\) = number of states, \(T\) = sequence length.
Tiny Code
for (t = 1; t < T; t++) {
for (i = 0; i < N; i++) {
double best = -INF;
int argmax = -1;
for (j = 0; j < N; j++) {
double score = delta[t-1][j] * A[j][i];
if (score > best) { best = score; argmax = j; }
}
delta[t][i] = best * B[i][x[t]];
backptr[t][i] = argmax;
}
}

Use backptr to trace back the optimal path.
4. Beam Search
For many sequence models (e.g. neural machine translation), exhaustive search is impossible. Beam search keeps only the top-k best hypotheses at each step.
Algorithm:
- Start with an empty sequence and score 0
- At each step, expand each candidate with all possible next tokens
- Keep only k best sequences (beam size)
- Stop when all sequences end or reach max length
Beam size controls trade-off:
- Larger beam → better accuracy, slower
- Smaller beam → faster, riskier
Tiny Code
for (step = 0; step < max_len; step++) {
vector<Candidate> new_beam;
for (c in beam) {
probs = model_next(c.seq);
for (token, p in probs)
new_beam.push({c.seq + token, c.score + log(p)});
}
beam = top_k(new_beam, k);
}

Use log probabilities to avoid underflow.
5. Connectionist Temporal Classification (CTC)
Used in speech recognition and handwriting recognition where input and output lengths differ.
CTC learns to align input frames with output symbols without explicit alignment.
Add a special blank symbol (∅) to allow flexible alignment.
Example (CTC decoding):
| Frames | Collapse Repeats | Remove Blanks |
|---|---|---|
| A ∅ A A | A ∅ A | A A |
| H ∅ ∅ H | H ∅ H | H H |
Loss: \[ P(y | x) = \sum_{\pi \in \text{Align}(x, y)} P(\pi | x) \] where \(\pi\) are all alignments that reduce to ( y ).
CTC uses dynamic programming to compute forward-backward probabilities.
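For decoding at inference time, a common shortcut is greedy (best-path) decoding: take the most likely label per frame, then apply the collapse rule. A minimal C sketch, assuming label 0 is the blank:

```c
// frames[] holds the per-frame argmax labels; out[] receives the decoded sequence.
// Collapse consecutive repeats, then drop blanks (label 0). Returns the output length.
int ctc_collapse(const int frames[], int T, int out[]) {
    int len = 0, prev = -1;
    for (int t = 0; t < T; t++) {
        if (frames[t] != prev && frames[t] != 0)   // skip repeats and blanks
            out[len++] = frames[t];
        prev = frames[t];
    }
    return len;
}
```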
6. Comparing Methods
| Method | Used In | Key Idea | Handles Alignment? |
|---|---|---|---|
| Viterbi | HMMs | Most probable state path | Yes |
| Beam Search | Neural decoders | Approximate search | Implicit |
| CTC | Speech / seq2seq | Sum over alignments | Yes |
7. Use Cases
- Viterbi: POS tagging, speech decoding
- Beam Search: translation, text generation
- CTC: ASR, OCR, gesture recognition
8. Implementation Tips
- Use log-space for probabilities
- In beam search, apply length normalization
- In CTC, use dynamic programming tables
- Combine CTC + beam search for speech decoding
9. Common Pitfalls
- Viterbi assumes the Markov property (limited memory)
- Beam Search can miss the global optimum
- CTC can confuse repeated characters without blanks
10. Why It Matters
Sequence models are the bridge between structure and time. They show how to decode hidden meaning in ordered data.
From decoding Morse code to transcribing speech, these algorithms give machines the gift of sequence understanding.
Try It Yourself
- Implement Viterbi for a 3-state HMM.
- Compare greedy decoding vs beam search on a toy language model.
- Build a CTC loss table for a short sequence (like “HELLO”).
96. Metaheuristics (GA, SA, PSO, ACO)
Metaheuristics are general-purpose optimization strategies that search through vast, complex spaces when exact methods are too slow or infeasible. They don’t guarantee the perfect answer but often find good-enough solutions fast.
This section covers four classics:
- GA (Genetic Algorithm)
- SA (Simulated Annealing)
- PSO (Particle Swarm Optimization)
- ACO (Ant Colony Optimization)
1. The Metaheuristic Philosophy
Metaheuristics draw inspiration from nature and physics. They combine exploration (searching widely) and exploitation (refining promising spots).
They’re ideal for:
- NP-hard problems (TSP, scheduling)
- Continuous optimization (parameter tuning)
- Black-box functions (no gradients)

They trade mathematical guarantees for practical power.
2. Genetic Algorithm (GA)
Inspired by natural selection, GAs evolve a population of solutions.
Core Steps
- Initialize population randomly
- Evaluate fitness of each
- Select parents
- Crossover to produce offspring
- Mutate to add variation
- Replace worst with new candidates
Repeat until convergence.
Tiny Code
for (gen = 0; gen < max_gen; gen++) {
evaluate(pop);
parents = select_best(pop);
offspring = crossover(parents);
mutate(offspring);
pop = select_survivors(pop, offspring);
}

Operators

- Selection: tournament, roulette-wheel
- Crossover: one-point, uniform
- Mutation: bit-flip, Gaussian

Strengths: global search, diverse exploration
Weakness: may converge slowly
3. Simulated Annealing (SA)
Mimics cooling of metals, start hot (high randomness), slowly cool.
At each step:
- Propose random neighbor
- Accept if better
- If worse, accept with probability \[ P = e^{-\frac{\Delta E}{T}} \]
- Gradually lower ( T )
Tiny Code
T = T_init;
state = random_state();
while (T > T_min) {
next = neighbor(state);
dE = cost(next) - cost(state);
if (dE < 0 || exp(-dE/T) > rand_uniform())
state = next;
T *= alpha; // cooling rate
}

Strengths: escapes local minima
Weakness: sensitive to the cooling schedule
4. Particle Swarm Optimization (PSO)
Inspired by bird flocking. Each particle adjusts velocity based on:
- Its own best position
- The global best found

\[ v_i \leftarrow w v_i + c_1 r_1 (p_i - x_i) + c_2 r_2 (g - x_i) \]
\[ x_i \leftarrow x_i + v_i \]
Tiny Code
for each particle i:
v[i] = w*v[i] + c1*r1*(pbest[i]-x[i]) + c2*r2*(gbest-x[i]);
x[i] += v[i];
update_best(i);

Strengths: continuous domains, easy to implement
Weakness: premature convergence
5. Ant Colony Optimization (ACO)
Inspired by ant foraging, ants deposit pheromones on paths. The stronger the trail, the more likely others follow.
Steps:
- Initialize pheromone on all edges
- Each ant builds a solution (prob. ∝ pheromone)
- Evaluate paths
- Evaporate pheromone
- Reinforce good paths
\[ \tau_{ij} \leftarrow (1 - \rho)\tau_{ij} + \sum_k \Delta\tau_{ij}^k \]
Tiny Code
for each iteration:
for each ant:
path = build_solution(pheromone);
score = evaluate(path);
evaporate(pheromone);
  deposit(pheromone, best_paths);

Strengths: combinatorial problems (TSP)
Weakness: parameter tuning, slower convergence
6. Comparing the Four
| Method | Inspiration | Best For | Key Idea |
|---|---|---|---|
| GA | Evolution | Discrete search | Selection, crossover, mutation |
| SA | Thermodynamics | Local optima escape | Cooling + randomness |
| PSO | Swarm behavior | Continuous search | Local + global attraction |
| ACO | Ant foraging | Graph paths | Pheromone reinforcement |
7. Design Patterns
Common metaheuristic pattern:
- Represent the solution
- Define a fitness / cost function
- Define neighbor / mutation operators
- Balance randomness and greediness

Tuning parameters often matters more than the equations.
8. Hybrid Metaheuristics
Combine strengths:
- GA + SA: evolve a population, fine-tune locally
- PSO + DE: use swarm + differential evolution
- ACO + Local Search: reinforce with hill-climbing

These hybrids often outperform single methods.
9. Common Pitfalls
- Poor representation → weak search
- Over-exploitation → stuck in local optima
- Bad parameters → chaotic or stagnant behavior

Always visualize progress (fitness over time).
10. Why It Matters
Metaheuristics give us adaptive intelligence, searching without gradients, equations, or complete knowledge. They reflect nature’s way of solving complex puzzles: iterate, adapt, survive.
“Optimization is not about perfection. It’s about progress guided by curiosity.”
Try It Yourself
- Implement Simulated Annealing for the Traveling Salesman Problem.
- Create a Genetic Algorithm for knapsack optimization.
- Tune PSO parameters to fit a function \(f(x) = x^2 + 10\sin x\).
- Compare ACO paths for TSP at different evaporation rates.
97. Reinforcement Learning (Q-learning, Policy Gradients)
Reinforcement Learning (RL) is about learning through interaction: an agent explores an environment, takes actions, and learns from rewards. Unlike supervised learning (where correct labels are given), RL learns what to do by trial and error.
This section introduces two core approaches:
- Q-learning (value-based)
- Policy Gradient (policy-based)
1. The Reinforcement Learning Setting
An RL problem is modeled as a Markov Decision Process (MDP):
- States \(S\)
- Actions \(A\)
- Transition \(P(s' \mid s, a)\)
- Reward \(R(s, a)\)
- Discount factor \(\gamma\)
The agent’s goal is to find a policy \(\pi(a \mid s)\) that maximizes expected return:
\[ G_t = \sum_{k=0}^\infty \gamma^k R_{t+k+1} \]
2. Value Functions
The value function measures how good a state (or state-action pair) is.
State-value: \[ V^\pi(s) = \mathbb{E}_\pi[G_t | S_t = s] \]
Action-value (Q-function): \[ Q^\pi(s, a) = \mathbb{E}_\pi[G_t | S_t = s, A_t = a] \]
3. Bellman Equation
The Bellman equation relates a state’s value to its neighbors:
\[ Q^*(s,a) = R(s,a) + \gamma \max_{a'} Q^*(s',a') \]
This recursive definition drives value iteration and Q-learning.
4. Q-Learning
Q-learning learns the optimal action-value function off-policy (independent of behavior policy):
Update Rule: \[ Q(s,a) \leftarrow Q(s,a) + \alpha \big[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \big] \]
Tiny Code
Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a]);
s = s_next;

Repeat while exploring (e.g., \(\varepsilon\)-greedy):
- With probability \(\varepsilon\), choose a random action
- With probability \(1 - \varepsilon\), choose the best action
Over time, \(Q\) converges to \(Q^*\).
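A minimal \(\varepsilon\)-greedy selector in C over one row of the Q-table (names are illustrative):

```c
#include <stdlib.h>

// q_row[] holds Q(s, a) for the current state s across all actions.
int epsilon_greedy(const double q_row[], int n_actions, double epsilon) {
    if ((double)rand() / RAND_MAX < epsilon)
        return rand() % n_actions;                  // explore: random action
    int best = 0;
    for (int a = 1; a < n_actions; a++)
        if (q_row[a] > q_row[best]) best = a;       // exploit: greedy action
    return best;
}
```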
5. Exploration vs Exploitation
RL is a balancing act:
- Exploration: try new actions to gather knowledge
- Exploitation: use current best knowledge to maximize reward

Strategies:

- ε-greedy
- Softmax action selection
- Upper Confidence Bound (UCB)
6. Policy Gradient Methods
Instead of learning Q-values, learn the policy directly. Represent policy with parameters \(\theta\):
\[ \pi_\theta(a|s) = P(a | s; \theta) \]
Goal: maximize the expected return \[ J(\theta) = \mathbb{E}_{\pi_\theta}[G_t] \]
Gradient ascent update: \[ \theta \leftarrow \theta + \alpha \nabla_\theta J(\theta) \]
REINFORCE Algorithm: \[ \nabla_\theta J(\theta) = \mathbb{E}\big[ G_t \nabla_\theta \log \pi_\theta(a_t|s_t) \big] \]
Tiny Code
theta += alpha * G_t * grad_logpi(a_t, s_t);

7. Actor-Critic Architecture
Combines policy gradient (actor) + value estimation (critic).
- Actor: updates the policy
- Critic: estimates value (baseline)

Update:

\[ \theta \leftarrow \theta + \alpha_\theta \delta_t \nabla_\theta \log \pi_\theta(a_t|s_t) \]
\[ w \leftarrow w + \alpha_w \delta_t \nabla_w V_w(s_t) \]
with TD error: \[ \delta_t = r + \gamma V(s') - V(s) \]
8. Comparing Methods
| Method | Type | Learns | On/Off Policy | Continuous Actions? |
|---|---|---|---|---|
| Q-learning | Value-based | Q(s, a) | Off-policy | No |
| Policy Gradient | Policy-based | π(a\|s) | On-policy | Yes |
| Actor-Critic | Hybrid | Both | On-policy | Yes |
9. Extensions
- Deep Q-Networks (DQN): use neural nets for Q(s, a)
- PPO / A3C: advanced actor-critic methods
- TD(λ): tradeoff between MC and TD learning
- Double Q-learning: reduce overestimation
- Entropy regularization: encourage exploration
10. Why It Matters
Reinforcement learning powers autonomous agents, game AIs, and control systems. It’s the foundation of AlphaGo, robotics control, and adaptive decision systems.
“An agent learns not from instruction but from experience.”
Try It Yourself
- Implement Q-learning for a grid-world maze.
- Add ε-greedy exploration.
- Visualize the learned policy.
- Try REINFORCE with a simple policy (e.g. softmax over actions).
- Compare convergence of Q-learning vs Policy Gradient.
98. Approximation and Online Algorithms
In the real world, we often can’t wait for a perfect solution: data arrives on the fly, or the problem is too hard to solve exactly. That’s where approximation and online algorithms shine. They aim for good-enough results, fast and adaptively, under uncertainty.
1. The Big Picture
- Approximation algorithms: solve NP-hard problems with provable bounds.
- Online algorithms: make immediate decisions without knowing the future.
Both trade optimality for efficiency or adaptability.
2. Approximation Algorithms
An approximation algorithm finds a solution within a factor \(\rho\) of the optimal.
If \(C\) is the cost of the algorithm’s solution and \(C^*\) is the optimal cost:
\[ \rho = \max\left(\frac{C}{C^*}, \frac{C^*}{C}\right) \]
Example: \(\rho = 2\) means the solution costs at most twice the optimum.
3. Example: Vertex Cover
Problem: Given a graph \(G(V,E)\), choose the smallest set of vertices that covers all edges.
Algorithm (2-approximation):
Initialize cover = ∅
While edges remain:
- Pick any edge (u, v)
- Add both u and v to the cover
- Remove all edges incident on u or v
Guarantee: at most 2× the optimal size.
Tiny Code
cover = {};
while (!edges.empty()) {
(u, v) = edges.pop();
cover.add(u);
cover.add(v);
remove_incident_edges(u, v);
}
4. Example: Metric TSP (Triangle Inequality)
Algorithm (Christofides):
- Find a minimum spanning tree (MST)
- Find the odd-degree vertices of the MST
- Find a minimum-weight perfect matching on them
- Combine into an Eulerian tour, then shortcut repeated vertices
Guarantee: ≤ 1.5 × optimal.
5. Greedy Approximation: Set Cover
Goal: Cover the universe \(U\) using as few of the sets \(S_i\) as possible.
Greedy Algorithm: at each step, pick the set covering the most uncovered elements. Guarantee: an \(H_n \approx \ln n\) approximation factor.
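Here is a small, self-contained C sketch of this greedy rule, with each set stored as a bitmask over a universe of at most 32 elements. The example sets in `main` are illustrative assumptions.

```c
#include <stdio.h>

/* Greedy set cover sketch: each set is a bitmask over a universe of
   at most 32 elements. */
static int popcount(unsigned x) {
    int c = 0;
    while (x) { c += x & 1; x >>= 1; }
    return c;
}

int greedy_set_cover(const unsigned sets[], int m, unsigned universe) {
    unsigned uncovered = universe;
    int picks = 0;
    while (uncovered) {
        int best = -1, best_gain = 0;
        for (int i = 0; i < m; i++) {
            int gain = popcount(sets[i] & uncovered);   /* new elements covered */
            if (gain > best_gain) { best_gain = gain; best = i; }
        }
        if (best < 0) return -1;            /* universe not coverable */
        uncovered &= ~sets[best];           /* mark those elements covered */
        picks++;
    }
    return picks;                           /* number of sets chosen */
}

int main(void) {
    unsigned sets[] = { 0x07, 0x18, 0x15 }; /* illustrative sets over {0..4} */
    printf("%d sets used\n", greedy_set_cover(sets, 3, 0x1F));
    return 0;
}
```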
6. Online Algorithms
Online algorithms must decide now, before future input is known.
Goal: Minimize competitive ratio:
\[ \text{CR} = \max_{\text{input}} \frac{\text{Cost}_{\text{online}}}{\text{Cost}_{\text{optimal offline}}} \]
Lower CR → better adaptability.
7. Classic Example: Online Paging
You have a cache that holds \(k\) pages and a sequence of page requests.
- If the page is in the cache → hit
- Else → miss, must evict one page
Strategies:
- LRU (Least Recently Used): evict the least recently used page
- FIFO: evict the first-loaded page
- Random: evict a random page
Competitive ratio:
- LRU: ≤ \(k\)
- Random: ≤ \(2k - 1\)
Tiny Code
cache = LRUCache(k);
for (page in requests) {
    if (cache.contains(page)) {
        cache.touch(page);          // hit: mark as most recently used
    } else {
        if (cache.full())
            cache.evict_oldest();   // miss on a full cache: evict the LRU page
        cache.add(page);
    }
}
8. Online Bipartite Matching (Karp-Vazirani-Vazirani)
Given an offline set U and an online set V (arriving one vertex at a time), match each arriving vertex to a free neighbor immediately. Plain greedy matching is \(\frac{1}{2}\)-competitive; the KVV RANKING algorithm, which fixes a random order over U and matches each arrival to its highest-ranked free neighbor, achieves competitive ratio \(1 - \frac{1}{e}\).
Used in ad allocation and resource assignment.
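The following self-contained C sketch implements the RANKING idea on a tiny hard-coded bipartite graph: ranks are fixed once, and each arriving online vertex is matched to its best-ranked free neighbor. The graph and the rank permutation are illustrative assumptions.

```c
#include <stdio.h>

/* RANKING sketch for online bipartite matching on a tiny example graph.
   rank[u]: position of offline vertex u in a random permutation
   (lower value = higher priority). */
#define N_OFF 3
#define N_ON  3

int main(void) {
    int adjacency[N_ON][N_OFF] = {   /* adjacency[v][u] = 1 if v can match u */
        {1, 1, 0},
        {1, 0, 1},
        {0, 1, 1},
    };
    int rank[N_OFF]  = {2, 0, 1};    /* one fixed "random" permutation */
    int taken[N_OFF] = {0};

    for (int v = 0; v < N_ON; v++) {             /* online vertices arrive one by one */
        int best = -1;
        for (int u = 0; u < N_OFF; u++)
            if (adjacency[v][u] && !taken[u] &&
                (best == -1 || rank[u] < rank[best]))
                best = u;                         /* best-ranked free neighbor */
        if (best != -1) taken[best] = 1;          /* the decision is irrevocable */
        printf("online %d -> offline %d\n", v, best);
    }
    return 0;
}
```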
9. Approximation + Online Together
Modern algorithms blend both:
- Streaming algorithms: one pass, small memory (Count-Min, reservoir sampling; see the sketch below)
- Online learning: update models incrementally (SGD, perceptron)
- Approximate dynamic programming: RL and heuristic search
These are approximate online solvers: both quick and adaptive.
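As one concrete streaming example, here is reservoir sampling (Algorithm R) in C: a single pass keeps a uniform sample of K items in O(K) memory. The integer "stream" is an illustrative stand-in for real streaming input.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reservoir sampling (Algorithm R): after seeing n items, each item is in
   the reservoir with probability K/n. The stream here is just 0..999. */
#define K 5

int main(void) {
    int reservoir[K];
    long seen = 0;

    for (int x = 0; x < 1000; x++) {        /* consume the stream item by item */
        seen++;
        if (seen <= K) {
            reservoir[seen - 1] = x;        /* fill the reservoir first */
        } else {
            long j = rand() % seen;         /* index in [0, seen) */
            if (j < K)
                reservoir[j] = x;           /* keep x with probability K/seen */
        }
    }
    for (int i = 0; i < K; i++)
        printf("%d ", reservoir[i]);
    printf("\n");
    return 0;
}
```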
10. Why It Matters
Approximation algorithms give us provably near-optimal answers. Online algorithms give us real-time adaptivity. Together, they model intelligence under limits, when time and information are scarce.
“Sometimes, good and on time beats perfect and late.”
Try It Yourself
- Implement 2-approx vertex cover on a small graph.
- Simulate online paging with LRU vs Random.
- Build a greedy set cover solver.
- Measure competitive ratio on test sequences.
- Combine ideas: streaming + approximation for big data filtering.
99. Fairness, Causal Inference, and Robust Optimization
As algorithms increasingly shape decisions, from hiring to lending to healthcare, we must ensure they’re fair, causally sound, and robust to uncertainty. This section blends ideas from ethics, statistics, and optimization to make algorithms not just efficient, but responsible and reliable.
1. Why Fairness Matters
Machine learning systems often inherit biases from data. Without intervention, they can amplify inequality or discrimination.
Fairness-aware algorithms explicitly measure and correct these effects.
Common sources of bias:
- Historical bias (biased data)
- Measurement bias (imprecise features)
- Selection bias (skewed samples)
The goal: equitable treatment across sensitive groups (gender, race, region, etc.).
2. Formal Fairness Criteria
Several fairness notions exist, often conflicting:
| Criterion | Description | Example |
|---|---|---|
| Demographic Parity | \(P(\hat{Y}=1 \mid A=a) = P(\hat{Y}=1 \mid A=b)\) | Equal positive rate across groups |
| Equal Opportunity | Equal true positive rates | Same recall for all groups |
| Equalized Odds | Equal TPR and FPR | Balanced errors across groups |
| Calibration | Predicted probabilities mean the same thing for every group | If the model says 70%, all groups should achieve 70% |
No single measure fits all , fairness depends on context and trade-offs.
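To make two of these criteria concrete, the sketch below computes the demographic-parity gap (difference in positive prediction rates) and the equal-opportunity gap (difference in true positive rates) between two groups. The toy data are an illustrative assumption.

```c
#include <stdio.h>

/* Fairness gaps from binary predictions for two groups (0 and 1). */
typedef struct { int group, label, pred; } Example;

int main(void) {
    Example d[] = {
        {0, 1, 1}, {0, 0, 1}, {0, 1, 0}, {0, 0, 0},
        {1, 1, 1}, {1, 1, 0}, {1, 0, 0}, {1, 0, 0},
    };
    int n = sizeof d / sizeof d[0];

    int count[2] = {0}, pos_pred[2] = {0}, actual_pos[2] = {0}, true_pos[2] = {0};
    for (int i = 0; i < n; i++) {
        int g = d[i].group;
        count[g]++;
        if (d[i].pred) pos_pred[g]++;
        if (d[i].label) {
            actual_pos[g]++;
            if (d[i].pred) true_pos[g]++;
        }
    }
    /* Demographic parity compares positive prediction rates;
       equal opportunity compares true positive rates (recall). */
    double dp_gap = (double)pos_pred[0] / count[0]  - (double)pos_pred[1] / count[1];
    double eo_gap = (double)true_pos[0] / actual_pos[0]
                  - (double)true_pos[1] / actual_pos[1];
    printf("Demographic parity gap: %.2f\n", dp_gap);
    printf("Equal opportunity gap:  %.2f\n", eo_gap);
    return 0;
}
```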
3. Algorithmic Fairness Techniques
- Pre-processing: rebalance or reweight data before training. Example: reweighing, sampling.
- In-processing: add fairness constraints to the loss function. Example: adversarial debiasing.
- Post-processing: adjust predictions after training. Example: threshold shifting.
Tiny Code (Adversarial Debiasing Skeleton)
for x, a, y in data:
y_pred = model(x)
loss_main = loss_fn(y_pred, y)
loss_adv = adv_fn(y_pred, a)
loss_total = loss_main - λ * loss_adv
update(loss_total)
Here, the adversary tries to predict the sensitive attribute from the model’s output, and subtracting its loss pushes the model toward predictions that are invariant to that attribute.
4. Causal Inference Basics
Correlation ≠ causation. To reason about fairness and robustness, we need causal understanding: what would happen if we changed something?
Causal inference models relationships via Directed Acyclic Graphs (DAGs):
- Nodes: variables
- Edges: causal influence
5. Counterfactual Reasoning
A counterfactual asks:
“What would the outcome be if we intervened differently?”
Formally: \[ P(Y_{do(X=x)}) \]
Used in:
- Fairness (counterfactual fairness)
- Policy evaluation
- Robust decision making
6. Counterfactual Fairness
An algorithm is counterfactually fair if prediction stays the same under hypothetical changes to sensitive attributes.
\[ \hat{Y}_{A \leftarrow a}(U) = \hat{Y}_{A \leftarrow a'}(U) \]
This requires causal models , not just data.
7. Robust Optimization
In uncertain environments, we want solutions that hold up under worst-case conditions.
Formulation: \[ \min_x \max_{\xi \in \Xi} f(x, \xi) \]
where \(\Xi\) is the uncertainty set.
Example: Design a portfolio that performs well under varying market conditions.
Tiny Code
double robust_objective(double x[], Scenario Xi[], int N) {
    double worst = -DBL_MAX;               // from <float.h>
    for (int i = 0; i < N; i++)
        worst = fmax(worst, f(x, Xi[i]));  // track the worst-case scenario
    return worst;
}
This evaluates the worst-case loss of a candidate \(x\); an outer optimizer then searches for the \(x\) that minimizes it.
8. Distributional Robustness
Instead of worst-case instances, protect against worst-case distributions:
\[ \min_\theta \sup_{Q \in \mathcal{B}(P)} \mathbb{E}_{x \sim Q}[L(\theta, x)] \]
Used in adversarial training and domain adaptation.
Example: Add noise or perturbations to improve resilience:
x_adv = x + ε * sign(grad(loss, x))
9. Balancing Fairness, Causality, and Robustness
| Goal | Method | Challenge |
|---|---|---|
| Fairness | Parity, Adversarial, Counterfactual | Competing definitions |
| Causality | DAGs, do-calculus, SCMs | Identifying true structure |
| Robustness | Min-max, DRO, Adversarial Training | Trade-off with accuracy |
Real-world design involves balancing trade-offs.
Sometimes improving fairness reduces accuracy, or robustness increases conservatism.
10. Why It Matters
Algorithms don’t exist in isolation: they affect people. Embedding fairness, causality, and robustness ensures systems are trustworthy, interpretable, and just.
“The goal is not just intelligent algorithms, but responsible ones.”
Try It Yourself
- Train a simple classifier on biased data.
- Apply reweighing or adversarial debiasing.
- Draw a causal DAG of your data features.
- Compute counterfactual fairness for a sample.
- Implement a robust loss using adversarial perturbations.
100. AI Planning, Search, and Learning Systems
AI systems are not just pattern recognizers; they are decision makers. They plan, search, and learn in structured environments, choosing actions that lead to long-term goals. This section explores how modern AI combines planning, search, and learning to solve complex tasks.
1. What Is AI Planning?
AI planning is about finding a sequence of actions that transforms an initial state into a goal state.
Formally, a planning problem consists of:
- States \(S\)
- Actions \(A\)
- Transition function \(T(s, a) \to s'\)
- Goal condition \(G \subseteq S\)
- Cost function \(c(a)\)
The objective: find a plan \(\pi = [a_1, a_2, \ldots, a_n]\) minimizing total cost or maximizing reward.
2. Search-Based Planning
At the heart of planning lies search. Search explores possible action sequences, guided by heuristics.
| Algorithm | Type | Description |
|---|---|---|
| DFS | Uninformed | Deep exploration, no guarantee |
| BFS | Uninformed | Finds shortest path |
| Dijkstra | Weighted | Optimal if costs ≥ 0 |
| A* | Heuristic | Combines cost + heuristic |
A* Search Formula: \[ f(n) = g(n) + h(n) \] where:
- \(g(n)\): cost so far
- \(h(n)\): heuristic estimate to the goal
If \(h\) is admissible, A* is optimal.
Tiny Code (A* Skeleton)
priority_queue<Node> open;
g[start] = 0;
open.push({start, h(start)});
while (!open.empty()) {
n = open.pop_min();
if (goal(n)) break;
for (a in actions(n)) {
s = step(n, a);
cost = g[n] + c(n, a);
if (cost < g[s]) {
g[s] = cost;
f[s] = g[s] + h(s);
open.push({s, f[s]});
}
}
}
3. Heuristics and Admissibility
A heuristic \(h(s)\) estimates the distance to the goal.
- Admissible: never overestimates the true cost
- Consistent: satisfies the triangle inequality
Examples:
- Manhattan distance (grids)
- Euclidean distance (geometry)
- Pattern databases (puzzles)
Good heuristics mean faster convergence.
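For example, the Manhattan distance is admissible on a 4-connected grid because every move changes one coordinate by exactly one step. A minimal C version, with illustrative coordinates:

```c
#include <stdio.h>
#include <stdlib.h>

/* Manhattan distance: an admissible heuristic for A* on a 4-connected grid. */
typedef struct { int row, col; } Cell;

int manhattan(Cell s, Cell goal) {
    return abs(s.row - goal.row) + abs(s.col - goal.col);
}

int main(void) {
    Cell start = {0, 0}, goal = {3, 4};
    printf("h(start) = %d\n", manhattan(start, goal));   /* prints 7 */
    return 0;
}
```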
4. Classical Planning (STRIPS)
In symbolic AI, states are represented by facts (predicates), and actions have preconditions and effects.
Example:
Action: Move(x, y)
Precondition: At(x), Clear(y)
Effect: ¬At(x), At(y)
Search happens in logical state space.
Planners:
- Forward search (progression)
- Backward search (regression)
- Heuristic planners (FF, HSP)
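One compact way to make this concrete is to encode facts as bits: a state is a bitmask, and an action stores precondition, add, and delete masks. The sketch below is an illustrative representation for forward search, not a full planner; the fact names mirror the Move(x, y) example above.

```c
#include <stdio.h>

/* STRIPS-style sketch: the state is a bitmask of ground facts; each action
   lists precondition, add, and delete masks. Facts are assumptions. */
typedef unsigned int State;

enum { AT_A = 1u << 0, AT_B = 1u << 1, CLEAR_B = 1u << 2 };

typedef struct {
    State pre;    /* facts that must hold */
    State add;    /* facts made true      */
    State del;    /* facts made false     */
} Action;

int applicable(State s, Action a) {
    return (s & a.pre) == a.pre;             /* all preconditions satisfied */
}

State apply(State s, Action a) {
    return (s & ~a.del) | a.add;             /* delete effects, then add    */
}

int main(void) {
    Action move_a_b = { AT_A | CLEAR_B, AT_B, AT_A };  /* Move(A, B) */
    State s = AT_A | CLEAR_B;
    if (applicable(s, move_a_b))
        s = apply(s, move_a_b);
    printf("At B after move? %s\n", (s & AT_B) ? "yes" : "no");
    return 0;
}
```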
5. Hierarchical Planning
Break complex goals into subgoals.
- HTN (Hierarchical Task Networks): Define high-level tasks broken into subtasks.
Example: “Make dinner” → [Cook rice, Stir-fry vegetables, Set table]
Hierarchy makes planning modular and interpretable.
6. Probabilistic Planning
When actions are uncertain:
- MDPs: full observability, stochastic transitions
- POMDPs: partial observability
Use value iteration, policy iteration, or Monte Carlo planning.
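As a small concrete example, here is value iteration in C on a tiny two-state, two-action MDP; the transition probabilities, rewards, and discount are illustrative assumptions.

```c
#include <stdio.h>
#include <math.h>

/* Value iteration on a tiny 2-state, 2-action MDP (numbers are illustrative). */
#define NS 2
#define NA 2

int main(void) {
    /* P[s][a][s2]: probability of moving from s to s2 under action a */
    double P[NS][NA][NS] = {
        { {0.9, 0.1}, {0.2, 0.8} },   /* from state 0 */
        { {0.0, 1.0}, {0.7, 0.3} },   /* from state 1 */
    };
    double R[NS][NA] = { {0.0, 1.0}, {5.0, 0.5} };   /* immediate rewards */
    double V[NS] = {0.0, 0.0};
    double gamma = 0.9, tol = 1e-6, delta;

    do {
        delta = 0.0;
        for (int s = 0; s < NS; s++) {
            double best = -1e300;
            for (int a = 0; a < NA; a++) {
                double q = R[s][a];
                for (int s2 = 0; s2 < NS; s2++)
                    q += gamma * P[s][a][s2] * V[s2];   /* expected future value */
                if (q > best) best = q;
            }
            delta = fmax(delta, fabs(best - V[s]));
            V[s] = best;                                /* Bellman backup */
        }
    } while (delta > tol);                              /* repeat until stable */

    printf("V = [%.3f, %.3f]\n", V[0], V[1]);
    return 0;
}
```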
7. Learning to Plan
Combine learning with search:
- Learned heuristics: neural networks approximate \(h(s)\)
- AlphaZero-style planning: learn value + policy, guide tree search
- Imitation learning: mimic expert demonstrations
This bridges classical AI and modern ML.
Tiny Code (Learning-Guided A*)
f = g + alpha * learned_heuristic(s)
A neural net learns the heuristic \(\hat{h}(s)\) from solved examples.
8. Integrated Systems
Modern AI stacks combine:
- Search (planning backbone)
- Learning (policy, heuristic, model)
- Simulation (data generation)
Examples:
- AlphaZero: self-play + MCTS + neural nets
- MuZero: learns model + value + policy jointly
- Large language agents: reasoning + memory + search
9. Real-World Applications
- Robotics: motion planning, pathfinding
- Games: Go, Chess, strategy games
- Logistics: route optimization
- Autonomy: drones, vehicles, AI assistants
- Synthesis: program and query generation
Each blends symbolic reasoning and statistical learning.
10. Why It Matters
Planning, search, and learning form the triad of intelligence:
- Search explores possibilities
- Planning sequences actions toward goals
- Learning adapts heuristics from experience
Together, they power systems that think, adapt, and act.
“Intelligence is not just knowing; it is choosing wisely under constraints.”
Try It Yourself
- Implement A* search on a grid maze.
- Add a Manhattan heuristic.
- Extend to probabilistic transitions (simulate noise).
- Build a simple planner with preconditions and effects.
- Train a neural heuristic to guide search on puzzles.