LOGBOOK LOG-421
COMPLETE · MATHEMATICS · MONTE CARLO · PROBABILITY · SIMULATION · STATISTICS · ALGORITHMS · FOUNDATIONS

Monte Carlo Methods

Using random sampling to solve problems that are too complex for exact calculation — integration, simulation, optimisation, and inference.

The Core Idea

Monte Carlo methods use random sampling to estimate quantities that are hard or impossible to compute exactly. Instead of solving analytically, you simulate many times and observe the outcome.

The name comes from the Monte Carlo casino in Monaco — a reference to randomness, suggested by Nicholas Metropolis for the method Stanislaw Ulam and John von Neumann developed in the 1940s while working on nuclear weapons at Los Alamos. The problems were too complex for direct calculation; random simulation was the only practical approach.

The key insight: many hard questions about a system can be answered by sampling from it. If you sample enough times, the empirical distribution approximates the true distribution. Statistics do the rest.


Estimating π with Random Points

The classic demonstration. A unit square contains a quarter-circle of radius 1:

  • Area of square: 1
  • Area of quarter-circle: π/4

Throw n random points into the square. Count how many land inside the quarter-circle (x² + y² ≤ 1). The fraction that land inside ≈ π/4.

π ≈ 4 × (points inside circle) / (total points)

With 100 points you get only a rough estimate; with 1,000,000, typically two to three correct decimal places.

This is inefficient as a way to compute π — but it illustrates the principle perfectly. The estimate converges to the truth as n → ∞, at rate 1/√n.
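The recipe is a few lines of code. A minimal sketch in Python (the function name `estimate_pi` is ours):

```python
import random

def estimate_pi(n, seed=0):
    """Estimate pi by sampling n points in the unit square and
    counting those inside the quarter-circle x^2 + y^2 <= 1."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n

print(estimate_pi(1_000_000))  # close to 3.1416; exact value depends on the seed
```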


Monte Carlo Integration

Computing integrals that are too complex for closed-form solutions.

Simple Monte Carlo: to estimate ∫ₐᵇ f(x) dx:

  1. Sample n points x₁, x₂, …, xₙ uniformly from [a, b]
  2. Compute f(xᵢ) for each
  3. Estimate: ∫ₐᵇ f(x) dx ≈ (b−a) × (1/n) Σ f(xᵢ)

This works because the integral is the expected value of f(X) when X is uniform on [a,b], times the interval length:

∫ₐᵇ f(x) dx = (b−a) × E[f(X)]

Why it’s powerful: the convergence rate is 1/√n regardless of dimension. For a 100-dimensional integral, traditional quadrature methods scale exponentially badly — Monte Carlo doesn’t. High-dimensional integration is where Monte Carlo wins.
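The three-step recipe translates directly to code. A minimal sketch (the function name `mc_integrate` and the example integrand are our illustrative choices):

```python
import math
import random

def mc_integrate(f, a, b, n, seed=0):
    """Estimate the integral of f over [a, b] as (b - a) times
    the mean of f at n points drawn uniformly from [a, b]."""
    rng = random.Random(seed)
    total = sum(f(rng.uniform(a, b)) for _ in range(n))
    return (b - a) * total / n

# Example: integral of exp(-x^2) over [0, 1], true value ~0.74682
result = mc_integrate(lambda x: math.exp(-x * x), 0.0, 1.0, 100_000)
print(result)
```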

Used in: physics simulations, financial option pricing (Black-Scholes variants for complex payoffs), Bayesian inference over complex posteriors.


Variance Reduction Techniques

Naive Monte Carlo can be slow to converge. Several techniques reduce variance (speed up convergence):

Importance sampling: instead of sampling uniformly, sample more from regions where f(x) is large. Weight the samples to correct for the non-uniform sampling.

∫ f(x) dx = ∫ [f(x)/g(x)] g(x) dx = E_g[f(X)/g(X)]

Sample from g(x) instead of uniform, multiply each sample by f(x)/g(x). If g is similar in shape to f, variance drops dramatically.

Stratified sampling: divide the domain into strata, sample a fixed number from each. Prevents accidental clustering.

Control variates: if you know the integral of a similar function exactly, use the difference as a correction term to reduce variance.
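Of the three, importance sampling is the easiest to demonstrate. A minimal sketch, where the integrand x²e⁻ˣ and the Exp(1) proposal are our illustrative choices:

```python
import random

def importance_estimate(n, seed=0):
    """Estimate I = integral over [0, inf) of x^2 * exp(-x) dx (true value 2)
    by sampling from the proposal g(x) = exp(-x), i.e. Exp(1), and
    weighting each sample by the ratio f(x)/g(x) = x^2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.expovariate(1.0)   # draw from g
        total += x * x             # weight f(x)/g(x)
    return total / n

est = importance_estimate(200_000)
print(est)  # near 2
```

Here naive uniform sampling is not even possible (the domain is unbounded); the proposal both makes the integral tractable and concentrates samples where the integrand has mass.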


Monte Carlo Simulation

Beyond integration — simulating complex systems to understand behaviour.

Project risk analysis: run a project simulation thousands of times with randomly drawn task durations (from their probability distributions). The distribution of completion times tells you P(finish by deadline), expected delay, and tail risk.

Queueing models: simulate customer arrivals and service times to find expected wait times, queue lengths, and bottlenecks under different staffing scenarios.

Epidemic modelling: run SIR model with random contact rates and transmission probabilities many times. The distribution of outcomes shows the range of possible epidemic trajectories.

Financial VaR (Value at Risk): simulate portfolio returns under thousands of market scenarios. The 5th percentile of outcomes is the 95% VaR — the loss you’d expect to exceed only 5% of the time.
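The VaR computation reduces to simulating returns and reading off a percentile. A toy sketch; normally distributed daily returns and all parameter values are illustrative assumptions, not a realistic market model:

```python
import random

def simulate_var(n_scenarios, mu=0.0005, sigma=0.02, level=0.05, seed=0):
    """Simulate daily portfolio returns as Normal(mu, sigma) and report
    the loss at the given percentile: level=0.05 gives the 95% VaR."""
    rng = random.Random(seed)
    returns = sorted(rng.gauss(mu, sigma) for _ in range(n_scenarios))
    cutoff = returns[int(level * n_scenarios)]   # 5th percentile of returns
    return -cutoff                               # VaR quoted as a positive loss

var_95 = simulate_var(100_000)
```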


Markov Chain Monte Carlo (MCMC)

The most important application — sampling from probability distributions you can’t sample from directly.

The problem: you have a probability distribution π(x) you can evaluate (up to a normalising constant), but it’s too complex to sample from directly. This is ubiquitous in Bayesian inference — the posterior is known up to a constant, but sampling it is hard.

The solution: construct a Markov chain whose stationary distribution is π. Run the chain long enough to mix; the samples approximate π.

Metropolis-Hastings Algorithm

  1. Start at some point x₀
  2. Propose a new point x' from a proposal distribution q(x'|x)
  3. Compute the acceptance ratio:
    α = min(1, π(x')q(x|x') / π(x)q(x'|x))
  4. Accept x' with probability α; otherwise stay at x
  5. Repeat

The chain visits states in proportion to π — it spends more time in high-probability regions. After a burn-in period (discarding early samples), the chain is effectively sampling from π.
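The algorithm above in Python, specialised to a symmetric Gaussian proposal (a common simplification: q(x|x') = q(x'|x), so the q terms cancel). The target here, a standard normal known only up to a constant, is our illustrative choice:

```python
import math
import random

def metropolis(log_pi, x0, n_samples, step=1.0, seed=0):
    """Metropolis sampler with a symmetric Gaussian proposal, so the
    acceptance ratio reduces to pi(x') / pi(x)."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        x_prop = x + rng.gauss(0.0, step)              # propose
        alpha = math.exp(min(0.0, log_pi(x_prop) - log_pi(x)))
        if rng.random() < alpha:
            x = x_prop                                 # accept
        samples.append(x)                              # else stay at x
    return samples

# Target: standard normal, known only up to a normalising constant
samples = metropolis(lambda x: -0.5 * x * x, 0.0, 50_000)
kept = samples[5_000:]                                 # discard burn-in
print(sum(kept) / len(kept))                           # mean near 0
```

Working with log π avoids numerical underflow for sharply peaked targets; the `min(0.0, …)` caps α at 1 as in step 3.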

Gibbs Sampling

A special case for multivariate distributions. Sample each variable in turn from its conditional distribution, given current values of all others.

For (x, y) ~ π(x, y):
x_new ~ π(x | y_current)
y_new ~ π(y | x_new)

If conditional distributions are easy to sample from (which they often are), Gibbs sampling is efficient even when the joint distribution is complex.
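A case where the conditionals are easy: a bivariate standard normal with correlation ρ, whose conditionals are themselves normal. A minimal sketch (function name ours):

```python
import math
import random

def gibbs_bivariate_normal(rho, n_samples, seed=0):
    """Gibbs sampler for a bivariate standard normal with correlation rho.
    Both conditionals are normal: x | y ~ N(rho*y, 1 - rho^2),
    and symmetrically for y | x."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)
    x, y, samples = 0.0, 0.0, []
    for _ in range(n_samples):
        x = rng.gauss(rho * y, sd)   # x_new ~ pi(x | y_current)
        y = rng.gauss(rho * x, sd)   # y_new ~ pi(y | x_new)
        samples.append((x, y))
    return samples

samples = gibbs_bivariate_normal(0.8, 20_000)
```

The empirical correlation of the samples recovers ρ, even though no joint sampling step ever occurs.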

Used in: Bayesian hierarchical models, topic modelling (LDA), image segmentation, genetics.


Monte Carlo Tree Search (MCTS)

A decision-making algorithm for sequential choices — famously used in game AI.

  1. Selection: traverse the tree using a policy that balances exploration and exploitation (UCB1: select child maximising Q + C√(ln N/n))
  2. Expansion: add a new child node
  3. Simulation: play out randomly from the new node to a terminal state
  4. Backpropagation: update win counts back up the tree

Repeat many times. The most-visited child is the best move.
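The UCB1 rule in the selection step is compact enough to show on its own. A sketch; representing children as (mean value, visit count) pairs is our simplification:

```python
import math

def ucb1_select(children, c=math.sqrt(2)):
    """Return the index of the child maximising Q + C*sqrt(ln N / n),
    where Q is the child's mean value, n its visit count, and N the
    parent's total visit count. Unvisited children win immediately."""
    total = sum(n for _, n in children)
    def score(child):
        q, n = child
        if n == 0:
            return float("inf")
        return q + c * math.sqrt(math.log(total) / n)
    return max(range(len(children)), key=lambda i: score(children[i]))

# Children given as (mean value, visit count) pairs: the rarely
# visited third child wins despite its lower mean value.
best = ucb1_select([(0.5, 10), (0.6, 3), (0.4, 1)])
```

The constant C trades off exploitation (the Q term) against exploration (the visit-count term).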

AlphaGo combined MCTS with neural networks — the networks guided which moves to explore, making the simulation far more efficient than pure random play. AlphaZero extended this to chess and shogi with no human knowledge except the rules.


The Law of Large Numbers and Convergence

Why Monte Carlo works:

Law of Large Numbers: as n → ∞, the sample mean converges to the true expectation:

(1/n) Σ f(xᵢ) → E[f(X)]

Central Limit Theorem: the error in the Monte Carlo estimate is approximately normal:

Error ~ N(0, σ²/n)

Standard error = σ/√n, where σ is the standard deviation of f(X).

Convergence rate: to halve the error, you need 4× more samples. To gain one decimal place, 100× more samples. This 1/√n rate is slow — but it doesn’t depend on the dimension of the problem, which is why Monte Carlo dominates in high dimensions.
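The 4×-samples-per-halving claim is easy to check empirically. A quick sketch using means of Uniform(0,1) draws (the setup is ours):

```python
import random

def mc_standard_error(n, trials=200, seed=0):
    """Empirical standard error of the mean of n Uniform(0,1) draws,
    measured across independent trials."""
    rng = random.Random(seed)
    means = [sum(rng.random() for _ in range(n)) / n for _ in range(trials)]
    mu = sum(means) / trials
    return (sum((m - mu) ** 2 for m in means) / trials) ** 0.5

# Quadrupling n should roughly halve the error (the 1/sqrt(n) rate)
e1, e4 = mc_standard_error(1_000), mc_standard_error(4_000)
print(e1 / e4)  # ratio near 2
```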


When to Use Monte Carlo

Situation                        Why Monte Carlo
High-dimensional integrals       Convergence rate independent of dimension
Complex posteriors (Bayesian)    MCMC samples from intractable distributions
Stochastic systems               Simulation captures randomness directly
No closed-form solution          Random sampling sidesteps the need for one
Rare event estimation            Importance sampling focuses samples where they matter

The tradeoff: Monte Carlo is approximate, requires many samples, and has random error. But for problems where exact methods fail or don’t exist, it’s often the only practical approach — which is why it’s been indispensable since Los Alamos.