Random walk
In this tutorial, we differentiate a random walk over the integers using StochasticAD. We will need the following packages:
using Distributions # defines several supported discrete distributions
using StochasticAD
using StaticArrays # for more efficient small arrays
Setting up the random walk
Let's define a function for simulating the walk.
function simulate_walk(probs, steps, n)
    state = 0
    for i in 1:n
        probs_here = probs(state) # transition probabilities for possible steps
        step_index = rand(Categorical(probs_here)) # which step do we take?
        step = steps[step_index] # get size of step
        state += step
    end
    return state
end
simulate_walk (generic function with 1 method)
Here, steps is a (1-indexed) array of the possible steps we can take, and each step has a certain probability. To make things more interesting, we take in a function probs that produces these probabilities, which can depend on the current state of the random walk.
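For instance, a state-independent probs gives a simple symmetric walk (a hypothetical usage example of ours, not part of the original tutorial):

# Hypothetical usage: a symmetric walk with state-independent probabilities.
unbiased_probs = state -> SA[0.5, 0.5] # equal chance of stepping left or right
simulate_walk(unbiased_probs, SA[-1, 1], 10) # returns an integer, typically near 0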
Let's zoom in on the two lines where discrete randomness is involved.
step_index = rand(Categorical(probs_here)) # which step do we take?
step = steps[step_index] # get size of step
This is a cute pattern for making a discrete choice. First, we sample from a Categorical distribution from Distributions.jl, using the probabilities probs_here at our current position. This gives us an index between 1 and length(steps), which we can use to pick the actual step to take. Stochastic triples propagate through both steps!
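To see this in isolation, here is a small sketch of ours (not from the original tutorial) that pushes a stochastic triple through the same categorical-then-index pattern, with the probability of stepping right as the parameter:

# Sketch: differentiate the categorical-then-index pattern on its own.
# The parameter q is the probability of picking the +1 step.
choice(q) = SA[-1, 1][rand(Categorical(SA[1 - q, q]))]
stochastic_triple(choice, 0.5) # output varies per run, e.g. -1 + 0ε + (2 with probability 2.0ε)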
Differentiating the random walk
Let's define a toy problem. We consider a random walk with -1 and +1 steps, where the probability of +1 starts off high but decays exponentially with a decay length of p. We take n = 100 steps and set p = 50.
using StochasticAD
const steps = SA[-1, 1] # move left or move right
make_probs(p) = X -> SA[1 - exp(-X / p), exp(-X / p)]
f(p, n) = simulate_walk(make_probs(p), steps, n)
@show f(50, 100) # let's run a single random walk with p = 50
@show stochastic_triple(p -> f(p, 100), 50) # let's see what a single stochastic triple looks like at p = 50
StochasticTriple of Int64:
24 + 0ε + (2 with probability 0.5623627757189044ε)
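The triple consists of the primal output (24), an infinitesimal dual component (0ε), and a finite perturbation (+2) that occurs with probability of order ε. These parts can be combined into a single unbiased derivative sample; as we understand the API, derivative_estimate composes stochastic_triple with derivative_contribution in exactly this way:

# Combine the dual part and the probability-weighted jump into one sample.
st = stochastic_triple(p -> f(p, 100), 50)
derivative_contribution(st) # one unbiased sample of the derivative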
Time to differentiate! For fun, let's differentiate the square of the output of the random walk.
f_squared(p, n) = f(p, n)^2
samples = [derivative_estimate(p -> f_squared(p, 100), 50) for i in 1:1000] # many samples from derivative program at p = 50
derivative = mean(samples)
uncertainty = std(samples) / sqrt(1000)
println("derivative of 𝔼[f_squared] = $derivative ± $uncertainty")
derivative of 𝔼[f_squared] = 29.656889500123906 ± 0.6858685562976662
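As a rough cross-check (our addition, not part of the original tutorial), a naive Monte Carlo central finite difference should give a consistent answer, though with O(h²) bias and far higher variance, since the two walks below use independent randomness:

# Crude finite-difference estimate of d/dp 𝔼[f_squared] at p = 50.
h = 1.0
fd_samples = [(f_squared(50 + h, 100) - f_squared(50 - h, 100)) / (2h) for _ in 1:10^5]
println("finite-difference estimate ≈ $(mean(fd_samples))")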
Computing variance
A crucial figure of merit for a derivative estimator is its variance. We compute the standard deviation (square root of the variance) of our estimator over a range of n.
n_range = 10:10:100 # range for testing asymptotic variance behaviour
p_range = 2 .* n_range
nsamples = 10000
stds_triple = Float64[]
for (n, p) in zip(n_range, p_range)
    std_triple = std(derivative_estimate(p -> f_squared(p, n), p) for i in 1:nsamples)
    push!(stds_triple, std_triple)
end
@show stds_triple
10-element Vector{Float64}:
0.9892190802145568
1.5918965085962704
2.178650475006562
2.8056702276020036
3.4188749110468994
3.964868631191327
4.673267362279
5.239145499809949
5.871745973879744
6.479976908045952
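To put a number on the growth visible above (our addition, not in the original tutorial), one can fit the scaling exponent with a log-log least-squares slope:

# Estimate the exponent a in std ≈ C * n^a via a log-log linear fit.
using Statistics
logn, logs = log.(n_range), log.(stds_triple)
slope = cov(logn, logs) / var(logn)
@show slope # empirical growth exponent of the standard deviation with n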
For comparison with other unbiased estimators, we also compute stds_score and stds_score_baseline for the score function gradient estimator, both without and with a variance-reducing batch-average control variate (CV). (For details, see core.jl and compare_score.jl.) We can now graph the standard deviation of each estimator versus $n$, observing lower variance in the unbiased derivative estimate produced by stochastic triples:
(Plot: standard deviation of each derivative estimator versus n.)
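For intuition about what the score function estimator does here, the following is a minimal sketch under our own assumptions (using ForwardDiff for the log-probability derivative); it illustrates the idea only and is not the implementation in compare_score.jl:

# Score function (REINFORCE) estimator of d/dp 𝔼[f_squared]: accumulate the
# derivative of the log-likelihood of the sampled path, then weight the
# squared outcome by it. High variance is expected without a baseline.
using ForwardDiff

function walk_with_score(p, n)
    state = 0
    score = 0.0 # running d/dp log P(sampled path)
    for i in 1:n
        step_index = rand(Categorical(make_probs(p)(state)))
        # derivative of the log-probability of the step we actually took
        score += ForwardDiff.derivative(q -> log(make_probs(q)(state)[step_index]), p)
        state += steps[step_index]
    end
    return state, score
end

score_samples = map(1:10000) do _
    x, s = walk_with_score(50, 100)
    x^2 * s # one score-function sample of the derivative
end
println("score estimate ≈ $(mean(score_samples)) ± $(std(score_samples) / sqrt(10000))")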