# StochasticAD
StochasticAD is an experimental research package for automatic differentiation (AD) of stochastic programs. It implements AD algorithms for handling programs that can contain discrete randomness, based on the methodology developed in the NeurIPS 2022 paper *Automatic Differentiation of Programs with Discrete Randomness*.
## Introduction
Derivatives are all about how functions are affected by a tiny change ε in their input. First, let's imagine perturbing the input of a deterministic, differentiable function such as $f(p) = p^2$ at $p = 2$.
```julia
using StochasticAD
f(p) = p^2
stochastic_triple(f, 2) # Feeds 2 + ε into f
```
```
StochasticTriple of Int64:
4 + 4ε
```
The output tells us that if we change the input 2 by a tiny amount ε, the output of f will change by approximately 4ε. This is the case we're familiar with: we can get the value 4 by applying the chain rule, $\frac{\mathrm{d}}{\mathrm{d} p} p^2 = 2p = 4$. Thinking in terms of tiny changes, the output above looks a lot like a dual number.
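For comparison, a minimal check we've added (assuming the ForwardDiff package is installed; it isn't otherwise needed here) shows that a conventional forward-mode AD tool computes the same dual component:

```julia
using ForwardDiff

ForwardDiff.derivative(f, 2) # 4, matching the 4ε dual component above
```

But what happens with a discrete random function? Let's give it a try.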
```julia
using StochasticAD, Distributions
f(p) = rand(Bernoulli(p)) # 1 with probability p, 0 otherwise
stochastic_triple(f, 0.5) # Feeds 0.5 + ε into f
```
```
StochasticTriple of Int64:
0 + 0ε + (1 with probability 2.0ε)
```
The output of a Bernoulli variable cannot change by a tiny amount: it is either 0 or 1. But in the probabilistic world, there is another way to change by a tiny amount on average: jump by a large amount, with tiny probability. StochasticAD introduces a stochastic triple object, which generalizes dual numbers by including a third component to describe these perturbations. Here, the stochastic triple says that the original random output was 0, but given a small change ε in the input, the output will jump up to 1 with probability approximately 2ε.
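Where does the 2.0ε weight come from? A back-of-the-envelope argument (our addition, not part of the original walkthrough): the true derivative is $\frac{\mathrm{d}}{\mathrm{d} p} \mathbb{E}[\mathrm{Bernoulli}(p)] = \frac{\mathrm{d}}{\mathrm{d} p} p = 1$, and the sample shown was 0, which occurs with probability $1 - p = 0.5$. For the perturbation to be correct on average, the jump to 1 must then carry weight $1/(1 - p) = 2$, since $0.5 \times 2 \times 1 = 1$.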
Stochastic triples can be used to construct a new random program whose average is the derivative of the average of the original program. We simply propagate stochastic triples through the program via `stochastic_triple`, and then sum up the "dual" and "triple" components at the end via `derivative_contribution`. This process is packaged together in the function `derivative_estimate`.
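For the Bernoulli example above, the exact derivative is $\frac{\mathrm{d}}{\mathrm{d} p} \mathbb{E}[f(p)] = \frac{\mathrm{d}}{\mathrm{d} p} p = 1$, so averaging many single-sample estimates should recover it. A minimal check (our addition; `f` is the Bernoulli function defined earlier):

```julia
using Statistics # for mean

# Each call to derivative_estimate draws one sample of the derivative program.
mean(derivative_estimate(f, 0.5) for _ in 1:10_000) # ≈ 1.0
```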
Let's try a crazier example, where we mix discrete and continuous randomness!
```julia
using StochasticAD, Distributions

function X(p)
    a = p * (1 - p)
    b = rand(Binomial(10, p))
    c = 2 * b + 3 * rand(Bernoulli(p))
    return a * c * rand(Normal(b, a))
end

st = @show stochastic_triple(X, 0.6) # sample a single stochastic triple at p = 0.6
@show derivative_contribution(st) # which produces a single derivative estimate...

samples = [derivative_estimate(X, 0.6) for i in 1:1000] # many samples from derivative program
derivative = mean(samples)
uncertainty = std(samples) / sqrt(1000)
println("derivative of 𝔼[X(p)] = $derivative ± $uncertainty")
```
```
stochastic_triple(X, 0.6) = 7.539501501605484 + -6.165835836009139ε + (4.28487537540137 with probability 17.499999999999996ε)
derivative_contribution(st) = 68.81948323351482
derivative of 𝔼[X(p)] = 48.0276927090082 ± 0.6155333799779626
```
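As an independent sanity check (our addition, not part of the original example), the estimate can be compared against a central finite difference of the Monte Carlo mean, which needs no AD but is noisier and biased by the step size:

```julia
using Statistics # for mean

h = 0.01 # finite-difference step; a hypothetical choice for illustration
fd = (mean(X(0.6 + h) for _ in 1:10^6) - mean(X(0.6 - h) for _ in 1:10^6)) / (2h)
println("finite-difference estimate ≈ $fd") # should land near the estimate above, up to noise
```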
## Index
See the public API page for a walkthrough of the API, and the tutorials on differentiating a random walk, a stochastic game of life, and a particle filter, as well as on solving stochastic optimization and variational problems with discrete randomness. This is a prototype package with a number of limitations.