Probabilistic Programming

Probabilistic Programming

Probabilistic programming is a relatively new form of programming, that was developed as a result of a conceptual breakthrough in generalizing probabilistic modelling from inflexible Bayesian networks and graphical models to a probabilistic analogy of higher-order logic (HOL). There is a very good Google talk by Noah Goodman about this in the fourth conference on artificial general intelligence in 2011, explaining this new paradigm.

Unlike deep learning, probabilistic programming enables humans to understand the structure of complex probability distributions in a similar way that a traditional source code allows people to understand a traditional computer program. (show more)

What are PPLs?

PPLs are Probabilistic Programming Languages, - languages designed to model and infer from complex probability distributions, and this way, reason under uncertainty. Anyone, who has a set of logically and probabilistically related uncertainties in life about the events of interest, could potentially use a PPL to model their situations and reason about them to come up with better decisions.

Probabilistic programs are simply experiment descriptions with random object generators. For a simple example, if we want to know what are the likelihoods of different numbers of points obtained by tossing 3 different biased coins (two with probability of $0.1$ of getting a point, and one with probability of $0.9$ of getting a point), we can write a probabilistic program to simulate these outcomes, and then normalize the measure to a probability distribution. One of the simplest, and easy-to-use PPLs is WebPPL (pronounced as Web People), developed by N. D. Goodman and A. Stuhlmüller at Stanford.

So, here is an example that models the above situation in WebPPL:

var model = function(){
    var a = flip(0.1)
    var b = flip(0.9)
    var c = flip(0.1)
    return a + b + c
}

var times = 10

print(EnumerateDepthFirst(model, times))

Click "run" to see the result. The result shows the outcomes (number of points) with expected probabilities. The simplest way to imagine how a PPL works, is by thinking of such modelling as a loop, within which we try to do random sampling with a strategy of choice (in this case, DepthFirst enumeration) a desired number of times (in our case, 10 times). What we get is estimate of probability distribution by which the function (in this case "model") is distributed.

Let's take a look at another example, where we try to flip a coin with with random bias.

var model = function(){
    var p = beta(5,5)
    var coin = flip(p)
    return coin
}

print(MCMC(model))

In this case, $p$ is sampled from beta distribution with shape parameters $\alpha=5, \beta=5$ (a symmetric, bell-shaped distribution with support on interval $(0,1)$ , and then sample $coin$ from Bernoulli distriubution with parameter $p$, and perform the sampling simulation with Markov-Chain Monte Carlo method. The PPLs provide random number samplers for various distributions for our convenience, so we don't have to derive them from Bernoulli distribution ourselves. (show note)

In summary

PPLs are languages designed to model and infer from complex probabilistic models. PPLs allow a programmer to define a complex probability distribution in terms of a set of probability distributions, where one distribution possibly parametrizes another arbitrarily deeply and with rich semantics (e.g., ability to use mathematical operations, conditionals, etc.), and do inference on it by running the program not once, but a large number of times, until we have a good approximation the frequency of all program execution paths that define the compound probability distribution, from which we then can do inference (e.g., obtain marginal distributions). The sampling can be done in many ways suitable for different types of situations (priority-based enumeration with caching, particle filtering, Markov chain Monte Carlo,..).

It pretty much summarizes how PPLs work. With the diversity of distributions known to model real life processes, and cheap GPU computing, PPLs open up a world of possibilities in probabilistic modelling and inference, and understanding our models to a similar degree, to which one programmer can understand another programmer's code.

In the past, PPLs used to be implemented independently (like BUGS), but it is convenient to implement them as a feature of existing languages. Notable ones - Church (under lisp), the WebPPL (under JavaScript), PyMC -- a package in Python (show note)

.