<h1 id="one-dimensional-variational-inference">One Dimensional Variational Inference</h1>
<p><em>2023-06-25</em></p>
<p>In this post we will get our hands dirty and use the concepts we learned
in <a href="https://mrandri19.github.io/2023/05/28/what-is-the-elbo-in-variational-inference.html">“What is the ELBO in Variational
Inference?”</a>
to perform variational inference for a 1D distribution.</p>
<p>As this post is a Jupyter notebook, we begin with the usual
scientific Python imports.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="nn">seaborn</span> <span class="k">as</span> <span class="n">sns</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="o">%</span><span class="n">config</span> <span class="n">InlineBackend</span><span class="p">.</span><span class="n">figure_format</span><span class="o">=</span><span class="s">'retina'</span>
<span class="n">sns</span><span class="p">.</span><span class="n">set_theme</span><span class="p">(</span><span class="n">style</span><span class="o">=</span><span class="s">'darkgrid'</span><span class="p">)</span>
<span class="kn">from</span> <span class="nn">scipy.stats</span> <span class="kn">import</span> <span class="n">norm</span><span class="p">,</span> <span class="n">skewnorm</span>
<span class="kn">from</span> <span class="nn">scipy.optimize</span> <span class="kn">import</span> <span class="n">minimize</span>
<span class="kn">from</span> <span class="nn">scipy.integrate</span> <span class="kn">import</span> <span class="n">quad</span>
</code></pre></div></div>
<h2 id="prior-likelihood-and-posterior">Prior, likelihood, and posterior</h2>
<p>We begin by implementing the prior and the likelihood. The prior is a
Normal distribution centered at 0 with standard deviation 5. We
make the standard deviation large to allow for a wide range of posterior
values. Thanks to Variational Inference, the likelihood can be any
distribution, since we don’t need the prior and likelihood to be
conjugate. We choose the likelihood to be the PDF of a <a href="https://en.wikipedia.org/wiki/Skew_normal_distribution">skew normal
distribution</a>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">PRIOR_MU</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">PRIOR_SIGMA</span> <span class="o">=</span> <span class="mi">5</span>
<span class="k">def</span> <span class="nf">prior</span><span class="p">(</span><span class="n">theta</span><span class="p">):</span>
<span class="s">"p(theta) = N(theta | 0, 5)"</span>
<span class="k">return</span> <span class="n">norm</span><span class="p">.</span><span class="n">pdf</span><span class="p">(</span><span class="n">theta</span><span class="p">,</span> <span class="n">loc</span><span class="o">=</span><span class="n">PRIOR_MU</span><span class="p">,</span> <span class="n">scale</span><span class="o">=</span><span class="n">PRIOR_SIGMA</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">likelihood</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">theta</span><span class="p">):</span>
<span class="s">"p(y | theta) = skewnorm(y | 5, theta, 2)"</span>
<span class="k">return</span> <span class="n">skewnorm</span><span class="p">.</span><span class="n">pdf</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">a</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span> <span class="n">loc</span><span class="o">=</span><span class="n">theta</span><span class="p">,</span> <span class="n">scale</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
</code></pre></div></div>
<p>We compute our posterior numerically, as it has no analytical solution.
This is feasible because we are in 1D; for real-world, high-dimensional
problems it is intractable. We will use this numerical posterior to verify
the correctness of our variational solution.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">posterior</span><span class="p">(</span><span class="n">theta</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
<span class="s">"p(theta | y) = p(y | theta) * p(theta) / p(y)"</span>
<span class="n">evidence</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">quad</span><span class="p">(</span>
<span class="k">lambda</span> <span class="n">theta_</span><span class="p">:</span> <span class="n">likelihood</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">theta_</span><span class="p">)</span> <span class="o">*</span> <span class="n">prior</span><span class="p">(</span><span class="n">theta_</span><span class="p">),</span>
<span class="o">-</span><span class="n">np</span><span class="p">.</span><span class="n">inf</span><span class="p">,</span>
<span class="n">np</span><span class="p">.</span><span class="n">inf</span><span class="p">,</span>
<span class="p">)</span>
<span class="k">return</span> <span class="n">likelihood</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">theta</span><span class="p">)</span> <span class="o">*</span> <span class="n">prior</span><span class="p">(</span><span class="n">theta</span><span class="p">)</span> <span class="o">/</span> <span class="n">evidence</span>
</code></pre></div></div>
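<p>As a quick sanity check (our addition, not part of the original notebook), we can verify that the numerical posterior integrates to 1 for a fixed observation. Note that this is slow, since every evaluation of <code>posterior</code> recomputes the evidence integral.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># For any fixed y, p(theta | y) must integrate to 1 over theta.
total_mass, _ = quad(lambda theta_: posterior(theta_, 3.0), -np.inf, np.inf)
assert np.isclose(total_mass, 1.0, atol=1e-6)
</code></pre></div></div>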
<h2 id="variational-posterior-and-variational-objective">Variational posterior and variational objective</h2>
<p>And now to the juicy parts: we need to implement our variational
posterior and the variational objective that we will maximize. We choose
the variational posterior to be Normal, as it is a flexible distribution
and it gives us a closed-form expression for the KL-divergence between the
variational posterior and the prior. The variational objective uses
numerical integration to compute the 1D integral of the data fit term,
and the closed-form expression for the KL divergence term. Even if we were
solving this problem in higher dimensions, the data fit term would
decompose into a collection of 1D integrals (one per dimension, under a
mean-field factorization), which is a bit more expensive to compute but
whose cost is linear in the number of dimensions, not exponential.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">variational_posterior</span><span class="p">(</span><span class="n">theta</span><span class="p">,</span> <span class="n">m</span><span class="p">,</span> <span class="n">s</span><span class="p">):</span>
<span class="s">"q(theta) = N(theta | m, s))"</span>
<span class="k">return</span> <span class="n">norm</span><span class="p">.</span><span class="n">pdf</span><span class="p">(</span><span class="n">theta</span><span class="p">,</span> <span class="n">loc</span><span class="o">=</span><span class="n">m</span><span class="p">,</span> <span class="n">scale</span><span class="o">=</span><span class="n">s</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">data_fit_term</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">m</span><span class="p">,</span> <span class="n">s</span><span class="p">):</span>
<span class="s">"E_q(theta | m, s) [log p(y | theta)]"</span>
<span class="n">integral</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">quad</span><span class="p">(</span>
<span class="k">lambda</span> <span class="n">theta</span><span class="p">:</span> <span class="n">variational_posterior</span><span class="p">(</span><span class="n">theta</span><span class="p">,</span> <span class="n">m</span><span class="p">,</span> <span class="n">s</span><span class="p">)</span>
<span class="o">*</span> <span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">log</span><span class="p">(</span><span class="n">likelihood</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">theta</span><span class="p">)</span> <span class="o">+</span> <span class="mf">1e-8</span><span class="p">)),</span>
<span class="o">-</span><span class="n">np</span><span class="p">.</span><span class="n">inf</span><span class="p">,</span>
<span class="n">np</span><span class="p">.</span><span class="n">inf</span><span class="p">,</span>
<span class="p">)</span>
<span class="k">return</span> <span class="n">integral</span>
<span class="k">def</span> <span class="nf">kl_term</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="n">s</span><span class="p">):</span>
<span class="s">"KL(q(theta | m, s) || p(theta))"</span>
<span class="k">return</span> <span class="n">np</span><span class="p">.</span><span class="n">log</span><span class="p">(</span><span class="n">PRIOR_SIGMA</span> <span class="o">/</span> <span class="n">s</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span>
<span class="p">(</span><span class="n">s</span><span class="o">**</span><span class="mi">2</span> <span class="o">+</span> <span class="p">(</span><span class="n">m</span> <span class="o">-</span> <span class="n">PRIOR_MU</span><span class="p">)</span> <span class="o">**</span> <span class="mi">2</span><span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="mi">2</span> <span class="o">*</span> <span class="n">PRIOR_SIGMA</span><span class="o">**</span><span class="mi">2</span><span class="p">)</span> <span class="o">-</span> <span class="mf">0.5</span>
<span class="p">)</span>
<span class="k">def</span> <span class="nf">variational_objective</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">m</span><span class="p">,</span> <span class="n">s</span><span class="p">):</span>
<span class="k">return</span> <span class="n">data_fit_term</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">m</span><span class="p">,</span> <span class="n">s</span><span class="p">)</span> <span class="o">-</span> <span class="n">kl_term</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="n">s</span><span class="p">)</span>
</code></pre></div></div>
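<p>Since the closed-form KL expression is easy to get wrong, here is a small consistency check (our addition): evaluate the KL integral numerically on a wide finite interval (to avoid PDF underflow) and compare it with <code>kl_term</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Numerically evaluate KL(q || p) and compare it with the closed form above.
def kl_numerical(m, s):
    integral, _ = quad(
        lambda theta: variational_posterior(theta, m, s)
        * np.log(variational_posterior(theta, m, s) / prior(theta)),
        m - 30 * s,  # wide but finite, so the PDFs never underflow to 0
        m + 30 * s,
    )
    return integral

assert np.isclose(kl_term(1.0, 2.0), kl_numerical(1.0, 2.0), atol=1e-6)
</code></pre></div></div>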
<h2 id="finding-the-best-variational-approximation">Finding the best variational approximation</h2>
<p>Before solving our optimization problem, let’s first see what our
starting state looks like with the plotting function defined below.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">plot_variational_and_true_posterior</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">theta</span><span class="p">,</span> <span class="n">m</span><span class="p">,</span> <span class="n">s</span><span class="p">):</span>
<span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">8</span><span class="p">,</span> <span class="mi">4</span><span class="p">))</span>
<span class="n">ax</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">theta</span><span class="p">,</span> <span class="n">prior</span><span class="p">(</span><span class="n">theta</span><span class="p">),</span> <span class="n">label</span><span class="o">=</span><span class="s">"$p(</span><span class="se">\\</span><span class="s">theta)$"</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">theta</span><span class="p">,</span> <span class="n">variational_posterior</span><span class="p">(</span><span class="n">theta</span><span class="p">,</span> <span class="n">m</span><span class="p">,</span> <span class="n">s</span><span class="p">),</span> <span class="n">label</span><span class="o">=</span><span class="s">"$q(</span><span class="se">\\</span><span class="s">theta)$"</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">theta</span><span class="p">,</span> <span class="n">posterior</span><span class="p">(</span><span class="n">theta</span><span class="p">,</span> <span class="n">y</span><span class="p">),</span> <span class="n">label</span><span class="o">=</span><span class="s">"$p(</span><span class="se">\\</span><span class="s">theta | y)$"</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">set_xlabel</span><span class="p">(</span><span class="s">"$</span><span class="se">\\</span><span class="s">theta$"</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">set_ylabel</span><span class="p">(</span><span class="s">"Density"</span><span class="p">)</span>
<span class="n">obj</span> <span class="o">=</span> <span class="n">variational_objective</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">m</span><span class="p">,</span> <span class="n">s</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">set_title</span><span class="p">(</span>
<span class="sa">f</span><span class="s">"""
Prior, variational posterior, and true posterior
$m$=</span><span class="si">{</span><span class="n">m</span><span class="p">:.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s">, $s$=</span><span class="si">{</span><span class="n">s</span><span class="p">:.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s">, $L[q]$=</span><span class="si">{</span><span class="n">obj</span><span class="p">:.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s">
"""</span>
<span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">legend</span><span class="p">()</span>
<span class="n">fig</span><span class="p">.</span><span class="n">tight_layout</span><span class="p">()</span>
</code></pre></div></div>
<p>We assume that our only observation is y=3 and we plot the prior,
variational posterior, and true posterior for a range of values.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">y</span> <span class="o">=</span> <span class="mi">3</span>
<span class="n">theta</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">linspace</span><span class="p">(</span><span class="o">-</span><span class="mi">20</span><span class="p">,</span> <span class="mi">20</span><span class="p">,</span> <span class="mi">1000</span><span class="p">)</span>
<span class="n">plot_variational_and_true_posterior</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">theta</span><span class="p">,</span> <span class="o">-</span><span class="mf">0.5</span><span class="p">,</span> <span class="mf">1.3</span><span class="p">)</span>
</code></pre></div></div>
<p><img src="/assets/images/one-dimensional-variational-inference/25ef419437339814f04b3e29d1ddd7b35be169dc.png" alt="" /></p>
<p>Now we can finally run Scipy’s optimizer to maximize the variational
objective (by minimizing its negation) and see that it indeed worked: the
variational posterior is quite close to the true posterior distribution.
We can also see that the normal variational posterior cannot be skewed like
the likelihood, but it compensates by moving its mean to the left, where
the posterior is skewed.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">res</span> <span class="o">=</span> <span class="n">minimize</span><span class="p">(</span>
<span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="o">-</span><span class="n">variational_objective</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">x</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">x</span><span class="p">[</span><span class="mi">1</span><span class="p">]),</span>
<span class="n">x0</span><span class="o">=</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">5</span><span class="p">],</span>
<span class="n">bounds</span><span class="o">=</span><span class="p">[(</span><span class="o">-</span><span class="mi">10</span><span class="p">,</span> <span class="mi">10</span><span class="p">),</span> <span class="p">(</span><span class="mf">0.1</span><span class="p">,</span> <span class="mi">10</span><span class="p">)],</span>
<span class="p">)</span>
<span class="n">m</span><span class="p">,</span> <span class="n">s</span> <span class="o">=</span> <span class="n">res</span><span class="p">.</span><span class="n">x</span>
<span class="n">plot_variational_and_true_posterior</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">theta</span><span class="p">,</span> <span class="n">m</span><span class="p">,</span> <span class="n">s</span><span class="p">)</span>
</code></pre></div></div>
<p><img src="/assets/images/one-dimensional-variational-inference/e6c3fa29b7c6e5e45c67c9814708a0b9cbe8b6c3.png" alt="" /></p>
<p>That’s all folks! In the next post we will run a similar experiment, but
using multiple dimensions.</p>

<h1 id="what-is-the-elbo-in-variational-inference">What is the ELBO in Variational Inference?</h1>
<p><em>2023-05-28</em></p>
<link rel="stylesheet" href="/assets/katex/katex.min.css" />
<script defer="" src="/assets/katex/katex.min.js"></script>
<script defer="" src="/assets/katex/contrib/auto-render.min.js" onload="renderMathInElement(document.body);">
</script>
<h2 id="bayesian-inference">Bayesian Inference</h2>
<h3 id="the-three-main-objects-of-bayesian-inference">The three main objects of Bayesian inference</h3>
<p>In Bayesian Inference, there are <strong>three main objects</strong> we want to study:</p>
<ul>
<li>
<p>The <strong>distribution of observations</strong> \(y \sim Y\). Under the Bayesian framework, we believe that the observed data is sampled from a probability distribution.
For example, if we measure the height of every person in a group of people, we could assume that the observations come from a Normal distribution i.e. \(\text{height} \sim N\).</p>
</li>
<li>
<p>The <strong>distribution of parameters</strong> \(\theta \sim \Theta\). Parameters describe the process that generates the observed data and cannot be observed directly.
In our height measurement example, having picked a normal distribution for the observations, we can say that our parameters are the distribution’s mean and variance i.e. \(\theta = \{\mu, \sigma^2\}\).
Crucially, we need to specify the distribution of all parameters before seeing any observation. For example, we could say that the mean comes from a standard normal distribution and the variance comes from a standard half-normal distribution i.e.
\(\mu \sim N(0, 1^2)\), and \(\sigma^2 \sim N^+(0, 1^2)\).
Our complete model then would be that \(\text{height} \sim N(\mu, \sigma^2)\); a small sampling sketch of this model follows the list.</p>
</li>
<li>
<p>The <strong>posterior distribution</strong> \(p\left(\theta \mid y \right)\) connects observations and parameters by specifying the distribution of parameters conditional on the value of the observations.
In our example, it would tell us what the mean and the variance of the height are.
A possible posterior distribution could be \(\mu \mid y \sim N(1.79, 0.05^2)\) and \(\sigma^2 \mid y \sim N^+(0.10, 0.03^2)\).</p>
</li>
</ul>
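<p>To make the height example concrete, here is a minimal ancestral-sampling sketch (our addition; the seed and sample size are arbitrary): we first draw the parameters from their priors, then draw observations given those parameters.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np

rng = np.random.default_rng(0)
mu = rng.normal(loc=0.0, scale=1.0)              # mu ~ N(0, 1^2)
sigma2 = np.abs(rng.normal(loc=0.0, scale=1.0))  # sigma^2 ~ N+(0, 1^2)
# height ~ N(mu, sigma^2), here for a group of 10 people
heights = rng.normal(loc=mu, scale=np.sqrt(sigma2), size=10)
</code></pre></div></div>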
<h3 id="bayes-theorem-likelihood-prior-and-evidence">Bayes’ theorem, likelihood, prior, and evidence</h3>
<p>How do we obtain the posterior distribution? Given some observations, how do we know which parameters generated them?
This is where <strong>Bayes’ theorem</strong> comes in.
It tells us how to write the Probability Density Function (PDF) of the posterior distribution using three simpler PDFs:</p>
\[\underbrace{
p\left(\theta \mid y\right)
}_\text{posterior}
=
\frac{
\overbrace{
p\left(y \mid \theta\right)
}^\text{likelihood}
\overbrace{
p\left( \theta \right)
}^\text{prior}
}{
\underbrace{
p\left( y \right)
}_\text{evidence}
}\]
<p>We define our model by directly choosing the <strong>prior</strong> \(p\left( \theta \right)\) and the <strong>likelihood</strong> \(p\left(y \mid \theta\right)\), so these are usually easy to evaluate.
The <strong>evidence</strong> \(p\left( y \right)\), however, is not obviously computable, especially when written in this form.</p>
<h3 id="law-of-total-probability-and-intractability-of-the-evidence-integral">Law of total probability and intractability of the evidence integral</h3>
<p>We can use the <strong>law of total probability</strong> to rewrite the evidence in terms of objects that we know, the prior and the likelihood:</p>
\[p\left( y \right)
=
\int p\left( y \mid \theta \right) p\left( \theta \right) d\theta\]
<p>The intuition behind this is that the integral of the PDF of the posterior distribution must be equal to 1 for it to be a valid distribution. The “shape” of the posterior PDF is determined only by the likelihood times the prior. To make the integral of likelihood times prior equal to one, we divide by a constant. The evidence is this constant.</p>
<p>Unfortunately, for most interesting applications <strong>the evidence integral has no closed-form solution</strong> (Gaussian Processes are one of the few interesting exceptions).
Because of this, <strong>we cannot directly use Bayes’ theorem to compute our posterior</strong> but we will have to take a more “approximate” route.</p>
<h2 id="variational-inference">Variational Inference</h2>
<h3 id="approximate-variational-posterior">Approximate variational posterior</h3>
<p>Given that we cannot directly know the posterior’s PDF (and thus its distribution) by applying Bayes’ theorem, let’s do something else.
What if, instead of getting the exact posterior distribution, we got “close enough” by finding a distribution \(q\left( \theta \right)\) that is very similar to the posterior \(p\left(\theta \mid y\right)\) and also very easy to compute?
Let’s call this distribution \(q\left( \theta \right)\) <strong>the variational posterior</strong>.</p>
<p>What does it mean for \(q\left( \theta \right)\) to be “close” to \(p\left(\theta \mid y\right)\)? To answer that, we need a notion of distance between distributions. The distance we are going to use is the <strong>KL-divergence</strong>:</p>
\[\text{KL}\left(
q\left( \theta \right)
\middle\|
p\left(\theta \mid y\right)
\right)
\stackrel{\text{def}}{=}
\int{
q\left( \theta \right)
\log\left(
\frac{
q\left( \theta \right)
}{
p\left(\theta \mid y\right)
}
\right)
}
d\theta\]
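<p>Note that the KL-divergence is not a true distance (it is not symmetric in its arguments), but it is always non-negative and equals zero exactly when the two distributions coincide, which is all we need here.</p>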
<p>With the distance between variational and true posterior defined, we can define our
<strong>target variational distribution as the solution of the optimization problem</strong>:</p>
\[q^*\left( \theta \right)
=
\argmin_q \text{KL}\left(
q\left( \theta \right)
\middle\|
p\left(\theta \mid y\right)
\right)\]
<h3 id="solving-the-variational-optimization-problem">Solving the variational optimization problem</h3>
<p>Let’s <strong>write down the definition of KL-divergence</strong> and apply the properties of logarithms:</p>
\[\begin{aligned}
\text{KL}\left(
q\left( \theta \right)
\middle\|
p\left(\theta \mid y\right)
\right)
&=
\int{
q\left( \theta \right)
\log\left(
\frac{
q\left( \theta \right)
}{
p\left(\theta \mid y\right)
}
\right)
}
d\theta \\
&=
\int{
q\left( \theta \right)
\left[
\log\left(
q\left( \theta \right)
\right)
-
\log\left(
p\left(\theta \mid y\right)
\right)
\right]
}
d\theta \\
\end{aligned}\]
<p>The main problem with this expression is that our true posterior \(p\left(\theta \mid y\right)\) appears on both sides of the equation.
Let’s work on the right-hand side and <strong>remove the true posterior by using Bayes’ theorem</strong> and the properties of logarithms:</p>
\[\begin{aligned}
\text{KL}\left(
q\left( \theta \right)
\middle\|
p\left(\theta \mid y\right)
\right)
&=
\int{
q\left( \theta \right)
\left[
\log\left(
q\left( \theta \right)
\right)
-
\log\left(
p\left(\theta \mid y\right)
\right)
\right]
}
d\theta \\
&=
\int{
q\left( \theta \right)
\left[
\log\left(
q\left( \theta \right)
\right)
-
\log\left(
\frac{
p\left(y \mid \theta\right)
p\left( \theta \right)
}{
p\left( y \right)
}
\right)
\right]
}
d\theta \\
&=
\int{
q\left( \theta \right)
\left[
\log\left(
q\left( \theta \right)
\right)
-
\left(
\log p\left(y \mid \theta\right)
+
\log p\left( \theta \right)
-
\log p\left( y \right)
\right)
\right]
}
d\theta \\
&=
\int{
q\left( \theta \right)
\left[
\log\left(
q\left( \theta \right)
\right)
-
\log p\left(y \mid \theta\right)
-
\log p\left( \theta \right)
+
\log p\left( y \right)
\right]
}
d\theta \\
&=
\int{
q\left( \theta \right)
\left(
\log q\left( \theta \right)
-
\log p\left( \theta \right)
\right)
d\theta
}
-
\int{
q\left( \theta \right)
\log p\left(y \mid \theta\right)
d\theta
}
+
\int{
q\left( \theta \right)
\log p\left( y \right)
d\theta
}
\\
&=
\int{
q\left( \theta \right)
\log\left(
\frac{
q\left( \theta \right)
}{
p\left( \theta \right)
}
\right)
d\theta
}
-
\int{
q\left( \theta \right)
\log p\left(y \mid \theta\right)
d\theta
}
+
\int{
q\left( \theta \right)
\log p\left( y \right)
d\theta
}
\\
\end{aligned}\]
<p>We can now notice that <strong>the first term is a KL-divergence</strong>, and that
<strong>in the last term, \(\log p\left( y \right)\) does not depend on \(\theta\) and can be extracted from the integral</strong>, which then sums up to 1:</p>
\[\begin{aligned}
\text{KL}\left(
q\left( \theta \right)
\middle\|
p\left(\theta \mid y\right)
\right)
&=
\int{
q\left( \theta \right)
\log\left(
\frac{
q\left( \theta \right)
}{
p\left( \theta \right)
}
\right)
d\theta
}
-
\int{
q\left( \theta \right)
\log p\left(y \mid \theta\right)
d\theta
}
+
\int{
q\left( \theta \right)
\log p\left( y \right)
d\theta
}
\\
&=
\text{KL}\left(
q\left( \theta \right)
\middle\|
p\left(\theta \right)
\right)
-
\int{
q\left( \theta \right)
\log p\left(y \mid \theta\right)
d\theta
}
+
\log p\left( y \right)
\\
\end{aligned}\]
<p>We have achieved our goal: the true posterior now appears in only a single term of the expression, currently on the left-hand side.
Let’s rearrange the terms:</p>
\[\text{KL}\left(
q\left( \theta \right)
\middle\|
p\left(\theta \mid y\right)
\right)
=
\text{KL}\left(
q\left( \theta \right)
\middle\|
p\left(\theta \right)
\right)
-
\int{
q\left( \theta \right)
\log p\left(y \mid \theta\right)
d\theta
}
+
\log p\left( y \right)\]
<p>to get:</p>
\[\log p\left( y \right)
=
\int{
q\left( \theta \right)
\log p\left(y \mid \theta\right)
d\theta
}
- \text{KL}\left(
q\left( \theta \right)
\middle\|
p\left(\theta \right)
\right)
+
\text{KL}\left(
q\left( \theta \right)
\middle\|
p\left(\theta \mid y\right)
\right)\]
<h3 id="the-elbo">The ELBO</h3>
<p>The final step requires noticing that <strong>\(\log{p(y)}\) is a constant</strong>: it depends neither on \(\theta\) nor on our choice of \(q\).
Thus, choosing a \(q\) that increases the first term on the right-hand side must make the second term smaller.</p>
\[\overbrace{
\log p\left( y \right)
}^\text{the evidence is a constant}
=
\overbrace{
\int{
q\left( \theta \right)
\log p\left(y \mid \theta\right)
d\theta
}
- \text{KL}\left(
q\left( \theta \right)
\middle\|
p\left(\theta \right)
\right)
}^\text{so when the 1st term increases}
+
\overbrace{
\text{KL}\left(
q\left( \theta \right)
\middle\|
p\left(\theta \mid y\right)
\right)
}^\text{the 2nd term must decrease}\]
<p>And we are done!
Let’s give a name to the 1st term, i.e. <strong>the Evidence Lower BOund (ELBO)</strong>:</p>
\[\text{ELBO}(q)
\stackrel{\text{def}}{=}
\int{
q\left( \theta \right)
\log p\left(y \mid \theta\right)
d\theta
}
-
\text{KL}\left(
q\left( \theta \right)
\middle\|
p\left(\theta \right)
\right)\]
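<p>The “lower bound” in the name is justified by the non-negativity of the KL-divergence: dropping the term \(\text{KL}\left(q\left( \theta \right) \middle\| p\left(\theta \mid y\right)\right) \geq 0\) from the decomposition of \(\log p\left( y \right)\) above gives</p>
\[\log p\left( y \right)
\geq
\text{ELBO}(q)\]
<p>so the ELBO is indeed a lower bound on the log evidence, and the bound is tight exactly when the variational posterior matches the true posterior.</p>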
<p>Then <strong>maximising the ELBO is equivalent to minimizing the variational loss</strong>
\(\text{KL}\left(
q\left( \theta \right)
\middle\|
p\left(\theta \mid y\right)
\right)\),
i.e. the KL-divergence between the variational posterior and the true posterior.</p>

<h1 id="the-volatility-smile">The Volatility Smile</h1>
<p><em>2022-11-20</em></p>
<h1 id="introduction">Introduction</h1>
<p>This notebook shows that the volatility smile is caused by the excess
kurtosis (the “fat tails”) of the log-returns distribution.</p>
<p>To show this we take three steps:</p>
<ul>
<li>
<p>Simulate the underlying prices using log-returns sampled from a normal
distribution (thin-tailed) and a t-distribution (fat tailed).</p>
</li>
<li>
<p>For every distribution, compute the prices of a series of call options
using samples of the underlying prices.</p>
</li>
<li>
<p>For every distribution, compute the “implied volatility”, i.e. the
volatility needed to make the Black-Scholes model’s predicted price match
the observed price. This is done for every strike.</p>
</li>
</ul>
<h1 id="log-returns-and-stock-prices">Log returns and stock prices</h1>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">#| code-fold: true
</span><span class="o">%</span><span class="n">config</span> <span class="n">InlineBackend</span><span class="p">.</span><span class="n">figure_format</span><span class="o">=</span><span class="s">'retina'</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="nn">seaborn</span> <span class="k">as</span> <span class="n">sns</span>
<span class="kn">from</span> <span class="nn">numpy</span> <span class="kn">import</span> <span class="n">newaxis</span>
<span class="kn">from</span> <span class="nn">scipy.stats</span> <span class="kn">import</span> <span class="n">norm</span><span class="p">,</span> <span class="n">skew</span><span class="p">,</span> <span class="n">kurtosis</span>
<span class="kn">from</span> <span class="nn">scipy.optimize</span> <span class="kn">import</span> <span class="n">minimize_scalar</span>
<span class="n">plt</span><span class="p">.</span><span class="n">rcParams</span><span class="p">[</span><span class="s">'figure.figsize'</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="mi">9</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
<span class="n">sns</span><span class="p">.</span><span class="n">set_theme</span><span class="p">(</span><span class="n">style</span><span class="o">=</span><span class="s">"darkgrid"</span><span class="p">)</span>
<span class="n">rng</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">default_rng</span><span class="p">(</span><span class="n">seed</span><span class="o">=</span><span class="mi">42</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">yearly_volatility</span> <span class="o">=</span> <span class="mf">0.15</span>
<span class="n">initial_value</span> <span class="o">=</span> <span class="mi">100</span>
<span class="n">num_samples</span> <span class="o">=</span> <span class="mi">100_000</span>
<span class="k">def</span> <span class="nf">compute_normal_log_returns_and_stock_prices</span><span class="p">(</span>
<span class="n">yearly_volatility</span><span class="p">,</span> <span class="n">initial_value</span><span class="p">,</span> <span class="n">num_samples</span>
<span class="p">):</span>
<span class="n">lognormal_mean_one_correction</span> <span class="o">=</span> <span class="o">-</span><span class="p">(</span><span class="n">yearly_volatility</span><span class="o">**</span><span class="mi">2</span><span class="p">)</span> <span class="o">/</span> <span class="mi">2</span>
<span class="n">yearly_log_returns</span> <span class="o">=</span> <span class="n">rng</span><span class="p">.</span><span class="n">normal</span><span class="p">(</span>
<span class="n">loc</span><span class="o">=</span><span class="n">lognormal_mean_one_correction</span><span class="p">,</span>
<span class="n">scale</span><span class="o">=</span><span class="n">yearly_volatility</span><span class="p">,</span>
<span class="n">size</span><span class="o">=</span><span class="p">(</span><span class="n">num_samples</span><span class="p">,),</span>
<span class="p">)</span>
<span class="n">stock_price_samples</span> <span class="o">=</span> <span class="n">initial_value</span> <span class="o">*</span> <span class="n">np</span><span class="p">.</span><span class="n">exp</span><span class="p">(</span><span class="n">yearly_log_returns</span><span class="p">)</span>
<span class="k">return</span> <span class="nb">dict</span><span class="p">(</span>
<span class="n">yearly_log_returns</span><span class="o">=</span><span class="n">yearly_log_returns</span><span class="p">,</span> <span class="n">stock_price_samples</span><span class="o">=</span><span class="n">stock_price_samples</span>
<span class="p">)</span>
<span class="k">def</span> <span class="nf">compute_t_dist_log_returns_and_stock_prices</span><span class="p">(</span>
<span class="n">yearly_volatility</span><span class="p">,</span> <span class="n">initial_value</span><span class="p">,</span> <span class="n">num_samples</span><span class="p">,</span> <span class="n">degrees_of_freedom</span>
<span class="p">):</span>
<span class="n">lognormal_mean_one_correction</span> <span class="o">=</span> <span class="o">-</span><span class="p">(</span><span class="n">yearly_volatility</span><span class="o">**</span><span class="mi">2</span><span class="p">)</span> <span class="o">/</span> <span class="mi">2</span>
<span class="c1"># the variance of a t-distributed random variable is df / (df - 2) so to match it
</span> <span class="c1"># with the Normal we need to add a correction factor.
</span> <span class="n">t_dist_std_correction</span> <span class="o">=</span> <span class="mi">1</span> <span class="o">/</span> <span class="n">np</span><span class="p">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">degrees_of_freedom</span> <span class="o">/</span> <span class="p">(</span><span class="n">degrees_of_freedom</span> <span class="o">-</span> <span class="mi">2</span><span class="p">))</span>
<span class="n">yearly_log_returns</span> <span class="o">=</span> <span class="n">lognormal_mean_one_correction</span> <span class="o">+</span> <span class="n">yearly_volatility</span> <span class="o">*</span> <span class="p">(</span>
<span class="n">t_dist_std_correction</span>
<span class="p">)</span> <span class="o">*</span> <span class="n">rng</span><span class="p">.</span><span class="n">standard_t</span><span class="p">(</span>
<span class="n">df</span><span class="o">=</span><span class="n">degrees_of_freedom</span><span class="p">,</span>
<span class="n">size</span><span class="o">=</span><span class="p">(</span><span class="n">num_samples</span><span class="p">,),</span>
<span class="p">)</span>
<span class="n">stock_price_samples</span> <span class="o">=</span> <span class="n">initial_value</span> <span class="o">*</span> <span class="n">np</span><span class="p">.</span><span class="n">exp</span><span class="p">(</span><span class="n">yearly_log_returns</span><span class="p">)</span>
<span class="k">return</span> <span class="nb">dict</span><span class="p">(</span>
<span class="n">yearly_log_returns</span><span class="o">=</span><span class="n">yearly_log_returns</span><span class="p">,</span> <span class="n">stock_price_samples</span><span class="o">=</span><span class="n">stock_price_samples</span>
<span class="p">)</span>
<span class="k">def</span> <span class="nf">compute_skewed_t_dist_log_returns_and_stock_prices</span><span class="p">(</span>
<span class="n">yearly_volatility</span><span class="p">,</span> <span class="n">initial_value</span><span class="p">,</span> <span class="n">num_samples</span><span class="p">,</span> <span class="n">degrees_of_freedom</span>
<span class="p">):</span>
<span class="n">lognormal_mean_one_correction</span> <span class="o">=</span> <span class="o">-</span><span class="p">(</span><span class="n">yearly_volatility</span><span class="o">**</span><span class="mi">2</span><span class="p">)</span> <span class="o">/</span> <span class="mi">2</span>
<span class="c1"># the variance of a t-distributed random variable is df / (df - 2) so to match it
</span> <span class="c1"># with the Normal we need to add a correction factor.
</span> <span class="n">t_dist_std_correction</span> <span class="o">=</span> <span class="mi">1</span> <span class="o">/</span> <span class="n">np</span><span class="p">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">degrees_of_freedom</span> <span class="o">/</span> <span class="p">(</span><span class="n">degrees_of_freedom</span> <span class="o">-</span> <span class="mi">2</span><span class="p">))</span>
<span class="n">yearly_log_returns</span> <span class="o">=</span> <span class="n">lognormal_mean_one_correction</span> <span class="o">+</span> <span class="n">yearly_volatility</span> <span class="o">*</span> <span class="p">(</span>
<span class="n">t_dist_std_correction</span>
<span class="p">)</span> <span class="o">*</span> <span class="n">rng</span><span class="p">.</span><span class="n">standard_t</span><span class="p">(</span>
<span class="n">df</span><span class="o">=</span><span class="n">degrees_of_freedom</span><span class="p">,</span>
<span class="n">size</span><span class="o">=</span><span class="p">(</span><span class="n">num_samples</span><span class="p">,),</span>
<span class="p">)</span>
<span class="n">stock_price_samples</span> <span class="o">=</span> <span class="n">initial_value</span> <span class="o">*</span> <span class="n">np</span><span class="p">.</span><span class="n">exp</span><span class="p">(</span><span class="n">yearly_log_returns</span><span class="p">)</span>
<span class="k">return</span> <span class="nb">dict</span><span class="p">(</span>
<span class="n">yearly_log_returns</span><span class="o">=</span><span class="n">yearly_log_returns</span><span class="p">,</span> <span class="n">stock_price_samples</span><span class="o">=</span><span class="n">stock_price_samples</span>
<span class="p">)</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">distributions</span> <span class="o">=</span> <span class="p">{</span>
<span class="s">"normal"</span><span class="p">:</span> <span class="n">compute_normal_log_returns_and_stock_prices</span><span class="p">(</span>
<span class="n">yearly_volatility</span><span class="p">,</span> <span class="n">initial_value</span><span class="p">,</span> <span class="n">num_samples</span>
<span class="p">),</span>
<span class="s">"t_dist"</span><span class="p">:</span> <span class="n">compute_t_dist_log_returns_and_stock_prices</span><span class="p">(</span>
<span class="n">yearly_volatility</span><span class="p">,</span> <span class="n">initial_value</span><span class="p">,</span> <span class="n">num_samples</span><span class="p">,</span> <span class="n">degrees_of_freedom</span><span class="o">=</span><span class="mi">3</span>
<span class="p">),</span>
<span class="p">}</span>
</code></pre></div></div>
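<p>A quick check (our addition) confirms that the two samples share approximately the same mean and standard deviation, while the t-distributed log returns show large excess kurtosis (for 3 degrees of freedom the theoretical kurtosis is infinite, so the sample value is large and unstable):</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Mean and std should match across distributions; excess kurtosis should be
# near 0 for the normal sample and large for the t-distributed one.
for name, dist in distributions.items():
    ylr = dist["yearly_log_returns"]
    print(f"{name}: mean={ylr.mean():.3f}, std={ylr.std():.3f}, "
          f"kurtosis={kurtosis(ylr):.2f}")
</code></pre></div></div>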
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># | code-fold: true
</span><span class="k">def</span> <span class="nf">plot_log_return_and_stock_price</span><span class="p">(</span><span class="n">distributions</span><span class="p">):</span>
<span class="n">fig</span><span class="p">,</span> <span class="p">(</span><span class="n">log_return_ax</span><span class="p">,</span> <span class="n">stock_price_ax</span><span class="p">)</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span>
<span class="n">nrows</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">ncols</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">8</span><span class="p">,</span> <span class="mi">5</span><span class="p">)</span>
<span class="p">)</span>
<span class="k">for</span> <span class="n">name</span><span class="p">,</span> <span class="n">dist</span> <span class="ow">in</span> <span class="n">distributions</span><span class="p">.</span><span class="n">items</span><span class="p">():</span>
<span class="n">ylr</span> <span class="o">=</span> <span class="n">dist</span><span class="p">[</span><span class="s">"yearly_log_returns"</span><span class="p">]</span>
<span class="n">log_return_ax</span><span class="p">.</span><span class="n">hist</span><span class="p">(</span>
<span class="n">ylr</span><span class="p">,</span>
<span class="n">bins</span><span class="o">=</span><span class="n">np</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="n">ylr</span><span class="p">.</span><span class="nb">min</span><span class="p">(),</span> <span class="n">ylr</span><span class="p">.</span><span class="nb">max</span><span class="p">(),</span> <span class="mf">0.05</span><span class="p">),</span>
<span class="n">density</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
<span class="n">label</span><span class="o">=</span><span class="sa">f</span><span class="s">"mean=</span><span class="si">{</span><span class="n">ylr</span><span class="p">.</span><span class="n">mean</span><span class="p">():.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s">, std=</span><span class="si">{</span><span class="n">ylr</span><span class="p">.</span><span class="n">std</span><span class="p">():.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s">,</span><span class="se">\n</span><span class="s">"</span>
<span class="sa">f</span><span class="s">"skew=</span><span class="si">{</span><span class="n">skew</span><span class="p">(</span><span class="n">ylr</span><span class="p">):.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s">, kurtosis=</span><span class="si">{</span><span class="n">kurtosis</span><span class="p">(</span><span class="n">ylr</span><span class="p">):.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s">"</span><span class="p">,</span>
<span class="n">alpha</span><span class="o">=</span><span class="mf">0.5</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">log_return_ax</span><span class="p">.</span><span class="n">set_xlim</span><span class="p">(</span><span class="o">-</span><span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
<span class="n">log_return_ax</span><span class="p">.</span><span class="n">set_xlabel</span><span class="p">(</span><span class="s">"Log return"</span><span class="p">)</span>
<span class="n">log_return_ax</span><span class="p">.</span><span class="n">set_title</span><span class="p">(</span><span class="s">"Yearly log return distribution"</span><span class="p">)</span>
<span class="n">log_return_ax</span><span class="p">.</span><span class="n">legend</span><span class="p">()</span>
<span class="k">for</span> <span class="n">name</span><span class="p">,</span> <span class="n">dist</span> <span class="ow">in</span> <span class="n">distributions</span><span class="p">.</span><span class="n">items</span><span class="p">():</span>
<span class="n">sp</span> <span class="o">=</span> <span class="n">dist</span><span class="p">[</span><span class="s">"stock_price_samples"</span><span class="p">]</span>
<span class="n">stock_price_ax</span><span class="p">.</span><span class="n">hist</span><span class="p">(</span>
<span class="n">sp</span><span class="p">,</span>
<span class="n">bins</span><span class="o">=</span><span class="n">np</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">200</span><span class="p">,</span> <span class="mf">2.5</span><span class="p">),</span>
<span class="nb">range</span><span class="o">=</span><span class="p">(</span><span class="o">-</span><span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">),</span>
<span class="n">density</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
<span class="n">label</span><span class="o">=</span><span class="sa">f</span><span class="s">"</span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s">, mean=</span><span class="si">{</span><span class="n">sp</span><span class="p">.</span><span class="n">mean</span><span class="p">():.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s">"</span><span class="p">,</span>
<span class="n">alpha</span><span class="o">=</span><span class="mf">0.5</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">stock_price_ax</span><span class="p">.</span><span class="n">set_xlim</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">200</span><span class="p">)</span>
<span class="n">stock_price_ax</span><span class="p">.</span><span class="n">set_xlabel</span><span class="p">(</span><span class="s">"Stock price"</span><span class="p">)</span>
<span class="n">stock_price_ax</span><span class="p">.</span><span class="n">set_title</span><span class="p">(</span><span class="s">"Yearly stock price distribution"</span><span class="p">)</span>
<span class="n">stock_price_ax</span><span class="p">.</span><span class="n">legend</span><span class="p">()</span>
<span class="n">fig</span><span class="p">.</span><span class="n">tight_layout</span><span class="p">()</span>
<span class="k">return</span> <span class="n">fig</span>
<span class="n">plot_log_return_and_stock_price</span><span class="p">(</span><span class="n">distributions</span><span class="p">)</span>
<span class="bp">None</span>
</code></pre></div></div>
<p><img src="/assets/images/the-volatility-smile/bf49e3cdb4cda1a633d167383de5fbef7c9c97f7.png" alt="" /></p>
<h1 id="monte-carlo-option-pricing">Monte Carlo Option Pricing</h1>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">compute_call_prices_monte_carlo</span><span class="p">(</span><span class="n">stock_price_samples</span><span class="p">,</span> <span class="n">strikes</span><span class="p">):</span>
<span class="k">return</span> <span class="n">np</span><span class="p">.</span><span class="n">maximum</span><span class="p">(</span>
<span class="mi">0</span><span class="p">,</span>
<span class="n">stock_price_samples</span><span class="p">[:,</span> <span class="n">newaxis</span><span class="p">]</span> <span class="o">-</span> <span class="n">strikes</span><span class="p">[</span><span class="n">newaxis</span><span class="p">,</span> <span class="p">:],</span>
<span class="p">).</span><span class="n">mean</span><span class="p">(</span><span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="n">strikes</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">80</span><span class="p">,</span> <span class="mi">120</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="k">for</span> <span class="n">dist</span> <span class="ow">in</span> <span class="n">distributions</span><span class="p">.</span><span class="n">values</span><span class="p">():</span>
<span class="n">dist</span><span class="p">[</span><span class="s">"call_prices"</span><span class="p">]</span> <span class="o">=</span> <span class="n">compute_call_prices_monte_carlo</span><span class="p">(</span>
<span class="n">dist</span><span class="p">[</span><span class="s">"stock_price_samples"</span><span class="p">],</span> <span class="n">strikes</span>
<span class="p">)</span>
</code></pre></div></div>
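<p>Since these prices are Monte Carlo estimates, they carry sampling noise. A rough gauge of it (our addition) is the standard error of the at-the-money estimate:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Standard error of the Monte Carlo price of the K=100 call under normal log
# returns: payoff standard deviation divided by the square root of the sample size.
payoffs = np.maximum(0, distributions["normal"]["stock_price_samples"] - 100)
print(f"price = {payoffs.mean():.3f} +/- {payoffs.std() / np.sqrt(num_samples):.3f}")
</code></pre></div></div>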
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># | code-fold: true
</span><span class="k">def</span> <span class="nf">plot_monte_carlo_prices</span><span class="p">(</span><span class="n">distributions</span><span class="p">):</span>
<span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">9</span><span class="p">,</span> <span class="mi">3</span><span class="p">))</span>
<span class="n">ax</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span>
<span class="n">strikes</span><span class="p">,</span>
<span class="n">np</span><span class="p">.</span><span class="n">maximum</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">initial_value</span> <span class="o">-</span> <span class="n">strikes</span><span class="p">),</span>
<span class="n">label</span><span class="o">=</span><span class="s">"Intrinsic value"</span><span class="p">,</span>
<span class="n">marker</span><span class="o">=</span><span class="s">"."</span><span class="p">,</span>
<span class="n">color</span><span class="o">=</span><span class="s">"black"</span><span class="p">,</span>
<span class="p">)</span>
<span class="k">for</span> <span class="n">name</span><span class="p">,</span> <span class="n">dist</span> <span class="ow">in</span> <span class="n">distributions</span><span class="p">.</span><span class="n">items</span><span class="p">():</span>
<span class="n">ax</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span>
<span class="n">strikes</span><span class="p">,</span> <span class="n">dist</span><span class="p">[</span><span class="s">"call_prices"</span><span class="p">],</span> <span class="n">label</span><span class="o">=</span><span class="sa">f</span><span class="s">"</span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s"> Monte Carlo prices"</span><span class="p">,</span> <span class="n">marker</span><span class="o">=</span><span class="s">"."</span>
<span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">set_xlabel</span><span class="p">(</span><span class="s">"Strike price"</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">set_ylabel</span><span class="p">(</span><span class="s">"Call option price"</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">legend</span><span class="p">()</span>
<span class="n">ax</span><span class="p">.</span><span class="n">set_title</span><span class="p">(</span><span class="s">"Option prices under different underlying dynamics"</span><span class="p">)</span>
<span class="k">return</span> <span class="n">fig</span>
<span class="n">plot_monte_carlo_prices</span><span class="p">(</span><span class="n">distributions</span><span class="p">)</span>
<span class="bp">None</span>
</code></pre></div></div>
<p><img src="/assets/images/the-volatility-smile/1361a759ca84aae73e7093ef5f353b23696dfe9d.png" alt="" /></p>
<h1 id="black-scholes-implied-volatilities">Black-Scholes implied volatilities</h1>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">compute_call_price_black_scholes</span><span class="p">(</span><span class="n">S</span><span class="p">,</span> <span class="n">t</span><span class="p">,</span> <span class="n">sigma</span><span class="p">,</span> <span class="n">r</span><span class="p">,</span> <span class="n">K</span><span class="p">,</span> <span class="n">T</span><span class="p">):</span>
<span class="n">tau</span> <span class="o">=</span> <span class="n">T</span> <span class="o">-</span> <span class="n">t</span>
<span class="n">N</span> <span class="o">=</span> <span class="n">norm</span><span class="p">.</span><span class="n">cdf</span>
<span class="n">d1</span> <span class="o">=</span> <span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">log</span><span class="p">(</span><span class="n">S</span> <span class="o">/</span> <span class="n">K</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="n">r</span> <span class="o">+</span> <span class="mf">0.5</span> <span class="o">*</span> <span class="n">sigma</span><span class="o">**</span><span class="mi">2</span><span class="p">)</span> <span class="o">*</span> <span class="n">tau</span><span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="n">sigma</span> <span class="o">*</span> <span class="n">np</span><span class="p">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">tau</span><span class="p">))</span>
<span class="n">d2</span> <span class="o">=</span> <span class="n">d1</span> <span class="o">-</span> <span class="n">sigma</span> <span class="o">*</span> <span class="n">np</span><span class="p">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">tau</span><span class="p">)</span>
<span class="n">V</span> <span class="o">=</span> <span class="n">S</span> <span class="o">*</span> <span class="n">N</span><span class="p">(</span><span class="n">d1</span><span class="p">)</span> <span class="o">-</span> <span class="n">K</span> <span class="o">*</span> <span class="n">N</span><span class="p">(</span><span class="n">d2</span><span class="p">)</span>
<span class="k">return</span> <span class="n">V</span>
<span class="k">def</span> <span class="nf">value_to_iv</span><span class="p">(</span><span class="n">V</span><span class="p">,</span> <span class="n">S</span><span class="p">,</span> <span class="n">t</span><span class="p">,</span> <span class="n">r</span><span class="p">,</span> <span class="n">K</span><span class="p">,</span> <span class="n">T</span><span class="p">):</span>
<span class="n">optimization_result</span> <span class="o">=</span> <span class="n">minimize_scalar</span><span class="p">(</span>
<span class="n">fun</span><span class="o">=</span><span class="k">lambda</span> <span class="n">sigma</span><span class="p">:</span> <span class="p">(</span>
<span class="p">(</span><span class="n">V</span> <span class="o">-</span> <span class="n">compute_call_price_black_scholes</span><span class="p">(</span><span class="n">S</span><span class="p">,</span> <span class="n">t</span><span class="p">,</span> <span class="n">sigma</span> <span class="o">+</span> <span class="mf">1e-9</span><span class="p">,</span> <span class="n">r</span><span class="p">,</span> <span class="n">K</span><span class="p">,</span> <span class="n">T</span><span class="p">))</span> <span class="o">**</span> <span class="mi">2</span>
<span class="p">),</span>
<span class="n">bounds</span><span class="o">=</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span>
<span class="p">)</span>
<span class="k">assert</span> <span class="n">optimization_result</span><span class="p">.</span><span class="n">success</span>
<span class="k">return</span> <span class="n">optimization_result</span><span class="p">.</span><span class="n">x</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">dist</span> <span class="ow">in</span> <span class="n">distributions</span><span class="p">.</span><span class="n">values</span><span class="p">():</span>
<span class="n">dist</span><span class="p">[</span><span class="s">"implied_volatilities"</span><span class="p">]</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">(</span>
<span class="p">[</span>
<span class="n">value_to_iv</span><span class="p">(</span><span class="n">V</span><span class="o">=</span><span class="n">V</span><span class="p">,</span> <span class="n">S</span><span class="o">=</span><span class="n">initial_value</span><span class="p">,</span> <span class="n">t</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">r</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">K</span><span class="o">=</span><span class="n">K</span><span class="p">,</span> <span class="n">T</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="k">for</span> <span class="n">K</span><span class="p">,</span> <span class="n">V</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">strikes</span><span class="p">,</span> <span class="n">dist</span><span class="p">[</span><span class="s">"call_prices"</span><span class="p">])</span>
<span class="p">]</span>
<span class="p">)</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># | code-fold: true
</span><span class="k">def</span> <span class="nf">plot_implied_volatilities</span><span class="p">(</span><span class="n">distributions</span><span class="p">):</span>
<span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">9</span><span class="p">,</span> <span class="mi">3</span><span class="p">))</span>
<span class="k">for</span> <span class="n">name</span><span class="p">,</span> <span class="n">dist</span> <span class="ow">in</span> <span class="n">distributions</span><span class="p">.</span><span class="n">items</span><span class="p">():</span>
<span class="n">ax</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">strikes</span><span class="p">,</span> <span class="n">dist</span><span class="p">[</span><span class="s">"implied_volatilities"</span><span class="p">],</span> <span class="n">marker</span><span class="o">=</span><span class="s">"."</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="sa">f</span><span class="s">"</span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">set_ylim</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">set_xlabel</span><span class="p">(</span><span class="s">"Strike price"</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">set_ylabel</span><span class="p">(</span><span class="s">"Yearly implied volatility"</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">legend</span><span class="p">()</span>
<span class="n">ax</span><span class="p">.</span><span class="n">set_title</span><span class="p">(</span><span class="s">"Option BS implied volatilities under different underlying dynamics"</span><span class="p">)</span>
<span class="k">return</span> <span class="n">fig</span>
<span class="n">plot_implied_volatilities</span><span class="p">(</span><span class="n">distributions</span><span class="p">)</span>
<span class="bp">None</span>
</code></pre></div></div>
<p><img src="/assets/images/the-volatility-smile/c93bfb99c27a4405d70487827f4a16e859aae92f.png" alt="" /></p>IntroductionOption Implied Stock Price Distributions2022-10-30T00:00:00+00:002022-10-30T00:00:00+00:00https://mrandri19.github.io/2022/10/30/option-implied-stock-price-distribution<link rel="stylesheet" href="/assets/katex/katex.min.css" />
<script defer="" src="/assets/katex/katex.min.js"></script>
<script defer="" src="/assets/katex/contrib/auto-render.min.js" onload="renderMathInElement(document.body);">
</script>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># | code-fold: true
</span><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="n">pd</span>
<span class="kn">import</span> <span class="nn">QuantLib</span> <span class="k">as</span> <span class="n">ql</span>
<span class="kn">import</span> <span class="nn">seaborn</span> <span class="k">as</span> <span class="n">sns</span>
<span class="kn">import</span> <span class="nn">yfinance</span> <span class="k">as</span> <span class="n">yf</span>
<span class="kn">from</span> <span class="nn">scipy.optimize</span> <span class="kn">import</span> <span class="n">minimize_scalar</span>
<span class="kn">from</span> <span class="nn">scipy.stats</span> <span class="kn">import</span> <span class="n">norm</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># | code-fold: true
</span><span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="o">%</span><span class="n">config</span> <span class="n">InlineBackend</span><span class="p">.</span><span class="n">figure_format</span><span class="o">=</span><span class="s">'retina'</span>
<span class="n">sns</span><span class="p">.</span><span class="n">set_theme</span><span class="p">(</span><span class="n">style</span><span class="o">=</span><span class="s">"whitegrid"</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">rcParams</span><span class="p">[</span><span class="s">"figure.figsize"</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="mi">9</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">rcParams</span><span class="p">[</span><span class="s">"axes.grid"</span><span class="p">]</span> <span class="o">=</span> <span class="bp">True</span>
<span class="n">plt</span><span class="p">.</span><span class="n">rcParams</span><span class="p">[</span><span class="s">"lines.marker"</span><span class="p">]</span> <span class="o">=</span> <span class="s">"."</span>
</code></pre></div></div>
<h1 id="introduction">Introduction</h1>
<p>Options are financial contracts that give the buyer the right to
purchase (or sell) the underlying asset at a fixed price in a certain
time period. The <em>strike</em> \(E\) of an option determines the price at which
the underlying can be bought (for a call option) or sold (for a put
option). The <em>expiration</em> \(T\) determines the date at which the contract
stops being valid.</p>
<p>Option prices contain the market’s probabilistic forecast on the future
price of the underlying. This is because, in theory, the value of a call
option \(V(S, t)\) at time \(t\) on an underlying \(S\) is given by:</p>
\[V(S, t) = e^{-r(T - t)} \mathbb{E}^{\mathbb{Q}} \left[ \text{max}(S(T) - E, 0) \right]\]
<p>where \(\mathbb{E}^{\mathbb{Q}}[f(S)]\) is the expectation of a function
of the underlying at expiration \(f(S)\), taken under the <em>risk-neutral</em>
probability measure \(\mathbb{Q}\).</p>
<p>We can recover the distribution (or probability measure) of future
prices \(\mathbb{Q}\) by “inverting” the expectation that determines the
option prices. How? In 1978 Breeden and Litzenberger gave us a formula
that links option prices and the Probability Density Function (PDF)
of the risk-neutral distribution:</p>
\[\frac{\partial^2 V}{\partial E^2} = e^{-r(T - t)} p^{\mathbb{Q}}(E)\]
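<p>A quick sketch of why this holds: write the discounted expectation as an
integral against the risk-neutral density and differentiate twice with respect
to the strike.</p>
\[\begin{align*}
V & = e^{-r(T - t)} \int_E^\infty (S - E) \, p^{\mathbb{Q}}(S) \, dS \\
\frac{\partial V}{\partial E} & = -e^{-r(T - t)} \int_E^\infty p^{\mathbb{Q}}(S) \, dS \\
\frac{\partial^2 V}{\partial E^2} & = e^{-r(T - t)} \, p^{\mathbb{Q}}(E)
\end{align*}\]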
<h1 id="method">Method</h1>
<h2 id="can-we-just-use-prices">Can we just use prices?</h2>
<p>The simplest implementation applies the Breeden-Litzenberger formula
directly to option prices by approximating the second-order derivative
with finite differences:</p>
\[\begin{align*}
\frac{\partial^2 V}{\partial E^2} & = e^{-r(T - t)} p^{\mathbb{Q}}(E) \\
\implies p^{\mathbb{Q}}(E) & = e^{r(T - t)} \frac{\partial^2 V}{\partial
E^2} \\
& \approx e^{r(T - t)} \frac{V(E + \delta E) - 2 V(E) + V(E - \delta
E)}{(\delta E)^2}
\end{align*}\]
<p>Unfortunately, the option’s strikes aren’t close enough to one another
to get a good approximation of the second derivative. For underlyings in
the \(\$100-\$200\) range, the strikes are usually \(\$5\) apart from each other.</p>
<p>To improve the finite difference approximation we can interpolate the
prices. Unfortunately this can lead to arbitrages, because a linear or,
worse, a concave price interpolation implies a zero or negative (!)
value in the implied future price PDF.</p>
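<p>For concreteness, here is a minimal sketch of the direct estimator. The
function and variable names are hypothetical, and it assumes a uniformly spaced
strike grid, which is exactly what real option chains fail to provide:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np

def implied_pdf_finite_differences(strikes, call_prices, r, tau):
    """Breeden-Litzenberger via a central second difference.

    Assumes `strikes` is uniformly spaced. Returns the density on the
    interior strikes: the first and last strike are lost to the stencil.
    """
    dE = strikes[1] - strikes[0]
    second_derivative = (
        call_prices[2:] - 2 * call_prices[1:-1] + call_prices[:-2]
    ) / dE**2
    return strikes[1:-1], np.exp(r * tau) * second_derivative
</code></pre></div></div>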
<h2 id="lets-work-in-iv-space">Let’s work in IV-space</h2>
<p>Instead of directly interpolating the option’s prices, we take a step
backwards and interpolate one of the parameters that determine the price
of an option.</p>
<p>The price of an option is determined by several parameters, and the full
formula is:</p>
\[\text{option price} = V(S, t; \sigma, r, D, E, T)\]
<p>where:</p>
<ul>
<li>\(S\) is the price of the underlying asset</li>
<li>\(t\) is the current time (in years), so \(T - t\) is the time left to expiration</li>
<li>\(\sigma\) is the volatility (annualized standard deviation of the
returns) of the underlying</li>
<li>\(r\) is the interest rate (annualized, compounded continuously)</li>
<li>\(D\) is the dividend yield of the asset, e.g. a stock</li>
<li>\(E\) is the strike price</li>
<li>\(T\) is the expiration date</li>
</ul>
<p>We choose to interpolate the volatility \(\sigma\). But wait, under the
Black-Scholes (BS) model, isn’t the volatility constant? What does it
mean to interpolate it? It turns out that in practice, option prices do
not follow the Black-Scholes model exactly. Running an option trading
desk that trusts the BS model unconditionally is a fun experiment left
for the reader (this is not investment advice).</p>
<p>By interpolating volatility we mean several steps:</p>
<ul>
<li>Using the option prices to compute the Implied Volatilities (IVs),
i.e. the volatilities that make the BS model match the market prices</li>
<li>Interpolating the IVs into a continuous curve, using some parametric
or nonparametric model</li>
<li>Using the interpolated IVs and the BS model to compute continuous
prices for the options</li>
</ul>
<p>After all of these steps we have continuous prices that we can use to
estimate the price distribution via finite differences.</p>
<h1 id="implementation">Implementation</h1>
<h2 id="price">Price</h2>
<p>We use the Yahoo Finance API to download the prices of call options on
Apple, expiring on February 17, 2023.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">ticker</span> <span class="o">=</span> <span class="n">yf</span><span class="p">.</span><span class="n">Ticker</span><span class="p">(</span><span class="s">"AAPL"</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># | code-fold: true
</span><span class="k">def</span> <span class="nf">decorate_price_axes</span><span class="p">(</span><span class="n">ax</span><span class="p">:</span> <span class="n">plt</span><span class="p">.</span><span class="n">Axes</span><span class="p">):</span>
<span class="n">ax</span><span class="p">.</span><span class="n">axvline</span><span class="p">(</span><span class="n">ticker</span><span class="p">.</span><span class="n">info</span><span class="p">[</span><span class="s">"currentPrice"</span><span class="p">],</span> <span class="n">ls</span><span class="o">=</span><span class="s">"--"</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="s">"k"</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"Current price"</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">xaxis</span><span class="p">.</span><span class="n">set_major_formatter</span><span class="p">(</span><span class="s">"${x:.0f}"</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">yaxis</span><span class="p">.</span><span class="n">set_major_formatter</span><span class="p">(</span><span class="s">"${x:.0f}"</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">legend</span><span class="p">()</span>
<span class="k">def</span> <span class="nf">decorate_iv_axes</span><span class="p">(</span><span class="n">ax</span><span class="p">:</span> <span class="n">plt</span><span class="p">.</span><span class="n">Axes</span><span class="p">):</span>
<span class="n">ax</span><span class="p">.</span><span class="n">axvline</span><span class="p">(</span><span class="n">ticker</span><span class="p">.</span><span class="n">info</span><span class="p">[</span><span class="s">"currentPrice"</span><span class="p">],</span> <span class="n">ls</span><span class="o">=</span><span class="s">"--"</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="s">"k"</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"Current price"</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">xaxis</span><span class="p">.</span><span class="n">set_major_formatter</span><span class="p">(</span><span class="s">"${x:.0f}"</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">yaxis</span><span class="p">.</span><span class="n">set_major_formatter</span><span class="p">(</span><span class="s">"{x:.0%}"</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">set_xlabel</span><span class="p">(</span><span class="s">"Strike"</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">set_ylabel</span><span class="p">(</span><span class="s">"Implied volatility"</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">legend</span><span class="p">()</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">prices</span> <span class="o">=</span> <span class="p">(</span>
<span class="n">ticker</span><span class="p">.</span><span class="n">option_chain</span><span class="p">(</span><span class="n">date</span><span class="o">=</span><span class="s">"2023-02-17"</span><span class="p">)</span>
<span class="p">.</span><span class="n">calls</span><span class="p">.</span><span class="n">rename</span><span class="p">(</span><span class="n">columns</span><span class="o">=</span><span class="p">{</span><span class="s">"lastPrice"</span><span class="p">:</span> <span class="s">"last_price"</span><span class="p">})[</span>
<span class="p">[</span><span class="s">"strike"</span><span class="p">,</span> <span class="s">"bid"</span><span class="p">,</span> <span class="s">"ask"</span><span class="p">,</span> <span class="s">"last_price"</span><span class="p">]</span>
<span class="p">]</span>
<span class="p">.</span><span class="n">assign</span><span class="p">(</span><span class="n">mid_price</span><span class="o">=</span><span class="k">lambda</span> <span class="n">df</span><span class="p">:</span> <span class="p">(</span><span class="n">df</span><span class="p">[</span><span class="s">"bid"</span><span class="p">]</span> <span class="o">+</span> <span class="n">df</span><span class="p">[</span><span class="s">"ask"</span><span class="p">])</span> <span class="o">/</span> <span class="mi">2</span><span class="p">)</span>
<span class="p">.</span><span class="n">set_index</span><span class="p">(</span><span class="s">"strike"</span><span class="p">)[[</span><span class="s">"mid_price"</span><span class="p">,</span> <span class="s">"last_price"</span><span class="p">]]</span>
<span class="p">)</span>
</code></pre></div></div>
<p>We estimate the prices using the midprice, the arithmetic mean of bid
and ask. Using the last traded price for each option contract gives a
much noisier estimate.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># | code-fold: true
</span><span class="n">ax</span> <span class="o">=</span> <span class="n">prices</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span>
<span class="n">style</span><span class="o">=</span><span class="s">"+"</span><span class="p">,</span>
<span class="n">xlabel</span><span class="o">=</span><span class="s">"Strike"</span><span class="p">,</span>
<span class="n">ylabel</span><span class="o">=</span><span class="s">"Option price"</span><span class="p">,</span>
<span class="n">title</span><span class="o">=</span><span class="s">"Mid price vs last traded price"</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">decorate_price_axes</span><span class="p">(</span><span class="n">ax</span><span class="p">)</span>
<span class="bp">None</span>
</code></pre></div></div>
<p><img src="/assets/images/option-implied-stock-price-distribution/60147957364aa5f59827d14c4d5331e6466999aa.png" alt="" /></p>
<h2 id="implied-volatility-iv">Implied Volatility (IV)</h2>
<p>We implement in <code class="language-plaintext highlighter-rouge">call_option_value_black_scholes</code> the Black-Scholes
formula for the price of a European call option:</p>
\[V(S, t) = S e^{-D (T - t)} N(d_1) - E e^{-r (T - t)} N(d_2)\]
\[d_1 = \frac{
\log(S / E) + \left(r - D + \frac{\sigma^2}{2}\right) (T - t)
}{
\sigma \sqrt{T - t}
}\]
\[d_2 = d_1 - \sigma \sqrt{T - t}\]
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">call_option_value_black_scholes</span><span class="p">(</span><span class="n">S</span><span class="p">,</span> <span class="n">t</span><span class="p">,</span> <span class="n">sigma</span><span class="p">,</span> <span class="n">r</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">T</span><span class="p">,):</span>
<span class="n">tau</span> <span class="o">=</span> <span class="n">T</span> <span class="o">-</span> <span class="n">t</span>
<span class="n">N</span> <span class="o">=</span> <span class="n">norm</span><span class="p">.</span><span class="n">cdf</span>
<span class="n">d1</span> <span class="o">=</span> <span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">log</span><span class="p">(</span><span class="n">S</span> <span class="o">/</span> <span class="n">E</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="n">r</span> <span class="o">+</span> <span class="mf">0.5</span> <span class="o">*</span> <span class="n">sigma</span><span class="o">**</span><span class="mi">2</span><span class="p">)</span> <span class="o">*</span> <span class="n">tau</span><span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="n">sigma</span> <span class="o">*</span> <span class="n">np</span><span class="p">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">tau</span><span class="p">))</span>
<span class="n">d2</span> <span class="o">=</span> <span class="n">d1</span> <span class="o">-</span> <span class="n">sigma</span> <span class="o">*</span> <span class="n">np</span><span class="p">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">tau</span><span class="p">)</span>
<span class="n">V</span> <span class="o">=</span> <span class="n">S</span> <span class="o">*</span> <span class="n">np</span><span class="p">.</span><span class="n">exp</span><span class="p">(</span><span class="o">-</span><span class="n">D</span> <span class="o">*</span> <span class="n">tau</span><span class="p">)</span> <span class="o">*</span> <span class="n">N</span><span class="p">(</span><span class="n">d1</span><span class="p">)</span> <span class="o">-</span> <span class="n">E</span> <span class="o">*</span> <span class="n">np</span><span class="p">.</span><span class="n">exp</span><span class="p">(</span><span class="o">-</span><span class="n">r</span> <span class="o">*</span> <span class="n">tau</span><span class="p">)</span> <span class="o">*</span> <span class="n">N</span><span class="p">(</span><span class="n">d2</span><span class="p">)</span>
<span class="k">return</span> <span class="n">V</span>
</code></pre></div></div>
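<p>As a quick sanity check, with hypothetical numbers rather than the AAPL data:
an at-the-money one-year call with \(\sigma = 0.2\), \(r = 0.03\), and no
dividends on a \(\$100\) underlying should be worth about \(\$9.41\).</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>call_option_value_black_scholes(S=100.0, t=0.0, sigma=0.2, r=0.03, D=0.0, E=100.0, T=1.0)
# => about 9.41
</code></pre></div></div>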
<p>Then, we implement in <code class="language-plaintext highlighter-rouge">value_to_iv</code> the “inverse” of this formula. This
function goes from the price of a European call option to the implied
volatility that makes the BS formula match the market price.</p>
<p>If you like formulas, this is the optimization problem that is being
solved:</p>
\[\sigma =
\underset{\tilde \sigma}{\mathrm{argmin}}
(V^{\text{market}} - V^{BS}(\tilde \sigma))^2\]
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">value_to_iv</span><span class="p">(</span><span class="n">V</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">S</span><span class="p">,</span> <span class="n">t</span><span class="p">,</span> <span class="n">r</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">T</span><span class="p">):</span>
<span class="n">optimization_result</span> <span class="o">=</span> <span class="n">minimize_scalar</span><span class="p">(</span>
<span class="n">fun</span><span class="o">=</span><span class="k">lambda</span> <span class="n">sigma</span><span class="p">:</span> <span class="p">(</span>
<span class="n">V</span> <span class="o">-</span> <span class="n">call_option_value_black_scholes</span><span class="p">(</span><span class="n">S</span><span class="p">,</span> <span class="n">t</span><span class="p">,</span> <span class="n">sigma</span> <span class="o">+</span> <span class="mf">1e-3</span><span class="p">,</span> <span class="n">r</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">T</span><span class="p">)</span>
<span class="p">)</span>
<span class="o">**</span> <span class="mi">2</span><span class="p">,</span>
<span class="n">bounds</span><span class="o">=</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nb">float</span><span class="p">(</span><span class="s">"+inf"</span><span class="p">)),</span>
<span class="p">)</span>
<span class="k">assert</span> <span class="n">optimization_result</span><span class="p">.</span><span class="n">success</span>
<span class="k">return</span> <span class="n">optimization_result</span><span class="p">.</span><span class="n">x</span>
</code></pre></div></div>
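<p>A minimal round-trip check, again with hypothetical numbers: price an option
at a known volatility, then invert the price back. The recovered value sits
slightly below 0.25 because of the <code class="language-plaintext highlighter-rouge">1e-3</code> nudge inside the objective.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>V = call_option_value_black_scholes(S=100.0, t=0.0, sigma=0.25, r=0.03, D=0.0, E=100.0, T=1.0)
value_to_iv(V, E=100.0, S=100.0, t=0.0, r=0.03, D=0.0, T=1.0)
# => about 0.249
</code></pre></div></div>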
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">S</span> <span class="o">=</span> <span class="n">ticker</span><span class="p">.</span><span class="n">info</span><span class="p">[</span><span class="s">"currentPrice"</span><span class="p">]</span>
<span class="n">r</span> <span class="o">=</span> <span class="mf">0.0325</span>
<span class="n">D</span> <span class="o">=</span> <span class="mf">0.0059</span>
<span class="n">T</span> <span class="o">=</span> <span class="mf">1.0</span>
<span class="n">today</span> <span class="o">=</span> <span class="n">ql</span><span class="p">.</span><span class="n">Date</span><span class="p">(</span><span class="mi">31</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">2022</span><span class="p">)</span>
<span class="n">expiration</span> <span class="o">=</span> <span class="n">ql</span><span class="p">.</span><span class="n">Date</span><span class="p">(</span><span class="mi">17</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">2023</span><span class="p">)</span>
<span class="n">trading_days</span> <span class="o">=</span> <span class="mi">252</span>
<span class="n">t</span> <span class="o">=</span> <span class="p">(</span><span class="n">trading_days</span> <span class="o">-</span> <span class="n">ql</span><span class="p">.</span><span class="n">TARGET</span><span class="p">().</span><span class="n">businessDaysBetween</span><span class="p">(</span><span class="n">today</span><span class="p">,</span> <span class="n">expiration</span><span class="p">))</span> <span class="o">/</span> <span class="p">(</span><span class="n">trading_days</span><span class="p">)</span>
<span class="n">ivs</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">index</span><span class="o">=</span><span class="n">prices</span><span class="p">.</span><span class="n">index</span><span class="p">)</span>
<span class="n">ivs</span><span class="p">[</span><span class="s">"iv"</span><span class="p">]</span> <span class="o">=</span> <span class="p">[</span>
<span class="n">value_to_iv</span><span class="p">(</span><span class="n">V</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">S</span><span class="p">,</span> <span class="n">t</span><span class="p">,</span> <span class="n">r</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">T</span><span class="p">)</span> <span class="k">for</span> <span class="n">E</span><span class="p">,</span> <span class="n">V</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">prices</span><span class="p">.</span><span class="n">index</span><span class="p">,</span> <span class="n">prices</span><span class="p">[</span><span class="s">"mid_price"</span><span class="p">])</span>
<span class="p">]</span>
</code></pre></div></div>
<p>We plot the <em>volatility smile</em>, the implied volatility for every
strike at this expiration. First, we notice that the volatility is not
constant, instead it has a “smile-like” shape, implying that deep
out-of-the-money tails are more expensive than the BS model predicts.
Second, there seem to be some outliers in the sub-$100 strikes.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># | code-fold: true
</span><span class="n">ax</span> <span class="o">=</span> <span class="n">ivs</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">title</span><span class="o">=</span><span class="s">"Implied Volatility (IV) for every strike"</span><span class="p">,</span> <span class="n">style</span><span class="o">=</span><span class="s">"+"</span><span class="p">)</span>
<span class="n">decorate_iv_axes</span><span class="p">(</span><span class="n">ax</span><span class="p">)</span>
</code></pre></div></div>
<p><img src="/assets/images/option-implied-stock-price-distribution/9e76ddf1aebe13fd0e6d010638c36a4adfb6dee7.png" alt="" /></p>
<h2 id="smoothed-iv">Smoothed IV</h2>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">smoothed_ivs</span> <span class="o">=</span> <span class="p">(</span>
<span class="n">ivs</span>
<span class="p">.</span><span class="n">rolling</span><span class="p">(</span><span class="n">window</span><span class="o">=</span><span class="mi">7</span><span class="p">,</span> <span class="n">center</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">win_type</span><span class="o">=</span><span class="s">"gaussian"</span><span class="p">)</span>
<span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">std</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
<span class="p">.</span><span class="n">dropna</span><span class="p">()</span>
<span class="p">.</span><span class="n">rename</span><span class="p">(</span><span class="n">columns</span><span class="o">=</span><span class="p">{</span><span class="s">"iv"</span><span class="p">:</span> <span class="s">"smoothed_iv"</span><span class="p">})</span>
<span class="p">)</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># | code-fold: true
</span><span class="n">ax</span> <span class="o">=</span> <span class="n">ivs</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">title</span><span class="o">=</span><span class="s">"IV vs smoothed IV for every strike"</span><span class="p">,</span> <span class="n">style</span><span class="o">=</span><span class="s">"+"</span><span class="p">)</span>
<span class="n">smoothed_ivs</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">ax</span><span class="o">=</span><span class="n">ax</span><span class="p">,</span> <span class="n">style</span><span class="o">=</span><span class="s">"."</span><span class="p">)</span>
<span class="n">decorate_iv_axes</span><span class="p">(</span><span class="n">ax</span><span class="p">)</span>
</code></pre></div></div>
<p><img src="/assets/images/option-implied-stock-price-distribution/4f10bdc5afe891cee7c683e3a89b556612d565df.png" alt="" /></p>
<h2 id="interpolated-iv">Interpolated IV</h2>
<p>Then, we interpolate between the smoothed IVs using a cubic spline.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">upsampling_rate</span> <span class="o">=</span> <span class="mi">50</span>
<span class="n">interpolated_ivs</span> <span class="o">=</span> <span class="n">smoothed_ivs</span><span class="p">.</span><span class="n">reindex</span><span class="p">(</span>
<span class="n">np</span><span class="p">.</span><span class="n">linspace</span><span class="p">(</span>
<span class="n">smoothed_ivs</span><span class="p">.</span><span class="n">index</span><span class="p">.</span><span class="nb">min</span><span class="p">(),</span>
<span class="n">smoothed_ivs</span><span class="p">.</span><span class="n">index</span><span class="p">.</span><span class="nb">max</span><span class="p">(),</span>
<span class="p">(</span><span class="n">smoothed_ivs</span><span class="p">.</span><span class="n">index</span><span class="p">.</span><span class="n">size</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="o">*</span> <span class="n">upsampling_rate</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span>
<span class="p">)</span>
<span class="p">).</span><span class="n">assign</span><span class="p">(</span>
<span class="n">interpolated_iv</span><span class="o">=</span><span class="k">lambda</span> <span class="n">df</span><span class="p">:</span> <span class="n">df</span><span class="p">[</span><span class="s">"smoothed_iv"</span><span class="p">].</span><span class="n">interpolate</span><span class="p">(</span><span class="n">method</span><span class="o">=</span><span class="s">"cubicspline"</span><span class="p">)</span>
<span class="p">)[</span>
<span class="p">[</span><span class="s">"interpolated_iv"</span><span class="p">]</span>
<span class="p">]</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># | code-fold: true
</span><span class="n">ax</span> <span class="o">=</span> <span class="n">ivs</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">style</span><span class="o">=</span><span class="s">"+"</span><span class="p">,</span> <span class="n">title</span><span class="o">=</span><span class="s">"IV vs smoothed IV vs interpolated IV for every strike"</span><span class="p">)</span>
<span class="n">smoothed_ivs</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">ax</span><span class="o">=</span><span class="n">ax</span><span class="p">,</span> <span class="n">style</span><span class="o">=</span><span class="s">"."</span><span class="p">)</span>
<span class="n">interpolated_ivs</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">ax</span><span class="o">=</span><span class="n">ax</span><span class="p">,</span> <span class="n">style</span><span class="o">=</span><span class="s">"-"</span><span class="p">)</span>
<span class="n">decorate_iv_axes</span><span class="p">(</span><span class="n">ax</span><span class="p">)</span>
</code></pre></div></div>
<p><img src="/assets/images/option-implied-stock-price-distribution/e9efa0e2efca50328153976e467a1c3f1489ae44.png" alt="" /></p>
<h2 id="iv-interpolated-price">IV-Interpolated price</h2>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">interpolated_prices</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">index</span><span class="o">=</span><span class="n">interpolated_ivs</span><span class="p">.</span><span class="n">index</span><span class="p">)</span>
<span class="n">interpolated_prices</span><span class="p">[</span><span class="s">"interpolated_price"</span><span class="p">]</span> <span class="o">=</span> <span class="p">[</span>
<span class="n">call_option_value_black_scholes</span><span class="p">(</span><span class="n">S</span><span class="p">,</span> <span class="n">t</span><span class="p">,</span> <span class="n">sigma</span><span class="p">,</span> <span class="n">r</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span> <span class="n">E</span><span class="p">,</span> <span class="n">T</span><span class="p">)</span>
<span class="k">for</span> <span class="n">E</span><span class="p">,</span> <span class="n">sigma</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">interpolated_ivs</span><span class="p">.</span><span class="n">index</span><span class="p">,</span> <span class="n">interpolated_ivs</span><span class="p">[</span><span class="s">"interpolated_iv"</span><span class="p">])</span>
<span class="p">]</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># | code-fold: true
</span><span class="n">ax</span> <span class="o">=</span> <span class="n">prices</span><span class="p">[[</span><span class="s">"mid_price"</span><span class="p">]].</span><span class="n">plot</span><span class="p">(</span><span class="n">style</span><span class="o">=</span><span class="s">"+"</span><span class="p">,</span> <span class="n">title</span><span class="o">=</span><span class="s">"Mid prive vs IV-interpolated price"</span><span class="p">)</span>
<span class="n">interpolated_prices</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">style</span><span class="o">=</span><span class="s">"-"</span><span class="p">,</span> <span class="n">ax</span><span class="o">=</span><span class="n">ax</span><span class="p">)</span>
<span class="n">decorate_price_axes</span><span class="p">(</span><span class="n">ax</span><span class="p">)</span>
</code></pre></div></div>
<p><img src="/assets/images/option-implied-stock-price-distribution/605e32822acb1e2149e9577bae0ef29e82aa2b5b.png" alt="" /></p>
<h2 id="implied-price-pdf">Implied price PDF</h2>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">implied_price_pdf</span> <span class="o">=</span> <span class="n">interpolated_prices</span><span class="p">.</span><span class="n">pipe</span><span class="p">(</span>
<span class="k">lambda</span> <span class="n">df</span><span class="p">:</span> <span class="n">df</span><span class="p">.</span><span class="n">shift</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="o">-</span> <span class="mi">2</span> <span class="o">*</span> <span class="n">df</span> <span class="o">+</span> <span class="n">df</span><span class="p">.</span><span class="n">shift</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="p">).</span><span class="n">dropna</span><span class="p">().</span><span class="n">transform</span><span class="p">(</span><span class="k">lambda</span> <span class="n">df</span><span class="p">:</span> <span class="n">df</span> <span class="o">/</span> <span class="n">df</span><span class="p">.</span><span class="nb">sum</span><span class="p">())</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># | code-fold: true
</span><span class="n">ax</span> <span class="o">=</span> <span class="n">implied_price_pdf</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span>
<span class="n">style</span><span class="o">=</span><span class="s">"-"</span><span class="p">,</span> <span class="n">xlabel</span><span class="o">=</span><span class="s">"Price"</span><span class="p">,</span> <span class="n">ylabel</span><span class="o">=</span><span class="s">"p"</span><span class="p">,</span> <span class="n">title</span><span class="o">=</span><span class="s">"Implied price distribution"</span>
<span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">axvline</span><span class="p">(</span><span class="n">ticker</span><span class="p">.</span><span class="n">info</span><span class="p">[</span><span class="s">"currentPrice"</span><span class="p">],</span> <span class="n">ls</span><span class="o">=</span><span class="s">"--"</span><span class="p">,</span> <span class="n">c</span><span class="o">=</span><span class="s">"k"</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"Current price"</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">legend</span><span class="p">()</span>
<span class="bp">None</span>
</code></pre></div></div>
<p><img src="/assets/images/option-implied-stock-price-distribution/4e5e7dcf9bcd900ec17975ad3bf339672970ec57.png" alt="" /></p>
<h1 id="conclusion-and-future-work">Conclusion and future work</h1>
<p>Starting from a single option chain, we recovered the market-implied
distribution of AAPL’s price at expiration: mid prices were converted to IVs,
the IVs were smoothed and spline-interpolated, then mapped back to continuous
prices and differentiated twice via the Breeden-Litzenberger formula.
Natural follow-ups are fitting an arbitrage-free parametric smile instead of a
spline and comparing implied distributions across expirations.</p>
<h1 id="references">References</h1>
<ul>
<li><a href="https://www.wiley.com/en-us/Paul+Wilmott+Introduces+Quantitative+Finance%2C+2nd+Edition-p-9781118836798">Paul Wilmott Introduces Quantitative Finance, 2nd
Edition</a></li>
<li><a href="https://faculty.baruch.cuny.edu/lwu/890/BreedenLitzenberger78.pdf">Breeden, Litzenberger - Prices of State-Contingent Claims Implicit in
Option
Prices</a></li>
<li><a href="https://reasonabledeviations.com/2020/10/10/option-implied-pdfs-2/">Reasonable Deviations - Option-implied probability distributions,
part
2</a></li>
<li><a href="https://www.morganstanley.com/content/dam/msdotcom/en/assets/pdfs/Options_Probabilities_Exhibit_Link.pdf">Morgan Stanley - How Options Implied Probabilities Are
Calculated</a></li>
<li><a href="https://quant.stackexchange.com/questions/55239/explaining-the-risk-neutral-measure">Quant StackExchange - Explaining the Risk Netural
Measure</a></li>
</ul>Predictive sampling and graph traversals2022-01-24T00:00:00+00:002022-01-24T00:00:00+00:00https://mrandri19.github.io/2022/01/24/predictive-sampling-and-graph-traversals<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex@0.15.1/dist/katex.min.css" integrity="sha384-R4558gYOUz8mP9YWpZJjofhk+zx0AS11p36HnD2ZKj/6JR5z27gSSULCNHIRReVs" crossorigin="anonymous" />
<!-- The loading of KaTeX is deferred to speed up page rendering -->
<script defer="" src="https://cdn.jsdelivr.net/npm/katex@0.15.1/dist/katex.min.js" integrity="sha384-z1fJDqw8ZApjGO3/unPWUPsIymfsJmyrDVWC8Tv/a1HeOtGmkwNd/7xUS0Xcnvsx" crossorigin="anonymous"></script>
<!-- To automatically render math in text elements, include the auto-render extension: -->
<script defer="" src="https://cdn.jsdelivr.net/npm/katex@0.15.1/dist/contrib/auto-render.min.js" integrity="sha384-+XBljXPPiv+OzfbB3cVmLHf4hdUFHlWNZN5spNQ7rmHTXpd7WvJum6fIACpNNfIR" crossorigin="anonymous" onload="renderMathInElement(document.body);"></script>
<blockquote>
<p>Full code available at <a href="https://github.com/mrandri19/smolppl/tree/sampling">github.com/mrandri19/smolppl/tree/sampling</a></p>
</blockquote>
<h2 id="introduction">Introduction</h2>
<p>This post is the continuation of
<a href="https://mrandri19.github.io/2022/01/12/a-PPL-in-70-lines-of-python.html">“A probabilistic programming language in 70 lines of Python”</a>.
Today, we extend the library built in the last post by creating an API for
sampling values from the prior and posterior distributions.</p>
<p>At the end, we will have built an API like this one:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">x</span> <span class="o">=</span> <span class="n">LatentVariable</span><span class="p">(</span><span class="s">"x"</span><span class="p">,</span> <span class="n">Normal</span><span class="p">,</span> <span class="p">[</span><span class="mf">0.0</span><span class="p">,</span> <span class="mf">3.0</span><span class="p">])</span>
<span class="n">y</span> <span class="o">=</span> <span class="n">ObservedVariable</span><span class="p">(</span><span class="s">"y"</span><span class="p">,</span> <span class="n">Normal</span><span class="p">,</span> <span class="p">[</span><span class="n">x</span><span class="p">,</span> <span class="mf">4.0</span><span class="p">],</span> <span class="n">observed</span><span class="o">=</span><span class="mf">1.5</span><span class="p">)</span>
<span class="n">prior_sample</span><span class="p">(</span><span class="n">y</span><span class="p">)</span>
<span class="n">posterior_sample</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="p">{</span><span class="s">"x"</span><span class="p">:</span> <span class="o">-</span><span class="mf">2.0</span><span class="p">})</span>
</code></pre></div></div>
<p>As users of a Probabilistic Programming Language (PPL), we are interested in
sampling the distributions defined by our model.
We do it, for example, to estimate means and variances via Monte Carlo
integration.
This is how we would estimate the prior and posterior mean and standard
deviation of the <code class="language-plaintext highlighter-rouge">y</code> variable using the new API:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">prior_samples</span> <span class="o">=</span> <span class="p">[</span><span class="n">prior_sample</span><span class="p">(</span><span class="n">y</span><span class="p">)</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">10_000</span><span class="p">)]</span>
<span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">prior_samples</span><span class="p">),</span> <span class="n">np</span><span class="p">.</span><span class="n">std</span><span class="p">(</span><span class="n">prior_samples</span><span class="p">)</span>
<span class="c1"># => (0.0, 5.0)
</span>
<span class="n">posterior_samples</span> <span class="o">=</span> <span class="p">[</span><span class="n">posterior_sample</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="p">{</span><span class="s">"x"</span><span class="p">:</span> <span class="o">-</span><span class="mf">2.0</span><span class="p">})</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">10_000</span><span class="p">)]</span>
<span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">posterior_samples</span><span class="p">),</span> <span class="n">np</span><span class="p">.</span><span class="n">std</span><span class="p">(</span><span class="n">posterior_samples</span><span class="p">)</span>
<span class="c1"># => (-2.0, 4.0)
</span></code></pre></div></div>
<p>Mathematically speaking, the two snippets above define this probability model:</p>
\[x \sim \text{Normal}(0, 3)\]
\[y \sim \text{Normal}(x, 4)\]
<p>and compute the following expectations on it:</p>
\[E(y) = 0\]
\[\text{std}(y) = 5\]
\[E(y|x=-2) = -2\]
\[\text{std}(y|x=-2) = 4\]
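<p>As a quick check, the prior standard deviation of \(y\) follows from the law
of total variance:</p>
\[\text{Var}(y) = \mathbb{E}[\text{Var}(y \mid x)] + \text{Var}(\mathbb{E}[y \mid x]) = 4^2 + 3^2 = 25 \implies \text{std}(y) = 5\]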
<h2 id="related-work">Related work</h2>
<p>Most probabilistic programming languages implement sampling APIs.
We will briefly see how
<a href="https://mc-stan.org/">Stan</a>
and
<a href="https://docs.pymc.io/en/v3/">PyMC</a>, two of the most popular PPLs, allow
sampling.
Just skip this section if you only care about the implementation.</p>
<p>Stan’s sampling interface is lower-level compared to other PPLs.
Sampling from both the prior and the posterior must be implemented manually by
the user.
To perform <a href="https://mc-stan.org/docs/2_28/stan-users-guide/prior-predictive-checks.html">prior predictive</a> sampling,
the user must copy code from the <code class="language-plaintext highlighter-rouge">model</code> section into the <code class="language-plaintext highlighter-rouge">generated_quantities</code>
sections and replace all distribution calls (like <code class="language-plaintext highlighter-rouge">normal</code>, <code class="language-plaintext highlighter-rouge">binomial</code>) with
their <code class="language-plaintext highlighter-rouge">_rng</code> versions.
For <a href="https://mc-stan.org/docs/2_28/stan-users-guide/simulating-from-the-posterior-predictive-distribution.html">posterior predictive</a>
sampling, the procedure is similar, but model parameters are sampled from the
posterior chain rather than from a Random Number Generator (RNG).
In practice, this means that it is enough to keep using the same parameters as
in the <code class="language-plaintext highlighter-rouge">model</code> section.</p>
<p>In PyMC the process is much simpler from a user’s perspective.
PyMC uses its knowledge of the probabilistic DAG to automatically generate
implementations for the likelihood, prior sampling, and posterior sampling.
In practice, this means calling <code class="language-plaintext highlighter-rouge">pm.sample_prior_predictive()</code> to get prior
samples
and <code class="language-plaintext highlighter-rouge">pm.sample_posterior_predictive(posterior_trace)</code> to get posterior samples.</p>
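<p>For comparison, here is how the two-variable model from the introduction
could look in PyMC. This is a sketch against the v3-era API linked above, not
code from this post’s library:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import pymc3 as pm

with pm.Model():
    x = pm.Normal("x", mu=0.0, sigma=3.0)
    y = pm.Normal("y", mu=x, sigma=4.0, observed=1.5)

    prior_samples = pm.sample_prior_predictive()
    posterior_trace = pm.sample()
    posterior_samples = pm.sample_posterior_predictive(posterior_trace)
</code></pre></div></div>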
<h2 id="implementation">Implementation</h2>
<h3 id="dag-traversals">DAG traversals</h3>
<p>To understand the main challenge in sampling from a DAG let’s see an example.
Consider this probabilistic model:</p>
\[x \sim \text{Normal}(0, 3)\]
\[y \sim \text{Normal}(x, 4)\]
<p>in code:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">x</span> <span class="o">=</span> <span class="n">LatentVariable</span><span class="p">(</span><span class="s">"x"</span><span class="p">,</span> <span class="n">Normal</span><span class="p">,</span> <span class="p">[</span><span class="mf">0.0</span><span class="p">,</span> <span class="mf">3.0</span><span class="p">])</span>
<span class="n">y</span> <span class="o">=</span> <span class="n">ObservedVariable</span><span class="p">(</span><span class="s">"y"</span><span class="p">,</span> <span class="n">Normal</span><span class="p">,</span> <span class="p">[</span><span class="n">x</span><span class="p">,</span> <span class="mf">4.0</span><span class="p">],</span> <span class="n">observed</span><span class="o">=</span><span class="mf">1.5</span><span class="p">)</span>
</code></pre></div></div>
<p>and its corresponding DAG:</p>
<figure>
<div style="display: flex;flex-direction: row;flex-wrap: nowrap;align-content: flex-start;justify-content: space-evenly;align-items: center;">
<img src="/assets/images/predictive-sampling-and-graph-traversals/DAG.svg" style="width: 16rem; min-width: 0;" />
</div>
<figcaption style="text-align: center; margin-top: 1rem;">
In-memory representation of the model
</figcaption>
</figure>
<p>Looking at the DAG we see that, to sample the <code class="language-plaintext highlighter-rouge">x</code> variable, we need the
floats <code class="language-plaintext highlighter-rouge">0</code> and <code class="language-plaintext highlighter-rouge">3</code> respectively for the mean and standard deviation.
To sample the <code class="language-plaintext highlighter-rouge">y</code> variable we need the value of <code class="language-plaintext highlighter-rouge">x</code> and the float value <code class="language-plaintext highlighter-rouge">4</code>.
This means that we need the values of a variable’s children <em>before</em> being able
to sample its value.
As a consequence, simple depth-first search (DFS) is not enough.
Otherwise, it could happen that we try sampling the value of <code class="language-plaintext highlighter-rouge">y</code> without knowing
its mean <code class="language-plaintext highlighter-rouge">x</code>.</p>
<p>The solution is called
<a href="https://en.wikipedia.org/wiki/Depth-first_search#Vertex_orderings"><em>post-order DFS</em></a>.
Post-order traversal has the property that a node will only get visited <em>after</em>
all of its children have.
Simple DFS, on the other hand, does not have this property and does a
<em>pre-order</em> traversal.
To better understand the differences, check out the figure below.
With pre-order traversal the root node \(a\) is always evaluated before the
children \(b, c\), and in one case \(b\) is evaluated before its child
\(c\) is.</p>
<figure>
<div style="display: flex;flex-direction: row;flex-wrap: nowrap;align-content: flex-start;justify-content: space-evenly;align-items: center;">
<img src="/assets/images/predictive-sampling-and-graph-traversals/DAG-pre-order.svg" style="width: 16rem; min-width: 0;" />
<img src="/assets/images/predictive-sampling-and-graph-traversals/DAG-post-order.svg" style="width: 16rem; min-width: 0;" />
</div>
<figcaption style="text-align: center; margin-top: 1rem;">
Left: pre-order traversal of the DAG.
Right: post-order traversal of the DAG.
<br />
The blue, bold numbers represent the order in which the nodes where visited.
</figcaption>
</figure>
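<p>To make the two visit orders concrete, here is a tiny self-contained sketch
on a three-node DAG shaped like the one in the figure (the graph encoding and
function names are ours, not part of the library):</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># a depends on b and c, and b depends on c
graph = {"a": ["b", "c"], "b": ["c"], "c": []}

def dfs(node, visited, order, post):
    visited.add(node)
    if not post:
        order.append(node)  # pre-order: visit the node before its children
    for child in graph[node]:
        if child not in visited:
            dfs(child, visited, order, post)
    if post:
        order.append(node)  # post-order: visit the node after all its children

pre_visits, post_visits = [], []
dfs("a", set(), pre_visits, post=False)
dfs("a", set(), post_visits, post=True)
print(pre_visits)   # ['a', 'b', 'c']: the root comes first
print(post_visits)  # ['c', 'b', 'a']: every child comes before its parent
</code></pre></div></div>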
<p>The API we implement is heavily inspired by PyMC.
The function <code class="language-plaintext highlighter-rouge">prior_sample</code> samples one value from the prior distribution.
The function <code class="language-plaintext highlighter-rouge">posterior_sample</code> samples one value from the posterior, given a
dictionary of latent values from the posterior.
Inside these functions we traverse the probabilistic DAG and do what a Stan user
would do.
In <code class="language-plaintext highlighter-rouge">prior_sample</code> replace all variables with a new random value from their
respective distribution.
In <code class="language-plaintext highlighter-rouge">posterior_sample</code> replace all <code class="language-plaintext highlighter-rouge">ObservedVariables</code> with a new random value
from its distribution, and replace all <code class="language-plaintext highlighter-rouge">LatentVariables</code> with values from the
posterior chain.
Let’s see how to do it.</p>
<h3 id="distributions">Distributions</h3>
<p>First of all, we need to add a <code class="language-plaintext highlighter-rouge">sample</code> method to our <code class="language-plaintext highlighter-rouge">Distribution</code> abstract
class, and implement it for all of our distributions.
Just like we did for the log-density, we use SciPy.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Distribution</span><span class="p">:</span>
<span class="o"><</span><span class="n">rest</span> <span class="n">of</span> <span class="n">class</span><span class="o">></span>
<span class="o">@</span><span class="nb">staticmethod</span>
<span class="k">def</span> <span class="nf">sample</span><span class="p">(</span><span class="n">params</span><span class="p">):</span>
<span class="k">raise</span> <span class="nb">NotImplementedError</span><span class="p">(</span><span class="s">"Must be implemented by a subclass"</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">Normal</span><span class="p">(</span><span class="n">Distribution</span><span class="p">):</span>
<span class="o"><</span><span class="n">rest</span> <span class="n">of</span> <span class="n">class</span><span class="o">></span>
<span class="o">@</span><span class="nb">staticmethod</span>
<span class="k">def</span> <span class="nf">sample</span><span class="p">(</span><span class="n">params</span><span class="p">):</span>
<span class="k">return</span> <span class="nb">float</span><span class="p">(</span><span class="n">norm</span><span class="p">.</span><span class="n">rvs</span><span class="p">(</span><span class="n">loc</span><span class="o">=</span><span class="n">params</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">scale</span><span class="o">=</span><span class="n">params</span><span class="p">[</span><span class="mi">1</span><span class="p">]))</span>
</code></pre></div></div>
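<p>A one-line usage example, sampling from \(\text{Normal}(0, 3)\) directly
without going through the DAG:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Normal.sample([0.0, 3.0])
# => e.g. 1.84 (a random draw; yours will differ)
</code></pre></div></div>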
<h3 id="sampling-from-the-prior">Sampling from the prior</h3>
<p>We begin by implementing prior sampling as it is the simpler of the two.
We implement post-order depth-first search in the <code class="language-plaintext highlighter-rouge">collect_variables</code> inner
function.
It’s very similar to pre-order DFS but, instead of appending <code class="language-plaintext highlighter-rouge">variable</code> to
to <code class="language-plaintext highlighter-rouge">variables</code> <em>before</em> recursion, we do it <em>after</em>.
This results in a variable being “visited” only after all of its
children have been.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">prior_sample</span><span class="p">(</span><span class="n">root</span><span class="p">):</span>
<span class="n">visited</span> <span class="o">=</span> <span class="nb">set</span><span class="p">()</span>
<span class="n">variables</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">def</span> <span class="nf">collect_variables</span><span class="p">(</span><span class="n">variable</span><span class="p">):</span>
<span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">variable</span><span class="p">,</span> <span class="nb">float</span><span class="p">):</span>
<span class="k">return</span>
<span class="n">visited</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">variable</span><span class="p">)</span>
<span class="k">for</span> <span class="n">arg</span> <span class="ow">in</span> <span class="n">variable</span><span class="p">.</span><span class="n">dist_args</span><span class="p">:</span>
<span class="k">if</span> <span class="n">arg</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">visited</span><span class="p">:</span>
<span class="n">collect_variables</span><span class="p">(</span><span class="n">arg</span><span class="p">)</span>
<span class="c1"># post-order
</span> <span class="n">variables</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">variable</span><span class="p">)</span>
<span class="n">collect_variables</span><span class="p">(</span><span class="n">root</span><span class="p">)</span>
</code></pre></div></div>
<p>Then, for every variable, we need to obtain the numeric value of each of its
arguments.
<code class="language-plaintext highlighter-rouge">float</code> arguments are already numeric, so we take them as they are.
The numeric values of variables, on the other hand, are kept in a dictionary
called <code class="language-plaintext highlighter-rouge">sampled_values</code>.
We are sure that we will never get a <code class="language-plaintext highlighter-rouge">KeyError</code>, because the post-order
traversal guarantees that all children of a variable are evaluated before the
variable itself.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="n">sampled_values</span> <span class="o">=</span> <span class="p">{}</span>
<span class="k">for</span> <span class="n">variable</span> <span class="ow">in</span> <span class="n">variables</span><span class="p">:</span>
<span class="n">dist_params</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">dist_arg</span> <span class="ow">in</span> <span class="n">variable</span><span class="p">.</span><span class="n">dist_args</span><span class="p">:</span>
<span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">dist_arg</span><span class="p">,</span> <span class="nb">float</span><span class="p">):</span>
<span class="n">dist_params</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">dist_arg</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">dist_params</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">sampled_values</span><span class="p">[</span><span class="n">dist_arg</span><span class="p">.</span><span class="n">name</span><span class="p">])</span>
</code></pre></div></div>
<p>With all the required arguments, we call the <code class="language-plaintext highlighter-rouge">sample</code> method of the variable’s
distribution.
For prior sampling, we <code class="language-plaintext highlighter-rouge">sample</code> all types of variables, both <code class="language-plaintext highlighter-rouge">ObservedVariable</code>s
and <code class="language-plaintext highlighter-rouge">LatentVariable</code>s.
The result is stored in the <code class="language-plaintext highlighter-rouge">sampled_values</code> dict, to be used by any of the
variable’s parents (this is a DAG, so a variable can have multiple parents).</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="n">sampled_values</span><span class="p">[</span><span class="n">variable</span><span class="p">.</span><span class="n">name</span><span class="p">]</span> <span class="o">=</span> <span class="n">variable</span><span class="p">.</span><span class="n">dist_class</span><span class="p">.</span><span class="n">sample</span><span class="p">(</span>
<span class="n">dist_params</span>
<span class="p">)</span>
</code></pre></div></div>
<p>Finally, we return the sampled value of our root variable.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">return</span> <span class="n">sampled_values</span><span class="p">[</span><span class="n">root</span><span class="p">.</span><span class="n">name</span><span class="p">]</span>
</code></pre></div></div>
<p>Let’s see an example:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">x</span> <span class="o">=</span> <span class="n">LatentVariable</span><span class="p">(</span><span class="s">"x"</span><span class="p">,</span> <span class="n">Normal</span><span class="p">,</span> <span class="p">[</span><span class="mf">5.0</span><span class="p">,</span> <span class="mf">0.1</span><span class="p">])</span>
<span class="n">y</span> <span class="o">=</span> <span class="n">ObservedVariable</span><span class="p">(</span><span class="s">"y"</span><span class="p">,</span> <span class="n">Normal</span><span class="p">,</span> <span class="p">[</span><span class="n">x</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">],</span> <span class="n">observed</span><span class="o">=</span><span class="mf">1.5</span><span class="p">)</span>
<span class="n">prior_sample</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="c1"># => 5.09
</span>
<span class="n">prior_sample</span><span class="p">(</span><span class="n">y</span><span class="p">)</span>
<span class="c1"># => 6.40
</span></code></pre></div></div>
<h3 id="sampling-from-the-posterior">Sampling from the posterior</h3>
<p>For posterior sampling we repeat the same procedure: traverse the DAG in
post-order, starting at the root, accumulating variables inside <code class="language-plaintext highlighter-rouge">variables</code>.
The only difference is the new <code class="language-plaintext highlighter-rouge">latent_values</code> argument, which has the same role
as it had in <code class="language-plaintext highlighter-rouge">evaluate_log_density</code>: a dictionary from latent variable names to
their numeric values.
When doing posterior predictive simulation, for example, we will use samples from
the posterior chain to fill the <code class="language-plaintext highlighter-rouge">latent_values</code> dictionary.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">posterior_sample</span><span class="p">(</span><span class="n">root</span><span class="p">,</span> <span class="n">latent_values</span><span class="p">):</span>
<span class="n">visited</span> <span class="o">=</span> <span class="nb">set</span><span class="p">()</span>
<span class="n">variables</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">def</span> <span class="nf">collect_variables</span><span class="p">(</span><span class="n">variable</span><span class="p">):</span>
<span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">variable</span><span class="p">,</span> <span class="nb">float</span><span class="p">):</span>
<span class="k">return</span>
<span class="n">visited</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">variable</span><span class="p">)</span>
<span class="k">for</span> <span class="n">arg</span> <span class="ow">in</span> <span class="n">variable</span><span class="p">.</span><span class="n">dist_args</span><span class="p">:</span>
<span class="k">if</span> <span class="n">arg</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">visited</span><span class="p">:</span>
<span class="n">collect_variables</span><span class="p">(</span><span class="n">arg</span><span class="p">)</span>
<span class="c1"># post-order
</span> <span class="n">variables</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">variable</span><span class="p">)</span>
<span class="n">collect_variables</span><span class="p">(</span><span class="n">root</span><span class="p">)</span>
<span class="n">sampled_values</span> <span class="o">=</span> <span class="p">{}</span>
</code></pre></div></div>
<p>Again, we either use <code class="language-plaintext highlighter-rouge">float</code>s as they are, or fetch sampled children from the
<code class="language-plaintext highlighter-rouge">sampled_values</code> dictionary by their name.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">for</span> <span class="n">variable</span> <span class="ow">in</span> <span class="n">variables</span><span class="p">:</span>
<span class="n">dist_params</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">dist_arg</span> <span class="ow">in</span> <span class="n">variable</span><span class="p">.</span><span class="n">dist_args</span><span class="p">:</span>
<span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">dist_arg</span><span class="p">,</span> <span class="nb">float</span><span class="p">):</span>
<span class="n">dist_params</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">dist_arg</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">dist_params</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">sampled_values</span><span class="p">[</span><span class="n">dist_arg</span><span class="p">.</span><span class="n">name</span><span class="p">])</span>
</code></pre></div></div>
<p>And finally the new bit: instead of sampling both latent and observed variables,
we only sample observed variables.
Latent variables instead take their values from the <code class="language-plaintext highlighter-rouge">latent_values</code> dictionary,
just like they did in <code class="language-plaintext highlighter-rouge">evaluate_log_density</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">variable</span><span class="p">,</span> <span class="n">LatentVariable</span><span class="p">):</span>
<span class="n">sampled_values</span><span class="p">[</span><span class="n">variable</span><span class="p">.</span><span class="n">name</span><span class="p">]</span> <span class="o">=</span> <span class="n">latent_values</span><span class="p">[</span><span class="n">variable</span><span class="p">.</span><span class="n">name</span><span class="p">]</span>
<span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">variable</span><span class="p">,</span> <span class="n">ObservedVariable</span><span class="p">):</span>
<span class="n">sampled_values</span><span class="p">[</span><span class="n">variable</span><span class="p">.</span><span class="n">name</span><span class="p">]</span> <span class="o">=</span> <span class="n">variable</span><span class="p">.</span><span class="n">dist_class</span><span class="p">.</span><span class="n">sample</span><span class="p">(</span>
<span class="n">dist_params</span>
<span class="p">)</span>
<span class="k">return</span> <span class="n">sampled_values</span><span class="p">[</span><span class="n">root</span><span class="p">.</span><span class="n">name</span><span class="p">]</span>
</code></pre></div></div>
<p>Let’s see an example:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">x</span> <span class="o">=</span> <span class="n">LatentVariable</span><span class="p">(</span><span class="s">"x"</span><span class="p">,</span> <span class="n">Normal</span><span class="p">,</span> <span class="p">[</span><span class="mf">5.0</span><span class="p">,</span> <span class="mf">0.1</span><span class="p">])</span>
<span class="n">y</span> <span class="o">=</span> <span class="n">ObservedVariable</span><span class="p">(</span><span class="s">"y"</span><span class="p">,</span> <span class="n">Normal</span><span class="p">,</span> <span class="p">[</span><span class="n">x</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">],</span> <span class="n">observed</span><span class="o">=</span><span class="mf">1.5</span><span class="p">)</span>
<span class="n">posterior_sample</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="p">{</span><span class="s">"x"</span><span class="p">:</span> <span class="o">-</span><span class="mi">2</span><span class="p">})</span>
<span class="c1"># => 3.19
</span><span class="n">posterior_sample</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="p">{</span><span class="s">"x"</span><span class="p">:</span> <span class="o">-</span><span class="mi">2</span><span class="p">})</span>
<span class="c1"># => -2.00
</span></code></pre></div></div>
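<p>Putting the two together, posterior predictive simulation is just a loop over
posterior draws. A minimal sketch, assuming we already had a chain of posterior
samples for <code class="language-plaintext highlighter-rouge">x</code> (the list below is made up; a real chain would come from an
inference algorithm):</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>chain = [-2.0, -1.8, -2.1]  # hypothetical posterior samples of x
predictive = [posterior_sample(y, {"x": s}) for s in chain]
# => e.g. [-1.3, -2.5, -0.9] (random: one draw from N(s, 1) per sample s)
</code></pre></div></div>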
<h2 id="conclusion">Conclusion</h2>
<p>We continue our work on a small Probabilistic Programming Language, introducing
and motivating the need for sampling.
After analyzing pre-order and post-order DAG traversals, we discover that we
need the latter to respect the DAG’s evaluation order.
Finally, we implement an API for prior and posterior sampling, comparing its
implementation with the one for log density evaluation.</p>
<h3 id="bonus-more-on-dag-traversals">Bonus: more on DAG traversals</h3>
<p>For those who read the
<a href="https://mrandri19.github.io/2022/01/12/a-PPL-in-70-lines-of-python.html">previous post</a>,
I want to point out that likelihood evaluation does not have this dependency
structure.
The likelihood of each variable can be computed independently
(<a href="https://www.multibugs.org/">even in parallel!</a>)
because we know all the children’s values, either from <code class="language-plaintext highlighter-rouge">latent_values</code> or from
<code class="language-plaintext highlighter-rouge">variable.observed</code>.
Since the order does not matter, I decided to use DFS because of its simplicity.</p>
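<p>To make the order-independence concrete, here is a small sketch (not from the
original post): the joint log density is a plain sum of per-variable terms, and
since addition is commutative the terms can be evaluated in any order, or even
concurrently:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-variable log-density terms (made-up numbers): each one can
# be computed on its own, so a thread pool may evaluate them in any order.
terms = [lambda: -4.28, lambda: -0.92]
with ThreadPoolExecutor() as pool:
    log_density = sum(pool.map(lambda term: term(), terms))
# log_density == -5.2 regardless of evaluation order
</code></pre></div></div>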
<p>For people familiar with DAG traversals, post-order traversal on our DAG is
equivalent to performing a
<a href="https://en.wikipedia.org/wiki/Topological_sorting"><em>topological sorting</em></a>
of the transposed DAG.
A topological ordering is the reversed post-ordering of a DAG, while in our
implementation we are doing a post-ordering on the transposed DAG, without
reversing at the end.
Perhaps surprisingly, these two actions are equivalent:<br />
<code class="language-plaintext highlighter-rouge">reverse-list ∘ post-order-traversal ≡ post-order-traversal ∘ transpose-DAG</code>.
Unsurprisingly, I was not the first to discover this. Check out these
Stack Overflow questions (and the small sketch after the list):</p>
<ol>
<li><a href="https://cs.stackexchange.com/questions/124725/is-topological-sort-of-an-original-graph-same-as-post-ordering-dfs-of-its-transp">Is topological sort of an original graph same as post-ordering dfs of its transpose graph</a>,</li>
<li><a href="https://stackoverflow.com/questions/61419786/is-topological-sort-of-an-original-graph-same-as-dfs-of-the-transpose-graph">Is topological sort of an original graph same as dfs of the transpose graph</a></li>
</ol>
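<p>Here is a minimal sketch (not from the original post) checking the property we
actually rely on: a post-order traversal of our arrow-reversed DAG visits every
child before its parents, i.e. it yields a valid evaluation (topological) order:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>dag = {"y": ["x"], "x": []}  # edges point from a variable to its dist_args

def post_order(graph, root):
    visited, order = set(), []
    def visit(node):
        visited.add(node)
        for child in graph[node]:
            if child not in visited:
                visit(child)
        order.append(node)  # append *after* the children: post-order
    visit(root)
    return order

order = post_order(dag, "y")
# every variable appears after all of its children
assert all(order.index(c) &lt; order.index(n) for n in order for c in dag[n])
print(order)
# => ['x', 'y']
</code></pre></div></div>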
<p>Also, for applications of DAG traversals, check out
<a href="https://eli.thegreenplace.net/2015/directed-graph-traversal-orderings-and-applications-to-data-flow-analysis/">Eli’s blog</a>.</p>A probabilistic programming language in 70 lines of Python2022-01-12T00:00:00+00:002022-01-12T00:00:00+00:00https://mrandri19.github.io/2022/01/12/a-PPL-in-70-lines-of-python<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex@0.15.1/dist/katex.min.css" integrity="sha384-R4558gYOUz8mP9YWpZJjofhk+zx0AS11p36HnD2ZKj/6JR5z27gSSULCNHIRReVs" crossorigin="anonymous" />
<!-- The loading of KaTeX is deferred to speed up page rendering -->
<script defer="" src="https://cdn.jsdelivr.net/npm/katex@0.15.1/dist/katex.min.js" integrity="sha384-z1fJDqw8ZApjGO3/unPWUPsIymfsJmyrDVWC8Tv/a1HeOtGmkwNd/7xUS0Xcnvsx" crossorigin="anonymous"></script>
<!-- To automatically render math in text elements, include the auto-render extension: -->
<script defer="" src="https://cdn.jsdelivr.net/npm/katex@0.15.1/dist/contrib/auto-render.min.js" integrity="sha384-+XBljXPPiv+OzfbB3cVmLHf4hdUFHlWNZN5spNQ7rmHTXpd7WvJum6fIACpNNfIR" crossorigin="anonymous" onload="renderMathInElement(document.body);"></script>
<blockquote>
<p>Full code available at <a href="https://github.com/mrandri19/smolppl">github.com/mrandri19/smolppl</a></p>
</blockquote>
<blockquote>
<p>The continuation to this post, called
<a href="https://mrandri19.github.io/2022/01/24/predictive-sampling-and-graph-traversals.html">“Predictive sampling and graph traversals”</a>
is now available!</p>
</blockquote>
<h2 id="introduction">Introduction</h2>
<p>In this post I will explain how Probabilistic Programming Languages
(PPLs) work by showing step-by-step how to build a simple one in Python.</p>
<p>I expect the reader to be moderately familiar with PPLs and Bayesian
statistics, as well as having a basic understanding of Python.
They could be, for example, statisticians/AI researchers/or curious programmers.</p>
<p>At the end, we will have built an API like this one:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">mu</span> <span class="o">=</span> <span class="n">LatentVariable</span><span class="p">(</span><span class="s">"mu"</span><span class="p">,</span> <span class="n">Normal</span><span class="p">,</span> <span class="p">[</span><span class="mf">0.0</span><span class="p">,</span> <span class="mf">5.0</span><span class="p">])</span>
<span class="n">y_bar</span> <span class="o">=</span> <span class="n">ObservedVariable</span><span class="p">(</span><span class="s">"y_bar"</span><span class="p">,</span> <span class="n">Normal</span><span class="p">,</span> <span class="p">[</span><span class="n">mu</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">],</span> <span class="n">observed</span><span class="o">=</span><span class="mf">3.0</span><span class="p">)</span>
<span class="n">evaluate_log_density</span><span class="p">(</span><span class="n">y_bar</span><span class="p">,</span> <span class="p">{</span><span class="s">"mu"</span><span class="p">:</span> <span class="mf">4.0</span><span class="p">})</span>
</code></pre></div></div>
<p>The first two lines define the probability model</p>
\[\mu \sim \text{Normal}(0, 5)\]
\[\bar y \sim \text{Normal}(\mu, 1)\]
<p>and the last line evaluates, at \(\mu = 4\), the log of the (unnormalized)
posterior density defined by the model, conditioned on the data \(\bar y = 3\).</p>
\[\log p(\mu = 4 | \bar y = 3)\]
<p>My hope is to give the reader an understanding of how PPLs work behind the
scenes, as well as of how Embedded Domain-Specific Languages (EDSLs) are
implemented in Python.</p>
<h2 id="related-work">Related work</h2>
<p>As far as I know, there are no simple, didactic implementations of PPLs in
Python.</p>
<p>The book
<a href="http://dippl.org/">“The Design and Implementation of Probabilistic Programming Languages”</a>
is focused on programming language theory, requiring familiarity with
continuation-passing style and coroutines, and uses JavaScript as its
implementation language.
The blog post
<a href="https://www.georgeho.org/prob-prog-frameworks/">“Anatomy of a Probabilistic Programming Framework”</a>
contains a great high-level overview, but does not delve into implementation
details or show code samples.
Finally, Junpeng Lao’s
<a href="https://www.youtube.com/watch?v=WHoS1ETYFrw&feature=youtu.be">talk</a>
and
<a href="https://docs.pymc.io/en/v3/developer_guide.html">PyMC3’s Developer guide</a>
describe PyMC’s implementation details in depth, but it is not
straightforward to implement a PPL based on those alone.</p>
<p>Update: another great overview is <a href="https://bayesiancomputationbook.com/markdown/chp_10.html">chapter 10 of Bayesian Modeling and
Computation in Python</a>.</p>
<h2 id="implementation">Implementation</h2>
<h3 id="high-level-representation">High-level representation</h3>
<p>We will use this model throughout the process as our guiding example.</p>
\[\mu \sim \text{Normal}(0, 5)\]
\[\bar y \sim \text{Normal}(\mu, 1)\]
<p>These expressions define a joint probability distribution with an associated
Probability Density Function (PDF):</p>
\[p(\mu, \bar y) = \text{Normal}(\mu | 0, 5) \text{Normal}(\bar y | \mu, 1)\]
<p>We can represent this expression (and the model) graphically in two ways:
graphical models and directed factor graphs.</p>
<figure>
<div style="display: flex;flex-direction: row;flex-wrap: nowrap;align-content: flex-start;justify-content: space-evenly;align-items: center;">
<img src="/assets/images/a-PPL-in-70-lines-of-python/probabilistic-graphical-model.png" style="width: 3.9rem; min-width: 0;" />
<img src="/assets/images/a-PPL-in-70-lines-of-python/directed-factor-graph.png" style="width: 10rem; min-width: 0;" />
</div>
<figcaption style="text-align: center; margin-top: 1rem;">
Left: model drawn as a probabilistic graphical model (PGM).
Right: model drawn as a directed factor graph (DFG).
</figcaption>
</figure>
<p>While PGMs are more common in the literature, I believe that directed factor
graphs are more useful for a PPL implementer.
The graph tells us several things about the representation we need:</p>
<ul>
<li>We need a way to represent two types of variables:
<ul>
<li>ones of which we know the observed value (\(\bar y\), gray background)</li>
<li>and ones which are latent and cannot be observed (\(\mu\), white background).</li>
</ul>
</li>
<li>We need to handle constants and the distribution of each variable.</li>
<li>Finally, we need a way to connect together observed variables, latent variables,
and constants.</li>
</ul>
<h3 id="distributions">Distributions</h3>
<p>For our purposes, a distribution is a class with a function that can evaluate its
log probability density function at a point.
The <code class="language-plaintext highlighter-rouge">log_density</code> function takes a <code class="language-plaintext highlighter-rouge">float</code> representing a point in the
distribution’s support, a <code class="language-plaintext highlighter-rouge">List[float]</code> of the distribution’s parameters,
and returns a <code class="language-plaintext highlighter-rouge">float</code> equal to the log-PDF evaluated at the point.
To implement new distributions we will inherit from the <code class="language-plaintext highlighter-rouge">Distribution</code> abstract
class.
We will not support vector or matrix-valued distributions for now.</p>
<p>Using SciPy we implement the <code class="language-plaintext highlighter-rouge">Normal</code> distribution, with <code class="language-plaintext highlighter-rouge">params[0]</code> being the
mean and <code class="language-plaintext highlighter-rouge">params[1]</code> the standard deviation.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">scipy.stats</span> <span class="kn">import</span> <span class="n">norm</span>
<span class="k">class</span> <span class="nc">Distribution</span><span class="p">:</span>
<span class="o">@</span><span class="nb">staticmethod</span>
<span class="k">def</span> <span class="nf">log_density</span><span class="p">(</span><span class="n">point</span><span class="p">,</span> <span class="n">params</span><span class="p">):</span>
<span class="k">raise</span> <span class="nb">NotImplementedError</span><span class="p">(</span><span class="s">"Must be implemented by a subclass"</span><span class="p">)</span>
<span class="k">class</span> <span class="nc">Normal</span><span class="p">(</span><span class="n">Distribution</span><span class="p">):</span>
<span class="o">@</span><span class="nb">staticmethod</span>
<span class="k">def</span> <span class="nf">log_density</span><span class="p">(</span><span class="n">point</span><span class="p">,</span> <span class="n">params</span><span class="p">):</span>
<span class="k">return</span> <span class="nb">float</span><span class="p">(</span><span class="n">norm</span><span class="p">.</span><span class="n">logpdf</span><span class="p">(</span><span class="n">point</span><span class="p">,</span> <span class="n">params</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">params</span><span class="p">[</span><span class="mi">1</span><span class="p">]))</span>
</code></pre></div></div>
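<p>As a quick sanity check (not in the original code), the standard Normal’s log
density at zero should equal \(-\frac{1}{2}\log(2\pi) \approx -0.9189\):</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np

print(Normal.log_density(0.0, [0.0, 1.0]))
# => -0.9189385332046727
print(-0.5 * np.log(2 * np.pi))  # the closed form, for comparison
# => -0.9189385332046727
</code></pre></div></div>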
<h3 id="variables-and-dags">Variables and DAGs</h3>
<p>Let us now focus our attention on variables.
Three aspects characterize them: they have an associated distribution,
they can be latent or observed, and they are linked to one another (i.e they
can have children).</p>
<p>The <code class="language-plaintext highlighter-rouge">dist_class</code> field is a <code class="language-plaintext highlighter-rouge">Distribution</code> associated with the variable.
When evaluating the full log density, we will use this field to access the
<code class="language-plaintext highlighter-rouge">log_density</code> method of the variable’s distribution.</p>
<p>We differentiate latent from observed variables using the classes
<code class="language-plaintext highlighter-rouge">LatentVariable</code> and <code class="language-plaintext highlighter-rouge">ObservedVariable</code>.
Observed variables have an <code class="language-plaintext highlighter-rouge">observed</code> field with the observed value.
Since latent variables do not have a value at model-specification time, we will
have to give them a value at runtime, while evaluating the full log density.
To specify the runtime value of latent variables, we need to identify them
with a unique string <code class="language-plaintext highlighter-rouge">name</code>.</p>
<p>Finally, we can make the parameters of a variable’s distribution be variables or
constants.
In our example, the mean of \(\bar y\) is \(\mu\), a Normal random variable,
while its standard deviation is the constant \(1\).
To represent this we use the <code class="language-plaintext highlighter-rouge">dist_args</code> property.
The <a href="https://mypy.readthedocs.io/">mypy</a> signature of <code class="language-plaintext highlighter-rouge">dist_args</code> is
<code class="language-plaintext highlighter-rouge">dist_args: List[Union[float, LatentVariable, ObservedVariable]]</code>.
This means that a latent/observed variable can have “arguments” which themselves
are latent/observed variables or constants, thus creating a
<a href="https://en.wikipedia.org/wiki/Directed_acyclic_graph">Directed Acyclic Graph (DAG)</a>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">LatentVariable</span><span class="p">:</span>
<span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">dist_class</span><span class="p">,</span> <span class="n">dist_args</span><span class="p">):</span>
<span class="bp">self</span><span class="p">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">name</span>
<span class="bp">self</span><span class="p">.</span><span class="n">dist_class</span> <span class="o">=</span> <span class="n">dist_class</span>
<span class="bp">self</span><span class="p">.</span><span class="n">dist_args</span> <span class="o">=</span> <span class="n">dist_args</span>
<span class="k">class</span> <span class="nc">ObservedVariable</span><span class="p">:</span>
<span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">dist_class</span><span class="p">,</span> <span class="n">dist_args</span><span class="p">,</span> <span class="n">observed</span><span class="p">):</span>
<span class="bp">self</span><span class="p">.</span><span class="n">name</span> <span class="o">=</span> <span class="n">name</span>
<span class="bp">self</span><span class="p">.</span><span class="n">dist_class</span> <span class="o">=</span> <span class="n">dist_class</span>
<span class="bp">self</span><span class="p">.</span><span class="n">dist_args</span> <span class="o">=</span> <span class="n">dist_args</span>
<span class="bp">self</span><span class="p">.</span><span class="n">observed</span> <span class="o">=</span> <span class="n">observed</span>
</code></pre></div></div>
<p>We can visualize the DAG and notice a key difference from the directed factor
graph representation: the arrows are reversed.
This is a consequence of how we specify the variables in our modeling API, and
it turns out that having the observed variable as the root is also a better
representation for computing the joint log density.</p>
<figure>
<div style="display: flex;flex-direction: row;flex-wrap: nowrap;align-content: flex-start;justify-content: space-evenly;align-items: center;">
<img src="/assets/images/a-PPL-in-70-lines-of-python/directed-factor-graph.png" style="width: 10rem; min-width: 0;" />
<img src="/assets/images/a-PPL-in-70-lines-of-python/DAG.svg" style="width: 16rem; min-width: 0;" />
</div>
<figcaption style="text-align: center; margin-top: 1rem;">
Left: model drawn as a directed factor graph.
Right: how the DAG is represented in-memory.
</figcaption>
</figure>
<p>To further clarify, let’s see what the <code class="language-plaintext highlighter-rouge">dist_args</code> for our model look like:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">mu</span> <span class="o">=</span> <span class="n">LatentVariable</span><span class="p">(</span><span class="s">"mu"</span><span class="p">,</span> <span class="n">Normal</span><span class="p">,</span> <span class="p">[</span><span class="mf">0.0</span><span class="p">,</span> <span class="mf">5.0</span><span class="p">])</span>
<span class="n">y_bar</span> <span class="o">=</span> <span class="n">ObservedVariable</span><span class="p">(</span><span class="s">"y_bar"</span><span class="p">,</span> <span class="n">Normal</span><span class="p">,</span> <span class="p">[</span><span class="n">mu</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">],</span> <span class="n">observed</span><span class="o">=</span><span class="mf">5.0</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">mu</span><span class="p">)</span>
<span class="c1"># => <__main__.LatentVariable object at 0x7f14f96719a0>
</span><span class="k">print</span><span class="p">(</span><span class="n">mu</span><span class="p">.</span><span class="n">dist_args</span><span class="p">)</span>
<span class="c1"># => [0.0, 5.0]
</span><span class="k">print</span><span class="p">(</span><span class="n">y_bar</span><span class="p">)</span>
<span class="c1"># => <__main__.ObservedVariable object at 0x7f14f9671940>
</span><span class="k">print</span><span class="p">(</span><span class="n">y_bar</span><span class="p">.</span><span class="n">dist_args</span><span class="p">)</span>
<span class="c1"># => [<__main__.LatentVariable object at 0x7f14f96719a0>, 1.0]
</span></code></pre></div></div>
<h3 id="evaluating-the-log-density">Evaluating the log density</h3>
<p>We are almost done: the missing piece is a way to evaluate the joint log-density
using our DAG.
To do so, we need to traverse the DAG and add together the log-densities of each
variable.
Adding log densities is equivalent to multiplying the densities, but it is
a lot more numerically stable.</p>
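<p>A quick illustration of why (a sketch, not from the original post): multiplying
many small densities underflows to zero in floating point, while summing their
logs stays perfectly representable:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np

densities = np.full(1000, 1e-3)   # a thousand small density values
print(np.prod(densities))         # => 0.0 (underflow: 1e-3000 &lt; smallest float64)
print(np.sum(np.log(densities)))  # => -6907.755278982137
</code></pre></div></div>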
<p>To traverse the DAG we use a recursive algorithm called
<a href="https://en.wikipedia.org/wiki/Depth-first_search">depth-first search</a>.
The <code class="language-plaintext highlighter-rouge">collect_variables</code> function visits all variables once, collecting all
non-<code class="language-plaintext highlighter-rouge">float</code> variables into a list.
The algorithm starts from the root, and then recursively visits all <code class="language-plaintext highlighter-rouge">dist_args</code>
to collect each variable.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">evaluate_log_density</span><span class="p">(</span><span class="n">variable</span><span class="p">,</span> <span class="n">latent_values</span><span class="p">):</span>
<span class="n">visited</span> <span class="o">=</span> <span class="nb">set</span><span class="p">()</span>
<span class="n">variables</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">def</span> <span class="nf">collect_variables</span><span class="p">(</span><span class="n">variable</span><span class="p">):</span>
<span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">variable</span><span class="p">,</span> <span class="nb">float</span><span class="p">):</span>
<span class="k">return</span>
<span class="n">visited</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">variable</span><span class="p">)</span>
<span class="n">variables</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">variable</span><span class="p">)</span>
<span class="k">for</span> <span class="n">arg</span> <span class="ow">in</span> <span class="n">variable</span><span class="p">.</span><span class="n">dist_args</span><span class="p">:</span>
<span class="k">if</span> <span class="n">arg</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">visited</span><span class="p">:</span>
<span class="n">collect_variables</span><span class="p">(</span><span class="n">arg</span><span class="p">)</span>
<span class="n">collect_variables</span><span class="p">(</span><span class="n">variable</span><span class="p">)</span>
</code></pre></div></div>
<p>For each variable, we need to obtain a numeric value for each one of its
arguments, and use them to evaluate the distribution’s log density.
<code class="language-plaintext highlighter-rouge">float</code> arguments are already numbers, while <code class="language-plaintext highlighter-rouge">LatentVariable</code>s take different values
depending on where we wish to evaluate the log density.
To specify the values of the latent variables, we pass a dictionary from variable
names to numbers, called <code class="language-plaintext highlighter-rouge">latent_values</code>.
Notice how <code class="language-plaintext highlighter-rouge">ObservedVariable</code>s cannot be arguments: they can only be roots.</p>
<blockquote>
<p>N.B.</p>
<p><code class="language-plaintext highlighter-rouge">dist_args</code> can be <code class="language-plaintext highlighter-rouge">float</code> constants or <code class="language-plaintext highlighter-rouge">LatentVariables</code>.</p>
<p><code class="language-plaintext highlighter-rouge">dist_params</code> are all <code class="language-plaintext highlighter-rouge">float</code>s, either constants or values we assigned to the
latent variables via <code class="language-plaintext highlighter-rouge">latent_values</code> at runtime (i.e. when we actually compute
the log density).</p>
</blockquote>
<p>Finally, with the distribution’s parameters extracted from the arguments, we
can update the total log density.
<code class="language-plaintext highlighter-rouge">LatentVariable</code>s evaluate the log density at the point specified in
<code class="language-plaintext highlighter-rouge">latent_values</code>, while <code class="language-plaintext highlighter-rouge">ObservedVariable</code>s evaluate it at the point
specified in <code class="language-plaintext highlighter-rouge">observed</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="n">log_density</span> <span class="o">=</span> <span class="mf">0.0</span>
<span class="k">for</span> <span class="n">variable</span> <span class="ow">in</span> <span class="n">variables</span><span class="p">:</span>
<span class="n">dist_params</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">dist_arg</span> <span class="ow">in</span> <span class="n">variable</span><span class="p">.</span><span class="n">dist_args</span><span class="p">:</span>
<span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">dist_arg</span><span class="p">,</span> <span class="nb">float</span><span class="p">):</span>
<span class="n">dist_params</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">dist_arg</span><span class="p">)</span>
<span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">dist_arg</span><span class="p">,</span> <span class="n">LatentVariable</span><span class="p">):</span>
<span class="n">dist_params</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">latent_values</span><span class="p">[</span><span class="n">dist_arg</span><span class="p">.</span><span class="n">name</span><span class="p">])</span>
<span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">variable</span><span class="p">,</span> <span class="n">LatentVariable</span><span class="p">):</span>
<span class="n">log_density</span> <span class="o">+=</span> <span class="n">variable</span><span class="p">.</span><span class="n">dist_class</span><span class="p">.</span><span class="n">log_density</span><span class="p">(</span>
<span class="n">latent_values</span><span class="p">[</span><span class="n">variable</span><span class="p">.</span><span class="n">name</span><span class="p">],</span> <span class="n">dist_params</span>
<span class="p">)</span>
<span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">variable</span><span class="p">,</span> <span class="n">ObservedVariable</span><span class="p">):</span>
<span class="n">log_density</span> <span class="o">+=</span> <span class="n">variable</span><span class="p">.</span><span class="n">dist_class</span><span class="p">.</span><span class="n">log_density</span><span class="p">(</span>
<span class="n">variable</span><span class="p">.</span><span class="n">observed</span><span class="p">,</span> <span class="n">dist_params</span>
<span class="p">)</span>
<span class="k">return</span> <span class="n">log_density</span>
</code></pre></div></div>
<p>Let’s check that the total log density is equal to what we expect:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">mu</span> <span class="o">=</span> <span class="n">LatentVariable</span><span class="p">(</span><span class="s">"mu"</span><span class="p">,</span> <span class="n">Normal</span><span class="p">,</span> <span class="p">[</span><span class="mf">0.0</span><span class="p">,</span> <span class="mf">5.0</span><span class="p">])</span>
<span class="n">y_bar</span> <span class="o">=</span> <span class="n">ObservedVariable</span><span class="p">(</span><span class="s">"y_bar"</span><span class="p">,</span> <span class="n">Normal</span><span class="p">,</span> <span class="p">[</span><span class="n">mu</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">],</span> <span class="n">observed</span><span class="o">=</span><span class="mf">5.0</span><span class="p">)</span>
<span class="n">latent_values</span> <span class="o">=</span> <span class="p">{</span><span class="s">"mu"</span><span class="p">:</span> <span class="mf">4.0</span><span class="p">}</span>
<span class="k">print</span><span class="p">(</span><span class="n">evaluate_log_density</span><span class="p">(</span><span class="n">y_bar</span><span class="p">,</span> <span class="n">latent_values</span><span class="p">))</span>
<span class="c1"># => -4.267314978843446
</span><span class="k">print</span><span class="p">(</span><span class="n">norm</span><span class="p">.</span><span class="n">logpdf</span><span class="p">(</span><span class="n">latent_values</span><span class="p">[</span><span class="s">"mu"</span><span class="p">],</span> <span class="mf">0.0</span><span class="p">,</span> <span class="mf">5.0</span><span class="p">)</span>
<span class="o">+</span> <span class="n">norm</span><span class="p">.</span><span class="n">logpdf</span><span class="p">(</span><span class="mf">5.0</span><span class="p">,</span> <span class="n">latent_values</span><span class="p">[</span><span class="s">"mu"</span><span class="p">],</span> <span class="mf">1.0</span><span class="p">))</span>
<span class="c1"># => -4.267314978843446
</span></code></pre></div></div>
<h2 id="conclusion-and-future-work">Conclusion and future work</h2>
<p>Distributions, variable DAGs, and log density evaluation are the components of a
probabilistic programming language.
The variables can be latent, observed, or constants and each one must be handled
separately in the log density calculation.
We implement these concepts in Python leading to a simple but powerful PPL.</p>
<p>The next steps would be to add support for tensors and transformations of random
variables, in order to express more useful models like linear regression and
hierarchical/mixed-effects models.
Another useful feature would be an API for prior predictive sampling.
Finally, instead of doing the calculations in Python, using a compute-graph
framework like theano/aesara, JAX, or TensorFlow would greatly benefit
performance. A computation graph would also allow us to calculate the gradient
of the log density via reverse-mode automatic differentiation, which is needed
for advanced samplers like Hamiltonian Monte Carlo.</p>
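<p>Until then, we can at least approximate that gradient numerically. A rough
stand-in for automatic differentiation, using central finite differences and the
model defined above (with <code class="language-plaintext highlighter-rouge">observed=5.0</code>):</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def grad_log_density(mu_value, eps=1e-6):
    # central finite-difference approximation of d/dmu of the joint log density
    f = lambda m: evaluate_log_density(y_bar, {"mu": m})
    return (f(mu_value + eps) - f(mu_value - eps)) / (2 * eps)

print(grad_log_density(4.0))
# => ~0.84, matching the analytic -mu/25 + (5 - mu) at mu = 4
</code></pre></div></div>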
<h2 id="bonus-posterior-grid-approximation">Bonus: posterior grid approximation</h2>
<p>We have not talked about what the log density is useful for.
One example would be to find the mode of the posterior distribution, i.e.
finding the most likely value for our parameters.</p>
<p>In this case the observed sample mean is \(1.5\), which will be moved a little
towards \(0\) by the Normal zero-mean prior. This means that the Maximum A
Posteriori (MAP) estimate will be around \(1.4\).</p>
<script type="text/javascript" src="https://cdn.jsdelivr.net/npm//vega@5"></script>
<script type="text/javascript" src="https://cdn.jsdelivr.net/npm//vega-lite@4.17.0"></script>
<script type="text/javascript" src="https://cdn.jsdelivr.net/npm//vega-embed@6"></script>
<div style="
display: flex;
flex-direction: row;
flex-wrap: nowrap;
align-content: center;
justify-content: center;
">
<div id="vis"></div>
</div>
<script>
(function(vegaEmbed) {
var spec = {"config": {"view": {"continuousWidth": 240, "continuousHeight": 180}, "axis": {"labelFontSize": 16, "titleFontSize": 16}}, "data": {"name": "data-2a00cc4b505d4c82994537b3cc056b7e"}, "mark": {"type": "line", "point": true}, "encoding": {"x": {"axis": {"title": "mu"}, "field": "grid", "type": "quantitative"}, "y": {"axis": {"title": "log density"}, "field": "evaluations", "type": "quantitative"}}, "selection": {"selector005": {"type": "interval", "bind": "scales", "encodings": ["x", "y"]}}, "$schema": "https://vega.github.io/schema/vega-lite/v4.17.0.json", "datasets": {"data-2a00cc4b505d4c82994537b3cc056b7e": [{"grid": -4.0, "evaluations": -18.892314978843448}, {"grid": -3.5789473684210527, "evaluations": -16.601345449757574}, {"grid": -3.1578947368421053, "evaluations": -14.494752651973638}, {"grid": -2.736842105263158, "evaluations": -12.572536585491644}, {"grid": -2.3157894736842106, "evaluations": -10.834697250311589}, {"grid": -1.8947368421052633, "evaluations": -9.281234646433475}, {"grid": -1.473684210526316, "evaluations": -7.912148773857297}, {"grid": -1.0526315789473686, "evaluations": -6.727439632583058}, {"grid": -0.6315789473684212, "evaluations": -5.727107222610759}, {"grid": -0.2105263157894739, "evaluations": -4.911151543940399}, {"grid": 0.21052631578947345, "evaluations": -4.2795725965719775}, {"grid": 0.6315789473684212, "evaluations": -3.8323703805054956}, {"grid": 1.0526315789473681, "evaluations": -3.5695448957409526}, {"grid": 1.473684210526315, "evaluations": -3.491096142278349}, {"grid": 1.8947368421052628, "evaluations": -3.5970241201176836}, {"grid": 2.3157894736842106, "evaluations": -3.887328829258958}, {"grid": 2.7368421052631575, "evaluations": -4.362010269702171}, {"grid": 3.1578947368421044, "evaluations": -5.021068441447322}, {"grid": 3.578947368421052, "evaluations": -5.864503344494414}, {"grid": 4.0, "evaluations": -6.892314978843445}]}};var embedOpt = {"mode": "vega-lite"};
function showError(el, error){
el.innerHTML = ('<div class="error" style="color:red;">'
+ '<p>JavaScript Error: ' + error.message + '</p>'
+ "<p>This usually means there's a typo in your chart specification. "
+ "See the javascript console for the full traceback.</p>"
+ '</div>');
throw error;
}
const el = document.getElementById('vis');
vegaEmbed("#vis", spec, embedOpt)
.catch(error => showError(el, error));
})(vegaEmbed);
</script>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="n">pd</span>
<span class="kn">import</span> <span class="nn">altair</span> <span class="k">as</span> <span class="n">alt</span>
<span class="kn">from</span> <span class="nn">smolppl</span> <span class="kn">import</span> <span class="n">Normal</span><span class="p">,</span> <span class="n">LatentVariable</span><span class="p">,</span> <span class="n">ObservedVariable</span><span class="p">,</span>
<span class="n">evaluate_log_density</span>
<span class="c1"># Define model
# Weakly informative mean prior
</span><span class="n">mu</span> <span class="o">=</span> <span class="n">LatentVariable</span><span class="p">(</span><span class="s">"mu"</span><span class="p">,</span> <span class="n">Normal</span><span class="p">,</span> <span class="p">[</span><span class="mf">0.0</span><span class="p">,</span> <span class="mf">5.0</span><span class="p">])</span>
<span class="c1"># Observation model. I make some observations y_1, y_2, ..., y_n and compute the
# sample mean y_bar. It is given that the sample mean has standard deviation 1.
</span><span class="n">y_bar</span> <span class="o">=</span> <span class="n">ObservedVariable</span><span class="p">(</span><span class="s">"y_bar"</span><span class="p">,</span> <span class="n">Normal</span><span class="p">,</span> <span class="p">[</span><span class="n">mu</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">],</span> <span class="n">observed</span><span class="o">=</span><span class="mf">1.5</span><span class="p">)</span>
<span class="c1"># Grid approximation for the posterior
# Since the prior has mean 0, and the observations have some uncertainty, I
# expect the mode to be a bit smaller than 1.5. Something like 1.4
</span><span class="n">grid</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">linspace</span><span class="p">(</span><span class="o">-</span><span class="mi">4</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">20</span><span class="p">)</span>
<span class="n">evaluations</span> <span class="o">=</span> <span class="p">[</span><span class="n">evaluate_log_density</span><span class="p">(</span><span class="n">y_bar</span><span class="p">,</span> <span class="p">{</span><span class="s">"mu"</span><span class="p">:</span> <span class="n">mu</span><span class="p">})</span> <span class="k">for</span> <span class="n">mu</span> <span class="ow">in</span> <span class="n">grid</span><span class="p">]</span>
<span class="c1"># Plotting
</span><span class="n">data</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">({</span><span class="s">"grid"</span><span class="p">:</span> <span class="n">grid</span><span class="p">,</span> <span class="s">"evaluations"</span><span class="p">:</span> <span class="n">evaluations</span><span class="p">})</span>
<span class="n">chart</span> <span class="o">=</span> <span class="n">alt</span><span class="p">.</span><span class="n">Chart</span><span class="p">(</span><span class="n">data</span><span class="p">).</span><span class="n">mark_line</span><span class="p">(</span><span class="n">point</span><span class="o">=</span><span class="bp">True</span><span class="p">).</span><span class="n">encode</span><span class="p">(</span>
<span class="n">x</span><span class="o">=</span><span class="n">alt</span><span class="p">.</span><span class="n">X</span><span class="p">(</span><span class="s">'grid'</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="n">alt</span><span class="p">.</span><span class="n">Axis</span><span class="p">(</span><span class="n">title</span><span class="o">=</span><span class="s">"mu"</span><span class="p">)),</span>
<span class="n">y</span><span class="o">=</span><span class="n">alt</span><span class="p">.</span><span class="n">Y</span><span class="p">(</span><span class="s">'evaluations'</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="n">alt</span><span class="p">.</span><span class="n">Axis</span><span class="p">(</span><span class="n">title</span><span class="o">=</span><span class="s">"log density"</span><span class="p">))</span>
<span class="p">).</span><span class="n">interactive</span><span class="p">().</span><span class="n">configure_axis</span><span class="p">(</span>
<span class="n">labelFontSize</span><span class="o">=</span><span class="mi">16</span><span class="p">,</span>
<span class="n">titleFontSize</span><span class="o">=</span><span class="mi">16</span>
<span class="p">)</span>
<span class="n">chart</span>
</code></pre></div></div>Bayesian linear regression with conjugate priors2020-09-28T00:00:00+00:002020-09-28T00:00:00+00:00https://mrandri19.github.io/2020/09/28/bayesian-linear-regression-with-conjugate-priors<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex@0.12.0/dist/katex.min.css" integrity="sha384-AfEj0r4/OFrOo5t7NnNe46zW/tFgW6x/bCJG8FqQCEo3+Aro6EYUG4+cU+KJWu/X" crossorigin="anonymous" />
<script defer="" src="https://cdn.jsdelivr.net/npm/katex@0.12.0/dist/katex.min.js" integrity="sha384-g7c+Jr9ZivxKLnZTDUhnkOnsh30B4H0rpLUpJ4jAIKs4fnJI+sEnkvrMWph2EDg4" crossorigin="anonymous"></script>
<script defer="" src="https://cdn.jsdelivr.net/npm/katex@0.12.0/dist/contrib/auto-render.min.js" integrity="sha384-mll67QQFJfxn0IYznZYonOWZ644AWYC+Pt2cHqMaRhXVrursRwvLnLaebdGIlYNa" crossorigin="anonymous"></script>
<script>
document.addEventListener("DOMContentLoaded", function() {
renderMathInElement(document.querySelector('article'), {delimiters: [
{left: "\\[", right: "\\[", display: true},
{left: "$", right: "$", display: false},
{left: "\\(", right: "\\)", display: false},
{left: "\\[", right: "\\]", display: true}
]})
});
</script>
<script>
window.WebFontConfig = {
custom: {
families: ['KaTeX_AMS', 'KaTeX_Caligraphic:n4,n7', 'KaTeX_Fraktur:n4,n7',
'KaTeX_Main:n4,n7,i4,i7', 'KaTeX_Math:i4,i7', 'KaTeX_Script',
'KaTeX_SansSerif:n4,n7,i4', 'KaTeX_Size1', 'KaTeX_Size2', 'KaTeX_Size3',
'KaTeX_Size4', 'KaTeX_Typewriter'],
},
};
</script>
<script defer="" src="https://cdn.jsdelivr.net/npm/webfontloader@1.6.28/webfontloader.js" integrity="sha256-4O4pS1SH31ZqrSO2A/2QJTVjTPqVe+jnYgOWUVr7EEc=" crossorigin="anonymous"></script>
<blockquote>
<h3 id="target-readerrequired-knowledge">Target Reader/Required Knowledge</h3>
<p>This post is an introduction to conjugate priors in the context of linear regression. Conjugate priors are a technique from Bayesian statistics/machine learning.</p>
<p>The reader is expected to have some basic knowledge of Bayes’ theorem, basic probability (conditional probability and chain rule), machine learning and a pinch of matrix algebra.</p>
<p>In addition the code will be in the Julia language, but it can be easily translated to Python/R/MATLAB.</p>
</blockquote>
<h2 id="quick-bayesian-refresher">Quick Bayesian refresher</h2>
<p>Ever since the advent of computers, Bayesian methods have become more and more important in the fields of Statistics and Engineering. The main case for using these techniques is to reason about the uncertainty of an inference. In particular, we can use prior information about our model, together with new information coming from the data, to update our beliefs and obtain better knowledge about the observed phenomenon.</p>
<p>Bayes’ theorem, viewed from a Machine Learning perspective, can be written as:</p>
<p>\[
p(\theta \mid \mathcal{D}) = \frac{p(\mathcal{D}\mid \theta) p(\theta)}{p(\mathcal{D})}
\]</p>
<p>where $\theta$ are the parameters of the model which, we believe, has generated our data $\mathcal{D}$. Thanks to Bayes’ theorem, given our data $\mathcal{D}$, we can learn the distribution of the parameters $\theta$.</p>
<p>Now, let’s examine each term of the first equation:</p>
<ul>
<li>
<p>$p(\theta\mid \mathcal{D})$ is called <strong>posterior</strong>. It represents how much we know about the parameters of the model <em>after</em> seeing the data.</p>
</li>
<li>
<p>$p(\mathcal{D}\mid \theta)$ is called <strong>likelihood</strong>. It represents how <em>likely</em> it is to see the data, had that data been generated by our model using parameters $\theta$.</p>
</li>
<li>
<p>$p(\theta)$ is called <strong>prior</strong>. It represents our beliefs about the parameters <em>before</em> seeing any data.</p>
</li>
<li>
<p>$p(\mathcal{D})$ is called <strong>model evidence</strong> or <strong>marginal likelihood</strong>. It represents the probability of observing our data <em>without any</em> assumption about the parameters of our model. It does not depend on $\theta$ and thus evaluates to just a constant. Because it is constant and expensive to compute, it is generally ignored.</p>
</li>
</ul>
<p>Ignoring the marginal likelihood $p(\mathcal{D})$ we usually write Bayes’ theorem as:</p>
<p>\[
p(\theta \mid \mathcal{D}) \propto p(\mathcal{D}\mid \theta) p(\theta)
\]</p>
<p>The $\propto$ symbol means <strong>proportional to</strong>, i.e. equal except for a normalizing constant.</p>
<h2 id="multivariate-linear-regression">Multivariate linear regression</h2>
<h3 id="motivation">Motivation</h3>
<p>Let’s consider the problem of multivariate linear regression. This means that
we want to find the best set of intercept and slopes to minimize the distance between
our linear model’s predictions and the actual data. But it doesn’t end there: we may be interested
in getting some estimates of the uncertainty of our model, e.g. a confidence interval
for each parameter.</p>
<p>Another feature we might be interested in is supporting <strong>streaming</strong> data. When deploying
our algorithm, we may have only had the opportunity to train it on a small quantity
of data compared to what our users create every day, and we want our system to react
to new emerging behaviours of the users without retraining.</p>
<h3 id="problem-setting">Problem setting</h3>
<p>Our data $\mathcal{D}=\{X,Y\}$ contains the <strong>predictors</strong> (or design matrix) $X \in \mathbb{R}^{n \times d}$, and the <strong>response</strong> $Y \in \mathbb{R}^{n\times 1}$.</p>
<p>$n$ is the number of observations and $d$ is the number of features.</p>
<p>A single observation (a row of $X$) is called $x_i \in \mathbb{R}^{1 \times d}, i \in 1,..,n$, and a single response is $y_i \in \mathbb{R}$.</p>
<p>Our model will be $Y = X\beta + \epsilon$ where $\epsilon \sim \mathcal{N}(0,\sigma^2 I)$ is the noise. This can be rewritten as $Y \sim \mathcal{N}(X\beta, \sigma^2 I)$ thus having an $n$-dimensional multivariate Normal distribution.</p>
<h3 id="likelihood">Likelihood</h3>
<p>Let’s write the likelihood for multivariate linear regression, i.e. how likely it is to observe the data $\mathcal{D}$, given a certain linear model specified by $\beta$.</p>
<p>\[p(\mathcal{D}\mid \theta) = p((X,Y)\mid \beta) = p(Y=\mathcal{N}(X\beta,\sigma^2I)) = (2\pi\sigma^2)^{-n/2}\exp\left\{-\frac{1}{2\sigma^2}(Y-X\beta)^T(Y-X\beta)\right\}\]</p>
<p>The last expression was obtained by substituting the Gaussian PDF with mean $\mu=X\beta$ and covariance matrix $\Sigma=\sigma^2 I$.</p>
<p>Also, since all of the observations $X, Y$ are I.I.D. we can factorize the likelihood as:</p>
<p>\[p(\mathcal{D}\mid \theta) = p((X,Y)\mid \beta) = \prod\limits_{i=1}^{n} \mathcal{N}(y_i \mid x_i\beta, \sigma^2)\]</p>
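<p>In practice one usually works with the log-likelihood, which turns the product into a sum and makes the connection with least squares explicit:</p>
<p>\[
\log p(\mathcal{D}\mid \theta) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - x_i\beta)^2
\]</p>
<p>Maximizing this expression over $\beta$ is exactly ordinary least squares.</p>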
<h4 id="visualization">Visualization</h4>
<p>How can we visualize this distribution? For a single pair $(x_i, y_i)$ (with fixed $\beta$) the multivariate Normal collapses to a univariate Normal density. Plotting this for a grid of values of $x$ and $y$, we can see how the points with the highest probability lie on the line $y=1+2x$, as expected since our parameters are $\beta = \{1, 2\}$. The variance $\sigma^2=1$, which for now we will treat as a known constant, controls how “fuzzy” the resulting plot is.</p>
<div class="codecell">
<div class="input_area">
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">using</span> <span class="n">Plots</span><span class="x">,</span> <span class="n">Distributions</span><span class="x">,</span> <span class="n">LaTeXStrings</span><span class="x">,</span> <span class="n">LinearAlgebra</span><span class="x">,</span> <span class="n">Printf</span>
</code></pre></div> </div>
</div>
</div>
<div class="codecell">
<div class="input_area">
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">β</span> <span class="o">=</span> <span class="x">[</span><span class="mf">1.0</span><span class="x">,</span> <span class="mf">2.0</span><span class="x">]</span>
</code></pre></div> </div>
</div>
<div class="output_area">
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>2-element Array{Float64,1}:
1.0
2.0
</code></pre></div> </div>
</div>
</div>
<div class="codecell">
<div class="input_area">
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">heatmap</span><span class="x">(</span>
<span class="o">-</span><span class="mi">3</span><span class="o">:</span><span class="mf">2e-2</span><span class="o">:</span><span class="mi">3</span><span class="x">,</span>
<span class="o">-</span><span class="mi">3</span><span class="o">:</span><span class="mf">2e-2</span><span class="o">:</span><span class="mi">3</span><span class="x">,</span>
<span class="x">(</span><span class="n">x</span><span class="x">,</span><span class="n">y</span><span class="x">)</span><span class="o">-></span><span class="n">pdf</span><span class="x">(</span><span class="n">Normal</span><span class="x">([</span><span class="mi">1</span> <span class="n">x</span><span class="x">]</span><span class="n">⋅β</span><span class="x">,</span> <span class="mi">1</span><span class="x">),</span><span class="n">y</span><span class="x">),</span>
<span class="n">xlabel</span><span class="o">=</span><span class="s">L"x"</span><span class="x">,</span>
<span class="n">ylabel</span><span class="o">=</span><span class="s">L"y"</span><span class="x">,</span>
<span class="n">title</span><span class="o">=</span><span class="s">L"p\;(\mathcal{D}=(x,y)\mid</span><span class="se">\b</span><span class="s">eta=\{1,2\})"</span>
<span class="x">)</span>
</code></pre></div> </div>
</div>
<div class="output_area">
<p style="text-align: center;"><img src="/assets/images/bayesian-linear-regression-with-conjugate-priors/Blog Post Test_files/output_13_0.svg" alt="svg" /></p>
</div>
</div>
<p>Alternatively, we can plot how likely each combination of weights is, given a certain point $(x_i, y_i)$. Notice how, for a single point, many combinations of slope $\beta_1$ and intercept $\beta_0$ are possible. Also notice how these combinations are distributed along a line: if you increase the intercept, the slope has to go down.</p>
<div class="codecell">
<div class="input_area">
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">heatmap</span><span class="x">(</span>
<span class="o">-</span><span class="mi">3</span><span class="o">:</span><span class="mf">2e-2</span><span class="o">:</span><span class="mi">3</span><span class="x">,</span>
<span class="o">-</span><span class="mi">3</span><span class="o">:</span><span class="mf">2e-2</span><span class="o">:</span><span class="mi">3</span><span class="x">,</span>
<span class="x">(</span><span class="n">β0</span><span class="x">,</span><span class="n">β1</span><span class="x">)</span><span class="o">-></span><span class="n">pdf</span><span class="x">(</span><span class="n">Normal</span><span class="x">([</span><span class="mi">1</span> <span class="mi">1</span><span class="x">]</span><span class="n">⋅</span><span class="x">[</span><span class="n">β0</span><span class="x">,</span> <span class="n">β1</span><span class="x">],</span> <span class="mi">1</span><span class="x">),</span><span class="mi">2</span><span class="x">),</span>
<span class="n">xlabel</span><span class="o">=</span><span class="s">L"</span><span class="se">\b</span><span class="s">eta_0"</span><span class="x">,</span>
<span class="n">ylabel</span><span class="o">=</span><span class="s">L"</span><span class="se">\b</span><span class="s">eta_1"</span><span class="x">,</span>
<span class="n">title</span><span class="o">=</span><span class="s">L"p\;(\mathcal{D}=(x=1,y=2)\mid</span><span class="se">\b</span><span class="s">eta=\{</span><span class="se">\b</span><span class="s">eta_0,</span><span class="se">\b</span><span class="s">eta_1\})"</span>
<span class="x">)</span>
</code></pre></div> </div>
</div>
<div class="output_area">
<p style="text-align: center;"><img src="/assets/images/bayesian-linear-regression-with-conjugate-priors/Blog Post Test_files/output_15_0.svg" alt="svg" /></p>
</div>
</div>
<h3 id="conjugate-priors">Conjugate priors</h3>
<p>Now comes the question of what our prior should look like and how to combine it with the likelihood to obtain a posterior.
We could just use a uniform prior, as we have no idea of how our $\beta$ are distributed. This is what Vincent D. Warmerdam does in his <a href="https://koaning.io/posts/bayesian-propto-streaming/">excellent post</a> on this topic.</p>
<p>Another option is to use what is called a <strong>conjugate prior</strong>, that is, a specially chosen prior distribution such that, when multiplied with the likelihood, the resulting posterior distribution belongs to the same family as the prior. Why would we want to do so? The main reason here is <em>speed</em>. Since we know the analytic expression for our posterior, almost no calculations need to be performed: it’s just a matter of computing the new distribution’s parameters. This is a breath of fresh air compared with the high cost of the Markov Chain Monte Carlo methods usually used to compute these posteriors. This speed allows us to consider using Bayesian methods in high-throughput streaming contexts.</p>
<p>How do we find these pairs of likelihoods and priors? The usual approach is to look at the likelihood’s algebraic form and come up with a prior PDF similar enough that the posterior stays in the same family. We don’t need to do all of this work ourselves: we can just look on <a href="https://en.wikipedia.org/wiki/Conjugate_prior">Wikipedia</a> or <a href="https://www.cs.ubc.ca/~murphyk/Papers/bayesGauss.pdf">other</a> <a href="http://www.biostat.umn.edu/~ph7440/pubh7440/BayesianLinearModelGoryDetails.pdf">sources</a>.</p>
<h4 id="multivariate-normal-prior">Multivariate Normal prior</h4>
<p>For a Normal likelihood with known variance, the conjugate prior is another Normal distribution with parameters $\mu_\beta$ and $\Sigma_\beta$. The parameter $\mu_\beta$ describes the initial values for $\beta$ and $\Sigma_\beta$ describes how uncertain we are of these values.</p>
<p>\[p(\theta) = p(\beta) = \mathcal{N}(\mu_\beta, \Sigma_\beta)\]</p>
<div class="codecell">
<div class="input_area">
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">heatmap</span><span class="x">(</span>
<span class="o">-</span><span class="mi">3</span><span class="o">:</span><span class="mf">2e-2</span><span class="o">:</span><span class="mi">3</span><span class="x">,</span>
<span class="o">-</span><span class="mi">3</span><span class="o">:</span><span class="mf">2e-2</span><span class="o">:</span><span class="mi">3</span><span class="x">,</span>
<span class="x">(</span><span class="n">x</span><span class="x">,</span><span class="n">y</span><span class="x">)</span><span class="o">-></span><span class="n">pdf</span><span class="x">(</span><span class="n">MvNormal</span><span class="x">([</span><span class="mf">0.5</span><span class="x">,</span> <span class="mf">1.5</span><span class="x">],</span> <span class="n">I</span><span class="x">(</span><span class="mi">2</span><span class="x">)),[</span><span class="n">x</span><span class="x">,</span><span class="n">y</span><span class="x">]),</span>
<span class="n">xlabel</span><span class="o">=</span><span class="s">L"</span><span class="se">\b</span><span class="s">eta_0"</span><span class="x">,</span>
<span class="n">ylabel</span><span class="o">=</span><span class="s">L"</span><span class="se">\b</span><span class="s">eta_0"</span><span class="x">,</span>
<span class="n">title</span><span class="o">=</span><span class="s">L"p\;(</span><span class="se">\b</span><span class="s">eta\mid\mu_</span><span class="se">\b</span><span class="s">eta=\{0.5, 1.5\}, \Sigma_</span><span class="se">\b</span><span class="s">eta=I\;)"</span>
<span class="x">)</span>
</code></pre></div> </div>
</div>
<div class="output_area">
<p style="text-align: center;"><img src="/assets/images/bayesian-linear-regression-with-conjugate-priors/Blog Post Test_files/output_22_0.svg" alt="svg" /></p>
</div>
</div>
<p>Using this prior, the formula for our posterior now looks like this:</p>
<p>\[p(\beta \mid (X,Y)) \propto p((X,Y)\mid \beta) p(\beta)\]</p>
<p>\[p(\beta \mid (X,Y)) \propto \mathcal{N}(Y \mid X\beta,\sigma^2 I) \times \mathcal{N}(\beta \mid \mu_\beta,\Sigma_\beta) \propto \mathcal{N}(\beta \mid \mu_\beta^{new},\Sigma_\beta^{new})\]</p>
<p>The posterior only depends on $\mu_\beta^{new}$ and $\Sigma_{\beta}^{new}$, which can be calculated from the prior and the newly observed data. Recall that $\sigma^2$ is the variance of the data model’s noise. (A note on the convention used here: these formulas correspond to expressing the prior covariance in units of $\sigma^2$, i.e. the effective prior is $\mathcal{N}(\mu_\beta, \sigma^2\Sigma_\beta)$; with that reading they match the standard Gaussian conjugate update.)</p>
<p>\[
\Sigma_\beta^{new} = (\Sigma_\beta^{-1} + X^TX)^{-1} \sigma^2
\]</p>
<p>\[
\mu_\beta^{new} = (\Sigma_\beta^{-1} + X^TX)^{-1} (\Sigma_\beta^{-1}\mu_\beta + X^TY)
\]</p>
<p>Since matrix inversions and multiplications have cubic time complexity, each update will cost us $O(d^3)$ where $d$ is the number of features.</p>
<h3 id="julia-implementation">Julia implementation</h3>
<p>We can now proceed to the implementation. I chose the Julia language because of its excellent speed and scientific libraries. Also I like shiny things and Julia is much newer than Python/R/MATLAB.</p>
<p>First, we generate the data which we will use to verify the implementation of the algorithm. Notice how we save the variance $\sigma^2$, which we will treat as a known constant and use when updating our prior. There are ways to estimate it from the data, e.g. by using a Normal-Inverse-Chi-Squared prior, which we will examine in a future blog post.</p>
<h4 id="training-data">Training data</h4>
<div class="codecell">
<div class="input_area">
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">n</span> <span class="o">=</span> <span class="mi">100</span>
<span class="n">X</span> <span class="o">=</span> <span class="x">[</span><span class="n">ones</span><span class="x">(</span><span class="n">n</span><span class="x">)</span> <span class="mi">1</span><span class="o">:</span><span class="n">n</span><span class="x">]</span>
<span class="n">β</span> <span class="o">=</span> <span class="x">[</span><span class="o">-</span><span class="mf">13.0</span><span class="x">,</span> <span class="mf">42.0</span><span class="x">]</span>
<span class="n">σ²</span> <span class="o">=</span> <span class="mi">250</span><span class="o">^</span><span class="mi">2</span>
<span class="n">Y</span> <span class="o">=</span> <span class="n">X</span><span class="o">*</span><span class="n">β</span> <span class="o">+</span> <span class="n">sqrt</span><span class="x">(</span><span class="n">σ²</span><span class="x">)</span><span class="o">*</span><span class="n">randn</span><span class="x">(</span><span class="n">n</span><span class="x">)</span>
<span class="n">scatter</span><span class="x">(</span><span class="mi">1</span><span class="o">:</span><span class="n">n</span><span class="x">,</span> <span class="n">Y</span><span class="x">,</span> <span class="n">xlabel</span><span class="o">=</span><span class="s">"X"</span><span class="x">,</span> <span class="n">ylabel</span><span class="o">=</span><span class="s">"Y"</span><span class="x">,</span> <span class="n">title</span><span class="o">=</span><span class="s">"Training data"</span><span class="x">,</span> <span class="n">legend</span><span class="o">=</span><span class="nb">false</span><span class="x">)</span>
</code></pre></div> </div>
</div>
<div class="output_area">
<p style="text-align: center;"><img src="/assets/images/bayesian-linear-regression-with-conjugate-priors/Blog Post Test_files/output_30_0.svg" alt="svg" /></p>
</div>
</div>
<h4 id="prior">Prior</h4>
<p>First of all, using <code class="language-plaintext highlighter-rouge">MvNormal</code> from the <code class="language-plaintext highlighter-rouge">Distributions</code> package, let’s define our prior.</p>
<div class="codecell">
<div class="input_area">
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">μᵦ</span> <span class="o">=</span> <span class="x">[</span><span class="mf">0.0</span><span class="x">,</span> <span class="mf">0.0</span><span class="x">]</span>
<span class="n">Σᵦ</span> <span class="o">=</span> <span class="x">[</span><span class="mf">1.0</span> <span class="mf">0.0</span>
<span class="mf">0.0</span> <span class="mf">1.0</span><span class="x">]</span>
<span class="n">prior</span> <span class="o">=</span> <span class="n">MvNormal</span><span class="x">(</span><span class="n">μᵦ</span><span class="x">,</span> <span class="n">Σᵦ</span><span class="x">)</span>
</code></pre></div> </div>
</div>
<div class="output_area">
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>FullNormal(
dim: 2
μ: [0.0, 0.0]
Σ: [1.0 0.0; 0.0 1.0]
)
</code></pre></div> </div>
</div>
</div>
<h4 id="update-step-definition">Update step definition</h4>
<p>Then, using the posterior hyperparameter update formulas, let’s implement the update function. The update function takes a prior and our data, and returns the posterior distribution. Notice how, thanks to Julia’s Unicode support, our code can closely resemble the math.</p>
<p>\[
\Sigma_\beta^{new} = (\Sigma_\beta^{-1} + X^TX)^{-1} \sigma^2
\]</p>
<p>\[
\mu_\beta^{new} = (\Sigma_\beta^{-1} + X^TX)^{-1} (\Sigma_\beta^{-1}\mu_\beta + X^TY)
\]</p>
<div class="codecell">
<div class="input_area">
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">function</span><span class="nf"> update</span><span class="x">(</span><span class="n">prior</span><span class="x">,</span> <span class="n">X</span><span class="x">,</span> <span class="n">Y</span><span class="x">,</span> <span class="n">σ²</span><span class="x">)</span>
<span class="n">μᵦ</span> <span class="o">=</span> <span class="n">mean</span><span class="x">(</span><span class="n">prior</span><span class="x">)</span>
<span class="n">Σᵦ</span> <span class="o">=</span> <span class="n">cov</span><span class="x">(</span><span class="n">prior</span><span class="x">)</span>
<span class="n">Σᵦ_new</span> <span class="o">=</span> <span class="x">(</span><span class="n">Σᵦ</span><span class="o">^-</span><span class="mi">1</span> <span class="o">+</span> <span class="n">X</span><span class="err">'</span><span class="n">X</span><span class="x">)</span><span class="o">^-</span><span class="mi">1</span> <span class="o">*</span> <span class="n">σ²</span>
<span class="n">μᵦ_new</span> <span class="o">=</span> <span class="x">(</span><span class="n">Σᵦ</span><span class="o">^-</span><span class="mi">1</span> <span class="o">+</span> <span class="n">X</span><span class="err">'</span><span class="n">X</span><span class="x">)</span><span class="o">^-</span><span class="mi">1</span> <span class="o">*</span> <span class="x">(</span><span class="n">Σᵦ</span><span class="o">^-</span><span class="mi">1</span><span class="o">*</span><span class="n">μᵦ</span> <span class="o">+</span> <span class="n">X</span><span class="err">'</span><span class="n">Y</span><span class="x">)</span>
<span class="n">posterior</span> <span class="o">=</span> <span class="n">MvNormal</span><span class="x">(</span><span class="n">μᵦ_new</span><span class="x">,</span> <span class="kt">Symmetric</span><span class="x">(</span><span class="n">Σᵦ_new</span><span class="x">))</span>
<span class="k">return</span> <span class="n">posterior</span>
<span class="k">end</span>
</code></pre></div> </div>
</div>
<div class="output_area">
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>update (generic function with 1 method)
</code></pre></div> </div>
</div>
</div>
<h4 id="training">Training</h4>
<div class="codecell">
<div class="input_area">
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">posterior</span> <span class="o">=</span> <span class="n">update</span><span class="x">(</span><span class="n">prior</span><span class="x">,</span> <span class="n">X</span><span class="x">,</span> <span class="n">Y</span><span class="x">,</span> <span class="n">σ²</span><span class="x">)</span>
</code></pre></div> </div>
</div>
<div class="output_area">
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>FullNormal(
dim: 2
μ: [29.797457399445193, 40.716620886842236]
Σ: [2438.8256259319187 -36.40027489487599; -36.40027489487599 0.7280054978975199]
)
</code></pre></div> </div>
</div>
</div>
<p>Let’s extract the estimates, along with their standard errors, from the posterior.</p>
<div class="codecell">
<div class="input_area">
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@printf</span><span class="x">(</span><span class="s">"β₀: %.3f ± %.3f (2σ)</span><span class="se">\n</span><span class="s">"</span><span class="x">,</span> <span class="n">mean</span><span class="x">(</span><span class="n">posterior</span><span class="x">)[</span><span class="mi">1</span><span class="x">],</span> <span class="mi">2</span><span class="o">*</span><span class="n">sqrt</span><span class="x">(</span><span class="n">cov</span><span class="x">(</span><span class="n">posterior</span><span class="x">)[</span><span class="mi">1</span><span class="x">,</span><span class="mi">1</span><span class="x">]))</span>
<span class="nd">@printf</span><span class="x">(</span><span class="s">"β₁: %.3f ± %.3f (2σ)</span><span class="se">\n</span><span class="s">"</span><span class="x">,</span> <span class="n">mean</span><span class="x">(</span><span class="n">posterior</span><span class="x">)[</span><span class="mi">2</span><span class="x">],</span> <span class="mi">2</span><span class="o">*</span><span class="n">sqrt</span><span class="x">(</span><span class="n">cov</span><span class="x">(</span><span class="n">posterior</span><span class="x">)[</span><span class="mi">2</span><span class="x">,</span><span class="mi">2</span><span class="x">]))</span>
</code></pre></div> </div>
</div>
<div class="output_area">
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>β₀: 29.797 ± 98.769 (2σ)
β₁: 40.717 ± 1.706 (2σ)
</code></pre></div> </div>
</div>
</div>
<p>We can see how the parameters we used to generate the data ($-13, 42$) are well within
the $2\sigma$ confidence intervals of our estimates.</p>
<h3 id="posterior-predictive">Posterior predictive</h3>
<p>To use our posterior in a predictive setting, we need the <em>posterior predictive</em> distribution, which can be obtained with the following formula:</p>
<p>\[
p(y_i\mid x_i, \mathcal{D}) = \mathcal{N}(x_i\mu_\beta^{new},\; \sigma^2 + x_i\Sigma_\beta^{new}x_i^T)
\]</p>
<p>where $x_i$ is the feature (row) vector for a single observation and $y_i$ is the predicted response.</p>
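<p>To see where this formula comes from, write $y_i = x_i\beta + \epsilon_i$ with $\beta \sim \mathcal{N}(\mu_\beta^{new}, \Sigma_\beta^{new})$ and $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$ independent of $\beta$; mean and variance then follow from linearity:</p>
<p>\[
\mathbb{E}[y_i] = x_i\mu_\beta^{new}, \qquad \mathrm{Var}[y_i] = x_i\Sigma_\beta^{new}x_i^T + \sigma^2
\]</p>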
<div class="codecell">
<div class="input_area">
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">plot</span><span class="x">(</span>
<span class="mi">1</span><span class="o">:</span><span class="n">n</span><span class="x">,</span>
<span class="n">X</span><span class="o">*</span><span class="n">mean</span><span class="x">(</span><span class="n">posterior</span><span class="x">),</span>
<span class="n">ribbon</span><span class="o">=</span><span class="mi">2</span><span class="o">*</span><span class="n">sqrt</span><span class="o">.</span><span class="x">(</span>
<span class="n">σ²</span> <span class="o">.+</span> <span class="x">[</span><span class="n">X</span><span class="x">[</span><span class="n">i</span><span class="x">,</span><span class="o">:</span><span class="x">]</span><span class="err">'</span><span class="o">*</span><span class="n">cov</span><span class="x">(</span><span class="n">posterior</span><span class="x">)</span><span class="o">*</span><span class="n">X</span><span class="x">[</span><span class="n">i</span><span class="x">,</span><span class="o">:</span><span class="x">]</span> <span class="k">for</span> <span class="n">i</span> <span class="k">in</span> <span class="mi">1</span><span class="o">:</span><span class="n">n</span><span class="x">]</span>
<span class="x">),</span> <span class="n">xlabel</span><span class="o">=</span><span class="s">"X"</span><span class="x">,</span> <span class="n">ylabel</span><span class="o">=</span><span class="s">"Y"</span><span class="x">,</span> <span class="n">title</span><span class="o">=</span><span class="s">"Posterior"</span><span class="x">,</span>
<span class="n">label</span><span class="o">=</span><span class="s">"Predicted"</span>
<span class="x">)</span>
<span class="n">scatter!</span><span class="x">(</span><span class="mi">1</span><span class="o">:</span><span class="n">n</span><span class="x">,</span> <span class="n">Y</span><span class="x">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"Training data"</span><span class="x">)</span>
</code></pre></div> </div>
</div>
<div class="output_area">
<p style="text-align: center;"><img src="/assets/images/bayesian-linear-regression-with-conjugate-priors/Blog Post Test_files/output_43_0.svg" alt="svg" /></p>
</div>
</div>
<h3 id="sourcesbibliography">Sources/Bibliography</h3>
<ul>
<li><a href="https://maxhalford.github.io/blog/bayesian-linear-regression">https://maxhalford.github.io/blog/bayesian-linear-regression</a></li>
<li><a href="https://www.cs.ubc.ca/~murphyk/Papers/bayesGauss.pdf">https://www.cs.ubc.ca/~murphyk/Papers/bayesGauss.pdf</a></li>
<li><a href="https://koaning.io/posts/bayesian-propto-streaming/">https://koaning.io/posts/bayesian-propto-streaming/</a></li>
<li><a href="http://www.biostat.umn.edu/~ph7440/pubh7440/BayesianLinearModelGoryDetails.pdf">http://www.biostat.umn.edu/~ph7440/pubh7440/BayesianLinearModelGoryDetails.pdf</a></li>
</ul>
<p>Andrea Cognolato</p>
<p><strong>Modern text rendering with Linux: Antialiasing</strong> (2019-08-08) — <a href="https://mrandri19.github.io/2019/08/08/modern-text-rendering-linux-ep2">https://mrandri19.github.io/2019/08/08/modern-text-rendering-linux-ep2</a></p>
<h2 id="introduction">Introduction</h2>
<p>Welcome to part 2 of Modern text rendering in Linux. Check out the other posts
in the series:
<a href="/2019/07/18/modern-text-rendering-linux-ep1.html">part 1</a> and
<a href="/2019/07/24/modern-text-rendering-linux-overview.html">Overview</a>.</p>
<p>In this post I will show how to render a glyph to an image and the differences
between grayscale and LCD (subpixel) antialiasing.</p>
<h2 id="setup">Setup</h2>
<p>I will use the same code, OS, compiler and libraries used in
<a href="/2019/07/18/modern-text-rendering-linux-ep1.html">part 1</a>
and extend the code.</p>
<p>This will be our final result. And <a href="https://gist.github.com/mrandri19/fe5dc2709d761568d749f8125d0f4490">here</a> is the code.</p>
<div style="display: flex; flex-direction: row; justify-content: space-evenly;">
<figure style="display: inline-block;">
<img src="/assets/images/modern-text-rendering-linux-ep2/grayscale.jpg" style="min-height: 8rem;image-rendering: pixelated;image-rendering: crisp-edges; margin: auto;" />
<figcaption style="text-align: center; margin-top: 1rem;">A grayscale antialiased glyph</figcaption>
</figure>
<figure style="display: inline-block;">
<img src="/assets/images/modern-text-rendering-linux-ep2/lcd.jpg" style="min-height: 8rem;image-rendering: pixelated;image-rendering: crisp-edges; margin: auto;" />
<figcaption style="text-align: center; margin-top: 1rem;">A subpixel antialiased glyph</figcaption>
</figure>
</div>
<p>Let’s start right back from where we stopped last time:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>clang <span class="nt">-I</span>/usr/include/freetype2 <span class="se">\</span>
<span class="nt">-I</span>/usr/include/libpng16 <span class="se">\</span>
<span class="nt">-Wall</span> <span class="nt">-Werror</span> <span class="se">\</span>
<span class="nt">-o</span> main <span class="se">\</span>
<span class="nt">-lfreetype</span> <span class="se">\</span>
main.c <span class="o">&&</span> ./main
FreeType<span class="s1">'s version is 2.8.1
.*****.
.********.
.*********
. ***.
***
***
.********
***********
.**. ***
*** ***
*** ***
***. ***
.***********
***********
.*******..
</span></code></pre></div></div>
<h3 id="including-stb_image_write">Including stb_image_write</h3>
<p>To render a glyph to a JPG image we will use nothings’ single-header library
<a href="https://raw.githubusercontent.com/nothings/stb/master/stb_image_write.h">stb_image_write</a>.</p>
<p>In the same folder as <code class="language-plaintext highlighter-rouge">main.c</code> run:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>wget https://raw.githubusercontent.com/nothings/stb/master/stb_image_write.h
</code></pre></div></div>
<p>If wget is not installed run <code class="language-plaintext highlighter-rouge">sudo apt install wget</code> to install it
and then retry the previous command.</p>
<p>In <code class="language-plaintext highlighter-rouge">main.c</code> add the library to the includes.</p>
<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code> #include <freetype2/ft2build.h>
#include FT_FREETYPE_H
<span class="gi">+
+ #define STB_IMAGE_WRITE_IMPLEMENTATION
+ #include "./stb_image_write.h"
+
</span> int main() {
</code></pre></div></div>
<h3 id="rendering-a-grayscale-antialiased-glyph">Rendering a Grayscale Antialiased Glyph</h3>
<p>First of all extract <code class="language-plaintext highlighter-rouge">face->glyph->bitmap</code> to a variable to make the
code tidier.</p>
<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gi">+ FT_Bitmap bitmap = face->glyph->bitmap;
</span><span class="gd">- for (size_t i = 0; i < face->glyph->bitmap.rows; i++) {
- for (size_t j = 0; j < face->glyph->bitmap.width; j++) {
- unsigned char pixel_brightness =
- face->glyph->bitmap.buffer[i * face->glyph->bitmap.pitch + j];
</span><span class="gi">+ for (size_t i = 0; i < bitmap.rows; i++) {
+ for (size_t j = 0; j < bitmap.width; j++) {
+ unsigned char pixel_brightness = bitmap.buffer[i * bitmap.pitch + j];
</span>
if (pixel_brightness > 169) {
printf("*");
} else if (pixel_brightness > 84) {
printf(".");
} else {
printf(" ");
}
}
printf("\n");
}
</code></pre></div></div>
<p>Create a buffer for the image data (remember to <code class="language-plaintext highlighter-rouge">free</code> it at the end).</p>
<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code> FT_Bitmap bitmap = face->glyph->bitmap;
<span class="gi">+
+ unsigned char* data =
+ malloc(bitmap.width * bitmap.rows * sizeof(unsigned char));
+
</span> for (size_t i = 0; i < bitmap.rows; i++) {
</code></pre></div></div>
<p>Delete the printing instructions and copy the pixel data into the buffer.</p>
<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code> for (size_t i = 0; i < bitmap.rows; i++) {
for (size_t j = 0; j < bitmap.width; j++) {
<span class="gd">- unsigned char pixel_brightness = bitmap.buffer[i * bitmap.pitch + j];
</span><span class="gi">+ data[i * bitmap.width + j] = bitmap.buffer[i * bitmap.pitch + j];
+
</span><span class="gd">- if (pixel_brightness > 169) {
- printf("*");
- } else if (pixel_brightness > 84) {
- printf(".");
- } else {
- printf(" ");
- }
</span> }
<span class="gd">- printf("\n");
</span> }
</code></pre></div></div>
<p>Finally, write the image to a JPG file called <code class="language-plaintext highlighter-rouge">image.jpg</code>.</p>
<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code> }
<span class="gi">+
+ stbi_write_jpg("image.jpg", bitmap.width, bitmap.rows, 1, data, 100);
+
</span> return 0;
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">image.jpg</code> should look like this.</p>
<div style="display: flex; flex-direction: row; justify-content: space-evenly;">
<figure style="display: inline-block;">
<img src="/assets/images/modern-text-rendering-linux-ep2/grayscale.jpg" style="min-height: 8rem;image-rendering: pixelated; image-rendering: crisp-edges; margin: auto;" />
</figure>
</div>
<h3 id="rendering-a-lcd-subpixel-antialiased-glyph">Rendering a LCD (Subpixel) Antialiased Glyph</h3>
<p>Changing the antialiasing technique is easy with FreeType.</p>
<p>Add this header to include the LCD filtering functionality.</p>
<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code> #include FT_FREETYPE_H
<span class="gi">+ #include FT_LCD_FILTER_H
</span>
#define STB_IMAGE_WRITE_IMPLEMENTATION
</code></pre></div></div>
<p>Set which LCD filter to use. For a
list of filters check the
<a href="https://www.freetype.org/freetype2/docs/reference/ft2-lcd_rendering.html">FreeType docs</a>
on Subpixel Rendering. We will use the default one.</p>
<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code> FT_Int major, minor, patch;
FT_Library_Version(ft, &major, &minor, &patch);
printf("FreeType's version is %d.%d.%d\n", major, minor, patch);
<span class="gi">+
+ FT_Library_SetLcdFilter(ft, FT_LCD_FILTER_DEFAULT);
+
</span> FT_Face face;
</code></pre></div></div>
<p>Change the render mode to LCD.
<a href="https://www.freetype.org/freetype2/docs/reference/ft2-base_interface.html#ft_render_mode">Here</a>
is a list of the available render modes.</p>
<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gd">- FT_Int32 render_flags = FT_RENDER_MODE_NORMAL;
</span><span class="gi">+ FT_Int32 render_flags = FT_RENDER_MODE_LCD;
</span></code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">image.jpg</code> should look like this.</p>
<div style="display: flex; flex-direction: row; justify-content: space-evenly;">
<figure style="display: inline-block;">
<img src="/assets/images/modern-text-rendering-linux-ep2/thicc.jpg" style="min-height: 8rem;image-rendering: pixelated; image-rendering: crisp-edges; margin: auto;" />
</figure>
</div>
<p>What happened? LCD Antialiasing works by treating each pixel as three
separate light sources, each one capable of emitting either Red, Green, or Blue light.
Grayscale Antialiasing instead treats each pixel as a single light source, emitting
white light.</p>
<p>We will explain how this works in detail later, but for now what we care about
is that LCD AA triples the horizontal resolution of the image.</p>
<p>Let’s change the code so that we see a colored image where each pixel is composed
of three channels: R,G,B.</p>
<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gd">- stbi_write_jpg("image.jpg", bitmap.width, bitmap.rows, 1, data, 100);
</span><span class="gi">+ stbi_write_jpg("image.jpg", bitmap.width / 3, bitmap.rows, 3, data, 100);
</span></code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">image.jpg</code> should look like this.</p>
<div style="display: flex; flex-direction: row; justify-content: space-evenly;">
<figure style="display: inline-block;">
<img src="/assets/images/modern-text-rendering-linux-ep2/lcd.jpg" style="min-height: 8rem;image-rendering: pixelated; image-rendering: crisp-edges; margin: auto;" />
</figure>
</div>
<p>You can find the complete code <a href="https://gist.github.com/mrandri19/fe5dc2709d761568d749f8125d0f4490">here</a>.</p>
<h2 id="how-do-they-work">How do they work?</h2>
<p>Draw on the grids, then click <strong>sample</strong> to see how the different antialiasing techniques work.
Try drawing the same shape in both, for example an ‘A’, and then compare the results.</p>
<div style="
display: flex;
flex-flow: row wrap;
justify-content: space-around;
text-align: center;
margin-bottom: 1.5rem;">
<div>
<p style="margin: auto;">Grayscale</p>
<canvas id="canvas-grayscale" style="display: block;"></canvas>
<button id="sample-grayscale" class="my-button">Sample</button>
<button id="clear-grayscale" class="my-button">Clear</button>
<script src="/assets/js/modern-text-rendering-linux-ep2modern-text-rendering-linux-ep2/grayscale.js"></script>
</div>
<div>
<p style="margin: auto;">LCD</p>
<canvas id="canvas-lcd" style="display: block;"></canvas>
<button id="sample-lcd" class="my-button">Sample</button>
<button id="clear-lcd" class="my-button">Clear</button>
<script src="/assets/js/modern-text-rendering-linux-ep2modern-text-rendering-linux-ep2/lcd.js"></script>
</div>
</div>
<p>This is what you should see after drawing and clicking sample.</p>
<div style="display: flex; flex-direction: row; justify-content: space-evenly;">
<figure style="display: inline-block;">
<img src="/assets/images/modern-text-rendering-linux-ep2/interactive-example-screenshot.png" style="min-height: 8rem;image-rendering: pixelated; image-rendering: crisp-edges; margin: auto;" />
<figcaption style="text-align: center; margin-top: 1rem;"></figcaption>
</figure>
</div>
<h3 id="grayscale-antialiasing">Grayscale Antialiasing</h3>
<div style="display: flex; flex-direction: row; justify-content: space-evenly;">
<figure style="display: inline-block;">
<img src="/assets/images/modern-text-rendering-linux-ep2/rasterization-strategies.png" style="min-height: 8rem;image-rendering: pixelated; image-rendering: crisp-edges; margin: auto;" />
<figcaption style="text-align: center; margin-top: 1rem;">Ideal shape, monochrome and grayscale antialiasing<br />
Image taken from <a href="https://www.smashingmagazine.com/2012/04/a-closer-look-at-font-rendering/">Smashing Magazine</a>
</figcaption>
</figure>
</div>
<p>Grayscale Antialiasing divides the image to render into a grid, then, for
each square in the grid, measures how much of that square’s area is covered by
the shape. If 100% of the square is covered the pixel will have 100%
opacity; if 50% of the square is covered the pixel will be half-transparent.</p>
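<p>As a rough illustration of the idea, here is a supersampling sketch. This is not FreeType’s actual rasterizer, which computes the covered area exactly, and the <code class="language-plaintext highlighter-rouge">inside</code> hit test is hypothetical, standing in for a test against the glyph outline.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/* Estimate the coverage of the pixel at (px, py) by testing a 4x4 grid
 * of sample points. `inside` is a hypothetical hit test returning 1
 * when the point lies inside the shape. */
unsigned char coverage(double px, double py,
                       int (*inside)(double, double)) {
  int hits = 0;
  for (int sy = 0; sy < 4; sy++)
    for (int sx = 0; sx < 4; sx++)
      if (inside(px + (sx + 0.5) / 4.0, py + (sy + 0.5) / 4.0))
        hits++;
  /* 0 covered samples -> fully transparent, 16 -> fully opaque. */
  return (unsigned char)(hits * 255 / 16);
}
</code></pre></div></div>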
<h3 id="lcd-subpixel-antialiasing">LCD (Subpixel) Antialiasing</h3>
<div style="display: flex; flex-direction: row; justify-content: space-evenly;">
<figure style="display: inline-block;">
<img src="/assets/images/modern-text-rendering-linux-ep2/rasterization-subpixel.png" style="min-height: 8rem;image-rendering: pixelated; image-rendering: crisp-edges; margin: auto;" />
<figcaption style="text-align: center; margin-top: 1rem;">An LCD
antialiased glyph, showing an RGB image, showing its individual
subpixels, showing each subpixel's brightness. The white square
represents a single pixel.<br />
Image taken from <a href="https://www.smashingmagazine.com/2012/04/a-closer-look-at-font-rendering/">Smashing Magazine</a>
</figcaption>
</figure>
</div>
<p>LCD Antialiasing exploits the fact that each pixel is made of three independent
light sources, usually thin rectangles, which we call subpixels. By knowing
the order (Red Green Blue or Blue Green Red) in which these subpixels form a pixel, we can turn them on individually
to triple the horizontal resolution.</p>
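<p>In code, the practical consequence is that an LCD-rendered bitmap is three times wider than the final image. A minimal sketch, assuming the common horizontal RGB subpixel order:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include <ft2build.h>
#include FT_FREETYPE_H

/* Copy an LCD-rendered FreeType bitmap into a tightly packed RGB buffer.
 * In FT_RENDER_MODE_LCD the bitmap's width is 3x the number of visible
 * pixels, so consecutive bytes land on the R, G, B channels in turn. */
void lcd_bitmap_to_rgb(const FT_Bitmap *bitmap, unsigned char *rgb) {
  for (unsigned int i = 0; i < bitmap->rows; i++)
    for (unsigned int j = 0; j < bitmap->width; j++)
      rgb[i * bitmap->width + j] = bitmap->buffer[i * bitmap->pitch + j];
}
</code></pre></div></div>
<p>This is exactly what the <code class="language-plaintext highlighter-rouge">stbi_write_jpg</code> call above does implicitly by reading the buffer with width <code class="language-plaintext highlighter-rouge">bitmap.width / 3</code> and 3 channels.</p>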
<p>The main downside of this method is that when rendering an image, you need to know the
order in which subpixels are placed on the screen, which may not be available.
Also, think of a phone screen rotated 90 degrees: the image needs to be re-rendered (or at least re-antialiased)
because the subpixels are now stacked on top of each other instead of side by side.
This is the reason why iOS doesn’t use subpixel rendering, while macOS did until 10.14.</p>
<h2 id="sources">Sources</h2>
<ul>
<li><a href="https://www.smashingmagazine.com/2012/04/a-closer-look-at-font-rendering/">A closer look at font rendering</a> by Smashing Magazine</li>
<li><a href="http://www.puredevsoftware.com/blog/2019/01/22/sub-pixel-gamma-correct-font-rendering/">Sub-pixel, gamma correct, font rendering</a> by Puredev Software</li>
<li><a href="https://web.archive.org/web/20180921225907/http://antigrain.com/research/font_rasterization/index.html#FONT_RASTERIZATION">Font Rasterization</a> by The AGG Project</li>
</ul>
<p>andrea</p>
<p><strong>Modern text rendering with Linux: Overview</strong> (2019-07-24) — <a href="https://mrandri19.github.io/2019/07/24/modern-text-rendering-linux-overview">https://mrandri19.github.io/2019/07/24/modern-text-rendering-linux-overview</a></p>
<h2 id="introduction">Introduction</h2>
<p>Text is an essential part of Human-computer Interaction. In the age of Instagram,
YouTube, and VR we still consume information primarily through text. Computing
started with text-only interfaces and every major Operating System provides
libraries to render text.</p>
<p>Text rendering, despite being ubiquitous, has little up-to-date documentation,
especially on Linux systems. The goal of this post is to give an overview of the
modern Linux text rendering stack and to give the reader an understanding of
the complexity behind it.</p>
<h2 id="overview">Overview</h2>
<figure>
<img src="/assets/images/modern-text-rendering-linux-overview/overview.svg" style="max-width: auto; display: block; margin: auto;" />
<figcaption style="text-align: center; margin-top: 1rem;">The data flow of text rendering.</figcaption>
</figure>
<p>The stack is composed of 4 components:</p>
<ul>
<li><a href="https://www.freedesktop.org/wiki/Software/fontconfig/">Fontconfig</a></li>
<li><a href="https://github.com/fribidi/fribidi">FriBidi</a> or <a href="http://site.icu-project.org/home">LibICU</a></li>
<li><a href="https://www.freedesktop.org/wiki/Software/fontconfig/">HarfBuzz</a></li>
<li><a href="https://www.freetype.org/">FreeType</a></li>
</ul>
<h3 id="fontconfig">Fontconfig</h3>
<figure>
<img src="/assets/images/modern-text-rendering-linux-overview/fontconfig.svg" style="max-width: auto; display: block; margin: auto;" />
</figure>
<p>Fontconfig takes a list of font properties (name, width, weight, style, etc.),
then selects a font file on your system which most closely matches the properties
you requested. It is configured with an XML-based language which allows you to
specify default fonts, whitelist or blacklist them, create bold and italic styles
for incomplete fonts, set the hinting and antialiasing levels and much more.
The <a href="https://wiki.archlinux.org/index.php/Font_configuration">Arch Wiki</a> is an
excellent resource for creating or editing your configuration.</p>
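<p>To make the matching step concrete, here is a minimal sketch using Fontconfig’s C API (the pattern string <code class="language-plaintext highlighter-rouge">Ubuntu Mono:bold</code> is just an example):</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include <fontconfig/fontconfig.h>
#include <stdio.h>

int main() {
  FcInit();

  /* Build a pattern from a human-readable string and fill in defaults. */
  FcPattern *pattern = FcNameParse((const FcChar8 *)"Ubuntu Mono:bold");
  FcConfigSubstitute(NULL, pattern, FcMatchPattern);
  FcDefaultSubstitute(pattern);

  /* Ask for the font file that most closely matches the pattern. */
  FcResult result;
  FcPattern *match = FcFontMatch(NULL, pattern, &result);

  FcChar8 *file = NULL;
  if (match && FcPatternGetString(match, FC_FILE, 0, &file) == FcResultMatch)
    printf("%s\n", (char *)file);

  if (match) FcPatternDestroy(match);
  FcPatternDestroy(pattern);
  FcFini();
  return 0;
}
</code></pre></div></div>
<p>Compiled with the flags from <code class="language-plaintext highlighter-rouge">pkg-config --cflags --libs fontconfig</code>, this prints the path of the font file Fontconfig considers the closest match.</p>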
<h3 id="fribidi-or-libicu">FriBidi or LibICU</h3>
<p>FriBidi and LibICU, among other things, provide an implementation of the <a href="http://www.unicode.org/reports/tr9/">Unicode Bidirectional Algorithm</a>. The Unicode Bidirectional Algorithm allows you to convert text
from its <em>logical/storage</em> order into its <em>visual</em> order. The W3C has a beautiful
<a href="https://www.w3.org/International/articles/inline-bidi-markup/uba-basics">article</a>
on it.</p>
<p>Consider the following example, where Arabic or Hebrew letters are represented by uppercase
English letters and English text is represented by lowercase letters:</p>
<p>In the rendered text “english CIBARA text” (shown below as the visual format), the English letter h, the last letter of “english”, is visually followed by the Arabic letter C, but logically h is followed by the rightmost letter A. The next letter, in logical order,
will be R. In other words, the logical/storage order of the same text would be:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>english ARABIC text
</code></pre></div></div>
<p>But the rendered version also called visual format would be:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>english CIBARA text
</code></pre></div></div>
<p>Because Arabic or Hebrew are Right-To-Left languages whereas English is Left-To-Right.</p>
<p>(Example taken from <a href="http://userguide.icu-project.org/transforms/bidi">http://userguide.icu-project.org/transforms/bidi</a>)</p>
<h3 id="harfbuzz">HarfBuzz</h3>
<figure>
<img src="/assets/images/modern-text-rendering-linux-overview/harfbuzz.svg" style="max-width: auto; display: block; margin: auto;" />
</figure>
<p>HarfBuzz is the text shaping engine: it maps codepoints from a string to their
corresponding glyph indices.
HarfBuzz takes a Unicode string, a font, and three properties: direction, script, and language. It returns a list of glyph indices, each with a position in x, y coordinates, which incorporates kerning data from the font’s GPOS table.</p>
<p>Examples:</p>
<p><code class="language-plaintext highlighter-rouge">Hello, world</code> becomes <code class="language-plaintext highlighter-rouge">43 72 79 79 82 15 3 90 82 85 79 71</code> with font: Ubuntu Mono</p>
<p><code class="language-plaintext highlighter-rouge">-></code> becomes <code class="language-plaintext highlighter-rouge">16 33</code> with font: Ubuntu Mono</p>
<p><code class="language-plaintext highlighter-rouge">-></code> becomes <code class="language-plaintext highlighter-rouge">1603 1064</code> with font: Fira Code Retina</p>
<p>HarfBuzz is essential to correctly handle accents, emojis, ligatures and almost
every language except English.</p>
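<p>As a concrete sketch, a minimal shaping call looks like the following. It assumes an <code class="language-plaintext highlighter-rouge">FT_Face</code> has already been loaded, as in the other posts of this series; it is an illustration, not necessarily the exact code used to produce the examples above.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include <ft2build.h>
#include FT_FREETYPE_H
#include <hb.h>
#include <hb-ft.h>
#include <stdio.h>

/* Shape `text` with an already-loaded FT_Face and print the glyph
 * indices that HarfBuzz produces. */
void shape_and_print(FT_Face face, const char *text) {
  hb_font_t *font = hb_ft_font_create(face, NULL);

  hb_buffer_t *buf = hb_buffer_create();
  hb_buffer_add_utf8(buf, text, -1, 0, -1);
  hb_buffer_set_direction(buf, HB_DIRECTION_LTR);
  hb_buffer_set_script(buf, HB_SCRIPT_LATIN);
  hb_buffer_set_language(buf, hb_language_from_string("en", -1));

  hb_shape(font, buf, NULL, 0);

  unsigned int len;
  hb_glyph_info_t *info = hb_buffer_get_glyph_infos(buf, &len);
  for (unsigned int i = 0; i < len; i++)
    printf("%u ", info[i].codepoint); /* glyph index after shaping */
  printf("\n");

  hb_buffer_destroy(buf);
  hb_font_destroy(font);
}
</code></pre></div></div>
<p>Linked against the flags from <code class="language-plaintext highlighter-rouge">pkg-config --cflags --libs harfbuzz freetype2</code>, this prints a list of glyph indices in the same format as the examples above.</p>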
<h3 id="freetype">FreeType</h3>
<figure>
<img src="/assets/images/modern-text-rendering-linux-overview/freetype.svg" style="max-width: auto; display: block; margin: auto;" />
</figure>
<p>FreeType is the text rendering engine: it takes a glyph index and a font, then renders an image of that glyph. It also provides basic kerning support.
You can configure it by specifying the hinting level and which kind of antialiasing to use, and how much.</p>
<h2 id="sources">Sources</h2>
<ul>
<li><a href="http://behdad.org/text/">State of Text Rendering</a> by Behdad Esfahbod (2010)</li>
<li><a href="https://hal.inria.fr/hal-00821839/document">Higher Quality 2D Text Rendering</a> by Nicolas Rougier (2013)</li>
<li><a href="https://www.slideshare.net/NicolasRougier1/siggraph-2018-digital-typography">Digital Typography - SIGGRAPH 2018</a> by Behdad Esfahbod and Nicolas Rougier (2018)</li>
<li><a href="https://www.youtube.com/watch?v=wzEZhzeRjFk">Linux Font Rendering Stack</a> by Max Harmathy (2018)</li>
</ul>
<p>andrea</p>
<p><strong>Modern text rendering with Linux: Part 1</strong> (2019-07-18) — <a href="https://mrandri19.github.io/2019/07/18/modern-text-rendering-linux-ep1">https://mrandri19.github.io/2019/07/18/modern-text-rendering-linux-ep1</a></p>
<h2 id="introduction">Introduction</h2>
<p>Welcome to part 1 of Modern text rendering in Linux.
In each part of this series we will build a self-contained C program to render a character or sequence of characters.
Each of these programs will implement a feature which I consider essential to achieve state-of-the-art text rendering.</p>
<p>In this first part I will show how to setup FreeType and we will build a console character renderer.</p>
<p><img alt="The 'a' glyph rendered in a terminal emulator console" src="/assets/images/modern-text-rendering-linux-ep1/image.png" style="max-width: 20rem; display: block; margin: auto;" /></p>
<p>This is what you will build. And <a href="https://gist.github.com/mrandri19/245237a6bde246f119ab2b6245e540ba">here</a> is the code.</p>
<h2 id="setup">Setup</h2>
<ul>
<li>My operating system is <code class="language-plaintext highlighter-rouge">Ubuntu 18.04.2 LTS (bionic)</code></li>
<li>My C compiler is <code class="language-plaintext highlighter-rouge">clang version 6.0.0-1ubuntu2</code></li>
</ul>
<h3 id="installing-freetype">Installing FreeType</h3>
<p>On Ubuntu you will need to install FreeType and libpng</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">sudo </span>apt <span class="nb">install </span>libfreetype6 libfreetype6-dev
<span class="nv">$ </span><span class="nb">sudo </span>apt <span class="nb">install </span>libpng16-16 libpng-dev
</code></pre></div></div>
<ul>
<li>My FreeType version is <code class="language-plaintext highlighter-rouge">2.8.1-2ubuntu2</code>, although, at the time of writing, the most recent version is <code class="language-plaintext highlighter-rouge">FreeType-2.10.1</code>, which will work as well</li>
<li>My libpng version is <code class="language-plaintext highlighter-rouge">1.6.34-1ubuntu0.18.04.2</code></li>
</ul>
<h2 id="the-console-renderer">The console renderer</h2>
<h4 id="create-your-c-file-mainc-in-my-case">Create your C file (<code class="language-plaintext highlighter-rouge">main.c</code> in my case)</h4>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include <stdio.h>
</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"Hello, world</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>clang <span class="nt">-Wall</span> <span class="nt">-Werror</span> <span class="nt">-o</span> main main.c
<span class="nv">$ </span>./main
Hello, world
</code></pre></div></div>
<h4 id="include-the-freetype-libraries">Include the FreeType libraries</h4>
<p>To find the include path (i.e. the directories your compiler traverses when searching for the files you <code class="language-plaintext highlighter-rouge">#include</code>) for FreeType run:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>pkg-config <span class="nt">--cflags</span> freetype2
<span class="nt">-I</span>/usr/include/freetype2 <span class="nt">-I</span>/usr/include/libpng16
</code></pre></div></div>
<p>This line <code class="language-plaintext highlighter-rouge">-I/usr/include/freetype2 -I/usr/include/libpng16</code> contains the compilation flags needed to include FreeType in the C program</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include <stdio.h>
</span>
<span class="cp">#include <freetype2/ft2build.h>
#include FT_FREETYPE_H
</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"Hello, world</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>clang <span class="nt">-I</span>/usr/include/freetype2 <span class="se">\</span>
<span class="nt">-I</span>/usr/include/libpng16 <span class="se">\</span>
<span class="nt">-Wall</span> <span class="nt">-Werror</span> <span class="se">\</span>
<span class="nt">-o</span> main <span class="se">\</span>
main.c
<span class="nv">$ </span>./main
Hello, world
</code></pre></div></div>
<h4 id="print-freetypes-version">Print FreeType’s version</h4>
<p>Inside <code class="language-plaintext highlighter-rouge">main()</code> initialize FreeType using <code class="language-plaintext highlighter-rouge">FT_Init_FreeType(&ft)</code> and check for errors (FreeType functions return 0 on success).</p>
<p>(From now on all the functions I will use come from the <a href="https://www.freetype.org/freetype2/docs/reference/index.html">FreeType API Reference</a>).</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">FT_Library</span> <span class="n">ft</span><span class="p">;</span>
<span class="n">FT_Error</span> <span class="n">err</span> <span class="o">=</span> <span class="n">FT_Init_FreeType</span><span class="p">(</span><span class="o">&</span><span class="n">ft</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">err</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"Failed to initialize FreeType</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="n">exit</span><span class="p">(</span><span class="n">EXIT_FAILURE</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Then using <code class="language-plaintext highlighter-rouge">FT_Library_Version</code> obtain the version.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="n">FT_Int</span> <span class="n">major</span><span class="p">,</span> <span class="n">minor</span><span class="p">,</span> <span class="n">patch</span><span class="p">;</span>
<span class="n">FT_Library_Version</span><span class="p">(</span><span class="n">ft</span><span class="p">,</span> <span class="o">&</span><span class="n">major</span><span class="p">,</span> <span class="o">&</span><span class="n">minor</span><span class="p">,</span> <span class="o">&</span><span class="n">patch</span><span class="p">);</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"FreeType's version is %d.%d.%d</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">major</span><span class="p">,</span> <span class="n">minor</span><span class="p">,</span> <span class="n">patch</span><span class="p">);</span>
</code></pre></div></div>
<p>If we compile it using the last command we will have a linker error:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/tmp/main-d41304.o: In function `main':
main.c:(.text+0x14): undefined reference to `FT_Init_FreeType'
main.c:(.text+0x54): undefined reference to `FT_Library_Version'
clang: error: linker command failed with exit code 1 (use -v to see invocation)
</code></pre></div></div>
<p>To fix it add <code class="language-plaintext highlighter-rouge">-lfreetype</code></p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>clang <span class="nt">-I</span>/usr/include/freetype2 <span class="se">\</span>
<span class="nt">-I</span>/usr/include/libpng16 <span class="se">\</span>
<span class="nt">-Wall</span> <span class="nt">-Werror</span> <span class="se">\</span>
<span class="nt">-o</span> main <span class="se">\</span>
<span class="nt">-lfreetype</span> <span class="se">\</span>
main.c
<span class="nv">$ </span>./main
FreeType<span class="s1">'s version is 2.8.1
</span></code></pre></div></div>
<h4 id="loading-a-font-face">Loading a font face</h4>
<p>The first step to render a character is to load the font file. I am using <a href="https://fonts.google.com/specimen/Ubuntu+Mono">ubuntu mono</a>.</p>
<p>To understand the exact difference between <em>face</em>, <em>family</em> and <em>font</em> refer to the <a href="https://www.freetype.org/freetype2/docs/glyphs/glyphs-1.html#section-1">FreeType Docs</a>.</p>
<p>The third argument is called the <em>face index</em>; it exists to allow font creators to embed several faces in a single font file.
Because every font has at least one face, index 0 will always work and will select the first one.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">FT_Face</span> <span class="n">face</span><span class="p">;</span>
<span class="n">err</span> <span class="o">=</span> <span class="n">FT_New_Face</span><span class="p">(</span><span class="n">ft</span><span class="p">,</span> <span class="s">"./UbuntuMono.ttf"</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="o">&</span><span class="n">face</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">err</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"Failed to load face</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="n">exit</span><span class="p">(</span><span class="n">EXIT_FAILURE</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<h4 id="setting-the-faces-pixel-size">Setting the face’s pixel size</h4>
<p>With this instruction we tell FreeType our desired width and height for the rendered characters.</p>
<p>We can omit the width by passing 0; FreeType will interpret this as “same as the height”, in this case 32px.
Passing both explicitly can be used to render a character with e.g. 10px width and 16px height.</p>
<p>This operation can fail on a fixed-size face, which we will encounter when talking about emojis.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">err</span> <span class="o">=</span> <span class="n">FT_Set_Pixel_Sizes</span><span class="p">(</span><span class="n">face</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">32</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">err</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"Failed to set pixel size</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="n">exit</span><span class="p">(</span><span class="n">EXIT_FAILURE</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<h4 id="getting-the-characters-index">Getting the character’s index</h4>
<p>First of all let’s go back to the <a href="https://www.freetype.org/freetype2/docs/glyphs/glyphs-1.html#section-2">FreeType docs</a> and establish a naming convention.
A character is not the same thing as a <em>glyph</em>. A character is what you have in your <code class="language-plaintext highlighter-rouge">char</code>; a glyph is an image which is in some way related to that character.
This relation is quite complex because a char can correspond to many glyphs (e.g. accents) and a glyph can correspond to many chars (e.g. ligatures, where -> is rendered as a single image).</p>
<p>To obtain the index of the glyph corresponding to a character we use <code class="language-plaintext highlighter-rouge">FT_Get_Char_Index</code>.
This, as you can imagine, only allows one-to-one mapping between characters and glyphs.
We will solve this in a future part by using the <em>HarfBuzz</em> library.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">FT_UInt</span> <span class="n">glyph_index</span> <span class="o">=</span> <span class="n">FT_Get_Char_Index</span><span class="p">(</span><span class="n">face</span><span class="p">,</span> <span class="sc">'a'</span><span class="p">);</span>
</code></pre></div></div>
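<p>One check the snippet above omits, worth adding: per the FreeType docs, <code class="language-plaintext highlighter-rouge">FT_Get_Char_Index</code> returns 0 when the face has no glyph for the character; index 0 is the face’s “.notdef” glyph, often drawn as an empty box.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/* Index 0 means "undefined character code": the face has no glyph for
   this character and would fall back to the ".notdef" glyph. */
if (glyph_index == 0) {
  printf("No glyph for this character in the face\n");
  exit(EXIT_FAILURE);
}
</code></pre></div></div>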
<h4 id="loading-a-glyph-from-the-face">Loading a glyph from the face</h4>
<p>Having obtained the <code class="language-plaintext highlighter-rouge">glyph_index</code>, we can load the corresponding glyph from our face.</p>
<p>In a future part we will discuss the various load flags in depth, and how they enable
features like hinting and bitmap fonts.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">FT_Int32</span> <span class="n">load_flags</span> <span class="o">=</span> <span class="n">FT_LOAD_DEFAULT</span><span class="p">;</span>
<span class="n">err</span> <span class="o">=</span> <span class="n">FT_Load_Glyph</span><span class="p">(</span><span class="n">face</span><span class="p">,</span> <span class="n">glyph_index</span><span class="p">,</span> <span class="n">load_flags</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">err</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"Failed to load glyph</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="n">exit</span><span class="p">(</span><span class="n">EXIT_FAILURE</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<h4 id="rendering-the-glyph-in-a-glyph-slot">Rendering the glyph in a glyph slot</h4>
<p>At this point we can finally render our glyph into a <em>glyph slot</em>, contained in <code class="language-plaintext highlighter-rouge">face->glyph</code>.</p>
<p>We will discuss the render flags in a future part too, because they enable LCD (or subpixel) rendering and grayscale antialiasing.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">FT_Int32</span> <span class="n">render_flags</span> <span class="o">=</span> <span class="n">FT_RENDER_MODE_NORMAL</span><span class="p">;</span>
<span class="n">err</span> <span class="o">=</span> <span class="n">FT_Render_Glyph</span><span class="p">(</span><span class="n">face</span><span class="o">-></span><span class="n">glyph</span><span class="p">,</span> <span class="n">render_flags</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">err</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"Failed to render the glyph</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="n">exit</span><span class="p">(</span><span class="n">EXIT_FAILURE</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
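<p>As a side note, the load and render steps can be collapsed into a single call: the <code class="language-plaintext highlighter-rouge">FT_LOAD_RENDER</code> flag asks <code class="language-plaintext highlighter-rouge">FT_Load_Glyph</code> to render the glyph immediately after loading, using <code class="language-plaintext highlighter-rouge">FT_RENDER_MODE_NORMAL</code> by default. The sketch below should be equivalent to the two snippets above.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/* One-step variant: load the glyph and render it in the same call. */
err = FT_Load_Glyph(face, glyph_index, FT_LOAD_DEFAULT | FT_LOAD_RENDER);
if (err != 0) {
  printf("Failed to load and render the glyph\n");
  exit(EXIT_FAILURE);
}
</code></pre></div></div>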
<h4 id="printing-the-glyph-to-the-console">Printing the glyph to the console</h4>
<p>The bitmap of the rendered glyph can be accessed with <code class="language-plaintext highlighter-rouge">face->glyph->bitmap.buffer</code> and is stored as an <code class="language-plaintext highlighter-rouge">unsigned char</code> array, so
its values range from 0 to 255.</p>
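<p>Strictly speaking, the layout of the buffer depends on <code class="language-plaintext highlighter-rouge">face->glyph->bitmap.pixel_mode</code>: <code class="language-plaintext highlighter-rouge">FT_RENDER_MODE_NORMAL</code> produces <code class="language-plaintext highlighter-rouge">FT_PIXEL_MODE_GRAY</code>, i.e. one byte per pixel. A small defensive sketch:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/* With FT_RENDER_MODE_NORMAL we expect 8-bit grayscale, one byte per pixel.
   Other render modes (e.g. LCD) use different buffer layouts. */
if (face->glyph->bitmap.pixel_mode != FT_PIXEL_MODE_GRAY) {
  printf("Unexpected pixel mode\n");
  exit(EXIT_FAILURE);
}
</code></pre></div></div>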
<p>The buffer is returned as a 1D array but represents a 2D image.
To access the i-th row and j-th column of it we use <code class="language-plaintext highlighter-rouge">row * row_stride + column</code>, as seen in <code class="language-plaintext highlighter-rouge">bitmap.buffer[i * face->glyph->bitmap.pitch + j]</code>.</p>
<p>You can see that we used <code class="language-plaintext highlighter-rouge">bitmap.width</code> in the for loop and <code class="language-plaintext highlighter-rouge">bitmap.pitch</code> in the array access; this is because each row of pixels is <code class="language-plaintext highlighter-rouge">bitmap.width</code> wide,
but consecutive rows in the buffer are <code class="language-plaintext highlighter-rouge">bitmap.pitch</code> bytes apart.</p>
<p><img src="/assets/images/modern-text-rendering-linux-ep1/Group.svg" style="max-width: 20rem; display: block; margin: auto;" /></p>
<p>The following code iterates over each row and column and, depending on the pixel’s brightness, prints a different symbol.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">face</span><span class="o">-></span><span class="n">glyph</span><span class="o">-></span><span class="n">bitmap</span><span class="p">.</span><span class="n">rows</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">j</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">j</span> <span class="o"><</span> <span class="n">face</span><span class="o">-></span><span class="n">glyph</span><span class="o">-></span><span class="n">bitmap</span><span class="p">.</span><span class="n">width</span><span class="p">;</span> <span class="n">j</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="kt">unsigned</span> <span class="kt">char</span> <span class="n">pixel_brightness</span> <span class="o">=</span>
<span class="n">face</span><span class="o">-></span><span class="n">glyph</span><span class="o">-></span><span class="n">bitmap</span><span class="p">.</span><span class="n">buffer</span><span class="p">[</span><span class="n">i</span> <span class="o">*</span> <span class="n">face</span><span class="o">-></span><span class="n">glyph</span><span class="o">-></span><span class="n">bitmap</span><span class="p">.</span><span class="n">pitch</span> <span class="o">+</span> <span class="n">j</span><span class="p">];</span>
<span class="k">if</span> <span class="p">(</span><span class="n">pixel_brightness</span> <span class="o">></span> <span class="mi">169</span><span class="p">)</span> <span class="p">{</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"*"</span><span class="p">);</span>
<span class="p">}</span> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">pixel_brightness</span> <span class="o">></span> <span class="mi">84</span><span class="p">)</span> <span class="p">{</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"."</span><span class="p">);</span>
<span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
<span class="n">printf</span><span class="p">(</span><span class="s">" "</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
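<p>Beyond the bitmap itself, the glyph slot also carries the metrics needed to position glyphs when drawing real text. The sketch below (an addition, not part of the program whose output is shown next) prints them: <code class="language-plaintext highlighter-rouge">bitmap_left</code> and <code class="language-plaintext highlighter-rouge">bitmap_top</code> locate the bitmap relative to the pen position, and <code class="language-plaintext highlighter-rouge">advance.x</code>, expressed in 1/64th-of-a-pixel units, tells us how far to move the pen for the next glyph.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/* Positioning metrics; advance.x is in 26.6 fixed point, so shift right by
   6 bits to convert it to whole pixels. */
printf("left=%d top=%d advance=%ld\n",
       face->glyph->bitmap_left,
       face->glyph->bitmap_top,
       face->glyph->advance.x >> 6);
</code></pre></div></div>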
<p>Compiling and running the program gives the following console output.</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>clang <span class="nt">-I</span>/usr/include/freetype2 <span class="se">\</span>
<span class="nt">-I</span>/usr/include/libpng16 <span class="se">\</span>
<span class="nt">-Wall</span> <span class="nt">-Werror</span> <span class="se">\</span>
<span class="nt">-o</span> main <span class="se">\</span>
<span class="nt">-lfreetype</span> <span class="se">\</span>
main.c <span class="o">&&</span> ./main
FreeType<span class="s1">'s version is 2.8.1
.*****.
.********.
.*********
. ***.
***
***
.********
***********
.**. ***
*** ***
*** ***
***. ***
.***********
***********
.*******..
</span></code></pre></div></div>
<p>You can find the complete code <a href="https://gist.github.com/mrandri19/245237a6bde246f119ab2b6245e540ba">here</a>.</p>
<h2 id="conclusion">Conclusion</h2>
<p>We have built a basic console character renderer. This example can, and will,
be extended to render a character into an OpenGL texture, to support emojis,
subpixel rendering, ligatures, and much more. In the next part we will
talk about grayscale vs LCD antialiasing and their pros and cons.</p>
<p>See you soon😁🖐️.</p>