Low Discrepancy Disk Sampling

Quasi-Monte Carlo integration has been used in graphics for many decades. To perform such integration we need to generate sample points across the domain of integration. It turns out that we can improve the efficiency of the integration process by choosing these sample points in clever ways. Often we want to generate sample points that have low discrepancy (few lumps and few gaps) without being too regular.

Probability theory provides us with methods to sample pseudo-randomly and uniformly from a given domain. However, the inherent randomness of such techniques often leads to undesirable "lumps" or "holes" in the set of sample points. We can use low discrepancy sequences (LDS) to solve this issue. But exactly how do we combine probability theory and low discrepancy sequences in practice? In this post I will explain how to devise an algorithm that generates low discrepancy irregular sample sets on the unit disk. The approach presented should generalize to more complex sets such as the unit sphere in 3D. The target audience is programmers with a basic understanding of probability theory and an interest in low discrepancy sampling.

This document is split into two parts. In section 1 I will use the inversion method to derive a formula for doing uniform random sampling on the unit disk. If you already know how to use this method, you may skip to section 2 where I show how to sample quasi-randomly with low discrepancy using the golden ratio.

1. Sampling Uniformly On The Unit Disk

Before we get to low discrepancy sequences, we need a way to sample positions uniformly at random on the unit disk. You may guess that we can do that by simply choosing the radius $r$ and angle $\theta$ uniformly, i.e. $$r(u_1) = u_1, \quad \theta(u_2) = 2\pi u_2 \tag{1}$$ where $u_1, u_2$ are independent realizations of a uniform random variable on $[0, 1]$. However, this turns out to be wrong as shown in figure 1.

Fig 1: Naive guess vs. proper uniform sampling.

The left side of figure 1 shows the result of sampling each polar component uniformly and independently. Compare this to the right side of figure 1, which shows proper uniform sampling of the disk. From the figure, it is clear that the naive guess generates points that tend to lump together near $(0, 0)$.
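
To make the naive approach concrete, here is a minimal numpy sketch of it (it mirrors the first half of the gen_naive function in Listing 1 at the end of this post):

import numpy as np

np.random.seed(0)
n = 400

# Naive guess (equation 1): sample each polar component uniformly.
r = np.random.uniform(0, 1, n)
theta = np.random.uniform(0, 2 * np.pi, n)

# Convert to Cartesian coordinates for plotting.
x = r * np.cos(theta)
y = r * np.sin(theta)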

To do it correctly we can use the inversion method. In the following I will use this method but I will not explain the general idea behind it. If you don't know this method, I recommend that you read about it in the PBRT book. The high-level idea is that we want to derive a pair of cumulative distribution functions (CDFs) and then use canonical uniform random variables with these CDFs to enable uniform sampling over the unit disk. This process involves several steps which I will now go through.

Since we want to sample uniformly with respect to area, we know that the PDF $p(x, y)$ must be constant, i.e. $p(x, y) = c$ for some $c \in \mathbb{R}$. We also know that the area of the unit disk $D$ is $\pi$ and therefore $$\iint\limits_{D} p(x, y) \, \dif{x} \dif{y} = c \iint\limits_{D} \dif{x} \dif{y} = c \pi.$$

Since $p(x, y)$ is a probability distribution it has to integrate to 1, and therefore $$1 = \iint\limits_{D} p(x, y) \, \dif{x} \dif{y} = c \pi \iff c = \frac{1}{\pi}.$$

Thus $p(x, y) = c = 1/\pi$.

The inversion method involves integration over our domain $D$. This is generally difficult to do in Cartesian coordinates, so we will convert $p(x, y)$ to polar coordinates. Converting polar coordinates to Cartesian coordinates is done via the mapping $$T\left(\begin{bmatrix} r \\ \theta \end{bmatrix}\right) = \begin{bmatrix} x(r, \theta) \\ y(r, \theta) \end{bmatrix} = \begin{bmatrix} r \cos \theta \\ r \sin \theta \end{bmatrix}. \tag{2}$$

Probability theory tells us that $$p(r, \theta) = |J_T(r, \theta)| \, p(x, y) \tag{3}$$ where $$|J_T(r, \theta)| = \left| \begin{bmatrix} \frac{\partial x}{\partial r} & \frac{\partial x}{\partial \theta} \\ \frac{\partial y}{\partial r} & \frac{\partial y}{\partial \theta} \end{bmatrix} \right| \tag{4}$$ is the determinant of the Jacobian matrix of $T$. The determinant of the Jacobian encodes how much tiny areas in $(r, \theta)$ space are "stretched" compared to equivalent tiny areas in Cartesian $(x, y)$ space. To understand why determinants quantify stretching like this, I recommend this video by 3Blue1Brown. This stretching property is the reason we use $|J_T|$ in equation 3 as a "conversion factor" between the distributions $p(r, \theta)$ and $p(x, y)$.

Using equation 2 and equation 4 we can calculate that $$|J_T(r, \theta)| = \left| \begin{matrix} \cos \theta & -r \sin \theta \\ \sin \theta & r \cos \theta \end{matrix} \right| = r \cos^2 \theta + r \sin^2 \theta = r (\cos^2 \theta + \sin^2 \theta) = r,$$ which we can plug into equation 3 to see that $$p(r, \theta) = r \, p(x, y) = \frac{r}{\pi}.$$
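
If you want to double-check this computation programmatically, here is a small sketch that derives the same determinant symbolically (this assumes you have sympy installed; it is not part of the plotting scripts below):

import sympy as sp

r, theta = sp.symbols('r theta', positive=True)
T = sp.Matrix([r * sp.cos(theta), r * sp.sin(theta)])  # the mapping from equation 2

# Jacobian of T with respect to (r, theta) and its determinant.
J = T.jacobian([r, theta])
print(sp.simplify(J.det()))  # prints: r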

The next step of the inversion method is to compute the marginal distribution $p_r(r)$ and the conditional distribution $p_\theta(\theta | r)$. If you are unfamiliar with these concepts, I recommend that you read up on them before continuing. Note that we could have chosen to go for $p_\theta(\theta)$ and $p_r(r | \theta)$ instead; the final result should be the same regardless. We can calculate $p_r(r)$ by integrating out $\theta$, i.e. $$p_r(r) = \int_0^{2\pi} p(r, \theta) \, \dif{\theta} = \frac{r}{\pi} \int_0^{2\pi} \dif{\theta} = 2r.$$

By definition of conditional probability we get that $$p_\theta(\theta | r) = \frac{p(r, \theta)}{p_r(r)} = \frac{1}{2\pi}.$$

Notice that neither distribution depends on $\theta$. This makes sense if you consider the symmetry of the disk: since we are working with a uniform distribution, a probability density should not depend on the angle $\theta$.

We are now ready to calculate the CDFs $P_r(r)$ and $P_\theta(\theta | r)$. By the definition of the CDF, we find these by integrating the PDFs, i.e. $$P_r(r) = \int_0^r p_r(r') \, \dif{r'} = \int_0^r 2r' \, \dif{r'} = \left. r'^2 \right\rvert_0^r = r^2, \tag{5}$$ $$P_\theta(\theta | r) = \int_0^\theta p_\theta(\theta' | r) \, \dif{\theta'} = \int_0^\theta \frac{1}{2\pi} \, \dif{\theta'} = \left. \frac{\theta'}{2\pi} \right\rvert_0^\theta = \frac{\theta}{2\pi}. \tag{6}$$
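
These integrals are simple enough that a symbolic check is easy too. A small sketch (again assuming sympy) that reproduces the marginal and both CDFs:

import sympy as sp

r, rp, theta, tp = sp.symbols('r rp theta tp', positive=True)

p_r = sp.integrate(r / sp.pi, (theta, 0, 2 * sp.pi))     # marginal: 2*r
P_r = sp.integrate(2 * rp, (rp, 0, r))                   # equation 5: r**2
P_theta = sp.integrate(1 / (2 * sp.pi), (tp, 0, theta))  # equation 6: theta/(2*pi)

print(p_r, P_r, P_theta)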

The final step of the inversion method is to find the inverses $P_r^{-1}$ and $P_\theta^{-1}$ of the CDFs. Inverting a function requires that the function is strictly monotonically increasing. This is the case for our CDFs since the corresponding PDFs are positive on the interior of the domain. We can find the inverse functions $P_r^{-1}$ and $P_\theta^{-1}$ by applying them to equation 5 and equation 6 respectively: $$P_r^{-1}(P_r(r)) = P_r^{-1}(r^2) \iff r = P_r^{-1}(r^2) \iff P_r^{-1}(r) = \sqrt{r},$$ $$P_\theta^{-1}(P_\theta(\theta)) = P_\theta^{-1}\left( \frac{\theta}{2\pi} \right) \iff \theta = P_\theta^{-1}\left( \frac{\theta}{2\pi} \right) \iff P_\theta^{-1}(\theta) = 2\pi\theta.$$

By the inversion method this means that we can now sample the unit disk via $$r(u_1) = P_r^{-1}(u_1) = \sqrt{u_1}, \quad \theta(u_2) = P_\theta^{-1}(u_2) = 2\pi u_2 \tag{7}$$ where $u_1, u_2$ are realizations of independent uniform random variables on $[0, 1]$. It is interesting to compare these new expressions with the naive guess from equation 1. The only difference is that the $r$ component is now $\sqrt{u_1}$ rather than just $u_1$. If you look at the plot of $\sqrt{x}$ on $[0, 1]$ you can see that the square root in some sense "pushes" its input values upwards toward 1, e.g. $\sqrt{0.5} \approx 0.71$. Figuratively this means that our sample points on the unit disk will be pushed away from the origin. This sounds reasonable if you consider how the sample points lumped together near the origin in figure 1.
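
In code, equation 7 is only a few lines. A sketch of what such a sampler could look like (the function name sample_disk is just my choice):

import numpy as np

def sample_disk(u1, u2):
  # Equation 7: map canonical uniform samples to points on the unit disk.
  r = np.sqrt(u1)
  theta = 2 * np.pi * u2
  return r * np.cos(theta), r * np.sin(theta)

# Example usage with 400 pseudo-random input pairs.
np.random.seed(0)
x, y = sample_disk(np.random.uniform(0, 1, 400), np.random.uniform(0, 1, 400))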

Let's test it out! The left plot of figure 2 shows some realizations of $(u_1, u_2)$. The right plot shows how these realizations map to the disk using the adjusted $r(u_1)$ and $\theta(u_2)$. The tendency to lump together near the origin from figure 1 is gone, as we hoped.

Fig 2: Uniform distribution (left) mapped to the disk (right).

We still have some lumping but at least the lumps themselves are now somewhat distributed over the domain. The lumping occurs only because there are lumps in the input, i.e. the random variable realizations shown on the left of figure 2. We can solve this problem by replacing uniform sampling with low discrepancy sequences.

2. Applying Low Discrepancy Sequences

In the previous section we used probability theory to derive a sampling scheme that is mathematically correct. In this section we will see that the usage of low discrepancy sequences can be a bit more of an art than a rigorous mathematical procedure (although mathematics does play an important role).

The inversion method yielded two functions $r(u_1), \theta(u_2)$ that take as input pairs of uniform random variables $(u_1, u_2)$ and return sample points that are uniformly distributed on the disk (see equation 7). But the inherent "lumping" of uniformly sampled random variables carried over to the points on the disk as shown in figure 2. To fix this, how about we just deterministically choose some input points $(x_i, y_i)$ that are spread out and have no lumps in them? An obvious input to try is a regular grid of points (since these are spread out and have no lumps). I show this experiment in figure 3. On the left you see a plot of a grid of $12 \times 12$ points $(x_i, y_i)$ and on the right we have the corresponding points $(r(x_i), \theta(y_i))$ mapped to the disk. By comparing this with figure 2, we see that the random lumping is gone and instead replaced with regular spokes. Note how each spoke corresponds to a row of points in the plot on the left.

Fig 3: A regular grid of points (left) mapped to the unit disk (right).

How can we avoid these spokes? The problem is of course that all $12 \cdot 12 = 144$ input points $(x_i, y_i)$ only have 12 distinct $y_i$ values and thus also 12 distinct $\theta(y_i)$ values. We can solve this with irrational numbers. In the regular grid in figure 3 I chose $y_i = (i / 12) \mod 1$. If we instead use $$y_i = (i \cdot k) \mod 1 \tag{8}$$ where $k$ is an irrational number, we get something like figure 4, where I used $k = \pi$. Since $\pi$ is irrational, all $y_i$ values will be distinct. In fact the sequence $y_i$ is an actual low discrepancy sequence if $k$ is chosen wisely. In the plot of $(x_i, y_i)$ on the left side of figure 4 we see that the sample points are now slightly skewed in the vertical direction. These slight differences in $y_i$ cause slight differences in $\theta(y_i)$. The result is that the spokes become "curled" as shown on the right of figure 4.

Fig 4: Inputs created via irrational numbers cause the spokes to "curl".
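
As a quick sanity check of the distinctness claim, we can count unique values for the grid choice versus equation 8 with $k = \pi$ (a throwaway sketch):

import numpy as np

i = np.arange(144)
y_grid = np.mod(i, 12) / 12            # the regular grid choice from figure 3
y_irrational = np.mod(i * np.pi, 1.0)  # equation 8 with k = pi

print(len(np.unique(y_grid)))        # 12 distinct values
print(len(np.unique(y_irrational)))  # 144 distinct values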

The next step is to choose a "better" value for $k$ so that we can get rid of the spokes. The following may sound weird if you hear it for the first time, but mathematicians have found that some irrational numbers are more irrational than others. Numbers that are only slightly irrational (like $\pi$) will yield only a minor curl (see figure 4). As the "degree of irrationality" of $k$ increases, the curling effect gets stronger until the point where the spokes mix into each other and thus seem to vanish (at least to the human eye). For a visualization of this mathematical phenomenon and its relation to "degrees of irrationality", I recommend this video by Numberphile.

The most irrational number is the golden ratio, i.e. $$\phi = \frac{1+\sqrt{5}}{2} = 1.618\ldots$$

In figure 5 you can see what happens if we use $k = \phi$ in equation 8. With $k = \phi$ the curling effect is so strong that the spokes have blended together and thereby disappeared (technically they are still there, we just don't perceive them).

Fig 5: Using the golden ratio to generate input points.
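
One way to get a feel for how much more evenly $\phi$ fills the unit interval than $\pi$ is to compare the largest and smallest gaps between neighboring sequence values. A throwaway sketch (the helper name gap_range is mine):

import numpy as np

def gap_range(k, n=144):
  # Sorted fractional parts of i*k, with the wraparound gap included.
  f = np.sort(np.mod(np.arange(1, n + 1) * k, 1.0))
  gaps = np.diff(np.concatenate([f, [f[0] + 1.0]]))
  return gaps.min(), gaps.max()

print(gap_range(np.pi))               # gap sizes vary wildly
print(gap_range((1 + 5**0.5) / 2.0))  # gap sizes are much more even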

The polar plot is starting to look pretty good now. The only remaining problem is that the input points $(x_i, y_i)$ still only have 12 distinct $x_i$ values. For example, this is the reason we see a circle of points near the origin on the right side of figure 5. The resolution is fairly simple: we simply set $x_i = i / N$ where $N$ is the number of sample points. This causes $x_i$, and therefore also $r(x_i) = \sqrt{x_i}$, to increase gradually across all points. I show the final result in figure 6. I hope you agree that this is what we want: the points on the disk are now evenly spread out, there are no gaps, and the pattern is still irregular.

Fig 6: The final version has no lumps or gaps, yet it is still irregular.
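
Putting everything together, the final sampler fits in a handful of lines. A sketch using the same half-sample offset as gen_final in Listing 1 (the function name golden_ratio_disk is mine):

import numpy as np

golden_ratio = (1 + 5**0.5) / 2.0

def golden_ratio_disk(n):
  # x_i = i/N (offset by 0.5 to "center" the samples), y_i = i*phi mod 1.
  i = np.arange(0, n, dtype=float)
  r = np.sqrt((i + 0.5) / n)
  theta = 2 * np.pi * np.mod(i * golden_ratio, 1.0)
  return r * np.cos(theta), r * np.sin(theta)

x, y = golden_ratio_disk(144)  # the points shown in figure 6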

3. Conclusion

We used the inversion method to devise a scheme to sample uniformly from the unit disk. We then modified this scheme by replacing the uniformly sampled input $(u_1, u_2)$ with explicit low discrepancy sequences $(x_i, y_i)$. The final sampling scheme can be formulated as $$r(x_i) = \sqrt{x_i} = \sqrt{i/N}, \quad \theta(y_i) = 2\pi y_i = 2\pi (i \phi \mod 1).$$

Consider that the output of the inversion method was a pair of expressions $r(u_1), \theta(u_2)$ that were fundamentally random, since they depended on realizations of random variables $u_1, u_2$. It is interesting to consider that we sacrificed this randomness when we replaced the input with explicit sequences $(x_i, y_i)$. In other words, we made the input deterministic and therefore the output became completely deterministic as well. Since the output looks random but really isn't, we call such sets "quasi-random".

Another thing to note is that the method explained here generalizes to other domains as well. For example, we could use the inversion method to derive a scheme to sample uniformly across the 3D sphere and then feed the resulting functions $P_\theta^{-1}$ and $P_\phi^{-1}$ with low discrepancy inputs based on the golden ratio. I show the result of doing this in figure 7.

Fig 7: Low discrepancy sampled 3D sphere.
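
For the sphere, the inversion method yields the inverse CDFs $\phi(u_1) = \arccos(1 - 2u_1)$ and $\theta(u_2) = 2\pi u_2$ (this is what Listing 2 below implements). Feeding them the same golden ratio inputs could look like this (the function name golden_ratio_sphere is mine):

import numpy as np

golden_ratio = (1 + 5**0.5) / 2.0

def golden_ratio_sphere(n):
  # Same low discrepancy inputs as the disk, mapped through the
  # sphere's inverse CDFs (see Listing 2).
  i = np.arange(0, n, dtype=float)
  phi = np.arccos(1 - 2 * (i + 0.5) / n)             # polar angle
  theta = 2 * np.pi * np.mod(i * golden_ratio, 1.0)  # azimuth
  return np.sin(phi) * np.cos(theta), np.sin(phi) * np.sin(theta), np.cos(phi)

x, y, z = golden_ratio_sphere(800)  # the points shown in figure 7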

4. Code

Below I include the Python scripts used for the generation of the plots in this document. I used Python 3.7.0 and matplotlib 3.0.2.

import numpy as np
from numpy import pi, cos, sin, sqrt, arange
import matplotlib.pyplot as plt

# For determinism.
plt.rcParams['svg.hashsalt'] = '42'  # svg.hashsalt expects a string
np.random.seed(0)

dot_size = 20
N_sqrt = 12
N = 12*12
golden_ratio = (1 + 5**0.5) / 2.0

# Convert polar coordinates to Cartesian coordinates (equation 2).
def xs(rs, thetas):
  return rs*cos(thetas)

def ys(rs, thetas):
  return rs*sin(thetas)

# Scatter-plot one panel; optionally overlay the unit circle and origin.
def draw_subplot(idx, xs, ys, ylim, xlim, viz):
  plt.subplot(1, 2, idx, aspect = 'equal')
  plt.ylim(ylim)
  plt.xlim(xlim)
  plt.scatter(xs, ys, s = dot_size)

  if viz:
    theta = np.linspace(-np.pi, np.pi, 100)
    plt.plot(np.sin(theta), np.cos(theta), color = 'red')
    plt.scatter([0], [0], color = 'red', s = dot_size)

# Create two plots side by side.
def gen2(
  xs1, ys1, circle_viz1, ylim1, xlim1,
  xs2, ys2, circle_viz2, ylim2, xlim2,
  filename
):
  print("Creating " + filename)
  draw_subplot(1, xs1, ys1, ylim1, xlim1, circle_viz1)
  draw_subplot(2, xs2, ys2, ylim2, xlim2, circle_viz2)
  plt.tight_layout()
  plt.savefig(filename, bbox_inches = 'tight')
  plt.clf()

# Map the inputs us to the disk via equation 7 and plot them
# side by side with the raw inputs.
def gen2_cart_polar(us, filename):
  rs = np.sqrt(us[0])
  thetas = 2 * pi * us[1]
  gen2(
    us[0], us[1], False, (0, 1), (0, 1),
    xs(rs, thetas), ys(rs, thetas), True, (-1, 1), (-1, 1),
    filename
  )

# Figure 1: naive uniform (r, theta) vs. the corrected sqrt(r) version.
def gen_naive():
  n = 400
  rs_uniform = np.random.uniform(0, 1, n)
  thetas_uniform = np.random.uniform(-np.pi, np.pi, n)
  rs_sqrooted = np.sqrt(rs_uniform)
  gen2(
    xs(rs_uniform, thetas_uniform), ys(rs_uniform, thetas_uniform), True, (-1, 1), (-1, 1),
    xs(rs_sqrooted, thetas_uniform), ys(rs_sqrooted, thetas_uniform), True, (-1, 1), (-1, 1),
    'naive.svg'
  )

# Figure 2: pseudo-random uniform inputs mapped to the disk.
def gen_uniform():
  # Two rows of N canonical uniform samples: us[0] and us[1].
  us = np.random.uniform(0, 1, (2, N))
  gen2_cart_polar(us, 'uniform.svg')

# Figure 3: a regular 12x12 grid of cell-centered inputs.
def gen_regular():
  indices = arange(0, N, dtype=float)
  us = [
    np.floor(indices / N_sqrt) / N_sqrt + 0.5/N_sqrt,
    np.mod(indices, N_sqrt) / N_sqrt + 0.5/N_sqrt
  ]
  gen2_cart_polar(us, 'regular.svg')

# Figure 4: y_i from equation 8 with k = pi.
def gen_irrational1():
  indices = arange(0, N, dtype=float)
  us = [
    np.floor(indices / N_sqrt) / N_sqrt + 0.5/N_sqrt,
    np.mod(indices * np.pi, 1.0)
  ]
  gen2_cart_polar(us, 'irrational1.svg')

# Figure 5: y_i from equation 8 with k = the golden ratio.
def gen_golden_ratio1():
  indices = arange(0, N, dtype=float)
  us = [
    np.floor(indices / N_sqrt) / N_sqrt + 0.5/N_sqrt,
    np.mod(indices * golden_ratio, 1.0)
  ]
  gen2_cart_polar(us, 'golden_ratio1.svg')

# Figure 6: the final scheme with gradually increasing x_i.
def gen_final():
  indices = arange(0, N, dtype=float)
  us = [
    (indices + 0.5) / N, # push by 0.5 to "center" sample point
    np.mod(indices * golden_ratio, 1.0)
  ]
  gen2_cart_polar(us, 'final.svg')

gen_naive()
gen_uniform()
gen_regular()
gen_irrational1()
gen_golden_ratio1()
gen_final()
Listing 1: Generation of 2D disk plots.

import numpy as np
from numpy import pi, cos, sin, sqrt, arange
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

def gen_spherical():
  # Low discrepancy sample points on the unit sphere (figure 7).
  r = 1
  N = 800
  golden_ratio = (1 + 5**0.5) / 2.0
  indices = arange(0, N, dtype=float)
  us = [
    (indices + 0.5) / N, # push by 0.5 to "center" sample point
    np.mod(indices * golden_ratio, 1.0)
  ]

  # Inverse CDFs for the sphere: polar angle phi and azimuth theta.
  phi = np.arccos(1 - 2 * us[0])
  theta = 2 * np.pi * us[1]
  x = r*sin(phi)*cos(theta)
  y = r*sin(phi)*sin(theta)
  z = r*cos(phi)

  fig = plt.figure()
  ax = fig.add_subplot(111, projection='3d')
  ax.scatter(x, y, z, s=40)

  ax.set_xlim([-1,1])
  ax.set_ylim([-1,1])
  ax.set_zlim([-1,1])
  ax.set_aspect("equal")
  plt.tight_layout()
  plt.savefig("spherical.svg", bbox_inches = 'tight')

gen_spherical()
Listing 2: Generation of 3D spherical plot.