Here I maintain a log of software projects that I have worked on over the years. Some of these are hobby projects while others were made in an academic or professional context.
I have found ReSTIR to be a delightfully flexible toolbox. It offers many tweakable knobs, and in this project I wanted to see how far I could push it in terms of quality and performance. My goal was to implement multi-bounce diffuse global illumination in a way that would run well on efficiency-oriented GPUs, like those found in laptops and mobile devices.
By sharing new ReSTIR candidate samples across 4x4 pixel tiles and by using shadow mapping, my implementation fires at most 1/8th of a ray per screen-space pixel. Within this tight budget I managed to sample both the sun and local lights (see figure 1). I achieved this by slightly reformulating ReSTIR GI to also take direct light from emissive triangles into account. I reduced noise by using a mixture PDF for the candidate samples that stochastically picks between BRDF sampling and explicit light sampling.
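As a sketch of how such a mixture PDF works (illustrative names, not the renderer's actual API): each candidate picks one sampling strategy at random, but the sample is weighted by the combined density, which keeps the estimator unbiased regardless of which branch fired.

```rust
// Sketch of one-sample mixture sampling between explicit light sampling
// and BRDF sampling. All types and helpers are illustrative stand-ins.
struct Candidate {
    dir: [f32; 3],
    pdf: f32,
}

const P_LIGHT: f32 = 0.5; // probability of taking the light-sampling branch

fn sample_candidate(
    u: f32, // uniform random number in [0, 1)
    sample_light: impl Fn() -> [f32; 3],
    sample_brdf: impl Fn() -> [f32; 3],
    light_pdf: impl Fn([f32; 3]) -> f32,
    brdf_pdf: impl Fn([f32; 3]) -> f32,
) -> Candidate {
    // Stochastically pick one sampling strategy...
    let dir = if u < P_LIGHT { sample_light() } else { sample_brdf() };
    // ...but evaluate the *mixture* density for the chosen direction, so
    // the estimator stays unbiased no matter which branch produced it.
    let pdf = P_LIGHT * light_pdf(dir) + (1.0 - P_LIGHT) * brdf_pdf(dir);
    Candidate { dir, pdf }
}
```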
I approximated infinite (diffuse) bounces via a sparse cascaded voxel irradiance cache (see figure 2). The cache used a ranking system heavily inspired by the Kajiya renderer to effectively avoid light leaking. I achieved fast parallel voxel reallocation by arranging voxel data in a ring buffer designed for time-sliced compaction. I also found that the cache was useful in screen-space denoising: instead of starting integration from zero (black) when surfaces are disoccluded, the denoiser queries the cache for a much better starting value.
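That fallback is conceptually simple. A minimal sketch, with placeholder types and a hypothetical `query_irradiance_cache`:

```rust
// Sketch of the disocclusion fallback: rather than restarting temporal
// integration from black, seed the history with an irradiance-cache query.
struct TexelHistory {
    irradiance: [f32; 3],
    sample_count: u32,
}

fn reprojected_history(
    history: Option<TexelHistory>, // None when the pixel was disoccluded
    query_irradiance_cache: impl Fn() -> [f32; 3],
) -> TexelHistory {
    match history {
        Some(h) => h,
        // Seed with the cache value at low-but-nonzero confidence so that
        // incoming samples can still correct it quickly.
        None => TexelHistory {
            irradiance: query_irradiance_cache(),
            sample_count: 1,
        },
    }
}
```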
The renderer was built directly on top of macOS using Rust and Metal. Using the scene in figure 1, a full frame (including G-buffer, motion vectors, ReSTIR, denoising, etc.) takes approximately 10ms on a MacBook Air M1 (2.6 teraflops).
In this hobby project I wanted to build a realtime GI solution for less powerful devices (e.g. laptops), and to explore realtime importance sampling techniques. I settled on a realtime lightmapper based on Metal's intersection API.
I used a modular design that can switch between integrator and filter implementations at runtime. In addition to a standard Monte Carlo integrator, the renderer includes a path-guided integrator (figure 4) and a ReSTIR integrator.
Inspired by Stachowiak, the renderer captures short-term statistics (mean and variance) for each lightmap texel. From these it approximates the standard deviation, which is used to clamp the history buffer. This yields a temporal filter that is both more stable and more reactive than a naive Exponential Moving Average (at the cost of increased memory usage).
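Per color channel, the clamping amounts to something like this (a sketch; the tolerance constant is an illustrative choice):

```rust
// Sketch of variance-guided history clamping for one lightmap texel.
// The short-term mean/variance come from a small window of recent samples.
fn clamp_history(history: f32, short_mean: f32, short_variance: f32) -> f32 {
    const K: f32 = 1.5; // how many standard deviations of drift we tolerate
    let sigma = short_variance.max(0.0).sqrt();
    // Pull the long history into a band around the short-term estimate:
    // tight when recent samples agree, loose when they are noisy.
    history.clamp(short_mean - K * sigma, short_mean + K * sigma)
}
```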
Lightmaps can be a pain, but many issues can be addressed with a careful implementation. I wrote a specialized lightmap rasterizer which, among other things, takes bilinear filtering into account and approximates the ideal sample position within each lightmap texel. Inspired by Precomputed Global Illumination in Frostbite (Yuriy O'Donnell), I also added an adaptive chart packer (parallelized on the GPU using the MapReduce model), which guarantees no bleeding between charts while ensuring high texture utilization (figure 6).
Inspired by the irradiance cache structure in Tomasz Stachowiak's amazing Kajiya renderer, we spent a Unity Hackweek prototyping an idea for an efficient probe cache.
The cache is fully GPU-driven and is sparse in the sense that it only allocates probes where they are needed (usually near surfaces). Once allocated, probes integrate irradiance into spherical harmonics. Probes that haven't been requested for a number of frames are automatically deallocated.
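A minimal CPU-side sketch of that lifetime policy (the real cache is GPU-driven; the keys, budgets, and SH payload below are made up for illustration):

```rust
use std::collections::HashMap;

const MAX_UNUSED_FRAMES: u32 = 30; // illustrative eviction budget

struct Probe {
    sh_irradiance: [f32; 12], // e.g. L1 spherical harmonics, RGB
    last_requested_frame: u32,
}

struct ProbeCache {
    probes: HashMap<[i32; 3], Probe>, // grid cell -> probe
}

impl ProbeCache {
    // Shading marks the cells it needs; missing probes are allocated.
    fn request(&mut self, cell: [i32; 3], frame: u32) {
        self.probes
            .entry(cell)
            .or_insert(Probe { sh_irradiance: [0.0; 12], last_requested_frame: frame })
            .last_requested_frame = frame;
    }

    // Probes that nobody has asked for in a while are deallocated.
    fn evict(&mut self, frame: u32) {
        self.probes
            .retain(|_, p| frame - p.last_requested_frame <= MAX_UNUSED_FRAMES);
    }
}
```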
On low-end devices you may want to query the cache directly. On high-end devices, the cache can service a high-quality final gather pass (similar to Epic's Lumen). We had many more ideas we wanted to try out, but you can only do so much in a week.
Inspired by Project PICA PICA and EA GIBS, I implemented realtime GI based on surfels. One of the key benefits of this approach is that it doesn't require UV mapping and that it works relatively well with most types of geometry: static, dynamic, skinned, high/low frequency. Another is that the surfel structure can sample itself, which yields relatively cheap infinite light bounces.
I precompute surfel positions/normals at mesh import time. This means that no computation is spent on surfel placement at runtime and that we get light bounces even from surfaces the camera hasn't yet seen. The renderer was written in Rust/Metal and runs on a MacBook Air.
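A sketch of the import-time placement, assuming straightforward area-weighted triangle sampling (the real placement also needs density and spacing control, which I omit here):

```rust
// Sketch of import-time surfel placement via area-weighted triangle
// sampling. Larger triangles receive proportionally more surfels.
struct Surfel {
    position: [f32; 3],
    normal: [f32; 3],
}

fn sub(a: [f32; 3], b: [f32; 3]) -> [f32; 3] {
    [a[0] - b[0], a[1] - b[1], a[2] - b[2]]
}
fn cross(a: [f32; 3], b: [f32; 3]) -> [f32; 3] {
    [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]
}
fn len(v: [f32; 3]) -> f32 {
    (v[0] * v[0] + v[1] * v[1] + v[2] * v[2]).sqrt()
}

fn place_surfels(
    tris: &[[[f32; 3]; 3]],        // triangle vertex positions
    count: usize,                  // surfel budget for this mesh
    mut rand: impl FnMut() -> f32, // uniform random numbers in [0, 1)
) -> Vec<Surfel> {
    // Cumulative area table, so a uniform pick lands in a triangle with
    // probability proportional to its surface area.
    let mut cdf = Vec::with_capacity(tris.len());
    let mut total = 0.0;
    for t in tris {
        total += 0.5 * len(cross(sub(t[1], t[0]), sub(t[2], t[0])));
        cdf.push(total);
    }
    (0..count)
        .map(|_| {
            let pick = rand() * total;
            let i = cdf.partition_point(|&a| a < pick).min(tris.len() - 1);
            let t = tris[i];
            // Uniform barycentric sample within the triangle.
            let (mut u, mut v) = (rand(), rand());
            if u + v > 1.0 {
                u = 1.0 - u;
                v = 1.0 - v;
            }
            let e1 = sub(t[1], t[0]);
            let e2 = sub(t[2], t[0]);
            let position = [
                t[0][0] + u * e1[0] + v * e2[0],
                t[0][1] + u * e1[1] + v * e2[1],
                t[0][2] + u * e1[2] + v * e2[2],
            ];
            let n = cross(e1, e2);
            let l = len(n).max(1e-8);
            Surfel { position, normal: [n[0] / l, n[1] / l, n[2] / l] }
        })
        .collect()
}
```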
In this project I maintain a temporally integrated surface cache encoded as UV-mapped lightmaps. The fact that the cache can resample itself over time means that I can approximate infinite bounces using relatively few rays per frame. Inspired by Epic's Lumen, I do a screen-space final gather on top of the surface cache which is filtered temporally and spatially. The final gather affords camera-dependent resolution of secondary rays, something that would not have been possible using the surface cache alone.
The renderer supports dynamic lights and dynamic geometry. It is written in Rust and Apple Metal, and it runs smoothly on a MacBook Air M1 (without a discrete GPU).
A Constructive Solid Geometry realtime pathtracer that uses temporal and spatial filtering to eliminate noise. Each frame, every pixel samples the rendering equation integral (several bounces) and accumulates the results via an Exponential Moving Average. The output is then denoised via my adaptation of SVGF.
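The accumulation step itself is just a blend toward the newest sample; a sketch (the blend factor here is an illustrative choice):

```rust
// Sketch of per-pixel temporal accumulation with an Exponential Moving
// Average: the history drifts toward each new sample by a fixed fraction.
const ALPHA: f32 = 0.1; // higher = more reactive, lower = smoother

fn accumulate(history: [f32; 3], sample: [f32; 3]) -> [f32; 3] {
    [
        history[0] + ALPHA * (sample[0] - history[0]),
        history[1] + ALPHA * (sample[1] - history[1]),
        history[2] + ALPHA * (sample[2] - history[2]),
    ]
}
```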
The pathtracer supports realtime changes to geometry and lighting. In addition to diffusely bounced light, I added support for volumetric fog (extinction + in-scattering), day/night cycle, simple color grading, and a subtle vignette effect. It runs at 60fps on a MacBook Air.
At a Unity hackweek our team made a Minecraft clone. Inspired by Teardown I implemented realtime voxel traced sky occlusion on the GPU and integrated it into Unity's Universal Render Pipeline.
We adopted a sparse data layout that significantly reduced memory usage by not storing anything in empty regions of the world.
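A minimal sketch of such a layout, assuming fixed-size bricks kept in a hash map (brick size and types here are illustrative, not the hackweek code):

```rust
use std::collections::HashMap;

// Sketch: the world is split into 8x8x8 bricks, and only bricks that
// contain at least one solid voxel are stored at all.
struct Brick {
    occupancy: [u64; 8], // 512 bits: one per voxel in the brick
}

struct SparseVoxels {
    bricks: HashMap<[i32; 3], Brick>, // brick coordinate -> payload
}

impl SparseVoxels {
    fn is_solid(&self, v: [i32; 3]) -> bool {
        let key = [v[0].div_euclid(8), v[1].div_euclid(8), v[2].div_euclid(8)];
        match self.bricks.get(&key) {
            None => false, // empty regions cost no memory at all
            Some(b) => {
                let (x, y, z) = (v[0].rem_euclid(8), v[1].rem_euclid(8), v[2].rem_euclid(8));
                let bit = (x + 8 * (y + 8 * z)) as usize;
                (b.occupancy[bit / 64] >> (bit % 64)) & 1 == 1
            }
        }
    }
}
```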
In this hobby project I used Monte Carlo integration to calculate irradiance probes across several frames. I automatically generated a 3D SDF texture to enable fast raytracing on the GPU. The probe data was encoded using a sphere-to-square octahedral projection to ensure efficient per-pixel sampling during shading.
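For reference, here is a sketch of the standard octahedral mapping (the well-known formulation surveyed by Cigolle et al., not code lifted from the project):

```rust
// Maps a unit direction to [0,1]^2 by projecting onto the octahedron and
// folding the lower hemisphere over the diagonals, and back again.
fn oct_encode(d: [f32; 3]) -> [f32; 2] {
    let inv_l1 = 1.0 / (d[0].abs() + d[1].abs() + d[2].abs());
    let (mut x, mut y) = (d[0] * inv_l1, d[1] * inv_l1);
    if d[2] < 0.0 {
        // Fold the lower hemisphere over the diagonals.
        let (ox, oy) = (x, y);
        x = (1.0 - oy.abs()) * ox.signum();
        y = (1.0 - ox.abs()) * oy.signum();
    }
    [x * 0.5 + 0.5, y * 0.5 + 0.5] // [-1,1]^2 -> [0,1]^2
}

fn oct_decode(uv: [f32; 2]) -> [f32; 3] {
    let (x, y) = (uv[0] * 2.0 - 1.0, uv[1] * 2.0 - 1.0);
    let z = 1.0 - x.abs() - y.abs();
    // Unfold the lower hemisphere (t is zero in the upper hemisphere).
    let t = (-z).max(0.0);
    let (x, y) = (x - t * x.signum(), y - t * y.signum());
    let l = (x * x + y * y + z * z).sqrt();
    [x / l, y / l, z / l]
}
```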
The renderer was written from scratch in Rust and Metal. It was heavily inspired by SDFGI (Linietsky) and DDGI (Majercik, Guertin, Nowrouzezahrai, McGuire).
At Unity Hackweek 2020 my group and I implemented GPU raymarching of signed distance fields stored as 3D textures. We used this to generate directional occlusion probes in realtime.
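The raymarching itself is plain sphere tracing; a sketch, where `sample_sdf` stands in for a trilinear fetch from the 3D SDF texture and the step limit and epsilon are illustrative:

```rust
// Sketch of sphere tracing against a signed distance field. Each step
// advances by the distance to the nearest surface, which is always safe.
fn sphere_trace(
    origin: [f32; 3],
    dir: [f32; 3], // normalized ray direction
    sample_sdf: impl Fn([f32; 3]) -> f32,
) -> Option<f32> {
    let mut t = 0.0;
    for _ in 0..128 {
        let p = [
            origin[0] + t * dir[0],
            origin[1] + t * dir[1],
            origin[2] + t * dir[2],
        ];
        let d = sample_sdf(p);
        if d < 1e-3 {
            return Some(t); // close enough to the surface: report a hit
        }
        t += d;
        if t > 100.0 {
            break; // ray left the volume
        }
    }
    None
}
```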
I started this project to solidify my understanding of pathtracing, BRDFs, and importance sampling. A few highlights:
In realtime computer graphics we are often interested in compressing sets of spherical functions such as an irradiance field. In my thesis I evaluated and compared several known spherical function bases such as Spherical Harmonics, Spherical Gaussians and Ambient Cubes. The result was a set of recommendations about which encoding techniques to use for particular types of signals (irradiance, radiance, occlusion/visibility, etc.).
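To give a flavor of such an encoding, here is a sketch of projecting a spherical signal onto an L1 Spherical Harmonics basis via Monte Carlo integration (illustrative code, not taken from the thesis):

```rust
// Projects (direction, value) samples onto the first two SH bands.
// Directions are assumed uniformly distributed on the unit sphere, so the
// Monte Carlo estimate of each integral is scaled by 4*pi / N.
fn project_sh_l1(samples: &[([f32; 3], f32)]) -> [f32; 4] {
    let mut c = [0.0f32; 4];
    for &(d, v) in samples {
        c[0] += v * 0.282095;        // Y_0^0  (constant band)
        c[1] += v * 0.488603 * d[1]; // Y_1^-1 ~ y
        c[2] += v * 0.488603 * d[2]; // Y_1^0  ~ z
        c[3] += v * 0.488603 * d[0]; // Y_1^1  ~ x
    }
    let norm = 4.0 * std::f32::consts::PI / samples.len() as f32;
    c.map(|x| x * norm)
}
```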
I received the maximum grade for my report and defence. You can read the thesis here.
A machine learning-based denoiser can smooth out variance in lightmaps caused by low sample counts. This allows you to generate good-looking lightmaps much faster since you do not need to wait for convergence.
At Unity Hackweek 2019 my group and I ported Intel's Open Image Denoise library to Unity's Barracuda platform, which enabled it to run on the GPU. Our primary goal was to learn about machine-learning denoising.
For our final assignment in a university course on parallel computation, my group and I wrote an adaptive GPU pathtracer in C++/CUDA. The pathtracer detects converged pixels and removes them from the working set. Our primary focus was to make this detection and reduction logic as efficient as possible on modern GPUs.
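A sequential sketch of the convergence test and working-set reduction (the actual project implements this as parallel stream compaction in CUDA; the variance criterion below is one illustrative choice):

```rust
// A pixel is deemed converged when the variance of its running mean falls
// below a threshold; converged pixels are dropped from the working set.
struct Pixel {
    index: u32,   // location in the framebuffer
    mean: f32,    // running mean (Welford)
    m2: f32,      // running sum of squared deviations (Welford)
    samples: u32, // number of samples taken so far
}

fn compact(working_set: Vec<Pixel>, threshold: f32) -> Vec<Pixel> {
    working_set
        .into_iter()
        .filter(|p| {
            // Variance of the mean shrinks as samples accumulate.
            let var = p.m2 / (p.samples.max(2) - 1) as f32;
            let var_of_mean = var / p.samples as f32;
            var_of_mean > threshold // keep only unconverged pixels
        })
        .collect()
}
```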
Lightmap baking involves packing all lightmapped objects into a set of lightmaps. I devised and implemented a stable packing algorithm that bundled objects that were nearby in world space into the same lightmaps. The benefit is that it makes it possible to batch draw calls more efficiently at runtime.
Due to other priorities at Unity, this feature unfortunately never shipped.
For Unity Hackweek 2018 my group and I added explicit sampling of disk/sphere/line lights in Unity's progressive lightmapper.
A known problem with lightmapping is seam artifacts along the borders of UV islands that are neighbours in object space but separated in lightmap space. To solve this problem in Unity, I implemented a technique that "stitches" the seams together by performing a least-squares error minimization over the border texels of the UV islands. My solution was heavily inspired by Naughty Dog and Sebastian Sylvan.
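In rough terms (my notation, not lifted verbatim from those sources), the optimization solves for new border texel values x that bilinearly interpolate to the same result on both sides of each seam while staying close to the baked values:

```latex
\min_{x}\; \sum_{s \,\in\, \text{seam samples}} \big( B_a(x, s) - B_b(x, s) \big)^2
\;+\; \lambda \sum_{t \,\in\, \text{border texels}} \big( x_t - x_t^{0} \big)^2
```

Here B_a and B_b bilinearly sample the lightmap at the same surface point from either side of the seam, x⁰ are the original baked texel values, and lambda controls how far the solution may drift from the bake.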
An implementation of the lockstep synchronization algorithm used in some types of networked games. By making the simulation (collision detection etc.) fully deterministic on the client, you only need to transmit player actions across the network (as opposed to continuously transmitting the full server state).
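A sketch of a single lockstep tick (networking and types are placeholders):

```rust
// One lockstep tick: clients exchange only their inputs, then every
// client advances the same deterministic simulation locally.
struct PlayerInput {
    buttons: u8, // packed action bits for this tick
}

fn lockstep_tick(
    tick: u64,
    local_input: PlayerInput,
    // Blocks until the inputs of *all* players for `tick` have arrived.
    exchange: impl Fn(u64, PlayerInput) -> Vec<PlayerInput>,
    // Must be fully deterministic: same inputs => same world state.
    simulate: impl Fn(&[PlayerInput]),
) {
    let all_inputs = exchange(tick, local_input);
    // Because the simulation is deterministic, identical inputs yield an
    // identical world state on every client -- no state sync required.
    simulate(&all_inputs);
}
```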
For fun and educational purposes I wrote a game engine from scratch in C++/OpenGL. A few highlights:
Source code is available on GitHub.