Project Log

Here I maintain a log of software projects that I have worked on over the years. Some of these are hobby projects while others were made in an academic or professional context.

1. Realtime GI Using ReSTIR And Irradiance Caching (2024)

I have found ReSTIR to be a delightfully flexible toolbox. It offers many tweakable knobs, and in this project I wanted to see how far I could push it in terms of quality and performance. My goal was to implement multi-bounce diffuse lighting in a way that would run well on efficiency-oriented GPUs, like those found in laptops and mobile devices.

By sharing new ReSTIR candidate samples across 4x4 pixel tiles and by using shadow mapping, my implementation traces at most 1/8 ray per screen-space pixel. Within this tight budget I managed to sample both the sun and local lights (see figure 1). I achieved this by slightly reformulating ReSTIR GI to also take direct light from emissive triangles into account. I reduced noise by using a mixture PDF for the candidate samples that stochastically picks between BRDF sampling and explicit light sampling.
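To illustrate the idea of a mixture PDF, here is a minimal sketch on a toy 1D domain. It is not the renderer's code: the two densities are stand-ins for BRDF sampling and light sampling, and all function names and the 50/50 split are assumptions made for the example. The key point is that a sample drawn from either strategy is weighted by the combined density.

```python
import math
import random

def pdf_a(x):
    # Stand-in for BRDF sampling: uniform density on [0, 1].
    return 1.0

def sample_a(rng):
    return rng.random()

def pdf_b(x):
    # Stand-in for explicit light sampling: density 2x on [0, 1].
    return 2.0 * x

def sample_b(rng):
    # Inverse CDF of the density 2x.
    return math.sqrt(rng.random())

def sample_mixture(rng, prob_a=0.5):
    # Stochastically pick one strategy, then evaluate the mixture density.
    x = sample_a(rng) if rng.random() < prob_a else sample_b(rng)
    pdf = prob_a * pdf_a(x) + (1.0 - prob_a) * pdf_b(x)
    return x, pdf

def estimate(f, n, seed=0, prob_a=0.5):
    # Monte Carlo estimate of the integral of f over [0, 1].
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x, pdf = sample_mixture(rng, prob_a)
        total += f(x) / pdf
    return total / n
```

Calling `estimate(lambda x: 3.0 * x * x, 20000)` should give a value close to 1, the exact integral. The mixture density never drops to zero where either strategy can produce a sample, which is what keeps the weights bounded.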

I approximated infinite (diffuse) bounces via a sparse cascaded voxel irradiance cache (see figure 2). The cache used a ranking system heavily inspired by the Kajiya renderer to avoid light leaking. I achieved fast parallel voxel reallocation by arranging voxel data in a ring buffer designed for time-sliced compaction. I also found that the cache was useful in screen-space denoising: instead of starting integration from zero (black) when surfaces are disoccluded, the denoiser queries the cache for a much better starting value.

The renderer was built directly on top of macOS using Rust and Metal. Using the scene in figure 1, a full frame (including gbuffer, motion vectors, ReSTIR, denoising, etc.) takes approximately 10ms on a MacBook Air M1 (2.6 teraflops).

Fig 1: The renderer samples sun light and local lights using only 1/8 ray per pixel.
Fig 2: A cascaded sparse voxel irradiance cache enables approximation of infinite bounces.

2. Realtime Lightmapper (2022)

In this hobby project I wanted to build a realtime GI solution for less powerful devices (e.g. laptops), and to explore realtime importance sampling techniques. I settled on a realtime lightmapper based on Metal's intersection API.

Fig 3: The renderer has support for dynamic lighting and dynamic geometry.

I used a modular design that can switch between integrator and filter implementations at runtime. In addition to a standard MC integrator, the renderer includes a path-guided integrator (figure 4), and a ReSTIR integrator.

Fig 4: The path-guided integrator progressively learns which directions and emissives are important.

Inspired by Stachowiak, the renderer captures short-term statistics (mean, variance) for each lightmap texel. This data enables you to approximate standard deviations, which are then used to clamp history buffers. This yields a temporal filter that is more stable and more reactive than a naive Exponential Moving Average (at the cost of increased memory usage).
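A minimal single-texel sketch of this kind of statistics-guided filter follows. It is not the renderer's implementation: the class name, the two blend factors, and the two-sigma clamp window are all assumptions chosen for illustration.

```python
import math

class TexelFilter:
    """Temporal filter for one scalar texel (hypothetical parameters)."""

    def __init__(self, history_alpha=0.05, stats_alpha=0.3, sigma_scale=2.0):
        self.history_alpha = history_alpha  # slow blend for the history buffer
        self.stats_alpha = stats_alpha      # fast blend for short-term statistics
        self.sigma_scale = sigma_scale      # clamp window in standard deviations
        self.mean = 0.0
        self.mean_sq = 0.0
        self.history = 0.0
        self.initialized = False

    def update(self, sample):
        if not self.initialized:
            self.mean = sample
            self.mean_sq = sample * sample
            self.history = sample
            self.initialized = True
            return self.history
        # Short-term first and second moments.
        a = self.stats_alpha
        self.mean += a * (sample - self.mean)
        self.mean_sq += a * (sample * sample - self.mean_sq)
        variance = max(0.0, self.mean_sq - self.mean * self.mean)
        sigma = math.sqrt(variance)
        # Clamp the history to the window suggested by the statistics,
        # then blend in the new sample as a regular EMA would.
        lo = self.mean - self.sigma_scale * sigma
        hi = self.mean + self.sigma_scale * sigma
        clamped = min(max(self.history, lo), hi)
        self.history = clamped + self.history_alpha * (sample - clamped)
        return self.history
```

When the input jumps from one level to another, the short-term mean moves quickly and its variance collapses again, so the clamp drags the history to the new level much faster than the slow blend factor alone would.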

Fig 5: The statistics-guided temporal filter is more stable and more reactive than standard EMA.

Lightmaps can be a pain, but many issues can be addressed via careful implementations. I wrote a specialized lightmap rasterizer which, among other things, takes bilinear filtering into account and approximates the ideal sample position within each lightmap texel. Inspired by Precomputed Global Illumination in Frostbite (Yuriy O'Donnell), I also added an adaptive chart packer (parallelized on GPU using the MapReduce model), which guarantees no bleeding between charts while ensuring high texture utilization (figure 6).

Fig 6: The packer adjusts lightmap resolution as needed and time-slices GPU work across multiple frames.

3. Sparse Probe Surface Cache (2022)

Inspired by the irradiance cache structure in Tomasz Stachowiak's amazing Kajiya renderer, we spent a Unity Hackweek prototyping an idea for an efficient probe cache.

The cache is fully GPU-driven and is sparse in the sense that it only allocates probes where they are needed (usually near surfaces). Once allocated, probes integrate irradiance into spherical harmonics. Probes that haven't been requested for a number of frames are automatically deallocated.
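The allocate-on-request and deallocate-when-idle lifecycle can be sketched on the CPU as follows. This is only an illustration: the prototype is GPU-driven, and here a Python dictionary replaces the GPU data structure, a running scalar average replaces the spherical harmonics, and the class name, cell size, and idle threshold are all hypothetical.

```python
import math

class SparseProbeCache:
    def __init__(self, cell_size=1.0, max_idle_frames=8):
        self.cell_size = cell_size
        self.max_idle_frames = max_idle_frames
        self.probes = {}  # grid cell -> probe record

    def _cell(self, position):
        return tuple(math.floor(c / self.cell_size) for c in position)

    def request(self, position, frame):
        # Allocate a probe the first time its cell is requested.
        key = self._cell(position)
        probe = self.probes.get(key)
        if probe is None:
            probe = {"irradiance": 0.0, "samples": 0, "last_requested": frame}
            self.probes[key] = probe
        probe["last_requested"] = frame
        return probe

    def integrate(self, position, frame, sample):
        # Running average as a scalar stand-in for SH integration.
        probe = self.request(position, frame)
        probe["samples"] += 1
        probe["irradiance"] += (sample - probe["irradiance"]) / probe["samples"]

    def collect_idle(self, frame):
        # Deallocate probes that have not been requested recently.
        idle = [key for key, probe in self.probes.items()
                if frame - probe["last_requested"] > self.max_idle_frames]
        for key in idle:
            del self.probes[key]
        return len(idle)
```

Memory use then follows the set of cells that shading actually touches rather than the size of the scene bounds.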

On low-end devices you may want to query the cache directly. On high-end devices, the cache can service a high quality final gather pass (similar to Epic's Lumen). We had many more ideas we wanted to try out, but you can only do so much in a week.

Fig 7: Our sparse probe cache prototype in action. Probes are allocated and deallocated dynamically.

4. Realtime GI Using Surfels (2022)

Inspired by Project PICA PICA and EA GIBS, I implemented realtime GI based on surfels. A key benefit of this approach is that it doesn't require UV mapping and works relatively well with most types of geometry: static, dynamic, skinned, high/low frequency. Another is that the surfel structure can sample itself, which yields relatively cheap infinite light bounces.

I precompute surfel positions/normals at mesh import time. This means that no computation is spent on surfel placement at runtime and that we get light bounces even from surfaces which the camera hasn't yet seen. The renderer was written in Rust/Metal and runs on a MacBook Air.

Fig 8: Surfels are versatile but sampling them efficiently in a scalable way can be tricky.

5. Realtime GI Using Surface Cache + Final Gather (2021)

In this project I maintain a temporally integrated surface cache encoded as UV-mapped lightmaps. The fact that the cache can resample itself over time means that I can approximate infinite bounces using relatively few rays per frame. Inspired by Epic's Lumen, I do a screenspace final gather on top of the surface cache which is filtered temporally and spatially. The final gather affords camera-dependent resolution of secondary rays, something that would not have been possible using the surface cache alone.

The renderer supports dynamic lights and dynamic geometry. It is written in Rust and Apple Metal, and it runs smoothly on a MacBook Air M1 (without a discrete GPU).

Fig 9: The UV-mapped surface cache works particularly well with low-poly scenes.
Fig 10: A somewhat challenging scene with a small non-analytical light source.

6. Realtime Constructive Solid Geometry Pathtracer (2021)

This is a realtime Constructive Solid Geometry pathtracer that uses temporal and spatial filtering techniques to reduce noise. In each frame, each pixel samples the rendering equation integral (several bounces) and accumulates the results via an exponential moving average. The output is then denoised via my adaptation of SVGF.
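One common way to express CSG operations is through signed distance functions, where the boolean operators reduce to `min` and `max`. The sketch below assumes that representation, which the description above does not confirm; the helper names are hypothetical.

```python
import math

def sphere(center, radius):
    # Signed distance to a sphere: negative inside, positive outside.
    def sdf(p):
        return math.dist(p, center) - radius
    return sdf

def union(a, b):
    return lambda p: min(a(p), b(p))

def intersection(a, b):
    return lambda p: max(a(p), b(p))

def difference(a, b):
    # Points inside a but outside b.
    return lambda p: max(a(p), -b(p))
```

For example, `difference(sphere((0, 0, 0), 1.0), sphere((1, 0, 0), 1.0))` is negative at `(-0.5, 0, 0)` and positive at `(0.5, 0, 0)`, because the second sphere carves the latter point away. The combined values are distance bounds rather than exact distances, which is usually sufficient for ray marching.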

The pathtracer supports realtime changes to geometry and lighting. In addition to diffusely bounced light, I added support for volumetric fog (extinction + in-scattering), day/night cycle, simple color grading, and a subtle vignette effect. It runs at 60fps on a MacBook Air.

Fig 11: Constructive Solid Geometry affords manipulations that are not trivial to do with polygons.
Fig 12: A minimalistic CSG remake of Sponza that showcases day/night cycle and volumetric fog.

7. Realtime Sky Occlusion Using Voxel Tracing (2020)

At a Unity Hackweek, our team made a Minecraft clone. Inspired by Teardown, I implemented realtime voxel-traced sky occlusion on the GPU and integrated it into Unity's Universal Render Pipeline.

We adopted a sparse data layout that significantly reduced memory usage by not storing anything in empty regions of the world.
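As a rough illustration of such a sparse layout, the sketch below stores voxels in fixed-size chunks that only exist while they contain something. The chunk size, class name, and dictionary-based storage are assumptions for the example, not the layout the team used.

```python
CHUNK_SIZE = 16

class SparseVoxelWorld:
    def __init__(self):
        self.chunks = {}  # chunk coordinate -> bytearray of CHUNK_SIZE^3 voxels

    @staticmethod
    def _split(x, y, z):
        # Floor division and modulo keep negative coordinates consistent.
        key = (x // CHUNK_SIZE, y // CHUNK_SIZE, z // CHUNK_SIZE)
        lx, ly, lz = x % CHUNK_SIZE, y % CHUNK_SIZE, z % CHUNK_SIZE
        index = lx + CHUNK_SIZE * (ly + CHUNK_SIZE * lz)
        return key, index

    def set_solid(self, x, y, z, solid=True):
        key, index = self._split(x, y, z)
        chunk = self.chunks.get(key)
        if chunk is None:
            if not solid:
                return  # clearing an empty region allocates nothing
            chunk = bytearray(CHUNK_SIZE ** 3)
            self.chunks[key] = chunk
        chunk[index] = 1 if solid else 0
        if not solid and not any(chunk):
            del self.chunks[key]  # release chunks that became empty

    def is_solid(self, x, y, z):
        key, index = self._split(x, y, z)
        chunk = self.chunks.get(key)
        return chunk is not None and chunk[index] == 1
```

Queries into empty regions return immediately without touching any voxel data, which also helps a ray traversal skip empty space.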

Fig 13: Demonstration of the effect. I did not have time to do denoising.
Fig 14: Comparison of Unity's screen space ambient occlusion (SSAO) and our effect.

8. Realtime GI Probes Using SDF Textures (2020)

In this hobby project I used Monte Carlo integration to calculate irradiance probes across several frames. I automatically generated a 3D SDF texture to enable fast raytracing on the GPU. The probe data was encoded using a sphere-to-square octahedral projection to ensure efficient per pixel sampling during shading.
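The octahedral mapping itself is compact. The following sketch shows the commonly used encode/decode pair that maps a unit direction to a point in the unit square and back; the function names are my own and the code is not taken from the renderer.

```python
import math

def _sign_not_zero(v):
    return 1.0 if v >= 0.0 else -1.0

def oct_encode(direction):
    # Project the direction onto the octahedron |x| + |y| + |z| = 1.
    x, y, z = direction
    norm = abs(x) + abs(y) + abs(z)
    x, y, z = x / norm, y / norm, z / norm
    if z < 0.0:
        # Fold the lower hemisphere over the diagonals of the square.
        x, y = ((1.0 - abs(y)) * _sign_not_zero(x),
                (1.0 - abs(x)) * _sign_not_zero(y))
    # Remap from [-1, 1] to [0, 1] texture coordinates.
    return (x * 0.5 + 0.5, y * 0.5 + 0.5)

def oct_decode(uv):
    fx = uv[0] * 2.0 - 1.0
    fy = uv[1] * 2.0 - 1.0
    z = 1.0 - abs(fx) - abs(fy)
    # Undo the fold for points that belong to the lower hemisphere.
    t = max(-z, 0.0)
    x = fx + (-t if fx >= 0.0 else t)
    y = fy + (-t if fy >= 0.0 else t)
    length = math.sqrt(x * x + y * y + z * z)
    return (x / length, y / length, z / length)
```

Because every direction lands in a square, each probe can occupy a small tile of a 2D texture and be sampled with ordinary texture filtering.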

The renderer was written from scratch in Rust and Metal. It was heavily inspired by SDFGI (Linietsky) and DDGI (Majercik, Guertin, Nowrouzezahrai, McGuire).

Fig 15: As the directional light moves, the irradiance probes update accordingly in realtime.

9. Realtime Occlusion Probes Using SDF Textures (2020)

At Unity Hackweek 2020 my group and I implemented GPU raymarching of signed distance fields stored as 3D textures. We used this to generate directional occlusion probes in realtime.
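The core raymarching loop (sphere tracing) is short. The sketch below is a CPU illustration only: an analytic sphere replaces the 3D SDF texture, a single ray per axis replaces whatever ray distribution the probes used, and all names and constants are assumptions.

```python
import math

def sphere_sdf(p, center=(0.0, 0.0, 5.0), radius=1.0):
    # Analytic stand-in for a lookup into a 3D SDF texture.
    return math.dist(p, center) - radius

def raymarch(sdf, origin, direction, max_distance=100.0, epsilon=1e-3, max_steps=128):
    # Step along the ray by the distance to the nearest surface.
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + d * t for o, d in zip(origin, direction))
        distance = sdf(p)
        if distance < epsilon:
            return True  # close enough to count as a hit
        t += distance
        if t > max_distance:
            return False
    return False

AXES = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
        (0.0, -1.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)]

def directional_occlusion(sdf, probe_position):
    # One occlusion value per axis direction: 1.0 means blocked.
    return [1.0 if raymarch(sdf, probe_position, axis) else 0.0 for axis in AXES]
```

For a probe at the origin, only the +Z direction is reported as occluded by the sphere at `(0, 0, 5)`.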

Fig 16: Occlusion probes are updated in realtime by raymarching an SDF 3D texture.

10. Hobby CPU Pathtracer (2020)

I started this project to solidify my understanding of pathtracing, BRDFs, and importance sampling. A few highlights:

Fig 17: Cornell box with dielectric, diffuse, and specular materials.

11. Bachelor Thesis: Evaluation of Spherical Function Bases (2019)

In realtime computer graphics we are often interested in compressing sets of spherical functions such as an irradiance field. In my thesis I evaluated and compared several known spherical function bases such as Spherical Harmonics, Spherical Gaussians and Ambient Cubes. The result was a set of recommendations about which encoding techniques to use for particular types of signals (irradiance, radiance, occlusion/visibility, etc.).
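To make the idea of compressing a spherical function concrete, here is a small sketch that projects a function onto the first two bands of real Spherical Harmonics (four coefficients) and reconstructs it. It is not taken from the thesis: the Fibonacci-lattice quadrature, the sample count, and the function names are assumptions for the example.

```python
import math

Y0 = math.sqrt(1.0 / (4.0 * math.pi))  # band 0 constant
Y1 = math.sqrt(3.0 / (4.0 * math.pi))  # band 1 scale factor

def sh_basis(d):
    # Real SH basis functions for bands 0 and 1.
    x, y, z = d
    return (Y0, Y1 * y, Y1 * z, Y1 * x)

def fibonacci_sphere(n):
    # Near-uniform points on the unit sphere, used as quadrature nodes.
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n
        r = math.sqrt(max(0.0, 1.0 - z * z))
        phi = i * golden_angle
        points.append((r * math.cos(phi), r * math.sin(phi), z))
    return points

def project(f, n=2000):
    # Approximate the integral of f times each basis function.
    coeffs = [0.0, 0.0, 0.0, 0.0]
    weight = 4.0 * math.pi / n
    for d in fibonacci_sphere(n):
        value = f(d)
        for i, basis in enumerate(sh_basis(d)):
            coeffs[i] += weight * value * basis
    return coeffs

def reconstruct(coeffs, d):
    return sum(c * basis for c, basis in zip(coeffs, sh_basis(d)))
```

A function such as `1 + 0.5 * z` lies entirely within these two bands, so four numbers reproduce it almost exactly. Signals with sharper features need more bands or a different basis, which is the kind of trade-off the evaluation looked at.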

I received the maximum grade for my report and defence. You can read the thesis here.

Fig 18: A graph from the report's analysis section that shows RMAE vs space requirements for various bases.
Fig 19: SH illustrations from the report's theory section.

12. Lightmap Denoising Using Machine Learning (2019)

A machine learning-based denoiser can smooth out variance in lightmaps caused by low sample counts. This allows you to generate good-looking lightmaps much faster since you do not need to wait for convergence.

At Unity Hackweek 2019 my group and I ported Intel Open Image Denoise to Unity's Barracuda platform, which enabled it to run on the GPU. Our primary goal was to learn about machine learning denoising.

Fig 20: A lightmapped scene before (top) and after (bottom) denoising.

13. GPU (CUDA) Pathtracer With Adaptive Sampling (2018)

For our final assignment in a university course on parallel computation, my group and I wrote an adaptive GPU pathtracer in C++/CUDA. The pathtracer detects converged pixels and removes them from the working set. Our primary focus was to make this detection and reduction logic as efficient as possible on modern GPUs.
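The detect-and-remove loop can be sketched sequentially as below. This is a CPU illustration under assumed details: the convergence test (standard error of the mean below a threshold), the parameter values, and the function names are hypothetical, and the list rebuild stands in for a parallel stream compaction on the GPU.

```python
import math

def render_adaptive(sample_pixel, pixel_count, threshold=0.05,
                    min_samples=4, max_passes=1000):
    count = [0] * pixel_count
    mean = [0.0] * pixel_count
    m2 = [0.0] * pixel_count  # running sum of squared deviations (Welford)
    active = list(range(pixel_count))
    for _ in range(max_passes):
        if not active:
            break
        still_active = []
        for p in active:
            x = sample_pixel(p, count[p])
            count[p] += 1
            delta = x - mean[p]
            mean[p] += delta / count[p]
            m2[p] += delta * (x - mean[p])
            converged = False
            if count[p] >= min_samples:
                variance = m2[p] / (count[p] - 1)
                standard_error = math.sqrt(variance / count[p])
                converged = standard_error < threshold
            if not converged:
                still_active.append(p)
        # Compaction: only unconverged pixels stay in the working set.
        active = still_active
    return mean, count
```

A pixel with constant radiance leaves the working set after `min_samples` samples, while a noisy pixel keeps receiving samples until its error estimate drops below the threshold.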

Fig 21: Rendering of animated directional light in GPU pathtracer.
Fig 22: Visualization of per-pixel convergence.

14. Spatially Coherent Lightmaps in Unity (2018)

Lightmap baking involves packing all lightmapped objects into a set of lightmaps. I devised and implemented a stable packing algorithm that bundled objects that were nearby in world space into the same lightmaps. The benefit of this is that it makes it possible to batch draw calls more efficiently at runtime.
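One simple way to get spatial coherence is to order objects along a space-filling curve before filling lightmaps. The sketch below uses Morton (Z-order) codes and a greedy fill; this is an assumed stand-in for illustration and not the algorithm that was implemented in Unity. The function names, the 10-bit quantization, and the scalar capacity model are all hypothetical.

```python
def spread_bits(v):
    # Insert two zero bits between each of the low 10 bits of v.
    result = 0
    for bit in range(10):
        result |= ((v >> bit) & 1) << (3 * bit)
    return result

def morton3(x, y, z):
    return spread_bits(x) | (spread_bits(y) << 1) | (spread_bits(z) << 2)

def pack_coherently(objects, capacity):
    # objects: list of (name, (x, y, z) world position, texel area).
    if not objects:
        return []
    lo = [min(o[1][axis] for o in objects) for axis in range(3)]
    hi = [max(o[1][axis] for o in objects) for axis in range(3)]

    def sort_key(obj):
        quantized = []
        for axis in range(3):
            extent = hi[axis] - lo[axis]
            t = (obj[1][axis] - lo[axis]) / extent if extent > 0 else 0.0
            quantized.append(min(1023, int(t * 1023)))
        # The name breaks ties so the order does not depend on input order.
        return (morton3(*quantized), obj[0])

    lightmaps, current, used = [], [], 0
    for name, _, area in sorted(objects, key=sort_key):
        if current and used + area > capacity:
            lightmaps.append(current)
            current, used = [], 0
        current.append(name)
        used += area
    if current:
        lightmaps.append(current)
    return lightmaps
```

With two distant clusters of two objects each and room for two objects per lightmap, each cluster ends up in its own lightmap regardless of the order in which the objects were supplied.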

Due to other priorities at Unity, this feature unfortunately never shipped.

Fig 23: Before and after enabling spatially coherent packing (color denotes lightmap index).

15. Explicit Shape Sampling in Unity's Progressive Lightmapper (2018)

For Unity Hackweek 2018 my group and I added explicit sampling of disk/sphere/line lights in Unity's progressive lightmapper.
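For a sphere light, a standard explicit sampling strategy is to sample directions uniformly within the cone that the sphere subtends as seen from the shading point. The sketch below shows that strategy; it is not the code added to the progressive lightmapper, and the function names are assumptions.

```python
import math

def _normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def sample_sphere_light(shading_point, center, radius, u1, u2):
    """Return (direction, pdf with respect to solid angle)."""
    to_center = tuple(c - p for c, p in zip(center, shading_point))
    dist = math.sqrt(sum(c * c for c in to_center))
    if dist <= radius:
        raise ValueError("shading point is inside the light")
    axis = tuple(c / dist for c in to_center)
    # Half-angle of the cone subtended by the sphere.
    cos_max = math.sqrt(1.0 - (radius / dist) ** 2)
    cos_theta = 1.0 - u1 * (1.0 - cos_max)
    sin_theta = math.sqrt(max(0.0, 1.0 - cos_theta * cos_theta))
    phi = 2.0 * math.pi * u2
    # Build an orthonormal basis around the cone axis.
    helper = (1.0, 0.0, 0.0) if abs(axis[0]) < 0.9 else (0.0, 1.0, 0.0)
    tangent = _normalize(_cross(helper, axis))
    bitangent = _cross(axis, tangent)
    direction = tuple(
        tangent[i] * sin_theta * math.cos(phi)
        + bitangent[i] * sin_theta * math.sin(phi)
        + axis[i] * cos_theta
        for i in range(3))
    pdf = 1.0 / (2.0 * math.pi * (1.0 - cos_max))
    return direction, pdf
```

Here `u1` and `u2` are uniform random numbers in [0, 1]. Every sampled direction points at the light, so no samples are wasted on directions that miss it, unlike hemisphere sampling.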

Fig 24: Unity's progressive lightmapper with explicit sphere sampling.

16. Lightmap Seam Stitching (2017)

A known problem with lightmapping is seam artifacts along the borders of UV islands that are neighbours in object space but separated in lightmap space. To solve this problem in Unity, I implemented a technique that "stitches" together the seams by performing a least-squares error minimization over the border texels of the UV islands. My solution was heavily inspired by Naughty Dog and Sebastian Sylvan.
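A heavily simplified version of such a minimization is sketched below. It assumes each seam constraint links two texels directly and minimizes the sum of (x_i - v_i)^2 over all texels plus a weighted sum of (x_a - x_b)^2 over all seam pairs, using Gauss-Seidel iterations. The real technique works with sample points along the seam edges; the function name, the weight, and the solver choice here are assumptions.

```python
def stitch_seams(values, seam_pairs, seam_weight=10.0, iterations=200):
    # values: original texel values; seam_pairs: (a, b) indices that should match.
    result = list(values)
    neighbours = [[] for _ in values]
    for a, b in seam_pairs:
        neighbours[a].append(b)
        neighbours[b].append(a)
    for _ in range(iterations):
        for i, linked in enumerate(neighbours):
            if not linked:
                continue  # texels away from seams keep their value
            # Setting the derivative of the energy to zero for texel i gives
            # x_i = (v_i + w * sum(x_j)) / (1 + w * degree).
            total = values[i] + seam_weight * sum(result[j] for j in linked)
            result[i] = total / (1.0 + seam_weight * len(linked))
    return result
```

A higher `seam_weight` pulls the paired texels closer together at the cost of drifting further from the baked values.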

Fig 25: Seam stitching on lightmapped sphere.
Fig 26: Seam stitching on lightmapped terrain.

17. Networked Lockstep Synchronization (2016)

An implementation of the lockstep synchronization algorithm used in some types of networked games. By making the simulation (collision detection etc.) fully deterministic on the client, you only need to transmit player actions across the network (as opposed to continuously transmitting the full server state).
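The principle can be shown with a tiny in-process sketch. It is illustrative only: the class and function names are hypothetical, integer positions keep the arithmetic deterministic, and a plain loop stands in for the network exchange of actions.

```python
class Simulation:
    def __init__(self, player_count):
        self.tick = 0
        self.positions = [0] * player_count

    def step(self, actions):
        # actions: one integer move per player, applied in player order.
        for player, move in enumerate(actions):
            self.positions[player] += move
        self.tick += 1

    def state(self):
        return (self.tick, tuple(self.positions))

def run_lockstep(actions_per_player):
    # One simulation per peer; each advances only once every player's
    # action for the tick is known.
    player_count = len(actions_per_player)
    peers = [Simulation(player_count) for _ in range(player_count)]
    tick_count = len(actions_per_player[0])
    for tick in range(tick_count):
        # Stand-in for the network: every peer receives all actions.
        actions = [actions_per_player[p][tick] for p in range(player_count)]
        for peer in peers:
            peer.step(actions)
    return peers
```

Since every peer applies the same actions in the same order to the same deterministic simulation, their states stay identical without any state ever being transmitted. Comparing `state()` across peers (or a hash of it) is a cheap way to detect desynchronization.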

Fig 27: Deterministic simulation synchronized over TCP (original is 60fps).

18. Hobby Engine (2015)

For fun and educational purposes I wrote a game engine from scratch in C++/OpenGL. A few highlights:

Source code is available on GitHub.

Fig 28: Sped-up day/night cycle in my hobby engine.