Local Lights in HypeHype by Jarkko Lempiäinen

When I joined HypeHype, one of the most requested yet technically challenging features was missing: local lights. At the time, creators could only use directional sun and ambient lighting, which severely limited the visual richness and atmosphere of user-generated games.

HypeHype is a UGC platform where anyone — even kids as young as 7 — can create interactive content. That means our lighting system must be intuitive and robust, and it must scale effortlessly even when users fill the world with dozens of lights. It has to just work — no performance drops, no visual artifacts.

To solve this, I developed a new stochastic lighting algorithm, taking it from concept to prototype to production in just six months. There's still more to come — denoising and further optimizations are in progress — but we now have a fully dynamic, fixed-cost local lighting system with shadows, running efficiently across a wide range of mobile devices.

For comparison, engines like Unity HDRP use clustered lighting with a limit of 64 lights per 16×16 pixel tile. Exceeding that leads to visible tiling artifacts and skyrocketing GPU costs — not acceptable for our use case. Unlike traditional game development, we can’t rely on experienced lighters working around technical constraints. Our lighting system must deliver great results by default, with minimal setup, even for non-technical users.

Performance is also critical. Games in HypeHype need to run smoothly across a vast range of devices — from sub-$100 Android phones to high-end gaming PCs. A game built on the latest iPhone should still perform well on entry-level hardware. On top of that, lighting must remain visually consistent across devices, since even slight discrepancies — like lights getting dropped — could impact gameplay in unexpected ways. We considered limiting the number of lights during creation, but… where’s the fun in that? 😄

We evaluated existing modern techniques like ReSTIR, ReGIR, and other advanced sampling methods, but none of them were well-suited for real-time lighting on mobile. Ray tracing was definitely off the table 😄 — so we built a custom high-performance stochastic lighting solution tailored to our unique requirements.

The Algorithm

The lighting algorithm begins by selecting 16 lights per big-tile using Weighted Reservoir Sampling (WRS), guided by a basic probability density function (PDF). This PDF is evaluated using stratified sampling across the tile’s footprint to ensure balanced light selection.

Below is a visual of the resulting big-tile reservoir texture. It may not look exciting, but each 4×4 block in this single-channel image corresponds to one big-tile's light reservoir — a compact representation of the selected lights.

big_tile.png

Importantly, we use WRS without replacement, which ensures that the light selection is diverse and well-distributed. This is critical for the quality of the next stage: small-tile resampling.
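
To make the mechanics concrete, here is a minimal CPU-side sketch of weighted reservoir sampling without replacement, using the exponential-keys formulation (Efraimidis–Spirakis). The LightCandidate structure and SelectTileLights function are hypothetical stand-ins for illustration; the production version runs in a shader and evaluates the coarse PDF with stratified samples across the tile footprint.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Hypothetical stand-in for a light candidate; the real system evaluates the
// coarse PDF with stratified samples across the big-tile's footprint.
struct LightCandidate {
    uint32_t lightIndex;
    float    pdfWeight;  // coarse PDF value for this tile (> 0)
};

// Select up to k lights without replacement, weighted by pdfWeight.
// Each candidate gets a key u^(1/w); keeping the k largest keys yields a
// weighted sample without replacement (Efraimidis & Spirakis).
std::vector<uint32_t> SelectTileLights(const std::vector<LightCandidate>& lights,
                                       size_t k, std::mt19937& rng) {
    std::uniform_real_distribution<float> uniform(0.0f, 1.0f);
    std::vector<std::pair<float, uint32_t>> keyed;  // (key, light index)
    keyed.reserve(lights.size());
    for (const LightCandidate& l : lights) {
        if (l.pdfWeight <= 0.0f) continue;
        float key = std::pow(uniform(rng), 1.0f / l.pdfWeight);
        keyed.emplace_back(key, l.lightIndex);
    }
    k = std::min(k, keyed.size());
    std::partial_sort(keyed.begin(), keyed.begin() + k, keyed.end(),
                      [](auto& a, auto& b) { return a.first > b.first; });
    std::vector<uint32_t> reservoir(k);
    for (size_t i = 0; i < k; ++i) reservoir[i] = keyed[i].second;
    return reservoir;
}
```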

In this stage, we resample 1 to 4 lights per small-tile from the big-tile reservoirs using a more complete PDF, which includes additional terms such as shadow visibility. This yields high-quality per-tile light samples.
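
A rough sketch of this resampling step, in the spirit of resampled importance sampling (RIS): each small-tile sample slot streams over the big-tile reservoir entries, keeps one light with probability proportional to the full PDF, and records an unbiasing weight for the lighting pass. The names and the simplified weight (which ignores the big-tile selection PDF) are illustrative, not our exact shader code.

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical per-sample output: the chosen light plus an unbiasing weight,
// in the spirit of resampled importance sampling (RIS).
struct LightSample {
    uint32_t lightIndex;
    float    weight;  // unbiasing factor applied in the lighting pass
};

// fullPdf would include distance falloff, cone attenuation, an estimated
// shadow/visibility term, etc.; here it is just a callback.
template <typename FullPdf>
LightSample ResampleOne(const std::vector<uint32_t>& bigTileReservoir,
                        FullPdf fullPdf, std::mt19937& rng) {
    std::uniform_real_distribution<float> uniform(0.0f, 1.0f);
    float weightSum = 0.0f;
    float selectedPdf = 0.0f;
    LightSample out{0, 0.0f};
    for (uint32_t light : bigTileReservoir) {
        float w = fullPdf(light);  // resampling weight for this light
        weightSum += w;
        // Streaming WRS: replace the kept sample with probability w / weightSum.
        if (weightSum > 0.0f && uniform(rng) * weightSum < w) {
            out.lightIndex = light;
            selectedPdf   = w;
        }
    }
    // RIS-style weight: (sum of candidate weights / M) / pdf of the winner.
    if (selectedPdf > 0.0f)
        out.weight = weightSum / (bigTileReservoir.size() * selectedPdf);
    return out;
}
```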

Here’s a visualization of the small-tile sample texture — with four samples packed into RGBA channels. We currently use 4 samples per pixel (spp) due to the absence of a denoiser, but the plan is to reduce this to 1 spp with denoising, improving performance without sacrificing visual quality.

local_lights_small_tile.jpg

Each small-tile covers 64 quads (256 pixels), meaning all those pixels share the same light samples. This design helps amortize the resampling cost and improves GPU wave coherence, since nearby pixels process the same light set — a big win for performance.

The system is designed to run entirely in pixel shaders, allowing us to take advantage of framebuffer compression (FBC) to reduce memory bandwidth — a key factor on mobile GPUs. That said, the reservoir textures are relatively small and contain fairly random data, so the benefit from FBC in those passes may be limited. We’re also exploring compute shader variants with additional optimization opportunities down the line.

Currently, we support point and spot lights, but the system is extensible to other types. If adding more types increases VGPR pressure, we may implement tile classification by light type to maintain good occupancy and performance.

One of the known challenges with small-tile stochastic sampling is correlation artifacts, which can manifest as visible tiling in the final image. To address this, we interleave samples across neighboring tiles using a Gaussian-Poisson distribution. We also randomly offset these interleaved patterns each frame, which helps reduce temporal artifacts and supports temporal accumulation techniques like TAA.
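
The mechanics of the interleave look roughly like the sketch below: each pixel hashes its coordinates and the frame index to pick which neighboring tile's samples it reads. The offset table and Wang hash here are stand-ins for illustration; the production pattern uses the Gaussian-Poisson distribution mentioned above.

```cpp
#include <cstdint>

struct Int2 { int x, y; };

// Hypothetical 4-entry neighborhood offsets (stand-in for the real pattern).
static const Int2 kInterleaveOffsets[4] = { {0, 0}, {1, 0}, {0, 1}, {-1, -1} };

// Cheap integer hash (Wang hash) used to decorrelate pixels and frames.
static uint32_t WangHash(uint32_t v) {
    v = (v ^ 61u) ^ (v >> 16);
    v *= 9u;
    v ^= v >> 4;
    v *= 0x27d4eb2du;
    v ^= v >> 15;
    return v;
}

// Which small-tile should this pixel fetch its light samples from?
Int2 InterleavedTile(int px, int py, int tileSize, uint32_t frameIndex) {
    int tx = px / tileSize, ty = py / tileSize;
    // Per-pixel, per-frame selection of a neighboring tile's samples.
    uint32_t h = WangHash((uint32_t)px ^ WangHash((uint32_t)py ^ frameIndex * 0x9e3779b9u));
    Int2 off = kInterleaveOffsets[h & 3u];
    return { tx + off.x, ty + off.y };
}
```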

We don’t yet have a dedicated denoiser in place — TAA is next on the roadmap. We’ll first see how far we can push visual quality using TAA alone before introducing a more advanced spatio-temporal denoising solution.

Shadows

We support both static and dynamic shadows for point and spot lights, with shadows enabled by default to ensure creators get high-quality lighting without needing to tweak settings. Point lights use cubemaps remapped to octahedral space, while spot lights rely on standard shadow maps.
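
For reference, the standard octahedral encoding maps a unit direction into a single square, which is what lets a point light's cube faces fit into one flat atlas tile. The sketch below is the textbook mapping, not necessarily our exact shader code.

```cpp
#include <cmath>

struct Float2 { float x, y; };
struct Float3 { float x, y, z; };

// Standard octahedral encoding: unit direction -> [0,1]^2 atlas coordinates.
Float2 OctahedralEncode(Float3 d) {
    // Project onto the octahedron |x| + |y| + |z| = 1.
    float invL1 = 1.0f / (std::fabs(d.x) + std::fabs(d.y) + std::fabs(d.z));
    float px = d.x * invL1, py = d.y * invL1;
    if (d.z < 0.0f) {
        // Fold the lower hemisphere over the upper one.
        float ox = (1.0f - std::fabs(py)) * (px >= 0.0f ? 1.0f : -1.0f);
        float oy = (1.0f - std::fabs(px)) * (py >= 0.0f ? 1.0f : -1.0f);
        px = ox; py = oy;
    }
    // Remap from [-1,1] to [0,1] for atlas addressing.
    return { px * 0.5f + 0.5f, py * 0.5f + 0.5f };
}
```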

All shadow maps share a fixed-size shadow atlas, which makes efficient space management critical. The atlas contains a mix of shadow map resolutions and is dynamically updated as lights move or change. It’s implemented as a 16-bit single-channel texture to keep memory usage and bandwidth low — especially important on mobile.

local_lights_shadow_atlas.jpg

Each light’s shadow map resolution is determined by its distance to the camera and luminous flux. Maps are allocated, resized, or deallocated as those properties change. We maintain a per-frame update budget for the atlas and spread shadow map updates across multiple frames as needed.
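
As an illustration of how such a policy could look (the constants and the formula below are hypothetical, not our actual heuristic):

```cpp
#include <algorithm>  // std::clamp (C++17)
#include <cmath>
#include <cstdint>

// Hypothetical shadow-map resolution heuristic: brighter and closer lights
// get larger atlas tiles, clamped to a power-of-two range.
uint32_t ShadowMapResolution(float distanceToCamera, float luminousFlux) {
    // Screen coverage shrinks roughly with distance; importance grows with flux.
    float importance = std::sqrt(luminousFlux) / std::max(distanceToCamera, 1.0f);
    // Map importance to a power-of-two resolution in [64, 512].
    float logRes = 6.0f + std::clamp(importance, 0.0f, 3.0f);  // log2(64) = 6
    return 1u << (uint32_t)std::round(logRes);
}
```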

For static shadows, we render only static objects, and shadow map updates are added to the update queue only when needed — for example, if the light moves, rotates, or crosses certain distance thresholds. While they do support dynamic updates, static shadows are primarily intended as a low-cost decorative option for enhancing the lighting quality of a scene.

Dynamic shadows are an opt-in feature. When enabled, both static and dynamic geometry are rendered into the shadow map. These updates are prioritized and processed using a round-robin strategy based on update age. If too many dynamic updates are queued, we still ensure at least one static shadow map is updated per frame to avoid starvation. Since dynamic shadows require frequent updates, they are more performance-intensive than static shadows. As such, they’re intended to be used selectively — for example, on key “hero” lights where dynamic shadowing is essential to the scene.
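
A simplified sketch of this scheduling logic, with hypothetical names, might look like the following:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Illustrative update scheduler: shadow maps are ordered by update age
// (oldest first), but at least one static update is always let through per
// frame so static maps never starve behind dynamic ones.
struct ShadowUpdate {
    uint32_t atlasSlot;
    uint32_t framesSinceUpdate;
    bool     isDynamic;
};

std::vector<ShadowUpdate> PickUpdatesForFrame(std::vector<ShadowUpdate> queue,
                                              size_t budget) {
    // Round-robin by age: the longest-waiting maps go first.
    std::sort(queue.begin(), queue.end(), [](const auto& a, const auto& b) {
        return a.framesSinceUpdate > b.framesSinceUpdate;
    });
    std::vector<ShadowUpdate> picked;
    bool staticPicked = false;
    for (const ShadowUpdate& u : queue) {
        if (picked.size() >= budget) break;
        picked.push_back(u);
        staticPicked |= !u.isDynamic;
    }
    // Starvation guard: if the budget filled up with dynamic updates, swap the
    // last pick for the oldest pending static map (if any).
    if (!staticPicked && !picked.empty()) {
        for (const ShadowUpdate& u : queue)
            if (!u.isDynamic) { picked.back() = u; break; }
    }
    return picked;
}
```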

Since shadow updates can be deferred across multiple frames, we evaluate shadows using the position of the light at the time the shadow map was rendered, not its current position. This maintains visual consistency.

Once the shadow map atlas has been updated for the frame, we evaluate shadow terms in a deferred quad-resolution pass before the lighting stage. This separation simplifies the lighting shader and reduces per-pixel VGPR pressure.

Shadow terms are computed per quad within each small-tile (which covers 64 quads), and stored in a fixed-size texture aligned with the light sample layout. During this pass, we also apply IES light profiles to match the physical characteristics of each light source.
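
As a concrete, purely illustrative example of the packing, four [0, 1] shadow terms could be quantized into one RGBA8 texel like this; the actual storage format and layout in production may differ:

```cpp
#include <cstdint>

// Illustrative packing of four per-quad shadow terms into one RGBA8 texel,
// matching the four light samples per small-tile.
uint32_t PackShadowTerms(const float terms[4]) {
    uint32_t packed = 0;
    for (int i = 0; i < 4; ++i) {
        float t = terms[i] < 0.0f ? 0.0f : (terms[i] > 1.0f ? 1.0f : terms[i]);
        packed |= (uint32_t)(t * 255.0f + 0.5f) << (i * 8);
    }
    return packed;
}
```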

Here’s an example of the shadow term texture — each term is packed into the RGBA channels.

local_lights_shadow_term.jpg

To improve execution coherence on the GPU, interleaved small-tiles are spatially ordered into 8×8 pixel blocks, allowing the GPU to better group workloads into efficient waves.
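
To give a feel for what such an ordering means in practice, here is an illustrative block-major index mapping; the exact layout used in production is not specified here, so treat this as a sketch:

```cpp
#include <cstdint>

struct Uint2 { uint32_t x, y; };

// Map a pixel to its linear index with 8x8-block-major ordering, so the
// samples one 8x8 pixel block needs sit contiguously in memory and
// neighboring pixels on the same wave touch the same cache lines.
uint32_t BlockLinearIndex(Uint2 pixel, uint32_t widthInPixels) {
    uint32_t blocksPerRow = widthInPixels / 8u;
    Uint2 block  = { pixel.x / 8u, pixel.y / 8u };
    Uint2 within = { pixel.x & 7u, pixel.y & 7u };
    uint32_t blockIndex = block.y * blocksPerRow + block.x;
    return blockIndex * 64u + within.y * 8u + within.x;
}
```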

By deferring shadows and evaluating them at quad-resolution, we reduce shader complexity and avoid divergence during the lighting pass, since the shadow computations are already baked into precomputed terms. This helps both performance and scalability.

We’ll continue to profile whether the tradeoff — an extra memory pass and bandwidth cost — is justified by the simpler, more efficient lighting stage and improved wave utilization.

For shadow filtering, we currently support both 4× PCF (gather-based) and a stochastic PCF, which is quite efficient, requiring only a single gather per shadow term. While some noise is expected, TAA and denoising should help smooth it out significantly.
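
Here is a CPU-side sketch of the single-gather stochastic approach: instead of gathering several 2×2 footprints, one gather is jittered inside the filter radius each frame and the resulting noise is left for TAA/denoising to integrate. The shadow map plumbing below is a hypothetical stand-in for the real atlas sampling.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a shadow atlas tile.
struct ShadowMap {
    uint32_t size;
    std::vector<float> depth;  // size * size texels
    float At(int x, int y) const {
        x = std::min(std::max(x, 0), (int)size - 1);
        y = std::min(std::max(y, 0), (int)size - 1);
        return depth[(size_t)y * size + x];
    }
};

// One gather: four depth comparisons on a 2x2 footprint, averaged
// (the CPU analogue of a hardware gather4 + compare).
float GatherCompare(const ShadowMap& sm, float u, float v, float refDepth) {
    float fx = u * sm.size - 0.5f, fy = v * sm.size - 0.5f;
    int x = (int)std::floor(fx), y = (int)std::floor(fy);
    float sum = 0.0f;
    for (int dy = 0; dy < 2; ++dy)
        for (int dx = 0; dx < 2; ++dx)
            sum += (refDepth <= sm.At(x + dx, y + dy)) ? 1.0f : 0.0f;
    return sum * 0.25f;
}

// Stochastic PCF: a single jittered gather per shadow term.
// 'jitterX/Y' is a per-pixel, per-frame random point in [-1,1]^2.
float StochasticPcf(const ShadowMap& sm, float u, float v, float refDepth,
                    float jitterX, float jitterY, float filterRadiusTexels) {
    float texel = 1.0f / sm.size;
    return GatherCompare(sm,
                         u + jitterX * filterRadiusTexels * texel,
                         v + jitterY * filterRadiusTexels * texel,
                         refDepth);
}
```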

Looking ahead, we’re exploring PCSS for contact-hardening shadows and screen-space ray-marched shadows for fine detail. With deferred shadows evaluated at quad-resolution, these advanced techniques could be viable even on mobile — as optional quality upgrades for higher-end devices.

Lighting

The final step is the lighting pass, where we read the small-tile light samples and corresponding shadow terms, evaluate the BRDF per sample, and apply the appropriate weights to ensure unbiased lighting.
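
Conceptually, the per-pixel estimate is a weighted sum over the tile's samples; below is a minimal sketch with illustrative structures (the weight is the unbiasing factor produced by the resampling stage):

```cpp
#include <vector>

struct Float3 { float x, y, z; };

// Illustrative per-sample inputs to the final estimate.
struct ShadedSample {
    Float3 brdfTimesRadiance;  // BRDF * incoming radiance * cosine term
    float  shadowTerm;         // from the deferred quad-resolution pass
    float  weight;             // unbiasing weight from the resampling stage
};

Float3 AccumulateLighting(const std::vector<ShadedSample>& samples) {
    Float3 total{0.0f, 0.0f, 0.0f};
    for (const ShadedSample& s : samples) {
        float w = s.shadowTerm * s.weight;
        total.x += s.brdfTimesRadiance.x * w;
        total.y += s.brdfTimesRadiance.y * w;
        total.z += s.brdfTimesRadiance.z * w;
    }
    // With N samples per pixel, the estimator averages over them.
    if (!samples.empty()) {
        float inv = 1.0f / samples.size();
        total.x *= inv; total.y *= inv; total.z *= inv;
    }
    return total;
}
```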

Since this runs entirely in pixel shaders, there is some divergence based on light types. However, with only punctual lights currently supported, the divergence remains minimal. Compute shader variants could eventually eliminate this entirely through better control over execution paths — especially relevant if we decide to support more complex light types, such as area lights, in the future.

And here’s a look at the final composed image after lighting is applied — bringing together all the pieces of the pipeline.

local_lights_final.jpg