Why voxels

The motivation behind my research into voxels is rooted in a conviction that they have the potential to shape the future. Through this blog post, I aim to demonstrate how their inherent properties enable a new era of interactivity, unrestrained by the limitations of traditional rasterization technologies.

Meshes are tricky

Voxels are more than just a solution to some graphics and rendering problems. The problem they address is fundamentally one of Human-Computer Interaction (HCI): the way we produce content. Creating a 3D model and making it work in your game engine can be non-trivial:

  • Hitboxes
  • UV Mapping and Texturing
  • Normal mapping
  • Light probes
  • Holes
  • Non-Manifold Geometry
  • Overlapping Vertices and Faces
  • Creased Edges
  • Messy Topology
  • T-Junctions

On the other hand, creating voxel geometry is easy: right-click to build, left-click to destroy. There could be more advanced actions, but these two will take you a long way. Voxels are an incredibly intuitive way to interact with the virtual world. Even an ape knows how to do it!

You might think this is a skill issue on my part. "You're not an ape. Just get good at 3D modeling!" But ask anyone who's played Minecraft why they'd spend their entire childhood on it, and the answer is usually its simplicity. Voxel modeling lowered the barrier to entry so far that even a 15-year-old could figure out how to build a house in Minecraft with minimal training, and that is going to have far more implications than most realize.

Voxels enable greater levels of interactivity

In voxel games, each voxel acts as a manipulable building block, inviting users to interact with their environment in a direct and tactile manner. These interactions are analogous to the physical acts of sculpting, carving, grinding, and polishing that we have been performing since the Stone Age. However, AAA titles rarely take advantage of this level of interaction.

To be fair, the geometry representation isn't completely at fault here. Many lighting techniques we use - such as baking global illumination into textures and artist-placed light probes - are built upon the assumption that the scene is static and created by artists. That's why interactive global illumination was the first problem I set out to solve when I started building the Dust Engine.

Game designers sometimes limit interactivity to preserve narrative consistency or manage the workload, and storytelling in open-world games has long been a contentious topic. While it may be out of scope for this article, it underscores the narrative frontier that voxel interactivity is going to unlock: emergent gameplay. In voxel games, players often find themselves crafting their own stories independent of a scripted narrative. These moments are not preconceived by the game's creators but are direct results of the game's interactive potential.

Voxels, therefore, aren't merely a technical choice; they're a philosophical one. It is a choice made to maximize the chance that players engage with a sandbox where they can compose their own stories, written in real time through every action and decision.

Voxels simplify game development

It's easy to do collision detection and constructive solid geometry (CSG) on voxel models, and that gives you more than a faster physics engine.
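
As a rough illustration of the collision half of that claim, here is a minimal sketch of an occupancy grid with an axis-aligned box query. The VoxelGrid type and its methods are hypothetical, written for this post rather than taken from Dust Engine; a real engine would use a sparse, chunked structure.

```rust
// Hypothetical dense occupancy grid, for illustration only.
struct VoxelGrid {
    size: [usize; 3],
    cells: Vec<bool>, // row-major occupancy: true = solid
}

impl VoxelGrid {
    fn solid(&self, x: usize, y: usize, z: usize) -> bool {
        self.cells[(z * self.size[1] + y) * self.size[0] + x]
    }

    /// Does any solid voxel intersect the axis-aligned box [min, max),
    /// given in voxel-space units?
    fn aabb_overlaps(&self, min: [f32; 3], max: [f32; 3]) -> bool {
        let lo = min.map(|v| v.max(0.0).floor() as usize);
        let hi = [
            (max[0].ceil() as usize).min(self.size[0]),
            (max[1].ceil() as usize).min(self.size[1]),
            (max[2].ceil() as usize).min(self.size[2]),
        ];
        (lo[2]..hi[2]).any(|z| {
            (lo[1]..hi[1]).any(|y| (lo[0]..hi[0]).any(|x| self.solid(x, y, z)))
        })
    }
}
```

Testing a moving body against the world is then just a matter of scanning the handful of cells its bounding box covers - no mesh-vs-mesh narrow phase required.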

Procedural Generation

Voxels are easier for humans to model, and the same is true for procedural generators.

What makes voxels the optimal format for procedurally generated content? Let's think about a common case: you have mountainous terrain with a lot of surface features - rocks, heavy vegetation, dripstones, crystals... Now you would like to procedurally spawn a contemporary house there. How do you make the environment blend organically with the house without models clipping into each other? When the models are made from voxels, you just modify the models for the terrain, foliage, and surface features by removing the cells occupied by the house, and you end up with something that is at the very least correct. Even if the house were arbitrarily rotated, voxels from the foliage would not clip into the walls by more than the width of one voxel. I'm not saying it can't be done with triangle meshes, but it would certainly be a nightmare.
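
To sketch what that cut could look like in code (reusing the hypothetical VoxelGrid from the previous section, with an illustrative clear() helper), CSG subtraction is just clearing every terrain cell that a solid house voxel occupies:

```rust
impl VoxelGrid {
    /// Empty the cell at (x, y, z), ignoring out-of-bounds coordinates.
    fn clear(&mut self, x: i32, y: i32, z: i32) {
        let s = self.size;
        if x >= 0 && y >= 0 && z >= 0
            && (x as usize) < s[0] && (y as usize) < s[1] && (z as usize) < s[2]
        {
            self.cells[((z as usize) * s[1] + y as usize) * s[0] + x as usize] = false;
        }
    }
}

/// terrain -= house, with the house placed at `offset` in terrain coordinates.
fn subtract(terrain: &mut VoxelGrid, house: &VoxelGrid, offset: [i32; 3]) {
    for z in 0..house.size[2] {
        for y in 0..house.size[1] {
            for x in 0..house.size[0] {
                if house.solid(x, y, z) {
                    terrain.clear(
                        x as i32 + offset[0],
                        y as i32 + offset[1],
                        z as i32 + offset[2],
                    );
                }
            }
        }
    }
}
```

Rotation, materials instead of booleans, and sparse storage all add bookkeeping, but none of them change the fundamental simplicity of the operation: one pass over the cells.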

Content Workflow

If you download some random voxel models from the internet, they are more likely to "just work" in your own game. You don't need hitboxes, normal maps, bump maps, or displacement maps. When artists create a model, they don't really have to know much about the environment in which it is going to be used, because the environment will adapt to the model. Meanwhile, the fact that you can do CSG easily on voxel models allows you to do version control on your models. When you do need to change a model for your game, you can easily create a "fork" and apply a "diff" specific to your game, while the model itself remains generic and suitable for a wider range of purposes.
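
Here is a minimal sketch of that fork-and-diff flow, again over the hypothetical VoxelGrid from earlier and assuming both grids share the same dimensions: a diff is nothing more than the set of cells whose occupancy changed.

```rust
// A voxel "diff": every cell whose occupancy differs, with its new state.
struct VoxelDiff {
    changes: Vec<([usize; 3], bool)>,
}

/// Compute fork - base as a sparse patch. Assumes equal grid dimensions.
fn diff(base: &VoxelGrid, fork: &VoxelGrid) -> VoxelDiff {
    let mut changes = Vec::new();
    for z in 0..base.size[2] {
        for y in 0..base.size[1] {
            for x in 0..base.size[0] {
                let (a, b) = (base.solid(x, y, z), fork.solid(x, y, z));
                if a != b {
                    changes.push(([x, y, z], b));
                }
            }
        }
    }
    VoxelDiff { changes }
}

/// Replay a patch onto a grid, e.g. a game-specific tweak on a generic model.
fn apply(base: &mut VoxelGrid, patch: &VoxelDiff) {
    for &([x, y, z], state) in &patch.changes {
        base.cells[(z * base.size[1] + y) * base.size[0] + x] = state;
    }
}
```

Because voxel diffs are well-defined and composable, merging two forks or rebasing a game-specific patch onto an updated upstream model becomes tractable in a way that mesh edits rarely are.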

I'd be really happy if we could build a GitHub / Cargo registry but for voxel models. How nice would it be if you could just npm install castle.vox and have everything working right out of the box! This would be crucial for indie game studios. Voxel art can also be less resource-intensive to create, allowing small teams or solo artists to produce game art with appealing visuals. These assets can then be used by downstream indie game developers to create content without relying on a large art department, and the profits from game sales can be shared with the artists in a predefined way. This means a greater variety of game experiences for players and more opportunities for developers.

There are certainly existing asset stores for meshes out there, but they're nowhere close to the workflow I'm describing. I'm also not saying that a content workflow like this will never happen with mesh models one day. However, when we're the ones building the engine, we get the opportunity to define the ecosystem surrounding that engine.

Voxels empower the players

In the early days of filmmaking, the production of content was reserved for professionals, requiring years of training and access to specialized equipment. But today, when you press the capture button in your camera app, you don't have to think about shutter speed, focal length, or ISO. This democratization of content production has been the catalyst for the rise of platforms like YouTube and TikTok, which are built on user-generated content.

Voxels, in this context, represent the tools for creation in the hands of the everyday user, just like the camera on your iPhone. With a click of a button, an ordinary person can sculpt, build, and bring to life their stories and imagination, quickly and with ease. For an example of how that might work, just take a look at Minecraft. It's decidedly more than a game that allows you to play it any way you want. People have created not just cities and architectural wonders, but also calculators, tic-tac-toe AIs, and even a Turing machine.

Do note here that I am not implying that a random 15-year-old will be able to create a 40-hour gaming experience matching the quality of an AAA title. What I propose is the beginning of a journey — just like how a young tech enthusiast like MKBHD could get started with homemade videos and evolve into a studio with 30 employees.

The big problems for voxel games

Voxels are hard. Minecraft demonstrated the potential of voxel games to some extent, but it's clear that we're far from "solving" voxels. Brian Karis's 2022 HPG talk Journey to Nanite explained some of the challenges the Nanite team encountered when they tried to replace explicit surfaces (meshes) with implicit geometry (voxel SDFs), and here is what he said:

It is clearly time to pivot. Voxels have too many hard problems remaining, my velocity in solving them is far too low. I believe it would need multiple years of compounding research and industry experience to be able to replace explicit surfaces completely. We aren't there yet and even if we were it's unclear if it would be better.

I learned a big lesson with my foray into voxels. Listen to your gut. I never expected this to work and I only explored it to prove it didn't. It is nearly impossible to prove something can't be done. Don't do this.

Although Dust Engine also tries to solve the hard problems in voxel rendering, it is important to note here that voxel game development is a balance between multiple constraints, and geometric complexity isn't everything. So, let's talk about those hard problems.

Scalability

The primary concern when working with voxel-based representations will almost always be the challenge of managing large data sizes. Unless you're making yet another Minecraft clone, longer view distances and more detailed geometry will definitely be appreciated. Unlike 2D textures — which typically scale quadratically with increased resolution — volumetric datasets scale cubically. Efficient organization, compression, and rendering techniques are therefore crucial.
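
To put numbers on that: a 1024×1024 color texture at 4 bytes per texel weighs in at 4 MiB, while a 1024³ volume at just 1 byte per voxel is already 1 GiB. Double the resolution and the texture grows to 16 MiB, but the volume balloons to 8 GiB - which is why naive dense storage is a non-starter at scale.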

This aspect of the problem closely relates to what virtualized geometry renderers like Nanite tried to solve, but our goals and priorities differ. For Nanite, the primary objective is to push geometric complexity & polycount to the subpixel extremes without drastically changing the content pipeline, and voxels were evaluated as a means to that end. When software rasterization proved to have better performance than Voxel DDA, it's only reasonable that Brian would pivot. However, for us, replacing meshes completely with voxels is the goal itself, and we will do that even if our geometric complexity lags behind Nanite by a bit. I consider this the difference between a moderate reform and a revolution.

Nanite theoretically gives us a faster GBuffer than tracing primary rays, because instead of walking a BVH for every pixel every frame, you only need to walk the scene hierarchy once per frame. However, as I explained in the previous section, the level of interactivity promised by voxels will not exist unless our lighting techniques are also free of precomputed hacks. In practice, the incoherent final-gather rays from secondary hit points will always dwarf the cost of tracing coherent primary rays (unless, as Brian suggested, your geometry is so detailed that all your triangles are subpixel). It is for this reason that I choose to trace primary rays instead of building something like Nanite at this time - the engineering cost of building a separate pipeline just to replace primary rays is not worth it, and you probably don't want to maintain two copies of the BVH in VRAM either: one for virtualized geometry and one for ray tracing secondary rays.
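
For reference, here is roughly what the "Voxel DDA" mentioned above looks like: a minimal sketch of Amanatides & Woo's grid traversal, reusing the hypothetical VoxelGrid from earlier and assuming a non-zero ray direction with the origin already inside the grid.

```rust
// Amanatides & Woo voxel traversal ("Voxel DDA"). Returns the first solid
// cell the ray enters, or None if it exits the grid without a hit.
fn trace(grid: &VoxelGrid, origin: [f32; 3], dir: [f32; 3]) -> Option<[i32; 3]> {
    let mut cell = [0i32; 3];                 // current voxel coordinates
    let mut step = [0i32; 3];                 // +1 or -1 per axis
    let mut t_max = [f32::INFINITY; 3];       // t at the next cell boundary
    let mut t_delta = [f32::INFINITY; 3];     // t to cross one full cell
    for i in 0..3 {
        cell[i] = origin[i].floor() as i32;
        if dir[i] != 0.0 {
            step[i] = if dir[i] > 0.0 { 1 } else { -1 };
            let boundary = cell[i] as f32 + if dir[i] > 0.0 { 1.0 } else { 0.0 };
            t_max[i] = (boundary - origin[i]) / dir[i];
            t_delta[i] = 1.0 / dir[i].abs();
        }
    }
    loop {
        // Left the grid without hitting anything solid.
        if cell.iter().enumerate().any(|(i, &c)| c < 0 || c as usize >= grid.size[i]) {
            return None;
        }
        if grid.solid(cell[0] as usize, cell[1] as usize, cell[2] as usize) {
            return Some(cell);
        }
        // Step along the axis whose next boundary is closest.
        let axis = if t_max[0] < t_max[1] {
            if t_max[0] < t_max[2] { 0 } else { 2 }
        } else if t_max[1] < t_max[2] { 1 } else { 2 };
        cell[axis] += step[axis];
        t_max[axis] += t_delta[axis];
    }
}
```

Each step advances exactly one cell along the axis whose boundary is nearest, so the cost of a primary ray scales with the number of cells it crosses rather than with scene complexity.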

Global Illumination and PBR

When you're developing a voxel game, you would generally consider everything dynamic. That means no baking, no pre-computed tricks, and no shortcuts. Each luminous voxel would be considered a light source, so you'd also have a many-light problem at hand. Sure, you can solve some of those problems with technique X or technique Y, but then the players could be doing something weird that totally breaks your graphics, and you're left on an endless journey of adding more and more graphics tricks until the engine becomes a pile of unmanageable hacks. These factors necessitate a ray tracing renderer with fully real-time global illumination and PBR support.

So why PBR rendering? Why can't we just get away with something else, like anime-style shading? The decision to pursue full real-time global illumination was driven by the goal that everything should look nice by default. Sure, good graphics don't make a game more fun, but they certainly help your game stand out from the countless Minecraft clones out there. Our eyes are calibrated to the way light and materials interact in the real world, so any alternative visual style will likely map onto a subset of the light interactions that a photorealistic renderer can simulate. Starting from the full simulation leaves you with the maximal amount of information to work with if you later want to implement an alternative visual style. It is the consistency, standardization, and predictability of rendering that we're after here; photorealism is just a side product of that goal.

Fortunately, a lot of research in this area can be reused, and we might even be able to improve these techniques by taking advantage of the unique attributes of voxel models. To get a low-noise ray-traced image, the most important thing is to reduce variance before denoising, and that means importance sampling: we have to pick our samples very carefully so that the denoiser has a low-variance image to work with. I would like to refer you to this ReSTIR talk by Chris Wyman:

This (tiny shadow maps for many lights) is an example of an insightful observation from Eric Enderton, a.k.a. Enderton's Law: The more coherent your queries, the more work you're wasting. To get coherent work in rasterization, we often perform unneeded computations. Amortizing over discarded computations isn't really computation; it's wasted work.

Dust Engine is going to implement a renderer that achieves real-time global illumination using a combination of ReSTIR, a world-space irradiance cache, and screen-space light probes.
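
To give a taste of what ReSTIR builds on, below is a minimal sketch of single-slot weighted reservoir sampling, its core primitive: stream candidate light samples through a reservoir, keeping each one with probability proportional to its resampling weight. The LightSample type here is a placeholder, not Dust Engine API; a real implementation also tracks the target PDF and performs spatial and temporal reservoir reuse on top of this.

```rust
// Single-slot weighted reservoir sampling (WRS), the primitive underneath
// ReSTIR. `LightSample` is a stand-in type for illustration.
#[derive(Clone, Copy)]
struct LightSample {
    radiance: f32, // a real sample carries position, direction, pdf, etc.
}

struct Reservoir {
    sample: Option<LightSample>,
    w_sum: f32, // running sum of resampling weights
    m: u32,     // number of candidates seen so far
}

impl Reservoir {
    fn new() -> Self {
        Reservoir { sample: None, w_sum: 0.0, m: 0 }
    }

    /// Consider one candidate with resampling weight `w`, using a uniform
    /// random number `u` in [0, 1). Each candidate ends up selected with
    /// probability proportional to its weight.
    fn update(&mut self, candidate: LightSample, w: f32, u: f32) {
        self.w_sum += w;
        self.m += 1;
        if u * self.w_sum < w {
            self.sample = Some(candidate);
        }
    }
}
```

ReSTIR's insight is that reservoirs from neighboring pixels and previous frames can be merged with the same update rule, effectively amortizing thousands of candidate samples per pixel. Stay tuned for future updates.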

Physics

Voxels enable cheap collision with all sorts of shapes, but we're still quite far from solving voxel physics. Some areas of interest:

  • Rigid Body Dynamics. We already have the geometric representation on the GPU for rendering. Can we abuse hardware ray tracing for physics? Is it going to be faster?
  • Destruction. Teardown did a great job there, but can we do better?
  • Explosions. Sure, deleting some blocks might sound easy, but can we nuke a city with 130 billion voxels and still have a realistic simulation without dropping a frame? You know, with voxels flying around and stuff.
  • Fluid Dynamics. When the player unleashes an unexpected flood from an underground lake, that water better be real. How do we create fluid simulations that are not only realistic and fast, but also easy and fun to interact with?

Human-Computer Interaction

The basics are simple: right-click to place, left-click to break. But when we scale this up to billions of voxels, accurately selecting and manipulating small voxels can be difficult, especially in a 3D space with a 2D input device like a mouse. How do we keep things intuitive and manageable? With the emergence of VR, how do we maximize immersion and engagement? Can we use hand tracking to make digging and carving even more intuitive? I have many ideas, but they will require experiments and research to verify.

Skeletal animation

This would only be relevant if you want to go all-in on voxels and use them for characters too. Contrary to common belief, I think the value in this would be small - you don't really benefit much from making characters out of voxels. However, some people might still do it for the sake of visual coherency. Opening this can of worms would be too much for me at this stage, but it's certainly something we can research in the future.

Lack of tooling

One of the most significant hurdles in the advancement of voxel games is the lack of (usable, not vaporware) tools and engines designed for voxels at scale. While traditional 3D games can lean on a robust ecosystem of mature tools like Unity, Unreal Engine, and Maya, voxel game developers often find themselves in a Wild West scenario—either adapting existing toolchains by converting their voxel content to mesh, thus dropping all the benefits voxel models have to offer, or making their own from scratch.

This is the problem Dust Engine aims to solve, but this is going to take a long time. What we have at present is a Proof of Concept at best, and there's a substantial gap between what I envision it to be and what is achievable for me as an individual developer working on it part-time. This gap represents an opportunity for collaboration. The task at hand is larger than any one developer or even a small team can tackle alone - that's why Dust is an open-source project licensed under MPL-2.0, and it'll stay that way for the foreseeable future. Let us come together to forge the tools of tomorrow.

If any of this sounds interesting to you, I encourage you to check out our GitHub repository, our website, and join us on Discord. We need all the help we can get, especially if you are a:

  • Systems Programmer: Check out our Software Engineering Taskboard, which contains the things we need to do to make sure that the Dust Engine stands on solid ground as a graphics system.
  • Graphics Researcher: Check out our Graphics Research Taskboard. These are the things we need to know and understand before we can implement important rendering features in Dust.
  • Technical Writer: Help us write our Wiki. We need to document engine features, summarize research findings, provide user instructions, and more.
  • Web Developer: Help us make our website better so that people can easily find the information they need!

I want voxels to become as universal and ubiquitous as dust particles, and you can help us get there.