
Hi, I'm Mike, and I've been working for quite some time now on a game engine based on signed distance fields. I created this video to show what the engine can do and explain how it works. I've started building a game with this engine as well, which I'll say a little more about later.

My motivation when starting this project was to build an engine that enables high-fidelity modifications to world geometry during gameplay. In most games, the game world is relatively static. In the games where geometry is dynamic, like Minecraft or No Man's Sky, modifications to the world are usually relatively crude and low fidelity. My engine supports detailed changes to world geometry, adding and removing matter both smoothly and with sharp edges. It also supports non-destructive changes, like moving a hole around, or creating a tunnel, walking through it, and then seeing it disappear behind you. I'm excited about these non-destructive changes in particular, as they enable a lot of interesting gameplay mechanics.

The underlying technology that enables all of this uses signed distance fields, or SDFs. SDFs are cool for many reasons, but a big one is that they enable very simple geometric boolean operations between shapes, like union, subtraction, and intersection. These operations can also be smooth, which creates nice-looking blends between shapes and removes hard edges. I'm not going to give a full introduction to SDFs in this video, as that would take too long and the topic has been covered very well in quite a few other videos. I recommend this short video by Sebastian Lague, which covers how SDFs are defined mathematically and how they're typically rendered via raymarching. I've included a link in the description. The best sources on the web for SDF-related techniques are Shadertoy and Inigo Quilez's website.

In general, SDFs on Shadertoy are rendered via raymarching. While this technique is quick to get started with and very powerful, brute-force raymarching degrades performance very quickly, as it typically evaluates a complex mathematical function many times per pixel. This is how the rabbit hole I went down in developing my engine started. I wanted to fully represent the game world using SDFs, and to have that world be dynamically modifiable, but also to maintain high performance. And it turns out that achieving this goal is far more complex than typical raymarching methods.

My first inspiration was Dreams for PlayStation 4. The game world, as well as all 3D objects in Dreams, are represented via SDFs that can be sculpted in Dreams itself. Dreams is a phenomenal achievement in technology and design on many different levels. My aim isn't to rebuild Dreams, and that would be impossible for a solo developer anyway. Instead, I set out to build an engine based on SDFs, but where all world geometry is dynamically modifiable, which isn't possible in Dreams. This requires a different set of technical constraints, but also provides different gameplay possibilities.

Next, I'm going to explain how I've made rendering of SDFs in complex environments fast enough for a game. I've been inspired by many other works here, including Dreams, but my overall approach is a fairly unique combination of techniques. First, some background. A scene in my engine is represented as an ordered list of SDF edits, which is a term I've borrowed from Dreams. You can think of an SDF edit as a shape with a position, rotation, and scale that's applied to the world via a boolean operation like union or subtraction. So, for example, a sphere that cuts a chunk out of the world at a particular position.
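To make those boolean operations concrete, here's a minimal sketch of hard and smooth SDF combinators in C++. The polynomial smooth minimum follows Inigo Quilez's well-known formulation; the function names and the blend radius k are illustrative, not my engine's actual API.

```cpp
#include <algorithm>

// Hard boolean operations on two signed distances.
float opUnion(float d1, float d2)        { return std::min(d1, d2); }
float opSubtraction(float d1, float d2)  { return std::max(d1, -d2); } // carve shape 2 out of shape 1
float opIntersection(float d1, float d2) { return std::max(d1, d2); }

// Polynomial smooth union: blends the two surfaces over a radius k,
// producing a rounded fillet instead of a hard crease where they meet.
float opSmoothUnion(float d1, float d2, float k)
{
    float h = std::clamp(0.5f + 0.5f * (d2 - d1) / k, 0.0f, 1.0f);
    return d2 + (d1 - d2) * h - k * h * (1.0f - h);
}
```

The appeal is that each operation is just a min or max over two distance values, so combining shapes never requires any geometry processing.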
The basic problem is that the distance function for a complex scene is expensive to evaluate, and standard SDF raymarching evaluates this function many times per pixel. Culling can be used to reduce the number of SDF edits that need to be evaluated at a specific point, but this still hits a performance wall for complex scenes. It's easy to end up with dozens or more edits that influence a particular point in space. When having to repeatedly evaluate an expensive function, a typical optimization is caching, and that's what I do. Rather than evaluating the scene's distance function from scratch over and over again, I evaluate the function for all points on a grid once and then reuse those cached values when rendering.

Consider this simple 2D scene, where I'm smoothly blending the SDFs of a circle and a square. Regions with positive distance, meaning they're outside the curve, are shaded in orange, and regions with negative distance, which are inside the curve, are shaded in blue. The curve itself is the white line drawn where the distance is zero. As I move the mouse around, the scene's distance function is being continuously evaluated. The radius of the circle that's drawn at the cursor position is equal to the magnitude of the resulting distance.

Now imagine that we overlay a grid on the scene and evaluate the scene's distance function at each grid point. Each grid point is shaded corresponding to its calculated distance. As I move the mouse around, you can see the distances for the grid points of each grid cell visualized. Because the scene is changing every frame, for now we're still evaluating the function for every grid point every frame, but the number of grid points is far lower than the number of screen pixels.

Of course, caching distances evaluated on a grid is useless without using these distances in some way. To approximate the distance to the curve at any point, we first determine which cell the point is in and then gather the four distances at the grid points of that cell. We can then use bilinear interpolation between these distances to calculate the final distance estimate: we first interpolate along the x-axis and then interpolate the resulting values along the y-axis. This calculation can be done with a single GPU texture fetch.

Of course, the curve that we reconstruct from these interpolated distances doesn't exactly match the original curve. Here the original curve is still a white line, and I've added a yellow line representing the reconstructed curve. It's relatively accurate as long as the curve is straight or close to straight at the granularity of a grid cell. It's pretty inaccurate near corners, and you can see them wobbling as the square moves. If we increase the grid resolution, the reconstructed curve gets more accurate.

This type of technique has been used in games to render font glyphs with sharp edges at any size for a couple of decades now. It was written about in this Valve paper from 2007. In the case of text rendering, typically the SDF for each glyph is calculated and cached once, as you don't expect fonts to change during gameplay. The previous animation showed the corners of the square wobbling due to the interpolation of sampled distances. This happens in 3D as well, as you can see here in my engine as I drag a cube around. The grid resolution is intentionally very coarse for demonstration purposes. It's oddly satisfying just to drag a cube around like this.
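For reference, here's a minimal CPU-side sketch of that bilinear reconstruction, with hypothetical names; on the GPU, the whole computation collapses into a single hardware-filtered texture fetch.

```cpp
// Distances cached at the four corners of one grid cell,
// sampled from the cached distance grid.
struct CellSamples { float d00, d10, d01, d11; };

// Estimate the scene distance at a point inside a cell of size cellSize.
// (px, py) is the query point; (cx, cy) is the cell's lower-left corner.
float estimateDistance(const CellSamples& s,
                       float px, float py,
                       float cx, float cy, float cellSize)
{
    // Normalized position within the cell, in [0, 1].
    float tx = (px - cx) / cellSize;
    float ty = (py - cy) / cellSize;

    // Interpolate along the x-axis on the bottom and top edges...
    float bottom = s.d00 + (s.d10 - s.d00) * tx;
    float top    = s.d01 + (s.d11 - s.d01) * tx;

    // ...then interpolate those two results along the y-axis.
    return bottom + (top - bottom) * ty;
}
```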
I'll say more later about why I'm not representing the SDF surface with a triangle mesh, but one benefit of rendering from a grid of cached distances is that the reconstructed surface can actually be quite complex, which means the grid resolution can be lower. This diagram is from an NVIDIA paper that I've linked in the description. It shows the type of 3D surfaces that can be reconstructed from trilinear interpolation of cached distances in a 3D grid. These surfaces are curved and can have complex topology.

Now, you may be thinking that evaluating and storing a grid of cached distances will be very expensive for larger scenes. And you'd be right. Even at a very coarse grid resolution, you'll be running out of memory very quickly. Let's do a bit of math here. Cached distances can be stored compactly. We don't need a full 4-byte floating-point value per distance. We can get away with storing distances as a single byte if we limit them to the minimum required range, which is half the diagonal length of a grid cell. As I'll discuss next, we only care about the surface of the SDF, so representing larger distances is not particularly helpful. Now, even at just one byte per cached distance value, a 3D grid resolution of 1024 cubed already uses a gigabyte of memory. This doesn't work unless you confine your game world to a very small area.

The solution to this problem is caching distances sparsely. The previously described approach caches distances densely over the full grid. Since we're rendering the surface of the SDF, we only actually care about distances in cells that contain the surface, or specifically, cells with grid points that evaluate to both positive and negative distances. The diagram now shows only the grid cells that contain the 2D curve. We only need to cache distances for grid points corresponding to these cells.

Of course, storing a sparse grid is more complex than storing a dense grid. A dense grid can be stored in a buffer or texture with easy access to any entry based on its 2D or 3D index. For sparse grids, octrees are a common approach, but I chose something simpler that works particularly well on the GPU: a brick map. A brick map is a dense grid of pointers to bricks, which are essentially small square or cubic regions of the original cached distance grid. In this diagram, each grid cell is a 2x2 brick. Bricks are allocated from a large texture atlas. As I mouse over cells in the scene, you can see the bricks in the texture atlas they correspond to. And this is me flipping through the slices of the 3D texture atlas used for bricks in my engine. Each pixel represents a single cached grid point. Bricks are 8x8x8, which was inspired by Dreams and works well. With this size of brick, the brick map's pointer grid uses a relatively trivial amount of space compared to the size of the brick atlas. This is a small scene in my engine, with each of these sparsely allocated distance bricks colored differently. You can see the allocation of bricks changing as I move the sphere around.

Some people watching this may be wondering why I'm not generating a triangle mesh from the distance field using a technique like marching cubes or dual contouring. The equivalent for this 2D example would be marching squares. A mesh is cheap to render but is more expensive to generate for a given level of fidelity. Because my primary goal is to have a high-fidelity dynamic world, I want to optimize for recomputation of scene changes. It's okay if rendering takes a bit of a performance hit, as long as the benefit to scene geometry recomputation is large.
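Returning to the brick map for a moment, here's a rough sketch of the two ideas above, one-byte distance quantization and the pointer-grid lookup. The names, layouts, and sentinel values are illustrative, not my engine's actual data structures.

```cpp
#include <cstdint>
#include <cmath>

constexpr int kBrickSize = 8;                 // 8x8x8 grid points per brick
constexpr uint32_t kEmptyBrick = 0xFFFFFFFF;  // sentinel: no surface in this region

// Dense grid of pointers, one entry per brick-sized region of the world.
struct BrickMap {
    const uint32_t* pointers; // index of the brick in the atlas, or kEmptyBrick
    int dimX, dimY, dimZ;     // brick map dimensions, measured in bricks
};

// Quantize a distance to one byte. Distances are clamped to +/- half the
// grid cell diagonal, so the full 0..255 range covers only useful values.
uint8_t quantizeDistance(float d, float cellDiagonal)
{
    float limit = 0.5f * cellDiagonal;
    float t = (std::fmin(std::fmax(d, -limit), limit) + limit) / (2.0f * limit);
    return static_cast<uint8_t>(t * 255.0f + 0.5f);
}

// Find which brick, if any, holds cached distances for a grid point.
uint32_t lookupBrick(const BrickMap& map, int gx, int gy, int gz)
{
    int bx = gx / kBrickSize, by = gy / kBrickSize, bz = gz / kBrickSize;
    return map.pointers[(bz * map.dimY + by) * map.dimX + bx];
}
```

With 8x8x8 bricks, each 4-byte pointer stands in for 512 grid points, which is why the pointer grid stays so small relative to the atlas.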
Sparsely allocating bricks helps a lot in reducing memory usage, but it's still nowhere near good enough for large scenes or an open world. I want my engine and game to support vast spaces. The usual solution to this in games is level of detail, or LOD. Geometry that's further from the player is represented at lower fidelity, which means it's both cheaper to load and cheaper to render. In my case, I'm not just loading geometry but evaluating the SDF at many points, so it's essential that it be evaluated much less frequently further from the camera. I use an approach inspired by geometry clipmaps, which were introduced in this paper from more than 20 years ago. There's a lot of stuff to get right in this type of approach, but the high level is that instead of using a single grid of brick pointers, a set of grids are nested on top of one another, where each successive grid is double the size in all dimensions of the previous grid level. All clipmap levels are centered on the player and follow the player's movement. This naturally leads to evaluating the SDF at a coarser resolution further from the player, which ensures that the on-screen size of cached bricks remains relatively constant regardless of their distance to the player. Here, each brick is colored, and I've also turned on colors that clearly delineate the clipmap levels. You can see how bricks double in size at the transition to farther clipmap levels.

Using clipmap levels dramatically decreases the memory usage required for an open world. In this scene, where the draw distance is about 2.5 kilometers, over 200 trillion brick map cells would be needed to cover the full range at the resolution that's used nearest to the player. With the clipmap, only 20 million cells are needed, which is a 10-million-fold reduction.

One of the main goals for my engine was to enable fast updates to world geometry during gameplay. And the only way I could hope to achieve this, given the architecture I chose, is to incrementally regenerate just the world regions that change. This is a fairly deep rabbit hole on its own, so I'll just explain some broad strokes. All SDF edits are tracked spatially via a bounding volume hierarchy, or BVH. Specifically, I use a tree of axis-aligned bounding boxes, which is common. This data structure is shared between the CPU and GPU. You can see the structure of the tree dynamically updating as I move an edit around here. The AABB tree is useful for raycasts, as shown here, where the SDF edits that are tested for exact intersections are highlighted as the ray sweeps across the scene. Querying the tree also allows us to know which SDF edits can affect a region of space. This is used when evaluating the SDF to do less work. It's also used to determine which distance bricks need to be re-evaluated when an SDF edit is changed. Here I've turned on a debug mode that colors the bricks that are regenerated each frame. You can see that the vast majority of the world is reused frame to frame, even when things are changing. Note that this approach is not technically mathematically correct, for reasons I won't go into here, but it works well enough right now for the types of interactions that are possible in my engine.
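As a sketch of how such a region query works, here's a generic recursive AABB-tree traversal with simplified types, not my engine's actual implementation:

```cpp
#include <vector>

struct AABB {
    float min[3], max[3];
    bool overlaps(const AABB& o) const {
        for (int i = 0; i < 3; ++i)
            if (min[i] > o.max[i] || max[i] < o.min[i]) return false;
        return true;
    }
};

// Internal nodes have two children; leaves reference one SDF edit.
struct BVHNode {
    AABB bounds;
    int left = -1, right = -1; // child node indices, -1 at leaves
    int editIndex = -1;        // valid only at leaves
};

// Collect every edit whose bounds overlap `region`, e.g. a brick that
// must be regenerated after an edit changes.
void queryEdits(const std::vector<BVHNode>& nodes, int nodeIdx,
                const AABB& region, std::vector<int>& outEdits)
{
    const BVHNode& node = nodes[nodeIdx];
    if (!node.bounds.overlaps(region)) return; // prune the whole subtree
    if (node.editIndex >= 0) {
        outEdits.push_back(node.editIndex);    // leaf: this edit may affect region
        return;
    }
    queryEdits(nodes, node.left, region, outEdits);
    queryEdits(nodes, node.right, region, outEdits);
}
```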
So I've built this SDF rendering engine, but I still need physics for gameplay. Building a physics engine is itself a huge undertaking, but luckily there are great open-source physics engines available. I chose Jolt, which is used in Horizon Forbidden West and Death Stranding 2. It's possible to do collision detection using the SDF directly, but it is certainly not trivial, and Jolt doesn't offer built-in support for this. Instead, I chose to generate a triangle mesh from the SDF that is then fed into Jolt in chunks. I don't do this for rendering, but a mesh does work well for physics as long as the resolution is low. I use marching cubes to generate the collision mesh because it's simple and easily parallelized. While all SDF evaluation done for rendering is done on the GPU, the collision mesh is generated across multiple threads on the CPU for low-latency updates to the physics system. I can make any SDF edit a dynamic physics body, and it will collide realistically with the rest of the scene. This gets particularly cool when the scene is being updated dynamically, which I'll show more of in a minute.

I haven't said anything about the terrain yet. Like everything else, it's generated using a sequence of SDF edits, but it's special-cased for efficiency. The terrain isn't a heightfield like in many games, but rather fully 3D. It's generated progressively using octaves of noise, with a technique heavily inspired by an article by Inigo Quilez. Here you can see octaves being added to the terrain. It's also possible to generate substantially wilder-looking landscapes using the same algorithm but different parameters. And all SDF edits interact properly with the terrain. I love just watching how cubes blend with it.

There's so much more I could say about how my engine works, but for now I want to show more examples of the type of cool stuff it enables. Because of the way my engine incrementally recomputes just the spatial region that has changed, it supports an effectively unbounded number of SDF edits. Combined with the LOD system, this enables things like digging kilometers below the terrain surface. All SDF edits are dynamic, which means that any change can be non-destructive. Here I'm demoing a tunnel gun, which is just a capsule subtraction locked to the player's viewpoint. This isn't just a visual effect, as the physics collision mesh is also updated dynamically. You can also create a tunnel, walk through it, and then remove it, which is something I'm experimenting with for gameplay. Because the physics collision mesh updates dynamically, you can do cool things like moving a hole around to capture other objects or enemies. This is one of those things that feels oddly satisfying, and yes, this was inspired by the game Donut County.

Digging is a significant component of the game I'm making. In addition to changing the surface shape, SDF edits can volumetrically stamp their material underground, which allows digging into ore deposits. Digging can also get kind of gross depending on the material you're digging into. In general, it's not that hard to make things look a little gross in this engine. I've been focusing on subtractive edits for things like digging, but edits can also be additive. Here I'm building a wall out of cubes. And of course, drawing on surfaces is possible.

Lastly, I didn't build my engine to be an SDF modeling tool, but the fidelity it provides does make that sort of thing possible. Here I'm putting the finishing touches on SirPotatoFace. You may have noticed I like debug visualizations. They're extremely useful for both debugging and building intuition about how the various systems work. Here I've enabled every debug visualization at once. So useful.
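Earlier I mentioned that the terrain is built from octaves of noise. Here's a standard fractal-sum sketch of that idea, with made-up parameters and a cheap stand-in noise function rather than my engine's actual terrain code:

```cpp
#include <cmath>

// Cheap, smooth stand-in for a real 3D noise function (Perlin, simplex, ...).
// Good enough to illustrate the octave loop below.
float noise3(float x, float y, float z)
{
    return std::sin(x * 1.7f + std::sin(y * 2.3f)) * std::cos(z * 1.3f);
}

// Fractal terrain field: sum octaves of noise, each one at double the
// frequency and half the amplitude of the last. Displacing the base plane
// volumetrically (noise sampled in 3D, not just over x and z) keeps the
// terrain fully 3D, so caves and overhangs are possible.
float terrainField(float x, float y, float z, int octaves)
{
    float value = y;          // base ground plane at y = 0
    float amplitude = 40.0f;  // displacement of the first, coarsest octave
    float frequency = 0.005f; // low frequency = large landscape features
    for (int i = 0; i < octaves; ++i) {
        value += amplitude * noise3(x * frequency, y * frequency, z * frequency);
        amplitude *= 0.5f;    // finer octaves contribute less displacement...
        frequency *= 2.0f;    // ...at double the spatial frequency
    }
    return value; // negative inside the terrain, positive outside
}
```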
As I've mentioned, I'm building a game with this engine. I have a fairly detailed plan, but there's still a lot to figure out. I'll be sharing more about the design as I make progress.