▶
Friday Facts #264 - Texture streaming
In 0.16, we added graphics option mysteriously called "Low VRAM mode", it enables a basic implementation of texture streaming, but I didn't want to call it that way, because I feared its poor performance would give texture streaming a bad name. How it worked? Every sprite has specified some priority, and the "Video memory usage" option - despite its name - controls which priorities of sprites are included in the sprite atlases. What happens to sprites that don't go to atlas? They are loaded as individual sprites. Normally, these sprites are allocated into texture objects. The reasoning behind this is that graphics driver has chance to decide how it wants to layout these small textures in memory, instead of it being forced to work with huge atlases.
What part of a sprite atlas looks like. When you enable "Low VRAM mode", non-atlas sprites are loaded only to RAM, and texture is allocated for them only when used during rendering. We called class that handled this BitmapCache and there was maximum limit of how many mega-bytes of texture data the bitmap cache could use at once. When this limit was reached, it failed to convert memory bitmap to video texture, and Allegro's drawing system would fallback to software rendering, which absolutely tanked FPS, but this didn't happen... most of the time. So apart from the obvious problem of falling back to software renderer (which we don't have any more after the graphics rewrite, so the game would crash or skip a sprite if this happened), there are other performance issues. Most sprites have a unique size, so we can't reuse textures for different sprites. Instead, when a sprite needs to be converted to a texture, a new texture is allocated, and when the sprite is evicted from the cache, its texture is destroyed. Creating and destroying textures considerably slows down rendering. The way we do it also fragments memory, so all of the sudden it may fail allocate new texture because there is no large enough consecutive block of memory left. Also, since our sprites are not in an atlas, sprite batching doesn't work and we get another performance hit from issuing thousands of draw calls instead of just hundreds. I considered it to be an experiment, and actually was kind of surprised that its performance was not as bad as I expected. Sure, it might cause FPS drop to single digits for a moment from time to time, but overall the game was playable (I have a long history of console gaming, so my standards as player might not be very high :)). Can we make it good enough, so it wouldn't be an experimental option any more, but something that could be enabled by default? Let's see. The problem is texture allocations, so let's allocate one texture for the entire bitmap cache - it would be a sprite atlas that we would dynamically update. That would also improve sprite batching, but when I started to think how to implement it, I quickly ran into a problem dealing with the fragmentation of space. I considered doing "defragmentation" from time to time, but it started to seem like an overwhelming problem, with a very uncertain result.
As I mentioned in FFF-251, it is very important for our rendering performance to batch sprite draw commands. If multiple consecutive draw commands use the same texture, we can batch them into a single draw call. That's why we build large sprite atlases. Virtual texture mapping - a texture streaming technique popularized by id Software as Mega Textures, seems like a perfect fit for us. All sprites are put into a single virtual atlas, the size of which is not restricted by hardware limits. You still have to be able to store the atlas somewhere, but it doesn't have to be a consecutive chunk of memory. The idea behind it is the same as in virtual memory - memory allocations assign a virtual address that maps to some physical location that can change under the hood (RAM, page file, etc.), sprites are assigned virtual texture coordinates that are mapped to some physical location. The virtual atlas is divided into tiles or pages (in our case 128x128 pixels), and when rendering we will figure out which tiles are needed, and upload them to a physical texture of much smaller dimensions than the virtual one. In the pixel shader, we then transform virtual texture coordinates to physical ones. To do that, we need an indirection table that says where to find the tiles from the virtual texture in the physical one. It is quite a challenge for 3D engines to figure out which virtual texture pages are needed, but since we go through the game state to determine which sprites should be rendered, we already have this information readily available. That solves the problem of frequent allocations - we have one texture and just update it. Also, since all the sprites share the same texture coordinate space, we can batch draw calls that use them. Great! However we could still run out of space in the physical texture. This is more likely if player zooms out a lot, as lot more different sprites can be visible at once. Well, if you zoom out, sprites are scaled down, and we don't need to render sprites in their full resolution. To exploit this, the virtual atlas has a couple levels of details (mipmaps), which are the same texture scaled down to 0.5 size, 0.25 size, etc. and we can stream-in only the mipmap levels that are needed for the current zoom level. We can use lower mipmap levels also if you are zoomed in and there are just too many sprites on the screen. We can also utilize the lower details to limit how much time is spent for streaming per frame to prevent stalls in rendering when a big update is required. The Virtual atlas technique is big improvement over the old "Low VRAM mode" option, but it is still not good enough. In the ideal case, I would like it to work so well, we could remove low and very-low sprite quality options, and everyone would be able to play the game on normal. What prevents that from happening is that the entire virtual atlas needs to be in RAM. Streaming from HDD has very high latency, and we are not sure yet if it will be feasible for us to do without introducing bad sprite pop-ins, etc. If you'd like to learn how virtual texture mapping works in more detail, you can read the paper Advanced Virtual Texture Topics, or possibly even more in-depth Software Virtual Textures.
The main motivation behind texture streaming, is to make sure the game is able to run with limited resources, without having to reduce visual quality too much. According to the Steam hardware survey, almost 60% of our players (who have dedicated GPU), have at least 4GB of VRAM and this number grows as people upgrade their computers:
We have received quite a lot of bug reports about rendering performance issues from people with decent GPUs, especially since we started adding high-resolution sprites. Our assumption was that the problems were caused by the game wanting to use more video memory than available (the game is not the only application that wants to use video memory) and the graphics driver has to spend a lot of time to optimize accesses to the textures. During the graphics rewrite, we learned a lot about how contemporary GPUs work (and are still learning), and we were able to utilize the new APIs to measure how much time rendering takes on a GPU. To simply draw a 1920x1080 image to a render target of the same size, it takes:
The game scene being rendered.
Overdraw visualisation (cyan = 1 draw, green = 2, red >= 10).
Overdraw visualisation when we discard transparent pixels. Interestingly, discarding completely transparent pixels didn't seem to make any performance difference on modern GPUs, including the Intel HD. Drawing sprites with a lot of completely transparent pixels is faster than an opaque sprite without having to explicitly discard transparent pixels with shaders. However, it did make difference on Radeon HD 6450 and GeForce GT 330M, so perhaps modern GPUs throw away pixels that wouldn't have any effect on the result automatically? Anyway, a GTX 1060 renders a game scene like this in 1080p in 1ms. That's fast, but it means in 4K it would take 4ms, 10ms on integrated GPUs, and more that a single frame worth of time (16.66ms) on old, non-gaming GPUs. No wonder, scenes heavy on smoke or trees can tank FPS, especially in 4K. Maybe we should do something about that... As always, let us know what you think on our forum.
[ 2018-10-12 14:24:36 CET ] [ Original post ]
Hello, it is me, posila, with another technical article. Sorry.
Bitmap Cache
In 0.16, we added graphics option mysteriously called "Low VRAM mode", it enables a basic implementation of texture streaming, but I didn't want to call it that way, because I feared its poor performance would give texture streaming a bad name. How it worked? Every sprite has specified some priority, and the "Video memory usage" option - despite its name - controls which priorities of sprites are included in the sprite atlases. What happens to sprites that don't go to atlas? They are loaded as individual sprites. Normally, these sprites are allocated into texture objects. The reasoning behind this is that graphics driver has chance to decide how it wants to layout these small textures in memory, instead of it being forced to work with huge atlases.
What part of a sprite atlas looks like. When you enable "Low VRAM mode", non-atlas sprites are loaded only to RAM, and texture is allocated for them only when used during rendering. We called class that handled this BitmapCache and there was maximum limit of how many mega-bytes of texture data the bitmap cache could use at once. When this limit was reached, it failed to convert memory bitmap to video texture, and Allegro's drawing system would fallback to software rendering, which absolutely tanked FPS, but this didn't happen... most of the time. So apart from the obvious problem of falling back to software renderer (which we don't have any more after the graphics rewrite, so the game would crash or skip a sprite if this happened), there are other performance issues. Most sprites have a unique size, so we can't reuse textures for different sprites. Instead, when a sprite needs to be converted to a texture, a new texture is allocated, and when the sprite is evicted from the cache, its texture is destroyed. Creating and destroying textures considerably slows down rendering. The way we do it also fragments memory, so all of the sudden it may fail allocate new texture because there is no large enough consecutive block of memory left. Also, since our sprites are not in an atlas, sprite batching doesn't work and we get another performance hit from issuing thousands of draw calls instead of just hundreds. I considered it to be an experiment, and actually was kind of surprised that its performance was not as bad as I expected. Sure, it might cause FPS drop to single digits for a moment from time to time, but overall the game was playable (I have a long history of console gaming, so my standards as player might not be very high :)). Can we make it good enough, so it wouldn't be an experimental option any more, but something that could be enabled by default? Let's see. The problem is texture allocations, so let's allocate one texture for the entire bitmap cache - it would be a sprite atlas that we would dynamically update. That would also improve sprite batching, but when I started to think how to implement it, I quickly ran into a problem dealing with the fragmentation of space. I considered doing "defragmentation" from time to time, but it started to seem like an overwhelming problem, with a very uncertain result.
Virtual texture mapping
As I mentioned in FFF-251, it is very important for our rendering performance to batch sprite draw commands. If multiple consecutive draw commands use the same texture, we can batch them into a single draw call. That's why we build large sprite atlases. Virtual texture mapping - a texture streaming technique popularized by id Software as Mega Textures, seems like a perfect fit for us. All sprites are put into a single virtual atlas, the size of which is not restricted by hardware limits. You still have to be able to store the atlas somewhere, but it doesn't have to be a consecutive chunk of memory. The idea behind it is the same as in virtual memory - memory allocations assign a virtual address that maps to some physical location that can change under the hood (RAM, page file, etc.), sprites are assigned virtual texture coordinates that are mapped to some physical location. The virtual atlas is divided into tiles or pages (in our case 128x128 pixels), and when rendering we will figure out which tiles are needed, and upload them to a physical texture of much smaller dimensions than the virtual one. In the pixel shader, we then transform virtual texture coordinates to physical ones. To do that, we need an indirection table that says where to find the tiles from the virtual texture in the physical one. It is quite a challenge for 3D engines to figure out which virtual texture pages are needed, but since we go through the game state to determine which sprites should be rendered, we already have this information readily available. That solves the problem of frequent allocations - we have one texture and just update it. Also, since all the sprites share the same texture coordinate space, we can batch draw calls that use them. Great! However we could still run out of space in the physical texture. This is more likely if player zooms out a lot, as lot more different sprites can be visible at once. Well, if you zoom out, sprites are scaled down, and we don't need to render sprites in their full resolution. To exploit this, the virtual atlas has a couple levels of details (mipmaps), which are the same texture scaled down to 0.5 size, 0.25 size, etc. and we can stream-in only the mipmap levels that are needed for the current zoom level. We can use lower mipmap levels also if you are zoomed in and there are just too many sprites on the screen. We can also utilize the lower details to limit how much time is spent for streaming per frame to prevent stalls in rendering when a big update is required. The Virtual atlas technique is big improvement over the old "Low VRAM mode" option, but it is still not good enough. In the ideal case, I would like it to work so well, we could remove low and very-low sprite quality options, and everyone would be able to play the game on normal. What prevents that from happening is that the entire virtual atlas needs to be in RAM. Streaming from HDD has very high latency, and we are not sure yet if it will be feasible for us to do without introducing bad sprite pop-ins, etc. If you'd like to learn how virtual texture mapping works in more detail, you can read the paper Advanced Virtual Texture Topics, or possibly even more in-depth Software Virtual Textures.
GPU rendering performance
The main motivation behind texture streaming, is to make sure the game is able to run with limited resources, without having to reduce visual quality too much. According to the Steam hardware survey, almost 60% of our players (who have dedicated GPU), have at least 4GB of VRAM and this number grows as people upgrade their computers:
We have received quite a lot of bug reports about rendering performance issues from people with decent GPUs, especially since we started adding high-resolution sprites. Our assumption was that the problems were caused by the game wanting to use more video memory than available (the game is not the only application that wants to use video memory) and the graphics driver has to spend a lot of time to optimize accesses to the textures. During the graphics rewrite, we learned a lot about how contemporary GPUs work (and are still learning), and we were able to utilize the new APIs to measure how much time rendering takes on a GPU. To simply draw a 1920x1080 image to a render target of the same size, it takes:
- ~0.1ms on GeForce GTX 1060.
- ~0.15 ms on Radeon Vega 64.
- ~0.2ms on GeForce GTX 750Ti or Radeon R7 360.
- ~0.75ms on GeForce GT 330M.
- ~1ms on Intel HD Graphics 5500.
- ~2ms on Radeon HD 6450.
The game scene being rendered.
Overdraw visualisation (cyan = 1 draw, green = 2, red >= 10).
Overdraw visualisation when we discard transparent pixels. Interestingly, discarding completely transparent pixels didn't seem to make any performance difference on modern GPUs, including the Intel HD. Drawing sprites with a lot of completely transparent pixels is faster than an opaque sprite without having to explicitly discard transparent pixels with shaders. However, it did make difference on Radeon HD 6450 and GeForce GT 330M, so perhaps modern GPUs throw away pixels that wouldn't have any effect on the result automatically? Anyway, a GTX 1060 renders a game scene like this in 1080p in 1ms. That's fast, but it means in 4K it would take 4ms, 10ms on integrated GPUs, and more that a single frame worth of time (16.66ms) on old, non-gaming GPUs. No wonder, scenes heavy on smoke or trees can tank FPS, especially in 4K. Maybe we should do something about that... As always, let us know what you think on our forum.
[ 2018-10-12 14:24:36 CET ] [ Original post ]
Factorio
Wube Software LTD.
Developer
Wube Software LTD.
Publisher
2020-08-14
Release
Game News Posts:
506
🎹🖱️Keyboard + Mouse
Overwhelmingly Positive
(164072 reviews)
The Game includes VR Support
Public Linux Depots:
- Factorio Linux64 [306.86 M]
- Factorio Linux32 [300.1 M]
Available DLCs:
- Factorio: Space Age
Factorio is a game in which you build and maintain factories. You will be mining resources, researching technologies, building infrastructure, automating production and fighting enemies. In the beginning you will find yourself chopping trees, mining ores and crafting mechanical arms and transport belts by hand, but in short time you can become an industrial powerhouse, with huge solar fields, oil refining and cracking, manufacture and deployment of construction and logistic robots, all for your resource needs. However this heavy exploitation of the planet's resources does not sit nicely with the locals, so you will have to be prepared to defend yourself and your machine empire.
Join forces with other players in cooperative Multiplayer, create huge factories, collaborate and delegate tasks between you and your friends. Add mods to increase your enjoyment, from small tweak and helper mods to complete game overhauls, Factorio's ground-up Modding support has allowed content creators from around the world to design interesting and innovative features. While the core gameplay is in the form of the freeplay scenario, there are a range of interesting challenges in the form of the Scenario pack, available as free DLC. If you don't find any maps or scenarios you enjoy, you can create your own with the in-game Map Editor, place down entities, enemies, and terrain in any way you like, and even add your own custom script to make for interesting gameplay.
Discount Disclaimer: We don't have any plans to take part in a sale or to reduce the price for the foreseeable future.
Join forces with other players in cooperative Multiplayer, create huge factories, collaborate and delegate tasks between you and your friends. Add mods to increase your enjoyment, from small tweak and helper mods to complete game overhauls, Factorio's ground-up Modding support has allowed content creators from around the world to design interesting and innovative features. While the core gameplay is in the form of the freeplay scenario, there are a range of interesting challenges in the form of the Scenario pack, available as free DLC. If you don't find any maps or scenarios you enjoy, you can create your own with the in-game Map Editor, place down entities, enemies, and terrain in any way you like, and even add your own custom script to make for interesting gameplay.
Discount Disclaimer: We don't have any plans to take part in a sale or to reduce the price for the foreseeable future.
What people say about Factorio
- No other game in the history of gaming handles the logistics side of management simulator so perfectly. - Reddit
- I see conveyor belts when I close my eyes. I may have been binging Factorio lately. - Notch, Mojang
- Factorio is a super duper awesome game where we use conveyor belts to shoot aliens. - Zisteau, Youtube
MINIMAL SETUP
- OS: Linux (tarball installation)
- Processor: Dual core 3Ghz+Memory: 4 GB RAM
- Memory: 4 GB RAM
- Graphics: OpenGL 3.3 core. DirectX 10.1 capable GPU with 512 MB VRAM - GeForce GTX 260. Radeon HD 4850 or Intel HD Graphics 5500
- Storage: 3 GB available space
- OS: Linux (tarball installation)
- Processor: Quad core 3GHz+Memory: 8 GB RAM
- Memory: 8 GB RAM
- Graphics: OpenGL 4.3 core. DirectX 11 capable GPU with 2 GB VRAM - GeForce GTX 750 Ti. Radeon R7 360
- Storage: 3 GB available space
GAMEBILLET
[ 6102 ]
GAMERSGATE
[ 764 ]
FANATICAL BUNDLES
HUMBLE BUNDLES
by buying games/dlcs from affiliate links you are supporting tuxDB