TUXDB - LINUX GAMING AGGREGATE
by NuSuey
NEWSFEED
▪️ GAMES
▪️ STEAM DECK ▪️ DEALS ▪️ CROWDFUNDING ▪️ COMMUNITY
tuxdb.com logo
Support tuxDB on Patreon
Currently supported by 9 awesome people!

🌟 Special thanks to our amazing supporters:


✨ $10 Tier: [Geeks Love Detail]
🌈 $5 Tier: [Arch Toasty][Benedikt][David Martínez Martí]

Steam ImageSteam ImageSteam ImageSteam ImageSteam ImageSteam Image
Friday Facts #281 - For a Few Frames More

Read this article on our Website.

For a few frames more


Previously on Factorio Friday Facts (#264): "No wonder, scenes heavy on smoke or trees can tank FPS, especially in 4K. Maybe we should do something about that..."

Sometimes, it's just a bug


Writing technical Friday Facts helps me to summarize what I know, look at problems from a different perspective, and sometimes try to answer questions I haven't even thought of asking before. For example, I made the overdraw visualization just for screenshot for FFF-264, and while writing the FFF I didn't analyzed them. Just after the blog post was released I looked at it and started thinking - why does smoke create such a huge overdraw blob when it is just barely noticeable in game view? Well, it turned out it was largely due to a bug. Some time ago we optimized smoke particles so that they are only updated once every 120 ticks (2 seconds), and its animation (movement, scale and opacity) is interpolated during drawing. Thing is, particles are destroyed only during their update, if the lifetime of a particle ended somewhere in middle of the 120 tick window, the particle would be still drawn until destroyed. Because smoke fades away and scales up during its lifetime, it would be drawing itself fully transparent over a large area. Making the smoke particle not draw itself past its lifetime reduced number of particles being drawn by 15%, and reduced the number of pixels being rasterized even more. Additionally, particles below 2% opacity don't really seem to add anything to the final picture, so we can safely not draw those to get little extra boost.

Turret range visualization optimization


Probably only a very small number of our players actually run into this issue during normal play, but it is simple problem that we used to learn more about how to use GPUs more efficiently. Turret ranges in the map view are rendered as opaque geometry (no sprites) to an offscreen buffer which is subsequently drawn semi-transparently to the game view. This makes the turret ranges blend into single solid shape of the same color. Every pixel is written for each turret in range, but if it is written once, subsequent writes are unnecessary. We had two ideas how to optimize this. First, enable stencil test, so that pixels can be written just once. The second, pass a list of turrets to the GPU and test if each pixel is in range of any turret using a pixel shader. The stencil test idea yielded about 3x speed-up in our extreme test cases (20x20 turrets) which was not good enough - if your GPU had a problem with 3x3 grid, it would have problem again with 9x3 grid, which is not such a preposterous setup to have. The pixel shader idea turned out to be mixed bag - if the entire screen was in range, the shader would be lightning fast, but as soon as there were pixels outside of range, which meant shader had to iterate through entire list of turrets to figure out none covers it, performance started to drop off rapidly. In the worst case, it would be worse than without optimization. Jiri came up with idea to do prepass, in which we would render the geometry into a much smaller buffer (let's say 16x smaller in both width and height) and then in turret range shader we would check if a pixel is definitely in range (pixel in prepass buffer is fully opaque), definitely out of range (pixel in prepass buffer is fully transparent) or we don't know (pixel in prepass buffer is semi-transparent due to linear filtering) and we need to the pixel against the turret list. He did that and performance improved slightly, but not as much as we hoped for. After further investigation we found out the GPU really doesn't like the early exit in the pixel shader. Jiri managed to remove it by rendering all the certain cases first, while marking uncertain pixels in stencil buffer, and in another pass he would run the turret range shader on just stenciled pixels. This solution ended up being up to 20x faster in cases that were too slow with the un-optimized solution, but as you would zoom out more and there were more turrets covering less pixels on screen, the orginal solution would become better. So we turn on the optimization only when you are zoomed in enough. By the way, artillery ranges suffered from the same problem in early 0.16 versions, but they are so large it was good enough for us to test if entire screen is in range, and draw just single fullscreen rectangle in that case. Turrets were very likely to cause problems on low-end GPU even when not covering the entire screen.

GPU performance


As always, the root of all performance degradation is memory access. Since we mostly do just simple sprite drawing, the GPU has to read pixels from a texture, and blend it into a framebuffer. Which technically means reading pixels from another texture, doing some simple maths with the two values, and writing the result back. GPUs are designed to do this on a massively parallel scale and are optimized for high memory bandwidth while sacrificing memory latency. The assumption is you'll want to do at least a little bit of maths on the colors that you fetch from textures, to give the resulting image some dynamic detail. Every GPU core can then have many scheduled tasks and switch between them as the current task starts waiting on some memory operation to finish. When you don't really do any math to add dynamic details, and all detail comes from drawing more layers of sprites, the GPU cores quickly reach their limit of maximum tasks and every one of them is stalled by a memory access. That is not to say we don't hit memory bandwidth limits on some hardware. For cases where we do hit memory bandwidth limits, we added an option to do rendering in 16-bit color depth (as opposed to the normal 32-bit). This option is intended for old GPUs and integrated GPUs. An obvious way to improve this is to shade less pixels. This is something I mentioned before in FFF-227 by splitting tree shadows from tree trunks to remove large areas of completely transparent pixels. This can be improved further by drawing sprites not as rectangles but as generic polygons that envelope sprites such that most of the fully transparent areas won't be rasterized. This is something we will probably do for the most problematic sprites (trees and decoratives). Another way to reduce the number of shaded pixels is to simply to render in lower resolution. We actually do this for lights for a long time already, but it could be used for other effects that don't have important high frequency detail - for example smoke. Ultimately, some GPUs are not cut for rendering the game in FullHD resolution no matter what (for example Intel HD Graphics 2500 or multimedia cards like GeForce GT 710 and Radeon HD 6450 come to mind), so they would benefit from an option to render the game view in lower resolution with native-resolution GUI overlay. In FFF-227, I also mentioned mipmaps - downscaled copies of textures that are used when a texture is being scaled down. This helps to better utilize the caches on GPU, and therefore reduce the need to access main VRAM. We already used mipmaps for trees and decoratives in 0.16, but paradoxically some people had performance issues when they had mipmaps enabled. The problem is, in 0.16 we always use trilinear filtering for mipmapped textures. That means when you want to draw as sprite at 75% scale, the GPU will get a pixel from the 100% scale mipmap, and the 50% scale mipmap and average them to get the pixel for the 75% scale version. The access to two different mipmap levels would make things slower. In the new rendering code, we are able to control this, so for sprites that are likely to cause performance issues (for example smoke) we can just fetch pixels from the closest finer detailed mipmap.

Texture compression


GPUs natively support block compressed formats - the texture is divided into 4x4 pixels (or possibly different sizes in formats that are not commonly supported by desktop GPUs) and each block is compressed into a fixed number of bytes. Commonly supported formats on desktop GPUs are BC1-7. Most interesting to us, are:
  • BC1 (used to be DXT1) - Intended for RGB without alpha (or 1 bit alpha) with uses 4 bits per pixel (as opposed to the 32 bits raw RGBA uses).
  • BC3 (used to be DXT5) - Uses 8 bits per pixel - same format for color as BC1, but uses 4 extra bits for the alpha channel.
  • BC4 - Stores just a single color channel in the same format as the alpha channel in BC3 is stored (so 4 bits per pixel).
These formats work pretty well with normal textures, but not so well with 2D art that contains a lot of small detail. With our art it is not so bad as long as sprites are static, but as soon as we apply the compression on animations, even tiny changes in individual pixels results in larger changes in blocks that contain them, and the resulting animation ends up looking very noisy. https://cdn.factorio.com/assets/img/blog/fff-281-player.webm https://cdn.factorio.com/assets/img/blog/fff-281-player-compressed.webm Uncompressed vs. BC3 compression Awesomenouts developers described this in their blogpost on how compressed formats supported by GPUs work, what kind of artifacts it creates in their art, and what they do to improve quality. They store their sprites in a slightly higher resolution (for example 41% larger width and height) to spread the large frequency detail more apart. This makes compression less efficient but still worth it. The problem is, we can't really do this as it makes our sprites look pretty bad and blurry. During our initial experiments with compression, we deemed the quality of compression to be too low. In addition to that it somewhat slowed down the game startup as we built sprite atlases in an uncompressed format first and compressed them at the end of the sprite loading process. I left the 'Texture compression' option in, for people who really needed to use it, and it would compress only the sprites that usually blend over some other graphics (like color masks, shadows and smoke). The most commonly used image or video compression formats (like JPEG or H.264) exploit the fact that human vision is actually pretty bad at recognizing colors and is much more sensitive to changes in brightness. Instead of RGB color space they use of color space based on luma (brightness) and chrominance (color), and apply better quality compression to the luma component and lower quality to color components, resulting in a very high quality image to the human eye. Some smart people thought this technique could be used to improve the quality of GPU block compression formats. One of the ideas is YCoCg-DXT compression which uses BC3 as an underlying format and stores luma in alpha channel for higher quality compression and chrominance in RGB channels. Pixel shaders that use these textures need to do just a little bit of math to convert colors from YCoCg color space to RGB. We tried to integrate this (using separate BC4 compressed texture for alpha channel) and were pleasantly surprised by the result. You probably won't notice artifacts unless you zoom-in a lot and look for them. In fact, two weeks ago I enabled texture compression by default (and didn't tell anyone), and whenever I ask somebody on the team if they disabled it, they say they didn't know it was turned on. So I am pretty happy about that. The small downside is the need to use two textures (BC3 + BC4) resulting in 12 bits per pixel, but the best thing is, despite the pixel shader having to fetch from 2 textures instead of just one, the GPU is able to render up to twice as fast due to caches being able to fit more pixels in compressed formats than in raw formats. Luckily the paper contains the pixel shader code for compressing to this format on GPU, so we just had to adapt our sprite loading code to efficiently utilize GPU for compressing sprites as they are being loaded, so that the fact that we are compressing sprites during startup doesn't make the game load slower. In 0.17, the texture compression graphics setting is changed to a drop down list containing 'None', 'High quality' and 'Low quality':
  • High quality will use the custom YCoCg-DXT compression and will be the default on most computers.
  • Low quality will use BC3 and is intended for use only on really weak GPUs.
There shouldn't be any reason for you to disable compression, the option to do so is there mainly just in case some technical issue appears. Compression is applied to all sprites except the GUI, which needs to be as crisp as possible. Also, regardless of the texture compression option, shadow sprites will be always compressed using the BC4 format. https://cdn.factorio.com/assets/img/blog/fff-281-017-uncompressed.webm https://cdn.factorio.com/assets/img/blog/fff-281-017-high-compression.webm https://cdn.factorio.com/assets/img/blog/fff-281-017-low-compression.webm Uncompressed vs. High quality compression vs. Low quality compression After we added the high-res worms, biters, and spitters, VRAM usage rose up to 3.5GB (with high-res enabled, obviously) when no compression (even the shadow one) was applied. Compressing just shadows decreased VRAM usage to 0.16 levels of ~2.5GB. With high quality compression enabled, VRAM usage of sprite atlases currently is ~1GB (without mipmaps). This means, vanilla should be perfectly playable in high-res on GPUs with 2GB VRAM, and in combination with texture streaming these GPUs should be able to keep up in high-res even in cases when mods add a lot of new sprites. High-resolution sprites were originally intended for players with the most powerful computers, but in 0.17, they'll essentially become new standard. The goal is to eventually remove the 'low' and 'very-low' sprite resolution options, as 'low quality' texture compression on normal sprites + texture streaming should be able to run even on GPUs with very low VRAM sizes. Side note: I mentioned there are 7 BC formats. BC7 is intended for RGB or RGBA textures, and has potentially much better quality than BC3 with the same compression ratio. The problem is, the format was introduced with DirectX 11, so it is not supported by DirectX 10 class hardware, and it is not available in OpenGL on macOS. The second problem is that it takes a very very long to compress something into this format, because the compressor has to try out a large number of configurations for each block to find the one that yields the best quality. Since Factorio is kind of outlier in that its art is distributed in bunch of PNG files instead of having everything neatly packaged so that graphical assets can be loaded directly to the GPU, without any transcoding to a different format or dynamically building sprite atlases, we require a real-time compression solution. We are not eager to change how the game is distributed too much, for one having everything in separate files makes updates reasonably small when we change some of them, and having vanilla data completely open makes it easier for people to mod the game.

Now look at these graphs


We took some benchmarks of extreme scenes. The first one is on save from public Clusterio event last year. Twinsen sent me the save, because he was getting framerate drops in the middle of large steam turbine array. The second one is save from a bug report. I choose the save because it has rails going though forest and because I could really get low FPS on it even in 0.16, I used map editor to carpet the ground with grass decoratives. We tested on GPUs with 2GB or 1GB of VRAM, that we have in the office. Benchmarks on desktop GPUs ran on single PC (we just swapped GPUs between runs) - Intel Core i7-4790K, 16GB RAM, Windows 10. We measured the time a frame was being processed by GPU, for 1000 frames, and averaged out the results. We benchmarked against the 0.17 build from before GPU-side optimizations were implemented, unfortunately we don't have a way to capture GPU timings in 0.16. Tests ran in FullHD resolution (1920x1080) with high resolution sprites enabled, with a graphic configuration that I believe to be the best for rendering performance. For old build: Mipmaps, Atlas specialization, Separate lower object atlas, Texture compression options enabled, Video Memory Usage set to All. And equivalent configuration for the new build, except for using the new high quality compression option. We also included a result from one high-performance GPU (GeForce GTX 980 4GB), but this was ran in 4K resolution (3840x2160) as opposed to FullHD.The benchmark of Intel HD Graphics 5500 was ran on a laptop with Intel Core i7-5600U and 16GB RAM, and used Low quality texture compression instead. We also included results with 16-bit color depth enabled.



Times were measured in milliseconds. Lower times are better, and for 60 FPS, a frame needs to take under 16.66ms. If you are interested in how GPUs work more in-depth, Fabian Giesen wrote a nice series of articles on the topic. As always, let us know what you think on our forum.


[ 2019-02-08 15:12:12 CET ] [ Original post ]

Factorio
Wube Software LTD. Developer
Wube Software LTD. Publisher
2020-08-14 Release
Game News Posts: 506
🎹🖱️Keyboard + Mouse
Overwhelmingly Positive (164072 reviews)
The Game includes VR Support
Public Linux Depots:
  • Factorio Linux64 [306.86 M]
  • Factorio Linux32 [300.1 M]
Available DLCs:
  • Factorio: Space Age
Factorio is a game in which you build and maintain factories. You will be mining resources, researching technologies, building infrastructure, automating production and fighting enemies. In the beginning you will find yourself chopping trees, mining ores and crafting mechanical arms and transport belts by hand, but in short time you can become an industrial powerhouse, with huge solar fields, oil refining and cracking, manufacture and deployment of construction and logistic robots, all for your resource needs. However this heavy exploitation of the planet's resources does not sit nicely with the locals, so you will have to be prepared to defend yourself and your machine empire.

Join forces with other players in cooperative Multiplayer, create huge factories, collaborate and delegate tasks between you and your friends. Add mods to increase your enjoyment, from small tweak and helper mods to complete game overhauls, Factorio's ground-up Modding support has allowed content creators from around the world to design interesting and innovative features. While the core gameplay is in the form of the freeplay scenario, there are a range of interesting challenges in the form of the Scenario pack, available as free DLC. If you don't find any maps or scenarios you enjoy, you can create your own with the in-game Map Editor, place down entities, enemies, and terrain in any way you like, and even add your own custom script to make for interesting gameplay.

Discount Disclaimer: We don't have any plans to take part in a sale or to reduce the price for the foreseeable future.

What people say about Factorio


  • No other game in the history of gaming handles the logistics side of management simulator so perfectly. - Reddit
  • I see conveyor belts when I close my eyes. I may have been binging Factorio lately. - Notch, Mojang
  • Factorio is a super duper awesome game where we use conveyor belts to shoot aliens. - Zisteau, Youtube

MINIMAL SETUP
  • OS: Linux (tarball installation)
  • Processor: Dual core 3Ghz+Memory: 4 GB RAM
  • Memory: 4 GB RAM
  • Graphics: OpenGL 3.3 core. DirectX 10.1 capable GPU with 512 MB VRAM - GeForce GTX 260. Radeon HD 4850 or Intel HD Graphics 5500
  • Storage: 3 GB available space
RECOMMENDED SETUP
  • OS: Linux (tarball installation)
  • Processor: Quad core 3GHz+Memory: 8 GB RAM
  • Memory: 8 GB RAM
  • Graphics: OpenGL 4.3 core. DirectX 11 capable GPU with 2 GB VRAM - GeForce GTX 750 Ti. Radeon R7 360
  • Storage: 3 GB available space
GAMEBILLET

[ 6102 ]

9.89$ (1%)
50.95$ (15%)
35.14$ (12%)
24.87$ (17%)
21.24$ (15%)
17.79$ (11%)
49.79$ (17%)
12.42$ (17%)
17.49$ (13%)
26.09$ (13%)
17.79$ (11%)
2.22$ (78%)
15.95$ (16%)
42.99$ (14%)
6.79$ (15%)
12.38$ (17%)
22.24$ (11%)
4.21$ (16%)
17.79$ (11%)
16.97$ (15%)
4.12$ (17%)
3.93$ (21%)
13.14$ (12%)
21.22$ (15%)
16.96$ (15%)
37.32$ (17%)
50.39$ (16%)
49.79$ (17%)
29.71$ (15%)
16.79$ (16%)
GAMERSGATE

[ 764 ]

0.51$ (83%)
1.11$ (91%)
8.1$ (73%)
3.83$ (62%)
2.13$ (79%)
3.83$ (74%)
0.45$ (85%)
1.0$ (90%)
2.13$ (79%)
6.39$ (20%)
1.49$ (79%)
0.43$ (91%)
0.52$ (92%)
0.43$ (91%)
0.64$ (87%)
0.43$ (91%)
8.28$ (45%)
4.05$ (73%)
8.5$ (66%)
1.28$ (91%)
1.25$ (75%)
8.49$ (58%)
0.6$ (91%)
0.34$ (91%)
2.3$ (87%)
0.85$ (91%)
0.53$ (92%)
2.97$ (40%)
4.67$ (53%)
4.67$ (53%)

FANATICAL BUNDLES

Time left:

12 days, 11 hours, 10 minutes


Time left:

19 days, 11 hours, 10 minutes


Time left:

8 days, 11 hours, 10 minutes


Time left:

5 days, 11 hours, 10 minutes


Time left:

13 days, 11 hours, 10 minutes


Time left:

15 days, 11 hours, 10 minutes


Time left:

36 days, 11 hours, 10 minutes


Time left:

356461 days, 3 hours, 10 minutes


Time left:

18 days, 11 hours, 10 minutes


Time left:

47 days, 11 hours, 10 minutes


Time left:

33 days, 11 hours, 10 minutes


HUMBLE BUNDLES

Time left:

0 days, 5 hours, 10 minutes


Time left:

2 days, 5 hours, 10 minutes


Time left:

7 days, 5 hours, 10 minutes


Time left:

9 days, 5 hours, 10 minutes


Time left:

14 days, 5 hours, 10 minutes

by buying games/dlcs from affiliate links you are supporting tuxDB
🔴 LIVE