- 2011 - A trip through the Graphics Pipeline 2011
- 2013 - Performance Optimization Guidelines and the GPU Architecture behind them
- 2015 - Life of a triangle - NVIDIA's logical pipeline
- 2015 - Render Hell 2.0
- 2016 - How bad are small triangles on GPU and why?
- 2017 - GPU Performance for Game Artists
- 2019 - Understanding the anatomy of GPUs using Pokémon
In shader programming, you often run into a problem where every pixel in a compute shader group (tile) needs to iterate over the same array in memory.
Tiled deferred lighting is the most common case: each 8x8 tile loops over a light list that was culled for that tile.
Simplified HLSL code looks like this:
Buffer<float4> lightDatas;
Texture2D<uint2> lightStartCounts;
RWTexture2D<float4> output;
[numthreads(8, 8, 1)]
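To make the access pattern concrete, here is a CPU emulation in C++. It is a sketch, not the original HLSL. It assumes each tile's `lightStartCounts` entry is a `{start, count}` pair indexing into a flat `lightDatas` array, and `shadePixel` is a hypothetical placeholder that only sums light colors. The point is that all 64 pixels of a tile read the same pair and walk the same list.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU emulation of the tiled light loop (illustrative sketch, not GPU code).
using float4 = std::array<float, 4>;

// Assumed layout: one {start, count} pair per tile, like Texture2D<uint2>.
struct StartCount { std::uint32_t start; std::uint32_t count; };

constexpr int kTileSize = 8;  // matches [numthreads(8, 8, 1)]

// Hypothetical shading: sums the xyz of every light in the tile's list.
float4 shadePixel(const std::vector<float4>& lightDatas, StartCount sc) {
    float4 result{0.0f, 0.0f, 0.0f, 1.0f};
    for (std::uint32_t i = 0; i < sc.count; ++i) {
        const float4& light = lightDatas[sc.start + i];
        for (int c = 0; c < 3; ++c) result[c] += light[c];
    }
    return result;
}

// Every pixel of a tile reads the same StartCount and walks the same list.
void shadeImage(const std::vector<float4>& lightDatas,
                const std::vector<StartCount>& lightStartCounts,  // one per tile
                int tilesX, int tilesY, std::vector<float4>& output) {
    const int width = tilesX * kTileSize;
    output.assign(static_cast<std::size_t>(width) * tilesY * kTileSize, float4{});
    for (int ty = 0; ty < tilesY; ++ty)
        for (int tx = 0; tx < tilesX; ++tx) {
            const StartCount sc = lightStartCounts[ty * tilesX + tx];
            for (int y = 0; y < kTileSize; ++y)      // emulates the 8x8 group
                for (int x = 0; x < kTileSize; ++x) {
                    const int px = tx * kTileSize + x;
                    const int py = ty * kTileSize + y;
                    output[py * width + px] = shadePixel(lightDatas, sc);
                }
        }
}
```

On a GPU the 64 threads run in parallel rather than in these nested loops, but they still load identical list entries.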
This is a short post that explains how to write a high-performance matrix multiplication program on modern processors. In this tutorial I will use a single core of the Skylake-client CPU with AVX2, but the principles in this post also apply to other processors with different instruction sets (such as AVX512).
Matrix multiplication is a mathematical operation that defines the product of two matrices.
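As a reference point, here is a plain scalar baseline in C++, a sketch rather than the post's AVX2 kernel. It computes $C_{ij} = \sum_k A_{ik} B_{kj}$ for row-major matrices stored in flat `std::vector<float>` buffers; the function name and layout are assumptions for illustration.

```cpp
#include <cstddef>
#include <vector>

// Baseline C = A * B, with A (n x m), B (m x p), C (n x p), all row-major.
// The i-k-j loop order makes the innermost loop walk contiguously through a
// row of B and a row of C, which is friendlier to caches and to compiler
// auto-vectorization than the textbook i-j-k order.
std::vector<float> matmul(const std::vector<float>& A, const std::vector<float>& B,
                          std::size_t n, std::size_t m, std::size_t p) {
    std::vector<float> C(n * p, 0.0f);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < m; ++k) {
            const float a = A[i * m + k];  // reused across the whole j loop
            for (std::size_t j = 0; j < p; ++j)
                C[i * p + j] += a * B[k * p + j];
        }
    return C;
}
```

Optimized versions keep this same arithmetic but add blocking (tiling) for the cache hierarchy and explicit SIMD.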
Setup:
1. Index buffer containing N quads (2 triangles each), where N is the maximum number of spheres. Quad K uses the repeating pattern {0,1,2,1,3,2} + K*4.
2. No vertex buffer.
Render N*2 triangles, where N is the number of spheres you have.
Vertex shader:
1. Sphere index = V/4 (V = SV_VertexID)
2. Quad coord: Q = float2(V%2, (V%4)/2) * 2.0 - 1.0
3. Transform sphere center -> pos
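The index and vertex-ID arithmetic above can be checked on the CPU with a small C++ sketch. `buildQuadIndices`, `QuadVertex` and `decodeVertex` are hypothetical names; the math mirrors the listed steps, using integer division for `(V%4)/2` as HLSL `uint` math would.

```cpp
#include <cstdint>
#include <vector>

// Shared index buffer: quad K uses vertices 4K..4K+3 with {0,1,2,1,3,2} + K*4.
std::vector<std::uint32_t> buildQuadIndices(std::uint32_t maxSpheres) {
    static const std::uint32_t kPattern[6] = {0, 1, 2, 1, 3, 2};
    std::vector<std::uint32_t> indices;
    indices.reserve(maxSpheres * 6);
    for (std::uint32_t k = 0; k < maxSpheres; ++k)
        for (std::uint32_t p : kPattern) indices.push_back(p + k * 4);
    return indices;
}

// What the vertex shader derives from the vertex ID alone (no vertex buffer).
struct QuadVertex { std::uint32_t sphereIndex; float qx, qy; };

QuadVertex decodeVertex(std::uint32_t v) {
    QuadVertex out;
    out.sphereIndex = v / 4;                                  // 4 vertices per sphere
    out.qx = static_cast<float>(v % 2) * 2.0f - 1.0f;         // -1, +1, -1, +1
    out.qy = static_cast<float>((v % 4) / 2) * 2.0f - 1.0f;   // -1, -1, +1, +1
    return out;
}
```

To draw S spheres, issue S*6 indices from this buffer. With this corner order, triangles (0,1,2) and (1,3,2) have the same winding, so one cull mode covers the whole quad.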
For a brief user-level introduction to CMake, watch C++ Weekly, Episode 78, Intro to CMake by Jason Turner. LLVM’s CMake Primer provides a good high-level introduction to the CMake syntax. Go read it now.
After that, watch Mathieu Ropert’s CppCon 2017 talk Using Modern CMake Patterns to Enforce a Good Modular Design (slides). It provides a thorough explanation of what modern CMake is and why it is so much better than “old school” CMake. The modular design ideas in this talk are based on the book [Large-Scale C++ Software Design](https://www.amazon.de/Large-Scale-Soft
This is my little Christmas-break experiment trying to (among other things) reduce the amount of generated code for containers.
THIS CODE WILL CONTAIN BUGS AND IS ONLY PRESENTED AS AN EXAMPLE.
The C++ STL is still an undesirable library for many reasons I have laid out in the past. But it's also a good library. Demons lie in this here debate and I have no interest in revisiting it right now.
The goals that I have achieved with this approach are:
Please consider using http://lygia.xyz instead of copy/pasting these functions. It expands support for voronoi, voronoise, fbm, noise, worley, derivatives and much more, through simple file dependencies. Take a look at https://github.com/patriciogonzalezvivo/lygia/tree/main/generative
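// Cheap hash: pseudo-random value in [0, 1) for a given n
float rand(float n){return fract(sin(n) * 43758.5453123);}
// 1D value noise
float noise(float p){
float fl = floor(p); // lattice point at or below p
float fc = fract(p); // position of p between fl and fl + 1.0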