
Using the Stream Output stage to debug shaders in DirectX 10/11


In early March, I had the pleasure of visiting the Direct3D development team at Microsoft's headquarters in Redmond. During one of the discussions about debugging 3D applications, they advised me to use a new DirectX 10/11 feature, the Stream Output stage, to debug shaders.

I used this technique to debug tessellation code under DirectX 11 (that code is shown below), but DirectX 10 offers the same feature, so porting is straightforward.

What are we trying to do?

We want to take the output of the shaders (vertex, geometry, tessellation) that run on the GPU and process it further on the CPU. At the same time, we still want to see the rendered image on screen while having all the coordinates available as buffers and structures in system RAM, where we can read them, write them to a log, or use them in further calculations.

Let's get down to business

You need to perform four basic steps:

  1. Modify your shaders. Add to the shader output any extra fields you want to capture. For example, your shader may not normally output world-space coordinates, but you can add them for debug output via the Stream Output stage.
  2. Change how the geometry shader is created. Constructing an ID3D11GeometryShader (or ID3D10GeometryShader) and adding it to the pipeline is done differently.
  3. Create a buffer to receive the output. The results have to be stored somewhere.
  4. Decode the results. The data in the buffer is an array of structures, each holding one vertex in the format defined by the shader. The easiest way to decode the buffer is to declare a structure with the same layout and then cast a pointer to the start of the buffer to a pointer to an array of those structures.

So, we modify the shaders

As you may know, the Direct3D pipeline only passes data forward: the output of one stage is handed to the next stage and never flows back. So if you want to capture extra data from the vertex shader, you have to carry it through every later stage of the pipeline (HS / DS / GS).

Let's look at this geometry shader:

    struct DS_OUTPUT
    {
        float4 position : SV_Position;
        float3 colour   : COLOUR;
        float3 uvw      : DOMAIN_SHADER_LOCATION;
        float3 wPos     : WORLD_POSITION;
    };

    [maxvertexcount(3)]
    void gsMain( triangle DS_OUTPUT input[3], inout TriangleStream<DS_OUTPUT> TriangleOutputStream )
    {
        TriangleOutputStream.Append( input[0] );
        TriangleOutputStream.Append( input[1] );
        TriangleOutputStream.Append( input[2] );
        TriangleOutputStream.RestartStrip();
    }


This geometry shader is a pure pass-through: it simply forwards its input to its output. Note the DS_OUTPUT structure; later we will choose which of its fields we want to capture.

Your pixel shaders do not need any changes. In the example above, the pixel shader reads only the second field of the structure, float3 colour : COLOUR, and ignores the rest. So we use the simplest approach: every new field that we want to send to the Stream Output stage is added to the end of the DS_OUTPUT structure.

Now we change how the geometry shader is created. Call CreateGeometryShaderWithStreamOutput() instead of CreateGeometryShader(), passing it an array of D3D11_SO_DECLARATION_ENTRY structures (or D3D10_SO_DECLARATION_ENTRY, depending on which version of DirectX you are using) that describes the output vertex format.

    D3D11_SO_DECLARATION_ENTRY soDecl[] =
    {
        { 0, "COLOUR",                 0, 0, 3, 0 },
        { 0, "DOMAIN_SHADER_LOCATION", 0, 0, 3, 0 },
        { 0, "WORLD_POSITION",         0, 0, 3, 0 }
    };

    UINT stride = 9 * sizeof(float); // *NOT* sizeof the above array!
    UINT elems  = sizeof(soDecl) / sizeof(D3D11_SO_DECLARATION_ENTRY);


Pay attention to three things:
  1. Semantic names: they must match the names written in the HLSL code of your shader. Note that the declaration above selects three of the four fields declared in the geometry shader's output structure.
  2. Start component and component count: for a float3 we want all three components, starting from the first, so the start component is 0 and the count is 3.
  3. Stride between two adjacent vertices: the call to CreateGeometryShaderWithStreamOutput() needs the size of one output vertex. It is not hard to calculate, but it is easy to pass the size of the soDecl array by mistake, which is wrong.
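The stride from the third point can also be derived from the declaration instead of being typed in by hand. The following is only a minimal sketch: SoEntry is a hypothetical stand-in that mirrors the fields of D3D11_SO_DECLARATION_ENTRY so that the snippet compiles without the Windows SDK, and strideInBytes, kDecl and kDeclCount are illustrative names that are not part of Direct3D. It assumes every component is a 32-bit float and that all entries go to the same output slot.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in mirroring the fields of D3D11_SO_DECLARATION_ENTRY,
// so that this sketch builds without the Windows SDK headers.
struct SoEntry {
    unsigned      Stream;
    const char*   SemanticName;
    unsigned      SemanticIndex;
    unsigned char StartComponent;
    unsigned char ComponentCount;
    unsigned char OutputSlot;
};

// Bytes written per vertex: each selected component is assumed to be one
// 32-bit float, and all entries are assumed to target the same output slot.
inline unsigned strideInBytes(const SoEntry* decl, std::size_t count) {
    assert(decl != nullptr || count == 0);
    unsigned bytes = 0;
    for (std::size_t i = 0; i < count; ++i) {
        bytes += decl[i].ComponentCount * static_cast<unsigned>(sizeof(float));
    }
    return bytes;
}

// The same three entries as in the declaration above.
static const SoEntry kDecl[] = {
    { 0, "COLOUR",                 0, 0, 3, 0 },
    { 0, "DOMAIN_SHADER_LOCATION", 0, 0, 3, 0 },
    { 0, "WORLD_POSITION",         0, 0, 3, 0 },
};
static const std::size_t kDeclCount = sizeof(kDecl) / sizeof(kDecl[0]);
```

Calling strideInBytes(kDecl, kDeclCount) yields the size of one streamed vertex, whereas sizeof(kDecl) is the size of the description itself and depends on pointer size and padding.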


Now you need to create a buffer to get the results. It is created in much the same way as you create vertex and index buffers. We need two buffers - one available for writing from the GPU, the second - available for reading from the CPU.

    D3D11_BUFFER_DESC soDesc;

    soDesc.BindFlags           = D3D11_BIND_STREAM_OUTPUT;
    soDesc.ByteWidth           = 10 * 1024 * 1024; // 10 MB
    soDesc.CPUAccessFlags      = 0;
    soDesc.Usage               = D3D11_USAGE_DEFAULT;
    soDesc.MiscFlags           = 0;
    soDesc.StructureByteStride = 0;

    if( FAILED( hr = g_pd3dDevice->CreateBuffer( &soDesc, NULL, &g_pStreamOutBuffer ) ) )
    {
        /* handle the error here */
        return hr;
    }

    // Simply re-use the above struct
    soDesc.BindFlags      = 0;
    soDesc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
    soDesc.Usage          = D3D11_USAGE_STAGING;

    if( FAILED( hr = g_pd3dDevice->CreateBuffer( &soDesc, NULL, &g_pStagingStreamOutBuffer ) ) )
    {
        /* handle the error here */
        return hr;
    }


You cannot call the Map () method on a buffer created with the D3D11_USAGE_DEFAULT flag and you cannot bind the buffer with the D3D11_CPU_ACCESS_READ flag to the Stream Out stage of the pipeline, so you create one buffer of each type and copy data from one to another.

Now bind the buffer to the Stream Output stage and, once drawing is done, read the results back:

    UINT offset = 0;
    g_pContext->SOSetTargets( 1, &g_pStreamOutBuffer, &offset );

    // ... (draw calls go here) ...

    // Copy from the GPU-writable buffer into the CPU-readable staging buffer
    g_pContext->CopyResource( g_pStagingStreamOutBuffer, g_pStreamOutBuffer );

    D3D11_MAPPED_SUBRESOURCE data;
    if( SUCCEEDED( g_pContext->Map( g_pStagingStreamOutBuffer, 0, D3D11_MAP_READ, 0, &data ) ) )
    {
        // Must match the stream-out declaration field for field
        struct GS_OUTPUT
        {
            D3DXVECTOR3 COLOUR;
            D3DXVECTOR3 DOMAIN_SHADER_LOCATION;
            D3DXVECTOR3 WORLD_POSITION;
        };

        GS_OUTPUT *pRaw = reinterpret_cast< GS_OUTPUT* >( data.pData );

        /* Work with the pRaw[] array here */
        // Consider StringCchPrintf() and OutputDebugString() as simple ways of printing the above struct, or use the debugger and step through.

        g_pContext->Unmap( g_pStagingStreamOutBuffer, 0 );
    }


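The buffer must be bound with SOSetTargets() before the draw call, while the copy and the Map() call must be issued after it. Be careful with the structure you cast the buffer contents to: its layout, including alignment and padding, must match the data written by the Stream Output stage.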
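One way to guard against layout mismatches is to check the structure size at compile time and copy the records out of the mapped memory before unmapping. This is a minimal sketch under stated assumptions: Float3 is a hypothetical stand-in for D3DXVECTOR3 so that the snippet builds without the DirectX headers, and GsOutput and decodeStreamOut are illustrative names, not part of any API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for D3DXVECTOR3: three tightly packed floats.
struct Float3 { float x, y, z; };

// CPU-side mirror of the three float3 fields selected for stream output.
struct GsOutput {
    Float3 colour;
    Float3 domainLocation;
    Float3 worldPosition;
};

// The streamed data is tightly packed, so the mirror must not contain padding.
static_assert(sizeof(GsOutput) == 9 * sizeof(float),
              "GsOutput must match the stream-out stride");

// Copy vertexCount records out of the mapped memory. Using memcpy avoids
// depending on the alignment of the raw pointer, and the returned vector
// stays valid after the buffer has been unmapped.
inline std::vector<GsOutput> decodeStreamOut(const void* mapped, std::size_t vertexCount) {
    assert(mapped != nullptr || vertexCount == 0);
    std::vector<GsOutput> out(vertexCount);
    if (vertexCount > 0) {
        std::memcpy(out.data(), mapped, vertexCount * sizeof(GsOutput));
    }
    return out;
}
```

In the Map() block this would be called with data.pData and the number of vertices actually written; the copy can then be logged or inspected in the debugger at leisure.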

How much data is received? We can write code using the D3D11_QUERY_PIPELINE_STATISTICS query to find out.

    // When initializing/loading
    D3D11_QUERY_DESC queryDesc;
    queryDesc.Query     = D3D11_QUERY_PIPELINE_STATISTICS;
    queryDesc.MiscFlags = 0;

    if( FAILED( hr = g_pd3dDevice->CreateQuery( &queryDesc, &g_pDeviceStats ) ) )
    {
        return hr;
    }

    // When rendering
    g_pContext->Begin( g_pDeviceStats );
    g_pContext->DrawIndexed( 3, 0, 0 ); // one triangle only
    g_pContext->End( g_pDeviceStats );

    // Spin until the GPU has finished and the query result is available
    D3D11_QUERY_DATA_PIPELINE_STATISTICS stats;
    while( S_OK != g_pContext->GetData( g_pDeviceStats, &stats, g_pDeviceStats->GetDataSize(), 0 ) );
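The statistics can then be turned into the number of records that are safe to read from the staging buffer. The sketch below is hedged: streamedVertexCount is a hypothetical helper, and it assumes that every primitive emitted by the geometry shader (the GSPrimitives field of the statistics) is written as a full set of vertices, three for a triangle, and that anything beyond the buffer capacity is simply not written.

```cpp
#include <algorithm>
#include <cstdint>

// Number of complete vertex records to read back.
//   gsPrimitives         - primitives emitted by the geometry shader
//   verticesPerPrimitive - 3 for triangles (assumed: strips are stored as lists)
//   bufferBytes          - size of the stream-out buffer in bytes
//   strideBytes          - size of one streamed vertex in bytes
inline std::uint64_t streamedVertexCount(std::uint64_t gsPrimitives,
                                         unsigned verticesPerPrimitive,
                                         std::uint64_t bufferBytes,
                                         unsigned strideBytes) {
    if (strideBytes == 0) {
        return 0; // nothing can be decoded without a valid stride
    }
    const std::uint64_t produced = gsPrimitives * verticesPerPrimitive;
    const std::uint64_t capacity = bufferBytes / strideBytes;
    // Never read past the end of the buffer, whatever the statistics report.
    return std::min(produced, capacity);
}
```

For the single triangle drawn above, streamedVertexCount(stats.GSPrimitives, 3, 10 * 1024 * 1024, 36) would give 3, provided the geometry shader emitted exactly one triangle.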


Any restrictions?


Unfortunately yes.

Source: https://habr.com/ru/post/234707/

