
Fundamentals of graphics programming on Apple Metal: Getting started

Hi, Habr! This post is a beginner's guide to graphics programming with the Apple Metal API. When I started digging into the topic, it turned out that, apart from Apple's documentation and sample code, there is really nothing else to read. Today I will show how to create a simple Metal application that displays a lit three-dimensional cube. Then we will draw several cubes using one of Metal's key features: filling command buffers from multiple threads. Details are under the cut.


Demo application


To run the demo, we need a Mac, Xcode 6, and a device with an A7 processor (starting with the 2013 iPad Air and the iPhone 5S). Unfortunately, Metal applications cannot be run in the simulator. The device requirement also implies a valid iOS Developer Program membership. I realize these are not trivial requirements for the merely curious, and of course I am not urging anyone to buy any of the above. However, if the stars have aligned and you have everything you need, I will be glad to see forks of my repository and your own experiments with Metal.
In addition, while reading this guide I strongly recommend following along in the demo code; it will greatly improve your understanding of what is going on.

Introduction


I am not a fan of pasting official documentation into posts, so let's talk about what Metal is in simple terms. Apple has said a lot about why Metal is cooler than OpenGL ES (there has been a bit of that on Habr as well). Out of all of it, I would single out just 2 key advantages:
  1. Metal significantly reduces the amount of runtime validation of commands submitted to the GPU by moving that validation to application load time or compile time. This is where cached state objects come from. The idea, frankly, is not new; we saw state objects back in Direct3D 10. Thus, in the Metal API you can prepare and cache almost any state of the graphics pipeline in advance.
  2. Command buffers can be computed and filled in parallel. The idea is to hand the process of filling the GPU command queue over to the application developer, since nobody knows better than the developer how their scene is rendered and what can and cannot be done in parallel. At the same time, when working with the Metal API from several threads, you should not be afraid of getting bogged down in thread synchronization; the API is designed to make the developer's life as easy as possible (or at least not cause an instant panic attack).

To get started with Metal, you can create a new "Game" project in Xcode 6, select Metal as the rendering technology in the project creation wizard and... that's it. Xcode will generate a template project that draws a cube. That is how I started creating my demo, though the standard template project did not suit me as-is.

Step 1. Drawing a cube with lighting.


The result of this step will be an application that displays a single-colored cube lit with the Blinn model. The application will also have an arcball camera that lets us rotate around the object with a swipe gesture and zoom in/out with a pinch gesture.
In the standard Apple template, all the application logic is concentrated in a custom ViewController. I split it into 2 classes: RenderView and RenderViewController. The first is a subclass of UIView and is responsible for initializing Metal and wiring it up to Core Animation. The second contains the graphics demo itself plus a certain amount of infrastructure code for handling backgrounding/foregrounding of the application and user input. It would be cleaner to create a RenderModel class and move the demo logic there; perhaps we will do that once the program grows more complex.
It is worth mentioning which language we will use. I chose Objective-C++, which let me bring classes written in pure C++ into the project. It is also possible to use Swift (a good English-language article about that can be found here).

RenderView implementation

It is unlikely that anyone will be surprised to learn that Metal is tightly coupled with Core Animation, the system that drives graphics and animation in iOS. Apple has prepared a special layer, CAMetalLayer, for embedding Metal in iOS apps. This is the layer our RenderView will use. RenderView is initialized as follows:

+ (Class)layerClass
{
    return [CAMetalLayer class];
}

- (void)initCommon
{
    self.opaque = YES;
    self.backgroundColor = nil;

    _metalLayer = (CAMetalLayer *)self.layer;
    _device = MTLCreateSystemDefaultDevice();
    _metalLayer.device = _device;
    _metalLayer.pixelFormat = MTLPixelFormatBGRA8Unorm;
    _metalLayer.framebufferOnly = YES;

    _sampleCount = 1;
    _depthPixelFormat = MTLPixelFormatDepth32Float;
    _stencilPixelFormat = MTLPixelFormatInvalid;
}

In this code it is easy to spot what Metal has in common with other graphics APIs: we create the root API object (MTLDevice in this case), choose the formats of the back buffer and the depth buffer, and choose the number of samples for multisampling. The back buffer and depth buffer textures themselves are created on demand. This is a consequence of how Metal and Core Animation work together. When Core Animation allows drawing to the device's screen, it returns a non-nil CAMetalDrawable associated with the screen. If the user sends the application to the background, we must take care to stop all rendering, because in that case the CAMetalDrawable for the application will be nil (hello, Direct3D 9 and D3DERR_DEVICELOST). In addition, when the device rotates from portrait to landscape and back, the back buffer, depth buffer and stencil textures have to be reinitialized.
An MTLRenderPassDescriptor object is rebuilt every frame. This object links the back buffer texture obtained from the current CAMetalDrawable to the desired rendering parameters. It also specifies actions to be performed before and after rendering. For example, MTLStoreActionMultisampleResolve says that after rendering into a multisampled texture it must be resolved into a regular one, while MTLLoadActionClear clears the back buffer / depth buffer / stencil buffer before drawing a new frame.
The code for creating and re-initializing the back buffer, depth buffer and stencil buffer can be found under the cut.

Code to create and re-initialize textures
- (void)setupRenderPassDescriptorForTexture:(id <MTLTexture>)texture
{
    if (_renderPassDescriptor == nil)
        _renderPassDescriptor = [MTLRenderPassDescriptor renderPassDescriptor];

    // init/update default render target
    MTLRenderPassColorAttachmentDescriptor* colorAttachment = _renderPassDescriptor.colorAttachments[0];
    colorAttachment.texture = texture;
    colorAttachment.loadAction = MTLLoadActionClear;
    colorAttachment.clearColor = MTLClearColorMake(0.0f, 0.0f, 0.0f, 1.0f);
    if (_sampleCount > 1)
    {
        BOOL doUpdate = (_msaaTexture.width != texture.width) ||
                        (_msaaTexture.height != texture.height) ||
                        (_msaaTexture.sampleCount != _sampleCount);
        if (!_msaaTexture || (_msaaTexture && doUpdate))
        {
            MTLTextureDescriptor* desc = [MTLTextureDescriptor texture2DDescriptorWithPixelFormat: MTLPixelFormatBGRA8Unorm
                                                                                            width: texture.width
                                                                                           height: texture.height
                                                                                        mipmapped: NO];
            desc.textureType = MTLTextureType2DMultisample;
            desc.sampleCount = _sampleCount;

            _msaaTexture = [_device newTextureWithDescriptor: desc];
            _msaaTexture.label = @"Default MSAA render target";
        }
        colorAttachment.texture = _msaaTexture;
        colorAttachment.resolveTexture = texture;
        colorAttachment.storeAction = MTLStoreActionMultisampleResolve;
    }
    else
    {
        colorAttachment.storeAction = MTLStoreActionStore;
    }

    // init/update default depth buffer
    if (_depthPixelFormat != MTLPixelFormatInvalid)
    {
        BOOL doUpdate = (_depthTexture.width != texture.width) ||
                        (_depthTexture.height != texture.height) ||
                        (_depthTexture.sampleCount != _sampleCount);
        if (!_depthTexture || doUpdate)
        {
            MTLTextureDescriptor* desc = [MTLTextureDescriptor texture2DDescriptorWithPixelFormat: _depthPixelFormat
                                                                                            width: texture.width
                                                                                           height: texture.height
                                                                                        mipmapped: NO];
            desc.textureType = (_sampleCount > 1) ? MTLTextureType2DMultisample : MTLTextureType2D;
            desc.sampleCount = _sampleCount;

            _depthTexture = [_device newTextureWithDescriptor: desc];
            _depthTexture.label = @"Default depth buffer";

            MTLRenderPassDepthAttachmentDescriptor* depthAttachment = _renderPassDescriptor.depthAttachment;
            depthAttachment.texture = _depthTexture;
            depthAttachment.loadAction = MTLLoadActionClear;
            depthAttachment.storeAction = MTLStoreActionDontCare;
            depthAttachment.clearDepth = 1.0;
        }
    }

    // init/update default stencil buffer
    if (_stencilPixelFormat != MTLPixelFormatInvalid)
    {
        BOOL doUpdate = (_stencilTexture.width != texture.width) ||
                        (_stencilTexture.height != texture.height) ||
                        (_stencilTexture.sampleCount != _sampleCount);
        if (!_stencilTexture || doUpdate)
        {
            MTLTextureDescriptor* desc = [MTLTextureDescriptor texture2DDescriptorWithPixelFormat: _stencilPixelFormat
                                                                                            width: texture.width
                                                                                           height: texture.height
                                                                                        mipmapped: NO];
            desc.textureType = (_sampleCount > 1) ? MTLTextureType2DMultisample : MTLTextureType2D;
            desc.sampleCount = _sampleCount;

            _stencilTexture = [_device newTextureWithDescriptor: desc];
            _stencilTexture.label = @"Default stencil buffer";

            MTLRenderPassStencilAttachmentDescriptor* stencilAttachment = _renderPassDescriptor.stencilAttachment;
            stencilAttachment.texture = _stencilTexture;
            stencilAttachment.loadAction = MTLLoadActionClear;
            stencilAttachment.storeAction = MTLStoreActionDontCare;
            stencilAttachment.clearStencil = 0;
        }
    }
}

The render method of the RenderView class is called every frame from the RenderViewController.
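For reference, here is a minimal sketch of how RenderView might hand out its drawable and render pass descriptor on demand. This is not the exact repository code; the caching ivar and the nil handling are assumptions based on the description above.

// a sketch, not the exact repository code: lazily obtain the drawable and
// build the render pass descriptor for its texture
- (id <CAMetalDrawable>)currentDrawable
{
    if (_currentDrawable == nil)
        _currentDrawable = [_metalLayer nextDrawable]; // may be nil while the app is in the background
    return _currentDrawable;
}

- (MTLRenderPassDescriptor *)renderPassDescriptor
{
    id <CAMetalDrawable> drawable = self.currentDrawable;
    if (drawable == nil)
        return nil; // nothing to render into, skip the frame

    [self setupRenderPassDescriptorForTexture: drawable.texture];
    return _renderPassDescriptor;
}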

RenderViewController implementation

Let's start the description of this class with the infrastructure part. To call the RenderView's frame rendering method, we need a timer, a CADisplayLink object, which we initialize as follows:

- (void)startTimer
{
    _timer = [CADisplayLink displayLinkWithTarget:self selector:@selector(_renderloop)];
    [_timer addToRunLoop:[NSRunLoop mainRunLoop] forMode:NSDefaultRunLoopMode];
}

It is important to note that we stop the timer when the application goes to the background and resume it when it returns to the foreground. To do this, I forward the applicationDidEnterBackground and applicationWillEnterForeground calls from the AppDelegate to the RenderViewController. This ensures that our application does not try to render anything while it is in the background and therefore does not crash.
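A minimal sketch of this forwarding, assuming the controller's methods are named after their AppDelegate counterparts:

// called from the AppDelegate when the app goes to the background
- (void)applicationDidEnterBackground
{
    [_timer invalidate];
    _timer = nil;
}

// called from the AppDelegate when the app comes back to the foreground
- (void)applicationWillEnterForeground
{
    [self startTimer];
}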
In addition, we initialize a special semaphore (dispatch_semaphore_t _inflightSemaphore). It lets us avoid the so-called GPU-bound situation, where the CPU sits and waits for the graphics processor to finish the next frame. We will allow the CPU to prepare several frames in advance (up to 3 in our case) to minimize idle time while waiting for the GPU. The semaphore technique is discussed below.
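Creating the semaphore itself is a one-liner; a sketch, assuming MAX_INFLIGHT_BUFFERS is defined as 3:

// allow the CPU to run at most MAX_INFLIGHT_BUFFERS (3) frames ahead of the GPU
_inflightSemaphore = dispatch_semaphore_create(MAX_INFLIGHT_BUFFERS);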
We intercept user input by implementing the touchesBegan, touchesMoved and touchesEnded methods. Movements of one or several fingers across the screen are passed to the ArcballCamera class, which converts them into rotations and movements of the camera.
The input handling code is under the cut.

Reaction to user input
- (void)touchesBegan:(NSSet *)touches withEvent:(UIEvent *)event
{
    NSArray* touchesArray = [touches allObjects];
    if (touches.count == 1)
    {
        if (!camera.isRotatingNow())
        {
            CGPoint pos = [touchesArray[0] locationInView: self.view];
            camera.startRotation(pos.x, pos.y);
        }
        else
        {
            // here we put second finger
            simd::float2 lastPos = camera.getLastFingerPosition();
            camera.stopRotation();
            CGPoint pos = [touchesArray[0] locationInView: self.view];
            float d = vector_distance(simd::float2 { (float)pos.x, (float)pos.y }, lastPos);
            camera.startZooming(d);
        }
    }
    else if (touches.count == 2)
    {
        CGPoint pos1 = [touchesArray[0] locationInView: self.view];
        CGPoint pos2 = [touchesArray[1] locationInView: self.view];
        float d = vector_distance(simd::float2 { (float)pos1.x, (float)pos1.y },
                                  simd::float2 { (float)pos2.x, (float)pos2.y });
        camera.startZooming(d);
    }
}

- (void)touchesMoved:(NSSet *)touches withEvent:(UIEvent *)event
{
    NSArray* touchesArray = [touches allObjects];
    if (touches.count != 0 && camera.isRotatingNow())
    {
        CGPoint pos = [touchesArray[0] locationInView: self.view];
        camera.updateRotation(pos.x, pos.y);
    }
    else if (touches.count == 2 && camera.isZoomingNow())
    {
        CGPoint pos1 = [touchesArray[0] locationInView: self.view];
        CGPoint pos2 = [touchesArray[1] locationInView: self.view];
        float d = vector_distance(simd::float2 { (float)pos1.x, (float)pos1.y },
                                  simd::float2 { (float)pos2.x, (float)pos2.y });
        camera.updateZooming(d);
    }
}

- (void)touchesEnded:(NSSet *)touches withEvent:(UIEvent *)event
{
    camera.stopRotation();
    camera.stopZooming();
}

You can read about the theory behind arcball camera implementations here.
Finally, we turn to the logic of the graphical application itself, which is contained in 5 main methods:

 - (void)configure:(RenderView*)renderView 

Here we configure the view by specifying, for example, the number of samples for multisampling, the formats of the back buffer, the depth buffer, and the stencil.
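As an illustration, a possible implementation matching the RenderView defaults shown earlier; this is a sketch, and the property names mirror the ivars from initCommon rather than the exact repository code:

- (void)configure:(RenderView*)renderView
{
    // values matching the RenderView defaults shown above
    renderView.sampleCount = 1;
    renderView.depthPixelFormat = MTLPixelFormatDepth32Float;
    renderView.stencilPixelFormat = MTLPixelFormatInvalid;
}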

 - (void)setupMetal:(id<MTLDevice>)device 

In this method we create the command queue, initialize resources, load shaders, and prepare state objects.

 - (void)update 

Here the frame is updated, the matrices and other parameters for shaders are calculated.

 - (void)render:(RenderView*)renderView 

Here, obviously, the frame itself is rendered.

 - (void)resize:(RenderView*)renderView 

This method is called when the screen size changes, for example when the device is rotated and width and height are swapped. This is a convenient place to recompute, for example, the projection matrix.
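For example, a sketch of recomputing the projection matrix on resize; the _projectionMatrix ivar and the matrix_from_perspective_fov_aspectLH helper stand in for whatever the project actually uses and are assumptions:

- (void)resize:(RenderView*)renderView
{
    // recompute the projection for the new aspect ratio (helper name is an assumption)
    float aspect = fabs(renderView.bounds.size.width / renderView.bounds.size.height);
    _projectionMatrix = matrix_from_perspective_fov_aspectLH(65.0f * (M_PI / 180.0f), aspect, 0.1f, 100.0f);
}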

What is special about initializing resources and state objects in Metal? For me, coming from the Direct3D 11 API, there was only one serious surprise. Since the CPU is allowed to submit up to 3 frames for rendering before synchronizing with the GPU, the buffer for shader constants has to be three times larger than usual. Each of the three in-flight frames works with its own chunk of the constant buffer, so the CPU never overwrites data the GPU is still reading. In practice it looks like this:

// copy the shader constants into the chunk of the buffer that belongs to the current frame
uint8_t* bufferPointer = (uint8_t*)[_dynamicUniformBuffer contents] + (sizeof(uniforms_t) * _currentUniformBufferIndex);
memcpy(bufferPointer, &_uniform_buffer, sizeof(uniforms_t));

// bind the same chunk of the buffer to the vertex shader
[renderEncoder setVertexBuffer:_dynamicUniformBuffer
                        offset:(sizeof(uniforms_t) * _currentUniformBufferIndex)
                       atIndex:1 ];
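For clarity, here is how the constants used above might relate to each other. The exact definitions are an assumption, but the sizing follows from the triple buffering just described:

// up to 3 frames may be in flight at once, so the uniform buffer holds 3 copies of the constants
#define MAX_INFLIGHT_BUFFERS     3
#define MAX_UNIFORM_BUFFER_SIZE  (sizeof(uniforms_t) * MAX_INFLIGHT_BUFFERS)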

It is also worth mentioning the MTLRenderPipelineDescriptor and MTLRenderPipelineState classes, which define the description of the graphics pipeline state and the state object itself. This object includes references to the vertex and fragment shaders, the number of multisample samples, and the formats of the back buffer and depth buffer. Wait, it seems we have already specified all of this somewhere. Indeed we have. This state object is baked for very specific rendering parameters and cannot be used under other circumstances. By creating (and validating) such an object in advance, we relieve the graphics pipeline of checking parameter compatibility during rendering: the pipeline either accepts the whole state or rejects it completely.
The initialization code for Metal is shown below.

- (void)setupMetal:(id<MTLDevice>)device
{
    _commandQueue = [device newCommandQueue];
    _defaultLibrary = [device newDefaultLibrary];
    [self loadAssets: device];
}

- (void)loadAssets:(id<MTLDevice>)device
{
    _dynamicUniformBuffer = [device newBufferWithLength:MAX_UNIFORM_BUFFER_SIZE options:0];
    _dynamicUniformBuffer.label = @"Uniform buffer";

    id <MTLFunction> fragmentProgram = [_defaultLibrary newFunctionWithName:@"psLighting"];
    id <MTLFunction> vertexProgram = [_defaultLibrary newFunctionWithName:@"vsLighting"];

    _vertexBuffer = [device newBufferWithBytes:(Primitives::cube())
                                        length:(Primitives::cubeSizeInBytes())
                                       options:MTLResourceOptionCPUCacheModeDefault];
    _vertexBuffer.label = @"Cube vertex buffer";

    // pipeline state
    MTLRenderPipelineDescriptor *pipelineStateDescriptor = [[MTLRenderPipelineDescriptor alloc] init];
    pipelineStateDescriptor.label = @"Simple pipeline";
    [pipelineStateDescriptor setSampleCount: ((RenderView*)self.view).sampleCount];
    [pipelineStateDescriptor setVertexFunction:vertexProgram];
    [pipelineStateDescriptor setFragmentFunction:fragmentProgram];
    pipelineStateDescriptor.colorAttachments[0].pixelFormat = MTLPixelFormatBGRA8Unorm;
    pipelineStateDescriptor.depthAttachmentPixelFormat = MTLPixelFormatDepth32Float;

    NSError* error = NULL;
    _pipelineState = [device newRenderPipelineStateWithDescriptor:pipelineStateDescriptor error:&error];
    if (!_pipelineState)
    {
        NSLog(@"Failed to created pipeline state, error %@", error);
    }

    MTLDepthStencilDescriptor *depthStateDesc = [[MTLDepthStencilDescriptor alloc] init];
    depthStateDesc.depthCompareFunction = MTLCompareFunctionLess;
    depthStateDesc.depthWriteEnabled = YES;
    _depthState = [device newDepthStencilStateWithDescriptor:depthStateDesc];
}

Finally, let's look at the most interesting part, the frame rendering code.

- (void)render:(RenderView*)renderView
{
    dispatch_semaphore_wait(_inflightSemaphore, DISPATCH_TIME_FOREVER);

    [self update];

    MTLRenderPassDescriptor* renderPassDescriptor = renderView.renderPassDescriptor;
    id <CAMetalDrawable> drawable = renderView.currentDrawable;

    // new command buffer
    id <MTLCommandBuffer> commandBuffer = [_commandQueue commandBuffer];
    commandBuffer.label = @"Simple command buffer";

    // simple render encoder
    id <MTLRenderCommandEncoder> renderEncoder = [commandBuffer renderCommandEncoderWithDescriptor: renderPassDescriptor];
    renderEncoder.label = @"Simple render encoder";
    [renderEncoder setDepthStencilState:_depthState];
    [renderEncoder pushDebugGroup:@"Draw cube"];
    [renderEncoder setRenderPipelineState:_pipelineState];
    [renderEncoder setVertexBuffer:_vertexBuffer offset:0 atIndex:0 ];
    [renderEncoder setVertexBuffer:_dynamicUniformBuffer
                            offset:(sizeof(uniforms_t) * _currentUniformBufferIndex)
                           atIndex:1 ];
    [renderEncoder drawPrimitives:MTLPrimitiveTypeTriangle vertexStart:0 vertexCount:36 instanceCount:1];
    [renderEncoder popDebugGroup];
    [renderEncoder endEncoding];

    __block dispatch_semaphore_t block_sema = _inflightSemaphore;
    [commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> buffer)
    {
        dispatch_semaphore_signal(block_sema);
    }];

    _currentUniformBufferIndex = (_currentUniformBufferIndex + 1) % MAX_INFLIGHT_BUFFERS;

    [commandBuffer presentDrawable:drawable];
    [commandBuffer commit];
}

At the beginning of the method, dispatch_semaphore_wait is called; it blocks frame preparation on the CPU until the GPU finishes one of the frames currently in flight. As I said, in our demo the CPU is allowed to prepare up to 3 frames while the GPU is busy. The semaphore is released in the handler passed to the command buffer's addCompletedHandler method. The command buffer is designed as a lightweight (transient) object: it must be created every frame and cannot be reused.
Each frame, a so-called render command encoder (here an object of the MTLRenderCommandEncoder class) is created for a specific command buffer. It is created from the MTLRenderPassDescriptor object we discussed above. The encoder fills the buffer with commands of various kinds (setting states, binding vertex buffers, calling primitive drawing methods, i.e. everything familiar from other graphics APIs). When the command buffer has been filled, the commit method is called, which sends the buffer to the queue.
There is nothing unusual in the shader code, just an elementary implementation of Blinn lighting. For Metal, Apple engineers came up with their own shading language, which is not very different from HLSL, GLSL and Cg. Anyone who has written shaders in at least one of those languages will pick this one up easily; for everyone else I recommend Apple's language guide.

Shader Code
#include <metal_stdlib>
#include <simd/simd.h>

using namespace metal;

constant float3 lightDirection = float3(0.5, -0.7, -1.0);
constant float3 ambientColor = float3(0.18, 0.24, 0.8);
constant float3 diffuseColor = float3(0.4, 0.4, 1.0);
constant float3 specularColor = float3(0.3, 0.3, 0.3);
constant float specularPower = 30.0;

typedef struct
{
    float4x4 modelViewProjection;
    float4x4 model;
    float3 viewPosition;
} uniforms_t;

typedef struct
{
    packed_float3 position;
    packed_float3 normal;
    packed_float3 tangent;
} vertex_t;

typedef struct
{
    float4 position [[position]];
    float3 tangent;
    float3 normal;
    float3 viewDirection;
} ColorInOut;

// Vertex shader function
vertex ColorInOut vsLighting(device vertex_t* vertex_array [[ buffer(0) ]],
                             constant uniforms_t& uniforms [[ buffer(1) ]],
                             unsigned int vid [[ vertex_id ]])
{
    ColorInOut out;

    float4 in_position = float4(float3(vertex_array[vid].position), 1.0);
    out.position = uniforms.modelViewProjection * in_position;

    float4x4 m = uniforms.model;
    m[3][0] = m[3][1] = m[3][2] = 0.0f; // suppress translation component
    out.normal = (m * float4(normalize(vertex_array[vid].normal), 1.0)).xyz;
    out.tangent = (m * float4(normalize(vertex_array[vid].tangent), 1.0)).xyz;

    float3 worldPos = (uniforms.model * in_position).xyz;
    out.viewDirection = normalize(worldPos - uniforms.viewPosition);

    return out;
}

// Fragment shader function
fragment half4 psLighting(ColorInOut in [[stage_in]])
{
    float3 normalTS = float3(0, 0, 1);
    float3 lightDir = normalize(lightDirection);

    float3x3 ts = float3x3(in.tangent, cross(in.normal, in.tangent), in.normal);
    float3 normal = -normalize(ts * normalTS);

    float ndotl = fmax(0.0, dot(lightDir, normal));
    float3 diffuse = diffuseColor * ndotl;

    float3 h = normalize(in.viewDirection + lightDir);
    float3 specular = specularColor * pow(fmax(dot(normal, h), 0.0), specularPower);

    float3 finalColor = saturate(ambientColor + diffuse + specular);
    return half4(float4(finalColor, 1.0));
}

As a result, the following can be seen on the screen of our device.



This concludes the first step of the guide. The code for this step is available in the git repository under the tutorial_1_1 tag.

Step 2. Drawing several cubes.


To draw several cubes, we need to change our constant buffer. Previously it stored the parameters (world-view-projection matrix, world matrix and camera position) for only one object; now this data has to be stored for every object. Obviously the camera position only needs to be passed once, which calls for an additional constant buffer for parameters computed once per frame. However, I did not introduce a separate buffer for a single vector just yet; we will do that next time, when the number of such parameters grows. You can try doing it yourself now. Thus, for 5 cubes we will have 5 sets of parameters for each of the 3 frames that the CPU may manage to prepare before it synchronizes with the GPU.
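A sketch of how the update method might now fill the parameters for every cube. The per-cube model matrices, the view-projection matrix and the camera accessor below are assumptions, and _uniform_buffer is assumed to have become an array of CUBE_COUNTS uniforms_t structures:

// _uniform_buffer is assumed to be declared as: uniforms_t _uniform_buffer[CUBE_COUNTS];
for (int i = 0; i < CUBE_COUNTS; i++)
{
    _uniform_buffer[i].model = _cubeModelMatrices[i];                                      // assumed per-cube world matrix
    _uniform_buffer[i].modelViewProjection = matrix_multiply(viewProjectionMatrix,         // assumed view-projection matrix
                                                             _uniform_buffer[i].model);
    _uniform_buffer[i].viewPosition = camera.getCurrentViewPosition();                     // hypothetical accessor
}

// copy all per-cube parameters into the chunk of the dynamic buffer owned by the current frame
uint8_t* bufferPointer = (uint8_t*)[_dynamicUniformBuffer contents] +
                         sizeof(_uniform_buffer) * _currentUniformBufferIndex;
memcpy(bufferPointer, _uniform_buffer, sizeof(_uniform_buffer));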
We will change the rendering method as follows:

id <MTLRenderCommandEncoder> renderEncoder = [commandBuffer renderCommandEncoderWithDescriptor: renderPassDescriptor];
renderEncoder.label = @"Simple render encoder";
[renderEncoder setDepthStencilState:_depthState];
[renderEncoder pushDebugGroup:@"Draw cubes"];
[renderEncoder setRenderPipelineState:_pipelineState];
[renderEncoder setVertexBuffer:_vertexBuffer offset:0 atIndex:0 ];
for (int i = 0; i < CUBE_COUNTS; i++)
{
    [renderEncoder setVertexBuffer:_dynamicUniformBuffer
                            offset:(sizeof(_uniform_buffer) * _currentUniformBufferIndex + i * sizeof(uniforms_t))
                           atIndex:1 ];
    [renderEncoder drawPrimitives:MTLPrimitiveTypeTriangle vertexStart:0 vertexCount:36 instanceCount:1];
}
[renderEncoder popDebugGroup];
[renderEncoder endEncoding];


Pay attention to how the offset into the constant buffer is computed (sizeof(_uniform_buffer) * _currentUniformBufferIndex + i * sizeof(uniforms_t)). The _currentUniformBufferIndex variable selects the block belonging to the current frame, and the counter i selects where the data for a particular cube lives inside that block.
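In other words, the dynamic uniform buffer is laid out frame by frame, and within each frame cube by cube. Schematically (assuming 5 cubes and 3 in-flight frames):

// |            frame 0            |            frame 1            |            frame 2            |
// | cube0 cube1 cube2 cube3 cube4 | cube0 cube1 cube2 cube3 cube4 | cube0 cube1 cube2 cube3 cube4 |
//
// offset = sizeof(_uniform_buffer) * _currentUniformBufferIndex   // start of the current frame's block
//        + i * sizeof(uniforms_t)                                 // position of cube i inside that block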
As a result, we get something like this.



The code for this step is available in the git repository under the tutorial_1_2 tag.

Step 3. Drawing several cubes from multiple threads.


Drawing cubes from a single thread is something we could just as well do in OpenGL ES; now let's add multithreaded command buffer filling to the demo. Half of the cubes will be encoded from one thread and the other half from another. The example is, of course, purely academic; we will not get any performance gain from it.
For multithreaded command buffer filling, the Metal API provides a special class, MTLParallelRenderCommandEncoder. It lets you create any number of MTLRenderCommandEncoder objects, which we already know from the previous steps. Each of these objects lets you encode commands into the buffer from a separate thread.
Using dispatch_async, we will encode half of the cubes on a separate thread, while the other half is encoded on the main thread. The resulting code looks like this:

- (void)render:(RenderView*)renderView
{
    dispatch_semaphore_wait(_inflightSemaphore, DISPATCH_TIME_FOREVER);

    [self update];

    MTLRenderPassDescriptor* renderPassDescriptor = renderView.renderPassDescriptor;
    id <CAMetalDrawable> drawable = renderView.currentDrawable;

    // new command buffer
    id <MTLCommandBuffer> commandBuffer = [_commandQueue commandBuffer];
    commandBuffer.label = @"Simple command buffer";

    // parallel render encoder
    id <MTLParallelRenderCommandEncoder> parallelRCE = [commandBuffer parallelRenderCommandEncoderWithDescriptor:renderPassDescriptor];
    parallelRCE.label = @"Parallel render encoder";
    id <MTLRenderCommandEncoder> rCE1 = [parallelRCE renderCommandEncoder];
    id <MTLRenderCommandEncoder> rCE2 = [parallelRCE renderCommandEncoder];

    dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^
    {
        @autoreleasepool
        {
            [self encodeRenderCommands: rCE2
                               Comment: @"Draw cubes in additional thread"
                            StartIndex: CUBE_COUNTS / 2
                              EndIndex: CUBE_COUNTS];
        }
        dispatch_semaphore_signal(_renderThreadSemaphore);
    });

    [self encodeRenderCommands: rCE1
                       Comment: @"Draw cubes"
                    StartIndex: 0
                      EndIndex: CUBE_COUNTS / 2];

    // wait additional thread and finish encoding
    dispatch_semaphore_wait(_renderThreadSemaphore, DISPATCH_TIME_FOREVER);
    [parallelRCE endEncoding];

    __block dispatch_semaphore_t block_sema = _inflightSemaphore;
    [commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> buffer)
    {
        dispatch_semaphore_signal(block_sema);
    }];

    _currentUniformBufferIndex = (_currentUniformBufferIndex + 1) % MAX_INFLIGHT_BUFFERS;

    [commandBuffer presentDrawable:drawable];
    [commandBuffer commit];
}

- (void)encodeRenderCommands:(id <MTLRenderCommandEncoder>)renderEncoder
                     Comment:(NSString*)comment
                  StartIndex:(int)startIndex
                    EndIndex:(int)endIndex
{
    [renderEncoder setDepthStencilState:_depthState];
    [renderEncoder pushDebugGroup:comment];
    [renderEncoder setRenderPipelineState:_pipelineState];
    [renderEncoder setVertexBuffer:_vertexBuffer offset:0 atIndex:0 ];
    for (int i = startIndex; i < endIndex; i++)
    {
        [renderEncoder setVertexBuffer:_dynamicUniformBuffer
                                offset:(sizeof(_uniform_buffer) * _currentUniformBufferIndex + i * sizeof(uniforms_t))
                               atIndex:1 ];
        [renderEncoder drawPrimitives:MTLPrimitiveTypeTriangle vertexStart:0 vertexCount:36 instanceCount:1];
    }
    [renderEncoder popDebugGroup];
    [renderEncoder endEncoding];
}

To synchronize the main and the additional thread I use the _renderThreadSemaphore semaphore, which joins the two threads just before endEncoding is called on the MTLParallelRenderCommandEncoder object. MTLParallelRenderCommandEncoder requires that its endEncoding method be called strictly after endEncoding has been called on every MTLRenderCommandEncoder it created.
If everything was done correctly, then the result will be the same on the device screen as in the previous step.

The code for this step is available in the git repository under the tutorial_1_3 tag.

Conclusion


Today we took the very first steps in graphics programming with the Apple Metal API. If the community finds this topic and format interesting, we will continue. In the next installment I plan to draw a more interesting model, use an index buffer, and texture it. As the "trick" of the lesson there will be something along the lines of instancing. I look forward to your feedback, thank you for your attention.

Source: https://habr.com/ru/post/248785/

