
How we sped up the KOMPAS-3D CAD system → Part 2

In the previous part, we talked about how KOMPAS-3D v18 came about, how we selected the criteria and models for testing the new functionality, and touched on rendering in the “Basic” variant.
Let's continue with the “Superior” rendering variant.


Draw calls

Alexander Tulup, programmer:
“The main performance problem when displaying large scenes is the large number of so-called “draw calls”. The old renderer was built on top of the mathematical data model, so a separate display method was called for each primitive: points, edges, faces.

For each draw call, OpenGL (the driver) performs a series of checks and translates the incoming commands into a format the video card understands. The calls are then placed in a queue, from which they are sent for execution.

GPU command transfer scheme in OpenGL (source)

With a large number of parts, the number of calls grows so much that the CPU simply cannot feed data to the video card fast enough. As a result, a very powerful video card “lags” just as much as a mid-range or weak one.

You can fight this by reducing the number of draw calls and state changes: group objects by material, combine identical geometry (instancing), and so on.
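As a rough illustration of the grouping idea (this is not KOMPAS-3D code), the sketch below counts how many draw calls a scene needs once parts that share a mesh and a material are merged into instanced batches. The `Part` type, its fields, and the `build_batches` helper are hypothetical names introduced for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    # Hypothetical scene item: the mesh and material it uses and its position.
    mesh_id: str
    material_id: str
    position: tuple


def build_batches(parts):
    """Group parts that share a mesh and a material into one instanced batch."""
    batches = defaultdict(list)
    for part in parts:
        batches[(part.material_id, part.mesh_id)].append(part.position)
    # Sorting by material first keeps state changes between batches low.
    return sorted(batches.items())


scene = [
    Part("bolt", "steel", (0, 0, 0)),
    Part("bolt", "steel", (1, 0, 0)),
    Part("washer", "steel", (0, 0, 1)),
    Part("bolt", "steel", (2, 0, 0)),
    Part("cover", "plastic", (0, 1, 0)),
]
batches = build_batches(scene)
print(f"naive draw calls: {len(scene)}, batched draw calls: {len(batches)}")
```

Each batch would then be issued as a single instanced draw, with the list of positions supplied as per-instance data.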

We should also remember that only part of the scene is visible at any moment, so algorithms for detecting invisible objects (frustum culling, occlusion culling, and so on) are applicable here.
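A minimal sketch of frustum culling with bounding spheres is shown below. It assumes each object has a bounding sphere and that the view volume is described by planes with inward-pointing unit normals. For simplicity the example uses a box-shaped volume instead of a real perspective frustum, and all names and numbers are illustrative.

```python
def sphere_visible(center, radius, planes):
    """Return False if the bounding sphere lies fully outside any plane."""
    for (nx, ny, nz), d in planes:
        # Signed distance from the sphere center to the plane.
        distance = nx * center[0] + ny * center[1] + nz * center[2] + d
        if distance < -radius:
            return False
    return True


# Simplified view volume: x and y in [-10, 10], z in [-100, -1]
# (camera at the origin looking down the negative Z axis).
planes = [
    ((1, 0, 0), 10),    # x >= -10
    ((-1, 0, 0), 10),   # x <= 10
    ((0, 1, 0), 10),    # y >= -10
    ((0, -1, 0), 10),   # y <= 10
    ((0, 0, -1), -1),   # z <= -1
    ((0, 0, 1), 100),   # z >= -100
]

objects = [
    ("near bolt", (0, 0, -5), 1),
    ("behind camera", (0, 0, 5), 1),
    ("far left", (-20, 0, -5), 2),
    ("straddling edge", (-11, 0, -5), 2),
]
visible = [name for name, center, radius in objects
           if sphere_visible(center, radius, planes)]
print(visible)
```

The test is conservative: an object that only partly intersects the volume is kept, so nothing visible is dropped.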

Inspired by The Road to One Million Draws and AZDO, we decided to take a rather unusual path: to eliminate state changes on the CPU side as far as possible. Now almost everything is done on the video card. All the necessary attributes are fetched directly from video memory by the shader during drawing, which became possible thanks to larger amounts of video memory (VRAM) and the advent of SSBOs.


1,000,000 cubes

The main advantage of this approach is that display speed has become really high. It is limited only by the capabilities of the GPU, namely by the amount of data it can process.

It also made it possible to implement culling of invisible objects efficiently. The results of the visibility check are written directly to video memory, and the drawing commands are generated there from those results, so nothing has to wait on the CPU side.
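The step from visibility results to draw commands can be pictured with the CPU-side sketch below. It is only an illustration in Python: in a GPU-driven renderer an equivalent loop would typically run in a compute shader and write into a buffer consumed by an indirect draw call. The command fields mirror OpenGL's `DrawElementsIndirectCommand` structure; the mesh tuples and the `build_commands` helper are hypothetical.

```python
from typing import NamedTuple


class DrawCommand(NamedTuple):
    # Same fields as OpenGL's DrawElementsIndirectCommand structure.
    count: int
    instance_count: int
    first_index: int
    base_vertex: int
    base_instance: int


def build_commands(meshes, visible):
    """Emit one indirect draw command per object that passed the visibility test."""
    commands = []
    for object_id, (index_count, first_index, base_vertex) in enumerate(meshes):
        if visible[object_id]:
            # base_instance carries the object id so the shader can look up
            # per-object attributes (for example, in an SSBO).
            commands.append(
                DrawCommand(index_count, 1, first_index, base_vertex, object_id)
            )
    return commands


# Hypothetical meshes: (index count, first index, base vertex).
meshes = [(36, 0, 0), (36, 36, 24), (60, 72, 48)]
visible = [True, False, True]  # result of the visibility pass
commands = build_commands(meshes, visible)
print(len(commands), "of", len(meshes), "objects will be drawn")
```

Because both the visibility flags and the command list stay in video memory in the real scheme, the CPU never has to read the results back.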

One of the main disadvantages of this approach is the high development complexity. Much had to be implemented from scratch to fit the chosen approach. In addition, we often ran into situations where the same shader code behaved differently, or did not work at all, on video cards from different manufacturers. Often this was “cured” by a driver update, but sometimes the code had to be rewritten after lengthy debugging.

Naturally, the video card requirements have also increased. OpenGL 4.5 support is the key requirement, but not the only one.
Below are the rendering speed results while rotating an assembly. Recall that 24 frames per second (fps) is considered comfortable for the human eye.
All measurements here and below were performed on a PC with the following configuration:
CPU: Intel Core i7-6700K 4.00 GHz
RAM: 32 GB
GPU: NVIDIA Quadro P2000
OS: Microsoft Windows 10 x64 Professional
Table 1. Frame rate (frames per second, fps) on various models. More is better. Display mode: shaded with wireframe; simplified mode off; anti-aliasing: medium (MSAA 8x)

| Model | Number of components | V16.1, fps | v17.1, fps | v18, fps |
|---|---|---|---|---|
| Mosaic grinding machine | 2764 | 4.1 | 4.7 | 124.9 |
| PGU-410 | 108337 | 0.3 | 0.4 | 28.6 |
| Wagon tipper | 17342 | 1.1 | 1.4 | 124.7 |
| Trolleybus | 9783 | 1.9 | 2.4 | 124.9 |
| Northern tidal power station | 48445 | 0.3 | 0.5 | 76.1 |
| Vacuum technology installation | 7189 | 1.9 | 2.3 | 124.9 |
| Ship power plant reducer | 6414 | 2.6 | 3.6 | 123.9 |


Adding components to a large assembly


The scenario of adding components to a large assembly eventually evolved into the so-called comprehensive test, described in Table 2.

Table 2. Scenario of adding components to a large assembly. Test criteria

| Criterion | Description |
|---|---|
| File opening speed | Components to be added to the assembly must be loaded from disk. |
| Rendering speed | The assembly and the inserted component must be positioned, which requires rotating, moving, and zooming the view. |
| Object selection speed | Creating mates requires selecting base objects: faces, planes, edges, and so on. |
| Speed of synchronization with the construction tree | The component added to the assembly and its mates must appear in the construction tree. |
| Speed of synchronization with the specification module | The component added to the assembly must be accounted for in the specification. |

The table contains the items (rendering, opening) that were singled out as separate areas for acceleration from the very beginning. But other components needed improvement as well.

Synchronization with the tree took up a significant amount of time. We solved this by implementing partial updates of the tree.

Another difficulty was the significant effect of the specification on KOMPAS-3D performance. In some comprehensive test scenarios it accounted for the largest share of the time (50% or more).
Specification
The specification is a KOMPAS-3D module responsible for generating the design document of the same name. It is developed by a separate team.

In particular, the team accelerated synchronization when inserting by reworking the internal mechanisms of the specification module.


Some results


Adding components to the "Ship Power Plant Reducer" assembly.


Comprehensive test on the "Ship Power Plant Reducer" assembly.
The numbers indicate: 1 - bracket, 2 - washer, 3 - bolt.

Table 3. Time to insert components into a large assembly, in seconds. Less is better.

| Component | Step | Action | V16.1 | v17.1 | v18 |
|---|---|---|---|---|---|
| Bracket insertion | | Loading | 2.0 | 3.0 | 2.2 |
| | | Switch to mate mode | 0.6 | 0.4 | 0.4 |
| | First mate | Select the first object | 0.4 | 1.0 | 0.2 |
| | | Select the second object | 0.5 | 1.1 | 0.2 |
| | | Select the desired mate | 3.8 | 3.6 | 1.0 |
| | Second mate | Select the first object | 0.5 | 1.4 | 0.5 |
| | | Select the second object | 0.5 | 1.4 | 0.2 |
| | | Select the desired mate | 3.6 | 3.0 | 1.2 |
| | Third mate | Select the first object | 0.5 | 0.5 | 0.5 |
| | | Select the second object | 0.3 | 1.1 | 0.3 |
| | | Select the desired mate | 3.7 | 3.2 | 1.1 |
| | | Confirm insert creation | 7.8 | 5.2 | 2.3 |
| | | Total for bracket insertion | 24.2 | 24.6 | 10.1 |
| Washer insertion from the standard parts library | | Select the first mate | 6.4 | 2.4 | 0.4 |
| | | Select the second mate | 4.2 | 3.1 | 0.4 |
| | | Confirm insert creation | 15.7 | 9.2 | 4.4 |
| | | Total for washer insertion | 26.3 | 14.7 | 5.2 |
| Bolt insertion | | Loading | 2.0 | 2.7 | 2.0 |
| | | Switch to mate mode | 0.5 | 0.5 | 0.5 |
| | First mate | Select the first object | 0.4 | 1.0 | 0.2 |
| | | Select the second object | 0.4 | 1.1 | 0.2 |
| | | Select the desired mate | 3.4 | 2.7 | 1.0 |
| | Second mate | Select the first object | 0.4 | 1.2 | 0.4 |
| | | Select the second object | 0.5 | 0.5 | 0.4 |
| | | Select the desired mate | 3.7 | 2.9 | 1.0 |
| | Third mate | Select the first object | 0.5 | 1.0 | 0.5 |
| | | Select the second object | 0.5 | 1.0 | 0.2 |
| | | Select the desired mate | 4.2 | 3.9 | 1.2 |
| | | Confirm insert creation | 32.5 | 5.4 | 2.2 |
| | | Total for bolt insertion | 49 | 21.2 | 9.8 |
| Total for inserting all three components | | | 99.5 | 60.5 | 25.1 |


The comprehensive test can be regarded as one of the common assembly editing scenarios.

Rebuilding of assemblies has also become faster. Now, when you edit an operation, the whole assembly is not rebuilt: only the changed objects are updated. To find the dependent operations, that is, those whose result could be affected by the modified operation, a special algorithm builds the links between operations, bodies, and inserts.
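The source does not describe the algorithm itself. As a minimal sketch of the general idea, assuming the links form a directed graph from each operation to the operations that consume its result, the set to rebuild can be found with a breadth-first traversal. The operation names and the `affected_operations` helper are hypothetical.

```python
from collections import deque


def affected_operations(dependents, changed):
    """Collect the changed operation and everything reachable from it."""
    affected = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for nxt in dependents.get(current, ()):
            if nxt not in affected:
                affected.add(nxt)
                queue.append(nxt)
    return affected


# Hypothetical dependency links: operation -> operations that use its result.
dependents = {
    "sketch_1": ["extrude_1"],
    "extrude_1": ["fillet_1", "hole_1"],
    "hole_1": ["pattern_1"],
    "sketch_2": ["extrude_2"],
}
to_rebuild = affected_operations(dependents, "extrude_1")
print(sorted(to_rebuild))
```

Everything outside the returned set (here `sketch_1`, `sketch_2`, and `extrude_2`) keeps its previous result and is not rebuilt.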

Opening assemblies


The main idea behind faster file reading is to have KOMPAS-3D read only what the user needs at the moment.

For example:


All this required reworking the data structure in the file so that it could be read in separate parts.

Anton Sidyakin, programmer, team lead:

“For some time now, a KOMPAS-3D file has been an archive that combines several service files. One of them contains the data of the model or assembly document, organized as a tree. The ability to navigate this structure already existed. For partial reading, the parts had to be made independent of each other: the resulting parts must not refer to one another, otherwise a part holding a reference would be “incomplete”.

As a result, for part documents we managed to separate the variants from the document and from each other. In assemblies, the container of inserts and mates is stored separately. Within the variants, we also managed to separate the source data for construction from the results, that is, the triangulation and bodies.



As for the simplified loading types, the assembly being edited is loaded completely, while for its inserts only the triangulation and, depending on the type, the boundary representation (B-rep) are loaded. Displaying inserts with modified external variables in this mode caused some difficulty: previously they were obtained on the fly by rebuilding during reading, and the simplified loading types have no data for that. The solution was to store the results of rebuilding such inserts in the assembly. This also gave a speed-up, because no rebuilding is needed.

This division of the document into parts made it possible to load into the assembly only the variants selected in the inserts.
Besides speeding up file opening, partial reading also helped reduce resource consumption, primarily RAM.
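To picture partial reading, the sketch below uses a ZIP archive purely as a stand-in; the source only says the file is an archive of several service files and does not name the format. The member names and the `read_parts` helper are hypothetical. The point is that independent parts can be read selectively, without touching the rest of the file.

```python
import io
import zipfile


def make_demo_archive():
    """Build an in-memory archive with independent, separately readable parts."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("model/source_data", b"sketches and operations")
        archive.writestr("model/triangulation", b"triangles")
        archive.writestr("model/bodies", b"b-rep bodies")
        archive.writestr("assembly/inserts_and_mates", b"inserts, mates")
    buffer.seek(0)
    return buffer


def read_parts(file_obj, wanted):
    """Read only the requested members; the others are never decompressed."""
    with zipfile.ZipFile(file_obj) as archive:
        return {name: archive.read(name)
                for name in archive.namelist() if name in wanted}


# Load only the results needed for display, not the construction history.
loaded = read_parts(make_demo_archive(),
                    {"model/triangulation", "model/bodies"})
print(sorted(loaded))
```

This only works because no part refers to another: each requested member is complete on its own.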

These improvements gave rise to a new assembly loading type, “Partial”. With this type, only the results (bodies, surfaces) and the triangulation are read from the file. Partial loading allows mates to be created and is close to full component loading in terms of functionality.

With the partial reading improvements in place, creating custom loading types becomes worthwhile.

Hint
Custom loading types are combinations of the system's component loading methods. The feature is not new, but the improvements made in v18 make its benefits noticeable.


For components that are not important for further construction, the “Empty” loading type can be used. These may be components hidden inside others (the “innards”). In v18, components (and entire assemblies) with the “Empty” loading type open almost instantly.

Table 4. Opening time for assemblies with the “Empty” and “Dimension” loading types, in seconds. Less is better.

| Model | Loading type | V16.1 | v17.1 | v18 |
|---|---|---|---|---|
| Vacuum technology installation | Empty | 12.8 | 11.7 | 2.5 |
| | Dimension | 21.2 | 20.8 | 2.6 |
| Ship power plant reducer | Empty | 31.0 | 15.9 | 7.2 |
| | Dimension | 371.5 | 114.8 | 7.3 |


The remaining components, which are needed to understand the appearance of the product or will serve as reference objects for further construction, can be loaded “Fully” or “Partially”.

To prepare custom loading types, you can use the new commands for selecting “invisible” components: run the command, then use the context menu to change the loading type of the selected components to “Empty”.

Projection


While speeding up projection, we looked at filtering the input data passed to the mathematical kernel.

First, we decided to filter out invisible components and bodies. For this we used the mechanism for culling invisible bodies (occlusion culling): it determines whether the body to be projected is visible or is hidden inside some other body. This operation is performed on the video card.

The effect is greatest when creating projections of models with many components hidden inside closed volumes, for example:


This filtering is enabled by the “Rough projection” option. The name was chosen deliberately: relatively small parts may not be projected at the scale of the assembly (for example, a bolt at the scale of a power plant). Many users will find this acceptable, especially when creating dimensional and general arrangement drawings.

Read more about the “Rough projection” option
The option is available only for standard projections. For derived views (cuts, sections, detail views), “Rough projection” is not used.


Even without this option, projection is noticeably faster than in v16 and v17, thanks to improvements in the mathematical kernel.

Table 5. Time to create three standard projections, in seconds. Less is better.

| Model | V16.1 | v17.1 | v18, “Rough projection” enabled | v18, “Rough projection” disabled |
|---|---|---|---|---|
| Vacuum technology installation | 124.1 | 47.5 | 12.9 | 34.6 |
| Ship power plant reducer | 256 | 410 | 38.4 | 54.4 |
| Multipurpose unified box body | 99.9 | 123.4 | 44.9 | 53.5 |


Also in v18, it became possible to rebuild individual associative views.

In a drawing that contains many associative views, the user can rebuild only selected out-of-date views, for example, the one to which annotations are to be added. You can also choose which views are built with the “Rough projection” option enabled.

Rebuilding a separate view


This feature is not a speed-up in the strict sense, but it does save the user time.

The result of the work on speeding up model projection, shown on a drawing of the vacuum technology installation:


In the next part, we will explain how the calculation of mass and centroid characteristics (MCC) was sped up, how the geometric kernel from C3D Labs contributed to KOMPAS-3D performance, what changed in C3D Modeler, and what hardware is suitable for v18.

Source: https://habr.com/ru/post/447516/

