```cpp
#include <chrono>
#include <cstdint>
#include <iostream>
#include <vector>

using namespace std;
using namespace std::chrono;

struct S
{
    uint64_t u;
    double d;
    int i;
    float f;
};

struct Data
{
    vector<uint64_t> vu;
    vector<double> vd;
    vector<int> vi;
    vector<float> vf;
};

int test1(S const & s1, S const & s2)
{
    return s1.i + s2.i;
}

int test2(Data const & data, size_t const ind1, size_t const ind2)
{
    return data.vi[ind1] + data.vi[ind2];
}

int main()
{
    size_t const N{ 30000 };
    size_t const R{ 10 };

    vector<S> v(N);

    Data data;
    data.vu.resize(N);
    data.vd.resize(N);
    data.vi.resize(N);
    data.vf.resize(N);

    int result{ 0 };

    cout << "test #1" << endl;
    for (uint32_t i{ 0 }; i < R; ++i)
    {
        auto const start{ high_resolution_clock::now() };
        for (size_t a{ 0 }; a < v.size() - 1; ++a)
        {
            for (size_t b{ a + 1 }; b < v.size(); ++b)
            {
                result += test1(v[a], v[b]);
            }
        }
        cout << duration<float>{ high_resolution_clock::now() - start }.count() << endl;
    }

    cout << "test #2" << endl;
    for (uint32_t i{ 0 }; i < R; ++i)
    {
        auto const start{ high_resolution_clock::now() };
        for (size_t a{ 0 }; a < v.size() - 1; ++a)
        {
            for (size_t b{ a + 1 }; b < v.size(); ++b)
            {
                result += test2(data, a, b);
            }
        }
        cout << duration<float>{ high_resolution_clock::now() - start }.count() << endl;
    }

    return result;
}
```

The S structure is 24 bytes. My processor (an Intel Core i7) has 32 KB of cache per core with 64-byte cache lines. This means that when data is fetched from memory, only two S structures fit completely into one cache line. In the first test I read only a single int field, so one memory access brings in a cache line containing only 2 (sometimes 3) of the fields we need. In the second test I read the same int values, but from a vector. std::vector guarantees that its data is stored contiguously, so one cache line holds 16 (64 B / sizeof(int) = 16) of the values we need. It turns out that in the second test we access memory less often.
A memory access is known to be a weak link in modern processors. Of course, this test belongs to the "Hello World" category, i.e. it is far from real life. In real code there are many dependencies and specific data access patterns that may not give such a performance boost. Moreover, if the tests accessed all fields of the structure, there would be no difference in performance.

The Shape class, which represents a physical body, looks like this:

```cpp
class Shape
{
public:
    Shape(uint32_t const numVertices, float radius, math::Vec2 const pos,
          math::Vec2 const vel, float m, math::Color const col);

    static Shape createWall(float const w, float const h, math::Vec2 const pos);

public:
    math::Vec2 position{ 0.0f, 0.0f };
    math::Vec2 velocity{ 0.0f, 0.0f };
    math::Vec2 overlapResolveAccumulator{ 0.0f, 0.0f };
    float massInverse;
    math::Color color;
    std::vector<math::Vec2> vertices;
    math::Bounds bounds;
};
```

position and velocity are, I think, self-explanatory. vertices holds the vertices of the shape, generated randomly. bounds is a bounding box that completely contains the shape; it is used to pre-check intersections. massInverse is one divided by the mass: we only ever use this value, so we store it instead of the mass itself. color is used only when rendering; it is stored in each shape and set randomly. overlapResolveAccumulator is explained below.
Suppose a triangle intersects shape a: we push it out and recalculate its bounds. But after moving, the triangle intersects another shape, b, and we have to move it and recalculate its bounds again. Notice that after the second move the triangle may once again overlap shape a. To avoid these recalculations, we store the amount by which the triangle should be moved in a special accumulator, overlapResolveAccumulator, and later move the shape by this value, but only once.

All of this happens in the ShapesApp::update() method. Here is its simplified version:

```cpp
void ShapesApp::update(float const dt)
{
    float const dtStep{ dt / NUM_PHYSICS_STEPS };
    for (uint32_t s{ 0 }; s < NUM_PHYSICS_STEPS; ++s)
    {
        updatePositions(dtStep);
        for (size_t i{ 0 }; i < _shapes.size() - 1; ++i)
        {
            for (size_t j{ i + 1 }; j < _shapes.size(); ++j)
            {
                CollisionSolver::solveCollision(_shapes[i].get(), _shapes[j].get());
            }
        }
    }
}
```

First we call the ShapesApp::updatePositions() method, which moves each shape to its new position and calculates a new Shape::bounds. Then we check every shape against every other shape for an intersection in CollisionSolver::solveCollision(). For the intersection test I used the Separating Axis Theorem (SAT). All these checks are performed NUM_PHYSICS_STEPS times. This variable serves several purposes: first, it makes the physics more stable, and second, it limits the number of objects on the screen. C++ is fast, very fast, and without this variable we would have tens of thousands of shapes, which would slow down the rendering. I used NUM_PHYSICS_STEPS = 20.
We divide the space into a Grid consisting of NxM cells (Cell). At the beginning of the calculations we register in each cell the objects that lie within it. Then we only need to run through all the cells and check a few pairs of objects in each for intersection. I have used this approach repeatedly in shipped projects and it has proved itself: it is quick to write, easy to debug, and easy to understand. A Grid class has appeared, and the ShapesApp::update() method has changed slightly: it now calls the grid's methods to check for intersections.
ShapesApp::updatePositions(), which previously ran through all the shapes, setting a new position and recalculating bounds, now processes only a part of the shapes in each thread, reducing the load on any one core. The thread-pool class was honestly borrowed from here. In the tests I use four threads (counting the main one). The finished version can be found here.
Now several threads can touch the same Shape at once, so access to shared data has to be synchronized. To do this, we added a std::mutex to Shape and Cell.
Let's now rework the Shape structure in a data-oriented way and see whether this small modification can affect performance. To everyone's joy, the refactoring was not difficult, even trivial. Instead of Shape we will use a structure of vectors:

```cpp
struct ShapesData
{
    std::vector<math::Vec2> positions;
    std::vector<math::Vec2> velocities;
    std::vector<math::Vec2> overlapAccumulators;
    std::vector<float> massesInverses;
    std::vector<math::Color> colors;
    std::vector<std::vector<math::Vec2>> vertices;
    std::vector<math::Bounds> bounds;
};
```

Functions that used to take shape pointers now take the data and indices, for example:

```cpp
solveCollision(struct ShapesData & data, std::size_t const indA, std::size_t const indB);
```

Source: https://habr.com/ru/post/321106/