BVH            9.01
(unlabeled)    1.44
MIP            2.00
(unlabeled)   11.02
top reported that about 70 GB of memory was actually in use, which means the statistics failed to account for 45 GB. Small discrepancies are understandable: dynamic memory allocators need extra space for bookkeeping, some memory is lost to fragmentation, and so on. But 45 GB? Something bad is definitely hiding here.

It turned out that the implementations of the Primitive base class were not counted in the statistics. A small oversight that is fairly easy to fix. After that we see the following:

Primitives    24.67
pbrt distinguishes between a Shape, which is pure geometry (sphere, triangle, etc.), and a Primitive, which combines geometry, a material, sometimes an emission function, and the participating media inside and outside the surface of the geometry.

Among the implementations of the Primitive base class are GeometricPrimitive, the standard case (a “vanilla” combination of geometry, material, and so on), and TransformedPrimitive, a primitive with transformations applied to it, used either as an object instance or for moving primitives with time-varying transformations. It turns out that in this scene both of these types waste space.

First, GeometricPrimitive. It is funny to live in a world where 4.3 GB of RAM is not your biggest problem, but let's still see where the 4.3 GB of GeometricPrimitive goes. Here are the relevant parts of the class definition:

    class GeometricPrimitive : public Primitive {
        std::shared_ptr<Shape> shape;
        std::shared_ptr<Material> material;
        std::shared_ptr<AreaLight> areaLight;
        MediumInterface mediumInterface;
    };
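To make the per-object cost concrete, here is a rough sketch that compares the member layout quoted above with a slimmed-down primitive holding only the shape and material pointers, which would be enough whenever there is no area light and no participating medium. All types are hypothetical stand-ins rather than pbrt code: raw pointers replace std::shared_ptr for simplicity, so sizes in the real code depend on the smart pointer implementation, and the name SimplePrimitive is an assumption made for illustration.

```cpp
#include <cstdio>

// Hypothetical stand-ins: only the number of pointer members matters here.
struct Shape;
struct Material;
struct AreaLight;
struct Medium;

// Two pointers, as described for pbrt's MediumInterface.
struct MediumInterface {
    const Medium *inside = nullptr;
    const Medium *outside = nullptr;
};

// A virtual base class, so every derived object also carries a vtable pointer.
struct Primitive {
    virtual ~Primitive() = default;
};

// Mirrors the member list quoted in the article (raw pointers are an assumption).
struct FullPrimitive : Primitive {
    const Shape *shape = nullptr;
    const Material *material = nullptr;
    const AreaLight *areaLight = nullptr;  // almost always null in this scene
    MediumInterface mediumInterface;       // both pointers null in this scene
};

// Hypothetical slim variant for primitives with no emission and no medium.
struct SimplePrimitive : Primitive {
    const Shape *shape = nullptr;
    const Material *material = nullptr;
};

// Prints both sizes so the saving per object can be read off directly.
void PrintPrimitiveSizes() {
    std::printf("full: %zu bytes, simple: %zu bytes\n",
                sizeof(FullPrimitive), sizeof(SimplePrimitive));
}
```

Calling PrintPrimitiveSizes() on a platform with 8-byte pointers should report roughly half the size for the slim variant, since three of the six pointer-sized fields disappear.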
The class holds several pointers plus a MediumInterface containing two more pointers, for a total size of 48 bytes. In this scene only a few meshes emit light, so areaLight is almost always a null pointer, and there is no participating medium affecting the scene, so both mediumInterface pointers are also null. Thus, if we had a specialized implementation of the Primitive class for use when there is no emission function and no medium, we would save almost half of the memory occupied by GeometricPrimitive: in our case, about 2 GB.

However, I did not add such a Primitive implementation to pbrt. We try as much as possible to minimize the differences between the pbrt-v3 source code on GitHub and the system described in my book, for a very simple reason: keeping them in sync makes it easier to read the book and work with the code. In this case, I decided that a completely new Primitive implementation, never mentioned in the book, would be too big a discrepancy. But this fix will definitely appear in the next version of pbrt.

GeometricPrimitive was a pretty painful hit, but what about the 17.4 GB under TransformedPrimitive?

TransformedPrimitive is used both for time-varying transformations and for object instances. In both cases, we need to apply an additional transformation to an existing Primitive. The TransformedPrimitive class has only two members:

    std::shared_ptr<Primitive> primitive;
    const AnimatedTransform PrimitiveToWorld;
So what does an AnimatedTransform contain?

    const Transform *startTransform, *endTransform;
    const Float startTime, endTime;
    const bool actuallyAnimated;
    Vector3f T[2];
    Quaternion R[2];
    Matrix4x4 S[2];
    bool hasRotation;
    struct DerivativeTerm {
        // ...
        Float kc, kx, ky, kz;
    };
    DerivativeTerm c1[3], c2[3], c3[3], c4[3], c5[3];
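To get a feel for how much these members weigh, the sketch below rebuilds the data layout with simplified stand-in types and compares an instance that embeds an AnimatedTransform with one that only stores a pointer to a single shared Transform. Everything here is an assumption made for illustration: Float is taken to be a 32-bit float, the const qualifiers are dropped, the elided part of DerivativeTerm is assumed to contain no additional data, and the names AnimatedInstance and StaticInstance are hypothetical.

```cpp
#include <cstdio>

using Float = float;  // assumption: single-precision build

// Simplified stand-ins for the math types; only their sizes matter here.
struct Vector3f { Float x, y, z; };
struct Quaternion { Vector3f v; Float w; };
struct Matrix4x4 { Float m[4][4]; };
struct Transform { Matrix4x4 m, mInv; };

// Data members as listed in the article (const qualifiers omitted).
struct AnimatedTransform {
    const Transform *startTransform, *endTransform;
    Float startTime, endTime;
    bool actuallyAnimated;
    Vector3f T[2];
    Quaternion R[2];
    Matrix4x4 S[2];
    bool hasRotation;
    struct DerivativeTerm { Float kc, kx, ky, kz; };
    DerivativeTerm c1[3], c2[3], c3[3], c4[3], c5[3];
};

struct Primitive {
    virtual ~Primitive() = default;
};

// Instance that carries the full animation data, even for a static object.
struct AnimatedInstance : Primitive {
    const Primitive *primitive;
    AnimatedTransform primitiveToWorld;
};

// Hypothetical instance for static objects: just two pointers.
struct StaticInstance : Primitive {
    const Primitive *primitive;
    const Transform *primitiveToWorld;
};

// Prints both sizes so the per-instance overhead can be compared.
void PrintInstanceSizes() {
    std::printf("animated: %zu bytes, static: %zu bytes\n",
                sizeof(AnimatedInstance), sizeof(StaticInstance));
}
```

Under these assumptions the animated variant comes out at several hundred bytes per instance against a few dozen for the static one, which is the kind of ratio that matters when a scene contains tens of millions of instances.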
With a specialized Primitive implementation for static object instances, the 17.4 GB shrinks to only 900 MB (!).

As with GeometricPrimitive, fixing this is a non-trivial departure from what is described in the book, so we will also postpone it to the next version of pbrt. At least we now understand what is going on with the 24.7 GB of Primitive memory.

Then there is TransformCache, which occupied approximately 16 GB. (Here is a link to the original implementation.) The idea is that the same transformation matrix is often used several times in a scene, so it is best to keep a single copy in memory and have every user simply store a pointer to that same transformation.

TransformCache used std::map, and massif reported that 6 of the 16 GB went to red-black tree nodes in the std::map. That is an awful lot: 60% of the amount used for the transformations themselves. Let's look at the declaration of this map:

    std::map<Transform, std::pair<Transform *, Transform *>> cache;
Here, an entire Transform is used as the key of the map. Worse still, a pbrt Transform stores two 4x4 matrices (the transformation matrix and its inverse), which means 128 bytes of storage in every tree node. All of it is completely redundant with the value stored for that key.

Traversing the red-black tree of a std::map also involves a lot of pointer chasing, so it seems logical to try something completely new. Fortunately, little is written about TransformCache in the book, so it is perfectly acceptable to rewrite it completely.

Looking at the signature of the Lookup() method, another problem becomes apparent:

    void Lookup(const Transform &t, Transform **tCached, Transform **tCachedInverse)
Given a Transform, the cache stores and returns a pointer to a transformation equal to the one passed in, but it also hands back the inverse. To make this possible, the original implementation always computes and stores the inverse whenever a transformation is added to the cache, so that it can be returned.

The rewritten cache is built around a Transform * array, which in effect reduces the memory used to the amount actually needed to store all the Transforms.
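As an illustration of a cache built around a Transform * array, here is a minimal sketch, not the pbrt implementation. It assumes an open-addressing hash table with linear probing, an FNV-1a hash over the raw bytes, bitwise equality, and std::unique_ptr ownership (C++14); all of these choices, the stand-in Transform type, and the Size() and Inverse() helpers are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <vector>

// Stand-in for pbrt's Transform: a 4x4 matrix and its inverse (32 floats).
struct Transform {
    float m[4][4];
    float mInv[4][4];
    // Bitwise comparison, consistent with the byte-wise hash below.
    bool operator==(const Transform &o) const {
        return std::memcmp(this, &o, sizeof(Transform)) == 0;
    }
};

// Stand-in inverse: swaps the stored matrix and its stored inverse.
inline Transform Inverse(const Transform &t) {
    Transform r;
    std::memcpy(r.m, t.mInv, sizeof(r.m));
    std::memcpy(r.mInv, t.m, sizeof(r.mInv));
    return r;
}

class TransformCache {
  public:
    TransformCache() : table(64, nullptr) {}

    // Returns the single cached copy equal to t, adding it on first use.
    Transform *Lookup(const Transform &t) {
        // Keep the load factor at or below 3/4 so probing always terminates.
        if (4 * (storage.size() + 1) > 3 * table.size()) Grow();
        std::size_t i = Hash(t) & (table.size() - 1);
        while (table[i] != nullptr) {
            if (*table[i] == t) return table[i];
            i = (i + 1) & (table.size() - 1);  // linear probing
        }
        storage.push_back(std::make_unique<Transform>(t));
        table[i] = storage.back().get();
        return table[i];
    }

    std::size_t Size() const { return storage.size(); }

  private:
    // FNV-1a over the raw bytes of the two matrices.
    static std::uint64_t Hash(const Transform &t) {
        const unsigned char *p = reinterpret_cast<const unsigned char *>(&t);
        std::uint64_t h = 14695981039346656037ull;
        for (std::size_t i = 0; i < sizeof(Transform); ++i) {
            h ^= p[i];
            h *= 1099511628211ull;
        }
        return h;
    }

    // Doubles the table and reinserts the existing pointers.
    void Grow() {
        std::vector<Transform *> old(table.size() * 2, nullptr);
        old.swap(table);  // table is now the empty, larger array
        assert((table.size() & (table.size() - 1)) == 0);  // power of two
        for (Transform *t : old) {
            if (t == nullptr) continue;
            std::size_t i = Hash(*t) & (table.size() - 1);
            while (table[i] != nullptr) i = (i + 1) & (table.size() - 1);
            table[i] = t;
        }
    }

    std::vector<Transform *> table;                   // one pointer per slot
    std::vector<std::unique_ptr<Transform>> storage;  // owns the Transforms
};
```

A caller that also needs the inverse would call cache.Lookup(t) and then cache.Lookup(Inverse(t)). The table costs one pointer per slot, so nearly all memory goes to the Transforms themselves rather than to keys duplicated inside tree nodes.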
The signature is now Transform *Lookup(const Transform &t); in the one place where the caller wants the inverse from the cache, it simply calls Lookup() twice.

With the new TransformCache implementation, overall system startup time dropped significantly, to 21 minutes 42 seconds. That is, we saved another 5 minutes 7 seconds, a speedup of 1.27 times. Moreover, the more efficient use of memory reduced the space occupied by the transformation matrices from 16 GB to 5.7 GB, which is almost equal to the amount of data actually stored. This let us get away without trying to exploit the fact that the matrices are not projective and storing 3x4 matrices instead of 4x4. (Normally I would be skeptical about the value of this kind of optimization, but here it would save more than a gigabyte, which is a lot of memory! It is definitely worth doing in a production renderer.)

The TransformedPrimitive structure costs us both memory and time: the profiler reported that a significant share of startup time was spent in the AnimatedTransform::Decompose() function, which decomposes a transformation matrix into a quaternion rotation, a translation, and a scale. Since nothing moves in this scene, this work is unnecessary, and a careful review of the AnimatedTransform implementation showed that none of these values are accessed when the two transformation matrices are actually identical.

Source: https://habr.com/ru/post/417445/