Continuing the "admin summary" series, I would like to go through the nuances of RAM technology in modern hardware: registered memory, ranks, memory banks, and so on. We will also take a closer look at how reliably memory stores data and at the technologies that spare administrators countless BSOD sorrows a day.
Most memory modules on the market today are DDR SDRAM: DDR2, DDR3, DDR4. The generations differ in a number of characteristics - in general, each new generation is "faster, higher, stronger", and for the curious, here is a comparison table:
When choosing the right memory, the module types themselves are of greater interest:
RDIMM - registered (buffered) memory. It makes it possible to install more RAM than unbuffered modules allow. The downside is somewhat lower performance;
UDIMM (unbuffered DIMM) - unregistered, or unbuffered, memory: modules that contain no buffers or registers;
LRDIMM (load-reduced DIMM) - modules that use additional memory buffer chips to provide higher speeds at higher capacities than dual-rank or quad-rank RDIMM modules;
HDIMM (HyperCloud DIMM, HCDIMM) - modules with virtual ranks that offer higher density and higher speed. For example, 4 physical ranks in such a module can be presented to the controller as 2 virtual ones.
Mixing these types in one system can have a variety of unfortunate consequences, up to damage to the motherboard or the memory itself. Modules of the same type with different characteristics can be combined, however, since they are backward compatible in clock frequency. The resulting frequency of the memory subsystem will be limited by the slowest module or by the memory controller.
All types of SDRAM share a common set of basic characteristics that affect capacity and performance:
frequency and mode of operation;
rank;
timings.
Of course, there are more differences than these, but they are enough to build a properly working system.
Clearly, the higher the frequency, the higher the overall memory performance. But the memory will still not run faster than the memory controller allows. In addition, all modern modules can work in multi-channel mode, which raises peak theoretical bandwidth by up to four times in a four-channel configuration.
Modes of operation fall into several groups:
Single Mode - single-channel, or asymmetric. It is used when only one memory module is installed in the system or when all modules differ from each other. In effect, it means there is no multi-channel access;
Dual Mode - dual-channel, or symmetric. Memory slots are grouped by channel, and each channel gets the same amount of memory. This increases speed by 5-10% in games and by up to 70% in heavy graphics applications. Memory modules must be installed in pairs on different channels. Motherboard manufacturers usually mark paired slots with the same color;
Triple Mode - three-channel mode. Modules are installed in groups of three, one for each of the three channels. The higher modes work the same way: four-channel (quad-channel), eight-channel (8-channel memory), and so on.
For maximum performance, it is best to install identical modules with the highest frequency the system supports, and to install them in pairs or groups, depending on the multi-channel mode available.
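To put rough numbers on the multi-channel gain, here is a minimal sketch that estimates theoretical peak bandwidth as transfer rate x 8 bytes (one 64-bit channel) x number of channels. The function name and the DDR4-2400 example are illustrative assumptions, and real throughput is always lower than this ceiling.

```python
def peak_bandwidth_gbs(transfer_rate_mts: int, channels: int = 1,
                       bus_width_bits: int = 64) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes).

    transfer_rate_mts is the module's data rate in megatransfers per
    second, e.g. 2400 for DDR4-2400 (an illustrative value).
    """
    if transfer_rate_mts <= 0 or channels <= 0:
        raise ValueError("transfer rate and channel count must be positive")
    bytes_per_transfer = bus_width_bits // 8  # 64-bit channel -> 8 bytes
    return transfer_rate_mts * 1_000_000 * bytes_per_transfer * channels / 1e9


# Compare single-, dual- and quad-channel operation for one module speed.
for channels in (1, 2, 4):
    print(f"{channels} channel(s): {peak_bandwidth_gbs(2400, channels):.1f} GB/s")
```

The figure scales linearly with the channel count, which is where the "up to four times" for quad-channel systems comes from.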
A rank is a set of memory chips on a module that together form a data area 64 bits wide (72 bits with ECC, which we will discuss later). Depending on its design, a module can contain one, two, or four ranks.
You can find this parameter in the marking on the memory module. For Kingston modules, for example, the number of ranks is easy to read from one of three letters in the middle of the marking: S (Single - single-rank), D (Dual - dual-rank), Q (Quad - quad-rank).
An example of a complete decoding of markings on Kingston modules:
Server motherboards are limited in the total number of memory ranks they can work with. For example, if the maximum is eight ranks and four dual-rank modules are already installed, no more memory can be added to the free slots.
Before buying modules, it makes sense to check which types of memory the server's processor supports. For example, Xeon E5 / E5 v2 supports single-, dual-, and quad-rank registered DIMM (RDIMM) modules, LRDIMM, and unbuffered ECC DIMM (ECC UDIMM) DDR3. The Xeon E5 v3 processors support single- and dual-rank registered DIMM modules as well as LRDIMM DDR4.
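The rank limit can be checked with simple arithmetic. The sketch below is only an illustration: the function name is hypothetical, and the limit of eight ranks is the figure from the example above, not a value for any particular board.

```python
def can_add_module(installed_ranks: list[int], new_module_ranks: int,
                   max_ranks: int) -> bool:
    """Return True if one more module fits within the rank limit.

    installed_ranks holds the rank count of every module already
    installed; max_ranks comes from the server documentation.
    """
    return sum(installed_ranks) + new_module_ranks <= max_ranks


# Four dual-rank modules already use all eight ranks.
installed = [2, 2, 2, 2]
print(can_add_module(installed, new_module_ranks=1, max_ranks=8))

# Four single-rank modules leave room for a dual-rank one.
print(can_add_module([1, 1, 1, 1], new_module_ranks=2, max_ranks=8))
```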
Timings, or memory latencies, are delays measured in clock cycles from the arrival of a command to its execution. The timing figures refer to the following operations:
CL (CAS Latency) - the time between the processor requesting data from memory and the memory delivering that data;
tRCD (RAS to CAS Delay) - the time that must elapse between activating a row of the matrix (RAS) and accessing the column (CAS) that holds the required data;
tRP (RAS Precharge) - the interval between closing one row of the matrix and starting to access another;
tRAS (Row Active Time) - the minimum time a row must stay open after activation before it can be closed again.
Of course, the lower the timings, the better for speed. But low latency has to be paid for with clock frequency: the lower the timings, the lower the clock frequency the memory can sustain. The right choice is therefore the "golden mean".
There are also special more expensive modules marked "Low Latency", which can operate at a higher frequency at low timings. When expanding memory, it is advisable to select modules with timings similar to those already installed.
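Because timings are counted in clock cycles, the same CL means a different absolute delay at different frequencies. The sketch below converts CL to nanoseconds, assuming DDR memory whose clock runs at half the data rate; the function name and the example modules are illustrative assumptions, not recommendations.

```python
def cas_latency_ns(cl_cycles: int, data_rate_mts: int) -> float:
    """Absolute CAS latency in nanoseconds for DDR memory.

    DDR transfers data twice per clock, so the clock in MHz is
    data_rate_mts / 2 and one cycle lasts 2000 / data_rate_mts ns.
    """
    if cl_cycles <= 0 or data_rate_mts <= 0:
        raise ValueError("CL and data rate must be positive")
    return cl_cycles * 2000 / data_rate_mts


# Illustrative modules: (label, CL in cycles, data rate in MT/s).
modules = [
    ("DDR3-1333 CL9", 9, 1333),
    ("DDR3-1600 CL11", 11, 1600),
    ("DDR4-2400 CL17", 17, 2400),
]
for label, cl, rate in modules:
    print(f"{label}: {cas_latency_ns(cl, rate):.2f} ns")
```

The absolute delays come out close to each other even though the CL figures differ a lot, which is why timings only make sense when compared together with the frequency.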
Errors in data stored in RAM are inevitable. They are classified as hard errors (hardware failures) and soft errors (irregular, transient faults). Parity memory can detect an error but cannot correct it.
To correct soft errors, ECC memory is used: it carries an additional chip for detecting and correcting errors in individual bits.
The error correction method works as follows:
When 64 bits of data are written to a memory cell, an 8-bit checksum is calculated and stored with them.
When the processor reads the data, the checksum of the received data is recalculated and compared with the stored value. If the checksums do not match, an error has occurred; a single-bit error can then be corrected from the checksum.
Advanced ECC technology can correct multi-bit errors within a single chip, and it allows data to be recovered even if an entire DRAM chip fails.
Error correction must be enabled separately in the BIOS.
Most server memory modules are registered (buffered): they contain registers that control data transfer.
Registers also make it possible to install large amounts of memory, but they introduce additional delay. Every read and write is buffered in the register for one clock cycle before it passes from the memory bus to the DRAM chip, so registered memory is one clock cycle slower than unregistered memory.
Source - nix.ru
All registered modules and fully buffered memory also support ECC, but the reverse is not always true. For reliability, a server is better off with registered memory.
For several processors to work correctly and quickly, each of them needs its own memory bank for direct access. The organization of these banks in a particular server is best looked up in its documentation, but the general rule is: distribute the memory equally between the banks and install modules of the same type within each bank.
If you have had to install modules with a lower frequency than the motherboard requires, enable additional wait states in the BIOS for the processor's memory accesses.
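The general rule can be expressed as a quick check of a planned layout. The sketch below is a simplified illustration: the data layout (a dictionary of processor names mapped to lists of module type and size) and the function name are assumptions, and a real server's documentation may impose further rules.

```python
def check_population(banks: dict[str, list[tuple[str, int]]]) -> list[str]:
    """Return a list of problems found in a planned memory layout.

    banks maps a processor name to the modules in its bank, each
    module given as (type, size_in_gb), e.g. ("RDIMM", 16).
    """
    problems = []
    # Rule 1: every bank should hold the same total amount of memory.
    totals = {cpu: sum(size for _, size in mods) for cpu, mods in banks.items()}
    if len(set(totals.values())) > 1:
        problems.append(f"unequal memory per bank: {totals}")
    # Rule 2: modules within one bank should be of the same type.
    for cpu, mods in banks.items():
        if len(set(mods)) > 1:
            problems.append(f"{cpu}: mixed modules {sorted(set(mods))}")
    return problems


balanced = {
    "CPU1": [("RDIMM", 16), ("RDIMM", 16)],
    "CPU2": [("RDIMM", 16), ("RDIMM", 16)],
}
lopsided = {
    "CPU1": [("RDIMM", 16), ("RDIMM", 8)],
    "CPU2": [("RDIMM", 16)],
}
print(check_population(balanced))
print(check_population(lopsided))
```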
To account for all the installation rules and recommendations automatically, you can use the vendor's own utilities. For example, HP has the Online DDR4 (DDR3) Memory Configuration Tool.
Instead of a lengthy conclusion, here are general guidelines for choosing memory:
For multiprocessor HP servers, only registered memory with error correction (ECC RDIMM) is recommended, and for single-processor servers, unbuffered memory with ECC (ECC UDIMM). UDIMM modules for HP servers are best bought from the same manufacturer to avoid spontaneous reboots.
With RDIMM, it is better to choose single- and dual-rank modules (1Rx4, 2Rx4). For optimal performance, use dual-rank modules in configurations of 1 or 2 DIMMs per channel. A 3-DIMM configuration, with modules installed in the third memory bank, reduces performance significantly.
The list is short, but it covers the most essential and least obvious points. Of course, the good old RTFM principle still applies.
Source: https://habr.com/ru/post/321554/