LRDIMM (Load-Reduced Dual Inline Memory Module) is a type of memory module that server platforms have supported since 2012. LRDIMMs look similar to registered DIMMs (RDIMMs) and fit the same memory slots, but they work on a different principle. With LRDIMMs, a regular server can be fitted with 512 GB, 1 TB or even 1.5 TB of memory.

Memory buffer is the foundation of LRDIMM technology
In a registered DIMM (RDIMM), the DRAM chips are connected directly to the data bus that runs to the processor's memory controller. The memory controller drives every DRAM chip on the module, so the more chips the module carries (organized into so-called ranks), the greater the electrical load on the controller. A rank is a set of DRAM chips connected to the same chip select line and accessed together as one 64-bit-wide unit (72 bits with ECC); the number of ranks is a characteristic of the memory module. Below are two- and four-rank memory modules.
A two-rank module is essentially two logical modules on one printed circuit board that take turns using the same physical data channel. A four-rank module is the same idea on a fourfold scale.
RDIMM stands for registered DIMM: modules of this type carry a register that buffers the address and command signals.
An LRDIMM adds a memory buffer chip on each module, between the bus and the DRAM. When the controller works with LRDIMMs, it simply sends packets of data and commands to this buffer, the iMB (Isolation Memory Buffer). Unlike an RDIMM, which buffers only the command and address signals, an LRDIMM buffers the data as well.
The buffer handles all read and write operations in the DRAM. Both data and command/address signals pass through it, so it acts as an intermediary between the memory controller (Host Memory Controller) and the DRAM.
Adding more DRAM chips (ranks) to registered DIMMs increases the electrical load of the memory modules. As the number of ranks per memory channel grows, the memory has to run at a lower speed. For RDIMMs it is best to install no more than two modules per channel, because adding a third one lowers the memory speed. A channel is the path between the memory modules and the controller over which read and write data travels.
LRDIMMs largely avoid these limitations thanks to their memory buffer chips. When working with LRDIMMs, the processor's memory controllers operate in a sequential mode: they pass commands and data to the memory buffer, which then manages all read and write operations in the DRAM.
Rank multiplication
LRDIMMs significantly reduce the electrical load that the DRAM chips place on the data bus, and they also use so-called rank multiplication: several physical DRAM ranks appear to the memory controller as one logical rank of higher capacity. The figure below shows rank multiplication for three LRDIMMs per memory channel.
Rank multiplication can be disabled or set to 2:1 or 4:1, supporting up to 8 physical ranks per LRDIMM. With 2:1 multiplication, for example, the memory controller sees a four-rank LRDIMM as a two-rank module and an eight-rank LRDIMM as a four-rank one, so the load a multi-rank module presents is halved. As a result, the server can run LRDIMMs at higher speeds than RDIMMs.
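The physical-to-logical mapping can be sketched in a few lines of Python. This is a minimal illustration, not vendor firmware logic: the function names are hypothetical, and the limit of 8 logical ranks per channel is the typical controller limit quoted in this article, used here as an assumption.

```python
# Hypothetical helpers illustrating LRDIMM rank multiplication.
# Assumption: the controller accepts at most 8 logical ranks per channel.
MAX_LOGICAL_RANKS_PER_CHANNEL = 8
MAX_PHYSICAL_RANKS_PER_LRDIMM = 8


def logical_ranks(physical_ranks: int, multiplication: int) -> int:
    """Ranks the memory controller sees for one LRDIMM.

    multiplication is 1 (disabled), 2 (2:1) or 4 (4:1).
    """
    if multiplication not in (1, 2, 4):
        raise ValueError("rank multiplication must be 1, 2 or 4")
    if not 1 <= physical_ranks <= MAX_PHYSICAL_RANKS_PER_LRDIMM:
        raise ValueError("an LRDIMM has 1 to 8 physical ranks")
    if physical_ranks % multiplication:
        raise ValueError("physical ranks must be a multiple of the factor")
    return physical_ranks // multiplication


def channel_fits(modules: int, physical_ranks: int, multiplication: int) -> bool:
    """True if `modules` identical LRDIMMs stay within the channel rank limit."""
    seen = modules * logical_ranks(physical_ranks, multiplication)
    return seen <= MAX_LOGICAL_RANKS_PER_CHANNEL


print(logical_ranks(4, 2))    # 2: four-rank LRDIMM seen as two-rank
print(logical_ranks(8, 2))    # 4: eight-rank LRDIMM seen as four-rank
print(channel_fits(3, 4, 1))  # False: 12 ranks without multiplication
print(channel_fits(3, 4, 2))  # True: 6 logical ranks with 2:1
```

The check shows why three four-rank LRDIMMs per channel only work with multiplication enabled: the controller counts logical ranks, not physical ones.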
Reducing the electrical load allows the system with LRDIMM to operate at a higher speed (memory clock frequency) with the same capacity, or to increase the RAM capacity while maintaining the same speed as in the configuration with RDIMM.
In practice, then, LRDIMMs can be used to raise memory speed, capacity, or both. They deliver higher speeds at greater capacity for users whose needs exceed what 16 GB two-rank or 32 GB four-rank RDIMMs can provide.
For example, a two-processor server with twenty-four memory slots can be configured as follows:
- LRDIMM modules: 32 GB x 24 = 768 GB at 1066 MHz, at either 1.5 V or 1.35 V.
- RDIMM modules: 32 GB x 16 = 512 GB at 800 MHz, at 1.5 V.
Another example: Intel Xeon E5 v3 processors have a four-channel memory controller that supports up to eight logical ranks per channel. That allows at most eight four-rank 32 GB modules per processor (two per channel), so memory on a dual-processor board cannot exceed 512 GB in this case. Single- or two-rank modules can be installed up to three per channel, but they offer less capacity.
With four-rank LRDIMMs, which the memory controller sees as two-rank modules, you can install up to 12 modules of 32 GB per processor: 384 GB per processor, or 768 GB on a dual-processor board, running at a higher frequency. LRDIMMs of 64 and 128 GB are now available, which brings server memory to an impressive 1.5-2 TB.
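The arithmetic behind the Xeon E5 example can be written out as a small Python sketch. The helper names are hypothetical. The inputs are the assumptions stated in the text (4 channels per CPU, 3 slots per channel, at most 8 logical ranks per channel, two CPUs), and signal-integrity speed limits are deliberately ignored.

```python
# Hypothetical capacity calculator for the Xeon E5 example.
# Assumptions from the text: 4 channels per CPU, 3 slots per channel,
# at most 8 logical ranks per channel; speed limits are ignored.


def max_modules_per_channel(slots: int, logical_ranks_per_module: int,
                            rank_limit: int = 8) -> int:
    """Modules that fit in one channel under both the slot and the rank limit."""
    return min(slots, rank_limit // logical_ranks_per_module)


def board_capacity_gb(module_gb: int, modules_per_channel: int,
                      channels_per_cpu: int = 4, cpus: int = 2) -> int:
    """Total capacity of a fully populated board."""
    return module_gb * modules_per_channel * channels_per_cpu * cpus


rdimm = max_modules_per_channel(slots=3, logical_ranks_per_module=4)   # 2
lrdimm = max_modules_per_channel(slots=3, logical_ranks_per_module=2)  # 3: seen as two-rank

print(board_capacity_gb(32, rdimm))   # 512
print(board_capacity_gb(32, lrdimm))  # 768
```

The rank budget, not the slot count, is what caps four-rank RDIMMs at two per channel; halving the logical rank count lets the third slot be used.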
Note that LRDIMMs cannot be mixed with other DIMM types, such as RDIMMs, in the same system: the system will simply not start.
LRDIMM Features
Besides increasing RAM capacity and speed, the LRDIMM architecture has a number of other useful features. The iMB (the LRDIMM memory buffer) supports DRAM and LRDIMM test tools, including transparent mode and MemBIST (Memory Built-In Self-Test); VREF (voltage reference) for the data bus (DQ) and the command/address bus (CA); parity checking of commands; a built-in control block similar to the 32882 register used on RDIMMs; an optional SMBus (System Management Bus) interface for the LRDIMM configuration and status registers; and an integrated temperature sensor.
Transparent mode: used to test the memory module. The buffer simply passes signals and data through to the DRAM chips.
MemBIST: for DRAM initialization and component testing, LRDIMMs support MemBIST, which performs a full DRAM test. Testing runs at the operating frequency and is driven either through the command/address bus or through SMBus.
VREF: LRDIMMs can use externally supplied reference voltages for data (VREFDQ) and commands/addresses (VREFCA), or internal ones generated by the memory buffer. When the memory buffer provides VREF, the host memory controller can adjust the voltage level through the buffer's configuration registers. Programmable voltage levels let memory module and system vendors ensure reliable, stable LRDIMM memory interfaces.
Parity check: to detect corrupted commands on the command/address bus, the memory buffer checks the parity of incoming commands. On an error, it asserts the ERROUT_n signal.
SMBus interface: the memory buffer can be managed over an additional out-of-band serial management bus, which is used to read and write its status registers.
Temperature sensor: built into the memory buffer and updated 8 times per second; it can be read through the SMBus interface. The buffer's EVENT_n pin can be used to alert the memory controller when the memory temperature is too high.
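To make the command parity check concrete, here is a small Python sketch. It assumes even parity over the command/address bits, with the host sending one parity bit alongside them; the bit pattern and function names are hypothetical, and a real buffer does this in hardware. The `_n` suffix conventionally marks an active-low signal, so ERROUT_n is modeled as 0 when an error is flagged.

```python
from typing import Sequence


def even_parity_bit(bits: Sequence[int]) -> int:
    """Parity bit that makes the total number of 1s (bits + parity) even."""
    return sum(bits) % 2


def errout_n(ca_bits: Sequence[int], par_in: int) -> int:
    """Model of the ERROUT_n level: 0 (asserted) on a parity mismatch, else 1."""
    return 0 if even_parity_bit(ca_bits) != par_in else 1


command = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical command/address bits
par = even_parity_bit(command)        # sent by the host: 0 here
print(errout_n(command, par))         # 1: no error

corrupted = command.copy()
corrupted[2] ^= 1                     # flip a single bit on the bus
print(errout_n(corrupted, par))       # 0: error signalled
```

A single parity bit catches any odd number of flipped bits but misses an even number, so it is a cheap first-line check rather than full error correction.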
How does LRDIMM reach higher speeds?
The unbuffered data bus remains the weakest link in an RDIMM memory system. For example, a four-rank DDR3 RDIMM places four electrical loads on the data bus. As a result, the maximum speed of four-rank DDR3 RDIMMs is 1066 MT/s (million transfers per second) in the “one DIMM per channel” (one DPC) configuration and 800 MT/s in the “two DIMM per channel” (two DPC) configuration. In an LRDIMM, the buffer sits on both the data bus and the command/address bus, which allows higher data rates and higher memory density.
Below is a diagram of the data bus for four-rank RDIMMs in a “two DIMM per channel” configuration. With 8 electrical loads on the data bus, signal integrity in the memory channel degrades seriously, which limits the frequency. With eight loads at 1333 MT/s, the data window on the bus shrinks to 212 ps wide at the ideal VREF point and only 115 mV high at its tallest. The data window is the interval during which the controller can reliably read data, and it gets shorter as the memory frequency increases.
This shrinking of the data window means that two four-rank RDIMMs in a “two DIMM per channel” configuration cannot run at 1333 MT/s, which forces a trade-off between memory capacity and speed.
Below is a diagram of the data window for two four-rank LRDIMMs in a “two DIMM per channel” configuration. The electrical load of 8 physical DRAM ranks is replaced by the two loads of the memory buffers, and signal integrity improves significantly. Under otherwise similar conditions, the data window widens from 212 to 520 ps and its maximum height grows from 115 to 327 mV.
Thanks to this improved signal integrity, LRDIMMs can run at 1333 MT/s and above even with several modules per channel, so there is no need to choose between capacity and memory bandwidth.
A bit about system memory capacity
One of the main advantages of LRDIMM is the ability to increase RAM capacity significantly without sacrificing memory speed. Because the DRAM is electrically isolated from the data bus, more ranks can be added to each DIMM while preserving signal integrity. And since each LRDIMM presents just one electrical load to the memory controller, more DIMMs can be installed in each channel. A common option is the 32 GB LRDIMM: a 4Rx4 module built from 4 Gb DRAM dies in DDP (dual-die package) packages.
Take, for example, a dual-processor server with three DIMM slots per channel and four channels per CPU. With LRDIMMs, RAM capacity can be two to three times higher than with RDIMMs. Below are the maximum RDIMM and LRDIMM capacities for various speeds and voltages.
For example, with 1.5 V DDR3 memory at 800 MT/s, a system fully populated with 16 GB 2Rx4 RDIMMs can hold up to 384 GB of RAM. LRDIMMs double this, to 768 GB. The usual motherboard limit of 8 DRAM ranks per channel is overcome by LRDIMM rank multiplication, which in this case yields 12 physical ranks per channel.
At 1066 or 1333 MT/s, signal integrity limits an RDIMM configuration to at most two DIMMs per channel, so for 1.5 V DDR3 memory at those speeds the maximum RAM capacity with RDIMMs is 256 GB. LRDIMMs have no such restriction: you can install three DIMMs per channel at 1066 MT/s (or 1333 MT/s), for a total of 768 GB, three times as much. For 1.35 V DDR3L memory at 1333 MT/s, the LRDIMM advantage is even greater.
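The 32 GB figure follows from the module organization. The sketch below assumes 4 Gb dies, x4 devices, 64 data bits plus 8 ECC bits per rank, and DDP packages that each hold two dies; the function names are hypothetical and real module layouts can differ.

```python
def module_capacity_gb(ranks: int, device_width: int, die_gbit: int,
                       data_bits: int = 64) -> int:
    """Usable capacity in GB; ECC devices are not counted."""
    devices_per_rank = data_bits // device_width   # 16 x4 devices cover 64 bits
    total_gbit = ranks * devices_per_rank * die_gbit
    return total_gbit // 8                          # 8 bits per byte


def package_count(ranks: int, device_width: int, dies_per_package: int,
                  bus_bits: int = 72) -> int:
    """DRAM packages on the module, including ECC devices (72-bit bus)."""
    return ranks * (bus_bits // device_width) // dies_per_package


print(module_capacity_gb(ranks=4, device_width=4, die_gbit=4))      # 32
print(package_count(ranks=4, device_width=4, dies_per_package=2))   # 36
```

Stacking two dies per package is what lets four ranks of x4 devices fit on a standard module footprint.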
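The capacity comparison can be reproduced with a short Python sketch. The module sizes and DIMMs-per-channel values are taken from the paragraphs above for 1.5 V DDR3 on a dual-processor board with four channels per CPU; the table layout and names are illustrative only.

```python
# (module type, MT/s) -> (module size in GB, DIMMs per channel), from the text
CONFIGS = {
    ("RDIMM 16 GB 2Rx4", 800): (16, 3),
    ("RDIMM 16 GB 2Rx4", 1066): (16, 2),
    ("LRDIMM 32 GB 4Rx4", 800): (32, 3),
    ("LRDIMM 32 GB 4Rx4", 1066): (32, 3),
}
CHANNELS = 2 * 4  # two CPUs, four channels each


def capacity_gb(module_gb: int, dpc: int) -> int:
    """Total RAM with every channel populated at `dpc` DIMMs."""
    return module_gb * dpc * CHANNELS


for speed in (800, 1066):
    rd = capacity_gb(*CONFIGS[("RDIMM 16 GB 2Rx4", speed)])
    lr = capacity_gb(*CONFIGS[("LRDIMM 32 GB 4Rx4", speed)])
    print(f"{speed} MT/s: RDIMM {rd} GB, LRDIMM {lr} GB, x{lr // rd}")
```

The LRDIMM total stays at 768 GB at both speeds, so the advantage grows from twofold to threefold as soon as RDIMMs are forced down to two per channel.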
And what about LRDIMM power consumption?
LRDIMMs not only make it possible to increase a server's main memory capacity, but do so with minimal loss of energy efficiency. Although an LRDIMM with its memory buffer draws more power than an RDIMM in the “one DIMM per channel” configuration, in high-density configurations (2 and 3 DIMMs per channel) the difference evens out.
Below is the normalized power consumption per RDIMM or LRDIMM in configurations with one and two DIMMs per channel at different memory speeds. Since actual power consumption depends on density and the DRAM technology used, relative power is shown for LRDIMMs and RDIMMs built on the same DRAM generation, in this case 4Rx4 modules with a capacity of 32 GB. The power of an RDIMM at 800 MT/s is taken as one unit. The measurements used standard tests with 50% write and 50% read operations.
At 800 MT/s, an LRDIMM consumes 17% more power than an RDIMM in the “one DIMM per channel” configuration, but only 3% more in the “two DIMM per channel” configuration. At 1066 MT/s the gap is 15% with one DIMM per channel and again small with two. At 1333 MT/s, power consumption per LRDIMM in the “two DIMM per channel” configuration is 28% lower than in the “one DIMM per channel” configuration.
Below are similar results for 100% reads. Since LRDIMMs are mainly used in high-density systems, their consumption in the “two DIMM per channel” configuration is of greater interest, and there the loss of energy efficiency is practically zero.
Most Intel E5 platforms support two LRDIMMs per channel at 1333 MHz and 1.5 V, or three LRDIMMs per channel at 1066 MHz, allowing twelve LRDIMMs per processor. With four-rank RDIMMs, only 8 slots per processor can be used, and the maximum speed is 800 MHz.
Do you need LRDIMM modules?
How do you know whether you need LRDIMMs at all? First, determine the memory transfer rate your server requires (see the vendor's performance documentation). If you need more than 8 x 32 GB per processor, you need LRDIMMs; otherwise, 32 GB four-rank RDIMMs running at 800 MHz will suffice. If a frequency of 1066 MHz or 1333 MHz is required, only LRDIMMs will do.
Below are the rank limits and maximum memory frequencies for Supermicro X9 (LGA2011) and X10 (LGA2011-3) series dual-processor motherboards with different generations of Intel Xeon E5-2600 processors.
Supermicro X10 Series + E5-2600 v3 (Haswell)
Supermicro X10 Series dual-processor boards do not support unbuffered memory modules (UDIMMs). Clearly, achieving both maximum memory capacity and maximum memory speed requires DDR4 LRDIMMs.
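The rule of thumb above can be encoded as a tiny Python helper. It is a hypothetical illustration for the DDR3 Xeon E5 platforms discussed in this article, not a vendor sizing tool. It treats any speed above 800 MHz as requiring LRDIMMs, as the text does when sizing around the 32 GB four-rank class; smaller configurations built from lower-capacity RDIMMs can of course run faster.

```python
# Hypothetical helper encoding the article's rule of thumb; not a vendor tool.
RDIMM_CEILING_GB_PER_CPU = 8 * 32   # eight 32 GB four-rank RDIMMs per CPU
RDIMM_CEILING_MHZ = 800             # speed of that configuration


def needs_lrdimm(gb_per_cpu: int, required_mhz: int) -> bool:
    """True when the requirement exceeds what four-rank 32 GB RDIMMs deliver."""
    return gb_per_cpu > RDIMM_CEILING_GB_PER_CPU or required_mhz > RDIMM_CEILING_MHZ


print(needs_lrdimm(256, 800))    # False: 8 x 32 GB RDIMMs at 800 MHz suffice
print(needs_lrdimm(384, 800))    # True: more than 8 x 32 GB per processor
print(needs_lrdimm(256, 1333))   # True: 1333 MHz required
```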
Some LRDIMM modules on the market:
Hynix HMTA8GL7AHR4C-PBM2: server memory, 64 GB capacity, PC12800 bandwidth, DDR3 LRDIMM.
Kingston KVR16LL114 / 32: DDR3L memory module, 32 GB capacity, LRDIMM form factor, 240-pin, 1600 MHz, ECC support, CAS latency (CL) 11. The average price of this module is 28,000 rubles.
Samsung DDR4 2133 Registered ECC LRDIMM 32 GB memory module: a 288-pin LRDIMM running at 2133 MHz, with ECC support and CAS latency (CL) 15. The average price is about 22,000 rubles.
Samsung 32 GB 288-pin DDR4 SDRAM DDR4-2133 (PC4-17000) server memory module, model M386A4G40DM0-CPB, CAS latency 15.
In general, LRDIMM modules allow up to 35% higher throughput of RAM compared to standard RDIMM modules. 
LRDIMMs have the greatest effect for memory-intensive applications, cloud computing and HPC (high-performance computing) workloads that need to load and process large amounts of data in RAM. In virtualized environments they allow a higher density of virtual machines, and in data centers they help increase energy efficiency and reduce TCO (Total Cost of Ownership).
Alternative? 128GB LRDIMM!
Technology does not stand still, and Samsung has introduced new LRDIMM modules with a capacity of 128 GB. They use a chip stacking technology called TSV (Through Silicon Via): DRAM dies are connected vertically by electrodes running through microscopic holes, similar to the approach used in Samsung's 3D V-NAND.
TSV DDR4 DRAM in 128 GB RDIMM modules is considered a true technological breakthrough. Its advantages over previous standard modules are doubled capacity, high speed and efficiency. Thanks to 20 nm process technology, 128 GB TSV DDR4 memory consumes 50% less power than 64 GB LRDIMMs. Pricing remains to be seen.
Practical use
In a server with 8 memory slots, 128 GB can be built from DDR3 RDIMMs as 8 x 16 GB, that is, 9,000 rubles x 8 = 72,000 rubles. With LRDIMMs, it takes two 64 GB modules at 30,500 rubles each, for a total of 61,000 rubles, which is cheaper than the traditional solution. Moreover, there is no longer any particular reason to overpay for motherboards with 16 memory slots: 99% of servers can be built on 8-slot motherboards. A standard X9DRL takes 512 GB of memory this way.
For now, large 64 GB DDR4 LRDIMMs cost 75,000 rubles each (64GB PC17000 LR M386A8K40BM1-CPB0Q SAMSUNG memory module at ELCO). With 32 GB modules at 21,000 rubles each, DDR4 LRDIMM costs 84,000 rubles for 128 GB, which is slightly more expensive than regular registered memory.
All this lets us at HOSTKEY offer large dedicated servers even more cheaply, lower the price of virtual machines, and build private clusters that are more reliable and cost less.
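The price comparison is easiest to read per gigabyte. This Python sketch uses only the ruble prices quoted above for a 128 GB build; the option labels are illustrative, and prices will of course have changed since.

```python
# (option, module count, module size in GB, price per module in rubles), from the text
OPTIONS = [
    ("DDR3 RDIMM, 8 x 16 GB", 8, 16, 9_000),
    ("DDR3 LRDIMM, 2 x 64 GB", 2, 64, 30_500),
    ("DDR4 LRDIMM, 4 x 32 GB", 4, 32, 21_000),
    ("DDR4 LRDIMM, 2 x 64 GB", 2, 64, 75_000),
]

for name, count, size_gb, price in OPTIONS:
    total = count * price
    per_gb = total / (count * size_gb)
    print(f"{name}: {count * size_gb} GB for {total:,} rubles "
          f"({per_gb:.0f} rubles/GB)")
```

Per gigabyte, the two 64 GB DDR3 LRDIMMs come out cheapest, while the 64 GB DDR4 modules cost roughly twice as much as any other option.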
A little bit about HOSTKEY
Since 2008, we have been renting out dedicated and virtual servers and providing server hosting services in 4 Moscow data centers, including two Tier III certified ones. We specialize in large dedicated servers and in building private clouds and clusters on them for our clients.
We have a hot offer for our readers: servers based on T-Platform supercomputers with Intel Xeon E5-2630v2 processors at a 15% discount until the end of December (or while stock lasts) with the promo code TMW5U0S8SE.
For example, for comparison: 
- 2xE5-2630v2 (12x2.6 GHz) / 64 GB RAM / 1x1 TB SSD + 1x1 TB 7.2K HDD = 17,000 rubles per month, 14,450 with the discount
- 2xE5-2630v2 (12x2.6 GHz) / 128 GB RAM / 1x2 TB SSD + 1x2 TB 7.2K HDD = 25,700 rubles per month, 21,800 with the discount
- 2xE5-2630v2 (12x2.6 GHz) / 256 GB RAM / 2x2 TB Samsung SSD = 36,500 rubles per month, 31,000 with the discount
- 2xE5-2630v2 (12x2.6 GHz) / 32 GB RAM / 2x600 GB SAS 10K = 13,650 rubles per month, 11,600 with the discount
All prices include VAT; almost any configuration is available.
All servers are connected on a gigabit channel, the traffic limit is 10TB without restrictions. Each dedicated server is provided with remote access via IPMI, it is possible to organize a VLAN at a speed of up to 10Gbps.