
Today we are introducing a new dedicated server configuration: Intel Xeon E3-1270v3, 32 GB RAM, 2x240 GB SSD. These brief specifications hide a lot of capability, so let us look at them in more detail.
The new configuration uses Intel's latest development, the Xeon E3 processor based on the Haswell architecture. Haswell processors are manufactured on a 22-nanometer process with three-dimensional transistors (Tri-Gate technology).
Among the innovations, we should mention, first, support for the AVX2 and FMA3 instruction sets. FMA3 (fused multiply-add) lets the processor perform a multiplication and an addition as a single instruction. In theory, this significantly increases performance. To take advantage of these instructions, existing code must be updated or at least recompiled.
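To make the idea of a fused multiply-add more concrete, here is a small Python model. It does not execute the FMA3 instruction itself; it only imitates its arithmetic, computing a*b + c exactly and rounding once, and compares that with the ordinary two-step calculation in which the product is rounded before the addition. The function names are hypothetical and chosen for illustration only.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Model of FMA: compute a*b + c exactly, then round once to a double.

    Works for finite inputs only (Fraction cannot represent inf or nan).
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


def separate_multiply_add(a: float, b: float, c: float) -> float:
    """Ordinary arithmetic: the product is rounded before the addition."""
    return a * b + c


a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

# The exact product is 1 - 2**-60, which rounds to 1.0 as a double,
# so the two-step result loses the small term entirely.
unfused = separate_multiply_add(a, b, c)
fused = fused_multiply_add(a, b, c)

print(unfused == 0.0)         # True
print(fused == -(2.0**-60))   # True
```

Besides saving an instruction, the fused form keeps the low-order bits that the intermediate rounding would discard, which is why recompiled numerical code can produce slightly different results.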
Secondly, Haswell processors have higher bandwidth to the L1 and L2 caches, which can significantly speed up data access and, therefore, application execution.
Thirdly, the new processors implement hardware support for transactional memory. Many experts call this innovation the most non-trivial extension of the x86 architecture in recent years, and it deserves a separate discussion.
Transactional memory
All programs have mutable memory areas that store their data. If several threads of control work with this data, the work must be organized so that parallel access causes no problems (such as reading a memory area while another thread is writing to it, or two threads writing to it at the same time).
Most multi-threaded applications use lock-based synchronization to prevent these problems. Before any access to the data, a lock must be taken. While one thread is modifying the data, other threads wait for the lock to be released. For several threads to really run in parallel, each more or less independent part of the program's data needs its own lock. Implementing this in practice, however, is very difficult.
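The lock-based approach can be sketched in a few lines of Python. This is only an illustration of the general pattern, with a hypothetical `Account` class that guards its data with a single lock; it is not taken from any particular application.

```python
import threading


class Account:
    """Shared data guarded by a single lock."""

    def __init__(self, balance: int) -> None:
        self.balance = balance
        self.lock = threading.Lock()


def deposit_many(account: Account, amount: int, times: int) -> None:
    for _ in range(times):
        # Take the lock before touching shared data; other threads wait here.
        with account.lock:
            account.balance += amount


account = Account(0)
workers = [
    threading.Thread(target=deposit_many, args=(account, 1, 10_000))
    for _ in range(4)
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

print(account.balance)  # 40000
```

With one lock per account, the threads are serialized whenever they touch the same account, even for a very short update. Finer-grained locks reduce that waiting but make the code harder to get right.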
An alternative to lock-based synchronization is transactional memory. It works as follows: a thread makes its changes to shared memory without regard to what other threads are doing, and records every read and write in a log. Once the whole operation is finished, the thread checks whether other threads have changed the memory it accessed. If the transaction cannot be completed because of a conflicting change, it is aborted and re-executed until it succeeds. The advantages of this approach are obvious: no thread needs to wait for access to a resource, and different threads can simultaneously modify non-overlapping data structures that would otherwise be protected by the same lock.
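The scheme described above can be sketched as a tiny software transactional memory in Python. This is a minimal model under stated assumptions, not a production design: the names `TVar`, `Transaction` and `atomically` are hypothetical, and the sketch still uses one short internal lock to make the validate-and-publish step atomic, whereas the body of each transaction runs without holding any lock.

```python
import threading

# Guards only the short read-snapshot and validate-and-publish steps.
_commit_lock = threading.Lock()


class TVar:
    """A shared memory cell with a version number bumped on every commit."""

    def __init__(self, value):
        self.value = value
        self.version = 0


class Transaction:
    def __init__(self):
        self.read_log = {}   # TVar -> version observed at first read
        self.write_log = {}  # TVar -> value to publish on commit

    def read(self, var):
        if var in self.write_log:
            return self.write_log[var]
        with _commit_lock:  # take a consistent (value, version) pair
            value, version = var.value, var.version
        self.read_log.setdefault(var, version)
        return value

    def write(self, var, value):
        self.write_log[var] = value  # buffered, invisible to other threads

    def commit(self) -> bool:
        with _commit_lock:
            # Conflict check: has anyone committed to a cell we read?
            for var, seen in self.read_log.items():
                if var.version != seen:
                    return False
            for var, value in self.write_log.items():
                var.value = value
                var.version += 1
            return True


def atomically(body):
    """Run body(tx) until it commits without conflicts; return its result."""
    while True:
        tx = Transaction()
        result = body(tx)
        if tx.commit():
            return result


checking, savings = TVar(1000), TVar(0)


def transfer(tx):
    tx.write(checking, tx.read(checking) - 1)
    tx.write(savings, tx.read(savings) + 1)


def worker():
    for _ in range(100):
        atomically(transfer)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(checking.value, savings.value)  # 600 400
```

A transaction that loses the race simply discards its write log and runs again, so no partial transfer is ever visible to other threads.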
Until recently, transactional memory could only be implemented in software. Doing so is a very complex and time-consuming task that is beyond many programmers. The new extension of the x86 architecture solves many of these problems at the hardware level and is an undoubted step forward.
Support for transactional memory in Haswell processors is implemented using the TSX (Transactional Synchronization Extensions) instruction set, which consists of two mechanisms: HLE (Hardware Lock Elision) and RTM (Restricted Transactional Memory).
The HLE mechanism improves the performance of multi-threaded applications that use locks. It uses the XACQUIRE and XRELEASE prefixes. If the XACQUIRE prefix is placed before the instruction that acquires a lock, the lock is elided: the processor does not actually take it and executes the critical section speculatively. The XRELEASE prefix, placed before the instruction that releases the lock, ends the elided section and returns the processor to the “normal” mode of operation. Of course, executing a critical section without the lock can lead to conflicts. The control logic monitors for such situations: if a conflict occurs, the section of code that caused it is rolled back and executed again, this time with the lock actually taken.
The RTM mechanism uses the XBEGIN, XEND and XABORT instructions. XBEGIN tells the processor to start executing a transactional section of code that accesses shared memory without taking locks. If the hardware detects a conflict, the processor automatically returns to the state it was in when XBEGIN started executing, and control is transferred to the fallback address specified in the XBEGIN instruction. XEND marks the end of the section of code that worked with transactional memory. If the software itself detects a problem, the XABORT instruction explicitly aborts the transaction and triggers the same handling.
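The control flow of an RTM transaction can be modeled in plain Python. The sketch below does not use TSX at all: it is a single-threaded toy in which a snapshot of a dictionary stands in for the processor state saved at XBEGIN, an exception stands in for XABORT or a hardware-detected conflict, and a callback stands in for the fallback address. All names (`Abort`, `run_transaction`, `withdraw`, `fallback`) are hypothetical.

```python
import copy


class Abort(Exception):
    """Models XABORT or a conflict detected by the hardware."""


def run_transaction(state: dict, body, fallback):
    snapshot = copy.deepcopy(state)     # XBEGIN: remember the starting state
    try:
        body(state)                     # speculative section
    except Abort as reason:
        state.clear()                   # roll back every speculative change
        state.update(snapshot)
        return fallback(state, reason)  # continue at the fallback handler
    return "committed"                  # XEND: the changes stay in place


def withdraw(amount: int):
    def body(state: dict) -> None:
        state["balance"] -= amount
        if state["balance"] < 0:
            raise Abort("insufficient funds")  # explicit, software-detected abort
    return body


def fallback(state: dict, reason: Abort) -> str:
    return f"aborted: {reason}"


account = {"balance": 50}
print(run_transaction(account, withdraw(30), fallback))  # committed
print(run_transaction(account, withdraw(30), fallback))  # aborted: insufficient funds
print(account)                                           # {'balance': 20}
```

In real code the fallback handler usually retries the transaction a few times and then takes an ordinary lock, so that the program keeps making progress even if transactions keep aborting.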
TSX is already supported in GCC v4.8, the latest version of Microsoft Visual Studio 2012, the latest version of Intel's C++ compiler, and the glibc v2.18 library, which is widely used by Linux applications. TSX lets applications scale well on multi-core processors without fine-grained lock tuning. The programmer does not even need to modify the program code: it is enough to link against the appropriate library or recompile the code.
More possibilities
The new configuration is a great fit for storage servers that make intensive use of the disk subsystem. Each server is equipped with two 240 GB solid-state drives (SSDs). Modern SSDs offer short access times and fast read/write operations. They can be used to host large databases and to cache “hot” web storage data.
Servers of the new configuration are equipped with 32 GB of RAM. This is enough to run fairly large in-memory databases such as Redis, Memcached or Couchbase (they keep data directly in RAM and periodically save the state of the database to disk). Classical databases will also get a performance boost from intensive caching of queries in memory.
I want it already!
New servers are already available for order in Moscow and St. Petersburg. The rental price is only 7,500 rubles per month.
Those who cannot comment on posts on Habr are welcome to join us on our blog.
PS
Thanks to the new graphics core, servers based on Haswell processors do an excellent job of transcoding video on the fly and can be used, for example, as hardware platforms for video broadcasting and hosting. In addition, the more powerful graphics subsystem lets the new processors improve the performance of virtual desktop (VDI) servers and increase the number of clients per server.
The Intel Xeon E3-1270v3 processor used in the new configuration does not have an integrated graphics core. If you have tasks that could use the graphics core of the Haswell family, we are ready to provide you with a platform with an E3-1285v3 processor for a month. In return, we will ask you to provide a test report, which we will share with everyone on our blog. You can leave a request with a short test plan through our ticket system.