One of Oracle's most important announcements of 2015 was the release of the new SPARC M7 processor and a line of servers based on it. The line includes T-series servers (T7-1, T7-2, T7-4) and M-series servers (M7-8, M7-16).
In addition to its impressive physical characteristics (4.13 GHz clock, 32 cores, up to 256 threads), the M7 processor can offload part of the Oracle Database SQL logic to dedicated DAX (Data Analytics Accelerator) coprocessors. This technology is called “SQL in Silicon”, and with it the new M7 is positioned as the first processor in IT history optimized for Oracle Database workloads.
At the beginning of 2016, T-series servers became available for testing, and we were among the first in Russia to test two T7-2 servers simultaneously (each with two M7 processors).
Testing was organized in several stages:
- analyzing the performance of the new SPARC server with synthetic Oracle Database benchmarks;
- studying the virtualization capabilities of the T7-2 server (see Virtualization on Oracle SPARC T7-2 - our test results);
- studying the “SQL in Silicon” technology.
The purpose of this post is to share with the community the results obtained in the first and third stages.
There is a wide range of synthetic benchmarks for Oracle Database on the market. Moreover, paradoxical as it may sound, the choice of benchmark can largely determine the result. As with our tests of the previous-generation SPARC (T5 servers), we set out to find the maximum performance that can be squeezed out of the server, that is, to determine the point at which it “boils” under Oracle Database load.
Classic GUI-based synthetic benchmarks such as SwingBench are not suitable for this kind of research: the test software must be open source so that we can minimize the influence of the I/O subsystem and of internal database mechanisms (here, Oracle Database) on the results. To solve this task, we selected and substantially improved the open-source SLOB (Silly Little Oracle Benchmark) by Kevin Closson. During the study, we recorded the maximum Logical Reads Per Second (logical reads, i.e. reads from memory) of the Oracle instance with a fully warmed-up cache and parallel SLOB sessions with almost no contention for instance resources. The rate of logical reads from memory is an important characteristic of the new processor, and reading data from memory is an integral part of Oracle Database operation.
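AWR derives per-second figures such as logical reads per second from a cumulative instance counter (the `session logical reads` statistic): it takes the difference between two snapshots and divides by the elapsed time. The Python sketch below shows only that arithmetic. The class and function names are hypothetical, the counter values are made up rather than taken from our tests, and nothing here connects to Oracle.

```python
from dataclasses import dataclass


@dataclass
class StatSnapshot:
    """One sample of a cumulative instance counter, e.g. 'session logical reads'."""
    taken_at_s: float  # time the sample was taken, in seconds
    value: int         # cumulative counter value at that moment


def per_second_rate(begin: StatSnapshot, end: StatSnapshot) -> float:
    """Average rate of a cumulative counter between two snapshots."""
    interval = end.taken_at_s - begin.taken_at_s
    if interval <= 0:
        raise ValueError("end snapshot must be taken after begin snapshot")
    delta = end.value - begin.value
    if delta < 0:
        raise ValueError("counter decreased; was the instance restarted?")
    return delta / interval


if __name__ == "__main__":
    # Hypothetical counter values, not measurements from our tests.
    begin = StatSnapshot(taken_at_s=0.0, value=1_000_000)
    end = StatSnapshot(taken_at_s=60.0, value=1_000_000 + 60 * 2_500_000)
    print(f"{per_second_rate(begin, end):,.0f} logical reads per second")
```

Because the counter is cumulative, a longer snapshot interval smooths out short spikes; the peak values we report are the highest such per-interval averages.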
Fig. 1. Results in 32-core domains (T5 vs T7)
The figure compares results obtained on T5-4 servers (previous-generation SPARC processors) and T7-2 servers (new SPARC M7 processors). The results were recorded in processor domains of the same size (32 cores), with the same Oracle Database version and instance settings. The X axis shows the number of parallel sessions of our modified SLOB; the Y axis shows the maximum number of logical reads per second according to AWR statistics.
The graph shows that saturation (the point at which the server “boils”) occurs when the number of SLOB sessions reaches the number of hardware threads in the domain (32 cores × 8 threads = 256). It also shows that, with the same number of cores, the T7 domain delivered 1.15–1.2 times more throughput than the T5 domain. Since the M7 has twice as many cores as the T5, the new M7 processor is 2.3–2.4 times more productive than Oracle's previous-generation processor. This result was obtained on the T7 server in both the control and the guest domain. However, with a large number of sessions (192 or more), virtualization has a noticeable effect in the guest domain: performance is 3–5% lower than in the control domain under the same conditions.
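The arithmetic behind these figures can be written out as a small Python sketch. Like the estimate above, it assumes that throughput scales linearly with the number of cores; the helper names are ours and hypothetical.

```python
def hw_threads(cores: int, threads_per_core: int) -> int:
    """Hardware threads in a domain: the session count at which saturation was observed."""
    return cores * threads_per_core


def per_chip_ratio(per_core_ratio: float, core_ratio: float) -> float:
    """Whole-processor throughput ratio, assuming throughput scales linearly with cores."""
    return per_core_ratio * core_ratio


T5_CORES_PER_CHIP = 16  # SPARC T5
M7_CORES_PER_CHIP = 32  # SPARC M7: twice as many cores

if __name__ == "__main__":
    print("threads in a 32-core domain:", hw_threads(32, 8))
    for per_core in (1.15, 1.2):
        ratio = per_chip_ratio(per_core, M7_CORES_PER_CHIP / T5_CORES_PER_CHIP)
        print(f"per-core gain {per_core} -> per-chip gain {ratio:.2f}")
```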
The maximum Logical Reads Per Second under SLOB load was recorded at 512 threads, when all 64 cores were assigned to the control domain. It reached 93–95 million logical reads per second: in all our testing of servers of various architectures under Oracle Database load, we had never seen such figures before!
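To compare this peak with per-core or per-thread figures from other tests, it can help to normalize it. The sketch below only divides the reported 93–95 million by the core and thread counts of the T7-2 (2 processors × 32 cores × 8 threads); the function name is hypothetical and no new measurements are involved.

```python
PROCESSORS = 2          # T7-2: two M7 processors
CORES_PER_CPU = 32
THREADS_PER_CORE = 8


def normalize(total_lr_per_s: float) -> tuple[float, float]:
    """Return (logical reads/s per core, logical reads/s per hardware thread)."""
    cores = PROCESSORS * CORES_PER_CPU
    threads = cores * THREADS_PER_CORE
    return total_lr_per_s / cores, total_lr_per_s / threads


if __name__ == "__main__":
    for total in (93e6, 95e6):  # the range reported above
        per_core, per_thread = normalize(total)
        print(f"{total / 1e6:.0f}M/s -> {per_core / 1e6:.2f}M per core, "
              f"{per_thread / 1e3:.0f}K per thread")
```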
Fig. 2. Comparative results in 16-core domains (T5 / T7 / P8 / x86)
In parallel with the T7-2, we tested current IBM Power and x86 servers using the same methodology. The figure compares the SLOB results obtained in domains with the same number of cores (16). Note that the x86 server has 2 threads per core, while Power and SPARC have 8. At large numbers of SLOB sessions, the T7-2 server showed the best result (about 24 million logical reads per second), even though the x86 architecture was the most efficient at small numbers of sessions.
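One plausible reading of this crossover is the difference in hardware thread counts: a 16-core x86 domain has 32 threads, while Power and SPARC domains have 128. The sketch below makes that explicit under an assumption we verified only for T5/T7 (Fig. 1), namely that a domain saturates when the session count reaches its hardware thread count; the names are hypothetical.

```python
CORES_PER_DOMAIN = 16

THREADS_PER_CORE = {
    "x86": 2,
    "Power (P8)": 8,
    "SPARC T5": 8,
    "SPARC M7 (T7)": 8,
}


def expected_saturation(arch: str) -> int:
    """Session count at which a domain is expected to saturate, assuming
    (as observed for T5/T7) that saturation ~ hardware thread count."""
    return CORES_PER_DOMAIN * THREADS_PER_CORE[arch]


if __name__ == "__main__":
    for arch in THREADS_PER_CORE:
        print(f"{arch:>14}: ~{expected_saturation(arch)} sessions")
```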
The SLOB results allow us to conclude that, even without the special SQL in Silicon features, the T7-2 server delivers very high performance on Oracle Database workloads and can be recommended at least as an Oracle Database consolidation platform. These are the brief results of the first stage of our research on the M7 processor.
As for DAX (or “SQL in Silicon”), it can be explored in several ways. First, the DAX API is open and can be used directly in applications. This approach is described in our article “Hardware Acceleration of Corporate Computing”: using DAX, we managed to speed up operations on mathematical sets by a factor of 5-6.
Second, we can test how “SQL in Silicon” speeds up database queries. Currently this is possible only in Oracle Database 12c and only with the In-Memory option, so it is worth recalling what this option is.
Most databases use row-oriented storage (Row Database) both on disk and in memory. At the same time, databases that implement column-oriented storage (Columnar Database) exist on the market and are actively developing. It is generally accepted that a Row Database is optimal for OLTP-class transactional systems, while in DWH-class data warehouses certain analytical queries can run much faster on a Columnar Database.
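The difference between the two layouts can be illustrated with a toy Python sketch of a salary-sum query: in the row layout every record is visited as a whole, while in the column layout only the two needed attributes are read. This is a generic illustration with hypothetical names and data, not a model of how Oracle stores blocks or column units.

```python
from typing import NamedTuple


class Person(NamedTuple):
    id: int
    country_id: int
    name: str
    salary: int


def sum_salary_row_store(rows: list[Person], country_ids: set[int]) -> int:
    # Row layout: every record is visited as a whole, including id and name.
    return sum(p.salary for p in rows if p.country_id in country_ids)


def to_column_store(rows: list[Person]) -> dict[str, list]:
    # Column layout: one contiguous list per attribute.
    return {field: [getattr(p, field) for p in rows] for field in Person._fields}


def sum_salary_column_store(cols: dict[str, list], country_ids: set[int]) -> int:
    # Only the two columns the query needs are read; id and name are never touched.
    return sum(s for c, s in zip(cols["country_id"], cols["salary"]) if c in country_ids)


if __name__ == "__main__":
    rows = [Person(1, 10, "Ann", 1000), Person(2, 20, "Bob", 1500), Person(3, 10, "Eve", 2000)]
    print(sum_salary_row_store(rows, {10}), sum_salary_column_store(to_column_store(rows), {10}))
```

Both functions return the same answer; the columnar version simply touches less data, which is where analytical scans gain.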
Introduced in Oracle Database 12c, the Database In-Memory option implements columnar data storage in memory in addition to the traditional row format. This storage lives in an additional memory area (the In-Memory Cache), where the Oracle Database administrator can cache entire tables, or individual columns or partitions, in columnar format. This additional columnar copy in memory is transparent to the application, while the Oracle optimizer can choose to read the required data from memory in either row or column format. In other words, In-Memory gives Oracle a distinctive combination of row and column data storage.
We have previously shared the results of our In-Memory testing with the community; in particular, we developed a methodology that emulates DWH-class workloads. The persons table (about 20 million rows) was populated with randomly generated data on Europeans and their salaries, and all European countries were placed in a separate reference table (countries):
create table persons (id number(38) not null, country_id number(38), name varchar2(50), salary number(36));
create table countries (id number(38) not null, name varchar2(20));
The analytical query computed the sum of the salaries of all residents of countries whose names start with R (Russia and Romania):
select sum(salary) from persons where country_id in (select id from countries where name like 'R%');
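To try the query shape without an Oracle instance, the same schema and query can be run on a small SQLite database via Python's standard sqlite3 module. This is only a functional stand-in: SQLite has no In-Memory option or DAX, the country list is abbreviated, and the table size and random generator are our own hypothetical choices, not the 20-million-row dataset used in the tests.

```python
import random
import sqlite3


def build_demo_db(n_persons: int = 10_000, seed: int = 42) -> sqlite3.Connection:
    """Create in-memory persons/countries tables filled with random data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table countries (id integer not null primary key, name text);
        create table persons (id integer not null primary key,
                              country_id integer, name text, salary integer);
    """)
    # Abbreviated, hypothetical country list; ids start at 1.
    countries = ["Austria", "Belgium", "France", "Germany", "Romania", "Russia", "Spain"]
    conn.executemany("insert into countries values (?, ?)", list(enumerate(countries, start=1)))
    rng = random.Random(seed)
    conn.executemany(
        "insert into persons values (?, ?, ?, ?)",
        ((i, rng.randint(1, len(countries)), f"person_{i}", rng.randint(500, 5000))
         for i in range(1, n_persons + 1)),
    )
    return conn


# The analytical query from the text, unchanged apart from whitespace.
QUERY = """
    select sum(salary) from persons
    where country_id in (select id from countries where name like 'R%')
"""

if __name__ == "__main__":
    conn = build_demo_db()
    print(conn.execute(QUERY).fetchone()[0])
```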
For the In-Memory tests, only two columns of the persons table, country_id and salary, were loaded into the In-Memory cache:
alter table persons inmemory no inmemory (id, name) inmemory memcompress for query high (country_id, salary);
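With MEMCOMPRESS FOR QUERY HIGH the two columns are kept in memory in compressed form. The article does not describe Oracle's actual format, so the Python sketch below illustrates a different, generic columnar technique, dictionary encoding: the predicate is translated into dictionary codes once, a scan over the codes produces a bit vector, and the bit vector then drives the salary aggregation. Function names and data are hypothetical, and we make no claim about which of these steps Oracle offloads to DAX.

```python
def dictionary_encode(values: list[int]) -> tuple[list[int], list[int]]:
    """Replace each value with its position in a sorted dictionary of distinct values."""
    dictionary = sorted(set(values))
    code_of = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]


def scan_in(dictionary: list[int], codes: list[int], wanted: set[int]) -> list[bool]:
    """Bit vector marking rows whose value is in `wanted`.
    The predicate is translated into codes once; rows are compared code by code."""
    code_of = {v: i for i, v in enumerate(dictionary)}
    wanted_codes = {code_of[v] for v in wanted if v in code_of}
    return [c in wanted_codes for c in codes]


def sum_selected(bits: list[bool], column: list[int]) -> int:
    """Aggregate another column using the bit vector produced by the scan."""
    return sum(v for hit, v in zip(bits, column) if hit)


if __name__ == "__main__":
    country_id = [5, 1, 6, 5, 3]  # 5 and 6 stand in for Romania and Russia
    salary = [100, 200, 300, 400, 500]
    dictionary, codes = dictionary_encode(country_id)
    bits = scan_in(dictionary, codes, {5, 6})
    print(codes, bits, sum_selected(bits, salary))
```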
With the traditional buffer cache (after warm-up, and with the In-Memory mechanism disabled by a hint), this query ran on the T7-2 server in 1.9 seconds. With the In-Memory cache on the same server it took 0.39 seconds, i.e. about 4.9 times faster (1.9 / 0.39 ≈ 4.87). Monitoring DAX with the busstat utility showed that the DAX counters were non-zero while the query was running, i.e. DAX was in use:
5 dax0 DAX_SCH_query_cmd_sched 80 DAX_QRYO_input_valid 694586 DAX_QRYO_output_valid 1530256
5 dax1 DAX_SCH_query_cmd_sched 80 DAX_QRYO_input_valid 754450 DAX_QRYO_output_valid 2017088
5 dax2 DAX_SCH_query_cmd_sched 79 DAX_QRYO_input_valid 522635 DAX_QRYO_output_valid 758496
5 dax3 DAX_SCH_query_cmd_sched 79 DAX_QRYO_input_valid 672683 DAX_QRYO_output_valid 1529568
5 dax4 DAX_SCH_query_cmd_sched 79 DAX_QRYO_input_valid 589392 DAX_QRYO_output_valid 1073248
5 dax5 DAX_SCH_query_cmd_sched 79 DAX_QRYO_input_valid 635264 DAX_QRYO_output_valid 1502832
5 dax6 DAX_SCH_query_cmd_sched 79 DAX_QRYO_input_valid 615433 DAX_QRYO_output_valid 1080257
5 dax7 DAX_SCH_query_cmd_sched 80 DAX_QRYO_input_valid 810452 DAX_QRYO_output_valid 2295872
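A quick way to confirm that every DAX unit took part is to parse these lines and total the counters. The Python sketch below assumes each data line has the form `<time> <device> <event> <count> [<event> <count> ...]`, which matches the excerpt above; it skips anything else (headers, blank lines) and is not tied to any particular busstat invocation. The function names are hypothetical.

```python
from collections import defaultdict

# The eight busstat lines above, copied verbatim.
BUSSTAT_SAMPLE = """\
5 dax0 DAX_SCH_query_cmd_sched 80 DAX_QRYO_input_valid 694586 DAX_QRYO_output_valid 1530256
5 dax1 DAX_SCH_query_cmd_sched 80 DAX_QRYO_input_valid 754450 DAX_QRYO_output_valid 2017088
5 dax2 DAX_SCH_query_cmd_sched 79 DAX_QRYO_input_valid 522635 DAX_QRYO_output_valid 758496
5 dax3 DAX_SCH_query_cmd_sched 79 DAX_QRYO_input_valid 672683 DAX_QRYO_output_valid 1529568
5 dax4 DAX_SCH_query_cmd_sched 79 DAX_QRYO_input_valid 589392 DAX_QRYO_output_valid 1073248
5 dax5 DAX_SCH_query_cmd_sched 79 DAX_QRYO_input_valid 635264 DAX_QRYO_output_valid 1502832
5 dax6 DAX_SCH_query_cmd_sched 79 DAX_QRYO_input_valid 615433 DAX_QRYO_output_valid 1080257
5 dax7 DAX_SCH_query_cmd_sched 80 DAX_QRYO_input_valid 810452 DAX_QRYO_output_valid 2295872
"""


def parse_busstat(text: str) -> dict[str, dict[str, int]]:
    """Map device name -> {event name: count}; non-matching lines are skipped."""
    per_device: dict[str, dict[str, int]] = {}
    for line in text.splitlines():
        fields = line.split()
        # Expect: time, device, then an even number of event/count fields.
        if len(fields) < 4 or len(fields) % 2:
            continue
        device, pairs = fields[1], fields[2:]
        events, counts = pairs[0::2], pairs[1::2]
        if not all(c.isdigit() for c in counts):
            continue  # e.g. a header line such as "time dev event0 pic0 ..."
        counters = per_device.setdefault(device, {})
        for event, count in zip(events, counts):
            counters[event] = counters.get(event, 0) + int(count)
    return per_device


def totals(per_device: dict[str, dict[str, int]]) -> dict[str, int]:
    """Sum each event across all devices."""
    summed = defaultdict(int)
    for counters in per_device.values():
        for event, count in counters.items():
            summed[event] += count
    return dict(summed)


if __name__ == "__main__":
    devices = parse_busstat(BUSSTAT_SAMPLE)
    for event, total in totals(devices).items():
        print(f"{event}: {total}")
    active = [d for d, c in devices.items() if c.get("DAX_QRYO_input_valid", 0) > 0]
    print(f"{len(active)} of {len(devices)} DAX units processed input")
```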
Thus, this analytical query was partly executed on the DAX coprocessors, and thanks to the combined In-Memory + DAX technology we recorded almost a fivefold speedup compared with Oracle Database using the traditional buffer cache. Note that we did not specially select the query or the size of the persons table: we applied to the T7-2 the same methodology we had used earlier to study the In-Memory option. In this case a fivefold speedup is a more than decent result, especially since it agrees well with the conclusions we drew earlier when testing DAX through the API (see “Hardware Acceleration of Corporate Computing”).
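For readers who want to repeat such a comparison, the measurement procedure (warm-up run, then repeated timed runs) can be sketched as a small Python helper. `run_query` is a hypothetical callable that executes the query through whatever database driver you use; it is not shown here, and the median-of-N choice is ours rather than a description of how our timings were taken.

```python
import statistics
import time
from typing import Callable


def median_runtime(run_query: Callable[[], object], repeats: int = 5, warmups: int = 1) -> float:
    """Median wall-clock time of run_query() in seconds, after warm-up runs."""
    for _ in range(warmups):
        run_query()  # warm caches; these timings are discarded
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline_s: float, accelerated_s: float) -> float:
    """How many times faster the accelerated run is than the baseline."""
    return baseline_s / accelerated_s


if __name__ == "__main__":
    # Times reported above: buffer cache (In-Memory disabled) vs In-Memory + DAX.
    print(f"{speedup(1.9, 0.39):.2f}x")
```

The median is less sensitive than the mean to an occasional slow run caused by unrelated activity on the server.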