
Not forgotten: Oracle Database In-Memory

The volume of databases and the complexity of queries against them have always grown faster than the speed of processing them. So for many years the best minds of mankind wondered what would happen once there was so much RAM that an entire database could be picked up and placed in memory.

In recent years the logical moment for this seemed to have arrived. The cost of RAM fell, and fell, and fell some more. At the beginning of the century, 256 MB of memory for a server seemed normal, even generous. Today nobody is surprised by 256 GB of RAM in an entry-level server, and with industrial servers full communism has arrived: any noble don can, if he wishes, put together a server with a terabyte of RAM or more.

But that is not all: new data indexing technologies have appeared, along with new compression technologies such as OLTP compression and compression of unstructured data (LOB Compression). Oracle Database 11g, for example, introduced the Result Cache, which caches not just table rows or index blocks but the results of queries and subqueries themselves.

So on the one hand, you can finally use RAM for its intended purpose; on the other hand, it is not so simple. The larger the cache, the greater the overhead of maintaining it, including CPU time. You add memory, enlarge the cache, and the system runs slower, which is, in fact, logical: the memory management algorithms developed back in the Early Middle Ages by our great-great-grandparents are simply not suited to the Renaissance, and that's that. What to do?
Here's what. Recall that there are, in fact, two categories of databases: row databases, which store information in row format both in the buffer cache in RAM and on disk (Oracle Database, Microsoft SQL Server, IBM DB2, MySQL, etc.), and columnar DBMSs, which store information in columns and which, unfortunately, never found wide adoption in the industry. Row databases handle OLTP operations well, while analytics, you will laugh, is better served by columnar databases; DML operations, however, are a problem for them, and you can guess why. The industry, as you know, took the path of row databases, onto which analytical capabilities were bolted as a compromise.

And so Oracle Database In-Memory appeared, in which the advantages of both approaches are finally combined.

And what happens?


It turns out fantastic. Transaction processing is twice as fast, row inserts are 3-4 times faster, and analytical queries run in real time, almost instantly! Marketers say analytics became a hundred times faster, but they are being modest so as not to frighten the market; the real results are far more impressive.

And now let's figure out how it all works.

So, the technology appeared in Oracle Database 12.1.0.2, and its essence is that alongside our familiar buffer cache, which stores table rows and index blocks, there is a new cache, or more precisely a new shared memory area, in which table data is stored in column format! You see, yes? Both row and column storage formats for the same data and the same tables! Moreover, the data is simultaneously active and transactionally consistent. All changes are first made, as usual, in the ordinary buffer cache, after which they are reflected in the column cache, or, as our English-speaking friends call it, the "columnar" cache.

Some important details. First, only tables are reflected in the column cache; indexes are not cached. Second, the technology does no unnecessary work. If data is read but never changed, there is no need to keep it in the ordinary, row-format buffer cache. If the data does change, it is kept in both caches, buffer and columnar. And analytics runs faster precisely because the columnar representation is more efficient for it. Third, to make it completely clear: the column cache does not store data blocks from disk. In blocks on disk, information is stored in rows. In the column cache, information is stored by columns, in its own representation, in so-called In-Memory Compression Units. That is the third point.

And now the details


We established that analytics runs hundreds of times faster because the columnar representation is more efficient for it. But why, strictly speaking?

In the ordinary buffer cache, information is stored in rows. Here is an example: from a four-column table you need to fetch column number 4. To do this, you have to scan the entire table in RAM:

[image: scanning all rows of a row-format table to read one column]
And what happens if the same table is stored in column format? The entire fourth column of our table sits in one extent, i.e., in one contiguous memory region. We can select it immediately, read it immediately, and return it to the application. The cost of scanning and shipping this data to the processor drops, and CPU usage drops with it. Everything works much faster.

[image: reading a single column from a column-format table]
Such scan operations are very typical of ERP applications and of data warehouses in analytical systems. Agree, a useful thing for the progress of mankind.
Technically, to make this work you need to enable caching for the desired columns of a table. For this there is a special syntax extension to the ALTER TABLE command:

 SQL> ALTER TABLE cities
      INMEMORY
      INMEMORY (Id, Name, Country_Id, Time_Zone)
      NO INMEMORY (Created, Modified, State);
 Table altered.

This is done once; the information is recorded in the Oracle data dictionary and from then on is used automatically by the database as it works. In the example above, the service columns are not used in reports; they are needed only for internal auditing by the application, so we do not cache them.
You can enable caching of all columns of a materialized view:

SQL> ALTER MATERIALIZED VIEW cities_mv INMEMORY;

Materialized view altered.

You can enable caching at the level of an entire tablespace:

 SQL> ALTER TABLESPACE tbs_data DEFAULT INMEMORY
 MEMCOMPRESS FOR CAPACITY HIGH
 PRIORITY LOW;
 Tablespace altered.

Or you can cache a table's columns flexibly at the partition level, to tie the caching strategy to business rules:

 SQL> CREATE TABLE customers .......
      PARTITION BY LIST .......
      (PARTITION p1 ....... INMEMORY,
       PARTITION p2 ....... NO INMEMORY);

For example, suppose we have historical data partitioned by date. When a period is closed, say an operating day, the data in that partition of the table no longer changes, and analysts can start working with it. So we enable caching for the partitions whose period is closed, and for data still undergoing intensive updates and deletes there is no need to turn caching on.
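A hedged sketch of what such a partition-level setup might look like (the table, its columns, and the date boundaries here are hypothetical):

```sql
-- Hypothetical history table: the closed period is cached in
-- column format, the current period is not
SQL> CREATE TABLE sales_history (
       id         NUMBER,
       sale_date  DATE,
       amount     NUMBER
     )
     PARTITION BY RANGE (sale_date)
     ( PARTITION p_closed  VALUES LESS THAN (DATE '2015-01-01') INMEMORY,
       PARTITION p_current VALUES LESS THAN (MAXVALUE)          NO INMEMORY );
```

When a period later closes, the corresponding partition can be flipped into the columnar cache with ALTER TABLE sales_history MODIFY PARTITION p_current INMEMORY;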

Behind the scenes


What happens when we enable caching and the information is written to the Oracle data dictionary? Magic happens: the SQL optimizer rebuilds the query plan. The first time a query arrives from the application, at the stage of the so-called hard parse, a query execution plan is generated.

[image: execution plan with TABLE ACCESS INMEMORY FULL]

In this example, the total number of rows in the CITIES directory table is being counted. The optimizer sees that it is advantageous to run the query against the columnar representation and performs a TABLE ACCESS INMEMORY FULL scan. For the application this is completely transparent; no rewriting or modification of the application is required!
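You can see this yourself with a standard EXPLAIN PLAN; the output below is abbreviated, and the exact rows and costs will of course differ per system:

```sql
SQL> EXPLAIN PLAN FOR SELECT COUNT(*) FROM cities;
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Abbreviated plan output:
-- | Id  | Operation                   | Name   |
-- |   0 | SELECT STATEMENT            |        |
-- |   1 |  SORT AGGREGATE             |        |
-- |   2 |   TABLE ACCESS INMEMORY FULL| CITIES |
```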

The Oracle database itself uses a number of interesting optimization techniques:

1. Each processor core scans its own columns. It uses fast vector (SIMD) instructions, i.e., special processor instructions that take not a scalar value but a vector of values as an argument, and scans billions of rows per second.
2. The data is not merely scanned; it is scanned and joined across several tables. On the columnar representation this is much more efficient than ordinary joins; such joins run on average 10 times faster.
3. During query execution, the In-Memory Aggregation technique is used: a dynamic object, an interim report, is created in memory. It is filled in while the table is scanned and speeds up query execution. As a result, reports are built 20 times faster, without pre-built analytical cubes.
4. To avoid cluttering RAM, columns are compressed in memory. There are six options:

• NO MEMCOMPRESS - no compression
• MEMCOMPRESS FOR DML - optimized for DML operations
• MEMCOMPRESS FOR QUERY LOW - the default, optimized for query speed
• MEMCOMPRESS FOR QUERY HIGH - balances query speed with memory savings
• MEMCOMPRESS FOR CAPACITY HIGH - optimized for maximum memory savings
• MEMCOMPRESS FOR CAPACITY LOW - saves more memory, at some cost in query speed

For example:

 SQL> ALTER TABLE cities
 INMEMORY
 INMEMORY MEMCOMPRESS FOR CAPACITY HIGH (Country_Id, Time_Zone)
 INMEMORY NO MEMCOMPRESS (Id, Name, Name_Eng);
 Table altered.


In this example, columns whose data repeats frequently are chosen for compression, and unique columns are left uncompressed.
The USER_TABLES dictionary view has a new segment attribute, INMEMORY. The INMEMORY column has also appeared in the *_TAB_PARTITIONS views. To find out what is in the cache and how much of it, use the special system view V$IM_SEGMENTS:

[image: V$IM_SEGMENTS output for four tables]

In this example we see four tables in the cache. Each takes about 5 MB on disk, but in memory, thanks to compression, from 100 KB to 1 MB. The POPULATE_STATUS column shows the load status. The CITIES, COMPANIES, and AIRPORTS tables are already fully loaded into the In-Memory cache, while COUNTRIES is not yet complete: 400 KB remain to be loaded. That is, right now this table is being transposed into column format and loaded into the cache.
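The figures described above come from a query along these lines (a sketch; V$IM_SEGMENTS has more columns than shown here):

```sql
SQL> SELECT segment_name,
            bytes,                -- segment size on disk
            inmemory_size,        -- size in the columnar cache
            bytes_not_populated,  -- how much is still left to load
            populate_status
       FROM v$im_segments;
```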

First of all, priorities


From a technical point of view, data can be read from disk into the In-Memory cache in two ways:

• When you first access the data. This is the default option.
• Automatically after starting a database instance. This feature is enabled by setting the PRIORITY segment attribute.

In the second case, the reading is done by background worker processes named like ora_w001_orcl (w001 is the worker number, orcl the instance name); their number is adjusted with the new INMEMORY_MAX_POPULATE_SERVERS parameter. As a result, after a restart the instance is available immediately while the cache is populated in the background, so instance startup time does not grow. Of course, CPU load is higher at first; there is no way around that. But afterwards analytical queries run faster.
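For example (the value 4 here is purely illustrative):

```sql
-- Cap the number of background population workers at four
SQL> ALTER SYSTEM SET INMEMORY_MAX_POPULATE_SERVERS = 4;
```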

The priority of loading into the cache can be controlled; here are the possible priority values:

[image: table of PRIORITY values]

Suppose we keep a directory of cities in the cities table, and this directory is needed by all users all the time and constantly appears in reports. In that case we should give this table critical priority, forcing the system to load it into the cache automatically at database startup:
 SQL> ALTER TABLE cities 
 INMEMORY
 PRIORITY CRITICAL
 INMEMORY MEMCOMPRESS FOR CAPACITY HIGH (Country_Id, Time_Zone)
 INMEMORY NO MEMCOMPRESS (Id, Name, Name_Eng);

 Table altered.

What about OLTP?


As you well know, there are almost no pure OLTP systems. Every OLTP application has some reporting, and reporting needs additional indexes. And what are additional indexes? Nothing but extra overhead on every data insert.

[image: insert overhead with analytical indexes in place]

And now let me proudly report that moving to Oracle Database In-Memory solves this problem too, because this technology, that's right, does not use indexes. That is, we can simply drop the indexes we kept for analytics, and we get a paradoxical effect: a technology designed to speed up data warehouses also nicely "overclocks" OLTP applications.

[image: insert performance after dropping analytical indexes]

At the same time, Oracle Database In-Memory neither interferes with the rest of the database engine nor tries to "help" it (the principle "don't fix what isn't broken" is in force). The technology does not affect the data format on disk; everything works exactly as before, whatever file system you use. Data files do not change; redo logging, backup, and data recovery still work. All the technologies, including ASM, RAC, Data Guard, and GoldenGate, work transparently.

Container architecture


The major architectural innovation in Oracle Database 12c is the container architecture, and Oracle Database In-Memory fully supports it. The INMEMORY_SIZE parameter is set at the level of the whole container database, and at the level of individual pluggable databases it can be varied to suit the specific application. For example, at the container level you can set INMEMORY_SIZE to 20 GB, then leave the cache disabled for the ERP database, give CRM a 4 GB cache, and give the data warehouse 16 GB.
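A sketch of that configuration (the PDB name is hypothetical, and note that changing INMEMORY_SIZE at the container level requires an instance restart; how a per-PDB value takes effect may depend on the release):

```sql
-- At the container (CDB) level: a 20 GB columnar area in total
SQL> ALTER SYSTEM SET INMEMORY_SIZE = 20G SCOPE=SPFILE;
-- ... restart the instance ...

-- Inside an individual pluggable database:
SQL> ALTER SESSION SET CONTAINER = crm_pdb;
SQL> ALTER SYSTEM SET INMEMORY_SIZE = 4G;
```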

[image: In-Memory cache distribution across pluggable databases]

Cluster architecture


Yes, this also works in Oracle Real Application Clusters. You can control how objects in the In-Memory cache are distributed among cluster nodes. For example, you can specify the DUPLICATE option; then changes to the cache on one cluster node are automatically synchronized to a second node, so that an available copy of the cache with "warmed-up" columnar data always exists:
 SQL> ALTER TABLE cities INMEMORY
 DUPLICATE;

 Table altered.

Other options:

• DISTRIBUTE AUTO - distribution of the cache is managed by the DBMS (the default);
• DUPLICATE ALL - the cache is duplicated on all nodes of the cluster;
• DISTRIBUTE BY ROWID RANGE;
• DISTRIBUTE BY PARTITION;
• DISTRIBUTE BY SUBPARTITION.

The DUPLICATE and DUPLICATE ALL options work only on Oracle Exadata and Oracle SuperCluster; on an ordinary server the option is ignored. The remaining options give finer control: for example, with DISTRIBUTE BY ROWID RANGE you can specify that some ranges of rows are kept in columnar form on one node and the rest on another.

Summary


I can no longer hide from you the full syntax of the ALTER TABLE ... INMEMORY command. Here it is:
 SQL> ALTER TABLE cities
      INMEMORY
      PRIORITY CRITICAL
      DUPLICATE
      INMEMORY MEMCOMPRESS FOR CAPACITY HIGH (Country_Id, Time_Zone)
      INMEMORY NO MEMCOMPRESS (Id, Name, Name_Eng)
      NO INMEMORY (Created, Modified, State);

 Table altered.

You can specify the priority of loading into the cache, the way the cache is synchronized between the cluster nodes, the names of the columns that need and do not need to be cached in memory, the degree of compression. The command, as I already wrote, is executed once, then the information is remembered.

For real engineers there is the option of fine-tuning individual queries with SQL optimizer hints: INMEMORY, NO_INMEMORY, INMEMORY_PRUNING, and NO_INMEMORY_PRUNING.
NO_INMEMORY, as you can see, is the simplest hint here: it explicitly tells the optimizer not to use the In-Memory technology, for when you are sure it is simply unnecessary because the query is good, the indexes are in place, and so on. The two more interesting hints, INMEMORY_PRUNING and NO_INMEMORY_PRUNING, control the use of In-Memory storage indexes. A storage index keeps the minimum and maximum value of each column in every In-Memory compression unit and transparently skips units that cannot match a filter such as WHERE prod_id > 14 AND prod_id < 29.
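A couple of hedged examples (the queries and the sales table are illustrative):

```sql
-- Tell the optimizer not to use the columnar cache for this query
SQL> SELECT /*+ NO_INMEMORY */ COUNT(*) FROM cities;

-- Disable storage-index pruning, e.g. while investigating a plan
SQL> SELECT /*+ NO_INMEMORY_PRUNING */ *
       FROM sales
      WHERE prod_id > 14 AND prod_id < 29;
```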

There are also new initialization (INIT.ORA) parameters, which I donate to you free of charge, that is, for nothing:

• INMEMORY_SIZE
• INMEMORY_FORCE = {DEFAULT | OFF}
• INMEMORY_CLAUSE_DEFAULT = [INMEMORY] [NO INMEMORY] [compression-clauses] [priority-clauses]
• INMEMORY_QUERY = {ENABLE | DISABLE}
• INMEMORY_MAX_POPULATE_SERVERS
• OPTIMIZER_INMEMORY_AWARE
• INMEMORY_TRICKLE_REPOPULATE_SERVERS_PERCENT

INMEMORY_SIZE sets the size of the memory area for columnar data; the default is zero. INMEMORY_MAX_POPULATE_SERVERS is the number of background processes that read data from disk into the cache; by default it equals the number of processors the Oracle Database "sees". Another interesting parameter is OPTIMIZER_INMEMORY_AWARE, which controls whether the optimizer takes the In-Memory cache into account; you might need this, for example, to estimate the overhead. The rest I suggest you find in the documentation.
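Putting a few of these together, a minimal setup might look like this (the size is illustrative; INMEMORY_SIZE is a static parameter, hence the restart):

```sql
SQL> ALTER SYSTEM SET INMEMORY_SIZE = 2G SCOPE=SPFILE;
SQL> SHUTDOWN IMMEDIATE;
SQL> STARTUP;

-- And to compare performance with the cache ignored for one session:
SQL> ALTER SESSION SET INMEMORY_QUERY = DISABLE;
```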

Oracle Database In-Memory is best suited to applications with many queries that scan many rows with filters such as "=", "<", ">", and "IN". The technology is very effective when an application requests only a few columns out of a large number (typical of SAP), or joins large fact tables with dimension tables and filters on the dimension tables. In practice this means data warehouses, analytical systems, and mixed OLTP applications. By the way, there is a useful companion product, Oracle Database In-Memory Advisor, which helps assess the applicability of Oracle Database In-Memory to specific applications. It analyzes database performance statistics and gives recommendations on memory sizing and on which tables should be cached in the In-Memory cache.

It is important to understand that, unlike its competitors, Oracle Database In-Memory does not require rewriting applications. There are no restrictions on SQL, no data migration is needed, and the technology is ready for the cloud.

Do not confuse Oracle Database In-Memory with Oracle TimesTen In-Memory Database; these are different technologies. TimesTen is an embedded database for applications, designed for pure rather than mixed OLTP systems, for cases where the application must work in real time with response times of literally microseconds, not seconds and not milliseconds. TimesTen loads all of its data into RAM. Oracle Database In-Memory, by contrast, is an extension of the classic Oracle DBMS: it lives inside its engine and extends its capabilities toward accelerating analytical queries with a columnar representation.

I think I have written enough to awaken your irrepressible desire to read the Oracle Database In-Memory documentation. But here's the thing: the technology is new, there is little expertise on it yet, and I cannot answer all your questions. So, friends, sign up for our online trainings; we will definitely announce them right here. And that is all from me for now.

Source: https://habr.com/ru/post/257745/

