
Oracle DB for programmers

Does an application programmer need to understand how a database works? Tom Kyte, a recognized Oracle specialist and author of the famous asktom column, insists in his book "Oracle for Professionals: Architecture and Main Features" that it is absolutely necessary. Even if your team has a competent administrator, knowing how the Oracle DBMS works will help you understand each other and interact more effectively, not to mention the case when you have no such specialist at all. In this article I will cover the main things whose understanding will let you work competently with an Oracle database and use some of its features to great effect in your application. If you have already read the aforementioned book by Tom Kyte, you can simply use this article as a refresher. One remark: I read the book a long time ago, when the latest version of Oracle was 9i, and my administration courses were also on 9i, so if something has changed or been added in 10g or later, do not blame me. That said, I am writing about fairly fundamental things that are unlikely to have changed much.

What makes an Oracle database work so fast?


When you change data in the database, your changes first go to the cache, and only later are written to disk asynchronously by several processes (their number is configurable). To make recovery after a failure possible even for data that has not yet been flushed from the cache, special online redo log files are written synchronously. This approach wins on speed, because everything is written to disk sequentially into a single file, and you can configure redo to be written in parallel to two or more disks, thereby increasing protection against loss of changes. There should be several such files, and they are used in a circle: as soon as all the data protected by one of the log files has been written by the background process into data blocks on disk, that log file can be reused. To some extent this also saves money: ultra-fast small disks are needed only for the small redo log files used in rotation.
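
For illustration, here is a minimal sketch (file paths and group numbers are hypothetical) of how you could inspect the redo log groups and mirror them onto a second disk:

    -- List the online redo log groups and their state
    -- (an INACTIVE group can be reused; an ACTIVE one still protects
    -- data that has not yet been flushed to the data files)
    SELECT group#, bytes, members, status FROM v$log;

    -- Add a second member to each group on another disk (paths are made up)
    ALTER DATABASE ADD LOGFILE MEMBER '/u02/oradata/redo01b.log' TO GROUP 1;
    ALTER DATABASE ADD LOGFILE MEMBER '/u02/oradata/redo02b.log' TO GROUP 2;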

I usually bring this up when someone suggests saving something straight to a file on disk because it will supposedly be "faster", since we would write all the data sequentially and the hard disk head would not have to seek random blocks. I still insist that we would not win anything here: we would be writing to a slow disk that is already actively used by many other processes to write a huge number of different logs, whereas Oracle itself synchronously writes to disk only sequentially, as I described above.

Data recovery mechanism


In Oracle you can enable archiving of the online redo log files described above, and all changes will be archived. Thus, if we lose a disk with data blocks, we can restore them to any point in time, including the moment right before the failure, by applying the current online redo logs on top of the most recent archived logs.
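
A rough sketch of turning archiving on (run from SQL*Plus as SYSDBA; exact steps may vary by version):

    -- Check the current mode
    ARCHIVE LOG LIST

    -- Archiving can only be switched while the database is mounted
    SHUTDOWN IMMEDIATE
    STARTUP MOUNT
    ALTER DATABASE ARCHIVELOG;
    ALTER DATABASE OPEN;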

Standby copy


The archived log files mentioned above can be shipped over the network and applied to a copy of the database on the fly. This way you always have a hot standby copy with minimal data latency at hand. In applications where there is no need to show data up to the very last moment, you can open such a copy read-only and offload the main database instance; there can be several such read-only instances.
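
On a physical standby the application of logs looks roughly as follows (a hedged sketch; the surrounding Data Guard configuration is omitted):

    -- On the standby: keep applying archived logs as they arrive
    ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;

    -- To offload reporting: pause recovery and open the copy read-only
    ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
    ALTER DATABASE OPEN READ ONLY;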

Hanging write requests


If some of your requests hang at arbitrary moments, it is worth looking in alert.log for a "checkpoint not complete" message. It means your online redo log files are too small or too few: the data they protect does not have time to be flushed from the cache to disk before the DBMS fills all the available log files and wants to reuse them in a circle, which it must not do until the checkpoint completes; hence the pause. Although if your application runs on Java, I would first check the logs for Full GC pauses.
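
A quick way to check whether the logs are switching too fast; if groups pile up in ACTIVE status, the writer cannot keep up (a diagnostic sketch, not a tuning recipe):

    -- Sizes and states of the log groups
    SELECT group#, bytes, status FROM v$log;

    -- Log switches over the last hour: dozens here is a bad sign
    SELECT sequence#, first_time
      FROM v$log_history
     WHERE first_time > SYSDATE - 1/24;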

Non-blocking reads and the rollback segment


One of the most remarkable features of Oracle is non-blocking reads, achieved through the rollback segment. Read requests in Oracle are never blocked, since the data can almost always be read from the rollback segment.

The rollback segment provides another bonus: from it you can try to read slightly outdated data for a table, as it was at some moment in the past. This feature is called flashback.
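
In later versions a flashback query can be written directly in SQL; a small sketch with a hypothetical orders table (the look-back depth is limited by the retained undo):

    -- Read the table as it was 15 minutes ago
    SELECT * FROM orders
      AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '15' MINUTE);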

However, sometimes the rollback segment can play a dirty trick on you: if you have a big job that bulk-deletes data (deletion generates the most data in the rollback segment), you can get ORA-01555: snapshot too old. The main thing to remember here is that you should not rewrite your job so that it commits every N operations; instead, use a separate, specially created rollback segment for such operations.
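
Under manual undo management this looked roughly as follows (all names and sizes are hypothetical; with automatic undo management you would size the undo tablespace and retention instead):

    -- A big rollback segment in its own tablespace
    CREATE ROLLBACK SEGMENT rbs_big
      TABLESPACE rbs_big_ts
      STORAGE (INITIAL 100M NEXT 100M MINEXTENTS 2);
    ALTER ROLLBACK SEGMENT rbs_big ONLINE;

    -- Pin the bulk delete to it; must be the first statement of the transaction
    SET TRANSACTION USE ROLLBACK SEGMENT rbs_big;
    DELETE FROM big_table WHERE created < ADD_MONTHS(SYSDATE, -12);
    COMMIT;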

Transaction Isolation Levels


There is no READ_UNCOMMITTED isolation level in Oracle. In other databases it is used to achieve maximum concurrency by removing read locks. But in Oracle reads never take locks anyway, so we already get all the benefits this level could give without imposing any additional restrictions.

In practice, only two isolation levels are directly available in Oracle: READ_COMMITTED, which is used by default, and SERIALIZABLE, which you can set if you wish.
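
Switching is a one-liner:

    -- For a single transaction (must be its first statement)
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

    -- Or for the whole session
    ALTER SESSION SET ISOLATION_LEVEL = SERIALIZABLE;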

However, at the statement level (SELECT, UPDATE, etc.) you already get REPEATABLE_READ by default, i.e. within one statement you always get a consistent read, which is achieved, of course, through the rollback segment. I have always liked the example Tom Kyte gives to explain what this buys you. Suppose you have a very large table of accounts and you run a SELECT to get the total balance. In Oracle, unlike many other databases, even if in the middle of your query another transaction transfers some amount from the first account to the last, you still end up with data consistent as of the start of your query: when it reaches the last row, your SELECT will see that the row has changed, go to the rollback segment, and read the value that was in that row when the query began. In many other databases you would get a total that never existed in your table at any moment. However, in Oracle in this case there is a danger of getting ORA-01555: snapshot too old.

In addition to the standard isolation levels, Oracle also has so-called READ_ONLY transactions, which give REPEATABLE_READ for the entire transaction, not just within a single statement. But as the name implies, in such a transaction you can only read.
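
A small sketch with a hypothetical accounts table; both queries see the same snapshot:

    SET TRANSACTION READ ONLY;
    SELECT SUM(balance) FROM accounts;  -- consistent with the next query
    SELECT COUNT(*)     FROM accounts;  -- same snapshot, no writes allowed
    COMMIT;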

Let Oracle cache your data efficiently


In Oracle, all data is read and written not directly to disk but through a cache. By default the cache uses the LRU algorithm, so if you read a very large table by identifier in large volumes, requesting a different row each time, such queries can force a small static table out of the cache, even though that is exactly the kind of table you would want to keep cached at all times. For such cases, when creating a table you can assign it a special buffer pool. For the first table in the example above, the RECYCLE pool is suitable: it essentially does not keep data, throwing it out of the cache immediately. For the second table, the KEEP pool is suitable: it lets you hold small static tables in the cache, and queries against all other tables will not force the static table's data out.
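
Buffer pools are assigned in the STORAGE clause; a sketch with hypothetical tables:

    -- Small static lookup table: keep it in the cache
    CREATE TABLE statuses (
      id   NUMBER PRIMARY KEY,
      name VARCHAR2(50)
    ) STORAGE (BUFFER_POOL KEEP);

    -- Huge table read by random identifiers: do not let it pollute the cache
    ALTER TABLE events STORAGE (BUFFER_POOL RECYCLE);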

Empty strings


Oracle has one very interesting trait that it can now never get rid of: if you store an empty string in the database, it is saved as NULL. On subsequent reads you will therefore never get an empty string back, only NULL. Also keep in mind that for the same reason empty strings do not get into indexes, so if you run queries whose execution plan uses an index, you will never get your empty (or rather NULL) strings back; more on this a bit later.
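
A tiny demonstration (hypothetical table):

    CREATE TABLE t (val VARCHAR2(10));
    INSERT INTO t VALUES ('');                 -- stored as NULL

    SELECT COUNT(*) FROM t WHERE val = '';     -- 0: comparison with NULL
    SELECT COUNT(*) FROM t WHERE val IS NULL;  -- 1: the only way to find it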

Indexes


In addition to the familiar B-tree indexes, Oracle also has so-called bitmap indexes, which show very high performance on queries against tables with low-cardinality columns. Queries with complex combinations of OR and AND over such columns work particularly well with this index compared to a regular one. The index is stored not as a B-tree but as a bitmap, which is what enables the fast execution of such queries. The question of how many distinct values a column may have for this index to still be preferable is very tricky: it can be 10 unique values or 10,000. You need to create the index on your specific table and see what happens. The main thing is not to use this index on tables with a large number of inserts and updates of the indexed column, since such operations lock fairly large areas of the indexed table, and your system may stall or even hit a deadlock.
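
Creating one differs from a regular index only by a keyword (names are hypothetical):

    -- Low-cardinality columns on a mostly-read table
    CREATE BITMAP INDEX orders_status_bix ON orders (status);
    CREATE BITMAP INDEX orders_region_bix ON orders (region);

    -- Bitmaps combine cheaply under AND/OR
    SELECT COUNT(*)
      FROM orders
     WHERE (status = 'NEW' OR status = 'HELD') AND region = 'EU';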

One of the things I have always really liked in Oracle is the ability to create an index on a function. That is, if you use a function in your queries, you can build an index on it and significantly speed up reads.
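
For example (hypothetical schema):

    -- Without this, WHERE UPPER(name) = ... cannot use a plain index on name
    CREATE INDEX customers_uname_idx ON customers (UPPER(name));

    SELECT * FROM customers WHERE UPPER(name) = 'SMITH';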

Another interesting property of indexes you need to know about is that an index does not store NULL values. Thus, if you run queries with a <, > or <> condition on the indexed column, you will not get back rows with NULL in that column. On the other hand, this property can be put to very effective use in some specific cases. Suppose you have a very large table storing orders, which is never purged, and a background process that must ship all orders to some back-office system. The first solution that suggests itself is to add an is_sent flag column, initially 0 and set to 1 after sending; on each run the background process queries the table with the condition is_sent = 0. You cannot use a bitmap index here, since the table is very actively written to. A regular B-tree index would take a lot of space, since it has to store references to a huge number of rows. But if we slightly change our logic and put NULL instead of 1 into is_sent, we get a tiny index, since at any moment it stores only the non-NULL values, and there are very few of them.
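
A sketch of the trick (hypothetical schema):

    -- Sent orders get is_sent = NULL, so only unsent rows are indexed
    CREATE INDEX orders_unsent_idx ON orders (is_sent);

    SELECT id FROM orders WHERE is_sent = 0;           -- uses the tiny index
    UPDATE orders SET is_sent = NULL WHERE id = :id;   -- removes the row from it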

Tables come in different kinds


In addition to ordinary tables, Oracle, like many other databases, has so-called index-organized tables, where the table data is stored directly in the primary key index. This accomplishes two things at once: first, fetching by primary key requires fewer reads; second, the data in the table is ordered by the primary key, so ORDER BY on the primary key is performed without additional sorting. Among the drawbacks is that you can no longer disable logging of this index in the online redo log files.
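
Declaring one takes a single clause (hypothetical schema):

    -- The whole table is stored inside the primary-key B-tree
    CREATE TABLE order_lines (
      order_id NUMBER,
      line_no  NUMBER,
      qty      NUMBER,
      CONSTRAINT order_lines_pk PRIMARY KEY (order_id, line_no)
    ) ORGANIZATION INDEX;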

Another remarkable type of table is the cluster table, which lets you store rows from two or more tables, clustered by the same key value, in a single data block. This can be very effective if you always use certain tables together.
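
A hedged sketch with made-up names: an index cluster keeping an order and its items in the same blocks:

    CREATE CLUSTER orders_cl (order_id NUMBER);
    CREATE INDEX orders_cl_idx ON CLUSTER orders_cl;

    CREATE TABLE orders_c (
      order_id NUMBER PRIMARY KEY,
      customer VARCHAR2(100)
    ) CLUSTER orders_cl (order_id);

    CREATE TABLE order_items_c (
      order_id NUMBER,
      item     VARCHAR2(100)
    ) CLUSTER orders_cl (order_id);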

On top of cluster tables there are also hash cluster tables, where rows are located by a hash of the cluster key instead of a B-tree. It sounds very interesting but, to be honest, I have never encountered them in practice.
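
For completeness, the hash variant looks like this (again, made-up names; HASHKEYS fixes the expected number of distinct keys up front):

    -- Rows are located by a hash of the key, with no B-tree traversal
    CREATE CLUSTER lookup_cl (id NUMBER) SIZE 512 HASHKEYS 1000;

    CREATE TABLE lookup (
      id   NUMBER PRIMARY KEY,
      name VARCHAR2(100)
    ) CLUSTER lookup_cl (id);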

Bind variables


Probably every programmer has heard about this already, but I will still mention such a mandatory technique as bind variables. The point is that for each unique query text a plan is parsed and placed in the cache. If there are many different query texts, such as the very common query by ID with the literal value inlined, each one will get its own plan; moreover, they will force all other plans out of the cache, which can increase your database's response time several times over.
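
The difference in SQL*Plus terms (hypothetical table):

    -- Hard parse: every literal produces its own cursor and plan in the cache
    SELECT * FROM orders WHERE id = 123;
    SELECT * FROM orders WHERE id = 124;

    -- Bind variable: one shared cursor, one cached plan
    VARIABLE order_id NUMBER
    EXEC :order_id := 123
    SELECT * FROM orders WHERE id = :order_id;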

It is also worth noting that you should not overdo this and use binding for columns with a small number of distinct values, such as an is_deleted flag: there will not be many different query texts in that case anyway, and with a specific literal value the DBMS may be able to build a more efficient plan.

A couple more notes for the programmer


If your column is of type VARCHAR2(100), it is not guaranteed that an attempt to push the string longString.substring(0, 100) into it will succeed: by default the limit of 100 in the column definition refers to the number of bytes, not characters, so with multi-byte characters you can get caught out. This behavior can actually be configured, with length semantics specified in characters instead of bytes. And let's hope you are not retrying such an insert in an infinite loop on the "until it succeeds" principle, because in this case "succeeds" will never come.
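
A sketch of the difference (the session-level default is shown last):

    -- With BYTE semantics (a common default) the limit is 100 bytes
    CREATE TABLE people (name VARCHAR2(100 BYTE));

    -- Explicit character semantics sidesteps the multi-byte trap
    CREATE TABLE people2 (name VARCHAR2(100 CHAR));

    -- Or change the default for subsequently created columns
    ALTER SESSION SET NLS_LENGTH_SEMANTICS = CHAR;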

And a general recommendation for all kinds of databases: never update all columns of a table when a single field of an object changes. This seems very obvious, but as practice shows, the anti-pattern is common, so I strongly recommend checking that your frameworks UPDATE only the fields that have actually changed.

Conclusion


I have tried to describe most of the things that, in my opinion, can be useful to a programmer. Since there are quite a lot of them, I have only outlined them, often without going into detail. How exactly to make the necessary settings you can always read in the aforementioned book by Tom Kyte, find in the asktom column, or simply google. The main thing is to know what to google, and I hope this article has given you that.

Source: https://habr.com/ru/post/120003/

