We read (and write) MyISAM directly

In the depths of the MySQL documentation on dev.mysql.com, I somehow found the mention of the fact that if MyISAM is used, you can get a 5-7 times increase in reading speed from the table if you read the data from the table yourself. For a long time I wanted to check this fact, and finally, I got around to trying it. What came of it, read under the cut

What type of data will we read from MyISAM?

The MyISAM format is, in fact, several formats (namely, several index storage formats and several types of data storage, for data the most common are fixed and dynamic). I was interested to consider the simplest case: when a table has a fixed record length, it does not contain NULL fields, and the type of queries that we are going to test is a simple SELECT * FROM tbl WHERE (condition) without the participation of indices and other tables ( what is called full scan).

What for?

Basically, I moved sports interest. Nevertheless, the results turned out to be quite good, so perhaps someone besides me will want to try to apply this method in practice.

How are we going to read?

Initially, I considered quite a few options how to read data from MyISAM, including using indexes. There is a pretty good description for reading directly from index files in the file myisam.txt, which used to be distributed with MySQL source codes, and now you can find it, for example, in MariaDB . For a program that wants to read MyISAM along with index structures, you will probably need to build a tangible part of MySQL. Well, that it needs to be done only once :).
')
I will consider a standalone application that does not depend on MySQL source codes and does not use indexes, but only reads .MYD files directly, in the simplest, fixed, format of strings.

Storage format in .MYD

The storage format for various MyISAM structures is very well described here: forge.mysql.com/wiki/MySQL_Internals_MyISAM . The format for fixed lines is so simple that it can be given directly in the text of the article:

1) .MYD consists of consecutive lines, and only from them. No service information in .MYD no
2) Each line has a header that contains information about NULL fields and a flag indicating that the line has been deleted. If there are no NULL fields, then the header length is 1 byte.
3) The fields in the string are in binary format, in the order in which the table fields are defined, in reverse byte order (little endian), without gaps

For different types of MySQL, the binary representation is different, and it is beautifully described by the link above.

How to read from .MYD

Assuming you're going to read data from .MYD from a C program, here are some interesting points from my implementation that relate to performance:

1) if the length of a single line is small (say, 10 bytes), then even when using fread () it’s worth reading not one line, but, say, 100 each — for my program it accelerated reading from the file about 2 times
2) most likely, access to individual fields will be unaligned, therefore, if your architecture is different from x86 (for example, PPC), then you should not use the following simple construction to pull out any particular field: * (int *) ( ptr) (ptr is a pointer to the beginning of the field, the field itself is int type)

In addition, it is necessary to attend to the issues of locking the table at the beginning of reading (this can be done using LOCK TABLES tbl READ). If this is not done, then there is a chance to run into the "dirty" data when reading (up to half-written lines, if not lucky). On the other hand, it is in this scenario of reading from MyISAM that it is quite safe to ignore locks and read from the table at the moment when someone writes there, with a minimum of side effects.

Did you say something about the record?

Yes, MyISAM with the fixed row format is so simple that we can create tables for MySQL ourselves and fill them with data. Unless, without indexes. But they are far from always needed, and if they are very necessary, you can later ask MySQL to create them via ALTER TABLE.

What we need

As you know, MyISAM tables consist of 3 files: .frm, .MYI, and .MYD. We know how to generate .MYD. It remains to generate .frm and .MYI. You will probably ask: why do we need .MYI at all? Are there not stored indexes (which we have not planned)? In fact, not only indexes are stored there. More on this later.

The easiest way to generate .frm and .MYI is to take these files from some “donor”, that is, another table. This table should not have indices, and this table should have a structure that corresponds to our .MYD file, otherwise MySQL cannot read such a table :).

If you simply copy the .frm and .MYI files of the finished table and write the necessary rows in .MYD, then most likely you will see that MySQL believes that the new table contains 0 rows. This should lead us to believe that things are not so simple in MyISAM :). In fact, there is one fairly well-known fact - in MyISAM tables there is information about the number of rows, which is located in one of the fields of the structure of this table. It remains to find this field and it's in the hat :)! Oddly enough, this field is not in the .frm file, but in the .MYI. Yes, in that file in which "indexes" lie. In the same place in combination lies another meta-information that is specific for MyISAM and is not written to the .frm file (.frm are common for several types of tables).

In the above description of the MyISAM format I have a description of MYI files. In fact, we only need to rewrite only 2 fields in this file (for MySQL version 5.1):
1) state-> state.records (offset from the beginning of the file - 0x1C, length 8 bytes, direct byte order (big endian), contains the total number of lines minus deleted)
2) state-> state.data_file_length (offset from the beginning of the file - 0x44, length 8 bytes, direct byte order (big endian), contains the file size .MYD in bytes).

If you have problems reading the "newly-made" tables, you should first check that you are not rewriting the existing table, but creating a new one. If you want to write a table instead of an old one, you must first make DROP TABLE of this table directly from MySQL, and only then create this table again (so that MySQL does not accidentally open the cache of open files). Another option is to make FLUSH TABLE for this table and then write new data there (useful if you make an APPEND to the table).

Total

So, if you read the article to the end (and if you know C), then you should have enough knowledge to write a program on C that reads and writes simple MyISAM tables. Depending on the complexity of the requests that you want to execute with your program, the speed increase can be up to 5-6 times. I managed to write a program that did something useful, reading the entire MyISAM table, in a time that was about 1/4 of the execution of the same query directly in MySQL. In my opinion, not bad. By the way, the program turned out only ~ 5 Kb and ~ 150 lines of C code.

So, if you suddenly need a super-high-performance full scan, then try MyISAM + your C program to read from the table. You will be pleasantly surprised at how easy it is to implement it, and with a little blood you can increase your reading performance several times.

UPD
Did you ask for the numbers? Here is a little bit for you:

1. The numbers will be very dependent on the implementation. I already wrote that I managed to speed up the request I needed 4 times. Total, read speed was 1 Gb / s with a write size of 25 bytes
2. I did not find any differences with FULL SCAN when using HANDLER or SELECT.
FULL SCAN in MyISAM gives reading speed, according to my estimates, at least 3-4 times more than FULL SCAN in InnoDB.
3. By itself, FULL SCAN in MyISAM works 5-10 times faster than an index scan with the same number of records (with some desire, the lag can be reduced to somewhere 2-3 times).
4. In InnoDB, a PRIMARY KEY scan works as fast as a FULL SCAN, a scan on any other index works about 3-4 times slower.

Comparisons with NoSQL do not make much sense if FULL SCAN is done for a large table:

1. Memcached cannot do FULL SCAN memory.
2. HandlerSocket though is able FULL SCAN, but on speed from SELECT / HANDLER does not differ (if it is a lot of lines). In addition, it is not supported in MySQL 5.1.
3. Mongo, Redis, etc occupy an order of magnitude more memory / storage, which in many cases is completely unacceptable. They work fast, no doubt. But their mission is different.

Source: https://habr.com/ru/post/126612/

All Articles