📜 ⬆️ ⬇️

Go: we accelerate sampling of large tables from MySQL

I have been using Go to write an ad network for almost a year now. I am developing on the server Intel i7-7700, 16Gb RAM, 256Gb SSD. And in the script that runs once a day, the task appeared to select all the shows for the past day and recalculate on this basis the statistics for the day for several objects at once (website, campaign, banner).

On Go idioms, everything is done quite trivially:

type Hit struct { siteID, zoneID, poolID, mediaID, campaignID uint32 } rows, err := db.Query("SELECT siteID, zoneID, poolID, mediaID, campaignID FROM "+where) if err != nil { log.Fatal("Query fail", err) } defer rows.Close() var ( c uint32 h Hit ) for rows.Next() { rows.Scan(&h.siteID, &h.zoneID, &h.poolID, &h.mediaID, &h.campaignID) campCounter.Inc(h.campaignID) siteCounter.Inc(h.siteID) zoneCounter.Inc(h.zoneID) poolCounter.Inc(h.poolID) mediaCounter.Inc(h.mediaID) c++ } if err := rows.Err(); err != nil { log.Fatal("Scan Rows err", err) } log.Println(name, " ", c, " ", where, "in", time.Since(now)) 

Everything is working. And the sample rate is 36 seconds for almost 56 million records.

 hit_20180507 55928930 time BETWEEN 1525640400 AND 1525726799 in 36.331342451s 

Under the hood of the go tool pprof performance analyzer we see something like the following
')
  flat flat% sum% cum cum% 7130ms 18.32% 18.32% 10800ms 27.75% runtime.mallocgc 2380ms 6.12% 24.43% 5710ms 14.67% fmt.(*pp).doPrintf 2140ms 5.50% 29.93% 13300ms 34.17% github.com/go-sql-driver/mysql.(*textRows).readRow 1800ms 4.62% 34.56% 2170ms 5.58% runtime.mapassign_fast32 1700ms 4.37% 38.93% 1700ms 4.37% runtime.heapBitsSetType 1170ms 3.01% 41.93% 36350ms 93.40% main.loadHits 1110ms 2.85% 44.78% 8500ms 21.84% runtime.convT2Eslice 1070ms 2.75% 47.53% 1970ms 5.06% fmt.(*fmt).fmt_integer 950ms 2.44% 49.97% 1380ms 3.55% github.com/go-sql-driver/mysql.readLengthEncodedString 930ms 2.39% 52.36% 1060ms 2.72% runtime.freedefer 930ms 2.39% 54.75% 930ms 2.39% runtime.mapaccess1_fast32 910ms 2.34% 57.09% 2070ms 5.32% runtime.deferreturn 860ms 2.21% 59.30% 1220ms 3.13% runtime.scanobject 

You may notice that we are working in the MySQL text protocol using mysql. (* TextRows) .readRow, respectively, the incoming rows Scan converts to uin32 types. But in the first place in time we have a memory allocation function.

What can be accelerated?

I accidentally caught the eye of the RawBytes type which guarantees that the bytes from the database driver will be transferred to the user without copying. What. We will try to extract Scan into an intermediate structure with sql.RawBytes fields and convert themselves then [] bytes to uint32 using the bu2 function that hastily written, discarding error checks (after all, you will not search for them in the text that came from the database, right?)

 func b2u(b []byte) uint32 { n := uint32(0) for _, c := range b { n = n*uint32(10) + uint32(c-'0') } return n } type HitRaw struct { siteID, zoneID, poolID, mediaID, campaignID sql.RawBytes } 

As a result, the processing time was reduced to 28 seconds, which makes reading 2 million lines per second already!

And the profiler already gives this picture

  4690ms 15.68% 15.68% 7630ms 25.51% runtime.mallocgc 2400ms 8.02% 23.70% 2700ms 9.03% runtime.mapaccess1_fast32 1660ms 5.55% 29.25% 1660ms 5.55% runtime.heapBitsSetType 1640ms 5.48% 34.74% 28110ms 93.98% main.loadHits 1590ms 5.32% 40.05% 1860ms 6.22% runtime.mapassign_fast32 1300ms 4.35% 44.40% 12450ms 41.62% github.com/go-sql-driver/mysql.(*textRows).readRow 1140ms 3.81% 48.21% 2090ms 6.99% runtime.deferreturn 1060ms 3.54% 51.76% 1470ms 4.91% github.com/go-sql-driver/mysql.readLengthEncodedString 1050ms 3.51% 55.27% 1050ms 3.51% main.b2u 1040ms 3.48% 58.74% 1130ms 3.78% database/sql.convertAssign 910ms 3.04% 61.79% 8640ms 28.89% runtime.convT2Eslice 730ms 2.44% 64.23% 2540ms 8.49% database/sql.(*Rows).Scan 

Well, not bad for a start. Next, I learned to study the MySQL driver, which, as it turned out, was written specifically for Go and implements low-level protocols itself, using sockets. And the second MySQL protocol turned out to be binary. That, in theory, gives a faster generation of a MySQL server response. Accordingly, the driver, less calls the functions of converting text-integer. To enable the binary protocol, you need to switch from db.Query to db.Prepare - stsm.Query - the minimum of changes to the source code and voila - 26.70 seconds of execution.

 stmtOut, err := db.Prepare(sqlQ) defer stmtOut.Close() if err != nil { log.Fatal("prepare", err, sqlQ) } rows, err := stmtOut.Query() if err != nil { log.Fatal("query", err, sqlQ) } defer rows.Close() 

The profiler shows that the protocol is already truly binary by (* binaryRows) .readRow, but when reading in RawBytes, it still goes through conversion to text, and then back.

  flat flat% sum% cum cum% 2910ms 10.79% 10.79% 3310ms 12.27% runtime.mallocgc 2280ms 8.45% 19.24% 2600ms 9.64% runtime.mapaccess1_fast32 1960ms 7.27% 26.51% 7070ms 26.21% database/sql.convertAssign 1530ms 5.67% 32.18% 1810ms 6.71% runtime.mapassign_fast32 1460ms 5.41% 37.60% 6660ms 24.69% github.com/go-sql-driver/mysql.(*binaryRows).readRow 1420ms 5.27% 42.86% 26680ms 98.92% main.loadHits 1210ms 4.49% 47.35% 3010ms 11.16% strconv.AppendInt 1100ms 4.08% 51.43% 1320ms 4.89% strconv.formatBits 950ms 3.52% 54.95% 1650ms 6.12% runtime.deferreturn 820ms 3.04% 57.99% 820ms 3.04% reflect.ValueOf 810ms 3.00% 60.99% 4120ms 15.28% runtime.convT2E64 750ms 2.78% 63.77% 4240ms 15.72% database/sql.asBytes 

Let's do Scan right away in uint32 structures! Already nothing should be converted - only the conversion is integer-integer.

The result was sad - 49.827306314s That is, the slowdown is generally terrifying. The most stupid option of all, despite a good theoretical basis for the fastest result. What is the matter?

We look:

  4620ms 9.22% 9.22% 29230ms 58.32% database/sql.convertAssign 3610ms 7.20% 16.42% 4010ms 8.00% runtime.mallocgc 3010ms 6.01% 22.43% 8610ms 17.18% reflect.(*rtype).Name 2980ms 5.95% 28.37% 5600ms 11.17% reflect.(*rtype).String 2770ms 5.53% 33.90% 3330ms 6.64% runtime.mapaccess1_fast32 2570ms 5.13% 39.03% 2570ms 5.13% reflect.ValueOf 1760ms 3.51% 42.54% 1980ms 3.95% runtime.mapassign_fast32 1640ms 3.27% 45.81% 6630ms 13.23% github.com/go-sql-driver/mysql.(*binaryRows).readRow 1540ms 3.07% 48.88% 3870ms 7.72% strconv.FormatInt 1240ms 2.47% 51.36% 49600ms 98.96% main.loadHits 1150ms 2.29% 53.65% 1150ms 2.29% reflect.Value.Type 1120ms 2.23% 55.89% 1120ms 2.23% reflect.Value.Elem 1070ms 2.13% 58.02% 30950ms 61.75% database/sql.(*Rows).Scan 1070ms 2.13% 60.16% 1070ms 2.13% strconv.ParseUint 

Judging by the presence of strconv.ParseUint - the conversion of 2 types is performed through a string! Seriously? reflect-transformations took the first lines in terms of execution time. No wonder Rob Pike talks about the careful use of reflection. You can mess things up.

Having studied the MySQL driver, I stumbled upon the fact that from the binary protocol all data is converted to int64 - we will try to derive benefit from it. Scan do in structure

 type HitRaw struct { siteID, zoneID, poolID, mediaID, campaignID int64 } ... h.siteID = uint32(raw.siteID) h.zoneID = uint32(raw.zoneID) h.poolID = uint32(raw.poolID) h.mediaID = uint32(raw.mediaID) h.campaignID = uint32(raw.campaignID) 

The result was 33.98 seconds. With this layout by function

  3600ms 10.48% 10.48% 14360ms 41.79% database/sql.convertAssign 2860ms 8.32% 18.80% 3340ms 9.72% runtime.mallocgc 2560ms 7.45% 26.25% 2920ms 8.50% runtime.mapaccess1_fast32 1660ms 4.83% 31.08% 6730ms 19.59% github.com/go-sql-driver/mysql.(*binaryRows).readRow 1540ms 4.48% 35.56% 33970ms 98.86% main.loadHits 1410ms 4.10% 39.67% 1690ms 4.92% runtime.mapassign_fast32 1340ms 3.90% 43.57% 1340ms 3.90% reflect.ValueOf 1290ms 3.75% 47.32% 4010ms 11.67% reflect.Value.Set 940ms 2.74% 50.06% 15960ms 46.45% database/sql.(*Rows).Scan 900ms 2.62% 52.68% 900ms 2.62% reflect.Value.Elem 840ms 2.44% 55.12% 840ms 2.44% reflect.Value.Type 840ms 2.44% 57.57% 1500ms 4.37% runtime.deferreturn 810ms 2.36% 59.92% 810ms 2.36% reflect.directlyAssignable 760ms 2.21% 62.14% 760ms 2.21% runtime.getitab 730ms 2.12% 64.26% 900ms 2.62% reflect.Value.assignTo 720ms 2.10% 66.36% 4060ms 11.82% runtime.convT2E64 

It can be seen that sql.convertAssign reduces all the benefits of using a binary protocol. And now the data is not copied through the text, but inside reflect it is still quite difficult to determine that int64 can be copied to the user's int64 variable. And copying numbers to text and back goes faster than reflect.directlyAssignable - reflect.Value.assignTo.

As a warm-up, I tried to translate the b2u function into a go-assembler. Assembler was my one of the first BK-0011 programming languages ​​learned at school without a drive and a cassette recorder) So it was funny. Although Go generates a practically optimal code and if you don’t come up with algorithmic tricks or using non-standard ASM commands, there’s no special point in writing these functions.

 // func b2u(data []byte) uint32 // // memory layout of the stack relative to FP // +0 data slice ptr // +8 data slice len // +16 data slice cap #include "textflag.h" TEXT ·B2u(SB),NOSPLIT,$0-24 // data ptr MOVQ data+0(FP), SI // data len MOVQ data+8(FP), CX // result in AX MOVBLZX (SI), AX // - '0' SUBL $48, AX // check end of loop DECQ CX JZ AX2RET LOOPBYTE: //move to one byte upper INCQ SI MOVBLZX (SI), BX //prev result *= 10 IMULL $10, AX // bx -= '0' SUBL $48, BX ADDL BX, AX // check end of loop while (cx--) DECQ CX JNZ LOOPBYTE AX2RET: MOVL AX, ret+24(FP) RET 

According to the tests, it gives an acceleration of 2-20% from the Go version. Depends on the number of digits in the number.
As a result, the working example accelerated to 26.94 seconds.

Conclusion from the article, for those who only looked through the text - the fastest way to read a large amount of integer data from MySQL to memory - use db.Prepare - stmt.Query - Scan in interface {} and convert uint32 (hit.siteID. (Int64) ) into an unsigned integer that works for uint64 without cutting the upper bit off.
That is, the ways of working with the driver shown in standard examples are not always optimal. Perhaps the developers will pay attention to the driver's behavior, because there are a lot of hidden calls and overheads for both a simple and not overloaded with language functionality. After all, Go in tests, in which SELECT selections from the database appear, does not shine with performance . Moreover, not all experienced programmers have the time and desire to dig under the hood. As far as I know, nobody has ever conducted such tests on SELECT at all.

UPD: In the comments gave an example of a miraculous driver github.com/lazada/sqle supposedly designed for quick reading. The result was when reading through
rows.Scan (& h.siteID, & h.zoneID, & h.poolID, & h.mediaID, & h.campaignID) in uint32 variables is very sad
55928930 time BETWEEN 1525640400 AND 1525726799 in 1m0.307942824s
And if you look at what the program did for a minute, it becomes clear that there was simply no thought of optimizing this case. Standard driver wins 2 times.
flat flat% sum% cum cum%
4.63s 7.62% 7.62% 29.25s 48.11% database/sql.convertAssign
4.22s 6.94% 14.56% 4.68s 7.70% runtime.mallocgc
2.97s 4.88% 19.44% 8.48s 13.95% reflect.(*rtype).Name
2.96s 4.87% 24.31% 5.51s 9.06% reflect.(*rtype).String
2.90s 4.77% 29.08% 41.28s 67.89% github.com/lazada/sqle.(*Rows).Scan
2.51s 4.13% 33.21% 2.95s 4.85% runtime.mapaccess1_fast32
2.38s 3.91% 37.12% 2.38s 3.91% reflect.ValueOf
1.92s 3.16% 40.28% 3.69s 6.07% runtime.assertE2I2
1.77s 2.91% 43.19% 1.77s 2.91% runtime.getitab
1.65s 2.71% 45.90% 7.04s 11.58% github.com/go-sql-driver/mysql.(*binaryRows).readRow
1.49s 2.45% 48.36% 1.86s 3.06% runtime.mapassign_fast32
1.35s 2.22% 50.58% 60.27s 99.13% main.loadHits
1.31s 2.15% 52.73% 4.01s 6.60% github.com/lazada/sqle.typeCheck
1.31s 2.15% 54.88% 4.19s 6.89% strconv.FormatInt
1.28s 2.11% 56.99% 2.88s 4.74% strconv.formatBits
1.25s 2.06% 59.05% 1.25s 2.06% reflect.(*rtype).Kind
1.12s 1.84% 60.89% 31.05s 51.07% database/sql.(*Rows).Scan

If you read Scan to [] byte variables and convert b2u () to uint32, you get 44 seconds. In my opinion, you can not further test what a great substitute for the standard database / sql guys wrote. The next myth about acceleration has been debunked about practical tests.

UPD2: To understand the speed of reading from MySQL made a cycle without conversions and calculations
 for rows.Next() { c++ } 

total - 11.9 seconds. Thus, 15 seconds out of 27 spend on converting and working with hashmaps.

UPD3: After writing the article, I began to understand more about the insides of Go - and found a pill for my case right under my nose:
golang.org/pkg/database/sql/#Rows.Scan
If an argument has a type that has been defined as the interface case, it does not apply.
Here is the final version of the code, which I recommend to everyone:
 type HitRaw struct { siteID, zoneID, poolID, mediaID, campaignID, status interface{} } var ( hit HitRaw h Hit ) for rows.Next() { rows.Scan(&hit.siteID, &hit.zoneID, &hit.poolID, &hit.mediaID, &hit.campaignID, &hit.status) h.siteID = uint32(hit.siteID.(int64)) h.zoneID = uint32(hit.zoneID.(int64)) h.siteID = uint32(hit.siteID.(int64)) h.poolID = uint32(hit.poolID.(int64)) h.mediaID = uint32(hit.mediaID.(int64)) h.campaignID = uint32(hit.campaignID.(int64)) h.status = uint8(hit.status.(int64)) 

The result will not leave anyone indifferent:
12.206921479s on 55.9 million records. And this is after 27 seconds, in which I thought I had reached Nirvana.
And the profiler produces already quite a good deal - most of the time, the program updates the counters in hash. Binary data is sent from the driver straight to where they are waiting. Assembler was not at all mandatory.
flat flat% sum% cum cum%
3110ms 15.20% 15.20% 3460ms 16.91% runtime.mallocgc
2350ms 11.49% 26.69% 2700ms 13.20% runtime.mapaccess1_fast32
1850ms 9.04% 35.73% 2150ms 10.51% runtime.mapassign_fast32
1460ms 7.14% 42.86% 6670ms 32.60% github.com/go-sql-driver/mysql.(*binaryRows).readRow
1420ms 6.94% 49.80% 20070ms 98.09% main.loadHits
1170ms 5.72% 55.52% 1190ms 5.82% database/sql.convertAssign
840ms 4.11% 59.63% 4300ms 21.02% runtime.convT2E64
710ms 3.47% 63.10% 2470ms 12.07% database/sql.(*Rows).Scan
680ms 3.32% 66.42% 1260ms 6.16% runtime.deferreturn
680ms 3.32% 69.75% 680ms 3.32% sync.(*RWMutex).RLock
650ms 3.18% 72.92% 650ms 3.18% runtime.aeshash32
630ms 3.08% 76.00% 630ms 3.08% sync.(*RWMutex).RUnlock

Source: https://habr.com/ru/post/358522/


All Articles