
How we saved the project from closing by getting to grips with MySQL

I continue the story of developing an economic online game. In this part, we will look at 2016, when the question of closing the project loomed in earnest.

A starting point


As the game developed, the number of game objects kept growing, companies expanded, and the overall game situation became more and more complex. Transactions hung until they timed out, and game objects were saved in inconsistent states, which in turn led to further errors. The server logs reported the following problem with enviable regularity: Lock wait timeout exceeded; try restarting transaction.
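
This error is raised when InnoDB gives up waiting for a row lock after a fixed timeout. Before rewriting anything, it is worth confirming the basic facts about the lock waits themselves; these are standard MySQL commands, nothing specific to our project:

  -- how long InnoDB waits for a row lock before raising the error above (50 s by default)
  SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';

  -- transactions currently open, and what they are executing
  SELECT trx_id, trx_state, trx_started, trx_query
  FROM information_schema.innodb_trx
  ORDER BY trx_started;

  -- a snapshot of lock waits and recent deadlocks (run from the mysql console)
  SHOW ENGINE INNODB STATUS\G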

Googling did not give an explicit solution; the general recommendation was to comb through the business logic.

Night calls about problems, sleepless nights, ruined weekends. At some point we reached a state of permanent anxiety and stopped being surprised by the errors that occurred. On top of that, the server responded to some player actions unacceptably slowly.

Understandably, this caused resentment among the players, which led to a gradual outflow of the audience and a drop in revenue.

In short, the situation had to be saved. Rolling up our sleeves, we began looking for a solution from scratch.

Updating the toolkit


To do that, MySQL had to be upgraded to version 5.7 so that performance_schema would work fully: it lets you collect statistics in real time and quickly track the impact of changes. I cannot say the upgrade went smoothly; it takes patience. By the way, you cannot go from 5.5 straight to 5.7: you first have to install 5.6. Next we installed MySQL Workbench, a convenient interface for analyzing the database, and then used the performance_schema reports in its Performance section.
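
A quick sanity check that the instrumentation is actually available after the upgrade; in 5.7 performance_schema is enabled by default, but it can only be changed in the server configuration, not at runtime:

  -- should return ON; otherwise set performance_schema = ON in my.cnf and restart
  SHOW VARIABLES LIKE 'performance_schema';

  -- the sys schema, which the Workbench performance reports are built on, ships with 5.7
  SHOW DATABASES LIKE 'sys';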

First tuning


We look at the data and check whether the database behaves in line with the business logic: whether there is excess load where there should not be any, and where the potential bottlenecks are. Using the Top File I/O by Time report, we discovered that retail statistics were being requested from the database every single time. Since that data does not change during a game day and is the same for all players, it should be cached. After that, these queries dropped out of the list of the most frequent ones, the analytics started coming back a little faster, and on the whole everything seemed fine.
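
Roughly the same picture can be pulled directly from the sys schema that the Workbench report draws on; a minimal sketch using a stock 5.7 view (it is already sorted by total I/O latency):

  -- top files by time spent on I/O
  SELECT * FROM sys.io_global_by_file_by_latency LIMIT 10;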

Still slow


The load decreased, but for some reason the interface still worked sluggishly. The likely culprit was a lock on the MyISAM tables; we tried to catch it with show processlist, but without success. The situation occurred several times a day, with no obvious pattern. So we wrote some simple code to log the locks as they arise:

A table for storing the data:

  create table processlist_stat like information_schema.processlist;
  alter table processlist_stat add column datetime datetime;

A thread that will keep writing information about the problems to the database:

  public static class ProcessListThread extends Thread {

      private long timeout;

      public ProcessListThread(long timeout) {
          super("ProcessListThread");
          this.timeout = timeout;
      }

      @Override
      public void run() {
          if (timeout > 0) {
              while (!interrupted()) {
                  try {
                      sleep(timeout);
                  } catch (InterruptedException ie) {
                      System.out.println("Interrupted!!!");
                      return;
                  }
                  if (processListThreadStopped) {
                      Thread.currentThread().interrupt();
                      System.out.println("SelfDestruction");
                  }
                  try {
                      saveProcessListStat();
                  } catch (Throwable e) {
                      Logger.error(e);
                  }
              }
          }
      }
  }

Save the data to the database:

  private static void saveProcessListStat() throws ru.plazma.db.DBException {
      Executor e = GameObject.getExecutor();
      Query rs = e.getConnection().newQuery();
      rs.select("id,user,host,db,command,time,state,info",
                "information_schema.processlist",
                "info is not null and info not like '%info is not null%'");
      rs.calculate("sysdate() as datetime");
      Statement st = e.newStatement();
      st.insert("processlist_stat", "id,user,host,db,command,time,state,info,datetime", rs);
      st.execute();
      e.commit();
  }

Starting and stopping:

  public static void runProcesslistLog() throws DBException {
      if (processListThread == null) {
          processListThread = new ProcessListThread(5 * 1000);
      }
      processListThreadStopped = false;
      processListThread.start(); // start(), not run(): the loop must execute in its own thread
  }

  public static void stopProcesslistLog() {
      if (processListThread != null) {
          processListThreadStopped = true;
          processListThread.interrupt();
      }
  }

The same idea could be implemented with database-side triggers, but when I launched that, the server went down completely, so we chose the path described above.
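
For completeness, a database-side variant would more likely rely on the event scheduler than on triggers, since triggers cannot fire on a timer. A minimal sketch, assuming event_scheduler is enabled; as noted above, this route did not work out on our server, so it is shown only for reference:

  SET GLOBAL event_scheduler = ON;

  CREATE EVENT log_processlist
      ON SCHEDULE EVERY 5 SECOND
      DO
          INSERT INTO processlist_stat (id, user, host, db, command, time, state, info, datetime)
          SELECT id, user, host, db, command, time, state, info, sysdate()
          FROM information_schema.processlist
          WHERE info IS NOT NULL;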

Bingo


We look at the table created above, into which the status of queries is written every 5 seconds. And here they are: Waiting for table level lock, lasting up to 60 seconds. It turned out that the key sales table was being locked because queries against it were taking suspiciously long. Along the way, a few more queries surfaced that were also taking too long.
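
A minimal query over that log to see what was waiting and for how long (the column names are the ones from the table created above):

  -- the longest table-level lock waits captured in the log
  SELECT datetime, time, state, info
  FROM processlist_stat
  WHERE state = 'Waiting for table level lock'
  ORDER BY time DESC
  LIMIT 50;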

We struck a double blow: purging old data from the tables (it had not been cleaned out thoroughly enough) and combing through the business logic to remove unnecessary queries against the table. It got better, but not by much. What could the reason be? The indexes were in place, everything should have been fine, and yet the queries were still slow.
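
A sketch of the cleanup part, with an invented sales table and retention period purely for illustration; deleting in small batches keeps each statement, and therefore each lock, short:

  -- repeat until 0 rows are affected; the 90-day retention is an example value
  DELETE FROM sales
  WHERE game_day < DATE_SUB(CURDATE(), INTERVAL 90 DAY)
  LIMIT 10000;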

Oh, these indexes


It turned out that the hanging queries were doing a full scan and not using the primary key at all. The culprit was the BETWEEN construct: when filtering on the composite primary key with BETWEEN, for some reason the index was not being used. We added an extra index, and voila: everything flew.
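
A hypothetical illustration of the effect (the table, columns and dates are invented; only EXPLAIN and ALTER TABLE are standard): suppose the sales table has a composite primary key (player_id, game_day) and the slow query filters with BETWEEN on game_day alone.

  -- before: type ALL, i.e. a full table scan; the composite PK is not used
  EXPLAIN
  SELECT SUM(amount)
  FROM sales
  WHERE game_day BETWEEN '2016-03-01' AND '2016-03-07';

  -- a dedicated secondary index lets the range be resolved without scanning the whole table
  ALTER TABLE sales ADD INDEX idx_sales_game_day (game_day);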

Results


This rework breathed new life into the project and bought us the vital time to develop new functionality, which we launched a few months later without losing the audience.

Source: https://habr.com/ru/post/322144/

