
Speeding Up Your Website's Database

Website loading speed has always been an important topic, but it became even more relevant in April 2010, when Google began taking it into account when ranking sites in search results. The focus, however, has usually been on reducing file sizes, tuning server settings, and optimizing CSS and JavaScript.
There is another important factor: the speed at which web pages are generated on the server. Most large modern sites store all of their information in databases and use a server-side language (such as PHP or ASP) to extract it, build an HTML page and send it to the browser.

So if your pages take more than 1.5 seconds to be served (the threshold at which Google considers a site fast), you may disappoint your visitors. The same applies if your search results page takes a long time to load, or if, say, product description pages load quickly but customer testimonials take several seconds to appear.

The threshold Google uses to define a "fast" website is about 1.5 seconds. A similar chart for your own site is available in Webmaster Tools (go to [domain] → Diagnostics → Performance).

This article covers these issues and describes several ways to speed up your website by optimizing the database. It starts with well-known facts, moves on to more advanced techniques, and ends with a list of resources for further reading. The article is aimed at fearless beginners and at developers prone to rash decisions.

What is a database? What is SQL?


A database is, in general, a set of tables of information — for example, a list of customers and their orders. It could be a card file, a collection of spreadsheets, a Microsoft Access file, or the 40 terabytes of data Amazon holds about its books and customers.
A typical blog database contains tables with information about users, categories, posts and comments; WordPress ships with these and a few other tables. An e-commerce database contains tables for customers, products, categories, orders and the items in each order (the "basket"); the open-source Magento engine contains these and many more. Databases have many other uses — content management, customer relationship management, billing and invoicing, events — but these two types (blog and e-commerce) will serve as the running examples in this article.
Some tables in a database are related to other tables. For example, a blog post may have several comments, and a customer may place several orders (these are "one-to-many" relationships). The most complex relationship type is many-to-many, and it lies at the heart of e-commerce databases: one order may contain several products, and any product may appear in several orders. This is where the "order contents" table comes in: it sits between orders and products, and a row is written to it every time a user adds an item to an order. This will become important later, when we look at why some queries take a long time to run.
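As a rough sketch (the column names are illustrative; `orderitems` matches the examples used later in this article), such an "order contents" junction table might be declared like this:

```sql
-- Hypothetical "order contents" table: one row per item in an order
CREATE TABLE orderitems (
    orderid   INT NOT NULL,           -- which order (refers to orders.id)
    productid INT NOT NULL,           -- which product (refers to products.id)
    quantity  INT NOT NULL DEFAULT 1,
    PRIMARY KEY (orderid, productid)  -- each product appears once per order
);
```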
A database also refers to the software that holds all of this data. This is the meaning in sentences like "My database went down while I was having breakfast" or "I need to upgrade my database". (Translator's note: strictly speaking, this means the DBMS, not the database itself.) Common systems include Microsoft Access 2010, Microsoft SQL Server, MySQL, PostgreSQL and Oracle Database 11g.
The acronym SQL comes up frequently in connection with databases. It stands for "Structured Query Language" and is pronounced "ess-cue-el" or "sequel". This language lets us "talk" to a database in a remarkably plain way:
SELECT lastname FROM customers WHERE city='Brighton'; 
This code is called a query. There are other statements for working with a database, such as INSERT (to add data), UPDATE (to change it), DELETE (to remove it), CREATE TABLE (to create tables), ALTER TABLE and many more.
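For instance (a sketch against the hypothetical customers table from the query above), the other statements look like this:

```sql
INSERT INTO customers (lastname, city) VALUES ('Smith', 'Brighton');  -- add a row
UPDATE customers SET city = 'Hove' WHERE lastname = 'Smith';          -- change it
DELETE FROM customers WHERE lastname = 'Smith';                       -- remove it
```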

How can a database slow down a website?


A new, empty site will run very quickly, but as it grows you may notice slowdowns on certain pages, particularly pages with complex functionality. Suppose you want to display "Customers who bought this product also bought…" at the bottom of a product page. To extract this information from the database, you need to:
  1. Identify the product being viewed.
  2. Find out how many times this product has recently been added to customers' baskets (the "order contents" table above).
  3. Get the list of products that were added to the basket in the same orders (confirmed orders only).
  4. Select the customers who placed those orders.
  5. Select all the orders placed by those customers.
  6. Look up the contents of those orders (the same "order contents" table).
  7. Fetch the details of those products.
  8. Identify the products that occur most often in those orders, and display the list.

You could fit everything into one complex query or break it into several simpler ones. Either way, it will run very quickly if your database holds 20 products, 12 customers, 18 orders and 67 purchased items (the total number of items across all baskets). But if it is done inefficiently, the operation will be very slow on a large dataset: with 500 products, 10,000 customers, 14,000 orders and 100,000 purchased items, it will noticeably slow down the page.
This is a rather convoluted example, but it gives an idea of what goes on behind the scenes and why a seemingly innocent piece of functionality can bring a site to its knees.
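To make this concrete, here is a sketch of a single-query version covering steps 1–3 and 7–8 (the detour through customers in steps 4–6 would add two more joins; the table names follow the examples in this article, and product id 12 is arbitrary):

```sql
-- Other products appearing in the same confirmed orders as product 12,
-- most frequently bought together first
SELECT oi2.productid, COUNT(*) AS times_bought_together
FROM orderitems oi1
JOIN orders o       ON o.id = oi1.orderid AND o.status > 1  -- confirmed orders only
JOIN orderitems oi2 ON oi2.orderid = oi1.orderid
WHERE oi1.productid = 12 AND oi2.productid <> 12
GROUP BY oi2.productid
ORDER BY times_bought_together DESC
LIMIT 10;
```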
A slow site can also have other causes: the server is short of memory or disk space; another site on the same server is consuming resources; the server is sending out a mass email or busy with some other heavy task; software or hardware faults; misconfiguration. Or perhaps your site's popularity — and therefore its traffic — has suddenly grown. The next two sections look at speed in more detail.

Is the database to blame?


There are several ways to analyze your site's speed, including the Firebug plugin for Firefox, the developer tools in Google Chrome (Shift + Ctrl + I, then Resources → Enable resource tracking), and Yahoo's YSlow. There are also dedicated sites such as WebPagetest, where you enter a URL and it measures loading speed from a chosen location.
These tools display a waterfall diagram of all the resources (HTML, images, CSS and JavaScript files) loaded by a page, showing the load time for each. They also break each resource's time down into the time spent waiting for the server's response and the time spent receiving (downloading) the data.

Many web pages are assembled entirely on the server — PHP queries the database, then everything is sent to the browser at once — so any database delay shows up as a long waiting time, while the receiving time is proportional to the size of the data. Thus, on a fast connection, a 20 KB page that takes 5 seconds to generate (even though it downloads in 0.05 seconds) creates a big server-side delay.
Not all pages work this way, though. PHP's flush() function sends the HTML generated so far to the browser immediately; any further delays then show up as receiving time rather than waiting time.
In any case, you can compare the waiting/receiving times of a supposedly slow, complex page with those of a similarly sized static HTML page (or an image, or another static file) on the same server at the same time. This removes the effect of a slow internet connection or a loaded server (both cause delays) and lets you compare the time each page actually spends being generated. It is not an exact science, but it gives some idea of where the slowdown is.
The screenshots below show the Google Chrome developer tools analyzing a web page and an image of the same size, 20 KB. The page's waiting time was 130 ms and its receiving time 22 ms; for the image the figures were 51 ms and 11 ms. The download times are roughly the same, but the server took an extra 80 ms to process and build the web page — the cost of running the PHP code and talking to the database.
When running these tests on static content, do a hard refresh so you don't get a cached version. Also run each test several times to make sure you are not looking at a statistical outlier. The third screenshot below shows that WebPagetest reported almost twice the time that Google's tool did for the same page at the same moment, so stick to a single tool when comparing results.

Using the Google Chrome toolkit, we get 130 ms when loading a webpage



The same tool. We have 51 ms when loading images of similar size



Analyzing the same page using WebPagetest, we get 296 ms to wait and 417 ms total load time


Measuring queries in MySQL/PHP


Now that we have the general picture, let's dig into the details. If you suspect the database may be slowing down your site, you need to find out exactly where the delay occurs. I will define a pair of functions that measure the execution time of each database query. The code below is for PHP/MySQL, but the approach works on any database-driven site:
 function StartTimer ($what='') {
   global $MYTIMER; $MYTIMER=0; //global variable to store time
   //if ($_SERVER['REMOTE_ADDR'] != '127.0.0.1') return; //only show for my IP address
   echo '<p style="border:1px solid black; color: black; background: yellow;">';
   echo "About to run <i>$what</i>. "; flush(); //output this to the browser
   //$MYTIMER = microtime (true); //in PHP5 you need only this line to get the time
   list ($usec, $sec) = explode (' ', microtime());
   $MYTIMER = ((float) $usec + (float) $sec); //set the timer
 }
 function StopTimer() {
   global $MYTIMER;
   if (!$MYTIMER) return; //no timer has been started
   list ($usec, $sec) = explode (' ', microtime()); //get the current time
   $MYTIMER = ((float) $usec + (float) $sec) - $MYTIMER; //the time taken in seconds
   echo 'Took ' . number_format ($MYTIMER, 4) . ' seconds.</p>'; flush();
 }

StartTimer starts the timer and also prints a label for whatever you are measuring. The second line checks your IP address; this is useful if you are (temporarily) taking measurements on a live site and don't want everyone to see the statistics. Uncomment that line by removing the leading // and replace 127.0.0.1 with your own IP address. StopTimer stops the timer and prints the elapsed time.
Most modern sites (especially well-built open-source projects) consist of many PHP files, but database queries are executed in only a few of them. Search those files for the strings mysql_db_query or mysql_query. Many editors, such as BBEdit, can perform this kind of search; if you are comfortable with the Linux command line, try:
 grep mysql_query `find . -name \*php` 

which will return something like:
 mysql_query ($sql); 

For WordPress 3.0.4, this is line 1112 of the file wp-includes/wp-db.php. You can copy the functions above to the top of that file (or into a file included on every page), then wrap the mysql_query line in calls to StartTimer and StopTimer, like this:
 StartTimer ($query);
 $this->result = @mysql_query( $query, $dbh );
 StopTimer();

The screenshot below shows the result of adding this code to a fresh WordPress installation. A total of 15 queries are executed, each taking about 0.0003 seconds (0.3 ms) — as expected for an empty database.
Wordpress test
Here all WordPress requests are displayed and measured.


If you locate the equivalent line in other widely used systems, please share it by adding a comment to this article.
You can do other fun things with these functions, such as seeing how much faster your computer is than mine. Counting to 10,000,000 takes 2.9420 seconds on my computer; my server is a bit faster at 2.0726 seconds.
 StartTimer ('counting to 10000000');
 for ($i=0; $i<10000000; $i++); //count to a high number
 StopTimer();


Something about the results

This method gives only comparative results. If your server was busy at the time, all queries will run slower than usual. But you should at least be able to tell how long the fastest queries take (perhaps 1–5 ms), pick out the slower ones (over 200 ms) and the heaviest (over 1 second). You can repeat the test a few times over an hour or a day (though not immediately after a previous run — see the section on the database cache below) to make sure you're not looking at a fluke.
It will also probably make a mess of the page design, and PHP may emit warnings such as "Cannot modify header information. Headers already sent by…". This happens because the timing messages are output before the cookie and session headers. If the main page content still displays, you can ignore these warnings; if the page comes out blank, you may need to wrap specific blocks of code in StartTimer and StopTimer rather than wrapping mysql_query.
This method is a quick way to get rough results; don't leave it running on a production site.

What else could be the reason?

If the database queries are not particularly slow but the page still takes a long time to build, the cause is probably poorly written code. You can add the timer functions around larger blocks of code to see whether the delay lies there. Perhaps you are looping over 10,000 rows of data even though you display only 20 of them?

Profiling

If you're still confused, or want more detail about what your code is doing, try debugging and profiling tools such as Xdebug, which can analyze a local copy of the site and even display bottlenecks visually.

Indexing tables


The experiment above may have surprised you by showing how many database queries a page on your site makes, and hopefully it helped you identify the slow ones.
Let's now look at some simple improvements to speed things up. To make them, you'll need a way to send queries directly to the database. Many server administration packages (such as cPanel or Plesk) ship with phpMyAdmin for this. Alternatively, you can upload something like phpMiniAdmin to your site — a single PHP file that lets you browse the database and run queries. You will need the database name, user name and password; if you don't know them, you can usually find them in your site's configuration file (in WordPress, for example, wp-config.php).
Among the queries your pages run, you probably saw WHERE conditions — SQL's way of filtering results. For example, if your site has a purchase-history page, there is probably a query that finds all orders placed by a given customer. Something like:
 SELECT * FROM orders WHERE customerid = 2; 

This query retrieves all orders placed by the customer with id 2. On my computer, with 100,000 order records, it takes 0.2158 seconds.
Columns like customerid — columns with many possible values that appear in WHERE clauses combined with =, < or > — should be indexed. An index is like the index at the back of a book: it helps the database find the indexed data quickly. This is one of the easiest ways to speed up queries.

What to index?

To know which columns to index, you need an idea of how the database is actually used. For example, if your site often looks up categories by name or events by date, those columns should be indexed:
 SELECT * FROM categories WHERE name = 'Books';
 SELECT * FROM events WHERE startdate >= '2011-02-07';

Every table should have an identifier column (usually id, sometimes ID or articleid, etc.) declared as the primary key, as in the screenshot of the wp_posts table below. Primary keys are indexed automatically. But you should also index columns that refer to identifiers in other tables, such as customerid in the example above — these are foreign keys:
 SELECT * FROM orders WHERE customerid = 2;
 SELECT * FROM orderitems WHERE orderid = 231;

If you need to search large amounts of text — product descriptions or article bodies, say — you can add another type of index: FULLTEXT. FULLTEXT indexes can span multiple columns and, by default, only cover words of four characters or more; they also exclude "stop words" and words that occur in more than 50% of the indexed rows. To use this type of index you need to change the SQL query. Below are versions of the same search without and with a FULLTEXT index:
 SELECT * FROM products WHERE name LIKE '%shoe%' OR description LIKE '%shoe%';
 SELECT * FROM products WHERE MATCH(name,description) AGAINST ('shoe');

It might seem that you should simply index everything. But while indexing speeds up SELECTs, it slows down INSERT, UPDATE and DELETE operations. So a table of product descriptions that rarely changes is a good candidate for indexing, while a constantly changing orders table calls for more restraint.
Also remember the cases where indexing doesn't help — for example, when most of the values in a column are the same. If a status column held "1" for "in stock" and 95% of products were in stock, an index would not help you find the ones that are. Imagine a book index with an entry for every page: the references would point everywhere.
 SELECT * FROM products WHERE stock_status = 1; 


How to index?

Using phpMyAdmin or phpMiniAdmin, you can inspect the structure of each table and check whether the right columns are indexed. In phpMyAdmin, click a table name; the list of indexes appears below the structure. In phpMiniAdmin, click "Show tables" at the top, then "sct" (show create table) next to the table you need; this displays the query that would recreate the table, which ends with the list of indexes, something like:
 KEY `orderidindex` (`orderid`)

List of indexes in phpMiniAdmin
Use PhpMiniAdmin to view the list of indexes in the wp_posts table in WordPress


If an index does not exist, you can create it yourself. In phpMyAdmin, in the "Indexes" section, specify the number of columns the index should cover and click "Go"; then enter an index name, select the columns and click "Save", as shown in the screenshot below:
Index list in phpMyAdmin
Creating an index using phpMyAdmin


In phpMiniAdmin, run the following query by entering it in the field at the top of the page:
 ALTER TABLE orders ADD INDEX customeridindex (customerid); 

Running the search query after creating the index took 0.0019 seconds on my computer — 113 times faster.
Adding a FULLTEXT index works similarly; build it over exactly the columns you search on:
 ALTER TABLE articles ADD FULLTEXT(title,author,articletext);
 SELECT * FROM articles WHERE MATCH(title,author,articletext) AGAINST ('mysql');

Backup and Security

Before making any changes to tables, back up the whole database. You can do this in phpMyAdmin or phpMiniAdmin by clicking "Export". If the database contains sensitive information, such as customer data, keep the backups somewhere safe. You can also back up over SSH with the mysqldump command:
 mysqldump --user=myuser --password=mypassword --single-transaction --add-drop-table mydatabase > backup`date +%Y%e%d`.sql 

Scripts like these are also a security risk, because they give an attacker an easier route to your data. Unlike phpMyAdmin, which is usually somewhat protected by the server-management tools, phpMiniAdmin is a single file that is easy to upload and forget about. It is better to protect it with a password, or delete it as soon as you are done.
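One minimal sketch, assuming an Apache server (the file locations are hypothetical, and the password file would be created first with `htpasswd -c /home/myuser/.htpasswd admin`): add an .htaccess file in the folder containing phpMiniAdmin:

```apacheconf
# Require a password for phpminiadmin.php only
<Files "phpminiadmin.php">
    AuthType Basic
    AuthName "Restricted"
    AuthUserFile /home/myuser/.htpasswd
    Require valid-user
</Files>
```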

Optimizing tables


MySQL and other database systems have built-in optimization tools. If the data in your tables changes frequently, you can run these tools regularly to make the tables smaller and more efficient. The procedure takes time (from a few seconds to several minutes or more, depending on table size) and can block other queries, so it is best run during the quietest periods. Debates about how often to optimize rumble on.
To start the optimization procedure (for the orders table), run the following command:
 OPTIMIZE TABLE orders; 

My unoptimized orders table with 100,000 rows occupied 31.2 MB, and a SELECT * FROM orders query ran in 0.2676 seconds. After the first optimization, the size dropped to 30.8 MB and the query ran in 0.0595 seconds.
The following PHP function will start optimizing all tables in the database:
 function OptimizeAllTables() {
   $tables = mysql_query ('SHOW TABLES'); //get all the tables
   while ($table = mysql_fetch_array ($tables))
     mysql_query ('OPTIMIZE TABLE ' . $table[0]); //optimize them
 }

You need to be connected to the database before calling this function. Most modern websites connect automatically, but for completeness, here is the relevant code:
 mysql_connect (DB_HOST, DB_USER, DB_PASSWORD);
 mysql_select_db (DB_NAME);
 OptimizeAllTables();

Making use of the cache


Just as the browser caches the web pages you visit, database servers cache the results of frequently run queries. The indexed query above took 0.0019 seconds to execute:
 SELECT * FROM orders WHERE customerid=2; 

Running the same query again takes only 0.0004 seconds, because MySQL remembers the result and can return it without re-executing the query in full.
However, many news sites and blogs use queries like these to make sure articles appear only after their publication date:
 SELECT * FROM posts WHERE publisheddate <= CURDATE();
 SELECT * FROM articles WHERE publisheddate <= NOW();

Such queries cannot be cached, because they depend on the current date and time. On a table with 100,000 rows, a query like one of the above took about 0.38 seconds on my computer against a non-indexed column.
If queries like these run hundreds of times a minute across your site, caching them would significantly improve performance. You can make them cacheable by replacing NOW() and CURDATE() with an actual time, like this:
 SELECT * FROM articles WHERE publisheddate <= '2011-01-17 17:00'; 

You can use PHP to round the time down to 5-minute intervals, or thereabouts:
 $time = time();
 $currenttime = date ('Y-m-d H:i', $time - ($time % 300));
 mysql_query ("SELECT * FROM articles WHERE publisheddate <= '$currenttime'");

The % 300 expression rounds the time down to the nearest 300 seconds (5 minutes).
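A tiny runnable sketch of that arithmetic (the timestamp is arbitrary):

```php
<?php
// Round a Unix timestamp down to the nearest 5-minute boundary
$time = 1295283725;               // some arbitrary moment in time
$rounded = $time - ($time % 300); // 300 seconds = 5 minutes
echo $rounded;                    // prints 1295283600, the 5-minute mark just passed
```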
MySQL has other non-cacheable functions too, such as RAND().

And the cache is growing ...

A growing amount of cached data can also slow a website down. The more posts, pages, categories, products, articles and other items your site holds, the more distinct queries there are competing for cache space. Take this example:
 SELECT * FROM articles WHERE publisheddate <= '2011-01-17 17:00' AND categoryid=12; 

With, say, 500 categories, such queries might all fit in the cache and return in milliseconds. But what if there are 1,000 frequently viewed categories? The queries will keep pushing one another out of the cache and run much more slowly. Increasing the cache size can help here, but allocating more memory to the cache takes it away from other tasks, so tread carefully. There is plenty of advice available on enabling and tuning the cache by changing server variables.
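As a sketch (these are standard MySQL variable names, but the 32 MB figure is arbitrary), you can inspect and resize the query cache like this:

```sql
-- Is the query cache on, and how big is it?
SHOW VARIABLES LIKE 'query_cache%';
-- How is it doing? Qcache_lowmem_prunes counts entries pushed out for lack of room
SHOW STATUS LIKE 'Qcache%';
-- Give the cache 32 MB (lasts until restart; make it permanent in my.cnf)
SET GLOBAL query_cache_size = 33554432;
```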

When caching is powerless

The cache is also defeated whenever a table changes: any INSERT, UPDATE or DELETE on a table removes all of that table's queries from the cache. So if the articles table is updated every time an article is viewed (to increment a view counter, say), the improvements above may not help much.
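One common workaround is to move the volatile counter into its own small table, so that incrementing it no longer invalidates cached queries against articles. A sketch (the table and column names are made up):

```sql
-- Side table holding only the frequently updated counter
CREATE TABLE article_views (
    articleid INT NOT NULL PRIMARY KEY,
    views     INT NOT NULL DEFAULT 0
);

-- Each page view touches only this table, leaving the articles cache intact
INSERT INTO article_views (articleid, views) VALUES (12, 1)
    ON DUPLICATE KEY UPDATE views = views + 1;
```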
In such cases, it is worth looking at tools that cache at the application level, such as Memcached, or reading the next section on building your own cache. Both options require much bigger changes to your code than anything discussed so far.

Your own cache


If particularly heavy queries take a long time to run and their data doesn't change often, you can cache the results yourself.
Suppose you want to show the site's 20 most popular articles of the past week, scored by a formula that takes into account searches, views, additions to "favorites" and "send to a friend" clicks, and display them as a bulleted list on the home page.
The easiest approach is to run the query once an hour or once a day (with PHP, for example) and save the results to a separate file, which you then simply include in the page.
Once you've written the PHP script that generates the list file, there are several ways to run it on a schedule. You can use the server's scheduler (in Plesk 8: Server → Scheduled Tasks) to fetch the script every hour, like this:
 wget -O /dev/null -q http://www.mywebsite.co.uk/runhourly.php 
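On a plain Linux server, the same effect can be had with cron (a sketch — the URL is the made-up example above). Run `crontab -e` and add a line such as:

```crontab
# At minute 0 of every hour, fetch the page and discard the output
0 * * * * wget -O /dev/null -q http://www.mywebsite.co.uk/runhourly.php
```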

Alternatively, you can use PHP itself to check the file's modification time and regenerate it if it is more than an hour old (3600 being the number of seconds in an hour):
 $filestat = stat ('includes/complicatedfile.html'); //look up information about the file
 if ($filestat['mtime'] < time()-3600) RecreateComplicatedIncludeFile(); //over 1 hour old
 readfile ('includes/complicatedfile.html'); //include the file in the page

Returning to the "Customers who bought this product also bought…" example, you can also cache the data in a new column, or even a new table. Once a week, you could run the full set of heavy queries for every product to work out which products are bought together with it.
You could then store the resulting product ids in a new column as a comma-separated list. Later, to fetch the products bought together with product id 12, you would run:
 SELECT * FROM products WHERE FIND_IN_SET(12,otherproductids); 

Reducing the number of requests using JOIN


Somewhere on your site — probably in the admin area — you may display a list of customers along with the orders they have placed.
It might use a query like the one below (where a status greater than 1 means the order is complete):
 SELECT * FROM orders WHERE status>1; 

And then, for each order, look up the customer who placed it:
 SELECT * FROM customers WHERE id=1;
 SELECT * FROM customers WHERE id=2;
 SELECT * FROM customers WHERE id=3;
 etc

If the page shows 100 orders at once, that's 101 queries. And if you also need the delivery address from another table, or the total value of each order, the page will get slower and slower… Everything can be done much faster by combining the queries with a JOIN. Here is the pair of queries above combined:
 SELECT * FROM orders INNER JOIN customers ON orders.customerid = customers.id WHERE orders.status>=1; 

The same query can also be written without the JOIN keyword:
 SELECT * FROM orders, customers WHERE orders.customerid = customers.id AND orders.status>=1; 

Converting queries to use JOINs can be tricky, because the PHP code will have to change too. But if a slow page is issuing thousands of queries, it may well be worth it. For more information, see the Wikipedia article on JOINs. The columns used in a JOIN (customerid in the example) should be indexed.
You can also ask MySQL to "explain" how it handles a query. The output shows which tables and indexes are used to execute it, and in what order — useful clues for optimization. The screenshot below shows the result of EXPLAINing one of WordPress's more complex queries:
The result of the query with EXPLAIN
Using EXPLAIN, we learn how MySQL "perceives" complex queries

The screenshot shows which tables and indexes are used, the JOIN type, the number of rows examined, and other details. The official MySQL site describes what EXPLAIN's output means, and says a little about how to use that information to optimize queries (for example, by adding indexes).
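As a sketch, using the joined orders/customers query from earlier, you simply prefix the query with EXPLAIN:

```sql
EXPLAIN SELECT * FROM orders
INNER JOIN customers ON orders.customerid = customers.id
WHERE orders.status >= 1;
-- In the output, watch "key" (which index was chosen, if any), "rows" (how many
-- rows MySQL expects to examine) and "type" ("ALL" means a full table scan,
-- often a sign that an index is missing)
```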

Cheating


Finally, returning once more to the "Customers who bought this product also bought…" example: you can also simply cheat. Rename the feature "Recommended products", for example, and show a few other products from the same category, or a hand-picked list.

Conclusion


This article has presented several methods for improving database performance, ranging from the simple to the fairly complex — though most well-designed sites will already use some of them (JOINs and indexes in particular).
There is plenty of debate about the effectiveness and reliability of some of the methods described (speed measurement, indexing, optimization, caching and so on), so the final decision is yours. In any case, you now have options to consider.
And if your site starts to slow down after months or years of trouble-free operation, you'll have somewhere to start looking for the cause.


Source: https://habr.com/ru/post/116142/

