
Experience of implementing caching in a small project with a strong social component

I want to share my experience of adding memcached-based caching to my site. The text should be useful to web development beginners who wonder how to put into practice those 100,500 caching articles that search engines turn up so easily. I don't claim to be the final authority; I'm just describing how it went for me.

Initial data:
The site runs on a single dedicated server, but since it is likely to grow, memcached was chosen for caching;
Daily traffic: ~23,000 unique visitors and ~300,000 page views;
80% of visitors are logged-in users;
Main content: text (books that authors write and publish on the site chapter by chapter, samizdat style).
Services: a personalized news feed, reading of texts split into chapters, comments, profiles, blogs, ratings, subscriptions, tags, bookmarks, private messages, counters, email notifications...
User activity: more than 10,000 content-changing actions per day.

What makes caching hard: the vast majority of pages contain personalized data. On some pages everything is unique, right down to the database queries; on some the queries can be split into shared and per-user ones; on some they cannot; and on some, personal user settings are applied to the data after it has been selected from the database.

Introduction and a lyrical digression


I have a homemade bicycle, a beloved tricycle written nine years ago. It has been rewritten many times since then, but because I never did web development or programming professionally, the project stayed technologically very basic, and work on it is ruled by the principle "if it works, don't touch it".
The project was originally written in bare PHP: no frameworks, no OOP, no third-party libraries, no template engines and no templates at all. The database is, naturally, MySQL; the tables are, naturally, MyISAM; the server is, naturally, Apache (nginx was later put in front of it); and the front end is, naturally, jQuery with quite a lot of AJAX.

Time passed, users kept coming, I amused myself by thinking up ways to make the site even more convenient for authors and readers (the two main user roles on the site), and the authors and readers liked it and invited their friends. Over time the site accumulated a lot of settings, checkboxes and features. And so, when the server started showing a suspicious load average (la) of 3-4 in the evenings, and phpMyAdmin reported 89 requests per second and 48 Gb/h of network traffic, I decided it was time to read up on caching.

Preparation - analyze what to cache


I went to LiveInternet, took the per-page statistics for the last month and made a table showing how the views are distributed across the sections of the site.



It clearly shows that the first things to cache are two sections (3, read), which together account for almost 60% of all page views and involve many database queries and a lot of data formatting. Next, main & news need caching; everything else is cached only as far as it goes, to ease my conscience and in the hope of a severalfold increase in traffic.

Scene one - we cache the texts of books


Section: read.
The simplest part, and the one that saves the most resources. The texts of the books are stored in the database in a single table where each row is a chapter of a book. The chapter text and its meta-information are stored together, and the text is stored raw: it has to be processed before output. Why store raw text? For historical reasons: text from the database is displayed in many places, namely on the reading page, in the author's simple editor, in the author's visual editor, and in the script that builds the fb2 version. Properly, the database should have been reworked long ago so that an output-ready version is stored next to the raw text, but that is left for future optimization.

Either a single chapter or all chapters at once can be displayed on a page. So, to save memory, it was decided to cache the following:
1. An array with the list of chapters (a one-dimensional array containing the chapters' sequence numbers). The key is "f123".
2. For each chapter, an array containing the chapter's meta-information and the chapter text prepared for output. The key is "f123c1".
Why not store ready HTML for the whole chapter at once? Because the chapter title is formatted differently when all chapters are displayed and when a single chapter is displayed.

A minor complication: for the book's author and for administrators the output must bypass the cache, because it also includes unpublished chapters.

The script algorithm is simple:
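1. A single chapter is requested.
    if ($is_author || $is_admin) {
        // author or administrator: read from the database, bypassing the cache
    } else {
        // $id is the book id, $cid is the chapter's sequence number; both come from $_GET
        $data = cache_get('f'.$id.'c'.$cid);
        if (!$data) {
            // cache miss: read the chapter from the database, format it, store it in the cache and output it
        } else {
            // cache hit: output the cached chapter
        }
    }

2. All chapters are requested.
    if ($is_author || $is_admin) {
        // author or administrator: read all chapters from the database, bypassing the cache
    } else {
        // $id is the book id, taken from $_GET
        $data = cache_get('f'.$id.'conarr');
        if (!$data) {
            // cache miss: read the chapter list and the chapters from the database,
            // format them, store them in the cache and output them
        } else {
            $data = unserialize($data);
            foreach ($data as $tmp) {
                $tmp_data = cache_get('f'.$id.'c'.$tmp['cid']);
                if (!$tmp_data) {
                    // this chapter is not cached: read it from the database, format it, cache it and output it
                } else {
                    // output the cached chapter
                }
            }
        }
    }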
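
The "all chapters" branch above can be sketched as runnable code. This is only an illustration under assumptions: `FakeCache`, `FakeDb`, `db_chapter_list()`, `db_chapter()`, `prepare_chapter()`, `get_chapter()` and `get_all_chapters()` are hypothetical names, the cache is an in-memory stand-in for the site's `cache_get()` wrapper so that no memcached server is needed, and the miss branch is simplified to one read per chapter.

```php
<?php
// Hypothetical in-memory stand-in for the memcached wrapper.
final class FakeCache { public static array $data = []; }
function cache_get(string $key) { return FakeCache::$data[$key] ?? false; }
function cache_set(string $key, string $value): void { FakeCache::$data[$key] = $value; }

// Hypothetical stand-in for the MySQL chapters table; counts "queries".
final class FakeDb { public static int $queries = 0; }
function db_chapter_list(int $id): array {
    FakeDb::$queries++;
    return [['cid' => 1], ['cid' => 2]];
}
function db_chapter(int $id, int $cid): array {
    FakeDb::$queries++;
    return ['cid' => $cid, 'title' => "Chapter $cid", 'raw' => "Raw text of chapter $cid"];
}

// Hypothetical formatter: turns the raw text into an output-ready array.
function prepare_chapter(array $row): array {
    return ['cid' => $row['cid'], 'title' => $row['title'],
            'html' => '<p>' . htmlspecialchars($row['raw']) . '</p>'];
}

// One chapter: read through the cache unless $bypass is set.
function get_chapter(int $id, int $cid, bool $bypass): array {
    if ($bypass) {
        return prepare_chapter(db_chapter($id, $cid));
    }
    $key = 'f' . $id . 'c' . $cid;
    $data = cache_get($key);
    if (!$data) {
        $chapter = prepare_chapter(db_chapter($id, $cid));
        cache_set($key, serialize($chapter));
        return $chapter;
    }
    return unserialize($data);
}

// All chapters: the chapter list and every chapter are cached separately.
function get_all_chapters(int $id, bool $is_author, bool $is_admin): array {
    $bypass = $is_author || $is_admin;
    if ($bypass) {
        $list = db_chapter_list($id);
    } else {
        $data = cache_get('f' . $id . 'conarr');
        if (!$data) {
            $list = db_chapter_list($id);
            cache_set('f' . $id . 'conarr', serialize($list));
        } else {
            $list = unserialize($data);
        }
    }
    $chapters = [];
    foreach ($list as $item) {
        $chapters[] = get_chapter($id, $item['cid'], $bypass);
    }
    return $chapters;
}
```

In this sketch a second call by an ordinary reader touches only the cache, while a call by the author always goes to the database.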

Cache reset

The chapter and chapter-list caches are stored with no expiry time, so resetting them has to be managed explicitly.
1. The admin panel has a function that resets all cache entries for a particular book.
2. When the author publishes a new chapter, the chapter list cache is deleted.
3. When the author makes changes to the chapter, the cache of the changed chapter is deleted.
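
The three reset rules above might look roughly like this. It is a sketch, not the site's actual code: the function names and `cache_delete()` are hypothetical, the cache is an in-memory stand-in, and the admin reset assumes the chapter numbers are supplied by the caller (for example from the database), since memcached itself cannot delete keys by prefix.

```php
<?php
// Hypothetical in-memory stand-in for the memcached wrapper.
final class FakeCache { public static array $data = []; }
function cache_get(string $key) { return FakeCache::$data[$key] ?? false; }
function cache_set(string $key, string $value): void { FakeCache::$data[$key] = $value; }
function cache_delete(string $key): void { unset(FakeCache::$data[$key]); }

// Rule 1: the admin resets every entry of one book.
// $cids is the list of chapter sequence numbers (assumed to come from the database).
function reset_book_cache(int $id, array $cids): void {
    cache_delete('f' . $id . 'conarr');
    foreach ($cids as $cid) {
        cache_delete('f' . $id . 'c' . $cid);
    }
}

// Rule 2: a new chapter is published, so only the chapter list is stale.
function on_chapter_published(int $id): void {
    cache_delete('f' . $id . 'conarr');
}

// Rule 3: a chapter is edited, so only that chapter's entry is stale.
function on_chapter_edited(int $id, int $cid): void {
    cache_delete('f' . $id . 'c' . $cid);
}
```

With the real PHP Memcached extension the wrapper would presumably call `Memcached::delete()`; an entry stored with an expiration of 0 never expires on its own, which is why every change path has to delete explicitly.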

Scene two - we cache the table of contents of the book


The table of contents of a book is displayed in two sections (3 and read), but differently in each, so two versions of the cache are needed.
What makes the table of contents hard to cache:
1. In the read section, when a specific chapter is being viewed, the table of contents marks the chapter the reader is currently on.
2. Section 3 may display links to the audio version of a chapter, if such an audio version exists.
3. In section 3, if a chapter was published today, a NEW image is displayed after it.
4. In both sections the table of contents may display bookmarks, if the reader has read this book before and marked some places. One user can have at most five bookmarks per book, but a single chapter can hold several of them (even all five).

Solution:
The cache stores a table-of-contents template with anchors for the additional information.
1. The mark for the current chapter is a special character placed before the chapter title (later I will turn it into a special class on the li, but that changes nothing in the algorithm described). The anchor has the form {Nar}, where N is the chapter's sequence number. Accordingly, on output:
    // mark the chapter being viewed; $cid is the chapter's sequence number, taken from $_GET
    $contents_list = str_replace('{'.$cid.'ar}', '<span class="red">></span>', $contents_list);
    // remove the remaining, unused anchors
    $contents_list = preg_replace('/\{\d+ar\}/', '', $contents_list);

2. With audio versions, everything is simple: the version of the cache intended for section 3 contains links to the audio versions of the chapters, and the version for the read section does not.
3. To display the NEW image, anchors of the form {dmY} holding each chapter's publication date are inserted into the template after each chapter title, and on output:
    // mark the chapters published today
    $tmp = date('dmY');
    $contents_list = str_replace('{'.$tmp.'}', ' <img src="/images/new.png">', $contents_list);
    // remove the remaining date anchors (date('dmY') always yields eight digits)
    $contents_list = preg_replace('/\{\d{8}\}/', '', $contents_list);

4. Here I first had to implement caching of each user's bookmarks. In short: a cache entry is created holding the array of a particular user's bookmarks in a particular book, with a key of the form "u90000f123bm", where 90000 is the user id and 123 is the book id. If the user has no bookmarks in this book, the entry is still created, with the value "no" (so that the database is not queried every time the cache holds nothing under the key). The entry has no expiry time; every time a bookmark is added or deleted, an attempt is made to delete the cache entry for the book the bookmark belongs to. If bookmarks exist, the cache holds an array in which each element contains the bookmark's meta-information and the sequence number of the chapter the bookmark belongs to.
In the table-of-contents template, an anchor of the form {Nbm} is placed after each chapter title, where N is the chapter's sequence number. On output:
    // cached bookmarks of the current user in book $id; 'no' if there are none
    $bm = LibMember_bm($id);
    if ($bm != 'no') {
        $bm = unserialize($bm);
        foreach ($bm as $tmp) {
            // the anchor is kept after each insertion because one chapter can hold several bookmarks
            $contents_list = str_replace('{'.$tmp['cid'].'bm}', $tmp[' '].'{'.$tmp['cid'].'bm}', $contents_list);
        }
    }
    // remove the bookmark anchors
    $contents_list = preg_replace('/\{\d+bm\}/', '', $contents_list);
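
The "no" marker deserves a separate illustration: without it, "this user has no bookmarks" and "nothing is cached yet" would look the same, and every page view would hit the database. Below is a minimal sketch of such a lookup. The names `bookmarks_cached()`, `on_bookmark_changed()`, `db_bookmarks()`, `FakeCache` and `FakeDb` are hypothetical and the storage is an in-memory stand-in; this is not the real body of `LibMember_bm()`.

```php
<?php
// Hypothetical in-memory stand-in for the memcached wrapper.
final class FakeCache { public static array $data = []; }
function cache_get(string $key) { return FakeCache::$data[$key] ?? false; }
function cache_set(string $key, string $value): void { FakeCache::$data[$key] = $value; }
function cache_delete(string $key): void { unset(FakeCache::$data[$key]); }

// Hypothetical stand-in for the bookmarks table: [user id][book id] => rows.
final class FakeDb {
    public static int $queries = 0;
    public static array $bookmarks = [];
}
function db_bookmarks(int $uid, int $id): array {
    FakeDb::$queries++;
    return FakeDb::$bookmarks[$uid][$id] ?? [];
}

// Returns a serialized array of bookmarks, or the string 'no' if there are none.
function bookmarks_cached(int $uid, int $id): string {
    $key = 'u' . $uid . 'f' . $id . 'bm';
    $data = cache_get($key);
    if ($data === false) {
        $rows = db_bookmarks($uid, $id);
        // Cache the absence of bookmarks too, so the next view skips the database.
        $data = $rows ? serialize($rows) : 'no';
        cache_set($key, $data);
    }
    return $data;
}

// Called whenever a bookmark is added or deleted in book $id by user $uid.
function on_bookmark_changed(int $uid, int $id): void {
    cache_delete('u' . $uid . 'f' . $id . 'bm');
}
```

Here two consecutive views by a user without bookmarks cost a single database read; the entry is rebuilt only after `on_bookmark_changed()` drops it.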


Cache reset

The table-of-contents cache has no expiry time, so it has to be managed manually.
It is deleted when the author publishes new chapters or changes old ones (a chapter title may change).
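
To tie the three anchor kinds of this scene together, here is a small end-to-end sketch: one function builds the template that would go into the cache, another fills it in for a particular reader. The function names, the chapter array shape and the `html` field of a bookmark are assumptions made for the example; the replacement logic follows the snippets above.

```php
<?php
// Build the template to be cached: one <li> per chapter with all three anchors.
// Assumed chapter shape: ['cid' => int, 'title' => string, 'date' => string in dmY form].
function build_contents_template(array $chapters): string {
    $html = "<ul>\n";
    foreach ($chapters as $ch) {
        $n = $ch['cid'];
        $html .= '<li>{' . $n . 'ar}' . htmlspecialchars($ch['title'])
               . '{' . $ch['date'] . '}{' . $n . 'bm}</li>' . "\n";
    }
    return $html . "</ul>\n";
}

// Personalize the cached template on output.
// $bookmarks: list of ['cid' => int, 'html' => string] (assumed shape).
function render_contents(string $tpl, int $cid, string $today, array $bookmarks): string {
    // Current chapter marker, then drop the unused {Nar} anchors.
    $tpl = str_replace('{' . $cid . 'ar}', '<span class="red">></span>', $tpl);
    $tpl = preg_replace('/\{\d+ar\}/', '', $tpl);
    // NEW image for chapters published today, then drop the other date anchors.
    $tpl = str_replace('{' . $today . '}', ' <img src="/images/new.png">', $tpl);
    $tpl = preg_replace('/\{\d{8}\}/', '', $tpl);
    // Bookmarks: keep the anchor so several bookmarks can land in one chapter.
    foreach ($bookmarks as $bm) {
        $tpl = str_replace('{' . $bm['cid'] . 'bm}', $bm['html'] . '{' . $bm['cid'] . 'bm}', $tpl);
    }
    return preg_replace('/\{\d+bm\}/', '', $tpl);
}

$tpl = build_contents_template([
    ['cid' => 1, 'title' => 'One', 'date' => '04022014'],
    ['cid' => 2, 'title' => 'Two', 'date' => '05022014'],
]);
echo render_contents($tpl, 2, '05022014', [['cid' => 1, 'html' => ' [bookmark]']]);
```

The template is built once per book and section, while the cheap string replacements run on every request.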

That's all for now


These two scenes and a few other small things are, for now, all the steps I have taken toward caching in a live project. There will be more: if the article proves interesting, I will describe the new techniques I come up with for caching data in the remaining sections of the site.
We have already managed to bring the server's evening la down to 0.8-1.5 and the phpMyAdmin readings down to 59 requests per second and 4.6 Gb/h of network traffic (these readings were taken a day after a server restart, though; the hourly traffic figure will drop further as the cache fills up).
Let me remind you that the initial figures were:
la of 3-4, and phpMyAdmin reporting 89 requests per second and 48 Gb/h of network traffic.
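
For a sense of scale, the relative drop can be computed from the figures quoted above. `reduction_percent()` is just a helper name made up for this calculation.

```php
<?php
// Relative reduction in percent, rounded to one decimal place.
function reduction_percent(float $before, float $after): float {
    return round(($before - $after) / $before * 100, 1);
}

echo reduction_percent(89, 59), "% fewer requests per second\n";   // 33.7%
echo reduction_percent(48, 4.6), "% less network traffic per hour\n"; // 90.4%
```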

Source: https://habr.com/ru/post/211478/

