This article addresses one problem with storing
PHP sessions in memcached: the lack of session locking.
Introduction
It's no secret that one of the most popular ways to improve site performance is to use memcached. This has been said many times, with numerous examples. The easiest way to start is to use memcached to store
PHP sessions: there is no need to rewrite all the code, just follow a few simple steps. I will not explain why you would want to store sessions in memcached. Instead, I will explain why doing so is dangerous.
Request counter, or "Who is to blame?"
Suppose we need to count how many pages a user visits on the site (in practice, this could be anything from the user's navigation history to the contents of an
online store's cart). Consider an example consisting of two files, counter.php and frameset.php:
counter.php
<?php
//ini_set('session.save_handler', 'memcache');
//ini_set('session.save_path', 'tcp://localhost:11211');

session_start();
$_SESSION['habra_counter'] = isset($_SESSION['habra_counter']) ? $_SESSION['habra_counter'] : 0;
usleep(1000000); // Useful work
$_SESSION['habra_counter']++; // counter
usleep(1000000); // Useful work
echo 'Page count: ' . $_SESSION['habra_counter'];
?>
frameset.php
<?php session_start(); // so that the session cookie gets set ?>
<form action="" method="post" onsubmit="work(); return false;">
<input type="submit" name="submit" value="Work" />
</form>
<iframe src="" name="iframe1" id="idframe1"></iframe>
<iframe src="" name="iframe2" id="idframe2"></iframe>
<script>
function work() {
    document.getElementById('idframe1').src = 'counter.php?f=1' + Math.random();
    document.getElementById('idframe2').src = 'counter.php?f=1' + Math.random();
}
</script>
http://foldo.ru/developer/habrahabr/standard-session/frameset.php
Open frameset.php in the browser and you will see that each request to counter.php increases the counter in the session by one, so the counter works correctly. Now let's look at the same example, only with memcached sessions. To do this, uncomment the two lines at the beginning of counter.php.
http://foldo.ru/developer/habrahabr/memcache-session/frameset.php
What do we see? The counter does not work correctly. Why? Let's look at what actually happens. When a session is stored in a file, calling session_start opens the file, locks it and reads it. The script then works with $_SESSION, after which the new value is written over the old one, the lock is released and the file is closed. Meanwhile, a parallel thread patiently waits for the lock to be released and only then does its work. Unfortunately, memcached currently has no locking of stored values, so both threads read the same source data, process it and write it back, and all changes made by the first thread are irretrievably lost. The table shows an approximate sequence of operations for the two cases.
+----+--------------+----------------++------------------+------------------+
|    | Sessions on the hard disk     || Sessions in memcache                |
+----+--------------+----------------++------------------+------------------+
|    | Thread 1     | Thread 2       || Thread 1         | Thread 2         |
+----+--------------+----------------++------------------+------------------+
|  1 | open file    |                || connect memcache |                  |
|  2 | lock file    | open file      || read memcache 5  | connect memcache |
|  3 | read file 5  | lock file      || work 5 + 1       | read memcache 5  |
|  4 | work 5 + 1   | wait for lock  || write memcache 6 | work 5 + 1       |
|  5 | write file 6 | wait for lock  || close memcache   | write memcache 6 |
|  6 | unlock file  | wait for lock  ||                  | close memcache   |
|  7 | close file   | read file 6    ||                  |                  |
|  8 |              | work 6 + 1     ||                  |                  |
|  9 |              | write file 7   ||                  |                  |
| 10 |              | unlock file    ||                  |                  |
| 11 |              | close file     ||                  |                  |
+----+--------------+----------------++------------------+------------------+
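The two columns of the table can be replayed deterministically in a few lines. The sketch below is only an illustration: the array is a hypothetical stand-in for the session store (not a real memcached client), and readCounter and writeCounter are names made up for this example. It assumes PHP 7.1 or later.

```php
<?php
// Hypothetical in-memory stand-in for the session store; not a memcached client.
$store = ['habra_counter' => 5];

// Each request is split into the read and write steps shown in the table.
function readCounter(array $store): int
{
    return $store['habra_counter'];
}

function writeCounter(array &$store, int $value): void
{
    $store['habra_counter'] = $value;
}

// Without a lock: both requests read before either one writes.
$a = readCounter($store);      // request 1 reads 5
$b = readCounter($store);      // request 2 reads 5
writeCounter($store, $a + 1);  // request 1 writes 6
writeCounter($store, $b + 1);  // request 2 writes 6 and overwrites request 1
$unlocked = $store['habra_counter'];

// With a lock: request 2 can only read after request 1 has written.
$store['habra_counter'] = 5;
$a = readCounter($store);      // request 1 reads 5
writeCounter($store, $a + 1);  // request 1 writes 6
$b = readCounter($store);      // request 2 reads 6
writeCounter($store, $b + 1);  // request 2 writes 7
$locked = $store['habra_counter'];

echo "without lock: $unlocked, with lock: $locked\n"; // without lock: 6, with lock: 7
```

Two page views were made in both runs, but only the serialized run counts both of them.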
That settles the question of who is to blame. Let's summarize:
- With active interaction between client and server, some data may be irretrievably lost;
- Moving session storage to memcached may simply be impossible;
- Memcached reduces request processing time;
- It is best to store in sessions only data that rarely changes (for example, a user profile);
- As the number of servers grows, memcached can act as a single shared session store.
As you can see, storing sessions in memcached has advantages as well as drawbacks.
"What to do?"
Only one question remains: "What to do?" I will say right away that I do not have a ready-made solution, but I do have two sketches on the subject. Both rely on the fact that memcached
does offer a way to build a lock: the add method of the Memcache class. The documentation says of it:
Returns TRUE on success or FALSE on failure. Returns FALSE if such key already exist.
So we can build our own kind of lock:
function lock($session_id, $memcache)
{
    $max_iterations = 15;
    $iteration = 0;
    // add() succeeds only if the key does not exist yet, so it works as "try to lock"
    while (!$memcache->add('lock_' . $session_id, ...))
    {
        $iteration++;
        if ($iteration > $max_iterations) {
            return false; // gave up waiting for the lock
        }
        usleep(1000);
    }
    return true;
}
function unlock($session_id, $memcache)
{
    // removing the key releases the lock
    return $memcache->delete('lock_' . $session_id);
}
Using these two functions, we can write our own
session save handler and use it. However, this puts additional load on the server, and we gain no further performance from it.
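As a rough illustration of such a handler, here is a minimal sketch that takes the lock on read and releases it on close, retrying in the spirit of lock() above. Everything named here is an assumption made for the example: LockableStore is a hypothetical interface, ArrayStore is an in-memory stand-in so the sketch runs without a memcached server, and the 'lock_' and 'sess_' key prefixes are arbitrary. It assumes PHP 8.0 or later and is not the handler from the original article.

```php
<?php
// Hypothetical minimal store interface; a real adapter would wrap a memcached client.
interface LockableStore
{
    public function add(string $key, string $value): bool; // must fail if the key exists
    public function get(string $key): string|false;
    public function set(string $key, string $value): bool;
    public function delete(string $key): bool;
}

// In-memory stand-in so that the sketch runs without a memcached server.
final class ArrayStore implements LockableStore
{
    private array $data = [];

    public function add(string $key, string $value): bool
    {
        if (array_key_exists($key, $this->data)) {
            return false;
        }
        $this->data[$key] = $value;
        return true;
    }

    public function get(string $key): string|false
    {
        return $this->data[$key] ?? false;
    }

    public function set(string $key, string $value): bool
    {
        $this->data[$key] = $value;
        return true;
    }

    public function delete(string $key): bool
    {
        unset($this->data[$key]);
        return true;
    }
}

final class LockingSessionHandler implements SessionHandlerInterface
{
    private ?string $lockedId = null;

    public function __construct(
        private LockableStore $store,
        private int $maxIterations = 15,
        private int $sleepMicroseconds = 1000
    ) {
    }

    public function open(string $path, string $name): bool
    {
        return true;
    }

    public function read(string $id): string|false
    {
        // Take the lock before reading; give up after a bounded number of retries.
        for ($i = 0; !$this->store->add('lock_' . $id, '1'); $i++) {
            if ($i >= $this->maxIterations) {
                return false;
            }
            usleep($this->sleepMicroseconds);
        }
        $this->lockedId = $id;
        $data = $this->store->get('sess_' . $id);
        return $data === false ? '' : $data;
    }

    public function write(string $id, string $data): bool
    {
        return $this->store->set('sess_' . $id, $data);
    }

    public function close(): bool
    {
        // Release the lock taken in read(), if any.
        if ($this->lockedId !== null) {
            $this->store->delete('lock_' . $this->lockedId);
            $this->lockedId = null;
        }
        return true;
    }

    public function destroy(string $id): bool
    {
        return $this->store->delete('sess_' . $id);
    }

    public function gc(int $max_lifetime): int|false
    {
        return 0; // expiry is left to the store in this sketch
    }
}

// Registration would look like this (not executed here):
// session_set_save_handler(new LockingSessionHandler(new ArrayStore()), true);
```

The sketch stores the lock without an expiry time, so a script that dies between read and close would leave the session locked. A real handler would presumably give the lock key a short lifetime through the expire argument of add.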
I approached the problem from the other side. After analyzing my needs, I concluded that in practice I only need to store 2-3 groups of actively changing data, and that this data is read far more often than it is written. So I introduced the concept of a subsession: a virtual object that physically lives outside the session and is designed to hold frequently changing data. When the data has to change, the subsession is locked, read, modified, written and unlocked. Here is how it looks from the outside:
$this->session->init_subsession('fupload', $this->memc); // initialize the subsession
/* lock */ $this->session->fupload->lock(); // lock the subsession
$fupload = $this->session->fupload->get(); // read the subsession
$fupload = is_array($fupload) ? $fupload : array(); // make sure we got an array
$fupload[] = $new_data; // add data
$this->session->fupload->set($fupload); // write the subsession
/* unlock */ $this->session->fupload->unlock(); // release the lock
If you only need to read data from the subsession, you do not have to lock it. So what does the subsession give me? Most of the code runs without holding a lock, so there are no significant delays. Reading from a subsession is also lock-free, so the lock is held not for the entire duration of the script but only for short sections of it. Yes, this complicates the code somewhat, but in my opinion the advantages are obvious.
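The article does not show the subsession implementation itself, so the following is only a guess at its shape, given as a minimal sketch. The Subsession class, its constructor arguments, the key naming and the FakeMemc stand-in (an in-memory object with memcached-like add/get/set/delete behaviour) are all hypothetical. It assumes PHP 8.0 or later.

```php
<?php
// Hypothetical in-memory stand-in with memcached-like add/get/set/delete semantics.
final class FakeMemc
{
    private array $data = [];

    public function add(string $key, mixed $value): bool
    {
        if (array_key_exists($key, $this->data)) {
            return false; // like memcached, add fails when the key already exists
        }
        $this->data[$key] = $value;
        return true;
    }

    public function get(string $key): mixed
    {
        return $this->data[$key] ?? false;
    }

    public function set(string $key, mixed $value): bool
    {
        $this->data[$key] = $value;
        return true;
    }

    public function delete(string $key): bool
    {
        unset($this->data[$key]);
        return true;
    }
}

// Hypothetical subsession: one independently locked key per session and name.
final class Subsession
{
    private string $key;

    public function __construct(private FakeMemc $memc, string $sessionId, string $name)
    {
        $this->key = 'sub_' . $sessionId . '_' . $name;
    }

    public function lock(int $maxIterations = 15): bool
    {
        // Retry add() a bounded number of times, then give up.
        for ($i = 0; !$this->memc->add('lock_' . $this->key, 1); $i++) {
            if ($i >= $maxIterations) {
                return false;
            }
            usleep(1000);
        }
        return true;
    }

    public function unlock(): bool
    {
        return $this->memc->delete('lock_' . $this->key);
    }

    // Reads do not take the lock.
    public function get(): mixed
    {
        return $this->memc->get($this->key);
    }

    public function set(mixed $value): bool
    {
        return $this->memc->set($this->key, $value);
    }
}

$memc = new FakeMemc();
$fupload = new Subsession($memc, 'abc123', 'fupload');

// The lock is held only around the read-modify-write section.
if ($fupload->lock()) {
    $items = $fupload->get();
    $items = is_array($items) ? $items : [];
    $items[] = 'file1.jpg';
    $fupload->set($items);
    $fupload->unlock();
}

print_r($fupload->get());
```

Because each subsession has its own lock key, two requests that touch different subsessions never wait for each other, and requests that only read never wait at all.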
Conclusions
Storing sessions in memcached works well for multi-server systems. Besides the obvious performance gain, there is a further gain from the absence of locking. Switching to memcached session storage is very simple but has pitfalls. At the design stage you must take into account that memcached does not lock sessions, and either design around this or implement the lock yourself.