Preamble
In connection with the weekend, I spent some time implementing the Memcache server using the Twisted python framework. As a result, I received a speed twice as low, which I do not consider very critical, as well as the ability to implement a couple of extensions of the original protocol. Optimizations are also possible, which will further improve performance.
The protocol was not fully implemented - there are still points that can be worked on, but the standard set / get is quite efficient and ready to use.
Facilities
To store the cache, we use the base class dict. As you might guess, the implementation of dict in python is fast, this basic type is used in python so actively that it was not left without detailed optimization. Thus, we automatically have a structure for storing the cache in memory. It remains to implement the memcache protocol, to provide access to dict to other programs.
To implement the server use Twisted. There are many variations of non-blocking IO for python today, but Twisted is already a classic, and has enough tools in its arsenal to easily solve such problems.
')
Network Protocol Implementation
How are the protocols implemented? First of all, you certainly need to find a description of the protocol. I found it here -
code.sixapart.com/svn/memcached/trunk/server/doc/protocol.txt
After reading the protocol, it becomes clear that we will receive one or two lines from the client, and we can safely divide the first line into elements by spaces. The second line is used in commands that send data to the server - set, add, replace, etc. If you would like to get into the article in more detail, then I will send you to read the description on your own, the purpose of posting his translation here was not
Armed with this knowledge, we look at what Twisted can offer us to solve this problem, and immediately find LineOnlyReceiver — the protocol from Twisted's basic delivery, which works only with protocols that exchange strings, that is what we need. Even in Twisted, there is already an implementation of the memcache protocol, but only its client part, and we are making the server.
1 class MemcacheProtocol (LineOnlyReceiver):
2 "" "
3 Implements a protocol basis - receiving messages from a client
4 and return results.
five "" "
6
7 def lineReceived (self, line):
8 debug (repr (line))
9 if not ' parameters ' in self.instruction:
10 parameters = line.split (
' ' )
11 debug (
" Got new command " + parameters [0])
12 self.instruction [
' parameters ' ] = parameters
13
14 # If data is not expected, then to execution
15 if parameters [0]
in Cache.oneline_commands:
16 self.process ()
17 else :
18 # Received data for two-line command, for execution
19 debug (
" Got data " + line)
20 self.instruction [
' data ' ] = line
21 self.process ()
22
23 def process (self):
24 # Cache.call returns generator
25 for line
in Cache.call (self.instruction):
26 # And we send everything that it generates in separate lines
27 debug (
" Send line " + line)
28 self.sendLine (line)
29 # Ready for further instructions, cashier!
30 self.instruction = {}
31
32 def connectionMade (self):
33 debug (
" Connected! " )
34 self.instruction = {}
As can be seen from the code, Cache is used for the actual work. This singleton is essentially just a class whose methods are wrapped by the decorator @classmethod. Calling Cache.call should return a generator that will be returned by strings, which, in turn, our implementation of the protocol will give to the client.
Parse the request from the client
The first line is a command and parameters separated by spaces, so we use the string split method, and the output is a list. Next, it must be disassembled into components, before the command begins to work with the data. I use the class, as I like the perspective to access the parameters, pointing them through the dot. The code below already requires reading the protocol description, and for a lazy pair of suggestive lines:
Data Commands:
<command name> <key> <flags> <exptime> <bytes> [noreply] \ r \ n
cas <key> <flags> <exptime> <bytes> <cas unqiue> [noreply] \ r \ n
Receiving data:
get <key> * \ r \ n
gets <key> * \ r \ n
delete <key> \ r \ n
Well, and the like.
Implementation parsing:
1 class Instruction (object):
2 def __init__ (self, i):
3 p = i [
' parameters ' ]
4 self.cmd = p.pop (0)
five
6 # Check noreply
7 if p [-1] ==
' noreply ' :
8 self.reply = False
9 # Throw it out
10 p.pop (-1)
11 else :
12 self.reply = True
13
14 if self.cmd
in Cache.storage_commands:
15 # If CAS, then there is one more parameter (i.e., a special case)
16 if self.cmd ==
" cas " :
17 self.unique = p.pop (-1)
18
19 # Now all parameters are unambiguous, but we want to expand the protocol,
20 # because everything is not as simple as dict (zip ())
21 self.bytes = p.pop (-1)
22 self.exptime = p.pop (-1)
23 self.flags = p.pop (-1)
24 self.keys = p
25 self.data = i.get (
' data ' , None)
26
27 # get and gets
28 elif self.cmd
in [
" get " ,
" gets " ,
" getn " ,
" delete " ]:
29 self.keys = p
thirty
31 def __str__ (self):
32 return str (self .__ dict__)
Implementing cache storage and working with it
The protocol was immediately expanded by me, namely, there is the possibility of working with embedded data. The cache is converted into a tree, and all operations that, according to the standard, indicate one key, can indicate a list of keys separated by spaces. But it is easy to get rid of this, but then the meaning of the work will be completely unclear.
As a storage unit, the Entry class is implemented, which contains a dictionary (childs of type dict) with child Entry instances. Moreover, the top of the hierarchy is also an instance of the Entry class.
Here I will give a fragment of the singleton Cache:
1 class Cache (object):
2 # consts
3 storage_commands = [
" set " ,
" add " ,
" replace " ,
" append " ,
" prepend " ,
" cas " ]
4 oneline_commands = [
" get " ,
" gets " ,
" getn " ,
" delete " ,
" incr " ,
" decr " ,
" stats " ]
five
6 # cache storage
7 data = Entry (0,0,0)
eight
9 # cache operations
10 @ classmethod
11 def call (cls, instruction):
12 i = Instruction (instruction)
13 debug (i)
14 command = getattr (cls, i.cmd)
15 return command (i)
sixteen
17 @ classmethod
18 def set (cls, i):
19 " set, support for nested keys "
20 parent = cls.data.get_child (i.keys [: - 1])
21 if parent:
22 parent.set_child (i.keys [-1], Entry (i.data, i.flags, i.exptime))
23 yield " STORED "
24 else :
25 yield " NOT_STORED "
Entry code and everything else we look at here -
github.com/Deepwalker/tx-cache/blob/master/mck.py
Amendments
As was rightly noted in the comments, the use of a singleton for storing Cache is somewhat unjustified, therefore, on github lies the corrected version, which fixes this annoying flaw. Thank you,
drJonnie !
Excuse
What do I expect from this article? I expect that clever people, of whom there are many, will look at my code and stick their nose at the shortcomings of the lack of academic education. Perhaps someone this article will be useful, with possible shortcomings, it describes the path to the implementation of network protocols in your programs. Someone may use this code for their memcache extensions.