MemBase is an open-source, distributed, persistent key-value store optimized for storing web application data. Its main properties:
- persistent
- quasi-deterministic low response time
- high speed
- scales linearly from one server to thousands
- no data schema (key-value only)
- protocol-compatible with memcached
Key system features:
Simplicity. All servers in a MemBase cluster are identical. Clone any server, attach it to the cluster, and click the Rebalance button in the web interface; nothing more needs to be done. The protocol is compatible with memcached, so MemBase can be used right away on a wide variety of platforms. In fact, the system reuses a large part of the memcached code and adds code for storing data on disk. Installing the server and starting MemBase takes about 5 minutes.
Speed. MemBase automatically distributes both the data and the requests that operate on it across the cluster servers. Data is replicated to ensure high availability even if a server fails, and it is automatically moved within the storage hierarchy to the tier with the appropriate access time (frequently used data goes to the fastest memory, rarely used data to slow disks). The entire system architecture is optimized for speed.
Flexibility. MemBase scales linearly. Servers can be added to or removed from the cluster on the fly.
Reliability. Servers can fail at any time, and as long as the number of failed servers does not exceed the replication count (which is configurable), the cluster continues to work. Even if the cluster leader fails, a replacement is elected automatically, without user intervention.
Full compatibility with the memcached protocol. To communicate with MemBase, applications use a memcached client library and the memcached protocol.
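To make the protocol compatibility concrete, here is a minimal sketch of what a client puts on the wire for the two most common commands of the memcached text protocol. The helper names `build_set` and `build_get` are illustrative and not part of any library; a real application would normally use an existing memcached client instead.

```python
def build_set(key, value, flags=0, exptime=0):
    """Build a memcached text-protocol 'set' request.

    Wire format: set <key> <flags> <exptime> <bytes>\r\n<data>\r\n
    `value` must be bytes; <bytes> is its length.
    """
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def build_get(key):
    """Build a memcached text-protocol 'get' request: get <key>\r\n"""
    return f"get {key}\r\n".encode("ascii")


if __name__ == "__main__":
    # These bytes could be written to a TCP socket connected to the server.
    print(build_set("greeting", b"hello"))
    print(build_get("greeting"))
```

A server speaking this protocol answers a successful `set` with `STORED\r\n`, which is why existing memcached clients can talk to MemBase unchanged.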
SET algorithm
When a key's value is set, the following happens:
- The application calls the memcached API with the key and the value.
- The API hashes the key and determines the master server for that key.
- The request is sent over the network to the key's master server.
- The master server performs the operation.
- The master server replicates the data to other servers in the cluster.
- The stored value is cached in memory (as in memcached).
- The data is queued for writing to the persistent store, unless a write of this key with this value is already queued.
- A response is returned to the application.
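The steps above can be sketched as a small in-process simulation. This is only an illustration under stated assumptions, not MemBase's actual code: the `Node` and `Cluster` classes are hypothetical, the master is chosen with a plain CRC32 modulo the number of nodes, and replicas are simply the next nodes in the list.

```python
import zlib
from collections import deque


class Node:
    """Hypothetical stand-in for one cluster server."""

    def __init__(self, name):
        self.name = name
        self.memory = {}           # in-memory cache, as in memcached
        self.disk_queue = deque()  # writes waiting to be persisted
        self.queued = set()        # (key, value) pairs already in the queue

    def store(self, key, value):
        self.memory[key] = value
        # Queue for persistence only if this exact write is not queued yet.
        if (key, value) not in self.queued:
            self.queued.add((key, value))
            self.disk_queue.append((key, value))


class Cluster:
    def __init__(self, names, replicas=1):
        self.nodes = [Node(n) for n in names]
        self.replicas = replicas

    def master_index(self, key):
        # Assumption: simple CRC32 hashing; the real scheme may differ.
        return zlib.crc32(key.encode("utf-8")) % len(self.nodes)

    def set(self, key, value):
        m = self.master_index(key)
        self.nodes[m].store(key, value)            # master performs the operation
        for i in range(1, self.replicas + 1):      # master replicates to others
            replica = self.nodes[(m + i) % len(self.nodes)]
            replica.memory[key] = value
        return "STORED"                            # response to the application


if __name__ == "__main__":
    cluster = Cluster(["node-a", "node-b", "node-c"], replicas=1)
    print(cluster.set("user:1", "alice"))
    master = cluster.nodes[cluster.master_index("user:1")]
    print(master.name, len(master.disk_queue))
```

Setting the same key to the same value twice leaves a single entry in the master's disk queue, which mirrors the deduplication described in the list.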
Architecture
At the highest level, MemBase consists of two parts: the Data Manager and the Cluster Manager. The Cluster Manager does not have to be present on every machine in the cluster, although to exclude it you will have to rebuild MemBase.
Data manager

The MemBase Data Manager uses two ports to communicate with clients: port 11211 for clients that support only memcapable API version 1.0, and port 11210 for more advanced clients that support memcapable API 2.0 and higher. The latter can hash keys themselves to determine the keys' master servers. For 1.0 clients, keys are hashed by a component called Moxi.
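A rough sketch of how a 2.0-style client might locate a key's master server on its own is shown below. It assumes a two-step lookup: the key is hashed to a vBucket number, and a map (the "key hashing map" kept by the Configuration Manager) assigns each vBucket to a server. The constant `NUM_VBUCKETS`, the CRC32-modulo hash, and all function names are assumptions made for illustration; the real hashing details may differ.

```python
import zlib

NUM_VBUCKETS = 1024  # assumed bucket count, for illustration only


def vbucket_for(key):
    """Hash a key to a vBucket number (simplified: CRC32 modulo bucket count)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def build_map(servers):
    """Assign vBuckets to servers round-robin; returns a list indexed by vBucket."""
    return [servers[i % len(servers)] for i in range(NUM_VBUCKETS)]


def server_for(key, vbucket_map):
    """Look up the master server for a key through the vBucket map."""
    return vbucket_map[vbucket_for(key)]


if __name__ == "__main__":
    servers = ["10.0.0.1:11210", "10.0.0.2:11210"]
    vbucket_map = build_map(servers)
    print(vbucket_for("user:1"), server_for("user:1", vbucket_map))
```

The indirection is the point of the design: when a server is added and the cluster is rebalanced, only the map entries change, while the key-to-vBucket hash stays the same.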

The MemBase data storage system is designed so that the most frequently used data is kept in the fastest memory (RAM), and the least frequently used data in slow storage (ordinary disks). If your system has other types of storage (such as SSDs), you can specify their priority. An LRU algorithm decides which data is stored where.
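The LRU placement idea can be illustrated with a two-tier toy store. This is a sketch under simplifying assumptions, not MemBase's implementation: the hypothetical `TieredStore` class models only where an item currently lives (a small "RAM" tier or a "disk" tier) and ignores persistence, replication, and additional tiers such as SSDs.

```python
from collections import OrderedDict


class TieredStore:
    """Toy two-tier store: a bounded 'RAM' tier with LRU eviction to 'disk'."""

    def __init__(self, ram_capacity):
        self.ram_capacity = ram_capacity
        self.ram = OrderedDict()  # ordered from least to most recently used
        self.disk = {}

    def _touch(self, key, value):
        # Put the item in RAM as most recently used, then evict the
        # least recently used items to disk if RAM is over capacity.
        self.ram[key] = value
        self.ram.move_to_end(key)
        while len(self.ram) > self.ram_capacity:
            old_key, old_value = self.ram.popitem(last=False)
            self.disk[old_key] = old_value

    def set(self, key, value):
        self.disk.pop(key, None)  # avoid keeping a stale copy in the slow tier
        self._touch(key, value)

    def get(self, key):
        if key in self.ram:
            value = self.ram[key]
        elif key in self.disk:
            value = self.disk.pop(key)  # promote back to the fast tier
        else:
            return None
        self._touch(key, value)
        return value


if __name__ == "__main__":
    store = TieredStore(ram_capacity=2)
    store.set("a", 1)
    store.set("b", 2)
    store.get("a")     # "a" is now more recently used than "b"
    store.set("c", 3)  # RAM is full, so "b" is moved to disk
    print(list(store.ram), list(store.disk))
```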
Cluster manager

The MemBase web interface and a REST-like interface listen on port 8080 (this can be changed). The membase utility can send REST requests, so you can also manage the cluster from the command line.
Services running on all servers in a cluster at the same time:
Heartbeat - a monitor that periodically contacts the cluster leader to obtain information about the state of the entire system.
ProcessMonitor - monitors the status of the Data Manager, restarts the service if it fails, and reports all of this to Heartbeat.
Configuration Manager - monitors the cluster configuration: the key hashing map, active replication requests, the rebalancing map, and so on.
Global Singleton Supervisor - monitors the cluster leader and takes part in its re-election if it fails.
Services running on only one cluster server at a time:
Rebalance Orchestrator - directly controls the cluster rebalancing process.
Node Health Monitor, or The Doctor - collects information from the Heartbeat processes of all cluster machines, processes it, and responds accordingly (for example, by sending notifications).
vBucket state and replication manager - monitors replication processes in the cluster.
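As an illustration of the kind of bookkeeping a node health monitor performs, the sketch below records the time of each node's last heartbeat and reports nodes that have been silent for longer than a timeout. The `HealthMonitor` class, its method names, and the timeout-based failure rule are assumptions made for this example; the source does not describe how The Doctor actually decides that a node is unhealthy.

```python
class HealthMonitor:
    """Hypothetical monitor that flags nodes whose heartbeats have stopped."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.last_seen = {}  # node name -> time of the last heartbeat

    def heartbeat(self, node, now):
        """Record a heartbeat from `node` at time `now` (in seconds)."""
        self.last_seen[node] = now

    def failed_nodes(self, now):
        """Return the nodes that have been silent for longer than the timeout."""
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


if __name__ == "__main__":
    monitor = HealthMonitor(timeout_seconds=5)
    monitor.heartbeat("node-a", now=0)
    monitor.heartbeat("node-b", now=0)
    monitor.heartbeat("node-a", now=4)
    # At t=6, node-b has been silent for 6 seconds and node-a for only 2.
    print(monitor.failed_nodes(now=6))
```

Passing the current time in as a parameter, rather than reading a clock inside the class, keeps the sketch deterministic and easy to test.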
FAQ
Q: Who is behind MemBase?
A: NorthScale, Zynga (everyone remembers FarmVille?) and NHN.
Q: Is MemBase used on production servers?
A: Zynga uses it to run the servers behind FarmVille. I am not aware of other major deployments yet.