Cassandra is a distributed database based on the ideas of Amazon's Dynamo and Google's Bigtable.
Main features:
- Battle-tested (used at Facebook, Twitter, Digg and others)
- Fault-tolerant (each record is replicated to several nodes in the cluster)
- Decentralized (all nodes in the cluster are equal)
- Flexible data model
- Read and write throughput grows linearly as new nodes are added
- Scalable (no limit on data volume; a lookup by key is always O(1))
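Fault tolerance and decentralization both rest on placing each row on several nodes of a token ring, an idea taken from Dynamo. Here is a simplified sketch of rack-unaware replica placement, assuming the following: every node owns a token, and a key is hashed to a token with MD5 (as Cassandra's RandomPartitioner does). The replicas are the first node clockwise from the key plus the nodes that follow it. The node names and the way tokens are derived from them are hypothetical. In a real cluster, tokens are assigned in the configuration.

```ruby
require 'digest/md5'

# Hypothetical helper: map a string to a position on the ring via MD5,
# loosely modeled on Cassandra's RandomPartitioner.
def token_for(key)
  Digest::MD5.hexdigest(key).to_i(16)
end

# Pick `replication_factor` distinct nodes for `key`: the first node whose
# token is >= the key's token (wrapping around to the start of the ring),
# then the following nodes clockwise.
def replicas_for(key, node_tokens, replication_factor)
  ring = node_tokens.sort_by { |_node, token| token }
  t = token_for(key)
  start = ring.index { |_node, token| token >= t } || 0
  count = [replication_factor, ring.size].min
  (0...count).map { |i| ring[(start + i) % ring.size].first }
end

# Hypothetical four-node cluster; tokens derived from node names for brevity.
nodes = {
  'node_a' => token_for('node_a'),
  'node_b' => token_for('node_b'),
  'node_c' => token_for('node_c'),
  'node_d' => token_for('node_d')
}

p replicas_for('user_name@web.com', nodes, 3)
```

Any node can compute this placement on its own, so no node is special. That is the "all nodes are equal" property from the list.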
Preparing to install
Ruby is a convenient way to start getting acquainted with Cassandra, and I assume you already have it installed. First, install the Ruby gem of the same name, which simplifies building Cassandra:
sudo gem install cassandra --source http://gemcutter.org
Along with other gems, this also installs thrift, a cross-language RPC framework that Cassandra uses for its client interface. Roughly speaking, it is an intermediary between different languages.
The cassandra gem comes with handy rake tasks that quickly build the latest version of Cassandra. However, they are available only inside the gem's folder, not system-wide. For quick access, I created a symlink to it in my home directory:
ln -s /usr/lib/ruby/gems/1.8/gems/cassandra-0.8.0/ ~/cassandra_gem
You can see where installed gems are located with the command
gem environment
In my case it is /usr/lib/ruby/gems/1.8. Go to the gem directory
cd ~/cassandra_gem
and take a look at the list of everything you can do with rake:
rake -T
At this point you may need to install one more gem, echoe.
Installing Cassandra
Now let's build the database. You can do this with the command
rake cassandra
At this step I ran into a problem that ended in an error:
tar xzf apache-cassandra-0.6.0-beta2-bin.tar.gz
tar: apache-cassandra-0.6.0-beta2-bin.tar.gz: Cannot open: No such file or directory
Normally this should not happen. However, a new beta3 version of Cassandra has been released and the gem has not been updated yet, so it tries to download beta2, which is no longer available. This will probably be fixed soon. If you run into it in the meantime, you can download beta2 for this version of the gem (0.8) manually. Then create a cassandra folder in your home directory, move the downloaded file apache-cassandra-0.6.0-beta2-bin.tar.gz there, and run the command again:
rake cassandra
Building Cassandra requires ant and ivy-retrieve. Just in case, I saved my build output, which can be viewed here. The task also starts the database automatically. First, though, you need to configure the data storage paths, otherwise it will not work, because the default paths require superuser privileges.
Editing the configuration files
Create folders for data storage:
cd ~
mkdir cassandra_data
cd !$
mkdir bootstrap callouts commitlog data logs staging
touch logs/system.log
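If you prefer to stay in Ruby, the same directory setup can be sketched with FileUtils. The helper name is hypothetical. It creates the directories from the shell commands above plus a logs/ directory holding system.log, which is the location the log4j setting below points to. The root defaults to ~/cassandra_data, expanded to an absolute path.

```ruby
require 'fileutils'

# Hypothetical helper: create the data directories used in this article
# (plus logs/ for the log4j file) under `root`, and return the root path.
def create_data_dirs(root = File.expand_path('~/cassandra_data'))
  %w[bootstrap callouts commitlog data logs staging].each do |dir|
    FileUtils.mkdir_p(File.join(root, dir)) # also creates root if missing
  end
  FileUtils.touch(File.join(root, 'logs', 'system.log'))
  root
end
```

Calling create_data_dirs with no arguments builds the tree in your home directory. Calling it again is harmless, because mkdir_p and touch do not fail on existing paths.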
Configuration files are stored in two directories: ~/cassandra/server/conf and ~/cassandra_gem/conf. Since we are going to run Cassandra through the gem, the configuration files are loaded from ~/cassandra_gem/conf, so that is where they need to be edited.
Change the logger settings:
sudo gedit ~/cassandra_gem/conf/log4j.properties
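Replace the line
log4j.appender.R.File=data/logs/system.log
with an absolute path (log4j does not expand ~), for example:
log4j.appender.R.File=/home/your_user/cassandra_data/logs/system.log
where your_user is your login name.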
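The same edit can be scripted. This is a minimal sketch with a hypothetical helper name. It rewrites any line starting with log4j.appender.R.File= and resolves the home directory in Ruby, because log4j would take ~ literally. The file lives under /usr/lib, so the script needs root (for example, sudo ruby). In that case, pass log_file explicitly, since under sudo ~ may resolve to root's home directory.

```ruby
# Hypothetical helper: point log4j's R appender at an absolute log path.
# All other lines of the properties file are kept unchanged.
def set_log_file(properties_path,
                 log_file = File.expand_path('~/cassandra_data/logs/system.log'))
  lines = File.readlines(properties_path).map do |line|
    if line.start_with?('log4j.appender.R.File=')
      "log4j.appender.R.File=#{log_file}\n"
    else
      line
    end
  end
  File.write(properties_path, lines.join)
end
```

Usage: set_log_file(File.expand_path('~/cassandra_gem/conf/log4j.properties'), '/home/your_user/cassandra_data/logs/system.log'), where your_user is again a placeholder for your login name.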
In production, this logger is disabled.
Configuring storage-conf.xml (in the same ~/cassandra_gem/conf directory):
Change the paths in its Directories section in the same way (use absolute paths). After that, Cassandra should start successfully.
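Here is a small sketch for producing the absolute paths to paste into the Directories section. The element names CommitLogDirectory and DataFileDirectory are assumed from the 0.6-era default storage-conf.xml, so compare them with your own file before pasting. Any other directory elements in your file get the same treatment.

```ruby
# Hypothetical helper: build Directories entries with absolute paths.
# Element names are an assumption based on the 0.6-era default config.
def directories_fragment(root = File.expand_path('~/cassandra_data'))
  [
    "<CommitLogDirectory>#{File.join(root, 'commitlog')}</CommitLogDirectory>",
    "<DataFileDirectories>",
    "    <DataFileDirectory>#{File.join(root, 'data')}</DataFileDirectory>",
    "</DataFileDirectories>"
  ].join("\n")
end
```

Running puts directories_fragment prints the entries for your own home directory.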
Defining the data structure
The data structure is also defined in storage-conf.xml.
To make it easier to understand, we will use terminology from relational databases. Roughly speaking, the data structure in Cassandra is a nested hash.
Keyspaces - the list of all databases
Keyspace - a database that contains tables
ColumnFamily - a table in which new "columns" can be created on the fly
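To make the "nested hash" picture concrete, here is a rough in-memory model using plain Ruby hashes. It is an illustration only, not Cassandra's storage format. The Twitter keyspace, the Users column family and the row come from the irb session later in this article.

```ruby
# keyspace -> column family -> row key -> column name -> value
keyspaces = {
  'Twitter' => {                 # Keyspace ~ a database
    'Users' => {                 # ColumnFamily ~ a table
      'user_name@web.com' => {   # row key
        'screen_name' => 'Suvo'  # column name => value
      }
    }
  }
}

# Rows in one column family need not share a schema: a new "column" is
# simply another key in that row's hash.
keyspaces['Twitter']['Users']['other@web.com'] = { 'status' => 'Hello world!' }

puts keyspaces['Twitter']['Users']['user_name@web.com']['screen_name']
```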
CompareWith specifies how the columns inside a row (their names act as keys) are sorted. Usually they are sorted alphabetically (CompareWith="UTF8Type") or by time (CompareWith="TimeUUIDType").
A ColumnFamily can be of two types: Standard or Super. In a Super column family, each column (a SuperColumn) holds any number of other columns (subcolumns). This type suits a model in which a post has many comments. In a Super column family you can set the sort order of both the super columns themselves (CompareWith) and their subcolumns (CompareSubcolumnsWith). For example, if we want comments to be ordered by time, we set CompareSubcolumnsWith="TimeUUIDType".
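The post/comments example can be modeled with nested hashes as well. In this sketch each super column is a post and its subcolumns are comments keyed by time. Cassandra keeps the order itself according to CompareWith and CompareSubcolumnsWith. Here the sorting is emulated, and plain Time values stand in for the timestamps that TimeUUIDType extracts from version-1 UUIDs. Post names, times and texts are hypothetical.

```ruby
# Super column name (post) => { subcolumn name (time) => value (comment) }
comments_by_post = {
  'post-2' => { Time.utc(2010, 3, 2, 12, 0) => 'Second post, first comment' },
  'post-1' => {
    Time.utc(2010, 3, 1, 18, 30) => 'Later comment',
    Time.utc(2010, 3, 1, 9, 15)  => 'Earlier comment'
  }
}

# Emulates CompareWith="UTF8Type" (super columns sorted as strings) and
# CompareSubcolumnsWith="TimeUUIDType" (subcolumns sorted by time).
ordered = comments_by_post.sort.map do |post, comments|
  [post, comments.sort.map { |_time, text| text }]
end

ordered.each { |post, texts| puts "#{post}: #{texts.join(' | ')}" }
```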
Sooner or later you will come across the term consistency level. It expresses how much you trust the system. Suppose that when writing you want to wait until Cassandra has stored the data on several nodes (the number of replicas is configurable). Then the consistency level should be as high as possible. If it is enough to wait for the write to reach at least one machine, set it to 1. A level of 0 means "send the data to be written and forget about it", without waiting for it to be stored. The last option is, of course, the fastest, and the first one the slowest. In production, 0 is usually used.
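The trade-off above comes down to how many replica acknowledgements a write waits for. Here is a sketch, assuming the level names of Cassandra 0.6's ConsistencyLevel (ZERO, ONE, QUORUM, ALL). The ANY level and the data-center-aware levels are left out, and the helper name is hypothetical.

```ruby
# Number of replica acknowledgements a write waits for at a given level,
# for a keyspace with the given replication factor.
def acks_required(level, replication_factor)
  case level
  when :zero   then 0                            # fire and forget
  when :one    then 1                            # any single replica
  when :quorum then replication_factor / 2 + 1   # a majority of replicas
  when :all    then replication_factor           # every replica
  else raise ArgumentError, "unsupported level: #{level.inspect}"
  end
end

[:zero, :one, :quorum, :all].each do |level|
  puts "#{level}: wait for #{acks_required(level, 3)} of 3 replicas"
end
```

QUORUM is a common middle ground. If both reads and writes use it, every read overlaps at least one replica that acknowledged the latest write (2 + 2 > 3 for a replication factor of 3).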
A first write
By default, several keyspaces (Twitter, Multiblog, MultiblogLong, and possibly CassandraObject) are configured in storage-conf.xml, and we will use them. Start the Ruby console with the irb command:
irb(main):001:0> require 'cassandra'
=> true
irb(main):002:0> client = Cassandra.new 'Twitter', 'localhost:9160'
=> #<Cassandra:69944556734180, @keyspace="Twitter", @schema={}, @servers=["localhost:9160"]>
irb(main):003:0> client.insert :Users, 'user_name@web.com', {'screen_name' => 'Suvo'}
=> nil
irb(main):004:0> suvo = client.get :Users, 'user_name@web.com'
=> #<OrderedHash {"screen_name"=>"Suvo"}>
irb(main):005:0> suvo['screen_name']
=> "Suvo"
Here user_name@web.com is the row key.
Note that client.insert returns nothing (nil): writing to Cassandra always succeeds. Now let's update the record by adding another field:
irb(main):006:0> client.insert :Users, 'user_name@web.com', {'status' => 'Hello world!'}
=> nil
irb(main):007:0> client.get :Users, 'user_name@web.com'
=> #<OrderedHash {"status"=>"Hello world!", "screen_name"=>"Suvo"}>
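The second insert did not replace the row. It added a column to it. Below is a small stand-in that mimics this behavior without a running server, so the semantics can be tried in isolation. The class and its methods are hypothetical and are not part of the cassandra gem. insert merges the given columns into the row and returns nil, and get returns a copy of the row's columns.

```ruby
# Hypothetical in-memory stand-in illustrating insert/get semantics.
class FakeColumnFamilyStore
  def initialize
    # (column family, row key) => { column name => value }
    @rows = Hash.new { |hash, key| hash[key] = {} }
  end

  # Merge columns into the row (an "upsert"); nothing useful is returned.
  def insert(column_family, key, columns)
    @rows[[column_family, key]].merge!(columns)
    nil
  end

  # Return a copy of the row's columns, or an empty hash for a missing row.
  def get(column_family, key)
    @rows.fetch([column_family, key], {}).dup
  end
end

store = FakeColumnFamilyStore.new
store.insert(:Users, 'user_name@web.com', { 'screen_name' => 'Suvo' })
store.insert(:Users, 'user_name@web.com', { 'status' => 'Hello world!' })
p store.get(:Users, 'user_name@web.com')
```

Inserting a column that already exists simply overwrites its value, so "update" and "insert" are the same operation.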
Besides get and insert, another important query available to you is slice_range. With it you can fetch the values within a specific key range.
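Here is a sketch of the range semantics, modeled on the fields of Thrift's SliceRange structure (start, finish, reversed, count). Entries are taken in sorted key order, and an empty start or finish means "unbounded". When reversed is true, iteration runs from the largest key downward, so start is the upper bound. The function is an in-memory illustration, not the gem's API.

```ruby
# Return up to `count` [key, value] pairs whose keys lie between `start`
# and `finish` (inclusive) in sorted order; empty bounds are open.
def slice(entries, start: '', finish: '', reversed: false, count: 100)
  keys = entries.keys.sort
  keys.reverse! if reversed
  selected = keys.select do |key|
    after_start   = start.empty?  || (reversed ? key <= start : key >= start)
    before_finish = finish.empty? || (reversed ? key >= finish : key <= finish)
    after_start && before_finish
  end
  selected.first(count).map { |key| [key, entries[key]] }
end

columns = { 'a' => 1, 'b' => 2, 'c' => 3, 'd' => 4 }
p slice(columns, start: 'b', finish: 'c')
p slice(columns, reversed: true, count: 2)
```

The count limit matters in practice: a single row can hold a very large number of columns, so ranges are usually read page by page.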
So we have (I hope) managed to install Cassandra and to write and read test data.
Next, I would like to share what may be disappointing about Cassandra, what has already been done, and what I have done to use Cassandra in Ruby on Rails, but that did not fit into this article. You can continue reading here.