MongoDB Replication on Amazon EC2

system.indexes

Foreword
Configure Amazon EC2
Install MongoDB
Replication setup
What to read

local.abstract

In this article, I will describe how to set up MongoDB replication on Amazon EC2 as painlessly as possible. There is certainly excellent documentation on how to work with Amazon EC2, how to configure MongoDB in general, and replication in particular. But, as you know, the devil is in the details. So in this article I will highlight the "little things" that bothered me most.

{step: 1, title: "Amazon EC2 configure", devilCount: 2}

Let's start from the beginning - setting up the instances.

First of all, you need to create two security groups: one for web instances and one for database instances.
For web instances, open access for SSH, HTTP and HTTPS.

For db instances, open the same SSH access, plus access to port 27017 from the web and db security groups.

Now we can launch the instances themselves: one small instance for the web application, plus two large instances and one micro instance for the database. I chose Ubuntu Server 12 as the Amazon Machine Image (AMI). Important: for MongoDB replication to work reliably, the replica set needs an odd number of members, because a primary is elected by a majority of votes. To this end, we will use the third (micro) instance as an arbiter. I will explain what an arbiter is below. Of course, we could just run 5, 7, or 2n + 1 large instances. But with this example I want to show a good way to minimize Amazon EC2 costs, and to stress once again that there should be an odd number of instances in a replica set.
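To see why an odd number matters, here is a minimal sketch of the majority arithmetic. It assumes every member (arbiter included) has exactly one vote; the function names are mine, not part of MongoDB.

```javascript
// Sketch: how many member failures a replica set can survive
// while a majority of votes is still reachable.
// Assumption: each member, including an arbiter, has exactly one vote.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

for (const n of [2, 3, 4, 5]) {
  console.log(
    `${n} members: majority ${majority(n)}, tolerates ${faultTolerance(n)} failure(s)`
  );
}
```

Under this assumption, a fourth member tolerates no more failures than three do, which is why a cheap arbiter is enough to turn two data nodes into a workable set.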

There is one more non-obvious but significant nuance: the instances have dynamic IP addresses, so binding to them when setting up replication is not ideal. It is better to use aliases configured in the /etc/hosts file. On each instance, bring the hosts file to something like this:

 127.0.0.1 db1 localhost
 10.40.120.30 db1
 10.40.120.31 db2
 10.40.120.32 db3
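Since the alias lines must be identical on every instance, it can help to generate them from one place. The following is only a sketch: `renderHosts` is a hypothetical helper, and the addresses are the example ones from above.

```javascript
// Hypothetical helper: render /etc/hosts alias lines from a
// name -> private IP map, so every instance gets the same block.
function renderHosts(aliases) {
  return Object.entries(aliases)
    .map(([name, ip]) => `${ip} ${name}`)
    .join("\n");
}

// Example addresses taken from the hosts file above.
const aliases = {
  db1: "10.40.120.30",
  db2: "10.40.120.31",
  db3: "10.40.120.32",
};

console.log(renderHosts(aliases));
```

If an instance gets a new private IP after a restart, only the map changes and the block can be regenerated for all instances.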

Now we have ready-to-use instances.

{step: 2, title: "MongoDB install", devilCount: 1}

Let's start installing MongoDB. The installation process is described very well in the official MongoDB manual, so we simply follow its instructions. Create the mongo_install.bash file and write the following script into it:

 apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10
 echo "deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen" | tee -a /etc/apt/sources.list.d/10gen.list
 apt-get -y update
 apt-get -y install mongodb-10gen

We execute our script:

 sudo bash ./mongo_install.bash

If everything went well, we will see the PID of the running MongoDB process:

 mongodb start/running, process 2368

It now remains to run the mongod process:

 sudo service mongodb start

One last trick: to avoid repeating this tedious installation on each instance, you can use Amazon EC2 images, that is, create an AMI from an already configured instance and launch the others from it.

{step: 3, title: "Replication", devilCount: 2}

Here we come to the main point: setting up replication. First, in the configuration files (/etc/mongodb.conf) on all db instances, we define the replSet parameter. This parameter should contain the replica set name:

 replSet = myproject

After that, restart the service:

 sudo service mongodb restart

Next, connect to the mongo shell:

 mongo

Initiate the replica set:

 rs.initiate()

Add the second instance to our replica set:

 rs.add("db2:27017")

The third instance, and this is important, is added as an arbiter:

 rs.addArb("db3:27017")

The arbiter does not keep a copy of the database and does not take part in writing or reading data. It exists solely to vote in the election of the Primary. That is why we can run the arbiter on minimal hardware.
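The same topology can also be described as a single config document instead of separate add calls. Here is a minimal sketch; `buildReplicaSetConfig` is a hypothetical helper of mine, and the set name and hosts are the ones used in this article.

```javascript
// Sketch: build a replica set config document with data-bearing
// members first and arbiters last. The helper name is hypothetical.
function buildReplicaSetConfig(setName, dataHosts, arbiterHosts) {
  const members = [];
  for (const host of dataHosts) {
    members.push({ _id: members.length, host });
  }
  for (const host of arbiterHosts) {
    // arbiterOnly marks a voting member that stores no data.
    members.push({ _id: members.length, host, arbiterOnly: true });
  }
  return { _id: setName, members };
}

const initialConfig = buildReplicaSetConfig(
  "myproject",
  ["db1:27017", "db2:27017"],
  ["db3:27017"]
);
console.log(JSON.stringify(initialConfig, null, 2));
```

In the mongo shell, a document of this shape can be passed to rs.initiate() as an alternative to calling rs.add() and rs.addArb() one by one.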

Let's see the current status of the replica set:

 mydb:PRIMARY> rs.status()
 {
     "set" : "myproject",
     "date" : ISODate("2013-02-04T12:17:42Z"),
     "myState" : 1,
     "members" : [
         {
             "_id" : 0,
             "name" : "db1:27017",
             "health" : 1,
             "state" : 1,
             "stateStr" : "PRIMARY",
             "uptime" : 1139012,
             "optime" : Timestamp(1359738450000, 12),
             "optimeDate" : ISODate("2013-02-01T17:07:30Z"),
             "self" : true
         },
         {
             "_id" : 1,
             "name" : "db2:27017",
             "health" : 1,
             "state" : 2,
             "stateStr" : "SECONDARY",
             "uptime" : 1138953,
             "optime" : Timestamp(1359738450000, 12),
             "optimeDate" : ISODate("2013-02-01T17:07:30Z"),
             "lastHeartbeat" : ISODate("2013-02-04T12:17:42Z"),
             "pingMs" : 0
         },
         {
             "_id" : 2,
             "name" : "db3:27017",
             "health" : 1,
             "state" : 2,
             "stateStr" : "SECONDARY",
             "uptime" : 442498,
             "optime" : Timestamp(1359738450000, 12),
             "optimeDate" : ISODate("2013-02-01T17:07:30Z"),
             "lastHeartbeat" : ISODate("2013-02-04T12:17:40Z"),
             "pingMs" : 0
         }
     ],
     "ok" : 1
 }

You can find out which field is responsible for what here: docs.mongodb.org/manual/reference/replica-status/#fields
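For a quick health check you rarely need the whole document. Below is a small sketch that pulls out the two things I usually look at: who the primary is and which members are unhealthy. `summarizeStatus` is a hypothetical helper, and the sample object is a made-up, trimmed-down status, not real output.

```javascript
// Sketch: summarize an rs.status()-shaped document using only the
// "name", "health" and "stateStr" fields of each member.
function summarizeStatus(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  const unhealthy = status.members
    .filter((m) => m.health !== 1)
    .map((m) => m.name);
  return { primary: primary ? primary.name : null, unhealthy };
}

// Made-up sample for illustration: db3 is marked as down.
const sampleStatus = {
  set: "myproject",
  members: [
    { name: "db1:27017", health: 1, stateStr: "PRIMARY" },
    { name: "db2:27017", health: 1, stateStr: "SECONDARY" },
    { name: "db3:27017", health: 0, stateStr: "(not reachable/healthy)" },
  ],
};

console.log(summarizeStatus(sampleStatus));
```

The function itself is plain JavaScript, so it can be run against the result of rs.status() in a shell that supports this syntax, or against a saved copy of the output.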

Check the config:

 mydb:PRIMARY> rs.config()
 {
     "_id" : "myproject",
     "version" : 12,
     "members" : [
         {
             "_id" : 0,
             "host" : "db1:27017"
         },
         {
             "_id" : 1,
             "host" : "db2:27017"
         },
         {
             "_id" : 2,
             "host" : "db3:27017",
             "arbiterOnly" : true
         }
     ]
 }

We see that everything looks exactly as we intended. Hooray!

And for dessert, one more thing. In the replica set settings, each member has, among others, a priority property. By default it is 1 and, being the default value, it is not displayed in the config. This value affects the likelihood that a member will be elected Primary. Let's make db1 the preferred Primary (for example, because it has more memory):

 config = rs.config()
 config.members[0].priority = 2
 rs.reconfig(config)

Now the config will look like this:

 mydb:PRIMARY> rs.config()
 {
     "_id" : "myproject",
     "version" : 12,
     "members" : [
         {
             "_id" : 0,
             "host" : "db1:27017",
             "priority" : 2
         },
         {
             "_id" : 1,
             "host" : "db2:27017"
         },
         {
             "_id" : 2,
             "host" : "db3:27017",
             "arbiterOnly" : true
         }
     ]
 }
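Indexing members by position is fragile once the set grows or members are reordered. As a sketch, the priority change can be made by host name instead. `withPriority` is a hypothetical helper, and it assumes a JavaScript runtime with arrow functions and object spread, which the 2013-era shell did not have.

```javascript
// Hypothetical helper: return a copy of a replica set config with the
// priority of one member (looked up by host) changed.
// The input config is left untouched.
function withPriority(config, host, priority) {
  if (!config.members.some((m) => m.host === host)) {
    throw new Error(`no member with host ${host}`);
  }
  const members = config.members.map((m) =>
    m.host === host ? { ...m, priority } : { ...m }
  );
  return { ...config, members };
}

const currentConfig = {
  _id: "myproject",
  members: [
    { _id: 0, host: "db1:27017" },
    { _id: 1, host: "db2:27017" },
    { _id: 2, host: "db3:27017", arbiterOnly: true },
  ],
};

console.log(JSON.stringify(withPriority(currentConfig, "db1:27017", 2), null, 2));
```

The resulting document has the same shape as the one passed to rs.reconfig() above; failing loudly on an unknown host protects against a typo silently changing nothing.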

A very important note from StamPit:
Problem: the size of the oplog is not specified. On 64-bit systems the default is 5% of the available disk space, but not less than 1 GB. If the disk is large and insert/update activity is moderate, it makes sense to limit the size in the config; 2000 MB is enough:

 oplogSize = 2000
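To get a feel for what the default rule means on a large volume, here is a rough sketch of the arithmetic as stated in the note (5% of free disk space, at least 1 GB). The function name is mine, and I treat 1 GB as 1024 MB; real defaults may differ between MongoDB versions and platforms.

```javascript
// Sketch of the default described in the note: 5% of the free disk
// space, but not less than 1 GB (taken as 1024 MB here).
// Assumption: sizes are whole megabytes.
function defaultOplogSizeMb(freeDiskMb) {
  const fivePercent = Math.floor(freeDiskMb / 20);
  return Math.max(1024, fivePercent);
}

console.log(defaultOplogSizeMb(10 * 1024));  // 10 GB free
console.log(defaultOplogSizeMb(500 * 1024)); // 500 GB free
```

On a 500 GB volume the rule reserves about 25 GB for the oplog, which is the case where an explicit oplogSize is worth setting.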

If there is not much data and the number of inserts/updates is not too large, you can slightly reduce disk activity as follows:
1) Disable preallocation: noprealloc = true
2) Reduce the size of the files (both data files and journal files will shrink): smallfiles = true

That is perhaps all that is needed to set up MongoDB replication on Amazon EC2. If necessary, you can easily add new instances to this configuration.

local.links

Amazon EC2 aws.amazon.com/documentation/ec2
MongoDB Installation docs.mongodb.org/manual/tutorial/install-mongodb-on-ubuntu
MongoDB docs.mongodb.org/manual/tutorial/getting-started
MongoDB Replication docs.mongodb.org/manual/replication
MongoDB Replica Set Arbiters docs.mongodb.org/manual/administration/replica-sets/#replica-set-arbiters
MongoDB Replica Set Configuration docs.mongodb.org/manual/reference/replica-configuration

Source: https://habr.com/ru/post/168691/
