
MongoDB Sharded Cluster on CentOS 6.5

In this article we will look only at the MongoDB configuration itself, without covering how to connect the MongoDB repository and install the packages on the system.

A MongoDB sharded cluster consists of the following components:

Shard
A shard is a mongod instance that stores a subset of the sharded collection's data. For production use, each shard must be a replica set (replicaSet).
Configuration Server
This is also a mongod instance; it stores the cluster metadata. The metadata describes which data is stored on which shard.

Routing server
A mongos instance. Its task is to route requests from applications to the shards.
Below is a diagram of how a sharded MongoDB cluster operates.



It is most convenient to group the roles as follows: a configuration server and a routing server (mongos) live together on one machine, while each shard runs on its own machines.

Suppose we have 3 servers for these combined roles: mongos01, mongos02 and mongos03, each running a mongod configuration server and a mongos routing server.

Configuring the Configuration Server

In order for mongod to work as a configuration server, we bring /etc/mongod.conf to the following form:
logpath=/var/log/mongodb/mongod.log
logappend=true
fork=true
dbpath=/opt/mongocfg
pidfilepath=/var/run/mongodb/mongod.pid
bind_ip=<lo ip>,<eth ip>
configsvr=true


Then we start the service
# service mongod start
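
As a quick sanity check (assuming the default configuration server port, 27019, and that the mongo shell is installed on this machine), we can make sure the instance is up and answering:

# netstat -ntlp | grep mongod
# mongo --port 27019 --eval 'db.runCommand({ ping: 1 })'

If the ping returns { "ok" : 1 }, the configuration server is ready.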


Configuring the Routing Server

Before proceeding to setting up a routing server, you need to make sure that the mongodb-org-mongos package is installed on the system.
# rpm -qa | grep mongos
mongodb-org-mongos-2.6.2-1.x86_64


First, create a configuration file for the mongos service, /etc/mongos.conf, and bring it to the following form:
# Mongo config servers addresses
configdb=mongos01:27019,mongos02:27019,mongos03:27019
port=27017
logpath=/var/log/mongodb/mongos.log
logappend=true
fork=true
bind_ip=<lo ip>,<eth ip>
verbose=false
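
The host names in configdb must resolve the same way on every server, so make sure they are present in DNS or in /etc/hosts. A minimal /etc/hosts sketch (the addresses below are purely illustrative, substitute your own):

10.0.0.1   mongos01
10.0.0.2   mongos02
10.0.0.3   mongos03
10.0.0.11  mongo01-rs01
10.0.0.12  mongo02-rs01
10.0.0.21  mongo01-rs02
10.0.0.22  mongo02-rs02

Note also that every mongos in the cluster must be started with the same configdb string (same servers, same order).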


The MongoDB packages do not include an init script for mongos, so we will create one ourselves:

cat > /etc/init.d/mongos << 'TheEnd'
#!/bin/bash

# mongos - Startup script for mongos

# chkconfig: 35 85 15
# description: Mongo Router Process for sharding
# processname: mongos
# config: /etc/mongos.conf
# pidfile: /var/run/mongos.pid

. /etc/rc.d/init.d/functions

# mongos will read mongos.conf for configuration settings

# Add variable to support multiple instances of mongos
# The instance name is by default the name of this init script
# In this way another instance can be created by just copying this init script
# and creating a config file with the same name and a .conf extension
# For Example:
# /etc/init.d/mongos2
# /etc/mongos2.conf
# Optionally also create a sysconfig file to override env variables below
# /etc/sysconfig/mongos2
INSTANCE=`basename $0`

# By default OPTIONS just points to the /etc/mongod.conf config file
# This can be overriden in /etc/sysconfig/mongod
OPTIONS=" -f /etc/${INSTANCE}.conf"

PID_PATH=/var/run/mongo
PID_FILE=${PID_PATH}/${INSTANCE}.pid
MONGO_BIN=/usr/bin/mongos
MONGO_USER=mongod
MONGO_GROUP=mongod
MONGO_ULIMIT=12000
MONGO_LOCK_FILE=/var/lock/subsys/${INSTANCE}

# Source sysconfig options so that above values can be overriden
SYSCONFIG="/etc/sysconfig/${INSTANCE}"
if [ -f "$SYSCONFIG" ]; then
    . "$SYSCONFIG" || true
fi

# Create mongo pids path if it does not exist
if [ ! -d "${PID_PATH}" ]; then
    mkdir -p "${PID_PATH}"
    chown "${MONGO_USER}:${MONGO_GROUP}" "${PID_PATH}"
fi

start()
{
  echo -n $"Starting ${INSTANCE}: "
  daemon --user "$MONGO_USER" --pidfile $PID_FILE $MONGO_BIN $OPTIONS --pidfilepath=$PID_FILE
  RETVAL=$?
  echo
  [ $RETVAL -eq 0 ] && touch $MONGO_LOCK_FILE
  return $RETVAL
}

stop()
{
  echo -n $"Stopping ${INSTANCE}: "
  killproc -p $PID_FILE -t30 -TERM $MONGO_BIN
  RETVAL=$?
  echo
  [ $RETVAL -eq 0 ] && rm -f $MONGO_LOCK_FILE
  [ $RETVAL -eq 0 ] && rm -f $PID_FILE
  return $RETVAL
}

restart() {
  stop
  start
}

ulimit -n $MONGO_ULIMIT
RETVAL=0

case "$1" in
  start)
    start
    ;;
  stop)
    stop
    ;;
  restart|reload|force-reload)
    restart
    ;;
  condrestart)
    [ -f $MONGO_LOCK_FILE ] && restart || :
    ;;
  status)
    status -p $PID_FILE $MONGO_BIN
    RETVAL=$?
    ;;
  *)
    echo "Usage: $0 {start|stop|status|restart|reload|force-reload|condrestart}"
    RETVAL=1
esac

exit $RETVAL
TheEnd


Let's make it executable
# chmod +x /etc/init.d/mongos

Now you can run
# service mongos start
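
To quickly check that the router is alive (assuming the mongo shell is installed on this machine), connect to it locally and ping it:

# mongo --port 27017 --eval 'db.runCommand({ ping: 1 })'

At this stage sh.status() will show an empty list of shards, which is expected: we add the shards later.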

And do not forget to enable both services at boot:
# chkconfig mongod on
# chkconfig mongos on


Now you need to repeat these steps on the other servers.

Configuring the Shards

The first thing to remember when setting up shards for a production environment is that each shard is a replica set.
You can read more about replication in MongoDB in the official documentation.
We will not dwell on this in detail, but will proceed straight to the setup.

We will have 4 servers: mongo01-rs01 and mongo02-rs01 will form replica set rs01 (the first shard), while mongo01-rs02 and mongo02-rs02 will form replica set rs02 (the second shard).


Suppose that the operating system and MongoDB are already installed on all four servers.
In /etc/mongod.conf on mongo01-rs01 and mongo02-rs01, you must specify the name of the replica set that this shard will use:
 replSet=rs01 
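
For reference, a minimal sketch of the entire /etc/mongod.conf on the shard members might look like this (the dbpath here is just an assumption, use whatever data directory suits you):

logpath=/var/log/mongodb/mongod.log
logappend=true
fork=true
dbpath=/opt/mongo
pidfilepath=/var/run/mongodb/mongod.pid
bind_ip=<lo ip>,<eth ip>
replSet=rs01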

Save the file and start mongod.
Next, open the mongo console on the server that we plan to make the primary
# mongo

And initialize the replica set
> rs.initiate()

To make sure that the replica set is initialized, let's see its config.
rs01:PRIMARY> rs.conf()

The output should show something like this:
 { "_id" : "rs01", "version" : 7, "members" : [ { "_id" : 0, "host" : "mongo01-rs01:27017" } ] } 


Next we add our second server to this set.
rs01:PRIMARY> rs.add("mongo02-rs01")

And check the config
rs01:PRIMARY> rs.conf()

 { "_id" : "rs01", "version" : 7, "members" : [ { "_id" : 0, "host" : "mongo01-rs01:27017" }, { "_id" : 1, "host" : "mongo02-rs01:27017", } ] } 


To improve the fault tolerance of MongoDB, it is recommended that a replica set have an odd number of voting members.
Since we do not want to keep yet another copy of the data, we can add an arbiter instead.

An arbiter is a mongod instance that is a member of the replica set but does not store any data. It participates in the election of a new primary.
How elections work is described in great detail in the official documentation.
To avoid allocating a separate machine for it, we will use one of the previously created ones, mongos01.
As we remember, the mongod service there already starts an instance that acts as the configuration server.
So that we do not have to start the arbiter by hand, we will create an init script for it as well:
cat > /etc/init.d/mongo-rs01-arb << 'TheEnd'
#!/bin/bash

# mongod - Startup script for mongod

# chkconfig: 35 85 15
# description: Mongo is a scalable, document-oriented database.
# processname: mongod
# config: /etc/mongod.conf
# pidfile: /var/run/mongodb/mongod.pid

. /etc/rc.d/init.d/functions

# things from mongod.conf get there by mongod reading it

# NOTE: if you change any OPTIONS here, you get what you pay for:
# this script assumes all options are in the config file.
CONFIGFILE="/etc/mongod-rs01-arb.conf"
OPTIONS=" -f $CONFIGFILE"
SYSCONFIG="/etc/sysconfig/mongod-rs01-arb"

# FIXME: 1.9.x has a --shutdown flag that parses the config file and
# shuts down the correct running pid, but that's unavailable in 1.8
# for now. This can go away when this script stops supporting 1.8.
DBPATH=`awk -F= '/^dbpath[[:blank:]]*=[[:blank:]]*/{print $2}' "$CONFIGFILE"`
PIDFILE=`awk -F= '/^pidfilepath[[:blank:]]*=[[:blank:]]*/{print $2}' "$CONFIGFILE"`

mongod=${MONGOD-/usr/bin/mongod}

MONGO_USER=mongod
MONGO_GROUP=mongod

if [ -f "$SYSCONFIG" ]; then
    . "$SYSCONFIG"
fi

# Handle NUMA access to CPUs (SERVER-3574)
# This verifies the existence of numactl as well as testing that the command works
NUMACTL_ARGS="--interleave=all"
if which numactl >/dev/null 2>/dev/null && numactl $NUMACTL_ARGS ls / >/dev/null 2>/dev/null
then
    NUMACTL="numactl $NUMACTL_ARGS"
else
    NUMACTL=""
fi

start()
{
  # Recommended ulimit values for mongod or mongos
  # See http://docs.mongodb.org/manual/reference/ulimit/#recommended-settings
  ulimit -f unlimited
  ulimit -t unlimited
  ulimit -v unlimited
  ulimit -n 64000
  ulimit -m unlimited
  ulimit -u 32000

  echo -n $"Starting mongod: "
  daemon --user "$MONGO_USER" "$NUMACTL $mongod $OPTIONS >/dev/null 2>&1"
  RETVAL=$?
  echo
  [ $RETVAL -eq 0 ] && touch /var/lock/subsys/mongod-rs01-arb
}

stop()
{
  echo -n $"Stopping mongod: "
  killproc -p "$PIDFILE" -d 300 /usr/bin/mongod
  RETVAL=$?
  echo
  [ $RETVAL -eq 0 ] && rm -f /var/lock/subsys/mongod-rs01-arb
}

restart() {
  stop
  start
}

RETVAL=0

case "$1" in
  start)
    start
    ;;
  stop)
    stop
    ;;
  restart|reload|force-reload)
    restart
    ;;
  condrestart)
    [ -f /var/lock/subsys/mongod-rs01-arb ] && restart || :
    ;;
  status)
    status $mongod
    RETVAL=$?
    ;;
  *)
    echo "Usage: $0 {start|stop|status|restart|reload|force-reload|condrestart}"
    RETVAL=1
esac

exit $RETVAL
TheEnd


Making it executable
# chmod +x /etc/init.d/mongo-rs01-arb

Create a data directory and a configuration file for it.
# mkdir /opt/mongo-rs01-arb; chown mongod:mongod /opt/mongo-rs01-arb
# cp -av /etc/mongod.conf /etc/mongod-rs01-arb.conf

Next, edit the following lines in the /etc/mongod-rs01-arb.conf file:
port=27020
dbpath=/opt/mongo-rs01-arb
pidfilepath=/var/run/mongodb/mongod-rs01-arb.pid

And delete or comment out the line
 configsvr=true 

Save the file and run the service.
# service mongo-rs01-arb start

Next, we return to the primary of rs01 and, in the mongo console, add the arbiter to the replica set
> rs.addArb("mongos01:27020")


Checking the config
rs01:PRIMARY> rs.conf()

 { "_id" : "rs01", "version" : 7, "members" : [ { "_id" : 0, "host" : "mongo01-rs01:27017" }, { "_id" : 1, "host" : "mongo02-rs01:27017", }, { "_id" : 2, "host" : "mongos01:27020", "arbiterOnly" : true } ] } 


Repeat this procedure on the remaining two servers (mongo01-rs02 and mongo02-rs02) for the second replica set, which will become the second shard in our cluster.

So, we have created two replica sets, which now need to be added to our sharded cluster.
To do this, go to mongos01 and open the mongo console (keep in mind that in this case we are connecting to the mongos service)
> sh.addShard("rs01/mongo01-rs01:27017,mongo02-rs01:27017")
> sh.addShard("rs02/mongo01-rs02:27017,mongo02-rs02:27017")

Checking:
> sh.status()

The output should contain the following lines:
  shards:
    {  "_id" : "rs01",  "host" : "rs01/mongo01-rs01:27017,mongo02-rs01:27017" }
    {  "_id" : "rs02",  "host" : "rs02/mongo01-rs02:27017,mongo02-rs02:27017" }


This means that 2 shards have been successfully added to our cluster.

Now we will add a database to our cluster and shard it.
In our case, this will be the database containing the GridFS file store.
> use filestore
> sh.enableSharding("filestore")
> sh.shardCollection("filestore.fs.chunks", { files_id: 1, n: 1 })
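
Note that sh.shardCollection() requires an index on the shard key. GridFS drivers normally create a unique index on { files_id: 1, n: 1 } themselves, but if the collection already contains data and that index is missing, the command will fail. In that case you can create the index by hand and retry (a sketch using the 2.6-era ensureIndex helper):

> db.fs.chunks.ensureIndex({ files_id: 1, n: 1 }, { unique: true })
> sh.shardCollection("filestore.fs.chunks", { files_id: 1, n: 1 })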

Checking status
> sh.status()

The output should be something like this:
  shards:
    {  "_id" : "rs01",  "host" : "rs01/mongo01-rs01:27017,mongo02-rs01:27017" }
    {  "_id" : "rs02",  "host" : "rs02/mongo01-rs02:27017,mongo02-rs02:27017" }
  databases:
    {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
    {  "_id" : "test",  "partitioned" : false,  "primary" : "rs02" }
    {  "_id" : "filestore",  "partitioned" : true,  "primary" : "rs01" }
        filestore.fs.chunks
            shard key: { "files_id" : 1, "n" : 1 }
            chunks:
                rs01    1363
                rs02    103
            too many chunks to print, use verbose if you want to force print
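
The chunk counts per shard (1363 vs 103 above) do not have to be equal right away: the balancer migrates chunks between shards in the background. The standard helpers report whether the balancer is enabled and whether a migration is currently in progress:

> sh.getBalancerState()
> sh.isBalancerRunning()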


That's all. Now we can use our distributed GridFS from the application by connecting to the mongos instances.
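
As a simple end-to-end smoke test (assuming the mongofiles utility from the mongodb-org-tools package is installed and mongos is listening on mongos01:27017), we can upload and list a file through the router:

# mongofiles --host mongos01 --port 27017 --db filestore put /etc/hosts
# mongofiles --host mongos01 --port 27017 --db filestore list

The chunks of the uploaded file will end up in the sharded filestore.fs.chunks collection.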

P.S.: Please report any errors or inaccuracies in a private message.

Source: https://habr.com/ru/post/227395/

