MongoDB is a great NoSQL database management system. But using a single standalone MongoDB server is not the way you want to use it in the production level. In case you want to scale out and increase the throughput of your application you can use sharding to distribute your data in multiple clusters, called shards. Furthermore, MongoDB can support continuous availability and fault tolerance by using replication servers that form a replica set. A replica set not only provides us a mechanism for fault tolerance and availability but a way for safely and continuously backing up our data. The latency of our database queries can be improved by using primary and secondary indexes.

In this article I will present to you how to configure and set up a sharded MongoDB system that supports also data replication. Each shard is actually a replica set of 3 nodes. All MongoDB servers are about to run in the same system and each one will use the localhost address. In a more realistic scenario you should replace these hosts to your needs and assure that connectivity between your servers or data centers (in case you want to separate your MongoDB servers in different data centers) is enabled. I don’t use any arbiter replica set members for voting and elections. All members are real MongoDB servers which are about to replicate data and also be able to work either as primary or as secondary whenever is needed.

The majority of each replica set is 2 (one more than the half nodes in each replica set N/2+1) which means that a primary member can continue to work only when it sees a majority of members. Also, a write to be safe needs to be replicated in the majority of members and also a secondary to be a primary needs to see a majority of members. The majority number is very crucial in replica sets and in general in clustered systems.

Before showing you how you could set up things see first the files tree structure:

+---mongo-binaries
|       bsondump.exe
|       libeay32.dll
|       mongo.exe
|       mongod.exe
|       mongod.pdb
|       mongodump.exe
|       mongoexport.exe
|       mongofiles.exe
|       mongoimport.exe
|       mongooplog.exe
|       mongoperf.exe
|       mongorestore.exe
|       mongos.exe
|       mongos.pdb
|       mongostat.exe
|       mongotop.exe
|       ssleay32.dll
+---shard-configuration-servers
|   +---config0
|   |   +---configuration
|   |   |       mongodb.conf
|   |   +---data
|   |   |   \---db
|   |   \---logs
|   +---config1
|   |   +---configuration
|   |   |       mongodb.conf
|   |   +---data
|   |   |   \---db
|   |   \---logs
|   \---config2
|       +---configuration
|       |       mongodb.conf
|       +---data
|       |   \---db
|       \---logs
\---shard-servers
    +---shard0
    |   +---replica0
    |   |   +---configuration
    |   |   |       mongodb.conf
    |   |   +---data
    |   |   |   \---db
    |   |   \---logs
    |   +---replica1
    |   |   +---configuration
    |   |   |       mongodb.conf
    |   |   +---data
    |   |   |   \---db
    |   |   \---logs
    |   \---replica2
    |       +---configuration
    |       |       mongodb.conf
    |       +---data
    |       |   \---db
    |       \---logs
    +---shard1
    |   +---replica0
    |   |   +---configuration
    |   |   |       mongodb.conf
    |   |   +---data
    |   |   |   \---db
    |   |   \---logs
    |   +---replica1
    |   |   +---configuration
    |   |   |       mongodb.conf
    |   |   +---data
    |   |   |   \---db
    |   |   \---logs
    |   \---replica2
    |       +---configuration
    |       |       mongodb.conf
    |       +---data
    |       |   \---db
    |       \---logs
    \---shard2
        +---replica0
        |   +---configuration
        |   |       mongodb.conf
        |   +---data
        |   |   \---db
        |   \---logs
        +---replica1
        |   +---configuration
        |   |       mongodb.conf
        |   +---data
        |   |   \---db
        |   \---logs
        \---replica2
            +---configuration
            |       mongodb.conf
            +---data
            |   \---db
            \---logs

So, try to create the same tree structure in a directory in your filesystem. Inside the mongo-binaries directory copy the MongoDB binaries from your installation (this directory contains the mongod, mongos and other executables, libraries).

You will see that shard0, shard1 and shard2 directories exist and each one contains replica0, replica1 and replica2 directories. That means that 9 standalone MongoDB servers will run only for the data. Each replicaX directory has the data directory for the databases, a logs directory for the log files and a configuration directory containing the configuration file mongodb.conf of the server.

Also, you will see the config0, config1 and config2 directories with the same tree structure. That means that 3 configuration standalone MongoDB servers will run. These servers are not used for storing the data but the sharding metadata so that the mongos routing service can know where to route the database queries. The configuration servers also form a replica set (but not a shard).

Now, let’s continue with the contents of the mongodb.conf files.

The contents of the mongodb.conf of config0 configuration server is:

quiet = false
dbpath = shard-configuration-servers\config0\data\db
logpath = shard-configuration-servers\config0\logs\mongod.log
logappend = true
journal = true
smallfiles = true
oplogSize = 100
configsvr = true

The dbpath and logpath settings should be changed appropriately for the config1 and config2.

The contents of the shard0\replica0 data server is:

quiet = false
dbpath = shard-servers\shard0\replica0\data\db
logpath = shard-servers\shard0\replica0\logs\mongod.log
logappend = true
journal = true
shardsvr = true
smallfiles = true
oplogSize = 100

The dbpath and logpath settings should be changed appropriately for the rest replicaX members.

In order to initialize the sharded system you may use the script initialize-sharded-system.bat.

The above script does the following things:

  1. Starts 3 sharded servers in localhost:37017/37018/37019 with replica set shard0.
  2. Starts 3 sharded servers in localhost:47017/47018/47019 with replica set shard1.
  3. Starts 3 sharded servers in localhost:57017/57018/57019 with replica set shard1.
  4. Configures the first member of each shardX replica set.
  5. Starts 3 config servers in localhost:57060/57061/57062 with replica set configReplSet.
  6. Configures the first member of the configReplSet replica set.
  7. Starts the mongos server in localhost:27017 and connects it with the config servers.
  8. Sends the shards configuration to the mongos server.

The only thing you have to do now is to inform your mongos which collections you want to shard.

For example, if you want to shard a students collection from a school database, enable the sharding of the collection and also define the shard key. As shard key let’s use the email of students. You can do this with the following commands (connect to mongos and run them).

db.adminCommand({enableSharding:'school'});

db.adminCommand({shardCollection:'school.students', key: {email: 1}});

The second command specifies the shard key in students collection and also creates an index in the field. Sharding needs to have an index for each shard key in order to work. Now, you can run some insert and find queries to see how it goes. Note that each student should have an email field. This is a necessity so that sharding can work. Also, you can check the log files for more information.

Lastly, you can find the source code in http://github.com/efstathios-chatzikyriakidis/mongodb-sharded-cluster-example

That’s all folks!