Deploying MongoDB Replica Set on Google Cloud Platform

What is Replication?

Synchronizing the same data set across multiple servers is a common way to keep data available when a server fails. Data Replication is the process of storing data on more than one site or node.

Why do you need Replication?

Replication provides redundancy and increases data availability. With multiple copies of data on different database servers, replication provides a level of fault tolerance against the loss of a single database server.

MongoDB Replica Set

A MongoDB Replica Set is a group of MongoDB processes, known as mongod instances, that host the same data set. It consists of one primary node, one or more data-bearing secondary nodes and, optionally, one arbiter node.

You can learn more about replica sets in the MongoDB documentation.

Deploying on Google Cloud Platform (GCP)

Let's begin with the deployment. We will be using Google Cloud's Compute Engine for creating VM Instances and the official MongoDB server binaries. We will be deploying 3 nodes (Primary + Secondary + Arbiter) on Compute Engine.

MongoDB Primary-Secondary-Arbiter architecture. Picture credits: docs.mongodb.com

Prerequisites

  • Basic knowledge of Linux Terminal and DevOps
  • Google Cloud Project: GCP provides a free $300 trial credit to everyone signing up for the first time
  • Domain Name control panel: This is optional but recommended for easily connecting to the Replica Set from public networks.

The Metal

We need a minimum of 3 VM instances for the Replica Set. Two of the VMs serve as the Primary and Secondary nodes for data replication, and the third is the Arbiter node. It is up to you to choose the capacity of the VM instances. For the Arbiter node, you can save a few bucks by choosing a low capacity (even GCP's free f1-micro instance gets the job done).

This article demonstrates the deployment process for Primary + Secondary + Arbiter nodes, but you can use the same process to deploy another Secondary node instead of the Arbiter.
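The reason for the 3-node minimum can be sketched with a little arithmetic: a primary can only be elected with a strict majority of the voting members. The helper names below are made up for this illustration and are not part of MongoDB.

```javascript
// Votes needed to elect a primary: a strict majority of the voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many voting members can be down while a primary can still be elected.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

for (const n of [1, 2, 3, 4, 5]) {
  console.log(
    `${n} voting member(s): majority ${majority(n)}, tolerates ${faultTolerance(n)} failure(s)`
  );
}
```

With 2 voting members the set tolerates no failures, which is why a third vote (here the Arbiter) is added.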

MongoDB's default port

MongoDB listens on port 27017 by default. To open this port on our VMs, we need to create a Firewall Rule.

In the GCP console, go to Firewall Rules under VPC Network and create a mongo-default-port rule with the following config.

GCP-Mongo-Port.jpg

Primary and Secondary Node

We are now ready to launch our Virtual Machines. In this PSA (Primary-Secondary-Arbiter) configuration, it is recommended to give the Primary and Secondary nodes the same config, because at any point in time these two nodes can swap roles. For the Arbiter node, you can opt for a low-resource VM. For storage, SSDs are recommended. For more information about hardware considerations, see the MongoDB documentation.

Here's an example VM for the primary node.

GCP-Mongo-VM.jpg

  • You can choose any Linux distro, but Debian is recommended for its stability and bloatware-free environment.
  • Make sure you untick the HTTP/HTTPS ports (unless you have a specific use case) and add the mongo-default-port network tag that we created in the last step.
  • When working with stateful VMs, always attach an external disk to store your data, so that you retain the data when upgrading or migrating VMs, and set up a backup service like GCP Snapshot Schedule.

If you want to tune your VM for performance, now is a good time. One of the recommended steps for a Linux server is to disable Transparent Huge Pages (THP). You can find more performance tweaks on the internet, but be careful because every Linux distro behaves differently.

Setting up MongoDB Replicas

Repeat the following process on all 3 VMs.

Install MongoDB

Follow the official guide on installing MongoDB on Linux. Feel free to use any community guide as long as it downloads the binaries from the official source.

Database path (Optional)

By default, MongoDB stores its data in /var/lib/mongodb. If you wish to change the directory, or if you are using an external disk, you need to create a directory for that path.

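sudo mkdir -p /mnt/disks/persistent/db
# Give the mongodb service user ownership instead of opening the directory to every user
sudo chown -R mongodb:mongodb /mnt/disks/persistent/db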

Authentication

We will be using keyfile authentication for this guide, but for a Production environment, you should consider using more secure x.509 certificates.

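The keyfile contains a shared secret in plain text. You can choose any password or generate a complex random string to store in the keyfile, but every member of the Replica Set must use the same keyfile contents. We are placing the key inside /var/tmp, which is accessible to all users; the permissions assigned below restrict the file itself to the mongodb user.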

Use nano/vim to create the keyfile and type in your password/secret text.

nano /var/tmp/mongo-keyfile

Assign permission and ownership to keyfile:

sudo chmod 400 /var/tmp/mongo-keyfile
sudo chown -R mongodb:mongodb /var/tmp/mongo-keyfile
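If you would rather generate the secret than type it by hand, the following Node.js sketch is one way to do it. It assumes Node.js is installed on the machine where you create the keyfile; the function names are illustrative only. MongoDB expects the key to be 6 to 1024 characters long and limited to the base64 character set, so 756 random bytes (1008 base64 characters) fit.

```javascript
const crypto = require("crypto");
const fs = require("fs");

// 756 random bytes encode to 1008 base64 characters, which stays inside
// the 6 to 1024 character range MongoDB accepts for a keyfile.
function generateKeyfileSecret() {
  return crypto.randomBytes(756).toString("base64");
}

// Writes the secret so that only the file owner can read it (mode 0400).
// keyfilePath is whatever path security.keyFile points to in mongod.conf.
function writeKeyfile(keyfilePath, secret) {
  fs.writeFileSync(keyfilePath, secret + "\n", { mode: 0o400 });
}

const secret = generateKeyfileSecret();
console.log(`Generated a keyfile secret of ${secret.length} characters`);

// Uncomment to write the file, then run the chown command shown above:
// writeKeyfile("/var/tmp/mongo-keyfile", secret);
```

Generate the secret once and copy the same file to all three VMs, since the nodes authenticate to each other with identical keyfile contents.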

Configuration

MongoDB uses /etc/mongod.conf as startup configuration for mongod process. We can configure that file to support our Replica Set. You can learn more about mongod.conf here.

Below is the configuration file we will use in this guide.

# mongod.conf

# Storage: Where and how to store data
# Default dbPath: /var/lib/mongodb
storage:
  dbPath: <path-to-database-dir>
  journal:
    enabled: true

# Authorization with keyfile
security:
  keyFile: <path-to-keyfile>

# Logging
systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log

# Network Interfaces
# Allows access from all IP addresses
# (bindIp and bindIpAll are mutually exclusive, so only bindIp is set)
net:
  port: 27017
  bindIp: 0.0.0.0

# Replication config
replication:
   replSetName: dev-rs0
   enableMajorityReadConcern: false

# How the process runs
processManagement:
  timeZoneInfo: /usr/share/zoneinfo

# Cloud config for enabling free monitoring
cloud:
   monitoring:
      free:
         state: on
         tags: [dev-node-0]

Set storage.dbPath, security.keyFile and replication.replSetName for your mongod.conf.

Now, restart your MongoDB daemon (mongod) so that it starts with the latest mongod.conf.

sudo service mongod restart

Verify that the daemon has started successfully.

sudo service mongod status

If the mongod process crashes then check out the debugging section at the bottom of the article.

Networking

VM Mapping

We will be connecting the members of our Replica Set by hostname, which allows us to reach the replica set using either External or Internal IPs.

On each Virtual Machine, edit the /etc/hosts file so that the hostnames point to the Internal IPs of the other VMs.

10.128.0.7    mongo-node0.example.com  # Current Machine
10.128.0.8    mongo-node1.example.com  # Secondary Node
10.128.0.9    mongo-node2.example.com  # Arbiter Node

Similarly, you need to edit the /etc/hosts file on the rest of the VMs as well. You can even use the local IP 127.0.0.1 instead of the Internal IP for the VM that you are currently logged into (current machine).

Domain Mapping

If you have a registered domain, edit its DNS records to add A records pointing to the VMs' External IPs. This allows us to connect to our Replica Set from the public internet.

GCP-Mongo-DNS.jpg

If you don't have a registered domain, or if you do not wish to make the replica set available on the public internet, use any arbitrary domain name as the hostname. Then, on every system that connects to your replica set, edit the system's hosts file to point the hostnames to the VMs' IPs.

If you are connecting another VM (deployed on GCP) to your Replica Set, you can still point the hostnames to the VMs' Internal IPs. For devices that access the Replica Set from outside the GCP network (like your local developer machine), point the hostnames to the VMs' External IPs:

35.247.10.47      mongo-node0.example.com  # Primary Node
35.247.199.41     mongo-node1.example.com  # Secondary Node
35.82.190.33      mongo-node2.example.com  # Arbiter Node

Initiating Replica Set

It's time for us to initiate the Replica Set and connect the nodes to each other. SSH into the VM that you want as the Primary node and connect to the running mongod.

mongo

Initiate the Replica Set on this node to make this a Primary node.

rs.initiate()
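Called without arguments, rs.initiate() creates a default configuration, and the host recorded for the first member may be the VM's own hostname rather than the mongo-node0.example.com name used in this guide. As a hedged alternative, you can pass an explicit configuration document. The sketch below only builds that document; buildInitialConfig is a made-up helper, and the set name and hostname are the example values from this article.

```javascript
// Builds a single-member configuration document for rs.initiate().
// setName must match replication.replSetName in mongod.conf.
function buildInitialConfig(setName, primaryHost) {
  return {
    _id: setName,
    members: [{ _id: 0, host: primaryHost }],
  };
}

const initialConfig = buildInitialConfig("dev-rs0", "mongo-node0.example.com:27017");
console.log(JSON.stringify(initialConfig, null, 2));

// In the mongo shell you would then run: rs.initiate(initialConfig)
```

Setting the host explicitly up front avoids having to reconfigure members.host later.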

Check the replica status with

rs.status()

If the replica set has successfully been initialized then you'll see the replica set name and an entry in members array with the current machine state as PRIMARY.

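Because the keyfile also turns on access control, you are currently connected through the localhost exception, which only lets you create the first user; after that, unauthenticated connections cannot run administrative commands. So let's create a user with admin privileges on MongoDB's default admin database.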

use admin

and create the user with admin roles.

db.createUser(
  {
    user: "username",
    pwd: "password",
    roles: [
      {
        role: "readWriteAnyDatabase",
        db: "admin"
      },
      {
        role: "userAdminAnyDatabase",
        db: "admin"
      },
      {
        role: "dbAdminAnyDatabase",
        db: "admin"
      },
      {
        role: "clusterAdmin",
        db: "admin"
      }
    ]
  }
)

The above list of roles grants almost all the permissions that an admin might need, but you can configure the roles as per your requirements. Keep at least one user with the clusterAdmin role so that you can manage the Replica Set.

Now exit and reconnect to mongod with the user's credentials, authenticating against the admin database:

mongo -u username -p --authenticationDatabase admin

You will be prompted to enter the password. Now you can use any regular database commands. Whatever data you create now will be replicated to the nodes that we will add in the next step.

When you initiate the Replica Set, it is created with a default replica configuration. You can check the configuration with

rs.conf()

Adding other nodes to Replica Set

Setting the priority for members

We need to edit the default Replica config to set the priority of our current primary node. Priority can be any number from 0 to 1000, and a member with priority 0 can never become Primary. I recommend setting the highest priority on the member you prefer as Primary in elections. In this case, we are going to set the priorities like this:

  • 10 - Primary Node
  • 5 - Secondary Node
  • 0 - Arbiter Node

As the replica set currently has only the primary node connected, we will change its priority by editing the configuration returned by rs.conf() and applying it with rs.reconfig().

Save the current configuration in a variable

cfg = rs.conf()

Make sure exactly 1 node is present in the members array, then set its new priority with a plain property assignment.

cfg.members[0].priority = 10

Now, to make the changes take effect, reconfigure the Replica Set with the new configuration

rs.reconfig(cfg)

Connecting secondary node

Before proceeding to connect the secondary node, make sure its mongod process is properly running.

Now from the PRIMARY node, add secondary node configuration.

rs.add( { host: "mongo-node1.example.com:27017", priority: 5, votes: 1 } )

After exchanging a couple of heartbeats and syncing the data, the node should attain SECONDARY status in rs.status().
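The rs.status() output is long, so a small helper that prints only each member's name and state can make it easier to watch the node move to SECONDARY. This is a sketch: summarizeMembers is a hypothetical helper, and sampleStatus is a hand-written stand-in that mimics the name and stateStr fields of rs.status(), not real output.

```javascript
// Reduces an rs.status() document to one "host: STATE" line per member.
function summarizeMembers(status) {
  return status.members.map((m) => `${m.name}: ${m.stateStr}`);
}

// Hand-written sample with the same field names as rs.status() (not real output).
const sampleStatus = {
  set: "dev-rs0",
  members: [
    { name: "mongo-node0.example.com:27017", stateStr: "PRIMARY", health: 1 },
    { name: "mongo-node1.example.com:27017", stateStr: "SECONDARY", health: 1 },
  ],
};

console.log(summarizeMembers(sampleStatus).join("\n"));

// In the mongo shell, after defining the function:
//   summarizeMembers(rs.status()).forEach((line) => print(line))
```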

Connecting arbiter node

Similar to the secondary node, we will now add the arbiter node.

rs.add( { host: "mongo-node2.example.com:27017", priority: 0, votes: 1, arbiterOnly: true, hidden: true } )

This node won't replicate any data and will be hidden from read requests.

...and that's about it. You should now have a fully functional MongoDB Replica Set deployed on Google Cloud Platform.

Connecting to Replica Set from clients

There are various ways you can connect to the replica set but make sure you use the specified domain names of nodes and the replica set name that was specified in mongod.conf.

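From MongoDB 3.4 onwards, you can pass a MongoDB connection string to the shell command. For example (replace repl-set-name with the replSetName from your mongod.conf, which is dev-rs0 in this guide):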

mongo "mongodb://username:password@mongo-node0.example.com:27017,mongo-node1.example.com:27017,mongo-node2.example.com:27017/?replicaSet=repl-set-name"

You will be using the same string in various client drivers like Node.js, Java and Go.
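When you assemble this string in application code, keep in mind that usernames and passwords containing characters such as @, : or / must be percent-encoded. The following sketch shows one way to build the string; buildMongoUri is a hypothetical helper, the credentials are placeholders, and the authSource=admin option is an addition that assumes the user was created in the admin database as in this guide.

```javascript
// Builds a replica set connection string with percent-encoded credentials.
function buildMongoUri({ username, password, hosts, replicaSet, authSource = "admin" }) {
  const credentials = `${encodeURIComponent(username)}:${encodeURIComponent(password)}`;
  const query = new URLSearchParams({ replicaSet, authSource }).toString();
  return `mongodb://${credentials}@${hosts.join(",")}/?${query}`;
}

const uri = buildMongoUri({
  username: "username",
  password: "p@ss/word", // placeholder with characters that need encoding
  hosts: [
    "mongo-node0.example.com:27017",
    "mongo-node1.example.com:27017",
    "mongo-node2.example.com:27017",
  ],
  replicaSet: "dev-rs0",
});

console.log(uri);
```

The resulting string can be handed to the shell or to a driver's connect call unchanged.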

Debugging

mongod process crashing

You can find the logs for crashes in the file specified at systemLog.path in mongod.conf. By default, the path is /var/log/mongodb/mongod.log. The MongoDB documentation has more information about log messages.

Replica Nodes not able to connect to each other

  • Use ping to test the general network connectivity of each node to other nodes.
  • Verify that your hosts file correctly maps each IP to its Domain Name.
  • Make sure you have the correct domain names as members.host in rs.conf(). Reconfigure if you don't.
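The last check can be sketched as a comparison between the hosts you expect and the hosts stored in the configuration. findHostMismatches is a hypothetical helper, and sampleConf is a hand-written stand-in for rs.conf() in which the first member was registered under a made-up machine hostname (instance-1) instead of its domain name.

```javascript
// Compares the member hosts in an rs.conf() document with the expected hostnames.
function findHostMismatches(conf, expectedHosts) {
  const actual = conf.members.map((m) => m.host);
  return {
    missing: expectedHosts.filter((h) => !actual.includes(h)),
    unexpected: actual.filter((h) => !expectedHosts.includes(h)),
  };
}

// Hand-written sample mimicking rs.conf() (not real output).
const sampleConf = {
  _id: "dev-rs0",
  members: [
    { _id: 0, host: "instance-1:27017" },
    { _id: 1, host: "mongo-node1.example.com:27017" },
  ],
};

const expectedHosts = [
  "mongo-node0.example.com:27017",
  "mongo-node1.example.com:27017",
  "mongo-node2.example.com:27017",
];

console.log(findHostMismatches(sampleConf, expectedHosts));

// In the mongo shell: findHostMismatches(rs.conf(), expectedHosts)
```

An entry under unexpected usually means a member's host has to be corrected in the configuration and applied with rs.reconfig().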

Replica Set Voting

If you want to test the election process of the Replica Set, there's a helper method, rs.stepDown(), for switching the primary node. When called on the Primary node, this method makes the node give up its Primary status and triggers an election among the members of the Replica Set.