What is Replication?
Data replication is the process of storing the same data on more than one site or node. Keeping that data synchronized across multiple servers is a common way to make sure it stays available when a single server fails.
Why do you need Replication?
Replication provides redundancy and increases data availability. With multiple copies of data on different database servers, replication provides a level of fault tolerance against the loss of a single database server.
MongoDB Replica Set
A MongoDB Replica Set is a group of MongoDB processes, known as mongod instances, that host the same data set. It consists of one primary node, one or more data-bearing secondary nodes and, optionally, one arbiter node.
You can learn more about MongoDB Replica Sets in the official MongoDB documentation.
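To make the fault-tolerance claim concrete, here is a minimal sketch of the arithmetic behind elections: a primary can only be elected while a majority of the voting members is reachable. The function names are made up for illustration and are not part of any MongoDB API.

```javascript
// Votes required to elect a primary among `votingMembers` voting nodes.
function majorityNeeded(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many voting members can be down while a majority is still reachable.
function faultTolerance(votingMembers) {
  return votingMembers - majorityNeeded(votingMembers);
}

for (const n of [1, 2, 3, 4, 5]) {
  console.log(
    `${n} voting member(s): majority ${majorityNeeded(n)}, tolerates ${faultTolerance(n)} failure(s)`
  );
}
```

With 3 voting members the majority is 2, so the set survives the loss of any single node. This is why a 3-node layout is the usual minimum.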
Deploying on Google Cloud Platform (GCP)
Let's begin with the deployment. We will be using Google Cloud's Compute Engine for creating VM Instances and the official MongoDB server binaries. We will be deploying 3 nodes (Primary + Secondary + Arbiter) on Compute Engine.
Picture Credits: docs.mongodb.com
Prerequisites
- Basic knowledge of Linux Terminal and DevOps
- Google Cloud Project: GCP provides a free $300 trial credit for everyone signing up for the first time
- Domain Name control panel: This is optional but recommended for easily connecting to the Replica Set from public networks.
The Metal
We need a minimum of 3 VM instances for the Replica Set. Two of the VMs serve as the Primary and Secondary nodes for data replication, and the third is the Arbiter node. It is up to you to choose the capacity of the VM instances. For the Arbiter node, you can save a few bucks by choosing a low capacity (even GCP's free f1-micro instance gets the job done).
This article demonstrates the deployment process for Primary + Secondary + Arbiter nodes, but you can use the same process to deploy another Secondary node instead of the Arbiter.
MongoDB's default port
MongoDB listens for connections on port 27017 by default. To open this port on our VMs, we need to create a Firewall Rule.
In the GCP console, go to Firewall under VPC Network and create a rule named mongo-default-port that allows ingress TCP traffic on port 27017 to instances carrying the mongo-default-port network tag.
Primary and Secondary Node
We are now ready to launch our Virtual Machines. In this PSA (Primary-Secondary-Arbiter) configuration, it is recommended to give the Primary and Secondary nodes the same specifications, because at any point in time these two nodes can swap their Primary and Secondary roles. For the Arbiter node, you can opt for a low-resource VM. For storage, SSDs are recommended. For more information, see MongoDB's production notes on hardware considerations.
Here's an example VM for the Primary node.
- You can choose any Linux distro but Debian is recommended for its stability and bloatware-free environment.
- Make sure you untick HTTP/HTTPS ports (unless you have a specific use case) and add the mongo-default-port network tag which we created in the last step.
- When working with stateful VMs, always attach an external disk to store your data. This lets you retain the data when upgrading or migrating VMs and back it up with a service like GCP Snapshot Schedule.
If you want to tune your VM for performance, now is a good time. One of the recommended steps for a Linux server is to disable Transparent Huge Pages (THP). You can find a few more performance tweaks on the internet, but be careful because every Linux distro behaves differently.
Setting up MongoDB Replicas
Repeat the following process on all 3 VMs.
Install MongoDB
Follow the official guide on installing MongoDB on Linux. Feel free to use any community guide instead, as long as it downloads the binaries from the official source.
Database path (Optional)
By default, MongoDB will store the data in /var/lib/mongodb. If you wish to change the directory, or if you are using an external disk, you need to create a directory for that path.
sudo mkdir -p /mnt/disks/persistent/db
sudo chown -R mongodb:mongodb /mnt/disks/persistent/db
Authentication
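We will be using keyfile authentication for this guide, but for a production environment you should consider using the more secure x.509 certificates.
The keyfile contains a shared secret in plain text. You can choose your own secret or generate a complex random string, as long as it is 6 to 1024 characters long and uses only base64 characters. Every node of the Replica Set must hold a keyfile with exactly the same content. We are placing the keyfile inside /var/tmp, a directory that all users can reach; the permissions set below restrict the file itself to the mongodb user.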
Use nano/vim to create the keyfile and type in your password/secret text.
nano /var/tmp/mongo-keyfile
Assign permission and ownership to keyfile:
sudo chmod 400 /var/tmp/mongo-keyfile
sudo chown -R mongodb:mongodb /var/tmp/mongo-keyfile
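If you would rather not invent the secret by hand, the sketch below generates a random one with Node.js. It assumes Node.js is available on the machine where you create the key; the function name generateKeyfileSecret is hypothetical. 756 random bytes encode to 1008 base64 characters, which stays within the 1024-character keyfile limit.

```javascript
// Node.js built-in module for cryptographically secure random bytes.
const nodeCrypto = require("crypto");

// Returns a base64 string suitable as keyfile content.
// 756 bytes -> 1008 base64 characters (756 is divisible by 3, so no '=' padding).
function generateKeyfileSecret(numBytes = 756) {
  return nodeCrypto.randomBytes(numBytes).toString("base64");
}

const secret = generateKeyfileSecret();
console.log(secret);
```

Generate the secret once, save it as the keyfile on one VM, and copy the identical content to the other two.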
Configuration
MongoDB uses /etc/mongod.conf as the startup configuration for the mongod process. We can configure that file to support our Replica Set. You can learn more about mongod.conf in the MongoDB configuration file options reference.
Below is the configuration file we will use in this guide.
# mongod.conf
# Storage: Where and how to store data
# Default dbPath: /var/lib/mongodb
storage:
  dbPath: <path-to-database-dir>
  journal:
    enabled: true
# Authorization with keyfile
security:
  keyFile: <path-to-keyfile>
# Logging
systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log
# Network Interfaces
# Listens on all IPv4 addresses
# (bindIp and bindIpAll are mutually exclusive, so only bindIp is set)
net:
  port: 27017
  bindIp: 0.0.0.0
# Replication config
replication:
  replSetName: dev-rs0
  enableMajorityReadConcern: false
# How the process runs
processManagement:
  timeZoneInfo: /usr/share/zoneinfo
# Cloud config for enabling free monitoring
cloud:
  monitoring:
    free:
      state: on
      tags: [dev-node-0]
Set storage.dbPath, security.keyFile and replication.replSetName to match your setup in mongod.conf.
Now, restart your MongoDB daemon (mongod) so that it starts with the latest mongod.conf.
sudo service mongod restart
Verify that the daemon has started successfully.
sudo service mongod status
If the mongod process crashes then check out the debugging section at the bottom of the article.
Networking
VM Mapping
We will be connecting the members of our Replica Set by hostname. This allows us to reach the replica set through either External or Internal IPs.
On each Virtual Machine, we need to edit the /etc/hosts file so that the hostnames point to the Internal IPs of the other VMs.
10.128.0.7 mongo-node0.example.com # Current Machine
10.128.0.8 mongo-node1.example.com # Secondary Node
10.128.0.9 mongo-node2.example.com # Arbiter Node
Similarly, you need to edit the /etc/hosts file on the rest of the VMs as well. You can even use the local IP 127.0.0.1 instead of the Internal IP for the VM that you are currently logged into (current machine).
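Since the same three entries go onto every VM, they can be generated from one list. The sketch below is only an illustration: the nodes inventory reuses the example hostnames and Internal IPs from above, and hostsEntriesFor is a hypothetical helper, not a MongoDB or GCP tool.

```javascript
// Hypothetical inventory, using the example hostnames and Internal IPs from this guide.
const nodes = [
  { host: "mongo-node0.example.com", ip: "10.128.0.7" },
  { host: "mongo-node1.example.com", ip: "10.128.0.8" },
  { host: "mongo-node2.example.com", ip: "10.128.0.9" },
];

// Builds the /etc/hosts lines for the VM whose hostname is `currentHost`,
// marking that VM's own entry with a comment.
function hostsEntriesFor(currentHost, allNodes) {
  return allNodes.map(({ host, ip }) => {
    const note = host === currentHost ? " # Current Machine" : "";
    return `${ip} ${host}${note}`;
  });
}

console.log(hostsEntriesFor("mongo-node1.example.com", nodes).join("\n"));
```

The printed lines can be appended to /etc/hosts on the matching VM.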
Domain Mapping
If you have a registered domain, edit its DNS records and add A records pointing to the VMs' External IPs. This will allow us to connect to our Replica Set from the public internet.
If you don't have a registered domain, or if you don't want to make the replica set available on the public internet, use any arbitrary domain name as the hostname. Then, on every system that connects to your replica set, edit the system's hosts file to point the hostnames to the VMs' IPs.
If you are connecting another VM (deployed on GCP) to your Replica Set, you can still point the hostnames to the VMs' Internal IPs. For devices that access the Replica Set from outside the GCP network (like your local developer machine), point the hostnames to the VMs' External IPs:
35.247.10.47 mongo-node0.example.com # Primary Node
35.247.199.41 mongo-node1.example.com # Secondary Node
35.82.190.33 mongo-node2.example.com # Arbiter Node
Initiating Replica Set
It's time for us to initiate the Replica Set and connect the nodes to each other. SSH into the VM that you want as the Primary node and connect to the running mongod.
mongo
Initiate the Replica Set on this node to make this a Primary node.
rs.initiate()
Check the replica status with
rs.status()
If the replica set has been initialized successfully, you'll see the replica set name and an entry in the members array with the current machine's state as PRIMARY.
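Reading the full rs.status() output by eye gets tedious, so here is a small sketch that pulls out just the primary. The helper findPrimary is hypothetical, and sampleStatus is a trimmed-down, made-up example of the result; it relies only on the members array with its name and stateStr fields.

```javascript
// Returns the host of the member in PRIMARY state, or null if none is elected.
function findPrimary(status) {
  const members = status.members || [];
  const primary = members.find((m) => m.stateStr === "PRIMARY");
  return primary ? primary.name : null;
}

// Made-up, shortened sample of an rs.status() result for illustration only.
const sampleStatus = {
  set: "dev-rs0",
  members: [
    { _id: 0, name: "mongo-node0.example.com:27017", health: 1, stateStr: "PRIMARY" },
  ],
};

console.log(findPrimary(sampleStatus));
```

Inside the mongo shell you would pass the live result instead of the sample: findPrimary(rs.status()).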
Because keyfile authentication also turns on access control, you can run very little on this node until a user exists. So let's create a user with admin privileges on MongoDB's default admin database.
use admin
and create the user with admin roles.
db.createUser({
  user: "username",
  pwd: "password",
  roles: [
    { role: "readWriteAnyDatabase", db: "admin" },
    { role: "userAdminAnyDatabase", db: "admin" },
    { role: "dbAdminAnyDatabase", db: "admin" },
    { role: "clusterAdmin", db: "admin" }
  ]
})
The above list of roles grants almost all the permissions that an admin might need, but you can configure the roles as per your requirements. Keep at least one user with the clusterAdmin role to manage the Replica Set.
Now exit and reconnect to mongod with the user's credentials:
mongo -u username -p --authenticationDatabase admin
You will be prompted to enter the password. Now you can use any regular database commands. Whatever data you create now will be replicated to the nodes that we will add in the next step.
When you initiate the Replica Set, it is created with a default replica set configuration. You can check the configuration with:
rs.conf()
Adding other nodes to Replica Set
Setting the priority for members
We need to edit the default replica set configuration to set the priority of our current primary node. Priority can be any number between 0 and 1000, and members with a higher priority are preferred in elections for Primary. I recommend giving the highest priority to the member you want as the first choice for Primary. In this case, we are going to set the priorities like this:
- 10 - Primary Node
- 5 - Secondary Node
- 0 - Arbiter Node
As the replica set currently has only one node (the primary), we will change its priority by reconfiguring the replica set.
Save the current configuration in a variable:
cfg = rs.conf()
Make sure exactly 1 node is present in the members array, then set its new priority with a plain property assignment:
cfg.members[0].priority = 10
To make the change take effect, reconfigure the Replica Set with the new configuration:
rs.reconfig(cfg)
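Indexing members[0] works while there is only one member, but it is easy to hit the wrong node once the set grows. As a sketch, the hypothetical helper below looks the member up by host and returns a modified copy of the configuration. sampleCfg is a made-up, shortened stand-in for what rs.conf() returns.

```javascript
// Returns a copy of `cfg` in which the member with the given host has a new priority.
// Throws if the priority is out of range or the host is not in the config.
function withPriority(cfg, host, priority) {
  if (priority < 0 || priority > 1000) {
    throw new RangeError("priority must be between 0 and 1000");
  }
  if (!cfg.members.some((m) => m.host === host)) {
    throw new Error(`no member with host ${host}`);
  }
  const members = cfg.members.map((m) =>
    m.host === host ? Object.assign({}, m, { priority }) : m
  );
  return Object.assign({}, cfg, { members });
}

// Made-up, shortened stand-in for an rs.conf() result.
const sampleCfg = {
  _id: "dev-rs0",
  version: 1,
  members: [{ _id: 0, host: "mongo-node0.example.com:27017", priority: 1, votes: 1 }],
};

const updatedCfg = withPriority(sampleCfg, "mongo-node0.example.com:27017", 10);
console.log(updatedCfg.members[0].priority);
```

In the mongo shell the result could be handed straight to the reconfiguration step, for example rs.reconfig(withPriority(rs.conf(), "mongo-node0.example.com:27017", 10)).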
Connecting secondary node
Before proceeding to connect the secondary node, make sure its mongod process is properly running.
Now from the PRIMARY node, add secondary node configuration.
rs.add( { host: "mongo-node1.example.com:27017", priority: 5, votes: 1 } )
After exchanging a couple of heartbeats and syncing the data, the node should attain SECONDARY status in rs.status().
Connecting arbiter node
Similar to the secondary node, we will now add the arbiter node.
rs.add( { host: "mongo-node2.example.com:27017", priority: 0, votes: 1, arbiterOnly: true, hidden: true } )
This node won't replicate any data and will be hidden from read requests.
...and that's about it. You should now have a fully functional MongoDB Replica Set deployed on Google Cloud Platform.
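To double-check that claim, the sketch below lists every member that is not yet in one of the three expected states. The helper unhealthyMembers is hypothetical and sampleStatus is a made-up, shortened rs.status() result in which the secondary is still performing its initial sync (state STARTUP2).

```javascript
// States a member of this PSA set should settle into.
const HEALTHY_STATES = ["PRIMARY", "SECONDARY", "ARBITER"];

// Returns "host (state)" for each member that is unreachable or in another state.
function unhealthyMembers(status) {
  return status.members
    .filter((m) => m.health !== 1 || HEALTHY_STATES.indexOf(m.stateStr) === -1)
    .map((m) => `${m.name} (${m.stateStr})`);
}

// Made-up, shortened sample of an rs.status() result for illustration only.
const sampleStatus = {
  set: "dev-rs0",
  members: [
    { name: "mongo-node0.example.com:27017", health: 1, stateStr: "PRIMARY" },
    { name: "mongo-node1.example.com:27017", health: 1, stateStr: "STARTUP2" },
    { name: "mongo-node2.example.com:27017", health: 1, stateStr: "ARBITER" },
  ],
};

console.log(unhealthyMembers(sampleStatus));
```

An empty list from unhealthyMembers(rs.status()) in the mongo shell means all three nodes have reached their expected state.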
Connecting to Replica Set from clients
There are various ways to connect to the replica set, but make sure you use the domain names of the nodes and the replica set name that you specified in mongod.conf.
From MongoDB 3.4 onwards, you can pass a MongoDB connection string to the shell command. For example:
mongo "mongodb://username:password@mongo-node0.example.com:27017,mongo-node1.example.com:27017,mongo-node2.example.com:27017/?replicaSet=dev-rs0"
You will be using the same string in various client drivers like Node.js, Java and Go.
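Typing that string by hand is error-prone, especially when the password contains characters such as @ or /, which must be percent-encoded inside a connection string. Here is a minimal sketch of a builder; buildMongoUri is a hypothetical helper, and the replica set name dev-rs0 is the one from the mongod.conf used in this guide.

```javascript
// Builds a replica set connection string, percent-encoding the credentials.
function buildMongoUri({ user, password, hosts, replicaSet }) {
  const credentials = `${encodeURIComponent(user)}:${encodeURIComponent(password)}`;
  const query = `replicaSet=${encodeURIComponent(replicaSet)}`;
  return `mongodb://${credentials}@${hosts.join(",")}/?${query}`;
}

const uri = buildMongoUri({
  user: "username",
  password: "p@ss/word", // example password with characters that need encoding
  hosts: [
    "mongo-node0.example.com:27017",
    "mongo-node1.example.com:27017",
    "mongo-node2.example.com:27017",
  ],
  replicaSet: "dev-rs0",
});

console.log(uri);
```

The resulting string can be passed to the shell or to a client driver unchanged.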
Debugging
mongod process crashing
You can find the logs for crashes in the file specified at systemLog.path in mongod.conf. By default, the path will be /var/log/mongodb/mongod.log. See the MongoDB documentation on log messages for more information.
Replica Nodes not able to connect to each other
- Use ping to test the general network connectivity of each node to the other nodes.
- Verify that your hosts file correctly maps IPs to domain names.
- Make sure members.host in rs.conf() holds the correct domain names. Reconfigure if it doesn't.
Replica Set Voting
If you want to test the election process of the Replica Set, there's a helper method, rs.stepDown(), for switching the primary node. When called on the Primary node, this method makes the node give up its Primary status and triggers an election among the replica set members.