migrate-mongo-cluster is an application that helps you migrate data from one server to another. Its objective is to help you achieve a live migration of data from a source database to a target database. It comes in handy especially when you are on a sharded cluster and want to change the shard key without unsharding and resharding, or want to rename namespaces as you migrate.
From a technical standpoint, the application reads data document by document from the source database and writes it into the target database. The application also tails the oplog and reapplies the entries on the target once it has copied all the data. If you use the renameNamespaces setting in the config file, the documents and oplog entries are renamed as they are applied on the target.
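To illustrate the renaming step, here is a minimal sketch in JavaScript of how an oplog entry's namespace might be rewritten before it is applied on the target. The `renames` map and the `renameOplogEntry` helper are hypothetical; they are not the application's actual configuration format or code. Only the `op`, `ns` and `o` fields follow the shape of MongoDB oplog entries.

```javascript
// Hypothetical mapping of source namespaces to target namespaces.
// This is NOT the application's real config format, only an illustration.
const renames = {
  'sales.orders': 'sales_v2.orders',
};

// Return a copy of the oplog entry with its namespace renamed when a
// mapping exists; entries for unmapped namespaces are returned unchanged.
function renameOplogEntry(entry, renames) {
  if (!Object.prototype.hasOwnProperty.call(renames, entry.ns)) {
    return entry;
  }
  return Object.assign({}, entry, { ns: renames[entry.ns] });
}

// Sample insert entry for illustration only.
const insertEntry = { op: 'i', ns: 'sales.orders', o: { _id: 1, total: 42 } };
const renamed = renameOplogEntry(insertEntry, renames);
console.log(renamed.ns); // sales_v2.orders
```

Returning a copy rather than mutating the entry keeps the original oplog record intact, which may help if the same entry has to be retried.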
Ideally, you should take a backup of the existing database, restore it to the server you want to migrate to, let the oplog catch up, and then elect the new server as primary. If for whatever reason you cannot follow this recommended approach, you may use this application to do the migration, at your own risk! This application is not supported by MongoDB Technical Support Engineers.
If you are on a sharded cluster, please make sure to disable the balancer for the entire duration of the migration.
This application records the most recent oplog entry on the replica set at the time of the first run (or on every run if the drop option is set). Once the application has copied all the data from the source onto the target, it tails the source oplog from the recorded oplog timestamp. For the migration to complete successfully, it is crucial that the oplog is big enough to accommodate the operations for the anticipated duration of the process. Please refer to the MongoDB documentation, Change the Size of the Oplog, to set it to an appropriate value.
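One rough way to sanity-check the oplog size is to compare the current oplog window with the expected migration duration. The sketch below assumes an object shaped like the result of `db.getReplicationInfo()` in the MongoDB shell, which reports `timeDiffHours`. The `oplogCoversMigration` helper, the `safetyFactor` parameter and the sample numbers are illustrative assumptions, not part of the application.

```javascript
// Decide whether the oplog window is comfortably larger than the expected
// migration duration. replInfo is assumed to look like the object returned
// by db.getReplicationInfo() in the MongoDB shell.
function oplogCoversMigration(replInfo, expectedMigrationHours, safetyFactor) {
  // Default to requiring twice the expected duration as headroom (assumption).
  const factor = safetyFactor === undefined ? 2 : safetyFactor;
  return replInfo.timeDiffHours >= expectedMigrationHours * factor;
}

// Sample values for illustration only.
const replInfo = { logSizeMB: 51200, usedMB: 40960, timeDiffHours: 36 };
console.log(oplogCoversMigration(replInfo, 12)); // true  (36 >= 24)
console.log(oplogCoversMigration(replInfo, 24)); // false (36 <  48)
```

The oplog window depends on the write rate on the source, so a value measured during a quiet period may overstate the headroom available during peak load.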
While migrating the data from source to target, it is assumed that all the indexes of your interest are precreated on target before beginning the migration.
If you are planning to change the shard key then the application assumes that you configured the sharded collections accordingly.
You may use the below script to help generate code for creating the collections and the indexes for target.
- Open the MongoDB shell connected to the source
- Copy paste the script (below) in the MongoDB shell
- The script generates the commands that you need to run on the target
- Copy the generated scripts
- Open the MongoDB shell connected to the target
- Paste / run the previously generated scripts
/*
Author: Shyam Arjarapu
Description: Generate create collection and index scripts for every collection in every database
*/
var collections = [];
db.getMongo().getDBs().databases.forEach(function(databaseMeta){
    // Skip the internal databases
    if (databaseMeta.name == 'admin' || databaseMeta.name == 'local' || databaseMeta.name == 'config')
        return;
    var database = db.getSiblingDB(databaseMeta.name);
    database.getCollectionInfos().forEach(function(collectionInfo){
        var collection = {
            name: collectionInfo.name,
            database: databaseMeta.name,
            options: collectionInfo.options
        };
        collections.push(collection);
        // Fields of an index document that are not createIndex options
        var ignoreKeys = [ "v", "key", "name", "ns" ];
        var indexes = database.getCollection(collection.name)
            .getIndexes()
            .map( function(item) {
                var index = {
                    key: item.key,
                    name: item.name
                };
                var options = {};
                // indexOf works in both the legacy mongo shell and mongosh
                Object.keys(item).filter(key => ignoreKeys.indexOf(key) === -1).forEach(key => options[key] = item[key]);
                if (Object.keys(options).length > 0) {
                    index.options = options;
                }
                return index;
            });
        collection.indexes = indexes;
    });
});
// One createCollection command per collection
var collectionStrings = collections.map(function(collection){
    return `db.getSiblingDB('${collection.database}').createCollection('${collection.name}', ${JSON.stringify(collection.options)});`;
}).join("\n");
// One createIndex command per index of every collection
var indexStrings = collections.map(function(collection){
    return collection.indexes.map(function(index){
        var optionsJSON = index.options ? `, ${JSON.stringify(index.options)}` : '';
        return `db.getSiblingDB('${collection.database}').getCollection('${collection.name}').createIndex(${JSON.stringify(index.key)}${optionsJSON});`;
    }).join("\n");
}).join("\n");
print(collectionStrings + "\n" + indexStrings);
git clone git@github.com:sarjarapu/migrate-mongo-cluster.git
cd migrate-mongo-cluster/migrator
mvn clean compile package
java -jar target/migrate-mongo-cluster-1.0-SNAPSHOT-jar-with-dependencies.jar -h
usage: migratecluster [-c <arg>] [-d] [-h] [-o <arg>] [-s <arg>] [-t <arg>]
-c,--config <arg> configuration file for migration
-d,--drop drop target collections before copying
-h,--help print this message
-o,--oplog <arg> oplog store connection string
-s,--source <arg> source cluster connection string
-t,--target <arg> target cluster connection string
-m,--mode <arg> migration mode. Supported modes: oplogOnly
java -jar target/migrate-mongo-cluster-1.0-SNAPSHOT-jar-with-dependencies.jar -c ../sample/sample-migration.conf
Below is the list of features that I thought of incorporating into the application.
- Get databases, collections and docs
- Save the documents onto target server
- Reactive Programming
- Buffered read / Bulk write
- Multithreading - Read full documents in a different thread
- Multithreading - Write full documents in a different thread
- Drop database / collection before inserting
- Oplog tail for each replicaSet
- Continuation from where we left off
- Error handling, duplicate key, etc
- Use connection string with all the members in replicaset
- Retry logic when the primary is down
- Read preference - secondary from source?
- While copying find the id and continue where you left off
- Apply the oplogs in bulk operations
- Make OplogWriter skip the blacklisted databases / collections
- Status Database to keep track of progress
- API to expose status of migrators from database
- Runtime injection of the log level
- Move the gapWatcher out of the oplogMigrator
- Use the readPreference on collection vs on client
- How do you track the multi shard -> mongos and last known id (lastknownid should be / rs)
- Target is behind by 446 seconds & 0000 operations; even after completing the transfer
Saving the oplog tail data helps in scenarios where the source oplog headroom is small compared to the time it takes to copy all the historical data. For now, I assume the oplog is big enough to hold multiple days of operations. If an oplog entry already exists, the application waits until the whole copy process is done and begins applying the tailed oplog only after the copy has completed.
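The ordering described above can be sketched as follows: tailed oplog entries are saved while the copy runs and are applied only after the copy is done. This is a simplified in-memory illustration that uses a `Map` in place of a real collection and handles only insert (`i`) and delete (`d`) entries. `OplogBuffer` and its methods are hypothetical names and do not reflect the application's classes.

```javascript
// Hypothetical in-memory buffer for tailed oplog entries (illustration only).
class OplogBuffer {
  constructor() {
    this.entries = [];     // oplog entries saved while the copy runs
    this.copyDone = false; // set once the full copy has completed
  }

  save(entry) {
    this.entries.push(entry);
  }

  markCopyDone() {
    this.copyDone = true;
  }

  // Apply the saved entries in order and return how many were applied.
  // Nothing is applied until the copy has been marked as done.
  applyTo(target) {
    if (!this.copyDone) {
      return 0;
    }
    let applied = 0;
    for (const entry of this.entries) {
      if (entry.op === 'i') {
        target.set(entry.o._id, entry.o);
        applied++;
      } else if (entry.op === 'd') {
        target.delete(entry.o._id);
        applied++;
      }
    }
    this.entries = [];
    return applied;
  }
}

// Sample data for illustration only: the target already holds one copied document.
const target = new Map([[1, { _id: 1, v: 'a' }]]);
const buffer = new OplogBuffer();
buffer.save({ op: 'i', ns: 'db.c', o: { _id: 2, v: 'b' } });
buffer.save({ op: 'd', ns: 'db.c', o: { _id: 1 } });

console.log(buffer.applyTo(target)); // 0, the copy is not done yet
buffer.markCopyDone();
console.log(buffer.applyTo(target)); // 2
console.log([...target.keys()]);     // [ 2 ]
```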