Instaclustr: 7 easy steps to Cassandra cluster migration

This is a guest post for the Computer Weekly Open Source Insider blog written by Ben Slater in his capacity as chief product officer at Instaclustr.

Instaclustr positions itself as a firm offering managed and supported solutions for Apache Cassandra, ScyllaDB, Elasticsearch, Apache Spark, Apache Zeppelin, Kibana and Apache Lucene.

Indeed, Instaclustr is known for its willingness to describe itself as a managed open source as a service company, if that expression actually exists.

The original title in full for this piece was: Migrating Your Cassandra Cluster – with Zero Downtime – in 7 Easy Steps.

Slater’s piece is (obviously) aimed at companies that are looking to move a live Apache Cassandra deployment to a new location.

With this task in mind, it is (obviously) natural that these same companies will have some concerns, such as how you can keep Cassandra clusters 100% available throughout the process.

Slater argues that if your application is able to remain online throughout connection setting changes, it can also remain fully available during this transition.

NOTE: For extra protection and peace of mind, the following technique also includes a rapid rollback strategy to return to your original configuration, up until the moment the migration is completed.

Slater writes as follows:

Here’s a recommended 7-step Cassandra cluster migration order-of-operations that will avoid any downtime:

1) Get your existing environment ready

First of all, make sure that your application is using a datacenter-aware load balancing policy, as well as LOCAL_* consistency levels (for example LOCAL_QUORUM or LOCAL_ONE), so that requests are only coordinated and acknowledged within the local datacenter. Also, check that all of the keyspaces that will be copied over to the new cluster are set to use NetworkTopologyStrategy as their replication strategy. It’s also recommended that all keyspaces use this replication strategy when created, as altering this later can become complicated.
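As a rough illustration of the keyspace check, the sketch below flags keyspaces that are not yet on NetworkTopologyStrategy. It assumes you have already read each keyspace's replication map (for instance from the system_schema.keyspaces table in Cassandra 3.0 and later) into a plain dictionary; the function name and the sample keyspaces are hypothetical.

```python
NTS_SUFFIX = "NetworkTopologyStrategy"


def keyspaces_needing_change(keyspaces):
    """Return the names of keyspaces that do not use NetworkTopologyStrategy.

    `keyspaces` maps a keyspace name to its replication map. The map is
    assumed to carry the strategy under the "class" key, either as a short
    name or as a fully qualified Java class name.
    """
    return sorted(
        name
        for name, replication in keyspaces.items()
        if not replication.get("class", "").endswith(NTS_SUFFIX)
    )


# Hypothetical sample data, shaped like rows from system_schema.keyspaces.
sample = {
    "orders": {
        "class": "org.apache.cassandra.locator.NetworkTopologyStrategy",
        "DC_OLD": "3",
    },
    "sessions": {
        "class": "org.apache.cassandra.locator.SimpleStrategy",
        "replication_factor": "3",
    },
}

print(keyspaces_needing_change(sample))  # ['sessions']
```

Only pass in the keyspaces you intend to migrate; Cassandra's own system keyspaces use other strategies and would otherwise show up in the result.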

2) Create the new cluster

Now it’s time to create the new cluster that you’ll be migrating to. A few things to be careful about here: be sure that the new cluster and the original cluster use the same Cassandra version and cluster name. Also, the new datacenter name that you use must be different from the name of the existing datacenter.
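The three conditions above are easy to get wrong by hand, so a small pre-flight check can help. This is only a sketch: the ClusterInfo type, its field names and the sample values are assumptions made for illustration, and you would fill them in from your own tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterInfo:
    cluster_name: str
    cassandra_version: str
    datacenter: str


def precheck(original, new):
    """Return a list of problems that would prevent joining the two clusters."""
    problems = []
    if original.cluster_name != new.cluster_name:
        problems.append("cluster names must match")
    if original.cassandra_version != new.cassandra_version:
        problems.append("Cassandra versions must match")
    if original.datacenter == new.datacenter:
        problems.append("datacenter names must differ")
    return problems


# Hypothetical values for illustration only.
old = ClusterInfo("prod", "3.11.4", "DC_OLD")
good = ClusterInfo("prod", "3.11.4", "DC_NEW")
bad = ClusterInfo("prod-v2", "3.11.4", "DC_OLD")

print(precheck(old, good))  # []
print(precheck(old, bad))   # ['cluster names must match', 'datacenter names must differ']
```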

3) Join the clusters together

To do this, first make any necessary firewall rule changes in order to allow the clusters to be joined, remembering that some changes to the source cluster may also be necessary. Then, change the seed node configuration of the new cluster so that its nodes can discover the original cluster, and start the new nodes. Once this is done, the new cluster will be a second datacenter in the original cluster.

4) Change the replication settings 

Next, in the existing cluster, update the replication settings for the keyspaces that will be copied, so that data will now be replicated with the new datacenter as the destination.
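In CQL terms, this step comes down to one ALTER KEYSPACE statement per keyspace that lists both datacenters. The helper below is a minimal sketch that only builds the statement text; the keyspace name, the datacenter names DC_OLD and DC_NEW, and the replication factors are placeholders, not values from the original article.

```python
def alter_keyspace_cql(keyspace, dc_replication):
    """Build an ALTER KEYSPACE statement for NetworkTopologyStrategy.

    `dc_replication` maps each datacenter name to its replication factor.
    """
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in dc_replication.items())
    return (
        f"ALTER KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )


# Keep the existing datacenter and add the new one as a replication target.
statement = alter_keyspace_cql("orders", {"DC_OLD": 3, "DC_NEW": 3})
print(statement)
# ALTER KEYSPACE orders WITH replication = {'class': 'NetworkTopologyStrategy', 'DC_OLD': 3, 'DC_NEW': 3};
```

The generated statement can then be run through cqlsh or your driver of choice against the existing cluster.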

5) Copy the data to the new cluster

When the clusters are joined together, Cassandra will begin to replicate writes to the new cluster. It’s still necessary, however, to copy any existing data over with the nodetool rebuild function. It’s a best practice to perform this function on the new cluster one or two nodes at a time, so as not to place an overwhelming streaming load on the existing cluster.
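One way to keep the streaming load under control is to plan the rebuilds in small batches up front. The sketch below groups the new nodes two at a time and pairs each with the nodetool rebuild command to run on it, naming the original datacenter as the streaming source. The host addresses and the datacenter name are hypothetical, and the code only produces a plan; it does not execute anything.

```python
def rebuild_plan(new_nodes, source_dc, batch_size=2):
    """Group new-cluster nodes into batches of (host, command) pairs.

    Each command is meant to be run on the named host, and each batch
    should finish before the next one is started.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    command = f"nodetool rebuild -- {source_dc}"
    return [
        [(host, command) for host in new_nodes[start:start + batch_size]]
        for start in range(0, len(new_nodes), batch_size)
    ]


# Hypothetical node addresses in the new datacenter.
plan = rebuild_plan(["10.0.1.1", "10.0.1.2", "10.0.1.3"], "DC_OLD")
for number, batch in enumerate(plan, start=1):
    for host, cmd in batch:
        print(f"batch {number}: on {host} run: {cmd}")
```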

6) Change over the application’s connection points

After all uses of the rebuild function are completed, each of the clusters will contain a complete copy of the data being migrated, which Cassandra will keep in sync automatically. It’s now time to change the initial connection points of your application over to the nodes in the new cluster. Once this is completed, all reads and writes will be served by the new cluster, and will subsequently be replicated to the original cluster. Finally, it’s smart to run a repair function across the cluster, in order to ensure that all data has been replicated successfully from the original.

7) Shut down the original cluster

Complete the process with a little post-migration clean up, removing the original cluster. First, change the firewall rules to disconnect the original cluster from the new one. Then, update the replication settings in the new cluster to cease replication of data to the original cluster. Lastly, shut the original cluster down.
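For the replication change in this final step, the idea is to take each keyspace's current replication map and drop the original datacenter from it. The sketch below assumes the map has the shape stored in system_schema.keyspaces (a "class" entry plus one entry per datacenter); the function names, keyspace and datacenter names are illustrative only.

```python
def replication_without(replication, retired_dc):
    """Return the per-datacenter replication factors minus `retired_dc`."""
    remaining = {
        dc: int(rf)
        for dc, rf in replication.items()
        if dc not in ("class", retired_dc)
    }
    if not remaining:
        # Guard against leaving a keyspace with no replicas at all.
        raise ValueError("refusing to remove the only remaining datacenter")
    return remaining


def retire_datacenter_cql(keyspace, replication, retired_dc):
    """Build the ALTER KEYSPACE statement that stops replicating to `retired_dc`."""
    remaining = replication_without(replication, retired_dc)
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in remaining.items())
    return (
        f"ALTER KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )


current = {
    "class": "org.apache.cassandra.locator.NetworkTopologyStrategy",
    "DC_OLD": "3",
    "DC_NEW": "3",
}
print(retire_datacenter_cql("orders", current, "DC_OLD"))
# ALTER KEYSPACE orders WITH replication = {'class': 'NetworkTopologyStrategy', 'DC_NEW': 3};
```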

There you have it: your Apache Cassandra deployment has been fully migrated, with zero downtime, low risk and in a manner completely seamless and transparent from the perspective of your end users.

You can follow Instaclustr on Twitter.
