How to Fix Cassandra Consistency Issues using Read Repair


Why Consistency Issues or C in CAP theorem

As many of you probably knows Cassandra is a AP big data storage. In other words when a network partition happens Cassandra remains Available and relaxes the Consistency property. It is always said that it is eventually consistent or in other words it will be consistent at some point in time in future.

The important things to know which is not really obvious are:
  • Cluster does become inconsistent pretty often. Sure, there are many things influencing the stability of the cluster, such as proper configuration, dedicated resources, production load, professionalism of the ops guys etc, but the fact is the probability the nodes are going down from time to time and therefore the data become inconsistent are really high.
  • Cluster does NOT become consistent again automatically. This is something which goes against the god feeling towards the modern and mature distributed systems. Unless you have the enterprise version of Datastax and enable one the latest feature of DSE v6 you have to fix the inconsistency issues manually.

Ways to Fix Inconsistency

Fortunately there are ways to fix the inconsistency issues. There a couple of options here:
  • nodetool repair tool. This is probably the main and default method to use. Running the command on node which was down for all the tables or specific ones. One caveat though is: all the nodes should be UP while you are running the command.
  • read repair Cassandra feature. This is an important feature meaning, during the read requests the cluster organism is repairing itself, to be more precise it repairs the proper data replicas.  If the replicas involved in a read requests  are not consistent they are being aligned again

When and Why Read Repair

As it was stated above the default and main way to fix inconsistency is nodetool repair tool, so the natural question is when and why to use read repair method. Let me answer that using the experience from one of my project. 
At some point we entered a period when our Cassandra cluster started to become very unstable and it took significant amount of time until all the nodes returned to the UP state again. That lead to the 2 main consequences:
  • data became inconsistent
  • nodetool repair tool was not possible to use to fix inconsistencies during that period
Having these circumstances the best we could do was using the read repair feature to make sure at least at the majority of replicas, at the Quorum level, the data is consistent so that for all the reads which are using the Quorum consistency level the data is consistent and up to date

How to Use Read Repair

Read repair feature fixes the consistency only on the records which are involved in the reads, so how to repair the whole table? 
It is possible to use the following approach:
  • select one of narrowest column to read
  • read the whole table using the copy command to export the data to the file
So let's consider we have a Cassandra table "event" in a keyspace "test"  with one of the narrowest column called "id"; the copy command would look like this:

    cqlsh -e "consistency QUORUM; copy test.event(fid) to '/tmp/tid'"

Alternatively you could read whole records and send them to '/dev/null':

    cqlsh -e "consistency QUORUM; copy test.event to '/dev/null'"

And sure, when all the nodes are UP you could use consitency ALL, but in this case it's better to use the nodetool repair tool like this:

    nodetool repair test event

Conclusion

Cassandra is great big data storage but in order to leverage it to the full it requires good understanding of main principles how it works and as any beautiful thing it requires some care :) 



Comments