Cassandra TWCS must have TTLs

The Last Pickle did a great blog post on TWCS a little while ago, explaining how Time Window Compaction is great for certain time series data.

To recap, TWCS is suitable for data that follows these rules:

  • Data should only be inserted and not updated afterwards
  • Data must have a TTL attached
  • Data shouldn’t be explicitly deleted; it should only expire via the TTL

These are only recommendations. Cassandra will allow them to be broken, but if you do break them, there will be disk usage problems in the future. In this post I will explain the problems that occur if the data does not have a TTL.

[continue reading…]

This post accompanies a recent presentation at the DataStax Accelerate conference and describes the process used to move clusters from Rackspace to Google Cloud.

In part 1 we looked at setting up the new datacenter and migrating the data to new nodes. In part 2 we looked at decommissioning the original datacenter.

In this final post, we look at some of the things that can go wrong and how to mitigate and recover from them.

[continue reading…]

This post accompanies a recent presentation at the DataStax Accelerate conference and describes the process used to move clusters from Rackspace to Google Cloud.

Our requirement was to move multiple clusters, one at a time, from the UK to GCP in Belgium without any downtime.

Within this post the original datacenter is called RS_UK and the new datacenter is called GL_EU.

[continue reading…]

Finding Rogue Cassandra Queries

Recently I needed to track down which queries were being run against a cluster. We needed queries to run with a consistency of LOCAL_QUORUM, but it appeared that some were being run with QUORUM instead, and I needed to prove this to the development team.

I recently wrote about using ngrep to discover connections to the cluster, so this post builds on that to show how to use ngrep and Wireshark to capture queries hitting a specific node.

[continue reading…]

Who is Connecting to a Cassandra Cluster?

Recently we have had some issues with multi-DC Cassandra clusters, mainly around latency and timeouts. These had only started after we created the second datacenter.

The development team had been tasked with ensuring they were using a DC-aware load balancer and a local consistency level. Together these should mean they only talk to the local DC, rather than the second DC, which is hosted with a different cloud provider in a different country.

Using OpsCenter I could see that some reads were going directly to the new datacenter, but the development team could not find where they were coming from.

So it was time to use ngrep, which describes itself as “like GNU grep applied to the network layer”. It is perfect for seeing which IPs are connecting to Cassandra on port 9042.

Firstly, you may need to install ngrep on your nodes:

apt install ngrep

Then it is ready to run:

ngrep '' -d any -x dst port 9042 and dst host xxx.xxx.xxx.xxx

The '' (an empty pattern) means grep for anything. If you want to look for something specific in a network packet, the pattern can be put here, but for what we are doing we don’t want to limit the packets by any character string
-d any ensures that all devices are checked
-x dumps the packet contents as hexadecimal as well as ASCII
dst port 9042 limits the packets displayed to the ones with a destination port of 9042
dst host xxx.xxx.xxx.xxx limits the packets to the ones arriving at this node, where xxx.xxx.xxx.xxx is the IP address of the node you are running this on

This will now start displaying all the incoming connections to this node on port 9042, which is the CQL native port. It is not the port the nodes use to talk to each other.

This will give you messages like this:

T zzz.zzz.zzz.zzz:42768 -> xxx.xxx.xxx.xxx:9042 [AP]

In this instance zzz.zzz.zzz.zzz is the node sending the CQL request. You will also see the CQL being sent in the message, so you should now have a good idea where the requests are coming from.
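On a busy node the output scrolls past quickly, so it can help to reduce it to a list of client IPs. The following is only a sketch: count_sources is a hypothetical helper, not part of ngrep, and it assumes the header lines look like the example above (a leading T, then source, an arrow, then destination).

```shell
# Hypothetical helper (not part of ngrep): tally source IPs from ngrep
# header lines such as "T 10.0.0.5:42768 -> 10.0.0.1:9042 [AP]" read on stdin.
count_sources() {
  awk '/^T [^ ]+ -> / {
         src = $2
         sub(/:[0-9]+$/, "", src)   # strip the trailing :port
         seen[src]++
       }
       END { for (ip in seen) print seen[ip], ip }' | sort -rn
}
```

If the ngrep output has been redirected to a file, say capture.txt (an assumed name), then count_sources < capture.txt prints one line per client, with the number of packets seen first and the busiest client at the top.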

There is one complication if you are using DataStax OpsCenter: the DataStax agent uses CQL to talk to Cassandra on the local node, and OpsCenter itself also connects to the cluster. To remove these packets you should use the following command:

ngrep '' -d any -x dst port 9042 and dst host xxx.xxx.xxx.xxx and not src host xxx.xxx.xxx.xxx and not src host yyy.yyy.yyy.yyy

where yyy.yyy.yyy.yyy is the IP address of OpsCenter.
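If there are several hosts to exclude, the filter gets long to type by hand. As a rough sketch, a small shell function can build it. cql_filter is a hypothetical name of my own, and it assumes the CQL port is the default 9042.

```shell
# Hypothetical helper: print a filter for CQL traffic arriving at one node,
# excluding the node itself and any extra source IPs (e.g. OpsCenter).
# Usage: cql_filter NODE_IP [EXCLUDED_IP ...]
cql_filter() (
  node_ip=$1
  shift
  filter="dst port 9042 and dst host $node_ip and not src host $node_ip"
  for ip in "$@"; do
    filter="$filter and not src host $ip"
  done
  printf '%s\n' "$filter"
)
```

It could then be used as ngrep '' -d any -x $(cql_filter xxx.xxx.xxx.xxx yyy.yyy.yyy.yyy), leaving the substitution unquoted so the shell splits the filter back into separate words.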