Cassandra Diagnostics

Monitoring and audit power kit for Apache Cassandra.


Introduction

Cassandra Diagnostics is an extension for Apache Cassandra server nodes, implemented as a Java agent. It uses bytecode instrumentation to augment a Cassandra node with additional functionality. The following image depicts the position of Cassandra Diagnostics in an Apache Cassandra based system.

Placement diagram

Cassandra Diagnostics has a modular architecture. On one side it has connectors for different versions of Apache Cassandra nodes or the Cassandra Java Driver, and on the other it has various reporters that send measurements to different collecting/monitoring tools. In between lies the core with a set of metrics processing modules. Reusable code goes into commons.

Architecture diagram

Cassandra Diagnostics Commons

Cassandra Diagnostics Commons holds the interfaces for the core, connectors and reporters, and provides the signatures all modules need to conform to in order to work together.

Cassandra Connector

Connector is a module which hooks into the query path and extracts information for diagnostics. Bytecode instrumentation is used to augment existing Cassandra code with additional functionality. It uses low priority threads to extract the diagnostics information with minimal performance impact on the target code (Cassandra node or application/driver).

Cassandra Diagnostics currently provides the following connector implementations:

Cassandra Core

Cassandra Diagnostics Core is the glue between connectors and reporters. It holds all the modules for diagnostics, contains the business logic for measurements, and decides what will be measured and what will be skipped. Its job is to load the provided configuration or to set up sensible defaults.

Modules

There are default module implementations which serve as core features. Modules use configured reporters to report their activity.

Please read the core modules README for more information and configuration options for the modules.

Core module implementations:

Heartbeat Module

Heartbeat Module produces messages to provide feedback that the diagnostics agent is loaded and working. Typical usage is with Log Reporter, where it produces an INFO message at configured intervals. Default reporting interval is 15 minutes.

Slow Query Module

Slow Query Module monitors the execution time of each query and, if it is above the configured threshold, reports the value and query type using the configured reporters. Default query execution time threshold is 25 milliseconds.

Request Rate Module

Request Rate Module uses the Codahale Metrics library to measure the rate of executed queries. Rates are reported for configurable statement types and consistency levels using the configured reporters at configured periods. Default reporting interval is 1 second.

Metrics Module

Metrics Module collects Cassandra's metrics, which are exposed over JMX, and ships them using predefined reporters. The configuration of metrics package names is the same as the one the default metrics config reporter uses. Default reporting interval is 1 second.

Status Module

Status Module is used to report Cassandra information exposed over JMX. It reports compaction information as a single measurement. Default reporting interval is 1 minute.

Cluster Health Module

Cluster Health Module is used to report the health status of the nodes such as which nodes are marked as DOWN by gossiper. It uses the information exposed over JMX. Default reporting interval is 10 seconds.

Hiccup Module

Hiccup Module is based on jHiccup and logs and reports platform hiccups, including JVM stalls. Default reporting period is 5 seconds, and the reported values include percentiles from 90 to 100 as well as mean and max values.

Reporters

Reporters take measurements from the core and wrap them in an implementation-specific format so they can be sent to the reporter's target (e.g. the Influx reporter transforms a measurement into an Influx query and stores it in InfluxDB).

Reporter implementations:

Log Reporter

LogReporter uses the Cassandra logging system to report measurements (it is the default reporter and part of core). Reports are logged at the INFO log level using the following pattern:

Measurement {} [time={}, value={}, tags={}, fields={}]

The time value is given in milliseconds. tags specify the measurement more precisely and provide additional searchable labels, while fields is a placeholder for additional data connected to the measurement. Take the Slow Query measurement as an example: value is the execution time of the query, tags can hold the statement type (UPDATE or SELECT) so you can easily differentiate and search, and fields can hold the actual statement, which is not something you want to search against but is valuable metadata for the measurement.
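As a rough sketch of how such log lines might be consumed, the shell snippet below pulls the measurement name and value out of lines that follow the pattern above. The sample line (including its tag and field names) and the extract_measurements helper are illustrative assumptions, not output captured from a real node.

```shell
# Hypothetical sample line that follows the documented pattern;
# real tag and field names may differ.
sample='INFO  Measurement queryReport [time=1475233823000, value=35.0, tags={statementType=SELECT}, fields={statement=SELECT * FROM ks.tbl}]'

# Print "<measurement name> <value>" for every input line matching the pattern.
extract_measurements() {
  sed -n 's/.*Measurement \([^ ]*\) \[time=[0-9]*, value=\([^,]*\),.*/\1 \2/p'
}

printf '%s\n' "$sample" | extract_measurements   # prints: queryReport 35.0
```

Against a running node, the same helper could be fed from the log file, for example extract_measurements < logs/system.log.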

Riemann Reporter

RiemannReporter sends measurements to a Riemann server.

Influx Reporter

InfluxReporter sends measurements to an Influx database.

Telegraf Reporter

Telegraf Reporter sends measurements to a Telegraf agent.

Datadog Reporter

Datadog Reporter sends measurements to a Datadog Agent using UDP.

Kafka Reporter

Kafka Reporter sends measurements to Kafka.

Prometheus Reporter

Prometheus Reporter exposes measurements to be scraped by a Prometheus server.

Configuration

Cassandra Diagnostics uses an external configuration file in YAML format. You can see the default configuration in cassandra-diagnostics-default.yml. The default name of the config file is cassandra-diagnostics.yml and it is expected to be found on the classpath. This can be changed using the cassandra.diagnostics.config property. For example, the configuration can be set explicitly by changing cassandra-env.sh and adding the following line:

JVM_OPTS="$JVM_OPTS -Dcassandra.diagnostics.config=some-other-cassandra-diagnostics-configuration.yml"

The following is an example of the configuration file:

global:
  systemName: "smartcat-cassandra-cluster"

reporters:
  - reporter: io.smartcat.cassandra.diagnostics.reporter.LogReporter

modules:
  - module: io.smartcat.cassandra.diagnostics.module.slowquery.SlowQueryModule
    measurement: queryReport
    options:
      slowQueryThresholdInMilliseconds: 1
    reporters:
      - io.smartcat.cassandra.diagnostics.reporter.LogReporter

A specific reporter may require additional configuration options. Those options are specified using the options property. The following example shows the configuration options for RiemannReporter and how to configure specific modules to use this reporter:

global:
  systemName: "smartcat-cassandra-cluster"

# Reporters
reporters:
  - reporter: io.smartcat.cassandra.diagnostics.reporter.LogReporter
  - reporter: io.smartcat.cassandra.diagnostics.reporter.RiemannReporter
    options:
      riemannHost: 127.0.0.1
      riemannPort: 5555 #Optional
      batchEventSize: 50 #Optional

# Modules
modules:
  - module: io.smartcat.cassandra.diagnostics.module.requestrate.RequestRateModule
    measurement: requestRate
    options:
      period: 1
      timeunit: SECONDS
    reporters:
      - io.smartcat.cassandra.diagnostics.reporter.LogReporter
      - io.smartcat.cassandra.diagnostics.reporter.RiemannReporter

By default, all measurements are reported with the hostname obtained through the Java InetAddress class. If required, the hostname can be set using the hostname variable in the configuration file:

global:
  systemName: "smartcat-cassandra-cluster"
  hostname: "test-hostname"

reporters:
etc...

It is important to name the system under observation because measurements can be collected from various systems. The hostname alone is not enough: it is easy to imagine one host running both a Cassandra node and a Kafka node, each emitting measurements that we want to group by system. By default "cassandra-cluster" is used, but it is advisable to override it so that measurements are grouped uniquely:

global:
  systemName: "cassandra-cluster"
  hostname: "test-hostname"

Information provider

Because it is deployed on the node itself, the diagnostics connector provides a connection to the node over JMX by wrapping Cassandra's NodeProbe class, which provides access to all actions and metrics exposed over JMX. This is configured in the connector section, which sits at the root of the diagnostics configuration.

connector:
  jmxHost: 127.0.0.1
  jmxPort: 7199
  jmxAuthEnabled: false #Optional
  jmxUsername: username #Optional
  jmxPassword: password #Optional

The Status module uses the information provided by the connector to collect its data.
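Before enabling modules that depend on the connector, it can help to confirm that the JMX endpoint is reachable with the same host, port and credentials. The sketch below uses nodetool, which ships with Cassandra and also talks to the node over JMX. The jmx_check helper and the JMX_* environment variables are hypothetical names introduced for this example; they are not part of Cassandra Diagnostics.

```shell
# Values mirror the connector section above; change them to match your configuration.
JMX_HOST="${JMX_HOST:-127.0.0.1}"
JMX_PORT="${JMX_PORT:-7199}"

# Run a nodetool command against the JMX endpoint the connector is configured to use.
# Set JMX_USERNAME / JMX_PASSWORD only when jmxAuthEnabled is true.
jmx_check() {
  if [ -n "${JMX_USERNAME:-}" ]; then
    nodetool -h "$JMX_HOST" -p "$JMX_PORT" -u "$JMX_USERNAME" -pw "${JMX_PASSWORD:-}" "$@"
  else
    nodetool -h "$JMX_HOST" -p "$JMX_PORT" "$@"
  fi
}
```

For example, jmx_check compactionstats should print compaction information if the endpoint and credentials are correct.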

Control and Configuration API

Cassandra Diagnostics exposes a control and configuration API. This API currently offers the following operations:

- getVersion - returns the actual Cassandra Diagnostics version.
- reload - reloads the configuration

This API is exposed over JMX and HTTP protocols.

The Diagnostics API JMX MXBean can be found under the following object name:

package io.smartcat.cassandra.diagnostics.api:type=DiagnosticsApi

The HTTP API is controlled using the following options in the global section in the configuration file:

global:
  # controls if HTTP API is turned on. 'true' by default.
  httpApiEnabled: true
  # specifies the host/address part for listening TCP socket. '127.0.0.1' by default.
  httpApiHost: 127.0.0.1
  # specifies the port number for the listening TCP socket. '8998' by default.
  httpApiPort: 8998
  # if API authorization is enabled, API key must be provided through the 'Authorization' header
  httpApiAuthEnabled: false
  # API access key
  httpApiKey: "diagnostics-api-key"

It implements the following endpoints for mapping HTTP requests to API operations:

  • GET /version for getVersion
  • POST /reload for reload
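The two endpoints can be exercised with curl, as sketched below. The diag_api helper and the DIAG_API / DIAG_API_KEY variables are hypothetical names for this example. Sending the bare key as the value of the Authorization header is an assumption based on the configuration comment above; check the project documentation if the request is rejected.

```shell
# Defaults taken from the global section above; adjust to match your configuration.
DIAG_API="${DIAG_API:-http://127.0.0.1:8998}"
DIAG_API_KEY="${DIAG_API_KEY:-}"

# Usage: diag_api <HTTP method> <path>
# Sends the key in the Authorization header only when DIAG_API_KEY is set
# (assumption: the header value is the bare key).
diag_api() {
  if [ -n "$DIAG_API_KEY" ]; then
    curl -sS -X "$1" -H "Authorization: $DIAG_API_KEY" "$DIAG_API$2"
  else
    curl -sS -X "$1" "$DIAG_API$2"
  fi
}
```

With a node running, diag_api GET /version maps to getVersion and diag_api POST /reload maps to reload.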

Installation

A script for automated installation is also available.

Cassandra Diagnostics consists of the following three components:

  • Cassandra Diagnostics Core
  • Cassandra Diagnostics Connector
  • Cassandra Diagnostics Reporter

Each of these components is packaged into its own JAR file (accompanied by the necessary dependencies). These JAR files are available for download on Maven Central and need to be present on the classpath.

Note that the Cassandra Diagnostics Connector must match the Cassandra version in use. For example, cassandra-diagnostics-connector21 should be used with Cassandra 2.1.

Also note that more than one Cassandra Diagnostics Reporter can be used at the same time. That means that all respective JAR files have to be put on the classpath. The only exception to this rule is LogReporter, which is built into Cassandra Diagnostics Core, so no Reporter JAR has to be added explicitly.

Place cassandra-diagnostics-core-VERSION.jar, cassandra-diagnostics-connector21-VERSION.jar and the required Reporter JARs (e.g. cassandra-diagnostics-reporter-influx-VERSION-all.jar) into Cassandra's lib directory.

Create and place the configuration file cassandra-diagnostics.yml into Cassandra's conf directory. Add the following line at the end of conf/cassandra-env.sh:

JVM_OPTS="$JVM_OPTS -javaagent:$CASSANDRA_HOME/lib/cassandra-diagnostics-core-VERSION.jar -Dcassandra.diagnostics.config=cassandra-diagnostics.yml"
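The manual steps above can be sketched as a small shell function. It is only an illustration, not the project's installation script: the install_diagnostics name and its arguments are hypothetical, it assumes a Cassandra 2.1 node (hence connector21), and it assumes the JARs and the configuration file have already been downloaded into one directory. Reporter JARs are left out and would be copied into lib the same way.

```shell
# Usage: install_diagnostics <cassandra home> <diagnostics version> <dir with JARs and config>
install_diagnostics() {
  home="$1"; version="$2"; src="$3"

  # Core and connector JARs go into Cassandra's lib directory.
  cp "$src/cassandra-diagnostics-core-$version.jar" \
     "$src/cassandra-diagnostics-connector21-$version.jar" \
     "$home/lib/" || return 1

  # The configuration file goes into Cassandra's conf directory.
  cp "$src/cassandra-diagnostics.yml" "$home/conf/" || return 1

  # Single quotes keep $JVM_OPTS and $CASSANDRA_HOME literal, so they are
  # expanded by cassandra-env.sh at startup rather than by this function.
  printf 'JVM_OPTS="$JVM_OPTS -javaagent:$CASSANDRA_HOME/lib/cassandra-diagnostics-core-%s.jar -Dcassandra.diagnostics.config=cassandra-diagnostics.yml"\n' \
    "$version" >> "$home/conf/cassandra-env.sh"
}
```

A call would look like install_diagnostics "$CASSANDRA_HOME" VERSION /path/to/downloads, with VERSION replaced by the release you downloaded.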

Usage

Upon Cassandra node start, the Diagnostics agent kicks in and instruments the necessary target classes to inject diagnostics additions. LogReporter reports slow queries in logs/system.log at INFO level. The dynamic configuration can be inspected/changed using jconsole by connecting to org.apache.cassandra.service.CassandraDaemon.

Build and deploy

Build and deploy process is described here.

License and development

Cassandra Diagnostics is licensed under the liberal and business-friendly Apache License, Version 2.0 and is freely available on GitHub. Cassandra Diagnostics is also released to Maven Central and JCenter. The project is built using Maven. From your shell, cloning and building the project would go something like this:

git clone https://github.com/smartcat-labs/cassandra-diagnostics.git
cd cassandra-diagnostics
mvn package