
Centralized Logging System for Docker Containers

I. General Structure

So what are we trying to build?

  • We want to design a system that can automatically store and handle the logs from multiple services.
            Mechanism        Mechanism            Mechanism
                A                B                    C
                |                |                    |
> Service 1 --------  Zookeeper  |                    |
                    \    &       |                    |
> Service 2 ---------> Kafka --------> ClickHouse -------> Superset
                    /(Rsyslog)
> Service n --------

Notes:

  • Each service represents a Docker Container.
  • Zookeeper is a MUST for Kafka. Why? Because Kafka relies on Zookeeper to coordinate its brokers and store cluster metadata.
  • To save space, we can try to run Zookeeper, Kafka, ClickHouse, Superset in a single container.

II. Framework / Services Introduction

Let's have a brief introduction to some of the frameworks we might be using, and why they could help.

  • Kafka:
    • Kafka is open-source software that provides a framework for storing, reading and analysing streaming data. Simply speaking, Kafka works like a messaging queue, but in a distributed fashion.
    • We can use Kafka to distribute log messages from multiple services. Kafka has 3 main components: producers, consumers, and brokers (nodes). Kafka messages are grouped under a "topic" (see the quick sketch after this list). Read more on the Kafka website.
    • Multiple Kafka nodes should be used in deployment, and Zookeeper is mandatory to help manage Kafka nodes. Read more about Zookeeper here: https://zookeeper.apache.org/
  • ClickHouse
    • ClickHouse is a fast open-source OLAP database management system.
    • We can benefit from the ClickHouse Kafka Engine (which connects ClickHouse to Kafka) and from ClickHouse's ability to connect to Superset.
  • Superset
    • Able to connect to ClickHouse and visualize data.
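
To make the producer / consumer / topic idea concrete, here is a minimal sketch using the console tools that ship with Kafka. It is only an illustration: the broker address (localhost:9092), the Zookeeper address (localhost:2181) and the topic name (demo) are assumptions, not part of this project.

# Create a throwaway topic (Zookeeper assumed at localhost:2181)
$ kafka-topics --zookeeper localhost:2181 --create --topic demo --partitions 1 --replication-factor 1
# Producer: every line typed here becomes one message on "demo"
$ kafka-console-producer --broker-list localhost:9092 --topic demo
# Consumer (in a separate terminal): prints every message published to "demo"
$ kafka-console-consumer --bootstrap-server localhost:9092 --topic demo --from-beginning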

III. Mechanism Discussion

1. Mechanism A: Transmits all logging messages from multiple services (Docker Containers) to Kafka.

  • First, we need to figure out how Docker handles its logging -> the Docker Logging Driver.
  • I propose 3 possible solutions:
    • Moby Kafka Logdriver: a Docker plugin that works as a log driver, transmitting all log messages to Kafka.
    • Rsyslog: transmit all log messages to an rsyslog server, and let rsyslog decide where the messages go (see the sketch after this list). Rsyslog Project For Kafka
    • Other ways (besides Moby) to configure a custom logging mechanism for all services, plus some way to produce the log messages to Kafka.
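
As a taste of the rsyslog route (the demo in Section IV uses the Moby Kafka Logdriver instead), Docker's built-in syslog log driver can point a container at an rsyslog server, and rsyslog's omkafka output module would then forward the messages on to Kafka. This is a hedged sketch only: the rsyslog address, the tag and your_image are placeholders, and the rsyslog/omkafka configuration itself is not shown.

# Ship a container's logs to an rsyslog server over TCP
$ sudo docker run --detach \
    --log-driver syslog \
    --log-opt syslog-address=tcp://192.168.1.40:514 \
    --log-opt tag="service_1" \
    your_image
# On the rsyslog side, the omkafka output module would publish the received messages to a Kafka topic.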

2. Mechanism B: Configure ClickHouse to automatically receive data from Kafka.

Check out this really helpful tutorial: Link

Structure

  • Note: We can configure each ClickHouse table to receive messages from Kafka topics (such tables are basically Kafka consumers). A minimal sketch of the pattern follows; the full DDL used by this demo is in Section IV.
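
Concretely, the ClickHouse side of Mechanism B always consists of three objects: a table with ENGINE = Kafka that subscribes to a topic (the consumer), a normal MergeTree table that stores the rows, and a materialized view that copies data from the former into the latter. Below is a minimal generic sketch with placeholder table names, broker address, topic and format; the real DDL used by this demo is in Section IV.

$ sudo docker exec -it clickhouse clickhouse-client --multiquery --query "
  CREATE TABLE logs_queue (level String, message String)
  ENGINE = Kafka
  SETTINGS kafka_broker_list = 'localhost:9092',
           kafka_topic_list = 'some_topic',
           kafka_group_name = 'clickhouse_consumer_1',
           kafka_format = 'JSONEachRow';
  CREATE TABLE logs (level String, message String)
  ENGINE = MergeTree ORDER BY tuple();
  CREATE MATERIALIZED VIEW logs_mv TO logs AS
  SELECT level, message FROM logs_queue;
"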

3. Mechanism C: Configure Superset to visualize ClickHouse data

  • Set up the Superset user interface, use it to connect to the ClickHouse URI (see the note below about the ClickHouse driver) -> then it's all done!
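
One practical note: depending on the Superset image, the SQLAlchemy driver that understands clickhouse:// URIs may not be preinstalled. A possible fix, assuming the package clickhouse-sqlalchemy and the container name superset used by this demo:

# Install the ClickHouse SQLAlchemy dialect inside the Superset container, then restart it
$ sudo docker exec -it superset pip install clickhouse-sqlalchemy
$ sudo docker restart superset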

IV. Full Solution Demo

We will put the exploration above to the test with a full demo that handles the logs of 3 separate services.

Note: You will need to edit the IP address in this file and inside docker-compose.yml to match your machine's.

Let's follow the design below

              Kafka             Kafka
            Log Driver      Table Engine
                |                |
> Service 1 --------  Zookeeper  |
                    \    &       |
> Service 2 ---------> Kafka --------> ClickHouse -------> Superset
                    /(Rsyslog)
> Service n --------

Start the demo by running our base services

  • Run Kafka & Zookeeper, ClickHouse, Superset: Remember to edit docker-compose.yml to fit your machine's settings.
$ sudo docker-compose up -d
  • Initialize a variable with your machine's IP address:
$ IP_ADDRESS={YOUR_MACHINE_IP_ADDRESS}
  • Create 3 topics:
$ sudo docker exec -it kafka-broker kafka-topics \
--zookeeper $IP_ADDRESS:2181 \
--create \
--topic service_1 \
--partitions 6 \
--replication-factor 1
$ sudo docker exec -it kafka-broker kafka-topics \
--zookeeper $IP_ADDRESS:2181 \
--create \
--topic service_2 \
--partitions 6 \
--replication-factor 1
$ sudo docker exec -it kafka-broker kafka-topics \
--zookeeper $IP_ADDRESS:2181 \
--create \
--topic service_3 \
--partitions 6 \
--replication-factor 1
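  • The three nearly identical commands above can also be collapsed into a single loop, purely as a convenience (same flags as before):
$ for topic in service_1 service_2 service_3; do
    sudo docker exec -it kafka-broker kafka-topics \
      --zookeeper $IP_ADDRESS:2181 \
      --create --topic $topic \
      --partitions 6 --replication-factor 1
  done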
  • Check whether the topics were created:
$ sudo docker exec -it kafka-broker kafka-topics \
--zookeeper $IP_ADDRESS:2181 \
--describe

1. Start 3 services that continuously generate logs and send them to Kafka using the Kafka Log Driver

First, we need to install and configure the Kafka Log Driver; we will need 3 separate plugins for the 3 services.

$ sudo docker plugin install --alias service_1_logdriver --disable mickyg/kafka-logdriver:latest
$ sudo docker plugin set service_1_logdriver KAFKA_BROKER_ADDR="192.168.1.40:9091"
$ sudo docker plugin set service_1_logdriver LOG_TOPIC=service_1
$ sudo docker plugin enable service_1_logdriver:latest
$ sudo docker plugin install --alias service_2_logdriver --disable mickyg/kafka-logdriver:latest
$ sudo docker plugin set service_2_logdriver KAFKA_BROKER_ADDR="192.168.1.40:9091"
$ sudo docker plugin set service_2_logdriver LOG_TOPIC=service_2
$ sudo docker plugin enable service_2_logdriver
$ sudo docker plugin install --alias service_3_logdriver --disable mickyg/kafka-logdriver:latest
$ sudo docker plugin set service_3_logdriver KAFKA_BROKER_ADDR="192.168.1.40:9091"
$ sudo docker plugin set service_3_logdriver LOG_TOPIC=service_3
$ sudo docker plugin enable service_3_logdriver
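
Before starting the services, you can confirm that all three plugins are installed and enabled:

# Each *_logdriver plugin should be listed with ENABLED = true
$ sudo docker plugin ls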

Build our random logger images:

$ sudo docker build -t random-logger_1 ./random-logger
$ sudo docker build -t random-logger_2 ./random-logger
$ sudo docker build -t random-logger_3 ./random-logger

Run 3 services with configured log drivers:

$ sudo docker run --detach --log-driver service_1_logdriver random-logger_1
$ sudo docker run --detach --log-driver service_2_logdriver random-logger_2
$ sudo docker run --detach --log-driver service_3_logdriver random-logger_3

Make sure that Kafka is receiving the log messages by creating a consumer subscribed to each topic. The messages should appear in the command-line window. Later on, ClickHouse will be our Kafka consumer.

$ sudo docker exec -it kafka-broker kafka-console-consumer --bootstrap-server 192.168.1.40:9091 --topic service_1

2. Configure ClickHouse as a Kafka Consumer

  • Configure ClickHouse to receive data from Kafka

    Structure

  • Let's design the ClickHouse data tables based on our logging messages, which arrive as single-row nested JSON. I will work with the first layer of the JSON only; the nested part needs to be handled later on.

{"Line":"{"@timestamp": "2021-07-08T10:44:08+0000", "level": "WARN", "message": "variable not in use."}","Source":"stdout","Timestamp":"2021-07-08T10:44:08.349998689Z","Partial":false,"ContainerName":"/cool_shamir","ContainerId":"4e07abaf3345e8d8b026826c49d3193e401e2635781dae86cbd57ec9579d18c1","ContainerImageName":"random-logger","ContainerImageId":"sha256:cbc554adcaabd8ce581d438a2223d2ebae3ccb84d477867b63bdfb4b91632067","Err":null}
  • Let's split this up for a better look:
{
  "Line":"{"@timestamp": "2021-07-08T10:44:08+0000", "level": "WARN", "message": "variable not in use."}",
  "Source":"stdout",
  "Timestamp":"2021-07-08T10:44:08.349998689Z",
  "Partial":false,
  "ContainerName":"/cool_shamir",
  "ContainerId":"4e07abaf3345e8d8b026826c49d3193e401e2635781dae86cbd57ec9579d18c1",
  "ContainerImageName":"random-logger",
  "ContainerImageId":"sha256:cbc554adcaabd8ce581d438a2223d2ebae3ccb84d477867b63bdfb4b91632067",
  "Err":null
}
  • Open the ClickHouse CLI
$ sudo docker exec -it clickhouse /bin/bash -c "clickhouse-client --multiline"

Important: Repeat this design for each service; below is a demo for service_1 (JSONEachRow)

-- Create a MergeTree table (the destination that stores the log rows)
CREATE TABLE service_1 (
    time DateTime Codec(DoubleDelta, LZ4),
    Line String,
    level String,
    message String,
    Source String,
    Timestamp String,
    Partial String,
    ContainerName String,
    ContainerImageName String,
    ContainerImageId String,
    Err String
) Engine = MergeTree
PARTITION BY toYYYYMM(time)
ORDER BY (time);
-- Create a Kafka table engine (the Kafka consumer side)
CREATE TABLE service_1_queue (
    Line String,
    level String,
    message String,
    Source String,
    Timestamp String,
    Partial String,
    ContainerName String,
    ContainerImageName String,
    ContainerImageId String,
    Err String
)
ENGINE = Kafka
SETTINGS kafka_broker_list = '192.168.1.40:9091',
    kafka_topic_list = 'service_1',
    kafka_group_name = 'service_1_consumer_1',
    kafka_format = 'CSV',
    kafka_max_block_size = 1048576;
-- Create a materialized view to transfer data
-- between the Kafka table and the MergeTree table
-- (note: time is not selected from the queue, so it keeps its DateTime default)
CREATE MATERIALIZED VIEW service_1_queue_mv TO service_1 AS
SELECT Line, level, message, Source, Timestamp, Partial, ContainerName, ContainerImageName, ContainerImageId, Err
FROM service_1_queue;
  • (Optional, because the services have been producing messages already) Test the setup by producing some messages

    sudo docker exec -it kafka-broker kafka-console-producer \
    --broker-list 192.168.1.40:9091 \
    --topic service_1
  • Use this data

    {"Line":"{\"@timestamp\": \"2021-07-08T10:44:08+0000\", \"level\": \"WARN\", \"message\": \"variable not in use.\"}","Source":"stdout","Timestamp":"2021-07-08T10:44:08.349998689Z","Partial":false,"ContainerName":"/cool_shamir","ContainerId":"4e07abaf3345e8d8b026826c49d3193e401e2635781dae86cbd57ec9579d18c1","ContainerImageName":"random-logger","ContainerImageId":"sha256:cbc554adcaabd8ce581d438a2223d2ebae3ccb84d477867b63bdfb4b91632067","Err":null}
    

3. Configure Superset to get data from ClickHouse

  1. Register a root Superset account

    $ sudo docker exec -it superset superset-init
  2. Connect to ClickHouse

    • Use the UI at localhost:8080
    • Add the ClickHouse URI to Superset: clickhouse://clickhouse:8123 (see the URI example below)
    • Then you should be able to visualize some ClickHouse tables
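
If your ClickHouse requires credentials, or you want to name the database explicitly, the URI follows the usual SQLAlchemy shape over the HTTP port 8123 (the values below are placeholders; the URI above simply omits the credentials, which corresponds to the default user with an empty password):

# clickhouse://<user>:<password>@<host>:<port>/<database>
clickhouse://default:@clickhouse:8123/default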

Issues

To summarize, 3 things that I was not able to do yet include:

  • Making ClickHouse handle JSON messages from Kafka (JSON mapping)
  • Merging all logging services into 1 logging container; the demo still uses 4 containers: Kafka, Zookeeper, ClickHouse, Superset.
  • Fixing the Superset-ClickHouse bug: Superset can connect to ClickHouse just fine and recognizes the table, but the table's data always shows up as empty, although the table is actually NOT empty.

Reference
