SIGHUP Distribution Logging
Overview
SIGHUP Distribution Logging uses a collection of open source tools to provide a resilient and robust logging stack for the cluster.
The central piece of the stack is the open source search engine opensearch, combined with its analytics and visualization platform opensearch-dashboards. Logs are collected by fluentbit, a node-level data collection and enrichment agent, and shipped to OpenSearch via fluentd. The fluentbit and fluentd stack is managed by the Banzai Logging Operator. We also provide an alternative to OpenSearch: loki.
High level diagram of the stack:
Module's repository: https://github.com/sighupio/module-logging
Packages
The following packages are included in the SIGHUP Distribution Logging module:
Package | Description |
---|---|
opensearch | Log storage and visualization. |
logging-operator | Banzai logging operator, manages fluentbit/fluentd and their configurations. |
loki-distributed | Distributed Loki deployment to provide log visualization from Grafana. |
minio-ha | Three-node HA MinIO deployment (optional, used as storage for Loki). |
All the components are deployed in the `logging` namespace in the cluster.
Compatibility
Kubernetes Version | Compatibility | Notes |
---|---|---|
1.27.x | ✅ | No known issues |
1.28.x | ✅ | No known issues |
1.29.x | ✅ | No known issues |
1.30.x | ✅ | No known issues |
Check the compatibility matrix for additional information about previous releases of the modules.
Introduction: Logging in Kubernetes
Logs help developers and sysadmins to understand what is happening inside an application or a system, enabling them to debug and troubleshoot issues.
Pod and container logs
Containers are designed to support logging. The easiest method to log messages from a containerized application is to write them directly to the "standard output" (`stdout`) and "standard error" (`stderr`) streams, relying on the container engine or runtime.
This is often enough to debug a live application, but container engines/runtimes usually do not provide complete log management. For example, you may need to access logs from a crashed or deleted container, which would no longer be available.
In a Kubernetes cluster, when an application (Pod) writes logs to the `stdout`/`stderr` streams, the logs are captured by the container runtime and saved in a file on the node that is currently running the Pod.
The Kubelet component is in charge of keeping track of these log files, which are saved inside `/var/log/pods` by default, and of providing them through the Kubernetes API (for example via the `kubectl logs` command). It is also responsible for log rotation.
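For illustration, here is a minimal Pod that writes to stdout; the Pod name, image, and message are arbitrary examples, and the comments describe how the kubelet exposes its logs:

```yaml
# Minimal example: a Pod that writes one line per second to stdout.
# The container runtime stores these lines under /var/log/pods on the
# node, and the kubelet serves them through the API, e.g.:
#   kubectl logs stdout-logger
apiVersion: v1
kind: Pod
metadata:
  name: stdout-logger            # arbitrary example name
spec:
  containers:
    - name: logger
      image: busybox:1.36        # any image with a shell works
      command:
        - /bin/sh
        - -c
        - 'while true; do echo "$(date) hello from stdout"; sleep 1; done'
```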
In Kubernetes, logs should have dedicated storage and a lifecycle independent from that of nodes, Pods, and containers. This is commonly referred to as "cluster-level logging".
Cluster-level logging architectures require a separate backend to provide storage, analysis, and querying of logs. Vanilla Kubernetes does not provide a cluster-level solution for logging.
System components logging
Kubernetes' system-level components, such as the Kubelet, the container runtime and the `etcd` database, are not executed as Pods inside the cluster: they are system daemons. As such, they are not subject to the log management techniques mentioned before.
System-level components log their messages through `systemd`/`journald`, and their logs are accessible using the `journalctl` tool on each node.
Best practices for application logging in Kubernetes
In this section you can find some commonly suggested best practices for configuring and designing a proper logging architecture for applications running in Kubernetes:
- Write logs to the `stdout` and `stderr` streams and do not write logs to the filesystem. Leave the log capturing and rotation jobs to the underlying cluster-level logging management functionality.
- Use structured logs whenever possible, for example using a `json` formatter, because they enable easier indexing and mapping of fields and provide powerful query capabilities when debugging.
- Find the right balance in the quantity of generated log messages. Too many log messages not only add "noise" when troubleshooting, but also put the logging system under unnecessary pressure on both CPU and storage. For example, it is good practice to disable `DEBUG`-level logging on applications running normally: you can always increase the logging level when problems arise, troubleshoot them, and lower the level again.
If an application cannot write logs to the `stdout`/`stderr` streams (for example, legacy applications that you cannot edit), you can find some alternatives at this link, particularly the Tailer Webhook of the Logging Operator.
SD: Logging module
The Logging module provided by SD offers a cluster-level solution for logging inside a SD cluster.
The module includes:
- `Fluentbit` agents running on each node, which collect and enrich logs coming from both Pods and system daemons and ship them to `Fluentd` (see the sketch after this list).
- `Fluentd` instances, which filter and ship log messages to the centralized log storage.
- A centralized log storage system (OpenSearch or Loki from Grafana).
- A system to view and query log messages (OpenSearch Dashboards or Grafana).
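For context, the Logging Operator describes the `Fluentbit`/`Fluentd` pair with a `Logging` custom resource. The following is a generic, hedged sketch of such a resource, not the module's actual manifest; the resource name and the `fluentd` settings are assumptions:

```yaml
# Generic sketch of a Logging Operator "Logging" resource (not the
# module's actual manifest): it declares the node-level fluentbit
# agents and the fluentd aggregators in the logging namespace.
apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: example-logging          # hypothetical name
spec:
  controlNamespace: logging      # namespace where fluentd and its config live
  fluentbit: {}                  # DaemonSet of node-level collectors
  fluentd:
    disablePvc: true             # example setting: skip persistent buffers
```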
This module can be configured in four different ways:
- Disabled.
- OpenSearch (default): installs the Logging Operator with pre-configured Flows and Outputs, an HA MinIO instance, and OpenSearch either as a single instance or with three instances to provide an HA installation.
- Loki: installs the same components as the OpenSearch option, using Loki as the storage provider instead of OpenSearch.
- customOutputs: installs the Logging Operator with pre-configured Flows, but without Outputs and storage. This option lets you configure the Outputs directly in the `furyctl.yaml` file, where you must specify the destination for each Flow (for example, an off-cluster instance of Loki); see the sketch after this list.
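The following is a hypothetical sketch of a customOutputs configuration in `furyctl.yaml`. The exact field names and structure depend on the furyctl schema version, so treat the paths under `spec.distribution.modules.logging` and the Loki URL as assumptions and verify them against the schema reference:

```yaml
# Hypothetical furyctl.yaml fragment (field names are assumptions,
# verify them against the furyctl schema reference for your version).
spec:
  distribution:
    modules:
      logging:
        type: customOutputs
        customOutputs:
          # One destination per pre-configured Flow; the value is the
          # Output spec to use, here an off-cluster Loki as an example.
          kubernetes: |
            loki:
              url: https://loki.example.com   # assumed external endpoint
```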
Log collection
The `fluentbit` and `Fluentd` stack is managed and configured through the Logging Operator (formerly the Banzai Cloud Logging Operator). The operator provides several CRDs (Custom Resource Definitions), including:
- Flows and ClusterFlows: they define which log messages to collect and to which Output/ClusterOutput they must be shipped.
- Outputs and ClusterOutputs: they define the storage destination of Flows/ClusterFlows (a hedged example follows the note below).
ℹ️ INFO
To simplify the wording, this document will use `Flow` to indicate both `Flows` and `ClusterFlows`, and `Output` to indicate both `Outputs` and `ClusterOutputs`.
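As a concrete example, a namespaced Flow that routes an application's logs to an Output in the same namespace could look like the following sketch. The `apiVersion`, kinds, and fields come from the Logging Operator CRDs, while the names, label selector, and Loki URL are illustrative assumptions:

```yaml
# Hedged example: route logs from Pods labeled app=my-app to a Loki
# Output in the same namespace. Names, labels and the URL are
# illustrative assumptions.
apiVersion: logging.banzaicloud.io/v1beta1
kind: Output
metadata:
  name: my-app-loki
  namespace: my-namespace
spec:
  loki:
    url: http://loki-distributed-gateway.logging.svc   # assumed service address
    configure_kubernetes_labels: true
---
apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: my-app
  namespace: my-namespace
spec:
  match:
    - select:
        labels:
          app: my-app
  localOutputRefs:
    - my-app-loki                # ships matched records to the Output above
```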
The Logging module includes the following Flows, each with its respective Output (using the same name):
- `audit`: Kubernetes API server's audit logs. SD configures the Kubernetes API server by default to record the most relevant security events on the cluster (audit-logs).
- `events`: Kubernetes events (equivalent to `kubectl get events`).
- `infra`: logs written by Pods inside the "infra" namespaces (`kube-system`, `logging`, `monitoring`, etc.), which provide infrastructural services for a SD cluster and are not application workloads. This includes, for example, logs from the logging system itself.
- `ingressNginx`: logs written by the Ingress NGINX Controller's Pods inside the cluster. Logs are processed by a parser and fields are mapped to a standardized structure.
- `kubernetes`: logs written by Pods in non-"infra" namespaces. Basically, this Flow includes application workloads' logs.
- `systemdCommon`: logs written by system daemons running on the cluster nodes.
- `systemdEtcd`: logs written by the `etcd` daemons.
- `errors`: logs that cannot be processed by the logging stack; these are sent to an internal MinIO bucket to enable debugging in case of errors. The bucket has a 7-day retention policy.