SIGHUP Distribution Logging
Overview
SIGHUP Distribution Logging uses a collection of open source tools to provide a resilient and robust logging stack for the cluster.
The central piece of the stack is the open source search engine opensearch, combined with its analytics and visualization platform opensearch-dashboards. Logs are collected by fluentbit, a node-level data collection and enrichment agent, and shipped to OpenSearch via fluentd. The fluentbit and fluentd stack is managed by the Banzai Logging Operator. We also provide an alternative to OpenSearch: loki.
High level diagram of the stack:
Module's repository: https://github.com/sighupio/module-logging
Packages
The following packages are included in the SIGHUP Distribution Logging module:
Package | Description |
---|---|
opensearch | Log storage and visualization. |
logging-operator | Banzai logging operator, manages fluentbit/fluentd and their configurations. |
loki-distributed | Distributed Loki deployment to provide log visualization from Grafana. |
minio-ha | Three-node HA MinIO deployment (optional, used as storage for Loki). |
All the components are deployed in the `logging` namespace in the cluster.
Compatibility
Kubernetes Version | Compatibility | Notes |
---|---|---|
1.27.x | ✅ | No known issues |
1.28.x | ✅ | No known issues |
1.29.x | ✅ | No known issues |
1.30.x | ✅ | No known issues |
Check the compatibility matrix for additional information about previous releases of the modules.
Introduction: Logging in Kubernetes
Logs help developers and sysadmins to understand what is happening inside an application or a system, enabling them to debug and troubleshoot issues.
Pod and container logs
Containers are designed to support logging. The easiest method to log messages from a containerized application is to write them directly to the "standard output" (`stdout`) and "standard error" (`stderr`) streams, relying on the container engine or runtime.
This is often enough to debug a live application, but container engines/runtimes usually do not provide complete log management. For example, you may need to access logs from a crashed or deleted container, which would no longer be available.
In a Kubernetes cluster, when an application (Pod) writes logs to the `stdout`/`stderr` streams, the logs are captured by the container runtime and saved in a file on the node that is currently running the Pod.
The Kubelet component is in charge of keeping track of these log files, which are saved inside `/var/log/pods` by default, and of providing them through the Kubernetes API (for example via the `kubectl logs` command). It is also responsible for log rotation.
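For illustration, here is a minimal Pod that writes to stdout; the Pod name, image, and message are arbitrary examples, and the comments describe how the kubelet exposes its logs:

```yaml
# Minimal example: a Pod that writes one line per second to stdout.
# The container runtime stores these lines under /var/log/pods on the
# node, and the kubelet serves them through the API, e.g.:
#   kubectl logs stdout-logger
apiVersion: v1
kind: Pod
metadata:
  name: stdout-logger            # arbitrary example name
spec:
  containers:
    - name: logger
      image: busybox:1.36        # any image with a shell works
      command:
        - /bin/sh
        - -c
        - 'while true; do echo "$(date) hello from stdout"; sleep 1; done'
```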
In Kubernetes, logs should have dedicated storage and a lifecycle independent from that of nodes, Pods, and containers. This is commonly referred to as "cluster-level logging".
Cluster-level logging architectures require a separate backend to provide storage, analysis, and querying of logs. Vanilla Kubernetes does not provide a cluster-level solution for logging.
System components logging
Kubernetes' system-level components, such as the Kubelet, the container runtime and the `etcd` database, are not executed as Pods inside the cluster: they are system daemons. As such, they are not subject to the log management techniques mentioned before.
System-level components log their messages through `systemd`/`journald`, and their logs are accessible using the `journalctl` tool on each node.
Best practices for application logging in Kubernetes
In this section you can find some commonly suggested best practices for configuring and designing a proper logging architecture for applications running in Kubernetes:
- Write logs to the `stdout` and `stderr` streams and do not write logs to the filesystem. Leave the log capturing and rotation jobs to the underlying cluster-level logging management functionality.
- Use structured logs whenever possible, for example using a `json` formatter, because they enable easier indexing and mapping of fields and provide powerful query capabilities when debugging.
- Find the right balance in the quantity of generated log messages. Too many log messages not only add "noise" when troubleshooting, but also put the logging system under unnecessary pressure on both CPU and storage. For example, it is good practice to disable `DEBUG`-level logging on applications running normally: you can always increase the logging level when problems arise, troubleshoot them, and lower the level again.
If an application cannot write logs to the `stdout`/`stderr` streams (for example, legacy applications that you cannot edit), you can find some alternatives at this link, particularly the Tailer Webhook of the Logging Operator.
SD: Logging module
The Logging module provided by SD offers a cluster-level solution for logging inside a SD cluster.
The module includes:
- `Fluentbit` agents running on each node, which collect and enrich logs coming from both Pods and system daemons and ship them to `Fluentd` (see the sketch after this list).
- `Fluentd` instances, which filter and ship log messages to the centralized log storage.
- A centralized log storage system (OpenSearch or Loki from Grafana).
- A system to view and query log messages (OpenSearch Dashboards or Grafana).
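For context, the Logging Operator describes the `Fluentbit`/`Fluentd` pair with a `Logging` custom resource. The following is a generic, hedged sketch of such a resource, not the module's actual manifest; the resource name and the `fluentd` settings are assumptions:

```yaml
# Generic sketch of a Logging Operator "Logging" resource (not the
# module's actual manifest): it declares the node-level fluentbit
# agents and the fluentd aggregators in the logging namespace.
apiVersion: logging.banzaicloud.io/v1beta1
kind: Logging
metadata:
  name: example-logging          # hypothetical name
spec:
  controlNamespace: logging      # namespace where fluentd and its config live
  fluentbit: {}                  # DaemonSet of node-level collectors
  fluentd:
    disablePvc: true             # example setting: skip persistent buffers
```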
This module can be configured in four different ways:
- Disabled.
- OpenSearch (default): installs the Logging Operator with pre-configured Flows and Outputs, an HA MinIO instance, and OpenSearch either as a single instance or with three instances to provide an HA installation.
- Loki: installs the same components as the OpenSearch option, using Loki as the storage provider instead of OpenSearch.
- customOutputs: installs the Logging Operator with pre-configured Flows, but without Outputs and storage. This option lets you configure the Outputs directly in the `furyctl.yaml` file, where you must specify the destination for each Flow (for example, an off-cluster instance of Loki); see the sketch after this list.
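The following is a hypothetical sketch of a customOutputs configuration in `furyctl.yaml`. The exact field names and structure depend on the furyctl schema version, so treat the paths under `spec.distribution.modules.logging` and the Loki URL as assumptions and verify them against the schema reference:

```yaml
# Hypothetical furyctl.yaml fragment (field names are assumptions,
# verify them against the furyctl schema reference for your version).
spec:
  distribution:
    modules:
      logging:
        type: customOutputs
        customOutputs:
          # One destination per pre-configured Flow; the value is the
          # Output spec to use, here an off-cluster Loki as an example.
          kubernetes: |
            loki:
              url: https://loki.example.com   # assumed external endpoint
```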
Log collection
The `fluentbit` and `Fluentd` stack is managed and configured through the Logging Operator (formerly the Banzai Cloud Logging Operator). The operator provides several CRDs (Custom Resource Definitions), including:
- Flows and ClusterFlows: they define which log messages to collect and to which Output/ClusterOutput they must be shipped.
- Outputs and ClusterOutputs: they define the storage destination of Flows/ClusterFlows (a hedged example follows the note below).
ℹ️ INFO
To simplify the wording, this document will use `Flow` to indicate both `Flows` and `ClusterFlows`, and `Output` to indicate both `Outputs` and `ClusterOutputs`.
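As a concrete example, a namespaced Flow that routes an application's logs to an Output in the same namespace could look like the following sketch. The `apiVersion`, kinds, and fields come from the Logging Operator CRDs, while the names, label selector, and Loki URL are illustrative assumptions:

```yaml
# Hedged example: route logs from Pods labeled app=my-app to a Loki
# Output in the same namespace. Names, labels and the URL are
# illustrative assumptions.
apiVersion: logging.banzaicloud.io/v1beta1
kind: Output
metadata:
  name: my-app-loki
  namespace: my-namespace
spec:
  loki:
    url: http://loki-distributed-gateway.logging.svc   # assumed service address
    configure_kubernetes_labels: true
---
apiVersion: logging.banzaicloud.io/v1beta1
kind: Flow
metadata:
  name: my-app
  namespace: my-namespace
spec:
  match:
    - select:
        labels:
          app: my-app
  localOutputRefs:
    - my-app-loki                # ships matched records to the Output above
```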
The Logging module includes the following Flows, each with its respective Output (using the same name):
- `audit`: Kubernetes API server's audit logs. SD configures the Kubernetes API server by default to record the most relevant security events on the cluster (audit-logs).
- `events`: Kubernetes events (equivalent to `kubectl get events`).
- `infra`: logs written by Pods inside the "infra" namespaces (`kube-system`, `logging`, `monitoring`, etc.), which provide infrastructural services for a SD cluster and are not application workloads. This includes, for example, logs from the logging system itself.
- `ingressNginx`: logs written by the Ingress NGINX Controller's Pods inside the cluster. Logs are processed by a parser and fields are mapped to a standardized structure.
- `kubernetes`: logs written by Pods in non-"infra" namespaces. Basically, this Flow includes application workloads' logs.
- `systemdCommon`: logs written by system daemons running on the cluster nodes.
- `systemdEtcd`: logs written by the `etcd` daemons.
- `errors`: logs that cannot be processed by the logging stack; these are sent to an internal MinIO bucket to enable debugging in case of errors. The bucket has a 7-day retention policy.