Kubeadm-sm
This package provides monitoring for the following Kubernetes components:
- kubelet
- coredns
- api-server
- kube-control-manager
- kube-scheduler
- etcd
These are components needed to deliver a functioning Kubernetes cluster. If you want to learn more about these components please follow the official documentation of Kubernetes.
Requirements
- Kubernetes >=
1.20.0 - Kustomize =
v3.3.0 - prometheus-operator
Configuration
Prometheus scrapes Kubernetes component metrics on port metrics with following
intervals:
- kube-control-manager:
30s - coredns:
15s - etcd:
15s - api-server:
30s - kubelet:
30s - kube-scheduler:
30s - Dashboards shipped:
coredns: CoreDNS < 1.7.0api-server: Kubernetes / API servercluster-total: Kubernetes / Networking / Clusterkubelet: Kubernetes / Kubeletnamespace-by-pod: Kubernetes / Networking / Namespace (Pods)namespace-by-workload: Kubernetes / Networking / Namespace (Workload)persistent-volumes-usage: Kubernetes / Persistent Volumespod-total: Kubernetes / Networking / Podworkload-total: Kubernetes / Networking / Workloadcontroller-manager: Kubernetes / Controller Manageretcd: Etcdscheduler: Kubernetes / Scheduler
Alerts
The followings alerts are already defined for this package.
kubernetes-absent-kubeadm
| Parameter | Description | Severity | Interval |
|---|---|---|---|
| KubeControllerManagerDown | This alert fires if Prometheus target discovery was not able to reach the kube-controller-manager in the last 15 minutes. | critical | 15m |
| KubeSchedulerDown | This alert fires if Prometheus target discovery was not able to reach the kube-scheduler in the last 15 minutes. | critical | 15m |
| KubeClientCertificateExpiration | This alert fires when the Kubernetes API client certificate is expiring in less than 30 days. | warning | |
| KubeClientCertificateExpiration | This alert fires when the Kubernetes API client certificate is expiring in less than 7 days. | critical |
coredns
| Parameter | Description | Severity | Interval |
|---|---|---|---|
| CoreDNSPanic | This alert fires if CoreDNS total panic count increased by at least 1 in the last 10 minutes. | warning | |
| CoreDNSRequestsLatency | This alert fires if CoreDNS 99th percentile requests latency was higher than 100ms in the last 10 minutes. | warning | 10m |
| CoreDNSHealthRequestsLatency | This alert fires if CoreDNS 99th percentile health requests latency was higher than 10ms in the last 10 minutes. | warning | 10m |
| CoreDNSProxyRequestsLatency | This alert fires if CoreDNS 99th percentile proxy requests latency was higher than 500ms in the last 10 minutes. | warning | 10m |
etcd3
| Parameter | Description | Severity | Interval |
|---|---|---|---|
| EtcdInsufficientMembers | This alert fires if less than half of Etcd cluster members were online in the last 3 minutes. | critical | 3m |
| EtcdNoLeader | This alert fires if the Etcd cluster had no leader in the last minute. | critical | 1m |
| EtcdHighNumberOfLeaderChanges | This alert fires if the Etcd cluster changed leader more than 3 times in the last hour. | warning | |
| EtcdHighNumberOfFailedProposals | This alert fires if there were more than 5 proposal failure in the last hour. | warning | |
| EtcdHighFsyncDurations | This alert fires if the WAL fsync 99th percentile latency was higher than 0.5s in the last 10 minutes. | warning | 10m |
| EtcdHighCommitDurations | This alert fires if the backend commit 99th percentile latency was higher than 0.25s in the last 10 minutes. | warning | 10m |