Kubeadm-sm
This package provides monitoring for the following Kubernetes components:
- kubelet
- coredns
- api-server
- kube-controller-manager
- kube-scheduler
- etcd
These components are required for a functioning Kubernetes cluster. To learn more about them, see the official Kubernetes documentation.
Requirements
- Kubernetes >= 1.20.0
- Kustomize = v3.3.0
- prometheus-operator
Configuration
Prometheus scrapes Kubernetes component metrics on the metrics port with the following intervals:
- kube-controller-manager: 30s
- coredns: 15s
- etcd: 15s
- api-server: 30s
- kubelet: 30s
- kube-scheduler: 30s
Dashboards shipped:
- coredns: CoreDNS < 1.7.0
- api-server: Kubernetes / API server
- cluster-total: Kubernetes / Networking / Cluster
- kubelet: Kubernetes / Kubelet
- namespace-by-pod: Kubernetes / Networking / Namespace (Pods)
- namespace-by-workload: Kubernetes / Networking / Namespace (Workload)
- persistent-volumes-usage: Kubernetes / Persistent Volumes
- pod-total: Kubernetes / Networking / Pod
- workload-total: Kubernetes / Networking / Workload
- controller-manager: Kubernetes / Controller Manager
- etcd: Etcd
- scheduler: Kubernetes / Scheduler
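As a rough illustration of the scrape settings listed earlier in this section, a prometheus-operator ServiceMonitor for kube-scheduler might look like the sketch below. It is not copied from this package: the resource name, the `monitoring` namespace, and the `k8s-app` label are assumptions made for the example, and scheme, TLS, and authentication settings are left out.

```yaml
# Illustrative only: names, namespace, and labels are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kube-scheduler            # hypothetical resource name
  namespace: monitoring           # assumed namespace of the Prometheus stack
spec:
  jobLabel: k8s-app               # assumed Service label used as the job name
  namespaceSelector:
    matchNames:
      - kube-system               # control-plane components usually live here
  selector:
    matchLabels:
      k8s-app: kube-scheduler     # assumed label on the metrics Service
  endpoints:
    - port: metrics               # named Service port, as stated above
      interval: 30s               # kube-scheduler interval from the list above
```

The other components would follow the same shape, with only the selector and the interval (15s for coredns and etcd, 30s otherwise) changing.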
Alerts
The following alerts are already defined for this package.
kubernetes-absent-kubeadm
Parameter | Description | Severity | Interval |
---|---|---|---|
KubeControllerManagerDown | This alert fires if Prometheus target discovery was not able to reach the kube-controller-manager in the last 15 minutes. | critical | 15m |
KubeSchedulerDown | This alert fires if Prometheus target discovery was not able to reach the kube-scheduler in the last 15 minutes. | critical | 15m |
KubeClientCertificateExpiration | This alert fires when the Kubernetes API client certificate is expiring in less than 30 days. | warning | |
KubeClientCertificateExpiration | This alert fires when the Kubernetes API client certificate is expiring in less than 7 days. | critical | |
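To make the table columns concrete, here is a minimal sketch of how a row such as KubeSchedulerDown could be expressed as a prometheus-operator PrometheusRule. The Interval column corresponds to the rule's `for` duration and the Severity column to a `severity` label. The expression, the `job="kube-scheduler"` label value, the resource name, and the namespace are assumptions for illustration and may differ from the rule actually shipped.

```yaml
# Illustrative only: expression, job label, and metadata are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kubernetes-absent-kubeadm-example   # hypothetical resource name
  namespace: monitoring                     # assumed namespace
spec:
  groups:
    - name: kubernetes-absent-kubeadm
      rules:
        - alert: KubeSchedulerDown
          # True when no target with the assumed job label reports up == 1
          expr: absent(up{job="kube-scheduler"} == 1)
          for: 15m                          # "Interval" column
          labels:
            severity: critical              # "Severity" column
          annotations:
            message: kube-scheduler has disappeared from Prometheus target discovery.
```

Rows with an empty Interval would simply omit `for`, so the alert fires as soon as the expression is true.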
coredns
Parameter | Description | Severity | Interval |
---|---|---|---|
CoreDNSPanic | This alert fires if CoreDNS total panic count increased by at least 1 in the last 10 minutes. | warning | |
CoreDNSRequestsLatency | This alert fires if CoreDNS 99th percentile requests latency was higher than 100ms in the last 10 minutes. | warning | 10m |
CoreDNSHealthRequestsLatency | This alert fires if CoreDNS 99th percentile health requests latency was higher than 10ms in the last 10 minutes. | warning | 10m |
CoreDNSProxyRequestsLatency | This alert fires if CoreDNS 99th percentile proxy requests latency was higher than 500ms in the last 10 minutes. | warning | 10m |
etcd3
Parameter | Description | Severity | Interval |
---|---|---|---|
EtcdInsufficientMembers | This alert fires if fewer than half of Etcd cluster members were online in the last 3 minutes. | critical | 3m |
EtcdNoLeader | This alert fires if the Etcd cluster had no leader in the last minute. | critical | 1m |
EtcdHighNumberOfLeaderChanges | This alert fires if the Etcd cluster changed leader more than 3 times in the last hour. | warning | |
EtcdHighNumberOfFailedProposals | This alert fires if there were more than 5 proposal failures in the last hour. | warning | |
EtcdHighFsyncDurations | This alert fires if the WAL fsync 99th percentile latency was higher than 0.5s in the last 10 minutes. | warning | 10m |
EtcdHighCommitDurations | This alert fires if the backend commit 99th percentile latency was higher than 0.25s in the last 10 minutes. | warning | 10m |
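The percentile-based alerts in the tables above are typically built on Prometheus histograms. As a hedged sketch, EtcdHighFsyncDurations could be written as follows. The 5m rate window, the resource name, and the namespace are assumptions, and the shipped rule may aggregate or filter differently.

```yaml
# Illustrative only: rate window and metadata are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: etcd3-example             # hypothetical resource name
  namespace: monitoring           # assumed namespace
spec:
  groups:
    - name: etcd3
      rules:
        - alert: EtcdHighFsyncDurations
          # 99th percentile of WAL fsync latency, compared to the 0.5s threshold
          expr: histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m])) > 0.5
          for: 10m                # condition must hold for 10 minutes
          labels:
            severity: warning
```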