Use case: The kubernetes.io/container/restart_count system SLI metric reports the number of times a container has restarted. A chart of this metric can help identify a container that is crashing or restarting frequently. To monitor a specific service's container, filter on the metric labels.
The following example uses the kubernetes.io/container/restart_count metric for the Cassandra container. You can use this metric for any of the containers in the table above.
Resource types
k8s_container
Metric
kubernetes.io/container/restart_count
Filter By
namespace_name = apigee and container_name =~ .*cassandra.*
Group By
cluster_name, namespace_name, pod_name, container_name, and all k8s_container resource type labels
Aggregator
sum
Alert consideration
If a container is restarting frequently, investigate the root cause. A container can restart for many reasons, including OOMKilled, a full data disk, and configuration issues.
Alert threshold
Depends on the SLO for the installation. For example, for production, trigger an event notification if a container restarts more than 5 times within 30 minutes.
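To make the example threshold concrete, here is a minimal Python sketch of the "more than 5 restarts within 30 minutes" check. It assumes you have already exported samples for a single container as (timestamp, restart count) pairs sorted by time, and that the restart count is cumulative and never decreases within the series. The function name `exceeds_restart_threshold` and its parameters are hypothetical and are not part of any Apigee or Cloud Monitoring API. In practice this condition would be expressed as an alerting policy rather than in custom code.

```python
from datetime import datetime, timedelta


def exceeds_restart_threshold(samples, max_restarts=5, window=timedelta(minutes=30)):
    """Return True if the restart count grew by more than max_restarts
    within any time span of at most `window`.

    samples: list of (timestamp, cumulative_restart_count) tuples for ONE
    container, sorted by timestamp. Assumes the count never decreases.
    """
    start = 0
    for end_time, end_count in samples:
        # Move the left edge forward until the span fits inside the window.
        while end_time - samples[start][0] > window:
            start += 1
        # Because the count is non-decreasing, the oldest sample still in
        # the window gives the largest increase for this end point.
        if end_count - samples[start][1] > max_restarts:
            return True
    return False


# Hypothetical data: one restart every 5 minutes, 6 restarts over 30 minutes.
t0 = datetime(2025, 1, 1, 0, 0)
crash_loop = [(t0 + timedelta(minutes=5 * i), i) for i in range(7)]
print(exceeds_restart_threshold(crash_loop))
```

The defaults mirror the production example above. Adjust `max_restarts` and `window` to match the SLO for your installation.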
Last updated 2025-04-24 UTC.