Use case: The kubernetes.io/container/restart_count system SLI metric reports the number of times a container has restarted. Use this chart to identify containers that are crashing or restarting frequently. To monitor the container for a specific service, filter the metric by its labels.
The following example uses the kubernetes.io/container/restart_count metric for the Cassandra container. You can use this metric for any of the containers in the table above.
Resource types
k8s_container
Metric
kubernetes.io/container/restart_count
Filter By
namespace_name = apigee and container_name =~ .*cassandra.*
Group By
cluster_name, namespace_name, pod_name, container_name, and all k8s_container resource type labels
Aggregator
sum
Alert consideration
If a container restarts frequently, investigate the root cause. A container can restart for many reasons, such as being OOMKilled, a full data disk, or configuration issues.
Alert threshold
Depends on the SLO for the installation. For example, in production, trigger an event notification if a container restarts more than 5 times within 30 minutes.
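The settings above can be sketched in Python as a minimal illustration, not as the way Apigee or Cloud Monitoring evaluates alerts. The function names build_restart_count_filter and exceeds_restart_threshold are hypothetical. The first assembles a Cloud Monitoring filter string from the Resource types, Metric, and Filter By values. The second assumes you have already fetched (timestamp, cumulative restart count) samples for one container, sorted by time, and checks the example threshold of more than 5 restarts within 30 minutes. It also assumes the counter does not reset inside the window, for example because the pod was recreated.

```python
from datetime import datetime, timedelta


def build_restart_count_filter(namespace="apigee", container_regex=".*cassandra.*"):
    """Build a Cloud Monitoring filter string for the restart_count metric.

    Hypothetical helper: it only formats a string and calls no API.
    """
    clauses = [
        'metric.type = "kubernetes.io/container/restart_count"',
        'resource.type = "k8s_container"',
        f'resource.labels.namespace_name = "{namespace}"',
        # Regex match on the container name, mirroring the =~ filter above.
        f'resource.labels.container_name = monitoring.regex.full_match("{container_regex}")',
    ]
    return " AND ".join(clauses)


def exceeds_restart_threshold(samples, max_restarts=5, window=timedelta(minutes=30)):
    """Return True if restarts within any sliding window exceed max_restarts.

    samples: list of (timestamp, cumulative_restart_count) for one container,
    sorted by timestamp. Assumes the counter does not reset inside the window.
    """
    start = 0
    for end_time, end_count in samples:
        # Drop samples that are older than the window relative to this sample.
        while end_time - samples[start][0] > window:
            start += 1
        if end_count - samples[start][1] > max_restarts:
            return True
    return False


t0 = datetime(2025, 1, 1)
crash_loop = [(t0, 0), (t0 + timedelta(minutes=10), 3), (t0 + timedelta(minutes=20), 6)]
print(build_restart_count_filter())
print(exceeds_restart_threshold(crash_loop))
```

Adjust max_restarts and window to match the SLO for your installation; the values 5 and 30 minutes are only the example given above.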
Last updated 2025-04-24 UTC.