Metrics

TGI exposes multiple metrics that can be collected via the `/metrics` Prometheus endpoint. These metrics can be used to monitor the performance of TGI, autoscale deployments, and identify bottlenecks.
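As a rough illustration of collecting these metrics without a full Prometheus setup, the sketch below fetches the `/metrics` endpoint and parses the Prometheus text exposition format into a dictionary. The base URL `http://localhost:8080` and the helper names `parse_metrics` and `fetch_metrics` are assumptions made for this example, not part of TGI. The parser also assumes samples carry no trailing timestamp.

```python
import urllib.request


def parse_metrics(text):
    """Parse Prometheus text exposition into {series: value}.

    The series key keeps any label set, e.g. 'name{method="prefill"}'.
    Assumption: sample lines have no trailing timestamp field.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated field on the line.
        series, _, value = line.rpartition(" ")
        samples[series.strip()] = float(value)
    return samples


def fetch_metrics(base_url="http://localhost:8080"):
    """Fetch and parse metrics. The default URL is a hypothetical local server."""
    with urllib.request.urlopen(f"{base_url}/metrics", timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

Calling `fetch_metrics()` against a running server would return a mapping such as `{"tgi_queue_size": ...}`. In production, a Prometheus server scraping the endpoint is the more usual choice.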

The following metrics are exposed:

| Metric Name | Description | Type | Unit |
|---|---|---|---|
| `tgi_batch_current_max_tokens` | Maximum tokens for the current batch | Gauge | Count |
| `tgi_batch_current_size` | Current batch size | Gauge | Count |
| `tgi_batch_decode_duration` | Time spent decoding a batch per method (prefill or decode) | Histogram | Seconds |
| `tgi_batch_filter_duration` | Time spent filtering batches and sending generated tokens per method (prefill or decode) | Histogram | Seconds |
| `tgi_batch_forward_duration` | Batch forward duration per method (prefill or decode) | Histogram | Seconds |
| `tgi_batch_inference_count` | Inference calls per method (prefill or decode) | Counter | Count |
| `tgi_batch_inference_duration` | Batch inference duration | Histogram | Seconds |
| `tgi_batch_inference_success` | Number of successful inference calls per method (prefill or decode) | Counter | Count |
| `tgi_batch_next_size` | Batch size of the next batch | Histogram | Count |
| `tgi_queue_size` | Current queue size | Gauge | Count |
| `tgi_request_count` | Total number of requests | Counter | Count |
| `tgi_request_duration` | Total time spent processing the request (e2e latency) | Histogram | Seconds |
| `tgi_request_generated_tokens` | Generated tokens per request | Histogram | Count |
| `tgi_request_inference_duration` | Request inference duration | Histogram | Seconds |
| `tgi_request_input_length` | Input token length per request | Histogram | Count |
| `tgi_request_max_new_tokens` | Maximum new tokens per request | Histogram | Count |
| `tgi_request_mean_time_per_token_duration` | Mean time per token per request (inter-token latency) | Histogram | Seconds |
| `tgi_request_queue_duration` | Time spent in the queue per request | Histogram | Seconds |
| `tgi_request_skipped_tokens` | Speculated tokens per request | Histogram | Count |
| `tgi_request_success` | Number of successful requests | Counter | Count |
| `tgi_request_validation_duration` | Time spent validating the request | Histogram | Seconds |
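
To show how the counters and histograms listed here might be combined into monitoring signals, the sketch below derives a request success ratio and a mean end-to-end latency from already-scraped samples. It relies on the standard Prometheus convention that a histogram exposes `_sum` and `_count` series. The function name `summarize` is hypothetical, and the example assumes the series appear without labels, which may not hold for every deployment.

```python
def summarize(samples):
    """Derive simple health indicators from a {series_name: value} mapping.

    Assumptions for this sketch: series names are unlabeled, and values are
    cumulative since the server started (as Prometheus counters are).
    """
    requests = samples.get("tgi_request_count", 0.0)
    successes = samples.get("tgi_request_success", 0.0)
    # Prometheus histograms expose cumulative <name>_sum and <name>_count.
    duration_sum = samples.get("tgi_request_duration_sum", 0.0)
    duration_count = samples.get("tgi_request_duration_count", 0.0)
    return {
        # Fraction of requests that completed successfully.
        "success_ratio": successes / requests if requests else None,
        # Mean end-to-end latency in seconds across all observed requests.
        "mean_request_seconds": (
            duration_sum / duration_count if duration_count else None
        ),
    }
```

Because the inputs are cumulative, the results are lifetime averages. For a windowed view, one option is to subtract the values from two scrapes taken some interval apart before dividing.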