Metrics for Guardrails

Metrics give a low-overhead, aggregate view of how the NeMo Guardrails library behaves in production. Tracing answers what happened on a particular request; metrics answer how the library has behaved across all requests over a time window, such as the last five minutes.

The IORails engine emits OpenTelemetry metrics inline as requests flow through it. These metrics are independent of tracing, so you can enable either signal alone or both together.

Experimental Feature

Metrics currently require the opt-in IORails engine. To enable IORails, set the environment variable NEMO_GUARDRAILS_IORAILS_ENGINE=1. IORails is an early-release feature, and metric names can change as the OpenTelemetry GenAI semantic conventions evolve.
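One way to opt in from Python is to set the variable before the library is loaded. This is a minimal sketch that assumes the flag is read when the library initializes, which this page does not state. Exporting the variable in the shell or in a deployment manifest has the same effect.

```python
import os

# Opt in to the experimental IORails engine.
# Assumption: the flag is read at import or initialization time, so it is set
# before any `nemoguardrails` import happens in this process.
os.environ["NEMO_GUARDRAILS_IORAILS_ENGINE"] = "1"

# import nemoguardrails  # import the library only after the flag is set
```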

With metrics, you can:

Track request volume, error rate, and latency for SLO dashboards.
Monitor how many requests are buffered and in-flight.
Measure downstream LLM token usage and operation latency for cost and performance analysis.
Alert on rejected requests, blocked requests, or rising error rates.

Engine Support

Engine	Metrics
IORails	Preview support. All metrics described on this page are emitted by the opt-in `IORails` engine.
LLMRails	Not supported. For LLMRails observability, use tracing; refer to Tracing Guardrails.

Independent of Tracing

Metrics and tracing are configured separately and can be toggled independently. A common production pattern is metrics-only: lightweight aggregate signals without the cost of full trace export.

tracing:
  enabled: false

metrics:
  enabled: true

Both signals depend on the same opentelemetry-api package (installed with pip install nemoguardrails[tracing]), but each is configured separately through the OpenTelemetry SDK in your application code: a TracerProvider for traces and a MeterProvider for metrics.

Metric Categories

Two families of metrics are emitted.

Family	Prefix	Purpose
Request-level	`guardrails.*`	Volume, latency, errors, blocked requests, queue and stream saturation.
LLM client-side	`gen_ai.client.*`	Per-LLM-call token usage, operation duration, streaming chunk timing. Follows the OpenTelemetry GenAI semantic conventions.

For the full list of metric names, types, labels, and units, refer to Metric Reference.
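Because the two families are distinguished by prefix alone, tooling that post-processes exported metrics can route them with a simple prefix check. The sketch below is illustrative only: the helper and the sample metric names are hypothetical, and the actual names are listed in the Metric Reference.

```python
def metric_family(name: str) -> str:
    """Map a metric name to the family implied by its prefix."""
    if name.startswith("guardrails."):
        return "request-level"
    if name.startswith("gen_ai.client."):
        return "llm-client"
    return "other"


# Hypothetical metric names, used only to exercise the two prefixes.
names = [
    "guardrails.example.count",
    "gen_ai.client.example.duration",
    "http.server.duration",
]
families = {name: metric_family(name) for name in names}
```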

Important Considerations

Library and SDK split. The NeMo Guardrails library depends on the OpenTelemetry API only. Your application configures the SDK: MeterProvider, exporters, and periodic readers. Without a MeterProvider, the API returns a no-op meter and all emissions are silently discarded. This is the same library-instrumentation pattern used by the tracing path.
Evolving GenAI standards. The OpenTelemetry GenAI semantic conventions are still under active development. Metric names, labels, and bucket boundaries can change as the spec matures. Pin your opentelemetry-sdk version and review release notes before upgrading.
Cardinality. Labels on the emitted metrics are deliberately low-cardinality (rail type, error class name, model name, provider name, token type). Avoid adding views in your SDK that introduce high-cardinality dimensions like user IDs or request IDs.
Performance. The hot-path overhead is bounded. Counters and histograms are recorded with simple atomic operations. The cost is dominated by SDK-level batching and export, which the host application controls.

Contents

Enable Guardrails Metrics — Minimal setup to enable metrics using the OpenTelemetry SDK with console output.
OpenTelemetry Metrics Integration — Production-ready OpenTelemetry SDK configuration with OTLP and Prometheus exporters.
Metric Reference — Full reference for every metric: name, instrument type, unit, labels, and emission semantics.
Guardrails Metrics — Common metrics issues and solutions.