Metrics for Guardrails#

Metrics give a low-overhead, aggregate view of how the NeMo Guardrails library behaves in production. Tracing answers what happened on one particular request; metrics answer how the library has behaved in aggregate, for example over the last five minutes.

The IORails engine emits OpenTelemetry metrics inline as requests flow through it. These metrics are independent of tracing, so you can enable either signal alone or both together.

Experimental Feature

Metrics currently require the opt-in IORails engine. To enable IORails, set NEMO_GUARDRAILS_IORAILS_ENGINE=1. IORails is an early-release feature, and metric names can change as the OpenTelemetry GenAI semantic conventions evolve.
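As a minimal sketch, the flag can also be set from Python instead of the shell. It assumes the library reads the variable when it is imported or when the engine is created, so the assignment is placed before any NeMo Guardrails import. That timing is an assumption, not documented behavior.

```python
import os

# Assumption: the flag is read at import time or at engine creation,
# so set it before importing or constructing anything from nemoguardrails.
os.environ["NEMO_GUARDRAILS_IORAILS_ENGINE"] = "1"

# Import nemoguardrails and build your rails only after this point.
```

Setting the variable in the deployment environment (container spec, systemd unit, shell profile) achieves the same result without touching application code.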

With metrics, you can:

  • Track request volume, error rate, and latency for SLO dashboards.

  • Monitor how many requests are buffered and how many are in flight.

  • Measure downstream LLM token usage and operation latency for cost and performance analysis.

  • Alert on rejected requests, blocked requests, or rising error rates.

Engine Support#

| Engine | Metrics |
| --- | --- |
| IORails | Preview support. All metrics described on this page are emitted by the opt-in IORails engine. |
| LLMRails | Not supported. For LLMRails observability, use tracing; refer to Tracing Guardrails. |

Independent of Tracing#

Metrics and tracing are configured separately and can be toggled independently. A common production pattern is metrics-only: lightweight aggregate signals without the cost of full trace export.

tracing:
  enabled: false

metrics:
  enabled: true

The two signals share the same opentelemetry-api dependency (installed with pip install nemoguardrails[tracing]), but they are otherwise configured separately in your application code: a TracerProvider for traces and a MeterProvider for metrics.
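The metrics half of that SDK configuration might look like the sketch below. It assumes the opentelemetry-sdk package is installed alongside the API. The function name configure_metrics, the service name guardrails-app, and the 60-second export interval are illustrative choices, not values taken from the NeMo Guardrails documentation. The console exporter is used only to keep the example self-contained; a production setup would typically swap in an OTLP or Prometheus exporter.

```python
def configure_metrics(export_interval_millis: int = 60_000):
    """Install a global MeterProvider that prints metrics to stdout.

    Hypothetical helper for illustration; not part of NeMo Guardrails.
    """
    # Imports are deferred so this module still loads when the optional
    # OpenTelemetry SDK is not installed.
    from opentelemetry import metrics
    from opentelemetry.sdk.metrics import MeterProvider
    from opentelemetry.sdk.metrics.export import (
        ConsoleMetricExporter,
        PeriodicExportingMetricReader,
    )
    from opentelemetry.sdk.resources import Resource

    # The reader collects all instruments and pushes them to the exporter
    # once per interval.
    reader = PeriodicExportingMetricReader(
        ConsoleMetricExporter(),
        export_interval_millis=export_interval_millis,
    )
    provider = MeterProvider(
        resource=Resource.create({"service.name": "guardrails-app"}),
        metric_readers=[reader],
    )
    # Register globally so library code that only uses the API picks it up.
    metrics.set_meter_provider(provider)
    return provider
```

Call configure_metrics() once at application startup. No TracerProvider is configured here, which matches the metrics-only pattern described above.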

Metric Categories#

Two families of metrics are emitted.

| Family | Prefix | Purpose |
| --- | --- | --- |
| Request-level | guardrails.* | Volume, latency, errors, blocked requests, queue and stream saturation. |
| LLM client-side | gen_ai.client.* | Per-LLM-call token usage, operation duration, streaming chunk timing. Follows the OpenTelemetry GenAI semantic conventions. |

For the full list of metric names, types, labels, and units, refer to Metric Reference.
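Because the two families are distinguished by prefix alone, downstream tooling can route them with a simple string check. The helper below is a hypothetical sketch for dashboard or alert routing. The name guardrails.example is a placeholder rather than a real metric, and gen_ai.client.token.usage comes from the OpenTelemetry GenAI semantic conventions; check Metric Reference for the names the library actually emits.

```python
# Prefix -> family label, mirroring the two families described above.
FAMILY_PREFIXES = {
    "guardrails.": "request-level",
    "gen_ai.client.": "llm-client",
}


def metric_family(name: str) -> str:
    """Return the family a metric name belongs to, or "other"."""
    for prefix, family in FAMILY_PREFIXES.items():
        if name.startswith(prefix):
            return family
    return "other"


# Placeholder and semantic-convention names, for illustration only.
families = {
    name: metric_family(name)
    for name in ("guardrails.example", "gen_ai.client.token.usage", "http.server.duration")
}
```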

Important Considerations#

  • Library and SDK split. The NeMo Guardrails library depends on the OpenTelemetry API only. Your application configures the SDK: MeterProvider, exporters, and periodic readers. Without a MeterProvider, the API returns a no-op meter and all emissions are silently discarded. This is the same library-instrumentation pattern used by the tracing path.

  • Evolving GenAI standards. The OpenTelemetry GenAI semantic conventions are still under active development. Metric names, labels, and bucket boundaries can change as the spec matures. Pin your opentelemetry-sdk version and review release notes before upgrading.

  • Cardinality. Labels on the emitted metrics are deliberately low-cardinality (rail type, error class name, model name, provider name, token type). Avoid adding views in your SDK that introduce high-cardinality dimensions like user IDs or request IDs.

  • Performance. The hot-path overhead is bounded. Counters and histograms are recorded with simple atomic operations. The cost is dominated by SDK-level batching and export, which the host application controls.
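The cardinality point can be made concrete with a little arithmetic: the number of distinct time series a metric can produce is bounded by the product of the distinct values of each label. The label keys and value counts below are invented for illustration, and real metrics carry only a subset of these labels, so the figures are a hypothetical upper bound rather than measurements.

```python
from math import prod


def series_upper_bound(distinct_values: dict[str, int]) -> int:
    """Upper bound on time series: product of distinct values per label."""
    return prod(distinct_values.values())


# Hypothetical counts for the low-cardinality labels listed above.
low_cardinality = {
    "rail_type": 3,
    "error_class": 5,
    "model": 4,
    "provider": 2,
    "token_type": 2,
}

# The same labels plus one high-cardinality dimension added through a view.
with_user_id = {**low_cardinality, "user_id": 10_000}

baseline = series_upper_bound(low_cardinality)    # 240
exploded = series_upper_bound(with_user_id)       # 2_400_000
```

A single per-user label multiplies the bound by the number of users, which is why such dimensions belong in traces or logs rather than in metric labels.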

Contents#