Capturing Message Content#
By default, guardrails spans carry only metadata: durations, token counts, finish reasons, and which rails activated. The prompts and responses themselves are not recorded. Content capture is an opt-in feature that adds the actual user inputs, model outputs, and rail inputs to your spans, so you can debug blocked prompts, investigate false positives in safety rails, and see exactly what a model received and returned.
Experimental Feature
The inline content-capture behavior described on this page is emitted by the opt-in IORails engine.
To enable IORails, set NEMO_GUARDRAILS_IORAILS_ENGINE=1.
IORails is an early-release feature, and the captured attribute and event names follow the OpenTelemetry GenAI semantic conventions, which are still under active development and can change.
Warning
Enabling content capture writes user inputs and model outputs to your telemetry backend. This may include personally identifiable information (PII) and other sensitive data. Only enable it when necessary, restrict access to the backend that receives the spans, and ensure compliance with your data-protection obligations.
Enabling Content Capture#
Content capture is controlled by the enable_content_capture field in the tracing section of config.yml:
tracing:
enabled: true
enable_content_capture: true # default: false
adapters:
- name: OpenTelemetry
Content is only captured when tracing is also enabled — there is no point recording content onto spans that are never exported.
Environment Variable Override#
Applies to the IORails engine only — see Engine Support below.
The OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT environment variable overrides the config field in both directions.
This gives operators a single OpenTelemetry-standard switch to flip capture across all services, regardless of what each deployed config.yml says.
|
Result |
|---|---|
|
Capture is forced on, even if |
|
Capture is forced off, even if |
Unset, empty, or any other value |
Falls through to the |
Values are case-insensitive, and surrounding whitespace is ignored.
Output Format#
Applies to the IORails engine only — see Engine Support below.
The gen_ai.* content captured on the LLM calls is emitted in one of two forms, selected by the OTEL_SEMCONV_STABILITY_OPT_IN environment variable.
This variable holds a comma-separated list of opt-in tokens.
The format selector applies only to this gen_ai.* content; the guardrails.request.* and guardrails.rail.* attributes described under Where Content Is Captured are always emitted as plain span attributes regardless of its value.
|
Format |
What is emitted |
|---|---|---|
Contains |
JSON span attributes |
Structured, JSON-encoded span attributes following the latest experimental OpenTelemetry GenAI conventions. |
Unset, or does not contain the token (default) |
Legacy span events |
One span event per message, following the earlier GenAI event conventions. |
JSON Span Attributes#
When OTEL_SEMCONV_STABILITY_OPT_IN contains gen_ai_latest_experimental, content is recorded as JSON-encoded span attributes:
Attribute |
Contents |
|---|---|
|
The non-system input messages, each as |
|
The assistant output, as a single role-wrapped message. |
|
The system messages, as a flat list of |
Each attribute is set only when it has content, so a backend can distinguish “no system instructions” from an empty string.
Legacy Span Events#
By default — when OTEL_SEMCONV_STABILITY_OPT_IN is unset or does not include the token — content is recorded as span events instead:
Event |
Emitted for |
|---|---|
|
Each system message. |
|
Each user message. |
|
Each assistant message in the input. |
|
Each tool message in the input. |
|
The assistant output (the response). |
Roles outside this set (for example, the legacy function role) are skipped; function-call events are not yet captured.
Note
This format selection is independent of the tracing.span_format config field.
span_format (opentelemetry or legacy) selects the span structure used by the LLMRails post-hoc tracing adapter, whereas OTEL_SEMCONV_STABILITY_OPT_IN selects how IORails encodes captured content on its inline spans.
Where Content Is Captured#
When capture is active, content lands on the following spans.
Span |
Kind |
Captured content |
|---|---|---|
|
SERVER |
|
|
CLIENT |
The input messages and output of every LLM call — both the main LLM and the per-rail-action LLM calls (for example, content-safety models) — using the |
|
INTERNAL |
|
The request span deliberately uses its own guardrails.request.* attributes rather than the gen_ai.* names.
On a block path the two diverge: the LLM CLIENT span records the raw model response, while the SERVER span records what the caller actually received — the refusal message.
Reusing gen_ai.output.messages on both would put different values under the same name and confuse a backend correlating them.
Because the request span and the LLM spans belong to the same trace, a backend correlates the outer guardrails request with the inner model calls through trace and span context (the trace_id and parent-child span_id relationships).
The shared gen_ai.* names across the LLM CLIENT spans do not establish that link — names repeat across requests — but they make the captured content easier to interpret once the spans are correlated.
Streaming#
For streamed responses, output chunks are accumulated and the captured output is written once, at the end of the stream. The recorded output is exactly what reached the consumer. If an output rail blocks mid-stream, the captured output reflects the truncated stream plus any injected error response — not text the caller never received. When nothing is delivered, no output is recorded.
Engine Support#
Engine |
Content capture |
|---|---|
IORails |
Preview support. The full behavior on this page — environment-variable resolution, JSON-attribute or legacy-event format selection, and capture on the request, LLM, and rail spans — is emitted by the opt-in |
LLMRails |
The |
Important Considerations#
Privacy first. Captured spans contain raw prompts and responses. Treat the receiving backend as holding sensitive data, and prefer enabling capture in development or in scoped investigations rather than broadly in production.
No truncation. Content is captured in full; there is no size limit or truncation knob. Size your exporters and backend accordingly, especially for large inputs or long streamed responses.
Evolving GenAI standards. The OpenTelemetry GenAI semantic conventions are still under active development. Attribute names, event names, and structures can change as the specification matures.
Performance. Extensive telemetry collection can affect performance, especially with large inputs and outputs. The hot-path cost is dominated by SDK-level batching and export, which your application controls.