orchestrator

`orchestrator`

Preflight execution entry point.

`run_preflight` is the single public entry point; `_run_registry` handles per-check gating, failure isolation, and result aggregation.

Functions:

| Name | Description |
| --- | --- |
| `run_preflight` | Execute all pre-flight checks against the training split. |

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `CRASH_CODE` | `str` | Issue code used when a check raises from `enabled()` or `run()`. |

`CRASH_CODE = 'preflight.check_crash'` `module-attribute`

Issue code used when a check raises from `enabled()` or `run()`.
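To illustrate what failure isolation means here, the following is a minimal sketch, not the library's implementation. It assumes only that a check exposes `enabled(ctx)` and `run(ctx)`, as the description above implies. `CheckOutcome`, `run_one_check`, the sample check classes, and every status label other than `"skipped"` are hypothetical names introduced for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field

CRASH_CODE = "preflight.check_crash"  # value documented above


@dataclass
class CheckOutcome:
    """Hypothetical result record; the real result type may differ."""

    name: str
    status: str  # assumed labels: "passed", "failed", "skipped", "crashed"
    issue_codes: list[str] = field(default_factory=list)


def run_one_check(check, ctx) -> CheckOutcome:
    """Run one check so that an exception never escapes to the caller."""
    name = type(check).__name__
    try:
        # Gating: a disabled check is recorded as skipped, not dropped.
        if not check.enabled(ctx):
            return CheckOutcome(name, "skipped")
        codes = list(check.run(ctx))
    except Exception:
        # A crash in enabled() or run() becomes an issue with CRASH_CODE.
        return CheckOutcome(name, "crashed", [CRASH_CODE])
    return CheckOutcome(name, "failed" if codes else "passed", codes)


# Hypothetical checks used only to exercise the sketch.
class AlwaysOk:
    def enabled(self, ctx):
        return True

    def run(self, ctx):
        return []


class Broken:
    def enabled(self, ctx):
        return True

    def run(self, ctx):
        raise RuntimeError("boom")


class Disabled:
    def enabled(self, ctx):
        return False

    def run(self, ctx):
        return []


outcomes = [run_one_check(c, ctx=None) for c in (AlwaysOk(), Broken(), Disabled())]
```

In this sketch the crashing check does not prevent the other two from producing results, which is the property that a dedicated crash code makes visible in the report.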

`run_preflight(data, config, metadata, *, registry=None, stages=None)`

Execute all pre-flight checks against the training split.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `data` | `DataFrame` | The training split produced by `Holdout.train_test_split`. On a full run, PII replacement has already been applied; with `--validate`, PII replacement is skipped. Row counts, group sizes, and column statistics reflect this partition, not the original input dataset. | required |
| `config` | `SafeSynthesizerParameters` | Resolved configuration (`AutoConfigResolver` has already run). | required |
| `metadata` | `ModelMetadata` | Model metadata (tokenizer and context length). | required |
| `registry` | `PreflightRegistry \| None` | Registry of checks to run. When `None`, the registry returned by `_registry.get_registry()` is used. | `None` |
| `stages` | `frozenset[PreflightStage] \| None` | Optional subset of stages to execute. Used when callers need early DataFrame validation before later processing has produced the final training split. | `None` |

Returns:

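| Type | Description |
| --- | --- |
| `PreflightReport` | A structured `PreflightReport` holding the per-check results (`checks`) and the aggregated `errors` and `warnings`. |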
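Both keyword-only arguments default to `None`. For `registry`, the source shows that `None` falls back to the default registry; for `stages`, the sketch below assumes that `None` means "no stage restriction", which the parameter description suggests but does not state outright. The `Stage` enum members, `DEFAULT_CHECKS`, and `select_checks` are hypothetical stand-ins, not names from the library.

```python
from __future__ import annotations

from enum import Enum


class Stage(Enum):
    """Hypothetical stages; the real PreflightStage members are not shown here."""

    EARLY = "early"
    FINAL = "final"


# Hypothetical default registry, modelled as (check name, stage) pairs.
DEFAULT_CHECKS = [("schema", Stage.EARLY), ("row_count", Stage.FINAL)]


def select_checks(registry=None, stages: frozenset[Stage] | None = None):
    """Return the checks to run; None means 'use the default' or 'no filter'."""
    effective = DEFAULT_CHECKS if registry is None else registry
    if stages is None:
        return list(effective)
    return [(name, stage) for name, stage in effective if stage in stages]


all_checks = select_checks()
early_only = select_checks(stages=frozenset({Stage.EARLY}))
custom = select_checks(registry=[("custom", Stage.FINAL)])
```

Passing a one-element `frozenset` corresponds to the early-validation use case described for `stages`: only checks that can run before the final training split exists are selected.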

Source code in `src/nemo_safe_synthesizer/preflight/orchestrator.py`

```python
@traced("preflight", category=LogCategory.USER)
def run_preflight(
    data: pd.DataFrame,
    config: SafeSynthesizerParameters,
    metadata: ModelMetadata,
    *,
    registry: PreflightRegistry | None = None,
    stages: frozenset[PreflightStage] | None = None,
) -> PreflightReport:
    """Execute all pre-flight checks against the training split.

    Args:
        data: The training split produced by ``Holdout.train_test_split``.
            On a full run this is also post-PII replacement; on
            ``--validate`` PII replacement is skipped. Row counts, group
            sizes, and column statistics reflect this partition, not the
            original input dataset.
        config: Resolved configuration (``AutoConfigResolver`` already ran).
        metadata: Model metadata (tokenizer and context length).
        registry: Registry of checks to run. When ``None``, the registry
            returned by ``_registry.get_registry()`` is used.
        stages: Optional subset of stages to execute. Used when callers
            need early DataFrame validation before later processing has
            produced the final training split.

    Returns:
        A structured ``PreflightReport``.
    """
    # Fall back to the default registry when the caller does not supply one.
    effective_registry = _registry.get_registry() if registry is None else registry
    _warn_unknown_disabled_checks(config, effective_registry)

    ctx = PreflightContext(data=data, config=config, metadata=metadata)
    report = PreflightReport(checks=_run_registry(ctx, effective_registry, stages=stages))

    # Summary counts: skipped checks are reported separately from checks that ran.
    n_checks = len(report.checks)
    n_skipped = sum(1 for c in report.checks if c.status == "skipped")
    n_errors = len(report.errors)
    n_warns = len(report.warnings)
    logger.user.info(
        f"Preflight: {n_checks - n_skipped} check(s) ran, {n_skipped} skipped — "
        f"{n_errors} error(s), {n_warns} warning(s)",
    )
    logger.runtime.debug(
        "Preflight complete",
        extra={
            "errors": n_errors,
            "warnings": n_warns,
        },
    )
    return report
```
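The summary line reports skipped checks separately from checks that ran. A small sketch of that arithmetic follows, under assumptions: `summarize` is a hypothetical helper, plain status strings stand in for check results, and the separator in the message is simplified. Only the `"skipped"` status comparison is taken from the source above.

```python
def summarize(statuses: list[str], n_errors: int, n_warns: int) -> str:
    """Build a summary string from per-check statuses (illustrative only)."""
    n_checks = len(statuses)
    n_skipped = sum(1 for s in statuses if s == "skipped")
    # Checks that ran = all checks minus the skipped ones.
    return (
        f"Preflight: {n_checks - n_skipped} check(s) ran, {n_skipped} skipped; "
        f"{n_errors} error(s), {n_warns} warning(s)"
    )


# Four hypothetical check results, two of them skipped.
message = summarize(["passed", "skipped", "failed", "skipped"], n_errors=1, n_warns=0)
```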
