
Validator Parameters

When creating a ValidationColumnConfig, two parameters define the validator: validator_type and validator_config. The validator_type parameter can be set to code, local_callable, or remote. The validator_config that accompanies each of these is, respectively, CodeValidatorParams, LocalCallableValidatorParams, or RemoteValidatorParams.

Classes:

Name Description
CodeValidatorParams

Configuration for code validation. Supports Python and SQL code validation.

LocalCallableValidatorParams

Configuration for local callable validation. Expects a function to be passed that validates the data.

RemoteValidatorParams

Configuration for remote validation. Sends data to a remote endpoint for validation.
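
The pairing between validator_type and its validator_config class can be summarized as a lookup table. This is only an illustrative sketch: the class names are kept as plain strings so it runs without the library installed, and the helper name params_class_for is hypothetical, not part of the library.

```python
# Mapping from each validator_type value to the name of the matching
# validator_config class, as listed in the table above.
VALIDATOR_CONFIG_BY_TYPE = {
    "code": "CodeValidatorParams",
    "local_callable": "LocalCallableValidatorParams",
    "remote": "RemoteValidatorParams",
}


def params_class_for(validator_type: str) -> str:
    """Hypothetical helper: return the params class name for a validator_type."""
    try:
        return VALIDATOR_CONFIG_BY_TYPE[validator_type]
    except KeyError:
        allowed = ", ".join(sorted(VALIDATOR_CONFIG_BY_TYPE))
        raise ValueError(
            f"Unknown validator_type {validator_type!r}; expected one of: {allowed}"
        ) from None


print(params_class_for("local_callable"))  # LocalCallableValidatorParams
```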

CodeValidatorParams

Bases: ConfigBase

Configuration for code validation. Supports Python and SQL code validation.

Attributes:

Name Type Description
code_lang CodeLang

The language of the code to validate. Supported values include: python, sql:sqlite, sql:postgres, sql:mysql, sql:tsql, sql:bigquery, sql:ansi.

LocalCallableValidatorParams

Bases: ConfigBase

Configuration for local callable validation. Expects a function to be passed that validates the data.

Attributes:

Name Type Description
validation_function Any

Function of type Callable[[pd.DataFrame], pd.DataFrame] that validates the data. The returned DataFrame must contain a column named is_valid of type bool.

output_schema Optional[dict[str, Any]]

The JSON schema for the local callable validator's output. If not provided, the output will not be validated.
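
A validation function with the required shape might look like the sketch below. It assumes pandas is installed; the function name validate_positive_price and the price column are hypothetical and chosen only for illustration. The only requirement taken from the description above is that the function accepts a DataFrame and returns a DataFrame with a boolean is_valid column.

```python
import pandas as pd


def validate_positive_price(df: pd.DataFrame) -> pd.DataFrame:
    """Mark a row valid when its (hypothetical) `price` value is a positive number."""
    # Coerce to numeric so that non-numeric entries become NaN instead of raising.
    prices = pd.to_numeric(df["price"], errors="coerce")
    result = pd.DataFrame(index=df.index)
    # NaN > 0 evaluates to False, so unparseable prices are marked invalid.
    result["is_valid"] = (prices > 0).astype(bool)
    return result


sample = pd.DataFrame({"price": [9.99, -1, "n/a"]})
print(validate_positive_price(sample)["is_valid"].tolist())  # [True, False, False]
```

A function like this would be supplied as validation_function. Because output_schema is optional, it can be omitted while prototyping, in which case the output is not checked against a schema.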

RemoteValidatorParams

Bases: ConfigBase

Configuration for remote validation. Sends data to a remote endpoint for validation.

Attributes:

Name Type Description
endpoint_url str

The URL of the remote endpoint.

output_schema Optional[dict[str, Any]]

The JSON schema for the remote validator's output. If not provided, the output will not be validated.

timeout float

The timeout for the HTTP request in seconds. Defaults to 30.0.

max_retries int

The maximum number of retry attempts. Defaults to 3.

retry_backoff float

The backoff factor for the retry delay in seconds. Defaults to 2.0.

max_parallel_requests int

The maximum number of parallel requests to make. Defaults to 4.
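
To make the defaults above concrete, the sketch below collects them as plain keyword arguments and prints a possible retry schedule. Two things here are assumptions rather than documented behavior: the endpoint URL is a placeholder, and the delay formula retry_backoff ** attempt is only one common way to interpret a backoff factor. The library may compute its delays differently.

```python
# Keyword arguments mirroring the RemoteValidatorParams attributes above.
# The numeric values are the documented defaults.
remote_validator_kwargs = {
    "endpoint_url": "https://example.com/validate",  # placeholder, not a real endpoint
    "timeout": 30.0,
    "max_retries": 3,
    "retry_backoff": 2.0,
    "max_parallel_requests": 4,
}


def assumed_retry_delays(max_retries: int, retry_backoff: float) -> list[float]:
    """Delays in seconds under an ASSUMED schedule of retry_backoff ** attempt."""
    return [retry_backoff ** attempt for attempt in range(1, max_retries + 1)]


delays = assumed_retry_delays(
    remote_validator_kwargs["max_retries"],
    remote_validator_kwargs["retry_backoff"],
)
print(delays)  # [2.0, 4.0, 8.0]
```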