Troubleshooting NeMo Evaluator¶

Use this documentation to troubleshoot issues that can arise when you work with NVIDIA NeMo Evaluator.

Tip

For jobs in the COMPLETED or FAILED state, you can retrieve the metric logs or benchmark logs and use them to help troubleshoot.

Hugging Face Error¶

Some benchmark evaluations require Hugging Face access to a gated dataset or model tokenizer. If your job fails with one of the following errors, log in at https://huggingface.co/ and request access to the dataset or model.

datasets.exceptions.DatasetNotFoundError: Dataset 'allenai/wildguardmix' is a gated dataset on the Hub. Visit the dataset page at https://huggingface.co/datasets/allenai/wildguardmix to ask for access.

GatedRepoError: 403 Client Error.

Cannot access gated repo for url https://huggingface.co/<model>/resolve/main/tokenizer_config.json.
Your request to access model <model> is awaiting a review from the repo authors.

Unsupported Judge Model¶

LLM-as-a-Judge evaluates the quality of another model's output using an evaluation prompt and evaluation criteria. The prompt imposes a structure on the judge's output, which the evaluation criteria then parse to generate a metric score.

Not all models make good judges. If the judge produces inconsistent output that does not follow the format expected by the evaluation criteria, the evaluation can fail with parsing errors such as the following. This is commonly observed with smaller models.

Incoming request body={'messages': [{'content': 'The output string did not satisfy the constraints given in the prompt. Fix the output string and return it.\nPlease return the output in a JSON format that complies with the following schema as specified in JSON Schema:\n{"properties": {"text": {"title": "Text", "type": "string"}}, "required": ["text"], "title": "StringIO", "type": "object"}
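The schema quoted in the log above requires a JSON object with a string text property. As a rough way to check offline whether sample replies from a candidate judge would satisfy that shape, a minimal sketch follows. The function name follows_text_schema and the sample replies are hypothetical, and this is not the evaluator's own parser.

```python
import json


def follows_text_schema(raw_output: str) -> bool:
    """Return True if raw_output is a JSON object with a string "text" field."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        # Free-form prose or truncated JSON ends up here.
        return False
    return isinstance(parsed, dict) and isinstance(parsed.get("text"), str)


# Hypothetical judge replies, used only for illustration.
samples = [
    '{"text": "SCORE: 4"}',
    "The answer is good. SCORE: 4",
    '{"score": 4}',
]
for sample in samples:
    print(follows_text_schema(sample), sample)
```

If most replies from a model fail a check like this, the model is probably a poor fit as a judge for that evaluation.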

Dataset {dataset} is not in the expected format; it needs to have the files_url property set¶

This means that either the files_url is missing from the dataset specification in the config, or it is not in the expected format. The dataset must be a JSON object with the files_url property set to the path of the file in the NeMo Data Store, in the format: hf://datasets/<dataset-namespace>/<dataset-name>/<file-path>.
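To catch this error before submitting a config, the dataset object can be checked locally. The following is a minimal sketch, assuming the format is hf://datasets/ followed by two identifying path segments and a file path. The function name, the regular expression, and the example values are illustrative and are not part of the NeMo Evaluator API.

```python
import re
from typing import Optional

# Assumed shape: hf://datasets/<segment>/<segment>/<file-path>
FILES_URL_PATTERN = re.compile(r"^hf://datasets/([^/]+)/([^/]+)/(.+)$")


def find_files_url_problem(dataset: object) -> Optional[str]:
    """Return a description of the problem, or None if the spec looks well-formed."""
    if not isinstance(dataset, dict):
        return "dataset must be a JSON object"
    files_url = dataset.get("files_url")
    if not isinstance(files_url, str) or not files_url:
        return "files_url property is not set"
    if not FILES_URL_PATTERN.match(files_url):
        return "files_url is not in the hf://datasets/... format"
    return None


# Hypothetical dataset specifications, used only for illustration.
print(find_files_url_problem({"files_url": "hf://datasets/my-organization/my-dataset/input.jsonl"}))
print(find_files_url_problem({"files_url": "my-dataset/input.jsonl"}))
print(find_files_url_problem({}))
```

This only checks the shape of the value. It does not confirm that the file exists in the NeMo Data Store.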

Error connecting to inference server¶

This means that, for a custom evaluation, NeMo Evaluator is unable to connect to the target LLM endpoint.

Inference SSL Error¶

An evaluation job that uses an HTTPS model endpoint can fail if the endpoint certificate or DNS name is not trusted by the local environment. Verify that the model URL is reachable from the host running NeMo Platform and that the endpoint presents a valid certificate for its hostname.

Error: HTTPSConnectionPool(host="<NIM Proxy URL>", port=443): Max retries exceeded with url: /v1/chat/completions (Caused by SSLError(SSLError))
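To tell a connection failure apart from a certificate failure, you can probe the endpoint from the host that runs the evaluation. The following is a minimal sketch that uses only the Python standard library. The function name probe_endpoint and its return labels are hypothetical, and the probe sends a plain GET request rather than a real inference request.

```python
import ssl
import urllib.error
import urllib.request


def probe_endpoint(url: str, timeout: float = 5.0) -> str:
    """Classify connectivity to url as 'reachable', 'ssl_error', or 'unreachable'."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return "reachable"
    except urllib.error.HTTPError:
        # The server answered with an HTTP error status, so the network
        # path and any TLS handshake both succeeded.
        return "reachable"
    except urllib.error.URLError as exc:
        # urllib wraps connection-time failures; inspect the underlying reason.
        if isinstance(exc.reason, ssl.SSLError):
            return "ssl_error"
        return "unreachable"
    except ssl.SSLError:
        return "ssl_error"
    except OSError:
        # Timeouts and low-level socket failures raised outside URLError.
        return "unreachable"
```

Calling probe_endpoint with the model URL from your target returns ssl_error when the certificate or hostname is not trusted locally, and unreachable when the host cannot be contacted at all.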

Error occurred while checking the existence of file {file_ref} on NeMo Data Store¶

This could mean that the dataset is not specified correctly, or that the NeMo Data Store itself is unresponsive.

Verify that the files URL is correct and that the dataset and file exist in the NeMo Data Store.
Verify that the NeMo Data Store is responsive and reachable.

If the error contains the string Dataset {file_ref} is not present on datastore, it means that the datastore is responsive, but the file reference does not exist.

Evaluation Job Takes a Long Time¶

An evaluation job can take anywhere from a few minutes to many hours, depending on the target model, the config, and other factors. A status of RUNNING means the job is still in progress. If there is a problem with your job, the status changes to UNAVAILABLE or FAILED.
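Because a job can stay in RUNNING for hours, a simple polling loop with a timeout is often easier than checking by hand. The following is a minimal sketch. The get_status callable is a hypothetical stand-in for whatever call you use to read the job status, and the poll and timeout values are arbitrary.

```python
import time
from typing import Callable

# Statuses named in this guide at which this sketch stops polling.
STOP_STATES = {"COMPLETED", "FAILED", "UNAVAILABLE"}


def wait_for_job(
    get_status: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
) -> str:
    """Poll get_status until the job leaves RUNNING or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in STOP_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status} after {timeout_seconds} seconds")
        time.sleep(poll_seconds)


# Illustration with a fake status source instead of a real API call.
fake_statuses = iter(["RUNNING", "RUNNING", "COMPLETED"])
print(wait_for_job(lambda: next(fake_statuses), poll_seconds=0.0, timeout_seconds=5.0))
```

Keeping the status lookup behind a callable lets the same loop work with a REST call, an SDK, or a CLI wrapper.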

Job cannot be launched¶

This means that one of the pre-launch validations has failed. The error contains the details about the checks that failed.

What is EVALUATOR_BASE_URL?¶

EVALUATOR_BASE_URL is a placeholder for the URL of the evaluator API in examples. For local setup, the platform API defaults to http://localhost:8080. If you changed the platform URL, use the value configured in NMP_BASE_URL or in your CLI context.
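In a script, the base URL can be resolved once and reused. The following is a minimal sketch that assumes NMP_BASE_URL is available as an environment variable, which this page does not confirm. If your setup stores the value only in the CLI context, set it explicitly instead.

```python
import os

# Local default named in this guide.
DEFAULT_PLATFORM_URL = "http://localhost:8080"


def resolve_base_url() -> str:
    """Return the platform URL without a trailing slash."""
    # Assumption: NMP_BASE_URL is exposed as an environment variable.
    value = os.environ.get("NMP_BASE_URL") or DEFAULT_PLATFORM_URL
    return value.rstrip("/")


EVALUATOR_BASE_URL = resolve_base_url()
print(EVALUATOR_BASE_URL)
```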

Advanced Troubleshooting¶

To troubleshoot an evaluation job that has failed, download the evaluation result archive and inspect the job logs.

These are advanced troubleshooting steps that should only be done after all other troubleshooting fails.

Evaluation Job Logs¶

To download the log files, use the download-results endpoint. This endpoint downloads the result directory containing configuration files, logs, and evaluation results for a specific evaluation run. The result directory is packaged and provided as a downloadable archive.

To download the evaluation results directory, use the following code.

curl -X 'GET' \
 '<BASE_URL>/v1/evaluation/jobs/<job-id>/download-results' \
 -H 'accept: application/json' \
 -o result.zip

After the download is complete, the log files are available inside the result.zip file. They are located in the results folder and have the .log file extension.
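Rather than unpacking the archive and reading each log by hand, you can scan the .log files for the error strings described on this page. The following is a minimal sketch. The signature table is assembled from the messages quoted in this guide and is not an exhaustive list, and the function name is hypothetical.

```python
import zipfile
from typing import List, Tuple

# Error strings quoted in this guide, mapped to the section that covers them.
KNOWN_SIGNATURES = {
    "GatedRepoError": "Hugging Face Error",
    "is a gated dataset on the Hub": "Hugging Face Error",
    "needs to have the files_url property set": "Dataset is not in the expected format",
    "Error connecting to inference server": "Error connecting to inference server",
    "SSLError": "Inference SSL Error",
    "is not present on datastore": "File not found on NeMo Data Store",
}


def scan_result_archive(archive_path: str) -> List[Tuple[str, str, str]]:
    """Return (log file, guide section, matching line) for each known error found."""
    findings = []
    with zipfile.ZipFile(archive_path) as archive:
        for name in archive.namelist():
            if not name.endswith(".log"):
                continue
            text = archive.read(name).decode("utf-8", errors="replace")
            for line in text.splitlines():
                for signature, section in KNOWN_SIGNATURES.items():
                    if signature in line:
                        findings.append((name, section, line.strip()))
    return findings
```

For example, scan_result_archive("result.zip") returns an empty list when none of the known messages appear, in which case the logs need to be read directly.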

Skip validation checks¶

When you launch an evaluation job, NeMo Evaluator performs availability checks (for example, checking that the dataset and files exist in NeMo Data Store). To speed up job launch, or to work around validation checks that are too strict for your setup, you can pass the query parameter skip_validation_checks when you launch the job.

Use the following code to create an evaluation job that skips validation checks.

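curl

# EVALUATOR_BASE_URL must include the scheme, for example http://localhost:8080.
# The URL is double-quoted so that the shell expands the variable.
curl -X 'POST' \
  "${EVALUATOR_BASE_URL}/v1/evaluation/jobs?skip_validation_checks=True" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "namespace": "my-organization",
    "target": "<my-target-namespace/my-target-name>",
    "config": "<my-config-namespace/my-config-name>"
  }'

Python

import requests

# Base URL of the evaluator API; http://localhost:8080 is the local default.
EVALUATOR_BASE_URL = "http://localhost:8080"

data = {
    "namespace": "my-organization",
    "target": "<my-target-namespace/my-target-name>",
    "config": "<my-config-namespace/my-config-name>",
}

# skip_validation_checks=True skips the pre-launch availability checks.
endpoint = f"{EVALUATOR_BASE_URL}/v1/evaluation/jobs?skip_validation_checks=True"

response = requests.post(endpoint, json=data).json()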