LibTrace

LibTrace is a recipe for building domain-specific reasoning data from library APIs. It harvests API docstrings, labels each entry for applicability and relevance, generates problems from the relevant entries, solves them with a boxed-answer prompt, and gathers the solutions.
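As a rough illustration of the harvesting and JSONL-preparation steps, the sketch below walks a Python module with the standard `inspect` module and emits one record per public function or class that has a docstring. It is not the LibTrace harvesting script: the helper names and record fields (`name`, `doc`) are hypothetical, and the real recipe runs inside the sandbox container, which this sketch ignores.

```python
import inspect
import json
import math
import types


def harvest_docstrings(module: types.ModuleType) -> list[dict]:
    """Collect docstrings of public functions and classes exposed by `module` (hypothetical helper)."""
    records = []
    for name, obj in inspect.getmembers(module):
        # Skip private names such as _helper or __doc__.
        if name.startswith("_"):
            continue
        # Keep only callables that make up an API: Python functions, C builtins, classes.
        if not (inspect.isfunction(obj) or inspect.isbuiltin(obj) or inspect.isclass(obj)):
            continue
        doc = inspect.getdoc(obj)  # cleaned, dedented docstring or None
        if not doc:
            continue
        # Hypothetical record schema: fully qualified name plus docstring text.
        records.append({"name": f"{module.__name__}.{name}", "doc": doc})
    return records


def write_jsonl(records: list[dict], path: str) -> None:
    """Write one JSON object per line, the usual input shape for batch LLM inference."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")


# Example: harvest the standard-library math module.
math_records = harvest_docstrings(math)
```

Pointing `harvest_docstrings` at a domain library instead of `math` yields that library's entries. `inspect.getmembers` also returns names a package re-exports from its submodules, which is usually what a public-API harvest wants.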

The same workflow applies to chemistry, physics, and biology: swap the domain inputs and output paths.

For the full walkthrough, command examples, and configuration details, see recipes/libtrace/README.md in the repository.

Pipeline overview

1. Harvest library docs: extract public API docstrings from the sandbox container.
2. Prepare inference JSONL: convert the docs into an LLM-ready input file.
3. Label applicability and relevance: an LLM classifies each doc entry.
4. Filter: keep only applicable, high-relevance entries.
5. Generate domain problems: an LLM creates problems based on the filtered docs.
6. Collect generated problems: merge and deduplicate problems across seeds.
7. Solve problems: an LLM solves each problem with the generic/general-boxed prompt and sandbox code execution.
8. Gather solutions: compute statistics and sample solutions for training.
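The filtering and collection steps reduce to simple record-level operations. The following is a minimal sketch, assuming labeled entries carry hypothetical `applicable` (bool) and `relevance` (integer) fields and that each generated problem is stored under a hypothetical `problem` key; the threshold of 3 is an arbitrary illustrative choice, not a LibTrace default.

```python
import re
from collections.abc import Iterable


def filter_docs(labeled: Iterable[dict], min_relevance: int = 3) -> list[dict]:
    """Keep entries labeled applicable whose relevance meets the (hypothetical) threshold."""
    return [
        rec
        for rec in labeled
        if rec.get("applicable") is True and rec.get("relevance", 0) >= min_relevance
    ]


def _normalize(text: str) -> str:
    # Lowercase and collapse whitespace so formatting-only differences count as duplicates.
    return re.sub(r"\s+", " ", text).strip().lower()


def collect_problems(per_seed: Iterable[Iterable[dict]]) -> list[dict]:
    """Merge problem lists from several generation seeds, keeping the first copy of each problem."""
    seen: set[str] = set()
    merged: list[dict] = []
    for problems in per_seed:
        for rec in problems:
            key = _normalize(rec["problem"])
            if key in seen:
                continue  # already collected from an earlier seed
            seen.add(key)
            merged.append(rec)
    return merged
```

Normalizing only case and whitespace catches exact repeats across seeds; catching paraphrased near-duplicates would need a different technique, such as embedding similarity, and whether LibTrace's collection step does so is not stated here.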
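Because solutions are produced with a boxed-answer prompt, one natural statistic when gathering them is how many generations actually end in a `\boxed{...}` answer. The sketch below extracts the last boxed answer with brace matching, so nested LaTeX such as `\frac{1}{2}` survives, and draws a seeded random sample for training. The `generation` field and the `gather` helper are hypothetical; the statistics the real gathering script reports may differ.

```python
import random
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in `text`, or None if absent or unbalanced."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1  # we are already inside the opening brace
    for j in range(i, len(text)):
        if text[j] == "{":
            depth += 1
        elif text[j] == "}":
            depth -= 1
            if depth == 0:
                return text[i:j]
    return None  # closing brace never found


def gather(solutions: list[dict], sample_size: int, seed: int = 0) -> tuple[dict, list[dict]]:
    """Count solutions with a boxed answer and sample some of them (hypothetical helper)."""
    with_answer = [s for s in solutions if extract_boxed(s["generation"]) is not None]
    stats = {"total": len(solutions), "with_boxed_answer": len(with_answer)}
    rng = random.Random(seed)  # fixed seed makes the training sample reproducible
    sample = rng.sample(with_answer, min(sample_size, len(with_answer)))
    return stats, sample
```

Taking the last boxed expression, rather than the first, matches the common convention that a model states its final answer at the end of its reasoning.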

Files

All LibTrace scripts live in recipes/libtrace/scripts/. Prompt templates are in recipes/libtrace/prompts/.