LibTrace
LibTrace is a recipe for building domain-specific reasoning data from library APIs. It harvests docstrings, labels each entry for applicability and relevance, generates problems from the entries that pass the filter, solves them with a boxed-answer prompt, and gathers the solutions.
The same workflow applies to chemistry, physics, and biology: swap the domain inputs and output paths.
For the full walkthrough, command examples, and configuration details, see recipes/libtrace/README.md in the repository.
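As a rough illustration of the harvest and JSONL-preparation steps, the sketch below collects public docstrings from an importable module with Python's standard inspect module. It is not the recipe's actual script: the function names harvest_docstrings and to_jsonl and the record fields (library, name, doc) are hypothetical, chosen only for this example.

```python
import importlib
import inspect
import json


def harvest_docstrings(module_name):
    """Return one record per public function or class that has a docstring.

    Hypothetical helper for illustration; the record fields are assumptions.
    """
    module = importlib.import_module(module_name)
    records = []
    for name, obj in inspect.getmembers(module):
        # Skip private names and anything that is not a callable API object.
        if name.startswith("_"):
            continue
        if not (inspect.isfunction(obj) or inspect.isclass(obj) or inspect.isbuiltin(obj)):
            continue
        doc = inspect.getdoc(obj)
        if not doc:
            continue
        records.append(
            {"library": module_name, "name": f"{module_name}.{name}", "doc": doc}
        )
    return records


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(record, sort_keys=True) for record in records)


if __name__ == "__main__":
    # The standard-library json module stands in for a domain library here.
    print(to_jsonl(harvest_docstrings("json"))[:300])
```

In the real pipeline the harvest runs inside the sandbox container, so the domain libraries only need to be installed there.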
Pipeline overview
- Harvest library docs: extract public API docstrings from the sandbox container
- Prepare inference JSONL: convert docs into an LLM-ready input file
- Label applicability + relevance: LLM classifies each doc entry
- Filter: keep only applicable, high-relevance entries
- Generate domain problems: LLM creates problems based on filtered docs
- Collect generated problems: merge and deduplicate across seeds
- Solve problems: LLM solves with the generic/general-boxed prompt and sandbox code execution
- Gather solutions: compute stats and sample for training
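The filter, collect, and gather steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the code in recipes/libtrace/scripts/: the field names (applicable, relevance, problem), the relevance threshold, and all function names are hypothetical.

```python
import re


def keep_entry(label, min_relevance=2):
    """Filter step: keep entries labeled applicable with enough relevance.

    The field names and the threshold are assumptions for illustration.
    """
    return bool(label.get("applicable")) and label.get("relevance", 0) >= min_relevance


def normalize(text):
    """Collapse whitespace and lowercase so near-identical problems compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def merge_problems(per_seed_problems):
    """Collect step: merge problem lists from several seeds, dropping duplicates."""
    seen = set()
    merged = []
    for problems in per_seed_problems:
        for problem in problems:
            key = normalize(problem["problem"])
            if key in seen:
                continue
            seen.add(key)
            merged.append(problem)
    return merged


def extract_boxed(solution):
    # Return the content of the last boxed answer in a solution, or None.
    # Braces are counted so nested LaTeX such as \frac{1}{2} stays intact.
    marker = "\\boxed{"
    start = solution.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in solution[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # Unbalanced braces: treat as no answer.
```

Deduplicating on normalized text only catches exact repeats after whitespace and case folding; paraphrased duplicates would need a fuzzier comparison.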
Files
All LibTrace scripts live in recipes/libtrace/scripts/.
Prompt templates are in recipes/libtrace/prompts/.