# CLAUDE.md — llmqt

## Project overview

`llmqt` (LLM Query Tester) is a single-file Python CLI that batch-tests multiple LLM models against a set of queries. Results are written as Markdown files with per-query stats and optional reasoning sections.

## Structure

```
llmqt/
  llmqt.py                   # entire implementation — single module
  pyproject.toml             # build/install config; declares `llmqt` console script
  example_test.yaml          # MUST be kept up to date with every config format change
  example_system_prompt.md   # system prompt used by example_test.yaml
  README.md
  CLAUDE.md
  .gitignore
```

## Installation

```bash
pip install -e .
```

Registers the `llmqt` entry point from `pyproject.toml` so the command works from any directory.

## CLI signature

```
llmqt [config2.yaml ...]
```

- **First argument**: path to a `.md` file containing the system prompt (resolved from CWD)
- **Remaining arguments**: one or more test config files (YAML or JSON)

## Environment variables

| Variable          | Required | Purpose                                         |
|-------------------|----------|-------------------------------------------------|
| `OPENAI_API_KEY`  | Yes      | API key                                         |
| `OPENAI_API_BASE` | No       | Custom base URL for OpenAI-compatible endpoints |

`OPENAI_BASE_URL` is also accepted as an alias for `OPENAI_API_BASE`.

## Config file format (YAML or JSON)

**IMPORTANT: whenever the config format changes, update `example_test.yaml` to reflect it.**

The system prompt is **not** part of the config file — it is passed as the first CLI argument.

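Because the system prompt travels on the command line rather than inside the config, argument handling might look like the sketch below. It assumes `argparse`; `llmqt.py` may parse its arguments differently, and the names `parse_args`, `system_prompt`, and `configs` are illustrative only.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Split the command line into a system prompt path and config paths.

    Hypothetical helper: not taken from llmqt.py.
    """
    parser = argparse.ArgumentParser(prog="llmqt")
    # First positional: Markdown file holding the system prompt.
    parser.add_argument("system_prompt", type=Path)
    # Remaining positionals: one or more YAML/JSON test configs.
    parser.add_argument("configs", type=Path, nargs="+")
    # argv=None makes argparse fall back to sys.argv[1:].
    return parser.parse_args(argv)
```

Using `nargs="+"` makes argparse reject a call that supplies the prompt but no config file, which matches the "one or more" wording above.
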
### YAML example

```yaml
models:
  - gpt-4o-mini
  - gpt-4o

queries:
  - "First query text"
  - "Second query text"
```

### JSON equivalent

```json
{
  "models": ["gpt-4o-mini", "gpt-4o"],
  "queries": ["First query text", "Second query text"]
}
```

### Field reference

| Field     | Type            | Description                                   |
|-----------|-----------------|-----------------------------------------------|
| `models`  | list of strings | Model names; any OpenAI-compatible identifier |
| `queries` | list of strings | Queries sent to each model in listed order    |

## Execution logic

```
for each config file:
  for each model:
    for each query:
      POST to API (with timing), wait for response
    write /.md  (in CWD)
```

Output directory is always relative to the **current working directory**, not the config file location. This lets the user run `llmqt ~/configs/prompt.md ~/configs/test1.yaml` from any writable directory and have outputs land there.

## Filename sanitization

Model names are sanitized for filesystem safety: characters outside `[A-Za-z0-9._- ]` are replaced with `_`. E.g. `anthropic/claude-3` → `anthropic_claude-3.md`.

## Reasoning detection

Checked in this order:

1. `message.reasoning_content` attribute (DeepSeek API / some OpenAI-compatible endpoints)
2. `...` tags in the response content (DeepSeek R1, QwQ open-source models)

If reasoning is found it is stripped from the answer and rendered in a separate section.

## Output format per model file

```markdown
# **Config:** `test1.yaml`

## Statistics

| Query | Elapsed | Prompt tok | Completion tok | Total tok | tok/s |
|-------|---------|------------|----------------|-----------|-------|
| 1     | 1.2s    | 45         | 120            | 165       | 100.0 |
| Total | 1.2s    | 45         | 120            | 165       | 100.0 |

---

## Query 1

> *1.2s · 120 completion tokens · 100.0 tok/s*

### Reasoning   ← only present when reasoning was detected

### Response

---
```

## Dependencies

- `openai >= 1.0.0` — API client
- `pyyaml >= 6.0` — YAML parsing (imported lazily; JSON works without it)
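
The lazy `pyyaml` import mentioned above could be arranged roughly as follows. This is a minimal sketch, not the code in `llmqt.py`: the helper name `load_config`, the choice to branch on the `.json` suffix, and the error message are all assumptions made for illustration.

```python
import json
from pathlib import Path


def load_config(path):
    """Load a test config from a YAML or JSON file (hypothetical helper)."""
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    if path.suffix.lower() == ".json":
        # JSON needs only the standard library, so pyyaml is never imported here.
        return json.loads(text)
    try:
        # Deferred import: only reached when a non-JSON config is loaded.
        import yaml
    except ImportError as exc:
        raise SystemExit(
            f"pyyaml is required to read {path.name}; install it with 'pip install pyyaml'"
        ) from exc
    # safe_load avoids constructing arbitrary Python objects from the file.
    return yaml.safe_load(text)
```

Keeping the import inside the function means a JSON-only user never hits an `ImportError`, at the cost of reporting a missing `pyyaml` only when the first YAML config is read.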