# llmqt — LLM Query Tester

Batch-test multiple LLM models against a set of queries. Results are saved as nicely formatted Markdown files (one per model), including per-query stats and a summary table.

## Install

```bash
pip install -e .
```

This installs the `llmqt` command into your PATH.

## Setup

Export your API credentials:

```bash
export OPENAI_API_KEY=your_key_here
export OPENAI_API_BASE=https://your-endpoint/v1  # optional, for custom/local endpoints
```

## Usage

```bash
llmqt [config2.yaml ...]
```

Examples:

```bash
llmqt prompt.md test1.yaml
llmqt prompt.md test1.yaml test2.yaml test3.json
```

Outputs are written to `.//.md` in the current working directory.

## Config file format

YAML (`.yaml` / `.yml`) and JSON (`.json`) are both supported.

```yaml
models:
  - gpt-4o-mini
  - gpt-4o
queries:
  - "What is the capital of France?"
  - "Explain TCP vs UDP."
  - "Write a Python prime-checker function."
```

See [example_test.yaml](example_test.yaml) and [example_system_prompt.md](example_system_prompt.md).

## Output format

For `llmqt prompt.md test1.yaml` with models `gpt-4o-mini` and `gpt-4o`:

```
test1/
  gpt-4o-mini.md
  gpt-4o.md
```

Each file contains:

- A **statistics table** (elapsed time, prompt/completion tokens, tok/s per query + totals)
- For each query: the query text, per-query stats, optional **Reasoning** section (if the model returns chain-of-thought), and the **Response**

### Reasoning detection

Reasoning content is extracted automatically from:

- The `reasoning_content` field on the message (DeepSeek API style)
- `...` tags in the response content (DeepSeek R1 / QwQ open-source style)

## Execution order

```
for each config file:
  for each model:
    for each query → POST to API, wait for response
    write /.md in CWD
```
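
The loop above, together with tag-based reasoning detection, can be sketched in Python roughly as follows. This is an illustrative sketch, not llmqt's actual source: `run_config`, `split_reasoning`, and the `ask` callback are hypothetical names, the tag is assumed to be `<think>...</think>`, the Markdown headings are a guessed layout, and the statistics table is left out. The real API call is replaced by a plain callable so the sketch runs without credentials.

```python
from __future__ import annotations

import re
from pathlib import Path
from typing import Callable

# Assumption: reasoning is wrapped in <think>...</think> tags.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(content: str) -> tuple[str, str]:
    """Return (reasoning, response) extracted from raw message content."""
    match = THINK_RE.search(content)
    if match is None:
        return "", content.strip()
    reasoning = match.group(1).strip()
    # Everything outside the tag pair is treated as the response.
    response = (content[:match.start()] + content[match.end():]).strip()
    return reasoning, response


def run_config(
    config_path: str,
    models: list[str],
    queries: list[str],
    ask: Callable[[str, str], str],  # hypothetical: (model, query) -> reply text
    out_root: Path,
) -> list[Path]:
    """Send every query to every model and write one Markdown file per model."""
    out_dir = out_root / Path(config_path).stem  # test1.yaml -> test1/
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for model in models:
        sections = []
        for query in queries:
            # Blocking call: the next query starts only after this one returns.
            reasoning, response = split_reasoning(ask(model, query))
            section = f"## Query\n\n{query}\n\n"
            if reasoning:
                section += f"### Reasoning\n\n{reasoning}\n\n"
            section += f"### Response\n\n{response}\n"
            sections.append(section)
        # The file is written once all queries for this model have finished.
        path = out_dir / f"{model}.md"
        path.write_text("\n".join(sections), encoding="utf-8")
        written.append(path)
    return written
```

Passing the API call in as `ask` keeps the loop itself free of network code, so a stub can stand in for the endpoint when checking the output layout.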