Initial commit: llmqt LLM Query Tester

Single-file Python CLI to batch-test multiple LLM models with predefined queries. Supports YAML/JSON config, reasoning detection (<think> tags and reasoning_content field), per-query token/speed stats, and graceful API error handling. Install with `pip install -e .` to get the `llmqt` command.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 12:25:34 +02:00
commit a45ced89de
7 changed files with 542 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,82 @@
+# llmqt — LLM Query Tester
+
+Batch-test multiple LLM models against a set of queries. Results are saved as nicely formatted Markdown files — one per model — including per-query stats and a summary table.
+
+## Install
+
+```bash
+pip install -e .
+```
+
+This installs the `llmqt` command on your PATH.
+
+## Setup
+
+Export your API credentials:
+
+```bash
+export OPENAI_API_KEY=your_key_here
+export OPENAI_API_BASE=https://your-endpoint/v1  # optional, for custom/local endpoints
+```
+
+## Usage
+
+```bash
+llmqt <system_prompt.md> <config1.yaml> [config2.yaml ...]
+```
+
+Examples:
+
+```bash
+llmqt prompt.md test1.yaml
+llmqt prompt.md test1.yaml test2.yaml test3.json
+```
+
+Outputs are written to `./<config_stem>/<model_name>.md` in the current working directory.
+
+## Config file format
+
+YAML (`.yaml` / `.yml`) and JSON (`.json`) are both supported.
+
+```yaml
+models:
+  - gpt-4o-mini
+  - gpt-4o
+
+queries:
+  - "What is the capital of France?"
+  - "Explain TCP vs UDP."
+  - "Write a Python prime-checker function."
+```
+
+See [example_test.yaml](example_test.yaml) and [example_system_prompt.md](example_system_prompt.md).
+
+## Output format
+
+For `llmqt prompt.md test1.yaml` with models `gpt-4o-mini` and `gpt-4o`:
+
+```
+test1/
+  gpt-4o-mini.md
+  gpt-4o.md
+```
+
+Each file contains:
+
+- A **statistics table** (elapsed time, prompt/completion tokens, and tokens per second for each query, plus totals)
+- For each query: the query text, per-query stats, an optional **Reasoning** section (if the model returns chain-of-thought), and the **Response**
+
+### Reasoning detection
+
+Reasoning content is extracted automatically from:
+- The `reasoning_content` field on the message (DeepSeek API style)
+- `<think>...</think>` tags in the response content (DeepSeek R1 / QwQ open-source style)
+
+## Execution order
+
+```
+for each config file:
+  for each model:
+    for each query → POST to API, wait for response
+    write <config_stem>/<model_name>.md in CWD
+```
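Review note on the README above: the execution order and the two reasoning sources it describes can be sketched in a few lines of Python. This is a minimal illustration, not the code from this commit. The names `split_reasoning`, `run`, and the `ask` callback are hypothetical, the sketch reads JSON configs only (to avoid a YAML dependency), and it omits the statistics table and API error handling.

```python
import json
import re
from pathlib import Path
from typing import Callable, Iterable, List, Optional, Tuple

# Matches the first <think>...</think> block, including newlines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(message: dict) -> Tuple[Optional[str], str]:
    """Return (reasoning, answer) for a chat message given as a dict.

    Checks the `reasoning_content` field first, then falls back to
    <think> tags embedded in `content`.
    """
    content = message.get("content") or ""
    reasoning = message.get("reasoning_content")
    if reasoning:
        return reasoning.strip(), content.strip()
    match = THINK_RE.search(content)
    if match:
        # Remove the tagged block so only the final answer remains.
        answer = THINK_RE.sub("", content, count=1).strip()
        return match.group(1).strip(), answer
    return None, content.strip()


def run(
    configs: Iterable[Path],
    ask: Callable[[str, str], dict],
    out_root: Path = Path("."),
) -> List[Path]:
    """Config -> model -> query loop; writes one Markdown file per model.

    `ask(model, query)` is a hypothetical stand-in for the API call and
    must return a message dict.
    """
    written = []
    for config_path in configs:
        config_path = Path(config_path)
        config = json.loads(config_path.read_text(encoding="utf-8"))
        out_dir = Path(out_root) / config_path.stem
        out_dir.mkdir(parents=True, exist_ok=True)
        for model in config["models"]:
            sections = []
            for query in config["queries"]:
                reasoning, answer = split_reasoning(ask(model, query))
                section = f"## Query\n\n{query}\n\n"
                if reasoning is not None:
                    section += f"### Reasoning\n\n{reasoning}\n\n"
                section += f"### Response\n\n{answer}\n"
                sections.append(section)
            # Model names containing path separators are not handled here.
            out_file = out_dir / f"{model}.md"
            out_file.write_text("\n".join(sections), encoding="utf-8")
            written.append(out_file)
    return written
```

Passing the API call in as `ask` keeps the loop testable without network access; a real callback would wrap the chat-completions request and convert the returned message to a dict.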