Initial commit: llmqt LLM Query Tester

Single-file Python CLI to batch-test multiple LLM models with predefined queries. Supports YAML/JSON config, reasoning detection (<think> tags and reasoning_content field), per-query token/speed stats, and graceful API error handling. Install with `pip install -e .` to get the `llmqt` command.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 12:25:34 +02:00
commit a45ced89de
7 changed files with 542 additions and 0 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -0,0 +1,145 @@
+# CLAUDE.md — llmqt
+
+## Project overview
+
+`llmqt` (LLM Query Tester) is a single-file Python CLI that batch-tests multiple LLM models
+against a set of queries. Results are written as Markdown files with per-query stats and
+optional reasoning sections.
+
+## Structure
+
+```
+llmqt/
+  llmqt.py                  # entire implementation — single module
+  pyproject.toml            # build/install config; declares `llmqt` console script
+  example_test.yaml         # MUST be kept up to date with every config format change
+  example_system_prompt.md  # system prompt used by example_test.yaml
+  README.md
+  CLAUDE.md
+  .gitignore
+```
+
+## Installation
+
+```bash
+pip install -e .
+```
+
+Registers the `llmqt` entry point from `pyproject.toml` so the command works from any directory.
+
+## CLI signature
+
+```
+llmqt <system_prompt.md> <config1.yaml> [config2.yaml ...]
+```
+
+- **First argument**: path to a `.md` file containing the system prompt (resolved from CWD)
+- **Remaining arguments**: one or more test config files (YAML or JSON)
+
+## Environment variables
+
+| Variable          | Required | Purpose                                         |
+|-------------------|----------|-------------------------------------------------|
+| `OPENAI_API_KEY`  | Yes      | API key                                         |
+| `OPENAI_API_BASE` | No       | Custom base URL for OpenAI-compatible endpoints |
+
+`OPENAI_BASE_URL` is also accepted as an alias for `OPENAI_API_BASE`.
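+
+A minimal sketch of how these variables might be resolved. The function name
+`resolve_api_settings` and the precedence of `OPENAI_API_BASE` over `OPENAI_BASE_URL` are
+assumptions made for illustration; they are not taken from `llmqt.py`.
+
+```python
+import os
+
+
+def resolve_api_settings(env=None):
+    """Return (api_key, base_url); base_url is None when no custom endpoint is set."""
+    env = os.environ if env is None else env
+    api_key = env.get("OPENAI_API_KEY")
+    if not api_key:
+        raise RuntimeError("OPENAI_API_KEY is not set")
+    # Assumption: OPENAI_API_BASE wins when both names are defined.
+    base_url = env.get("OPENAI_API_BASE") or env.get("OPENAI_BASE_URL") or None
+    return api_key, base_url
+```
+
+Passing a plain dict as `env` keeps the lookup testable without touching the real environment.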
+
+## Config file format (YAML or JSON)
+
+**IMPORTANT: whenever the config format changes, update `example_test.yaml` to reflect it.**
+
+The system prompt is **not** part of the config file — it is passed as the first CLI argument.
+
+### YAML example
+
+```yaml
+models:
+  - gpt-4o-mini
+  - gpt-4o
+
+queries:
+  - "First query text"
+  - "Second query text"
+```
+
+### JSON equivalent
+
+```json
+{
+  "models": ["gpt-4o-mini", "gpt-4o"],
+  "queries": ["First query text", "Second query text"]
+}
+```
+
+### Field reference
+
+| Field     | Type            | Description                                          |
+|-----------|-----------------|------------------------------------------------------|
+| `models`  | list of strings | Model names; any OpenAI-compatible identifier        |
+| `queries` | list of strings | Queries sent to each model in listed order           |
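+
+The format above could be loaded and validated roughly as follows. This is a sketch under stated
+assumptions: the name `load_config`, the dispatch on the `.json` extension, and the error messages
+are hypothetical. Only the two fields and the lazy `yaml` import come from this document.
+
+```python
+import json
+from pathlib import Path
+
+
+def load_config(path):
+    """Load a YAML or JSON test config and check the two documented fields."""
+    path = Path(path)
+    text = path.read_text(encoding="utf-8")
+    if path.suffix.lower() == ".json":
+        data = json.loads(text)
+    else:
+        import yaml  # imported lazily so JSON configs work without pyyaml
+
+        data = yaml.safe_load(text)
+    if not isinstance(data, dict):
+        raise ValueError(f"{path}: top level must be a mapping")
+    for field in ("models", "queries"):
+        value = data.get(field)
+        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
+            raise ValueError(f"{path}: `{field}` must be a list of strings")
+    return data
+```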
+
+## Execution logic
+
+```
+for each config file:
+  for each model:
+    for each query:
+      POST to API (with timing), wait for response
+    write <config_stem>/<model_name>.md  (in CWD)
+```
+
+Output directory is always relative to the **current working directory**, not the config file
+location. This lets the user run `llmqt ~/configs/prompt.md ~/configs/test1.yaml` from any
+writable directory and have outputs land there.
+
+## Filename sanitization
+
+Model names are sanitized for filesystem safety: every character other than ASCII letters, digits,
+`.`, `_`, `-`, and space (that is, outside `[A-Za-z0-9._ -]`) is replaced with `_`. For example,
+`anthropic/claude-3` → `anthropic_claude-3.md`.
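+
+The sanitization rule and the CWD-relative output location can be sketched together. The helper
+names `sanitize_model_name` and `output_path` are hypothetical, and the hyphen is placed last in
+the character class so that Python's `re` reads it as a literal rather than a range.
+
+```python
+import re
+from pathlib import Path
+
+# Anything that is not a letter, digit, dot, underscore, space, or hyphen.
+_UNSAFE = re.compile(r"[^A-Za-z0-9._ -]")
+
+
+def sanitize_model_name(name):
+    """Replace each filesystem-unsafe character with an underscore."""
+    return _UNSAFE.sub("_", name)
+
+
+def output_path(config_path, model, cwd=None):
+    """Build <cwd>/<config_stem>/<sanitized model>.md, ignoring the config's own directory."""
+    base = Path.cwd() if cwd is None else Path(cwd)
+    return base / Path(config_path).stem / f"{sanitize_model_name(model)}.md"
+```
+
+With these definitions, `sanitize_model_name("anthropic/claude-3")` returns `anthropic_claude-3`.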
+
+## Reasoning detection
+
+Checked in this order:
+1. `message.reasoning_content` attribute (DeepSeek API / some OpenAI-compatible endpoints)
+2. `<think>...</think>` tags in the response content (DeepSeek R1, QwQ open-source models)
+
+If reasoning is found, it is stripped from the answer and rendered in a separate `### Reasoning`
+section.
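+
+One way the two checks could be ordered is sketched below. The function name `split_reasoning` is
+hypothetical, and handling only the first `<think>` block is a simplifying assumption. The
+`message` argument is any object exposing `content` and, optionally, `reasoning_content`.
+
+```python
+import re
+from types import SimpleNamespace
+
+_THINK = re.compile(r"<think>(.*?)</think>", re.DOTALL)
+
+
+def split_reasoning(message):
+    """Return (reasoning or None, answer) for a chat completion message."""
+    content = getattr(message, "content", None) or ""
+    # 1. Dedicated field, when the endpoint provides one.
+    reasoning = getattr(message, "reasoning_content", None)
+    if reasoning:
+        return reasoning.strip(), content.strip()
+    # 2. Inline <think>...</think> tags; only the first block is handled here.
+    match = _THINK.search(content)
+    if match:
+        answer = content[: match.start()] + content[match.end() :]
+        return match.group(1).strip(), answer.strip()
+    return None, content.strip()
+
+
+example = SimpleNamespace(content="<think>2+2 is 4</think>The answer is 4.")
+reasoning, answer = split_reasoning(example)
+```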
+
+## Output format per model file
+
+```markdown
+# <model name>
+
+**Config:** `test1.yaml`
+
+## Statistics
+
+| Query | Elapsed | Prompt tok | Completion tok | Total tok | tok/s |
+|-------|---------|------------|----------------|-----------|-------|
+| 1     | 1.2s    | 45         | 120            | 165       | 100.0 |
+| Total | 1.2s    | 45         | 120            | 165       | 100.0 |
+
+---
+
+## Query 1
+
+> <query text>
+
+*1.2s · 120 completion tokens · 100.0 tok/s*
+
+### Reasoning        ← only present when reasoning was detected
+
+<reasoning text>
+
+### Response
+
+<answer text>
+
+---
+```
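+
+The sample row is consistent with `tok/s` being completion tokens divided by elapsed seconds
+(120 / 1.2 = 100.0) and `Total tok` being prompt plus completion tokens. A sketch under that
+assumption, with hypothetical helper names and without the column padding shown above:
+
+```python
+def tokens_per_second(completion_tokens, elapsed_seconds):
+    """Completion throughput; returns 0.0 when no time was measured."""
+    if elapsed_seconds <= 0:
+        return 0.0
+    return completion_tokens / elapsed_seconds
+
+
+def stats_row(label, elapsed, prompt_tok, completion_tok):
+    """Format one row of the Statistics table."""
+    total = prompt_tok + completion_tok
+    rate = tokens_per_second(completion_tok, elapsed)
+    return f"| {label} | {elapsed:.1f}s | {prompt_tok} | {completion_tok} | {total} | {rate:.1f} |"
+```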
+
+## Dependencies
+
+- `openai >= 1.0.0` — API client
+- `pyyaml >= 6.0` — YAML parsing (imported lazily; JSON works without it)