
Jaroslav Benes a45ced89de Initial commit: llmqt LLM Query Tester

Single-file Python CLI to batch-test multiple LLM models with predefined
queries. Supports YAML/JSON config, reasoning detection (<think> tags and
reasoning_content field), per-query token/speed stats, and graceful API
error handling. Install with `pip install -e .` to get the `llmqt` command.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-08 12:25:34 +02:00

3.9 KiB


CLAUDE.md — llmqt

Project overview

llmqt (LLM Query Tester) is a single-file Python CLI that batch-tests multiple LLM models against a set of queries. Results are written as Markdown files with per-query stats and optional reasoning sections.

Structure

```
llmqt/
  llmqt.py                  # entire implementation — single module
  pyproject.toml            # build/install config; declares `llmqt` console script
  example_test.yaml         # MUST be kept up to date with every config format change
  example_system_prompt.md  # system prompt used by example_test.yaml
  README.md
  CLAUDE.md
  .gitignore
```

Installation

```bash
pip install -e .
```

This registers the `llmqt` console script declared in pyproject.toml, so the command works from any directory.

CLI signature

```
llmqt <system_prompt.md> <config1.yaml> [config2.yaml ...]
```

- First argument: path to a `.md` file containing the system prompt (resolved from CWD)
- Remaining arguments: one or more test config files (YAML or JSON)
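The signature above can be parsed with the standard-library `argparse` module. The following is a minimal sketch: `build_parser` is a hypothetical name, and this is not necessarily how `llmqt.py` does it.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the documented signature:
    #   llmqt <system_prompt.md> <config1.yaml> [config2.yaml ...]
    parser = argparse.ArgumentParser(prog="llmqt")
    # Relative paths stay relative, so they resolve against the CWD when opened.
    parser.add_argument("system_prompt", type=Path,
                        help="Markdown file containing the system prompt")
    # nargs="+" requires at least one config file.
    parser.add_argument("configs", nargs="+", type=Path,
                        help="One or more YAML/JSON test config files")
    return parser
```

Typical use is `args = build_parser().parse_args()` followed by `args.system_prompt.read_text(encoding="utf-8")`. With `nargs="+"`, argparse rejects a call that supplies no config file.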

Environment variables

| Variable | Required | Purpose |
|----------|----------|---------|
| `OPENAI_API_KEY` | Yes | API key |
| `OPENAI_API_BASE` | No | Custom base URL for OpenAI-compatible endpoints |

`OPENAI_BASE_URL` is also accepted as an alias for `OPENAI_API_BASE`.
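The environment lookup could be written as follows. This is a sketch with two assumptions. First, `resolve_api_settings` is a hypothetical helper. Second, this document does not specify which variable wins when both `OPENAI_API_BASE` and `OPENAI_BASE_URL` are set; the sketch arbitrarily gives `OPENAI_API_BASE` precedence.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_api_settings(env: Mapping[str, str] = os.environ) -> Tuple[str, Optional[str]]:
    """Return (api_key, base_url); base_url is None when no custom endpoint is set."""
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    # Assumption: OPENAI_API_BASE takes precedence over its alias OPENAI_BASE_URL.
    # Empty strings are treated as unset.
    base_url = env.get("OPENAI_API_BASE") or env.get("OPENAI_BASE_URL") or None
    return api_key, base_url
```

The pair can then be passed to the `openai` >= 1.0 client as `OpenAI(api_key=key, base_url=base_url)`. When `base_url` is `None`, the client falls back to its default endpoint.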

Config file format (YAML or JSON)

IMPORTANT: whenever the config format changes, update example_test.yaml to reflect it.

The system prompt is not part of the config file — it is passed as the first CLI argument.

YAML example

```yaml
models:
  - gpt-4o-mini
  - gpt-4o

queries:
  - "First query text"
  - "Second query text"
```

JSON equivalent

```json
{
  "models": ["gpt-4o-mini", "gpt-4o"],
  "queries": ["First query text", "Second query text"]
}
```

Field reference

| Field | Type | Description |
|-------|------|-------------|
| `models` | list of strings | Model names; any OpenAI-compatible identifier |
| `queries` | list of strings | Queries sent to each model in listed order |
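A config loader that follows the field reference might look like the sketch below. It imports PyYAML lazily, so JSON configs work without it. The helper name `load_config` is hypothetical. Dispatching on the file extension, and treating any non-`.yaml`/`.yml` file as JSON, are assumptions rather than documented behavior.

```python
import json
from pathlib import Path
from typing import Dict, List


def load_config(path: Path) -> Dict[str, List[str]]:
    """Load a YAML or JSON test config and check the two documented fields."""
    text = path.read_text(encoding="utf-8")
    if path.suffix.lower() in (".yaml", ".yml"):
        import yaml  # lazy import: PyYAML is only needed for YAML configs
        data = yaml.safe_load(text)
    else:
        data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError(f"{path}: top level must be a mapping")
    for field in ("models", "queries"):
        value = data.get(field)
        # Both fields must be lists whose items are all strings.
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise ValueError(f"{path}: '{field}' must be a list of strings")
    return {"models": data["models"], "queries": data["queries"]}
```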

Execution logic

```
for each config file:
  for each model:
    for each query:
      POST to API (with timing), wait for response
    write <config_stem>/<model_name>.md  (in CWD)
```

The output directory is always relative to the current working directory, not to the config file's location. This lets the user run `llmqt ~/configs/prompt.md ~/configs/test1.yaml` from any writable directory and have the outputs land there, under `./test1/`.
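The loop above can be sketched in Python as follows. This is a simplified sketch, not the actual implementation. `run_configs` and its `ask(model, query)` callback are hypothetical; the callback stands in for the real blocking API call. The written file contains only a heading and the raw answers, not the full format described below.

```python
import re
import time
from pathlib import Path
from typing import Callable, Dict, List, Tuple


def run_configs(
    configs: List[Tuple[Path, Dict[str, List[str]]]],
    ask: Callable[[str, str], str],
    cwd: Path,
) -> List[Path]:
    """Run every query against every model; write one Markdown file per model."""
    written = []
    for config_path, cfg in configs:
        # Output goes to <cwd>/<config_stem>/, never next to the config file.
        out_dir = cwd / config_path.stem
        out_dir.mkdir(parents=True, exist_ok=True)
        for model in cfg["models"]:
            sections = []
            for i, query in enumerate(cfg["queries"], start=1):
                start = time.perf_counter()
                answer = ask(model, query)  # blocks until the response arrives
                elapsed = time.perf_counter() - start
                sections.append(f"## Query {i}\n\n> {query}\n\n*{elapsed:.1f}s*\n\n{answer}\n")
            # Same rule as in "Filename sanitization": unsafe characters become '_'.
            safe_name = re.sub(r"[^A-Za-z0-9._\- ]", "_", model)
            path = out_dir / f"{safe_name}.md"
            path.write_text(f"# {model}\n\n" + "\n---\n\n".join(sections), encoding="utf-8")
            written.append(path)
    return written
```

Because `out_dir` is built from `cwd` and the config's stem only, `~/configs/test1.yaml` produces `./test1/<model>.md` wherever the command is run.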

Filename sanitization

Model names are sanitized for filesystem safety: characters outside `[A-Za-z0-9._- ]` are replaced with `_`, e.g. `anthropic/claude-3` → `anthropic_claude-3.md`.
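A minimal sketch of this rule, assuming the allowed set is exactly as listed: letters, digits, `.`, `_`, `-` and space. `model_filename` is a hypothetical name. Python's `re` would read the class literally as written, `[A-Za-z0-9._- ]`, as a range from `_` to space and reject it. The sketch therefore escapes the hyphen.

```python
import re

# Everything outside letters, digits, '.', '_', '-' and space becomes '_'.
# The hyphen is escaped so it is not parsed as a character range.
_UNSAFE = re.compile(r"[^A-Za-z0-9._\- ]")


def model_filename(model: str) -> str:
    """Map a model identifier to a safe Markdown filename."""
    return _UNSAFE.sub("_", model) + ".md"
```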

Reasoning detection

Checked in this order:

1. `message.reasoning_content` attribute (DeepSeek API / some OpenAI-compatible endpoints)
2. `<think>...</think>` tags in the response content (open-source models such as DeepSeek R1 and QwQ)

If reasoning is found, it is stripped from the answer and rendered in a separate Reasoning section.
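The two-step check can be sketched as below. `split_reasoning` is a hypothetical helper that takes any object with a `content` attribute, such as a chat-completion message. The sketch makes three assumptions:

- `getattr` is used because `reasoning_content` is not part of the standard message schema.
- When that field is present, the content is not also scanned for tags.
- Multiple `<think>` blocks are joined with blank lines.

```python
import re
from typing import Optional, Tuple

# Non-greedy match across newlines for each <think>...</think> block.
_THINK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(message) -> Tuple[Optional[str], str]:
    """Return (reasoning, answer); reasoning is None when none was detected."""
    content = message.content or ""
    # 1. Dedicated field, when the endpoint provides it.
    reasoning = getattr(message, "reasoning_content", None)
    if reasoning:
        return reasoning.strip(), content.strip()
    # 2. Inline <think>...</think> blocks, removed from the answer text.
    blocks = _THINK.findall(content)
    if blocks:
        answer = _THINK.sub("", content)
        return "\n\n".join(b.strip() for b in blocks), answer.strip()
    return None, content.strip()
```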

Output format per model file

```markdown
# <model name>

**Config:** `test1.yaml`

## Statistics

| Query | Elapsed | Prompt tok | Completion tok | Total tok | tok/s |
|-------|---------|------------|----------------|-----------|-------|
| 1     | 1.2s    | 45         | 120            | 165       | 100.0 |
| Total | 1.2s    | 45         | 120            | 165       | 100.0 |

---

## Query 1

> <query text>

*1.2s · 120 completion tokens · 100.0 tok/s*

### Reasoning        ← only present when reasoning was detected

<reasoning text>

### Response

<answer text>

---
```
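The Statistics table could be rendered as in the sketch below. It assumes tok/s is completion tokens divided by elapsed seconds, which is consistent with the example (120 / 1.2 s = 100.0). It also assumes the Total row divides summed completion tokens by summed elapsed time, which this document does not confirm. `stats_table` is a hypothetical name. With the `openai` client, the per-query token counts would typically come from `response.usage.prompt_tokens` and `response.usage.completion_tokens`.

```python
from typing import List, Tuple

Row = Tuple[float, int, int]  # (elapsed seconds, prompt tokens, completion tokens)


def stats_table(rows: List[Row]) -> str:
    """Render the Statistics table with one row per query plus a Total row."""
    def line(label: str, elapsed: float, prompt: int, completion: int) -> str:
        # tok/s = completion tokens / elapsed seconds (0.0 if elapsed is zero).
        tps = completion / elapsed if elapsed > 0 else 0.0
        return (f"| {label} | {elapsed:.1f}s | {prompt} | {completion} "
                f"| {prompt + completion} | {tps:.1f} |")

    out = ["| Query | Elapsed | Prompt tok | Completion tok | Total tok | tok/s |",
           "|-------|---------|------------|----------------|-----------|-------|"]
    for i, (elapsed, prompt, completion) in enumerate(rows, start=1):
        out.append(line(str(i), elapsed, prompt, completion))
    # Totals: sum each column, then recompute tok/s from the sums.
    out.append(line("Total", sum(r[0] for r in rows),
                    sum(r[1] for r in rows), sum(r[2] for r in rows)))
    return "\n".join(out)
```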

Dependencies

- `openai` >= 1.0.0: API client
- `pyyaml` >= 6.0: YAML parsing (imported lazily; JSON works without it)
