
Jaroslav Benes a45ced89de Initial commit: llmqt LLM Query Tester

Single-file Python CLI to batch-test multiple LLM models with predefined
queries. Supports YAML/JSON config, reasoning detection (<think> tags and
reasoning_content field), per-query token/speed stats, and graceful API
error handling. Install with `pip install -e .` to get the `llmqt` command.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-08 12:25:34 +02:00

Files (all last changed in the initial commit, 2026-04-08 12:25:34 +02:00):

.gitignore
CLAUDE.md
example_system_prompt.md
example_test.yaml
llmqt.py
pyproject.toml
README.md

README.md

llmqt — LLM Query Tester

Batch-test multiple LLM models against a set of queries. Results are saved as Markdown files, one per model for each config file, each containing per-query stats and a summary table.

Install

pip install -e .

This installs the package in editable mode and puts the llmqt command on your PATH.

Setup

Export your API credentials:

export OPENAI_API_KEY=your_key_here
export OPENAI_API_BASE=https://your-endpoint/v1  # optional, for custom/local endpoints
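
As a rough illustration of how a script might pick up these two variables, here is a minimal sketch. The function name `load_api_settings` and the fallback URL are assumptions made for this example, not taken from llmqt.py.

```python
import os

# Assumed fallback when OPENAI_API_BASE is unset; llmqt's real default may differ.
DEFAULT_API_BASE = "https://api.openai.com/v1"


def load_api_settings(env=os.environ):
    """Return (api_key, api_base) read from the given environment mapping."""
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    # OPENAI_API_BASE is optional; treat an empty value like a missing one.
    api_base = env.get("OPENAI_API_BASE") or DEFAULT_API_BASE
    # Drop a trailing slash so paths can be appended consistently.
    return api_key, api_base.rstrip("/")
```

Passing the environment as a parameter keeps the function easy to exercise without touching the real process environment.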

Usage

llmqt <system_prompt.md> <config1.yaml> [config2.yaml ...]

Examples:

llmqt prompt.md test1.yaml
llmqt prompt.md test1.yaml test2.yaml test3.json

Outputs are written to ./<config_stem>/<model_name>.md in the current working directory.
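
The path rule above can be sketched as follows. The helper `output_path` is hypothetical, and replacing slashes in model names is an assumption added for safety; the README does not say how llmqt treats such names.

```python
from pathlib import Path


def output_path(config_file, model_name, cwd=None):
    """Build <cwd>/<config_stem>/<model_name>.md for one config/model pair."""
    base = Path(cwd) if cwd is not None else Path.cwd()
    # The stem is the config file name without directory or extension.
    stem = Path(config_file).stem
    # Assumption: keep names such as "vendor/model" inside the output directory.
    safe_model = model_name.replace("/", "_")
    return base / stem / f"{safe_model}.md"
```

For example, `output_path("configs/test1.yaml", "gpt-4o")` resolves to `test1/gpt-4o.md` under the current working directory, regardless of where the config file itself lives.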

Config file format

YAML (.yaml / .yml) and JSON (.json) are both supported.

models:
  - gpt-4o-mini
  - gpt-4o

queries:
  - "What is the capital of France?"
  - "Explain TCP vs UDP."
  - "Write a Python prime-checker function."

See example_test.yaml and example_system_prompt.md.
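
A minimal sketch of loading either format, assuming the two top-level keys shown above. `load_config` is a hypothetical name, and the YAML branch assumes the third-party PyYAML package is installed; the actual parser used by llmqt.py is not stated here.

```python
import json
from pathlib import Path


def load_config(path):
    """Return (models, queries) from a .yaml, .yml, or .json config file."""
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    suffix = path.suffix.lower()
    if suffix == ".json":
        data = json.loads(text)
    elif suffix in (".yaml", ".yml"):
        import yaml  # PyYAML (assumed dependency), imported only when needed

        data = yaml.safe_load(text)
    else:
        raise ValueError(f"unsupported config extension: {suffix!r}")
    if not isinstance(data, dict):
        raise ValueError("config must be a mapping with 'models' and 'queries'")
    models = data.get("models")
    queries = data.get("queries")
    if not models or not queries:
        raise ValueError("config needs non-empty 'models' and 'queries' lists")
    return list(models), list(queries)
```

The JSON equivalent of the YAML example is an object with the same two keys, for instance `{"models": ["gpt-4o-mini", "gpt-4o"], "queries": ["What is the capital of France?"]}`.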

Output format

Running llmqt prompt.md test1.yaml with models gpt-4o-mini and gpt-4o produces:

test1/
  gpt-4o-mini.md
  gpt-4o.md

Each file contains:

A statistics table (elapsed time, prompt and completion tokens, and tokens per second for each query, plus totals)
For each query: the query text, per-query stats, an optional Reasoning section (present only if the model returns chain-of-thought), and the Response
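
To make the statistics table concrete, here is one way such a table could be rendered. The column names, the rounding, and the choice to compute tok/s from completion tokens divided by elapsed time are assumptions; the exact layout in llmqt's output may differ.

```python
def tokens_per_second(completion_tokens, elapsed_s):
    """Completion tokens per second; 0.0 when no time was measured."""
    return completion_tokens / elapsed_s if elapsed_s > 0 else 0.0


def stats_table(rows):
    """Render a Markdown table from (elapsed_s, prompt_tokens, completion_tokens) rows."""
    lines = [
        "| # | Time (s) | Prompt tok | Completion tok | tok/s |",
        "|---|---|---|---|---|",
    ]
    for i, (elapsed, prompt, completion) in enumerate(rows, start=1):
        speed = tokens_per_second(completion, elapsed)
        lines.append(f"| {i} | {elapsed:.2f} | {prompt} | {completion} | {speed:.1f} |")
    # The totals row sums each column and recomputes speed over the whole run.
    total_t = sum(r[0] for r in rows)
    total_p = sum(r[1] for r in rows)
    total_c = sum(r[2] for r in rows)
    total_speed = tokens_per_second(total_c, total_t)
    lines.append(f"| Total | {total_t:.2f} | {total_p} | {total_c} | {total_speed:.1f} |")
    return "\n".join(lines)
```

Recomputing the total speed from summed tokens and summed time avoids averaging per-query rates, which would overweight short queries.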

Reasoning detection

Reasoning content is extracted automatically from:

The reasoning_content field on the message (DeepSeek API style)
<think>...</think> tags in the response content (DeepSeek R1 / QwQ open-source style)
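
The two detection paths above can be sketched as a single helper. This is an illustration under stated assumptions: `split_reasoning` is a hypothetical name, the message is modelled as a plain dict, and giving the `reasoning_content` field priority over `<think>` tags is a choice made for this example.

```python
import re

# Non-greedy match so only the first <think>...</think> block is captured.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(message):
    """Return (reasoning_or_None, response_text) for one chat message dict."""
    content = message.get("content") or ""
    reasoning = message.get("reasoning_content")
    if reasoning:
        # Field-based style: reasoning arrives separately from the answer.
        return reasoning.strip(), content.strip()
    match = THINK_RE.search(content)
    if match:
        # Tag-based style: cut the block out of the visible response.
        extracted = match.group(1).strip()
        response = THINK_RE.sub("", content, count=1).strip()
        return extracted or None, response
    return None, content.strip()
```

For instance, a message whose content is `<think>2+2 is 4</think>` followed by `The answer is 4.` yields the reasoning `2+2 is 4` and the response `The answer is 4.`.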

Execution order

for each config file:
  for each model:
    for each query → POST to API, wait for response
    write <config_stem>/<model>.md in CWD
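
The loop order above might look like this in Python. `run_all`, `ask`, and `write` are hypothetical names; the API call and the file writer are passed in as callables so the sketch stays self-contained, and recording an error string and continuing is only one assumed reading of "graceful" error handling.

```python
import time


def run_all(configs, ask, write):
    """Run every query for every model of every config, sequentially.

    configs: dict mapping config stem -> (models, queries)
    ask(model, query) -> response text (stand-in for the API POST)
    write(stem, model, results) -> persists one report per model
    """
    for stem, (models, queries) in configs.items():
        for model in models:
            results = []
            for query in queries:
                start = time.perf_counter()
                try:
                    answer = ask(model, query)  # blocks until the response arrives
                except Exception as exc:
                    # Assumption: a failed query is recorded and the run continues.
                    answer = f"ERROR: {exc}"
                results.append((query, answer, time.perf_counter() - start))
            # One report is written only after all queries for the model finish.
            write(stem, model, results)
```

Because each request waits for its response before the next is sent, total run time grows with the number of configs, models, and queries multiplied together.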