# llmqt — LLM Query Tester

Batch-test multiple LLMs against a set of queries. Results are saved as Markdown files, one per model, each containing per-query stats and a summary table.

## Install

```bash
pip install -e .
```

This makes the `llmqt` command available on your PATH.

## Setup

Export your API credentials:

```bash
export OPENAI_API_KEY=your_key_here
export OPENAI_API_BASE=https://your-endpoint/v1  # optional, for custom/local endpoints
```
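
As a rough illustration of how a client could consume these variables, here is a minimal sketch. It is not llmqt's actual code: the function name `load_credentials` is hypothetical, and treating an unset `OPENAI_API_BASE` as "use the client's default endpoint" is an assumption.

```python
import os


def load_credentials(env=None):
    """Return (api_key, api_base) read from environment variables.

    Hypothetical helper: llmqt's real lookup logic may differ.
    """
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    # OPENAI_API_BASE is optional; None here means "use the client's default".
    api_base = env.get("OPENAI_API_BASE") or None
    return api_key, api_base
```

Accepting an explicit `env` mapping keeps the lookup easy to exercise without touching the real environment.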

## Usage

```bash
llmqt <system_prompt.md> <config1.yaml> [config2.yaml ...]
```

Examples:

```bash
llmqt prompt.md test1.yaml
llmqt prompt.md test1.yaml test2.yaml test3.json
```

Outputs are written to `./<config_stem>/<model_name>.md` in the current working directory, where `<config_stem>` is the config filename without its extension (`test1.yaml` becomes `test1`).
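
The path rule can be sketched in a few lines of Python. The helper name `output_path` and its `cwd` parameter are hypothetical and only illustrate the `<config_stem>/<model_name>.md` layout described here.

```python
from pathlib import Path


def output_path(config_file, model_name, cwd="."):
    """Build <cwd>/<config_stem>/<model_name>.md (hypothetical helper)."""
    # Path.stem drops the directory and the final extension:
    # "configs/test1.yaml" -> "test1"
    stem = Path(config_file).stem
    return Path(cwd) / stem / f"{model_name}.md"


print(output_path("test1.yaml", "gpt-4o-mini"))
```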

## Config file format

YAML (`.yaml` / `.yml`) and JSON (`.json`) are both supported.

```yaml
models:
  - gpt-4o-mini
  - gpt-4o

queries:
  - "What is the capital of France?"
  - "Explain TCP vs UDP."
  - "Write a Python prime-checker function."
```

See [example_test.yaml](example_test.yaml) and [example_system_prompt.md](example_system_prompt.md).
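
For readers who want to see how the two formats can share one code path, the sketch below picks a parser by file extension. It is an assumption about how such loading might work, not a copy of llmqt's loader: `load_config` is a hypothetical name, PyYAML is assumed for the YAML branch, and treating `models` and `queries` as required keys is inferred from the example above.

```python
import json
from pathlib import Path


def load_config(path):
    """Load a test config from YAML or JSON, chosen by file extension."""
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    suffix = path.suffix.lower()
    if suffix == ".json":
        config = json.loads(text)
    elif suffix in (".yaml", ".yml"):
        import yaml  # PyYAML, assumed installed; imported only when needed

        config = yaml.safe_load(text)
    else:
        raise ValueError(f"unsupported config format: {suffix!r}")
    if not isinstance(config, dict):
        raise ValueError("config must be a mapping at the top level")
    # Assumption: both keys are required and must be non-empty lists.
    for key in ("models", "queries"):
        if not config.get(key):
            raise ValueError(f"config is missing a non-empty {key!r} list")
    return config
```

Under this sketch, a JSON config is simply an object with the same two keys, `models` and `queries`.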

## Output format

For `llmqt prompt.md test1.yaml` with models `gpt-4o-mini` and `gpt-4o`:

```
test1/
  gpt-4o-mini.md
  gpt-4o.md
```

Each file contains:

- A **statistics table** (elapsed time, prompt and completion tokens, and tok/s for each query, plus totals)
- For each query: the query text, per-query stats, an optional **Reasoning** section (if the model returns chain-of-thought), and the **Response**
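
To make the statistics concrete, here is one way the per-query numbers and the totals row could be computed. This is a sketch under stated assumptions: the `QueryStats` class and `totals` function are hypothetical, and computing tok/s from completion tokens only (rather than prompt plus completion) is a guess about the tool's definition.

```python
from dataclasses import dataclass


@dataclass
class QueryStats:
    elapsed_s: float
    prompt_tokens: int
    completion_tokens: int

    @property
    def tok_per_s(self):
        # Assumption: throughput counts completion tokens only.
        if self.elapsed_s <= 0:
            return 0.0
        return self.completion_tokens / self.elapsed_s


def totals(stats):
    """Sum a list of QueryStats into one QueryStats for the totals row."""
    return QueryStats(
        elapsed_s=sum(s.elapsed_s for s in stats),
        prompt_tokens=sum(s.prompt_tokens for s in stats),
        completion_tokens=sum(s.completion_tokens for s in stats),
    )
```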

### Reasoning detection

Reasoning content is extracted automatically from:
- The `reasoning_content` field on the message (DeepSeek API style)
- `<think>...</think>` tags in the response content (DeepSeek R1 / QwQ open-source style)
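
The two detection paths can be sketched as follows. The helper name `split_reasoning` is hypothetical, and the precedence (prefer the `reasoning_content` field, fall back to `<think>` tags) is an assumption, since the order is not specified here.

```python
import re

# Non-greedy match so several <think> blocks are captured separately;
# DOTALL lets the reasoning span multiple lines.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(content, reasoning_content=None):
    """Return (reasoning, response) for one assistant message.

    Hypothetical helper; reasoning is None when nothing is detected.
    """
    content = content or ""
    if reasoning_content:
        # Dedicated field present: use it and leave the content as the response.
        return reasoning_content.strip(), content.strip()
    blocks = THINK_RE.findall(content)
    if not blocks:
        return None, content.strip()
    reasoning = "\n\n".join(block.strip() for block in blocks)
    # Remove the tagged spans so only the final answer remains.
    response = THINK_RE.sub("", content).strip()
    return reasoning, response
```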

## Execution order

```
for each config file:
  for each model:
    for each query → POST to API, wait for response
    write <config_stem>/<model>.md in CWD
```
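
The same loop structure, written as a runnable Python sketch. Everything here is illustrative: `run_all` and the `send` callable are hypothetical names, the `## query` section layout is a placeholder rather than the real output format, and a real run would perform the blocking API request inside `send`.

```python
from pathlib import Path


def run_all(configs, system_prompt, send, out_root="."):
    """Run every query for every model of every config, sequentially.

    configs: mapping of config stem -> {"models": [...], "queries": [...]}
    send:    hypothetical callable (model, system_prompt, query) -> response text
    """
    written = []
    for stem, config in configs.items():
        for model in config["models"]:
            sections = []
            for query in config["queries"]:
                # Blocks until the response for this query arrives.
                answer = send(model, system_prompt, query)
                sections.append(f"## {query}\n\n{answer}\n")
            # One file per model, written only after all its queries finish.
            out_file = Path(out_root) / stem / f"{model}.md"
            out_file.parent.mkdir(parents=True, exist_ok=True)
            out_file.write_text("\n".join(sections), encoding="utf-8")
            written.append(out_file)
    return written
```

Passing `send` in as a parameter keeps the ordering logic separate from the network call, so the loop can be checked with a stub.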