Initial commit: llmqt LLM Query Tester
Single-file Python CLI to batch-test multiple LLM models with predefined queries. Supports YAML/JSON config, reasoning detection (<think> tags and reasoning_content field), per-query token/speed stats, and graceful API error handling. Install with `pip install -e .` to get the `llmqt` command. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
82
README.md
Normal file
82
README.md
Normal file
@@ -0,0 +1,82 @@
|
||||
# llmqt — LLM Query Tester
|
||||
|
||||
Batch-test multiple LLM models against a set of queries. Results are saved as nicely formatted Markdown files — one per model — including per-query stats and a summary table.
|
||||
|
||||
## Install
|
||||
|
||||
```bash
|
||||
pip install -e .
|
||||
```
|
||||
|
||||
This installs the `llmqt` command into your PATH.
|
||||
|
||||
## Setup
|
||||
|
||||
Export your API credentials:
|
||||
|
||||
```bash
|
||||
export OPENAI_API_KEY=your_key_here
|
||||
export OPENAI_API_BASE=https://your-endpoint/v1 # optional, for custom/local endpoints
|
||||
```
|
||||
|
||||
## Usage
|
||||
|
||||
```bash
|
||||
llmqt <system_prompt.md> <config1.yaml> [config2.yaml ...]
|
||||
```
|
||||
|
||||
Examples:
|
||||
|
||||
```bash
|
||||
llmqt prompt.md test1.yaml
|
||||
llmqt prompt.md test1.yaml test2.yaml test3.json
|
||||
```
|
||||
|
||||
Outputs are written to `./<config_stem>/<model_name>.md` in the current working directory.
|
||||
|
||||
## Config file format
|
||||
|
||||
YAML (`.yaml` / `.yml`) and JSON (`.json`) are both supported.
|
||||
|
||||
```yaml
|
||||
models:
|
||||
- gpt-4o-mini
|
||||
- gpt-4o
|
||||
|
||||
queries:
|
||||
- "What is the capital of France?"
|
||||
- "Explain TCP vs UDP."
|
||||
- "Write a Python prime-checker function."
|
||||
```
|
||||
|
||||
See [example_test.yaml](example_test.yaml) and [example_system_prompt.md](example_system_prompt.md).
|
||||
|
||||
## Output format
|
||||
|
||||
For `llmqt prompt.md test1.yaml` with models `gpt-4o-mini` and `gpt-4o`:
|
||||
|
||||
```
|
||||
test1/
|
||||
gpt-4o-mini.md
|
||||
gpt-4o.md
|
||||
```
|
||||
|
||||
Each file contains:
|
||||
|
||||
- A **statistics table** (elapsed time, prompt/completion tokens, tok/s per query + totals)
|
||||
- For each query: the query text, per-query stats, optional **Reasoning** section (if the model returns chain-of-thought), and the **Response**
|
||||
|
||||
### Reasoning detection
|
||||
|
||||
Reasoning content is extracted automatically from:
|
||||
- The `reasoning_content` field on the message (DeepSeek API style)
|
||||
- `<think>...</think>` tags in the response content (DeepSeek R1 / QwQ open-source style)
|
||||
|
||||
## Execution order
|
||||
|
||||
```
|
||||
for each config file:
|
||||
for each model:
|
||||
for each query → POST to API, wait for response
|
||||
write <config_stem>/<model>.md in CWD
|
||||
```
|
||||
Reference in New Issue
Block a user