Initial commit: llmqt LLM Query Tester
Single-file Python CLI to batch-test multiple LLM models with predefined queries. Supports YAML/JSON config, reasoning detection (<think> tags and reasoning_content field), per-query token/speed stats, and graceful API error handling. Install with `pip install -e .` to get the `llmqt` command. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
145
CLAUDE.md
Normal file
145
CLAUDE.md
Normal file
@@ -0,0 +1,145 @@
|
||||
# CLAUDE.md — llmqt
|
||||
|
||||
## Project overview
|
||||
|
||||
`llmqt` (LLM Query Tester) is a single-file Python CLI that batch-tests multiple LLM models
|
||||
against a set of queries. Results are written as Markdown files with per-query stats and
|
||||
optional reasoning sections.
|
||||
|
||||
## Structure
|
||||
|
||||
```
|
||||
llmqt/
|
||||
llmqt.py # entire implementation — single module
|
||||
pyproject.toml # build/install config; declares `llmqt` console script
|
||||
example_test.yaml # MUST be kept up to date with every config format change
|
||||
example_system_prompt.md # system prompt used by example_test.yaml
|
||||
README.md
|
||||
CLAUDE.md
|
||||
.gitignore
|
||||
```
|
||||
|
||||
## Installation
|
||||
|
||||
```bash
|
||||
pip install -e .
|
||||
```
|
||||
|
||||
Registers the `llmqt` entry point from `pyproject.toml` so the command works from any directory.
|
||||
|
||||
## CLI signature
|
||||
|
||||
```
|
||||
llmqt <system_prompt.md> <config1.yaml> [config2.yaml ...]
|
||||
```
|
||||
|
||||
- **First argument**: path to a `.md` file containing the system prompt (resolved from CWD)
|
||||
- **Remaining arguments**: one or more test config files (YAML or JSON)
|
||||
|
||||
## Environment variables
|
||||
|
||||
| Variable | Required | Purpose |
|
||||
|------------------|----------|------------------------------------------------------|
|
||||
| `OPENAI_API_KEY` | Yes | API key |
|
||||
| `OPENAI_API_BASE`| No | Custom base URL for OpenAI-compatible endpoints |
|
||||
|
||||
`OPENAI_BASE_URL` is also accepted as an alias for `OPENAI_API_BASE`.
|
||||
|
||||
## Config file format (YAML or JSON)
|
||||
|
||||
**IMPORTANT: whenever the config format changes, update `example_test.yaml` to reflect it.**
|
||||
|
||||
The system prompt is **not** part of the config file — it is passed as the first CLI argument.
|
||||
|
||||
### YAML example
|
||||
|
||||
```yaml
|
||||
models:
|
||||
- gpt-4o-mini
|
||||
- gpt-4o
|
||||
|
||||
queries:
|
||||
- "First query text"
|
||||
- "Second query text"
|
||||
```
|
||||
|
||||
### JSON equivalent
|
||||
|
||||
```json
|
||||
{
|
||||
"models": ["gpt-4o-mini", "gpt-4o"],
|
||||
"queries": ["First query text", "Second query text"]
|
||||
}
|
||||
```
|
||||
|
||||
### Field reference
|
||||
|
||||
| Field | Type | Description |
|
||||
|-----------|-----------------|------------------------------------------------------|
|
||||
| `models` | list of strings | Model names; any OpenAI-compatible identifier |
|
||||
| `queries` | list of strings | Queries sent to each model in listed order |
|
||||
|
||||
## Execution logic
|
||||
|
||||
```
|
||||
for each config file:
|
||||
for each model:
|
||||
for each query:
|
||||
POST to API (with timing), wait for response
|
||||
write <config_stem>/<model_name>.md (in CWD)
|
||||
```
|
||||
|
||||
Output directory is always relative to the **current working directory**, not the config file
|
||||
location. This lets the user run `llmqt ~/configs/prompt.md ~/configs/test1.yaml` from any
|
||||
writable directory and have outputs land there.
|
||||
|
||||
## Filename sanitization
|
||||
|
||||
Model names are sanitized for filesystem safety: characters outside `[A-Za-z0-9._- ]` are
|
||||
replaced with `_`. E.g. `anthropic/claude-3` → `anthropic_claude-3.md`.
|
||||
|
||||
## Reasoning detection
|
||||
|
||||
Checked in this order:
|
||||
1. `message.reasoning_content` attribute (DeepSeek API / some OpenAI-compatible endpoints)
|
||||
2. `<think>...</think>` tags in the response content (DeepSeek R1, QwQ open-source models)
|
||||
|
||||
If reasoning is found it is stripped from the answer and rendered in a separate section.
|
||||
|
||||
## Output format per model file
|
||||
|
||||
```markdown
|
||||
# <model name>
|
||||
|
||||
**Config:** `test1.yaml`
|
||||
|
||||
## Statistics
|
||||
|
||||
| Query | Elapsed | Prompt tok | Completion tok | Total tok | tok/s |
|
||||
|-------|---------|------------|----------------|-----------|-------|
|
||||
| 1 | 1.2s | 45 | 120 | 165 | 100.0 |
|
||||
| Total | 1.2s | 45 | 120 | 165 | 100.0 |
|
||||
|
||||
---
|
||||
|
||||
## Query 1
|
||||
|
||||
> <query text>
|
||||
|
||||
*1.2s · 120 completion tokens · 100.0 tok/s*
|
||||
|
||||
### Reasoning ← only present when reasoning was detected
|
||||
|
||||
<reasoning text>
|
||||
|
||||
### Response
|
||||
|
||||
<answer text>
|
||||
|
||||
---
|
||||
```
|
||||
|
||||
## Dependencies
|
||||
|
||||
- `openai >= 1.0.0` — API client
|
||||
- `pyyaml >= 6.0` — YAML parsing (imported lazily; JSON works without it)
|
||||
Reference in New Issue
Block a user