
Jaroslav Benes 083cbb1fa7 Initial project structure

Scaffold all modules, route stubs, data models, and config.
No logic implemented yet — all core methods raise NotImplementedError.
Establishes the full directory layout matching the architecture in CLAUDE.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-08 14:48:48 +02:00

24 KiB


Fellowship

Fellowship is an API middleware server that sits between an OpenAI-compatible LLM backend and any client project. It orchestrates a fellowship of bots — managing their identities, system prompts, turn-taking, conversation flow, and interaction with human participants.

Fellowship speaks the OpenAI-compatible protocol only toward the LLM backend. Its own API toward client projects uses a custom format suited to multi-participant sessions.

The design goal is a general-purpose, extensible API with many options — not a hardcoded scenario. Each feature should be buildable and expandable in future versions without breaking existing sessions.

Repository Structure

main — stable releases
dev — active development, merged into main on release

Terminology

Bot — an LLM agent with its own name, system prompt, and optional per-bot settings (model, temperature, role). Bots are the core participants of any session. Each bot sees only the conversation messages — never another bot's system prompt or internal reasoning.

Talker — a human participant who can read and send messages into a session. Multiple talkers can be connected to the same session simultaneously, all sharing the same conversation.

Observer — a human participant who can only read the conversation. Observers receive the full history on connect and all subsequent events, but cannot send messages. There is no limit on concurrent observers.

Member — collective term for any participant in a session: bots, talkers, and observers.

Session — the container for a single conversation. Holds the configuration, all bots, conversation history, and connected members. Identified by a session token.

Session Token — an opaque string returned on session creation, used to connect to or manage the session.

Turn — a single message produced by one member (bot or talker). The session advances turn by turn.

Loop — the autonomous turn engine that drives bot turns without waiting for human input. Only active in autonomous mode.

Orchestrator — a hidden internal LLM call (not a visible bot) that decides which bot speaks next in orchestrated turn order. Unlike bots, the orchestrator receives the full conversation history plus all bot system prompts, giving it a complete picture of each bot's personality and role so it can make informed routing decisions. If a session goal is set, it can also end the session once that goal is reached.

Context — the conversation history as assembled for a specific bot's next prompt. Fellowship constructs this per-bot, including only messages — no foreign system prompts.

History — the full ordered log of all turns in a session, cached server-side. Delivered to any member on connect as a replay.

Prompt — the complete input sent to the LLM for a bot's turn: global system prompt + bot system prompt + context.

Core Concept

A client initializes a session by specifying bots and configuration. Fellowship returns a session token. Members (talkers and observers) connect using that token and receive the full history replay followed by live events.

Whether the session loop starts immediately depends on the participation mode:

In autonomous mode the loop starts immediately on session creation — no member needs to be connected.
In reactive and collaborative modes the loop is triggered by the first talker message.

Session Lifecycle

1. Initialize

Client sends POST /session/create with:

List of bots and their configuration
Global system prompt (optional, injected for all bots)
Session options (participation mode, turn order, limits, etc.)
LLM backend URL and API key (or server default is used)

Server responds with a session token and session metadata. The loop starts immediately if in autonomous mode.

2. Connect

Members connect using the session token:

Talkers connect via WebSocket — they can send messages and receive events
Observers connect via WebSocket or SSE — receive-only
On connect, the server first sends a history event with the full conversation so far, then streams live events from that point forward
Multiple talkers and any number of observers can be connected simultaneously

3. Terminate

Session ends when:

Any client calls DELETE /session/:token
A configured limit is reached (max_turns, max_time)
The orchestrator calls its end_session tool (only available when a goal is set)

Bot Configuration (per bot)

name          - Display name and identity within the conversation
system_prompt - Individual personality, instructions, and role
model         - (optional) Override the LLM model for this bot
temperature   - (optional) Per-bot temperature override
role          - (optional) Semantic hint: "expert", "critic", "summarizer", etc.

Session Options

All options are set at session creation. The API is intentionally option-rich to support general use cases. Options not yet implemented should be accepted and ignored gracefully, with their planned status documented.

Participation Mode

Defines whether and how human talkers are involved.

autonomous — bots only, no talker input. Loop starts immediately. Observers can watch.
reactive — bots respond to talker messages. Loop starts on first talker message. No autonomous bot-to-bot turns between talker messages.
collaborative — talkers and bots share the conversation. Bots may also converse among themselves between talker messages. Loop starts on first talker message.
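The three modes differ on two switches: when the loop starts, and whether bots may take turns among themselves between talker messages. The sketch below encodes those rules. It assumes the option values are the mode names listed above. The helper function names are hypothetical.

```python
from enum import Enum


class ParticipationMode(str, Enum):
    AUTONOMOUS = "autonomous"
    REACTIVE = "reactive"
    COLLABORATIVE = "collaborative"


def loop_starts_on_create(mode: ParticipationMode) -> bool:
    # Autonomous sessions run without any member connected.
    return mode is ParticipationMode.AUTONOMOUS


def accepts_talker_messages(mode: ParticipationMode) -> bool:
    # Autonomous sessions are bots-only; observers may still watch.
    return mode is not ParticipationMode.AUTONOMOUS


def bot_to_bot_turns_allowed(mode: ParticipationMode) -> bool:
    # Reactive bots only answer talkers; the other two modes let bots talk among themselves.
    return mode is not ParticipationMode.REACTIVE


mode = ParticipationMode("collaborative")  # parsed from the session options
print(loop_starts_on_create(mode), bot_to_bot_turns_allowed(mode))  # False True
```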

Talker Limits

max_talkers: N — maximum number of simultaneous talker connections (default: 1)
Observers are always unlimited
Talker messages are processed in arrival order (queue). It is structurally impossible for two messages to land at the same position — first in, first processed.
Talker messages carry the talker's display name so all members (bots included) know who said what.
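The arrival-order queue can be sketched with asyncio.Queue. The sketch assumes the session loop is the only consumer and drains the queue between bot turns. `TalkerMessage`, `TalkerQueue` and `drain` are illustrative names; the planned core/queue.py may look different.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class TalkerMessage:
    talker_name: str  # display name, visible to all members and bots
    content: str


class TalkerQueue:
    """FIFO queue of talker messages; the session loop is the only consumer."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue[TalkerMessage] = asyncio.Queue()

    def put(self, message: TalkerMessage) -> None:
        # Called from each talker's WebSocket handler as messages arrive.
        self._queue.put_nowait(message)

    def drain(self) -> list[TalkerMessage]:
        """Take every queued message, oldest first, without waiting."""
        drained: list[TalkerMessage] = []
        while not self._queue.empty():
            drained.append(self._queue.get_nowait())
        return drained


async def queue_demo() -> list[str]:
    queue = TalkerQueue()
    queue.put(TalkerMessage("Talker One", "Today is a wonderful day."))
    queue.put(TalkerMessage("Talker Two", "I don't think so."))
    return [f"{m.talker_name}: {m.content}" for m in queue.drain()]


print(asyncio.run(queue_demo()))
```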

Turn Order (bots only)

Applies to bot turns. Talker turns are always injected as they arrive.

round_robin — bots cycle in fixed order: Bot1 → Bot2 → Bot3 → Bot1 → ... No exceptions, no skipping.
orchestrated — an orchestrator LLM call decides which bot speaks next
- Requires 3 or more bots
- The orchestrator receives the full conversation and all bot system prompts so it can make an informed decision about who would most naturally or usefully respond
- Adds one extra LLM call per turn
- Can also end the session via its end_session tool, but only when a goal is set
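Round robin reduces to a cursor over the bot list. The cursor advances only when a bot speaks, so interleaved talker messages do not shift the rotation (our reading of "no skipping"). The sketch below also shows the 3-bot check for orchestrated mode. The class and function names are hypothetical.

```python
from collections.abc import Sequence


class RoundRobin:
    """Fixed-order rotation: Bot1 -> Bot2 -> Bot3 -> Bot1 -> ..."""

    def __init__(self, bot_names: Sequence[str]) -> None:
        if not bot_names:
            raise ValueError("round_robin needs at least one bot")
        self._names = list(bot_names)
        self._next = 0

    def next_bot(self) -> str:
        # Only called for bot turns; talker messages never move the cursor.
        name = self._names[self._next]
        self._next = (self._next + 1) % len(self._names)
        return name


def validate_turn_order(turn_order: str, bot_count: int) -> None:
    # Orchestrated routing requires 3 or more bots.
    if turn_order == "orchestrated" and bot_count < 3:
        raise ValueError("orchestrated turn order requires 3 or more bots")


rr = RoundRobin(["Alice", "Bob", "Carol"])
print([rr.next_bot() for _ in range(5)])  # ['Alice', 'Bob', 'Carol', 'Alice', 'Bob']
```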

History Rectification

Fellowship prompts bots strictly one at a time. However, a talker message can arrive while a bot is still generating. Without rectification this produces an out-of-order history that makes no logical sense to subsequent bots.

When a bot's LLM call is dispatched, its slot in history is reserved at that moment. Any talker messages that arrive during generation are queued and inserted after that reserved slot. When the LLM responds, the bot's message fills the reserved slot. The result is a logically coherent history regardless of when messages arrived.

Example without rectification (broken):

Talker One:  Today is a wonderful day.
Talker Two:  I don't think so.          ← arrived while Bot One was generating
Bot One:     I absolutely agree.        ← appended at end, out of order

Example with rectification (correct):

Talker One:  Today is a wonderful day.
Bot One:     I absolutely agree.        ← slot reserved at dispatch, filled on response
Talker Two:  I don't think so.          ← follows naturally
Bot Two:     Why so gloomy, Talker Two?

Options:

rectify_history: true — enable rectification (default)
rectify_history: false — disable, messages appended strictly in arrival/completion order

Goal

goal: string — optional natural language description of what the session should accomplish
If set, the goal is included in the orchestrator's system prompt so it can monitor whether it has been reached
The orchestrator's end_session tool is only available when a goal is set — without a goal, the orchestrator cannot end the session on its own
Without a goal, session termination requires an explicit API call or a configured limit to be reached

Session End Conditions (any combination)

max_turns: N — end after N total bot turns
max_time: N — end after N seconds from session creation
max_context_tokens: N — end when total context (full chat history + the largest system prompt) reaches N tokens; useful for staying within model context limits when summarization is disabled
Orchestrator end_session tool — only usable when a goal is set
Explicit API call — DELETE /session/:token from the connecting project
No limit set and no goal — session runs until explicitly terminated

Token Streaming

stream_tokens: false — bot responses delivered as complete messages (default)
stream_tokens: true — bot responses streamed token-by-token (opt-in, lower latency)

Context Handling

shared_context — all bots see the full message history (default)
scoped_context — each bot only sees messages it was directly involved in

Each bot's prompt always contains:

Global system prompt (if set)
Bot's own system prompt
Context — messages only, no foreign system prompts or reasoning

Context Summarization

Controls what happens when the total context (full chat history + the system prompt with the most tokens) approaches the model's context limit.

summarize_context: false — session auto-ends when context limit is reached (default)
summarize_context: true — when the limit is approached, Fellowship compacts the older portion of the chat into a summary, retaining a recent tail of messages intact. The full chat log is always preserved server-side; only the LLM input is compacted. Future turns receive: system prompt + summary + tail.

Token counting is tracked continuously so Fellowship knows when to act before the limit is hit.
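The compaction step can be sketched as a pure function over the message list. It relies on two assumptions. First, `count_tokens` and `summarize` are injected; in practice the summary would come from an LLM call through llm/client.py. Second, `budget` is already set below the hard limit, with the system prompt's tokens subtracted. The function returns only the LLM input and leaves the full log untouched.

```python
from collections.abc import Callable, Sequence


def compact(
    messages: Sequence[str],
    count_tokens: Callable[[str], int],
    budget: int,
    keep_tail: int,
    summarize: Callable[[Sequence[str]], str],
) -> list[str]:
    """Return [summary, *tail] if the messages exceed the budget, else a copy of them."""
    total = sum(count_tokens(m) for m in messages)
    if total <= budget or len(messages) <= keep_tail:
        return list(messages)  # under budget, or nothing old enough to summarize
    split = len(messages) - keep_tail
    head, tail = messages[:split], messages[split:]
    return [summarize(head), *tail]


msgs = ["a b c", "d e", "f g h i", "j"]
print(compact(msgs, lambda s: len(s.split()), budget=5, keep_tail=2,
              summarize=lambda head: f"[summary of {len(head)} messages]"))
# ['[summary of 2 messages]', 'f g h i', 'j']
```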

Memory

memory: none — fully isolated, no persistence (default)
memory: new — create a new persistent memory store for this session
memory: inherit:<session_token> — load and continue memory from a prior session
Memory is injected into each bot's prompt at the start of the context

Session Connection

Transports

WebSocket — primary transport for talkers and observers:

WS /session/:token/connect?role=talker — can send and receive
WS /session/:token/connect?role=observer — receive only
On connect: history event replays the full conversation, then live events follow

SSE — lightweight observe-only alternative:

GET /session/:token/stream — receive only, same history + live event flow

Event types (server → member)

{ type: "history",      messages: [...] }
{ type: "turn_start",   bot: "Alice", turn: 3 }
{ type: "bot_message",  bot: "Alice", content: "...", turn: 3 }
{ type: "token",        bot: "Alice", token: "...",   turn: 3 }    // stream_tokens only
{ type: "turn_end",     bot: "Alice", turn: 3, tokens: 142 }
{ type: "talker_message", talker_id: "...", content: "...", turn: 4 }
{ type: "member_joined",  role: "observer" | "talker" }
{ type: "member_left",    role: "observer" | "talker" }
{ type: "session_paused" }
{ type: "session_resumed" }
{ type: "session_end",  reason: "max_turns" | "max_time" | "max_context" | "orchestrator" | "client_request" }
{ type: "error",        message: "..." }

Message types (client → server, talker WebSocket only)

{ type: "user_message", content: "..." }
{ type: "ping" }

Internal Architecture

Session Store — in-memory cache of all active sessions and full history; keyed by session token
Session Loop — the core driver per session. Runs continuously, checking for new talker messages and advancing bot turns one prompt at a time. Never dispatches two LLM calls simultaneously. Starts immediately in autonomous mode, or on first talker message in reactive and collaborative modes.
Message Queue — incoming talker messages are enqueued and processed by the loop in arrival order
LLM Client — OpenAI-compatible HTTP client; configurable base URL + API key per session
Turn Engine — given the current state, determines the next bot (via round_robin or orchestrator), constructs its prompt, dispatches the LLM call, and writes the response to history
Orchestrator — optional LLM call fed the full conversation and all bot system prompts; returns the name of the next bot to speak, and optionally a session-end signal
Context Manager — assembles message-only history per bot (no foreign system prompts)
Connection Hub — WebSocket/SSE fan-out; broadcasts events to all connected members of a session
Memory Store — SQLite database for cross-session memory and optional session persistence

API Endpoints

POST   /session/create              Initialize session, return token
GET    /session/:token              Session status, config, turn count, connected members
DELETE /session/:token              End session

POST   /session/:token/pause        Pause the session loop
POST   /session/:token/resume       Resume a paused session loop

WS     /session/:token/connect      Connect as talker or observer (role param)
GET    /session/:token/stream       SSE observe-only stream

GET    /session/:token/history      Full conversation history (REST)

Pause and Resume

A paused session stops the loop completely — no LLM calls are made. Connected members remain connected and will receive a session_paused event. On resume, the loop picks up where it left off and members receive a session_resumed event. Talker messages received while paused are queued and processed after resume.
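Pausing maps cleanly onto an asyncio.Event that the loop awaits before each LLM call. The sketch below assumes pause takes effect between turns, so a call already in flight is allowed to finish. `PauseGate` is a hypothetical name. Talker messages sent while paused simply accumulate in the queue.

```python
import asyncio


class PauseGate:
    """Blocks the session loop while paused; starts in the running state."""

    def __init__(self) -> None:
        self._running = asyncio.Event()
        self._running.set()

    def pause(self) -> None:
        self._running.clear()

    def resume(self) -> None:
        self._running.set()

    @property
    def paused(self) -> bool:
        return not self._running.is_set()

    async def wait_until_running(self) -> None:
        # The loop awaits this before dispatching each LLM call.
        await self._running.wait()


async def pause_demo() -> list[str]:
    gate = PauseGate()
    log: list[str] = []

    async def loop() -> None:
        for turn in range(1, 4):
            await gate.wait_until_running()
            log.append(f"turn {turn}")
            await asyncio.sleep(0)  # stand-in for the LLM call

    task = asyncio.create_task(loop())
    await asyncio.sleep(0)     # let turn 1 start
    gate.pause()
    await asyncio.sleep(0.01)  # no new turn starts while paused
    log.append("resumed")
    gate.resume()
    await task
    return log
```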

Docs

OpenAPI 3.x spec auto-generated from server code, served at /openapi.json and /docs. Framework choice should make this natural (FastAPI, Hono/Elysia, Axum+utoipa, etc.). Markdown guides live in /docs/ in the repository.

LLM Prompt Harness

Fellowship constructs all prompts internally. No client ever sends a raw prompt to the LLM.

Bot Prompts

Each bot prompt is assembled as:

Global system prompt (if set)
Bot's own system prompt
Conversation context (messages only — no foreign system prompts, no orchestrator output)

Bots are standard chat completions. Their output is appended to history as-is.

Orchestrator Prompt

The orchestrator is a stateless LLM call — not a bot, never part of the conversation history. It is called fresh each time a routing decision is needed.

Each orchestrator call receives:

Its own system prompt (explains its role, lists available tools, provides bot roster with names and system prompts)
The current conversation history formatted for context

The orchestrator responds with a tool call. Any text it outputs alongside the tool call is discarded — only the tool call matters.

Orchestrator Tools

select_speaker(bot_name: string)
  — Fellowship will prompt that bot next.

hold()
  — Do not prompt any bot this turn. The loop waits for the next talker message before
    asking the orchestrator again. Used when talker messages imply bots should stay silent.

end_session(reason: string)
  — Fellowship ends the session. Only available when a goal is set for the session.

Fellowship acts on the tool call and ignores everything else. The orchestrator's system prompt includes an overview of how Fellowship works, the full bot roster with system prompts, the session goal (if set), and instructions to watch for talker messages that imply bots should not respond.

Server Configuration (.env)

Fellowship is configured via a .env file at the server root. Session creation can override some of these per-session.

LLM_BASE_URL          — OpenAI-compatible backend URL (e.g. http://localhost:8080/v1)
LLM_API_KEY           — API key (can be a dummy value for local backends)

DEFAULT_BOT_MODEL     — Default model used for all bots unless overridden per-bot
DEFAULT_ORCHESTRATOR_MODEL — Model used for orchestrator calls (can differ from bot model)

MAX_BOTS_PER_SESSION  — Server-side hard cap on bots per session
SESSION_TTL_DEFAULT   — Default idle timeout in seconds if not set per-session

Per-session overrides for model and backend URL can be provided in POST /session/create.

Logging and Debug

Always-on Logging

Fellowship writes structured logs to logs/{YYYY-MM-DD}.log regardless of any settings. Each log entry includes a timestamp, session token (truncated), event type, and relevant details. Log files rotate daily.

Logged events include: session created/ended, each LLM call dispatched and completed, orchestrator tool calls, member connections/disconnections, errors, pause/resume signals.

Debug Mode

Debug mode can be enabled server-wide in .env or per-session in POST /session/create.

DEBUG=true   — enable debug mode server-wide

When debug is enabled, connected members also receive debug events over their WebSocket/SSE connection in addition to normal events:

{ type: "debug", category: "llm_call",        data: { bot: "Alice", prompt_tokens: 312 } }
{ type: "debug", category: "orchestrator",     data: { tool: "select_speaker", bot: "Bob" } }
{ type: "debug", category: "loop",             data: { state: "waiting_for_talker" } }
{ type: "debug", category: "context_tokens",   data: { total: 1840, limit: 4096 } }
{ type: "debug", category: "rectification",    data: { slot: 7, queued_messages: 1 } }

This allows client projects to display or log Fellowship internals without needing to read server log files directly.

Development Rules

These rules apply to all code written for Fellowship. They exist to keep the codebase consistent, maintainable, and safe to build upon across versions.

Language and Runtime

Python 3.11+
Async throughout — no blocking I/O calls on the event loop. Use asyncio, httpx (async), aiosqlite. If CPU-bound work is needed, offload to a thread pool executor.
All code must pass a type checker (pyright or mypy in strict mode).

Project Structure

fellowship/
  api/
    routes/         — FastAPI route definitions only; no business logic here
    models/         — Pydantic request and response models
    events.py       — All WebSocket/SSE event type definitions
  core/
    session.py      — Session data structure and state management
    loop.py         — Session loop logic
    turn_engine.py  — Bot prompt construction and turn execution
    orchestrator.py — Orchestrator call and tool call parsing
    context.py      — Context assembly and summarization logic
    rectifier.py    — History rectification logic
    queue.py        — Talker message queue
  llm/
    client.py       — All LLM HTTP calls, OpenAI-compatible format
  store/
    session_store.py — In-memory session cache
    memory_store.py  — SQLite-backed cross-session memory
  hub/
    connection_hub.py — WebSocket/SSE fan-out to connected members
  config.py         — Pydantic Settings, loads from .env
  logging.py        — Logging setup and structured log helpers
tests/
  unit/             — Tests per module, no external dependencies
  integration/      — Tests against a mock LLM server
docs/               — Markdown guides and examples
logs/               — Runtime log files (gitignored)
.env                — Local config (gitignored)
.env.example        — Committed template with placeholder values

Each module has one responsibility matching the architecture. No module reaches into another module's internals — only through its public interface.

Code Rules

Pydantic for all data structures. Every request body, response, event, and internal data model that crosses a module boundary is a Pydantic model. No raw dicts passed between components.

Type hints everywhere. All function signatures carry type hints for arguments and return types. No Any unless it is genuinely unavoidable, with a comment explaining why.

No business logic in routes. Route handlers validate input (handled by Pydantic) and call into core/. They do not contain loop logic, LLM calls, or history manipulation.

All LLM calls go through llm/client.py. No module calls the LLM backend directly. This keeps the OpenAI-compatible protocol isolated in one place.

History is append-only except during rectification. The only time a history slot is modified after creation is when a reserved rectification slot is filled by the bot response that claimed it. Nothing else mutates past history.

The session loop must never crash. The loop catches all exceptions internally, logs them, emits an error event to members, and continues. A single failed LLM call does not end the session unless a limit has been reached or the error is unrecoverable. What counts as unrecoverable must be explicitly decided and documented.

No hardcoded values. All configuration (URLs, model names, limits, timeouts) comes from config.py which reads from .env. Magic numbers in code are a bug.

Unknown session options are accepted and ignored. If a client sends an option that Fellowship doesn't recognize, log it as a warning and continue. Do not error. This preserves forward compatibility.
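The accept-and-warn behavior can be sketched as a filter over the incoming options. `KNOWN_OPTIONS` is a hypothetical subset built from the option names in this document. The keys for participation mode and turn order are not named above, so those two are guesses. Note that Pydantic's `extra="ignore"` would drop unknown fields silently, without the required warning.

```python
import logging
from collections.abc import Mapping

logger = logging.getLogger("fellowship.options")

# Hypothetical subset; "participation_mode" and "turn_order" are guessed key names.
KNOWN_OPTIONS = frozenset({
    "participation_mode", "turn_order", "max_talkers", "rectify_history", "goal",
    "max_turns", "max_time", "max_context_tokens", "stream_tokens",
    "summarize_context", "memory",
})


def split_options(options: Mapping[str, object]) -> dict[str, object]:
    """Keep recognized options; log a warning for the rest instead of failing."""
    known: dict[str, object] = {}
    for key, value in options.items():
        if key in KNOWN_OPTIONS:
            known[key] = value
        else:
            logger.warning("ignoring unknown session option %r", key)
    return known
```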

Git Workflow

main — stable, tested releases only. Never commit directly.
dev — active development. Feature branches are cut from here and merged back here.
Branch naming: feature/short-description, fix/short-description
Commit messages: imperative mood, present tense. Describe what the commit does, not what you did. Example: "Add orchestrator hold tool support", not "Added hold tool".
Merge to dev via pull request. Squash commits if the branch history is noisy.
Merge dev to main only when a meaningful set of features is stable and tested. Tag releases on main.

Testing Rules

Every module in core/ and llm/ must have corresponding unit tests.
Unit tests must not make real LLM calls. Use a mock LLM server or patched responses.
Integration tests live in tests/integration/ and test full session flows against a mock LLM server.
A test must exist before a feature is considered done.
Tests are run on every merge to dev.

Error Handling

Errors internal to the session loop are caught, logged, and emitted as error events to members — they do not propagate up.
Errors in route handlers return structured JSON: { "error": "...", "code": "..." } with an appropriate HTTP status code.
LLM call failures are retried once with a short delay before being treated as an error. The retry count and delay are configurable.
Never silence an exception without logging it.
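The sketch below covers both rules. `call_with_retry` is a retry wrapper whose count and delay are passed in from configuration. `run_turn_safely` is a turn wrapper that logs a failure and emits an error event instead of raising. Both names are hypothetical. Catching `Exception` rather than `BaseException` lets `asyncio.CancelledError` through, so shutting the loop down still works.

```python
import asyncio
import logging
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")
logger = logging.getLogger("fellowship.loop")


async def call_with_retry(call: Callable[[], Awaitable[T]], *, retries: int, delay_s: float) -> T:
    """Run an LLM call, retrying `retries` times after `delay_s` seconds (both from config)."""
    for attempt in range(retries + 1):
        try:
            return await call()
        except Exception:
            if attempt == retries:
                raise  # out of retries: the turn wrapper handles it
            logger.warning("LLM call failed (attempt %d), retrying in %.1fs", attempt + 1, delay_s)
            await asyncio.sleep(delay_s)
    raise AssertionError("unreachable")  # the loop always returns or raises


async def run_turn_safely(
    turn: Callable[[], Awaitable[None]],
    emit: Callable[[dict[str, str]], Awaitable[None]],
) -> None:
    """One loop iteration: a failure is logged and broadcast, never re-raised."""
    try:
        await turn()
    except Exception as exc:
        logger.exception("turn failed")
        await emit({"type": "error", "message": str(exc)})
```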

Logging Rules

Use Python's standard logging module. Configure it in fellowship/logging.py.
All logs go to logs/{YYYY-MM-DD}.log. Rotate daily. Console output in development.
Log levels: DEBUG for internal loop state, LLM prompts/responses; INFO for session lifecycle events; WARNING for unknown options, retries, fallbacks; ERROR for caught failures.
Every log line that relates to a session must include the session token (first 8 chars is enough).
Log files are gitignored.
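A standard-library sketch of the session-token rule using logging.LoggerAdapter, assuming the full token is passed in as `extra`. `SessionAdapter` is a hypothetical name. For the daily files, `logging.handlers.TimedRotatingFileHandler(when="midnight")` rotates at midnight. However, it names rotated files by appending a suffix to the base name. Matching `{YYYY-MM-DD}.log` exactly would therefore need its `namer` hook or a small custom handler.

```python
import logging
from collections.abc import MutableMapping
from typing import Any


class SessionAdapter(logging.LoggerAdapter):
    """Prefix each message with the first 8 characters of the session token."""

    def process(
        self, msg: Any, kwargs: MutableMapping[str, Any]
    ) -> tuple[Any, MutableMapping[str, Any]]:
        # Any: matches the LoggerAdapter.process signature in the standard library.
        token = str(self.extra["session_token"])[:8] if self.extra else "--------"
        return f"[{token}] {msg}", kwargs


# Usage: one adapter per session, created together with the session.
log = SessionAdapter(logging.getLogger("fellowship.session"), {"session_token": "a1b2c3d4e5f6g7h8"})
log.info("session created")  # emitted as "[a1b2c3d4] session created" if INFO is enabled
```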

API Versioning

All routes are prefixed /v1/. Example: POST /v1/session/create.
Breaking changes to the API require a new version prefix. Additive changes (new optional fields, new event types) do not.

Docs Rules

FastAPI route decorators must include a summary and description so the auto-generated OpenAPI spec is useful.
Pydantic models must include field descriptions via Field(description="...").
When a new session option is added, it must be documented in CLAUDE.md and in the OpenAPI spec before the PR is merged.
docs/ contains human-readable Markdown guides. At minimum: quick-start, session options reference, event types reference, common patterns.

Notes

Fellowship is a structural layer — it does not interpret conversation content.
The name reflects a group of distinct characters, each with their own voice, working together.
Any project that can make HTTP/WebSocket requests can use Fellowship regardless of language.
Options not yet implemented in a given version are accepted, ignored gracefully, and noted in docs as planned.
