Glyphoxa is configured through a single YAML file. This document is the authoritative reference for every configuration field, provider option, and runtime behaviour related to configuration.
📖 Overview
Config file format
Glyphoxa reads a YAML configuration file at startup. The YAML decoder operates in strict mode (KnownFields(true)) – any unrecognised key in the file causes a hard parse error. This catches typos early.
Specifying the config file
Pass the path via the -config CLI flag:
```shell
glyphoxa -config /etc/glyphoxa/config.yaml
```
The default is config.yaml in the working directory.
Binary mode
Select the binary mode with the --mode flag:
```shell
glyphoxa --mode=gateway --config /etc/glyphoxa/config.yaml
```
| Mode | Description |
|---|---|
| full (default) | Single-process self-hosted deployment |
| gateway | Multi-tenant session orchestrator with admin API |
| worker | Voice pipeline executor (created by gateway) |
| mcp-gateway | Shared MCP tool server |
In --mode=gateway and --mode=worker, several environment variables configure the distributed system. See Deployment: Environment Variables for the full list.
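The flag handling above can be illustrated with Go's standard `flag` package, which accepts both the `-config` and `--config` spellings used in this document. This is a minimal sketch, not Glyphoxa's actual entry point. `parseFlags` and `validModes` are hypothetical names, and rejecting unknown modes at parse time is an assumption.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// validModes lists the binary modes from the table above (hypothetical helper).
var validModes = map[string]bool{
	"full": true, "gateway": true, "worker": true, "mcp-gateway": true,
}

// parseFlags returns the config path and mode. Go's flag package treats
// -config and --config (and -mode / --mode) identically.
func parseFlags(args []string) (string, string, error) {
	fs := flag.NewFlagSet("glyphoxa", flag.ContinueOnError)
	configPath := fs.String("config", "config.yaml", "path to the YAML config file")
	mode := fs.String("mode", "full", "binary mode: full, gateway, worker or mcp-gateway")
	if err := fs.Parse(args); err != nil {
		return "", "", err
	}
	if !validModes[*mode] {
		return "", "", fmt.Errorf("unknown --mode %q", *mode)
	}
	return *configPath, *mode, nil
}

func main() {
	path, mode, err := parseFlags(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("mode=%s config=%s\n", mode, path)
}
```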
Environment variable fallbacks
Apart from the distributed-mode variables above, Glyphoxa does not read environment variables to override configuration values. However, several provider SDKs honour their own environment variables when no api_key is set in the config file:
| Provider | Environment Variable |
|---|---|
| openai (LLM, embeddings) | OPENAI_API_KEY |
| anthropic | ANTHROPIC_API_KEY |
| gemini | GEMINI_API_KEY / GOOGLE_API_KEY |
| deepseek | DEEPSEEK_API_KEY |
| mistral | MISTRAL_API_KEY |
| groq | GROQ_API_KEY |
These are read by the upstream SDKs, not by Glyphoxa’s config loader. If you set api_key explicitly in the YAML, it takes precedence.
🔄 Hot Reload
The configuration file can be edited while Glyphoxa is running. A background config watcher detects changes and applies them without a restart.
How it works
- The watcher polls the config file (no fsnotify dependency).
- Default polling interval: 5 seconds.
- On each tick the file’s mtime is checked first (cheap). If mtime has not changed, no further work is done.
- If mtime changed, the file is read and its SHA-256 hash is compared to the previously loaded config. Touching the file without changing content is a no-op.
- The new file is fully parsed and validated before it replaces the old config. If validation fails, the old config is retained and a warning is logged.
- When a valid change is detected, the onChange callback receives both the old and new Config values.
What can be hot-reloaded
The Diff function tracks which changes are safe to apply at runtime:
| Change | Hot-reloaded? | Notes |
|---|---|---|
| server.log_level | ✅ Yes | Takes effect immediately |
| NPC personality | ✅ Yes | System prompt updated on next interaction |
| NPC voice (provider, voice_id, pitch, speed) | ✅ Yes | Applied to new TTS sessions |
| NPC budget_tier | ✅ Yes | Changes tool filtering immediately |
| Adding a new NPC | ✅ Yes | NPC becomes available without restart |
| Removing an NPC | ✅ Yes | NPC is unloaded |
| Provider changes (api_key, model, etc.) | ❌ No | Requires restart |
| server.listen_addr / server.tls | ❌ No | Requires restart |
| providers.audio.* | ❌ No | Requires restart |
| memory.* | ❌ No | Requires restart |
| mcp.servers | ❌ No | Requires restart |
| campaign.* | ❌ No | Requires restart |
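The rules in this table can be expressed as a comparison function. The following is a minimal sketch of a Diff-style check over a heavily reduced, hypothetical Config type. Glyphoxa's real Diff signature and Config struct are not shown in this reference, so `ConfigDiff`, `NPC` and the field names below are assumptions. The sketch covers only a few representative fields from each column of the table.

```go
package main

import (
	"fmt"
	"reflect"
)

// Hypothetical, reduced config types for illustration only.
type NPC struct {
	Name        string
	Personality string
	VoiceID     string
	BudgetTier  string
}

type Config struct {
	LogLevel    string
	ListenAddr  string
	LLMModel    string
	PostgresDSN string
	NPCs        []NPC
}

// ConfigDiff separates hot-reloadable changes from restart-only ones.
type ConfigDiff struct {
	LogLevelChanged bool
	NPCsChanged     bool     // additions, removals and edits are all hot
	RestartRequired []string // changed fields that only apply after restart
}

func Diff(oldCfg, newCfg Config) ConfigDiff {
	var d ConfigDiff
	d.LogLevelChanged = oldCfg.LogLevel != newCfg.LogLevel
	d.NPCsChanged = !reflect.DeepEqual(oldCfg.NPCs, newCfg.NPCs)
	if oldCfg.ListenAddr != newCfg.ListenAddr {
		d.RestartRequired = append(d.RestartRequired, "server.listen_addr")
	}
	if oldCfg.LLMModel != newCfg.LLMModel {
		d.RestartRequired = append(d.RestartRequired, "providers.llm.model")
	}
	if oldCfg.PostgresDSN != newCfg.PostgresDSN {
		d.RestartRequired = append(d.RestartRequired, "memory.postgres_dsn")
	}
	return d
}

func main() {
	a := Config{LogLevel: "info", ListenAddr: ":8080"}
	b := a
	b.LogLevel = "debug"
	b.ListenAddr = ":9090"
	fmt.Printf("%+v\n", Diff(a, b))
}
```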
🗂️ Complete Field Reference
server – Server Settings
| Field | Type | Default | Description |
|---|---|---|---|
| server.listen_addr | string | "" | TCP address to listen on (e.g., ":8080"). Empty means the server does not bind an HTTP listener. |
| server.log_level | string | "info" | Log verbosity. Valid values: debug, info, warn, error. Hot-reloadable. |
| server.tls | object | null | TLS configuration block. When omitted or null, the server runs plain HTTP. |
| server.tls.cert_file | string | – | Path to PEM-encoded TLS certificate. Required if tls is set. |
| server.tls.key_file | string | – | Path to PEM-encoded TLS private key. Required if tls is set. |
```yaml
server:
  listen_addr: ":8080"
  log_level: info
  tls:
    cert_file: /etc/ssl/glyphoxa.crt
    key_file: /etc/ssl/glyphoxa.key
```
providers – AI Provider Configuration
Each provider slot follows the same ProviderEntry schema:
| Field | Type | Default | Description |
|---|---|---|---|
| name | string | "" | Registered provider implementation name (e.g., "openai", "deepgram"). Leave empty to disable the pipeline stage. |
| api_key | string | "" | Authentication key for the provider’s API. Some providers fall back to environment variables when empty. |
| base_url | string | "" | Override the provider’s default API endpoint. Leave empty to use the built-in default. |
| model | string | "" | Model name within the provider (e.g., "gpt-4o", "nova-3"). |
| options | map[string]any | {} | Provider-specific settings not covered by the standard fields. See Provider-Specific Options below. |
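A ProviderEntry maps naturally onto a Go struct. In the sketch below, the `yaml` struct tags assume decoding with a library such as gopkg.in/yaml.v3. The `Enabled` and `IntOption` helpers are hypothetical: one implements the "empty name disables the stage" rule and the other reads numeric options. Glyphoxa's real type may differ.

```go
package main

import "fmt"

// ProviderEntry mirrors the documented schema (illustrative definition).
type ProviderEntry struct {
	Name    string         `yaml:"name"`
	APIKey  string         `yaml:"api_key"`
	BaseURL string         `yaml:"base_url"`
	Model   string         `yaml:"model"`
	Options map[string]any `yaml:"options"`
}

// Enabled reports whether the pipeline stage is active; an empty name disables it.
func (p ProviderEntry) Enabled() bool { return p.Name != "" }

// IntOption returns Options[key] as an int, or def if missing or non-numeric.
// Handling both int and float64 covers common YAML and JSON decoder outputs.
func (p ProviderEntry) IntOption(key string, def int) int {
	switch v := p.Options[key].(type) {
	case int:
		return v
	case float64:
		return int(v)
	default:
		return def
	}
}

func main() {
	llm := ProviderEntry{Name: "openai", Model: "gpt-4o", Options: map[string]any{"max_tokens": 1024}}
	fmt.Println(llm.Enabled(), llm.IntOption("max_tokens", 0))
}
```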
providers.llm – Large Language Model
Used for NPC reasoning in cascaded and sentence_cascade engine modes.
Registered providers: openai, anthropic, ollama, gemini, deepseek, mistral, groq, llamacpp, llamafile
```yaml
providers:
  llm:
    name: openai
    api_key: sk-...
    model: gpt-4o
    options:
      max_tokens: 1024
```
providers.stt – Speech-to-Text
Transcribes player audio in cascaded engine mode.
Registered providers: deepgram, whisper, whisper-native
```yaml
providers:
  stt:
    name: deepgram
    api_key: dg-...
    model: nova-3
    options:
      language: en-US
```
providers.tts – Text-to-Speech
Synthesises NPC voice responses in cascaded engine mode.
Registered providers: elevenlabs, coqui
```yaml
providers:
  tts:
    name: elevenlabs
    api_key: el-...
    model: eleven_multilingual_v2
    options:
      output_format: pcm_48000
```
providers.s2s – Speech-to-Speech
End-to-end voice model that replaces the STT + LLM + TTS pipeline when an NPC uses engine: s2s.
Registered providers: openai-realtime, gemini-live
```yaml
providers:
  s2s:
    name: openai-realtime
    api_key: sk-...
    model: gpt-4o-realtime-preview
```
providers.embeddings – Embedding Model
Used by the memory layer for semantic retrieval (pgvector).
Registered providers: openai, ollama
```yaml
providers:
  embeddings:
    name: openai
    api_key: sk-...
    model: text-embedding-3-small
```
providers.vad – Voice Activity Detection
Determines when a player is speaking. Runs locally, no API key required.
Registered providers: silero
```yaml
providers:
  vad:
    name: silero
    options:
      frame_size_ms: 30
      speech_threshold: 0.5
      silence_threshold: 0.35
```
providers.audio – Audio Platform
Connects Glyphoxa to a voice channel. The audio provider determines which voice transport is used and, for Discord, also creates the bot that provides slash commands and DM permissions.
Registered providers: discord, webrtc
Audio: discord
Creates a Discord bot, connects to the gateway, and provides voice channel transport plus slash commands (/session, /npc, /entity, /campaign).
| Field | Type | Description |
|---|---|---|
| api_key | string | Required. Discord bot token (e.g., "Bot MTIz..."). Obtain from https://discord.com/developers/applications. |
| options.guild_id | string | Required. Target Discord guild (server) ID. Alpha: one guild per bot instance. |
| options.dm_role_id | string | Discord role ID for Dungeon Master permissions. Users with this role can execute privileged slash commands. When empty, all users are treated as DMs (useful for development). |
```yaml
providers:
  audio:
    name: discord
    api_key: "Bot MTIz..."
    options:
      guild_id: "123456789012345678"
      dm_role_id: "987654321098765432"
```
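The dm_role_id rule reduces to a simple membership check. This is a minimal sketch that assumes the member's role IDs have already been fetched from Discord. `isDM` is a hypothetical helper, not part of Glyphoxa or any Discord library.

```go
package main

import "fmt"

// isDM reports whether a member may run privileged slash commands.
func isDM(dmRoleID string, memberRoleIDs []string) bool {
	if dmRoleID == "" {
		return true // no DM role configured: everyone is treated as a DM
	}
	for _, id := range memberRoleIDs {
		if id == dmRoleID {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isDM("987654321098765432", []string{"111", "987654321098765432"}))
	fmt.Println(isDM("987654321098765432", []string{"111"}))
}
```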
Audio: webrtc
Enables browser-based voice sessions via WebRTC. No Discord account required. Currently in alpha – the PeerTransport interface abstracts the pion/webrtc integration.
| Field | Type | Default | Description |
|---|---|---|---|
| options.stun_servers | []string | ["stun:stun.l.google.com:19302"] | STUN server URLs for ICE negotiation. |
| options.sample_rate | int | 48000 | Audio sample rate in Hz. |
```yaml
providers:
  audio:
    name: webrtc
    options:
      stun_servers:
        - "stun:stun.l.google.com:19302"
      sample_rate: 48000
```
npcs – NPC Definitions
An array of NPC configurations. Each entry describes a single NPC’s personality, voice, engine mode, and tool access.
| Field | Type | Default | Description |
|---|---|---|---|
| name | string | – | Required. The NPC’s in-world display name (e.g., "Greymantle the Sage"). Must be unique. |
| personality | string | "" | Free-text persona description injected into the LLM system prompt. Supports multi-line YAML. Hot-reloadable. |
| voice | object | – | TTS voice profile for this NPC. See sub-fields below. Hot-reloadable. |
| voice.provider | string | "" | TTS provider name (e.g., "elevenlabs", "coqui"). Should match providers.tts.name. |
| voice.voice_id | string | "" | Provider-specific voice identifier. |
| voice.pitch_shift | float | 0 | Pitch adjustment in the range [-10, +10]. 0 means default. |
| voice.speed_factor | float | 0 | Speaking rate in the range [0.5, 2.0], where 1.0 is the normal rate. 0 (unset) uses the provider default. |
| engine | string | "" | Conversation pipeline mode. Valid values: cascaded (STT + LLM + TTS), s2s (end-to-end speech model), sentence_cascade (experimental dual-model). |
| knowledge_scope | []string | [] | Topic domains the NPC is knowledgeable about. Used for routing player questions and building retrieval queries. |
| tools | []string | [] | MCP tool names this NPC is permitted to invoke. |
| budget_tier | string | "" | Constrains which MCP tools are offered based on latency. Valid values: fast (≤ 500 ms), standard (≤ 1500 ms), deep (all tools). Hot-reloadable. |
| cascade_mode | string | "off" | Controls the dual-model sentence cascade. Only effective when engine is sentence_cascade. Valid values: off, auto, always. |
| cascade | object | null | Sentence cascade engine settings. Only used when engine is sentence_cascade. |
| cascade.fast_model | string | "" | Model for generating the opener sentence (fast, small model). Uses default LLM provider if empty. |
| cascade.strong_model | string | "" | Model for generating the substantive continuation (large model). Uses default LLM provider if empty. |
| cascade.opener_instruction | string | "" | Appended to the fast model’s system prompt. Uses a built-in instruction if empty. |
```yaml
npcs:
  - name: Greymantle the Sage
    personality: |
      You are Greymantle, an ancient and enigmatic wizard...
    voice:
      provider: elevenlabs
      voice_id: pNInz6obpgDQGcFmaJgB
      pitch_shift: -2.0
      speed_factor: 0.85
    engine: cascaded
    budget_tier: standard
    knowledge_scope:
      - ancient history
      - arcane magic
    tools:
      - lookup_spell
      - query_lore_database
```
Engine Cross-Validation
The config validator enforces that the required providers are configured for each engine mode:
| Engine | Required Providers |
|---|---|
| cascaded | providers.llm, providers.tts |
| sentence_cascade | providers.llm, providers.tts |
| s2s | providers.s2s |
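The cross-validation table translates directly into a lookup from engine to required provider slots. This is a minimal sketch with hypothetical types: `NPCEngine`, and a map from slot name to provider name. Like the table, it does not check providers.stt for cascaded NPCs. Treating an empty or unknown engine as an error is an assumption.

```go
package main

import "fmt"

// requiredProviders mirrors the table above.
var requiredProviders = map[string][]string{
	"cascaded":         {"llm", "tts"},
	"sentence_cascade": {"llm", "tts"},
	"s2s":              {"s2s"},
}

// NPCEngine is a hypothetical (NPC name, engine) pair taken from the npcs list.
type NPCEngine struct {
	Name   string
	Engine string
}

// validateEngines checks each NPC's engine against the configured slots
// (slot name -> provider name; empty or missing means unconfigured).
func validateEngines(npcs []NPCEngine, providers map[string]string) []error {
	var errs []error
	for _, n := range npcs {
		slots, ok := requiredProviders[n.Engine]
		if !ok {
			errs = append(errs, fmt.Errorf("npc %q: unknown engine %q", n.Name, n.Engine))
			continue
		}
		for _, slot := range slots {
			if providers[slot] == "" {
				errs = append(errs, fmt.Errorf("npc %q: engine %s requires providers.%s", n.Name, n.Engine, slot))
			}
		}
	}
	return errs
}

func main() {
	providers := map[string]string{"llm": "openai", "tts": "elevenlabs"}
	npcs := []NPCEngine{{"Tavern Keeper", "cascaded"}, {"Oracle", "s2s"}}
	for _, err := range validateEngines(npcs, providers) {
		fmt.Println(err)
	}
}
```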
memory – Long-Term Memory
| Field | Type | Default | Description |
|---|---|---|---|
| memory.postgres_dsn | string | "" | PostgreSQL connection string for the pgvector memory store. Example: "postgres://user:pass@localhost:5432/glyphoxa?sslmode=disable". When empty, long-term memory is unavailable. |
| memory.embedding_dimensions | int | 0 | Vector dimension for the embeddings column. Must match the model configured in providers.embeddings. Common values: 1536 (text-embedding-3-small), 3072 (text-embedding-3-large), 768 (nomic-embed-text). Defaults to 1536 if embeddings are configured but this field is unset. |
```yaml
memory:
  postgres_dsn: postgres://glyphoxa:secret@localhost:5432/glyphoxa?sslmode=disable
  embedding_dimensions: 1536
```
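Because embedding_dimensions defaults to 1536 whenever embeddings are configured, a model with a different vector size needs the field set explicitly. The sketch below uses a hypothetical `resolveDims` helper and the dimension values quoted in this reference to show the default and a consistency check. This reference does not say whether Glyphoxa's validator performs this check itself.

```go
package main

import "fmt"

// knownDims holds the model dimensions quoted in this reference.
var knownDims = map[string]int{
	"text-embedding-3-small": 1536,
	"text-embedding-3-large": 3072,
	"nomic-embed-text":       768,
	"mxbai-embed-large":      1024,
	"all-minilm":             384,
}

// resolveDims applies the documented 1536 default and, for known models,
// checks that the configured dimension matches what the model produces.
func resolveDims(configured int, embeddingsEnabled bool, model string) (int, error) {
	if !embeddingsEnabled {
		return configured, nil
	}
	dims := configured
	if dims == 0 {
		dims = 1536 // default when embeddings are configured but the field is unset
	}
	if want, ok := knownDims[model]; ok && want != dims {
		return 0, fmt.Errorf("memory.embedding_dimensions=%d but %s produces %d-dim vectors", dims, model, want)
	}
	return dims, nil
}

func main() {
	// nomic-embed-text with the field unset: the 1536 default does not match.
	_, err := resolveDims(0, true, "nomic-embed-text")
	fmt.Println(err)
}
```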
mcp – Model Context Protocol Tool Servers
| Field | Type | Default | Description |
|---|---|---|---|
| mcp.servers | []object | [] | List of MCP servers to connect to. |
Each server entry:
| Field | Type | Default | Description |
|---|---|---|---|
| name | string | – | Required. Unique human-readable identifier (used in logs). |
| transport | string | "" | Connection mechanism. Valid values: stdio, streamable-http. |
| command | string | "" | Executable (with arguments) for stdio transport. Required when transport is stdio. Ignored for streamable-http. |
| url | string | "" | MCP endpoint URL for streamable-http transport (e.g., "https://mcp.example.com/mcp"). Required when transport is streamable-http. Ignored for stdio. |
| env | map[string]string | {} | Environment variables injected into the subprocess for stdio transport. |
| auth | object | null | Authentication for streamable-http servers. Ignored for stdio. |
| auth.token | string | "" | Static Bearer token sent in the Authorization header. Use either auth.token or auth.oauth; if both are set, auth.oauth is used and auth.token is ignored. |
| auth.oauth | object | null | OAuth 2.1 client-credentials configuration. When set, auth.token is ignored. |
| auth.oauth.client_id | string | "" | OAuth 2.1 client identifier. |
| auth.oauth.client_secret | string | "" | OAuth 2.1 client secret. |
| auth.oauth.token_url | string | "" | Authorization server’s token endpoint. |
| auth.oauth.scopes | []string | [] | OAuth scopes to request. |
```yaml
mcp:
  servers:
    # Stdio transport -- Glyphoxa spawns the process
    - name: local-tools
      transport: stdio
      command: /usr/local/bin/mcp-tools --config /etc/mcp-tools.json
      env:
        MCP_LOG_LEVEL: info
    # Streamable HTTP transport with static token auth
    - name: web-search
      transport: streamable-http
      url: https://mcp.example.com/search
      auth:
        token: "Bearer sk-mcp-..."
    # Streamable HTTP transport with OAuth 2.1
    - name: enterprise-tools
      transport: streamable-http
      url: https://mcp.corp.example.com/mcp
      auth:
        oauth:
          client_id: glyphoxa
          client_secret: super-secret
          token_url: https://auth.corp.example.com/oauth/token
          scopes:
            - mcp:tools
            - mcp:read
```
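The per-transport rules and the auth precedence can be sketched as follows. `MCPServer`, `OAuth`, `validate`, `authMode` and `tokenRequest` are hypothetical names. The token request follows the client-credentials grant, defined in RFC 6749 §4.4 and carried over into OAuth 2.1. It authenticates the client with HTTP Basic, as RFC 6749 §2.3.1 recommends. This reference does not document which client-authentication method Glyphoxa actually uses.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

type OAuth struct {
	ClientID, ClientSecret, TokenURL string
	Scopes                           []string
}

type MCPServer struct {
	Name, Transport, Command, URL, Token string
	OAuth                                *OAuth
}

// validate enforces the per-transport requirements from the table above.
func (s MCPServer) validate() error {
	if s.Name == "" {
		return errors.New("mcp server: name is required")
	}
	switch s.Transport {
	case "stdio":
		if s.Command == "" {
			return fmt.Errorf("mcp server %q: command is required for stdio", s.Name)
		}
	case "streamable-http":
		if s.URL == "" {
			return fmt.Errorf("mcp server %q: url is required for streamable-http", s.Name)
		}
	default:
		return fmt.Errorf("mcp server %q: unknown transport %q", s.Name, s.Transport)
	}
	return nil
}

// authMode reports which credential applies; OAuth takes precedence over a token.
func (s MCPServer) authMode() string {
	switch {
	case s.Transport != "streamable-http":
		return "none" // auth is ignored for stdio
	case s.OAuth != nil:
		return "oauth"
	case s.Token != "":
		return "token"
	default:
		return "none"
	}
}

// tokenRequest builds (but does not send) a client-credentials token request.
func tokenRequest(o OAuth) (*http.Request, error) {
	form := url.Values{}
	form.Set("grant_type", "client_credentials")
	if len(o.Scopes) > 0 {
		form.Set("scope", strings.Join(o.Scopes, " ")) // space-delimited per RFC 6749 §3.3
	}
	req, err := http.NewRequest(http.MethodPost, o.TokenURL, strings.NewReader(form.Encode()))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	// RFC 6749 §2.3.1: form-encode the credentials, then send them via HTTP Basic.
	req.SetBasicAuth(url.QueryEscape(o.ClientID), url.QueryEscape(o.ClientSecret))
	return req, nil
}

func main() {
	s := MCPServer{Name: "web-search", Transport: "streamable-http", URL: "https://mcp.example.com/search", Token: "..."}
	fmt.Println(s.validate(), s.authMode())
}
```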
campaign – Campaign Data
| Field | Type | Default | Description |
|---|---|---|---|
| campaign.name | string | "" | Campaign’s human-readable name (e.g., "Curse of Strahd"). |
| campaign.system | string | "" | Game system identifier (e.g., "dnd5e", "pf2e"). |
| campaign.entity_files | []string | [] | Paths to YAML files containing entity definitions loaded at startup. Paths are resolved relative to the config file’s directory. |
| campaign.vtt_imports | []object | [] | VTT export files to import at startup. |
| campaign.vtt_imports[].path | string | – | Filesystem path to the VTT export file. |
| campaign.vtt_imports[].format | string | – | VTT platform. Supported values: "foundry", "roll20". |
```yaml
campaign:
  name: Curse of Strahd
  system: dnd5e
  entity_files:
    - entities/npcs.yaml
    - entities/locations.yaml
  vtt_imports:
    - path: exports/foundry-actors.json
      format: foundry
```
🧩 Provider-Specific Options
The options map in each provider entry accepts provider-specific keys. These are consumed by the provider factory functions at startup.
LLM Providers
All LLM providers (openai, anthropic, gemini, ollama, deepseek, mistral, groq, llamacpp, llamafile) use the standard api_key, base_url, and model fields. The options map is passed through to the underlying any-llm-go library but has no Glyphoxa-specific keys at this time.
| Option Key | Type | Default | Description |
|---|---|---|---|
| max_tokens | int | provider default | Maximum tokens in the completion response. Forwarded via the completion request, not the provider constructor. |
STT: deepgram
| Option Key | Type | Default | Description |
|---|---|---|---|
| language | string | "en" | BCP-47 language code (e.g., "en-US", "de-DE"). |
The model field sets the Deepgram model (default: "nova-3").
STT: whisper
Connects to a running whisper.cpp HTTP server (whisper-server).
| Option Key | Type | Default | Description |
|---|---|---|---|
| language | string | "en" | BCP-47 language code for transcription. |
base_url is required – it must point to the whisper.cpp server (e.g., "http://localhost:8080"). The model field is forwarded as a hint to the server.
STT: whisper-native
Uses whisper.cpp via CGO bindings – no HTTP server needed. The model file is loaded directly into memory.
| Option Key | Type | Default | Description |
|---|---|---|---|
| language | string | "en" | BCP-47 language code for transcription. |
| model_path | string | – | Filesystem path to the .bin model file. Also accepted via the model field. |
TTS: elevenlabs
| Option Key | Type | Default | Description |
|---|---|---|---|
| output_format | string | "pcm_16000" | Audio output format. Common values: "pcm_16000", "pcm_24000", "pcm_48000". |
The model field sets the ElevenLabs model ID (default: "eleven_flash_v2_5").
TTS: coqui
Connects to a locally-running Coqui TTS or XTTS v2 server.
| Option Key | Type | Default | Description |
|---|---|---|---|
| language | string | "en" | BCP-47 language code sent to the TTS server. |
| api_mode | string | "standard" | Server API mode. "standard" for the standard Coqui TTS Docker image; "xtts" for the XTTS v2 API server. XTTS mode enables voice cloning. |
base_url is required – it must point to the Coqui server (e.g., "http://localhost:5002" for standard, "http://localhost:8002" for XTTS).
S2S: openai-realtime
| Option Key | Type | Default | Description |
|---|---|---|---|
| (none) | – | – | No provider-specific options. Uses api_key, model, and base_url. |
Default model: "gpt-4o-realtime-preview".
Available voices: alloy, ash, ballad, coral, echo, sage, shimmer, verse.
S2S: gemini-live
| Option Key | Type | Default | Description |
|---|---|---|---|
| (none) | – | – | No provider-specific options. Uses api_key, model, and base_url. |
Default model: "gemini-2.0-flash-live-001".
Available voices: Aoede, Charon, Fenrir, Kore, Puck.
Embeddings: openai
| Option Key | Type | Default | Description |
|---|---|---|---|
| (none) | – | – | No provider-specific options. Uses api_key, model, and base_url. |
Default model: "text-embedding-3-small" (1536 dimensions).
Embeddings: ollama
| Option Key | Type | Default | Description |
|---|---|---|---|
| (none) | – | – | No provider-specific options. Uses base_url and model. |
Default base URL: "http://localhost:11434". Well-known dimension mappings: nomic-embed-text (768), mxbai-embed-large (1024), all-minilm (384). Unknown models are auto-probed on first use.
VAD: silero
| Option Key | Type | Default | Description |
|---|---|---|---|
| frame_size_ms | int | 30 | Audio frame duration in milliseconds (e.g., 10, 20, 30). |
| speech_threshold | float | 0.5 | Probability above which a frame is classified as speech. Range: [0.0, 1.0]. |
| silence_threshold | float | 0.35 | Probability below which an active speech segment is considered ended. Range: [0.0, 1.0]. Must be ≤ speech_threshold. |
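The two thresholds form a hysteresis. A segment starts only above speech_threshold and ends only below silence_threshold, so probabilities that hover between the two do not toggle the state on every frame. The following is a minimal sketch of that state machine. `Segmenter` and its `Event` values are hypothetical. Many VAD pipelines also require a minimum silence duration before ending a segment; this sketch does not model that.

```go
package main

import "fmt"

// Event signals a state transition of the segmenter.
type Event int

const (
	EventNone Event = iota
	SpeechStart
	SpeechEnd
)

// Segmenter turns per-frame speech probabilities into start/end events.
type Segmenter struct {
	SpeechThreshold  float64 // enter speech when prob > this
	SilenceThreshold float64 // leave speech when prob < this
	speaking         bool
}

// Feed consumes one frame's probability and returns any transition.
func (s *Segmenter) Feed(prob float64) Event {
	switch {
	case !s.speaking && prob > s.SpeechThreshold:
		s.speaking = true
		return SpeechStart
	case s.speaking && prob < s.SilenceThreshold:
		s.speaking = false
		return SpeechEnd
	}
	return EventNone
}

func main() {
	seg := &Segmenter{SpeechThreshold: 0.5, SilenceThreshold: 0.35}
	const frameMS = 30 // frame_size_ms
	for i, p := range []float64{0.1, 0.7, 0.6, 0.4, 0.3, 0.2} {
		switch seg.Feed(p) {
		case SpeechStart:
			fmt.Printf("speech start at %d ms\n", i*frameMS)
		case SpeechEnd:
			fmt.Printf("speech end at %d ms\n", i*frameMS)
		}
	}
}
```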
🚀 Minimal Configuration
The smallest valid config to get Glyphoxa running with a single NPC in cascaded mode:
```yaml
server:
  listen_addr: ":8080"

providers:
  llm:
    name: openai
    api_key: sk-...
    model: gpt-4o
  stt:
    name: deepgram
    api_key: dg-...
    model: nova-3
  tts:
    name: elevenlabs
    api_key: el-...

npcs:
  - name: Tavern Keeper
    personality: You are a friendly tavern keeper.
    voice:
      voice_id: pNInz6obpgDQGcFmaJgB
    engine: cascaded
```
To run with the S2S pipeline instead (no separate STT/TTS needed):
```yaml
server:
  listen_addr: ":8080"

providers:
  s2s:
    name: openai-realtime
    api_key: sk-...
    model: gpt-4o-realtime-preview

npcs:
  - name: Tavern Keeper
    personality: You are a friendly tavern keeper.
    voice:
      voice_id: alloy
    engine: s2s
```
📄 Full Example
A fully annotated example configuration is maintained at configs/example.yaml. Copy it, rename it to config.yaml, and fill in your API keys to get started.
🔗 See Also
- docs/getting-started.md – First-run quickstart guide
- docs/providers.md – Deep dive into each provider’s capabilities and trade-offs
- docs/deployment.md – Production deployment patterns (Docker, systemd, Kubernetes)
- configs/example.yaml – Annotated example configuration file