Glyphoxa is configured through a single YAML file. This document is the authoritative reference for every configuration field, provider option, and runtime behaviour related to configuration.


📖 Overview

Config file format

Glyphoxa reads a YAML configuration file at startup. The YAML decoder operates in strict mode (KnownFields(true)) – any unrecognised key in the file causes a hard parse error. This catches typos early.

Specifying the config file

Pass the path via the -config CLI flag:

glyphoxa -config /etc/glyphoxa/config.yaml

The default is config.yaml in the working directory.

Binary mode

Select the binary mode with the --mode flag:

glyphoxa --mode=gateway --config /etc/glyphoxa/config.yaml

| Mode | Description |
|------|-------------|
| full (default) | Single-process self-hosted deployment |
| gateway | Multi-tenant session orchestrator with admin API |
| worker | Voice pipeline executor (created by gateway) |
| mcp-gateway | Shared MCP tool server |

In --mode=gateway and --mode=worker, several environment variables configure the distributed system. See Deployment: Environment Variables for the full list.

Environment variable fallbacks

Glyphoxa’s config loader does not read environment variables to override fields in the YAML file (the deployment variables used by --mode=gateway and --mode=worker are separate). However, several provider SDKs honour their own environment variables when no api_key is set in the config file:

| Provider | Environment Variable |
|----------|----------------------|
| openai (LLM, embeddings) | OPENAI_API_KEY |
| anthropic | ANTHROPIC_API_KEY |
| gemini | GEMINI_API_KEY / GOOGLE_API_KEY |
| deepseek | DEEPSEEK_API_KEY |
| mistral | MISTRAL_API_KEY |
| groq | GROQ_API_KEY |

These are read by the upstream SDKs, not by Glyphoxa’s config loader. If you set api_key explicitly in the YAML, it takes precedence.


🔄 Hot Reload

The configuration file can be edited while Glyphoxa is running. A background config watcher detects changes and applies them without a restart.

How it works

  1. The watcher polls the config file (no fsnotify dependency).
  2. Default polling interval: 5 seconds.
  3. On each tick the file’s mtime is checked first (cheap). If mtime has not changed, no further work is done.
  4. If mtime changed, the file is read and its SHA-256 hash is compared to the previously loaded config. Touching the file without changing content is a no-op.
  5. The new file is fully parsed and validated before it replaces the old config. If validation fails, the old config is retained and a warning is logged.
  6. When a valid change is detected, the onChange callback receives both the old and new Config values.

What can be hot-reloaded

The Diff function tracks which changes are safe to apply at runtime:

| Change | Hot-reloaded? | Notes |
|--------|---------------|-------|
| server.log_level | ✅ Yes | Takes effect immediately |
| NPC personality | ✅ Yes | System prompt updated on next interaction |
| NPC voice (provider, voice_id, pitch_shift, speed_factor) | ✅ Yes | Applied to new TTS sessions |
| NPC budget_tier | ✅ Yes | Changes tool filtering immediately |
| Adding a new NPC | ✅ Yes | NPC becomes available without restart |
| Removing an NPC | ✅ Yes | NPC is unloaded |
| Provider changes (api_key, model, etc.) | ❌ No | Requires restart |
| server.listen_addr / server.tls | ❌ No | Requires restart |
| providers.audio.* | ❌ No | Requires restart |
| memory.* | ❌ No | Requires restart |
| mcp.servers | ❌ No | Requires restart |
| campaign.* | ❌ No | Requires restart |
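To make the classification concrete, here is a simplified Go sketch of how a diff between two configs might separate hot-reloadable changes from those that need a restart. It is illustrative only: diffSketch and the trimmed-down Config, NPC, and Changes types are hypothetical and do not reflect the signature of the real Diff function.

```go
package main

import "fmt"

// NPC and Config are trimmed-down, hypothetical stand-ins for the real types.
type NPC struct {
	Name        string
	Personality string
	BudgetTier  string
}

type Config struct {
	LogLevel   string
	ListenAddr string
	NPCs       []NPC
}

// Changes summarises what differs between two configs.
type Changes struct {
	LogLevel        bool     // hot-reloadable
	NPCsAdded       []string // hot-reloadable
	NPCsRemoved     []string // hot-reloadable
	NPCsUpdated     []string // hot-reloadable
	RestartRequired bool     // e.g. server.listen_addr changed
}

// diffSketch compares old and cur and classifies each difference.
func diffSketch(old, cur Config) Changes {
	var c Changes
	c.LogLevel = old.LogLevel != cur.LogLevel
	c.RestartRequired = old.ListenAddr != cur.ListenAddr

	before := make(map[string]NPC, len(old.NPCs))
	for _, n := range old.NPCs {
		before[n.Name] = n
	}
	after := make(map[string]bool, len(cur.NPCs))
	for _, n := range cur.NPCs {
		after[n.Name] = true
		prev, existed := before[n.Name]
		switch {
		case !existed:
			c.NPCsAdded = append(c.NPCsAdded, n.Name)
		case prev != n:
			c.NPCsUpdated = append(c.NPCsUpdated, n.Name)
		}
	}
	for _, n := range old.NPCs {
		if !after[n.Name] {
			c.NPCsRemoved = append(c.NPCsRemoved, n.Name)
		}
	}
	return c
}

func main() {
	old := Config{LogLevel: "info", ListenAddr: ":8080", NPCs: []NPC{{Name: "Greymantle the Sage", BudgetTier: "fast"}}}
	cur := Config{LogLevel: "debug", ListenAddr: ":8080", NPCs: []NPC{{Name: "Greymantle the Sage", BudgetTier: "standard"}}}
	fmt.Printf("%+v\n", diffSketch(old, cur))
}
```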

🗂️ Complete Field Reference

server – Server Settings

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| server.listen_addr | string | "" | TCP address to listen on (e.g., ":8080"). Empty means the server does not bind an HTTP listener. |
| server.log_level | string | "info" | Log verbosity. Valid values: debug, info, warn, error. Hot-reloadable. |
| server.tls | object | null | TLS configuration block. When omitted or null, the server runs plain HTTP. |
| server.tls.cert_file | string | | Path to PEM-encoded TLS certificate. Required if tls is set. |
| server.tls.key_file | string | | Path to PEM-encoded TLS private key. Required if tls is set. |

server:
  listen_addr: ":8080"
  log_level: info
  tls:
    cert_file: /etc/ssl/glyphoxa.crt
    key_file: /etc/ssl/glyphoxa.key

providers – AI Provider Configuration

Each provider slot follows the same ProviderEntry schema:

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| name | string | "" | Registered provider implementation name (e.g., "openai", "deepgram"). Leave empty to disable the pipeline stage. |
| api_key | string | "" | Authentication key for the provider’s API. Some providers fall back to environment variables when empty. |
| base_url | string | "" | Override the provider’s default API endpoint. Leave empty to use the built-in default. |
| model | string | "" | Model name within the provider (e.g., "gpt-4o", "nova-3"). |
| options | map[string]any | {} | Provider-specific settings not covered by the standard fields. See Provider-Specific Options below. |

providers.llm – Large Language Model

Used for NPC reasoning in cascaded and sentence_cascade engine modes.

Registered providers: openai, anthropic, ollama, gemini, deepseek, mistral, groq, llamacpp, llamafile

providers:
  llm:
    name: openai
    api_key: sk-...
    model: gpt-4o
    options:
      max_tokens: 1024

providers.stt – Speech-to-Text

Transcribes player audio in cascaded engine mode.

Registered providers: deepgram, whisper, whisper-native

providers:
  stt:
    name: deepgram
    api_key: dg-...
    model: nova-3
    options:
      language: en-US

providers.tts – Text-to-Speech

Synthesises NPC voice responses in cascaded engine mode.

Registered providers: elevenlabs, coqui

providers:
  tts:
    name: elevenlabs
    api_key: el-...
    model: eleven_multilingual_v2
    options:
      output_format: pcm_48000

providers.s2s – Speech-to-Speech

End-to-end voice model that replaces the STT + LLM + TTS pipeline when an NPC uses engine: s2s.

Registered providers: openai-realtime, gemini-live

providers:
  s2s:
    name: openai-realtime
    api_key: sk-...
    model: gpt-4o-realtime-preview

providers.embeddings – Embedding Model

Used by the memory layer for semantic retrieval (pgvector).

Registered providers: openai, ollama

providers:
  embeddings:
    name: openai
    api_key: sk-...
    model: text-embedding-3-small

providers.vad – Voice Activity Detection

Determines when a player is speaking. Runs locally, no API key required.

Registered providers: silero

providers:
  vad:
    name: silero
    options:
      frame_size_ms: 30
      speech_threshold: 0.5
      silence_threshold: 0.35

providers.audio – Audio Platform

Connects Glyphoxa to a voice channel. The audio provider determines which voice transport is used and, for Discord, also creates the bot that provides slash commands and DM permissions.

Registered providers: discord, webrtc

Audio: discord

Creates a Discord bot, connects to the gateway, and provides voice channel transport plus slash commands (/session, /npc, /entity, /campaign).

| Field | Type | Description |
|-------|------|-------------|
| api_key | string | Required. Discord bot token (e.g., "Bot MTIz..."). Obtain from https://discord.com/developers/applications. |
| options.guild_id | string | Required. Target Discord guild (server) ID. Alpha: one guild per bot instance. |
| options.dm_role_id | string | Discord role ID for Dungeon Master permissions. Users with this role can execute privileged slash commands. When empty, all users are treated as DMs (useful for development). |

providers:
  audio:
    name: discord
    api_key: "Bot MTIz..."
    options:
      guild_id: "123456789012345678"
      dm_role_id: "987654321098765432"

Audio: webrtc

Enables browser-based voice sessions via WebRTC. No Discord account required. Currently in alpha – the PeerTransport interface abstracts the pion/webrtc integration.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| options.stun_servers | []string | ["stun:stun.l.google.com:19302"] | STUN server URLs for ICE negotiation. |
| options.sample_rate | int | 48000 | Audio sample rate in Hz. |

providers:
  audio:
    name: webrtc
    options:
      stun_servers:
        - "stun:stun.l.google.com:19302"
      sample_rate: 48000

npcs – NPC Definitions

An array of NPC configurations. Each entry describes a single NPC’s personality, voice, engine mode, and tool access.

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| name | string | | Required. The NPC’s in-world display name (e.g., "Greymantle the Sage"). Must be unique. |
| personality | string | "" | Free-text persona description injected into the LLM system prompt. Supports multi-line YAML. Hot-reloadable. |
| voice | object | | TTS voice profile for this NPC. See sub-fields below. Hot-reloadable. |
| voice.provider | string | "" | TTS provider name (e.g., "elevenlabs", "coqui"). Should match providers.tts.name. |
| voice.voice_id | string | "" | Provider-specific voice identifier. |
| voice.pitch_shift | float | 0 | Pitch adjustment in the range [-10, +10]. 0 means default. |
| voice.speed_factor | float | 0 | Speaking rate in the range [0.5, 2.0]. 1.0 means default; 0 means use provider default. |
| engine | string | "" | Conversation pipeline mode. Valid values: cascaded (STT + LLM + TTS), s2s (end-to-end speech model), sentence_cascade (experimental dual-model). |
| knowledge_scope | []string | [] | Topic domains the NPC is knowledgeable about. Used for routing player questions and building retrieval queries. |
| tools | []string | [] | MCP tool names this NPC is permitted to invoke. |
| budget_tier | string | "" | Constrains which MCP tools are offered based on latency. Valid values: fast (<=500ms), standard (<=1500ms), deep (all tools). Hot-reloadable. |
| cascade_mode | string | "off" | Controls the dual-model sentence cascade. Only effective when engine is sentence_cascade. Valid values: off, auto, always. |
| cascade | object | null | Sentence cascade engine settings. Only used when engine is sentence_cascade. |
| cascade.fast_model | string | "" | Model for generating the opener sentence (fast, small model). Uses default LLM provider if empty. |
| cascade.strong_model | string | "" | Model for generating the substantive continuation (large model). Uses default LLM provider if empty. |
| cascade.opener_instruction | string | "" | Appended to the fast model’s system prompt. Uses a built-in instruction if empty. |

npcs:
  - name: Greymantle the Sage
    personality: |
      You are Greymantle, an ancient and enigmatic wizard...
    voice:
      provider: elevenlabs
      voice_id: pNInz6obpgDQGcFmaJgB
      pitch_shift: -2.0
      speed_factor: 0.85
    engine: cascaded
    budget_tier: standard
    knowledge_scope:
      - ancient history
      - arcane magic
    tools:
      - lookup_spell
      - query_lore_database

Engine Cross-Validation

The config validator enforces that the required providers are configured for each engine mode:

| Engine | Required Providers |
|--------|--------------------|
| cascaded | providers.llm, providers.tts |
| sentence_cascade | providers.llm, providers.tts |
| s2s | providers.s2s |

memory – Long-Term Memory

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| memory.postgres_dsn | string | "" | PostgreSQL connection string for the pgvector memory store. Example: "postgres://user:pass@localhost:5432/glyphoxa?sslmode=disable". When empty, long-term memory is unavailable. |
| memory.embedding_dimensions | int | 0 | Vector dimension for the embeddings column. Must match the model configured in providers.embeddings. Common values: 1536 (text-embedding-3-small), 3072 (text-embedding-3-large), 768 (nomic-embed-text). Defaults to 1536 if embeddings are configured but this field is unset. |

memory:
  postgres_dsn: postgres://glyphoxa:secret@localhost:5432/glyphoxa?sslmode=disable
  embedding_dimensions: 1536

mcp – Model Context Protocol Tool Servers

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| mcp.servers | []object | [] | List of MCP servers to connect to. |

Each server entry:

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| name | string | | Required. Unique human-readable identifier (used in logs). |
| transport | string | "" | Connection mechanism. Valid values: stdio, streamable-http. |
| command | string | "" | Executable (with arguments) for stdio transport. Required when transport is stdio. Ignored for streamable-http. |
| url | string | "" | MCP endpoint URL for streamable-http transport (e.g., "https://mcp.example.com/mcp"). Required when transport is streamable-http. Ignored for stdio. |
| env | map[string]string | {} | Environment variables injected into the subprocess for stdio transport. |
| auth | object | null | Authentication for streamable-http servers. Ignored for stdio. |
| auth.token | string | "" | Static Bearer token sent in the Authorization header. Mutually exclusive with auth.oauth. |
| auth.oauth | object | null | OAuth 2.1 client-credentials configuration. When set, auth.token is ignored. |
| auth.oauth.client_id | string | "" | OAuth 2.1 client identifier. |
| auth.oauth.client_secret | string | "" | OAuth 2.1 client secret. |
| auth.oauth.token_url | string | "" | Authorization server’s token endpoint. |
| auth.oauth.scopes | []string | [] | OAuth scopes to request. |

mcp:
  servers:
    # Stdio transport -- Glyphoxa spawns the process
    - name: local-tools
      transport: stdio
      command: /usr/local/bin/mcp-tools --config /etc/mcp-tools.json
      env:
        MCP_LOG_LEVEL: info

    # Streamable HTTP transport with static token auth
    - name: web-search
      transport: streamable-http
      url: https://mcp.example.com/search
      auth:
        token: "Bearer sk-mcp-..."

    # Streamable HTTP transport with OAuth 2.1
    - name: enterprise-tools
      transport: streamable-http
      url: https://mcp.corp.example.com/mcp
      auth:
        oauth:
          client_id: glyphoxa
          client_secret: super-secret
          token_url: https://auth.corp.example.com/oauth/token
          scopes:
            - mcp:tools
            - mcp:read
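The transport-dependent requirements in the field table (command for stdio, url for streamable-http) could be checked along the following lines. This Go sketch is an assumption-laden illustration: Server and validateServer are hypothetical names, auth handling is omitted, and rejecting an empty or unknown transport is a choice made for the example rather than documented behaviour.

```go
package main

import (
	"errors"
	"fmt"
)

// Server is a hypothetical, trimmed-down view of one mcp.servers entry.
type Server struct {
	Name      string
	Transport string
	Command   string
	URL       string
}

// validateServer enforces the per-transport required fields.
func validateServer(s Server) error {
	if s.Name == "" {
		return errors.New("mcp server: name is required")
	}
	switch s.Transport {
	case "stdio":
		if s.Command == "" {
			return fmt.Errorf("mcp server %q: command is required for stdio", s.Name)
		}
	case "streamable-http":
		if s.URL == "" {
			return fmt.Errorf("mcp server %q: url is required for streamable-http", s.Name)
		}
	default:
		// Assumption: anything else is treated as an error in this sketch.
		return fmt.Errorf("mcp server %q: unknown transport %q", s.Name, s.Transport)
	}
	return nil
}

func main() {
	servers := []Server{
		{Name: "local-tools", Transport: "stdio", Command: "/usr/local/bin/mcp-tools"},
		{Name: "web-search", Transport: "streamable-http"}, // url missing on purpose
	}
	for _, s := range servers {
		fmt.Println(s.Name, validateServer(s))
	}
}
```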

campaign – Campaign Data

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| campaign.name | string | "" | Campaign’s human-readable name (e.g., "Curse of Strahd"). |
| campaign.system | string | "" | Game system identifier (e.g., "dnd5e", "pf2e"). |
| campaign.entity_files | []string | [] | Paths to YAML files containing entity definitions loaded at startup. Paths are resolved relative to the config file’s directory. |
| campaign.vtt_imports | []object | [] | VTT export files to import at startup. |
| campaign.vtt_imports[].path | string | | Filesystem path to the VTT export file. |
| campaign.vtt_imports[].format | string | | VTT platform. Supported values: "foundry", "roll20". |

campaign:
  name: Curse of Strahd
  system: dnd5e
  entity_files:
    - entities/npcs.yaml
    - entities/locations.yaml
  vtt_imports:
    - path: exports/foundry-actors.json
      format: foundry

🧩 Provider-Specific Options

The options map in each provider entry accepts provider-specific keys. These are consumed by the provider factory functions at startup.

LLM Providers

All LLM providers (openai, anthropic, gemini, ollama, deepseek, mistral, groq, llamacpp, llamafile) use the standard api_key, base_url, and model fields. The options map is passed through to the underlying any-llm-go library; Glyphoxa defines no keys of its own beyond the one listed below.

| Option Key | Type | Default | Description |
|------------|------|---------|-------------|
| max_tokens | int | provider default | Maximum tokens in the completion response. Forwarded via the completion request, not the provider constructor. |

STT: deepgram

| Option Key | Type | Default | Description |
|------------|------|---------|-------------|
| language | string | "en" | BCP-47 language code (e.g., "en-US", "de-DE"). |

The model field sets the Deepgram model (default: "nova-3").

STT: whisper

Connects to a running whisper.cpp HTTP server (whisper-server).

| Option Key | Type | Default | Description |
|------------|------|---------|-------------|
| language | string | "en" | BCP-47 language code for transcription. |

base_url is required – it must point to the whisper.cpp server (e.g., "http://localhost:8080"). The model field is forwarded as a hint to the server.

STT: whisper-native

Uses whisper.cpp via CGO bindings – no HTTP server needed. The model file is loaded directly into memory.

| Option Key | Type | Default | Description |
|------------|------|---------|-------------|
| language | string | "en" | BCP-47 language code for transcription. |
| model_path | string | | Filesystem path to the .bin model file. Also accepted via the model field. |

TTS: elevenlabs

| Option Key | Type | Default | Description |
|------------|------|---------|-------------|
| output_format | string | "pcm_16000" | Audio output format. Common values: "pcm_16000", "pcm_24000", "pcm_48000". |

The model field sets the ElevenLabs model ID (default: "eleven_flash_v2_5").

TTS: coqui

Connects to a locally-running Coqui TTS or XTTS v2 server.

| Option Key | Type | Default | Description |
|------------|------|---------|-------------|
| language | string | "en" | BCP-47 language code sent to the TTS server. |
| api_mode | string | "standard" | Server API mode. "standard" for the standard Coqui TTS Docker image; "xtts" for the XTTS v2 API server. XTTS mode enables voice cloning. |

base_url is required – it must point to the Coqui server (e.g., "http://localhost:5002" for standard, "http://localhost:8002" for XTTS).

S2S: openai-realtime

Option Key Type Default Description
(none)     No provider-specific options. Uses api_key, model, and base_url.

Default model: "gpt-4o-realtime-preview".

Available voices: alloy, ash, ballad, coral, echo, sage, shimmer, verse.

S2S: gemini-live

Option Key Type Default Description
(none)     No provider-specific options. Uses api_key, model, and base_url.

Default model: "gemini-2.0-flash-live-001".

Available voices: Aoede, Charon, Fenrir, Kore, Puck.

Embeddings: openai

Option Key Type Default Description
(none)     No provider-specific options. Uses api_key, model, and base_url.

Default model: "text-embedding-3-small" (1536 dimensions).

Embeddings: ollama

Option Key Type Default Description
(none)     No provider-specific options. Uses base_url and model.

Default base URL: "http://localhost:11434". Well-known dimension mappings: nomic-embed-text (768), mxbai-embed-large (1024), all-minilm (384). Unknown models are auto-probed on first use.

VAD: silero

| Option Key | Type | Default | Description |
|------------|------|---------|-------------|
| frame_size_ms | int | 30 | Audio frame duration in milliseconds (e.g., 10, 20, 30). |
| speech_threshold | float | 0.5 | Probability above which a frame is classified as speech. Range: [0.0, 1.0]. |
| silence_threshold | float | 0.35 | Probability below which an active speech segment is considered ended. Range: [0.0, 1.0]. Must be <= speech_threshold. |
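The two thresholds form a hysteresis band: a segment starts when the speech probability rises above speech_threshold and ends only when it drops below silence_threshold. The Go sketch below illustrates that behaviour and the silence_threshold <= speech_threshold constraint. It is a simplification with hypothetical names (gate, newGate, observe); it operates on per-frame probabilities supplied by the caller and ignores any smoothing over multiple frames that a real detector may apply.

```go
package main

import (
	"errors"
	"fmt"
)

// gate tracks whether a speech segment is currently active.
type gate struct {
	speech, silence float64
	active          bool
}

// newGate validates the thresholds as described in the table above.
func newGate(speech, silence float64) (*gate, error) {
	if speech < 0 || speech > 1 || silence < 0 || silence > 1 {
		return nil, errors.New("thresholds must be within [0.0, 1.0]")
	}
	if silence > speech {
		return nil, errors.New("silence_threshold must be <= speech_threshold")
	}
	return &gate{speech: speech, silence: silence}, nil
}

// observe feeds one frame's speech probability and reports whether a segment
// is active after this frame.
func (g *gate) observe(p float64) bool {
	switch {
	case !g.active && p > g.speech:
		g.active = true // segment starts
	case g.active && p < g.silence:
		g.active = false // segment ends
	}
	return g.active
}

func main() {
	g, err := newGate(0.5, 0.35)
	if err != nil {
		panic(err)
	}
	// 0.4 lies between the thresholds, so the segment opened by 0.6 stays active.
	for _, p := range []float64{0.2, 0.6, 0.4, 0.3} {
		fmt.Println(p, g.observe(p))
	}
}
```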

🚀 Minimal Configuration

The smallest valid config to get Glyphoxa running with a single NPC in cascaded mode:

server:
  listen_addr: ":8080"

providers:
  llm:
    name: openai
    api_key: sk-...
    model: gpt-4o
  stt:
    name: deepgram
    api_key: dg-...
    model: nova-3
  tts:
    name: elevenlabs
    api_key: el-...

npcs:
  - name: Tavern Keeper
    personality: You are a friendly tavern keeper.
    voice:
      voice_id: pNInz6obpgDQGcFmaJgB
    engine: cascaded

To run with the S2S pipeline instead (no separate STT/TTS needed):

server:
  listen_addr: ":8080"

providers:
  s2s:
    name: openai-realtime
    api_key: sk-...
    model: gpt-4o-realtime-preview

npcs:
  - name: Tavern Keeper
    personality: You are a friendly tavern keeper.
    voice:
      voice_id: alloy
    engine: s2s

📄 Full Example

A fully annotated example configuration is maintained at configs/example.yaml. Copy it, rename it to config.yaml, and fill in your API keys to get started.

