Guide for deploying Glyphoxa in production. Covers Docker Compose, binary releases, building from source, database setup, TLS, and operational considerations.
## Overview
Glyphoxa can be deployed in four ways:
| Method | Best for | GPU support | Complexity |
|---|---|---|---|
| Docker Compose (recommended) | Most deployments, local or cloud providers | NVIDIA (via container toolkit) | Low |
| Binary from release | Custom orchestration, systemd, bare-metal | Host-native (CUDA, Metal) | Medium |
| Build from source | Development, customisation, non-Linux platforms | Host-native | High |
| Kubernetes / Helm (multi-tenant) | SaaS, multi-tenant production | NVIDIA (via device plugin) | Medium |
All methods ultimately run the same binary, with whisper.cpp statically linked and libdave dynamically linked. The binary requires a PostgreSQL database with the pgvector extension and a YAML configuration file.
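As an illustrative sketch of that configuration file (only keys that appear later in this guide; see configuration.md for the full schema), a minimal `config.yaml` might look like:

```yaml
# Minimal illustrative config.yaml -- keys taken from the sections below
server:
  listen_addr: ":8080"
  log_level: info

memory:
  postgres_dsn: "postgres://glyphoxa:STRONG_PASSWORD@db-host:5432/glyphoxa?sslmode=require"
  embedding_dimensions: 768
```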
## Docker Compose Deployment

The `deployments/compose/` directory provides a ready-to-use Docker Compose stack with two operational modes.

### Local mode (no API keys)

Uses Ollama (LLM + embeddings), native whisper.cpp (STT), Coqui TTS, and Silero VAD, all running locally:

```shell
cd deployments/compose
cp config.local.yaml config.yaml
docker compose --profile local up -d
```
On first start, the bootstrap services automatically download:
- llama3.2 (~2 GB) and nomic-embed-text (~300 MB) via Ollama
- ggml-base.en.bin (~147 MB) whisper model
### Cloud mode (API-based providers)

Uses cloud APIs (OpenAI, Deepgram, ElevenLabs, etc.); only PostgreSQL runs locally:

```shell
cd deployments/compose
cp config.yaml.example config.yaml
# Edit config.yaml and add your API keys
docker compose up -d
```
### Profiles

| Profile | Command | Services started |
|---|---|---|
| (default) | `docker compose up` | postgres, glyphoxa |
| local | `docker compose --profile local up` | postgres, glyphoxa, ollama, ollama-bootstrap, whisper-model-download, tts |
| alpha | `docker compose --profile alpha up` | adds prometheus, grafana |

Profiles can be combined: `docker compose --profile local --profile alpha up` starts the full stack with monitoring.
For complete setup details (GPU acceleration, model selection, endpoints, and troubleshooting) see `deployments/compose/README.md`.
## Kubernetes / Helm Deployment
For multi-tenant SaaS deployments, Glyphoxa provides a Helm chart that deploys a gateway, worker job template, and optional MCP gateway.
See Multi-Tenant Architecture for the tenant model, admin API, and session orchestration details.
### Prerequisites
- Kubernetes 1.28+
- Helm 3
- PostgreSQL with pgvector (external or via the bundled Bitnami subchart)
- Sealed Secrets controller (optional, for secret management)
### Binary Modes
The Glyphoxa binary supports four modes:
| Mode | Flag | Description |
|---|---|---|
| Full | --mode=full | Single-process (default). Self-hosted, no admin API. |
| Gateway | --mode=gateway | Session orchestrator with admin API. Manages bots and routes sessions. |
| Worker | --mode=worker | Voice pipeline executor. Created as Kubernetes Jobs by the gateway. |
| MCP Gateway | --mode=mcp-gateway | Shared tool server for all workers. |
### Quick Start (Shared Topology)

```shell
# Build dependencies
helm dependency build deploy/helm/glyphoxa/

# Install with shared topology
helm install glyphoxa deploy/helm/glyphoxa/ \
  -f deploy/helm/glyphoxa/values-shared.yaml \
  --set database.dsn="postgres://user:pass@db:5432/glyphoxa" \
  --set gateway.adminKey="your-secret-key" \
  -n glyphoxa --create-namespace
```
### Dedicated Tenant

```shell
helm install glyphoxa-tenant-abc deploy/helm/glyphoxa/ \
  -f deploy/helm/glyphoxa/values-dedicated.yaml \
  --set database.dsn="postgres://user:pass@dedicated-db:5432/glyphoxa" \
  -n glyphoxa-tenant-abc --create-namespace
```
### Key Values

| Value | Default | Description |
|---|---|---|
| `topology` | `shared` | `shared` or `dedicated` |
| `gateway.replicaCount` | `3` | Gateway replicas (HA) |
| `gateway.adminKey` | `""` | Admin API bearer token |
| `gateway.ports.admin` | `8081` | Admin API port |
| `gateway.ports.grpc` | `50051` | gRPC port |
| `gateway.ports.observe` | `9090` | Metrics/health port |
| `worker.resourceProfile` | `cloud` | `cloud`, `whisper-native`, or `local-llm` |
| `worker.activeDeadlineSeconds` | `14400` | Max session duration (4 h) |
| `mcpGateway.enabled` | `true` | Deploy MCP gateway |
| `networkPolicy.enabled` | `true` | Enable NetworkPolicies |
| `autoscaling.enabled` | `false` | HPA for gateway |
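Rather than repeating `--set` flags on every upgrade, overrides can be collected in a values file. The sketch below is illustrative and assumes only the keys from the table above:

```yaml
# my-values.yaml -- illustrative overrides, keys from the table above
topology: shared
gateway:
  replicaCount: 5
worker:
  resourceProfile: whisper-native
autoscaling:
  enabled: true
```

Apply it with `helm upgrade --install glyphoxa deploy/helm/glyphoxa/ -f my-values.yaml -n glyphoxa`.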
### Environment Variables

Gateway and worker pods use these environment variables (set automatically by the Helm chart):

| Variable | Mode | Description |
|---|---|---|
| `GLYPHOXA_ADMIN_KEY` | gateway | Admin API bearer token |
| `GLYPHOXA_GRPC_ADDR` | gateway, worker | gRPC listen address |
| `GLYPHOXA_GATEWAY_ADDR` | worker | Gateway gRPC address for callbacks |
| `GLYPHOXA_DATABASE_DSN` | gateway, worker | PostgreSQL connection string |
| `GLYPHOXA_MCP_GATEWAY_URL` | worker | MCP gateway HTTP URL |
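For reference, a worker container's environment might be rendered by the chart along these lines. The Service and Secret names below are hypothetical; only the variable names come from the table above:

```yaml
env:
  - name: GLYPHOXA_GATEWAY_ADDR
    value: "glyphoxa-gateway:50051"        # hypothetical gateway Service name
  - name: GLYPHOXA_MCP_GATEWAY_URL
    value: "http://glyphoxa-mcp-gateway"   # hypothetical MCP gateway Service name
  - name: GLYPHOXA_DATABASE_DSN
    valueFrom:
      secretKeyRef:
        name: glyphoxa-db                  # hypothetical Secret
        key: dsn
```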
### Tier-Aware Node Scheduling

The Helm chart supports scheduling workloads onto tier-specific node pools:

```yaml
# values-shared.yaml sets:
gateway.nodeSelector: { "glyphoxa.io/tier": "shared" }
gateway.tolerations: [{ key: "glyphoxa.io/tier", value: "shared", effect: "NoSchedule" }]

# values-dedicated.yaml sets:
gateway.nodeSelector: { "glyphoxa.io/tier": "dedicated" }
gateway.tolerations: [{ key: "glyphoxa.io/tier", value: "dedicated", effect: "NoSchedule" }]
```
When `worker.resourceProfile` is `local-llm`, GPU tolerations (`nvidia.com/gpu`) are automatically merged so workers land on GPU-equipped nodes.
### NetworkPolicy

When `networkPolicy.enabled=true`, the chart deploys policies that:
- Allow gateway to receive admin API and gRPC traffic
- Allow workers to connect to gateway gRPC and MCP gateway HTTP
- Allow MCP gateway to receive traffic from workers only
- Block all other inter-pod communication
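As a sketch of the third rule only, a policy admitting worker traffic to the MCP gateway and nothing else could look like the following. The label selectors are assumptions for illustration; the chart's actual labels may differ:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: mcp-gateway-ingress
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/component: mcp-gateway   # assumed label
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/component: worker  # assumed label
```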
## Building from Source

### Prerequisites

- Go 1.26+ with `CGO_ENABLED=1`
- C/C++ toolchain: `gcc`, `g++`, `cmake` (for whisper.cpp)
- System libraries: `libopus-dev` (Debian/Ubuntu), `opus` (Arch/macOS)
- ONNX Runtime: required for the Silero VAD provider
- libdave: `make dave-libs` (for Discord DAVE E2EE voice encryption)

Install build dependencies:

```shell
# Debian / Ubuntu
sudo apt install -y build-essential cmake git libopus-dev pkg-config

# Arch Linux
sudo pacman -S base-devel cmake git opus

# macOS
brew install cmake opus pkg-config
```
### Build the binary

```shell
# 1. Build whisper.cpp static library
make whisper-libs

# 2. Install libdave shared library
make dave-libs

# 3. Set environment for CGO linking
export C_INCLUDE_PATH=/tmp/whisper-install/include
export LIBRARY_PATH=/tmp/whisper-install/lib
export PKG_CONFIG_PATH="$HOME/.local/lib/pkgconfig:$PKG_CONFIG_PATH"
export LD_LIBRARY_PATH="$HOME/.local/lib:$LD_LIBRARY_PATH"
export CGO_ENABLED=1

# 4. Build Glyphoxa
make build
```

The binary is output to `./bin/glyphoxa`. It requires `libdave.so` at runtime (installed via `make dave-libs`).
### Run the binary

```shell
./bin/glyphoxa -config /path/to/config.yaml
```
## GoReleaser

Tagged releases (`v*`) are built automatically by GitHub Actions using GoReleaser.

### What the release pipeline produces

- Statically-linked binaries for `linux/amd64` and `linux/arm64`
- tar.gz archives named `glyphoxa_<version>_linux_<arch>.tar.gz`
- SHA-256 checksums in `checksums.txt`
- A multi-arch Docker image pushed to `ghcr.io/mrwong99/glyphoxa`
### Docker image tags

Each release produces the following tags:

```shell
ghcr.io/mrwong99/glyphoxa:<version>       # e.g. 0.3.0
ghcr.io/mrwong99/glyphoxa:<major>.<minor> # e.g. 0.3
```

The `main` branch builds a separate image via the multi-stage `Dockerfile` (used in the Compose stack by default).
### Supported architectures

| Architecture | Binary | Docker image |
|---|---|---|
| `linux/amd64` | Yes | Yes |
| `linux/arm64` | Yes | Yes |

For both architectures, whisper.cpp and libopus are cross-compiled as static libraries during the CI build. The release Docker image uses `gcr.io/distroless/cc-debian12:nonroot` as its base (it includes the glibc/libstdc++ required by libdave): no shell, no package manager, minimal attack surface.
## GPU Setup
GPU acceleration is optional but strongly recommended for local inference providers.
### Ollama (LLM)

Ollama automatically detects NVIDIA GPUs. Ensure the NVIDIA Container Toolkit is installed, then enable GPU passthrough in `docker-compose.yml`:

```yaml
ollama:
  image: ollama/ollama:latest
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities: [gpu]
```
This block is already present in the provided Compose file.
### Coqui TTS

The Compose stack uses the GPU-enabled Coqui image (`ghcr.io/coqui-ai/tts`) with `--use_cuda true`. For CPU-only deployments, switch to the CPU image and remove the flag:

```yaml
tts:
  image: ghcr.io/coqui-ai/tts-cpu
  command: ["--model_name", "tts_models/en/ljspeech/vits", "--port", "5002"]
  # Remove the deploy.resources block
```
### whisper.cpp (native STT)
whisper.cpp is compiled into the Glyphoxa binary. GPU acceleration depends on how the library was built:
| Backend | Build flag | Platform |
|---|---|---|
| CPU only | (default) | All |
| CUDA | -DGGML_CUDA=ON | Linux (NVIDIA) |
| Metal | -DGGML_METAL=ON | macOS (Apple Silicon) |
To build with CUDA support locally:

```shell
git clone --depth 1 https://github.com/ggml-org/whisper.cpp.git /tmp/whisper-src
cd /tmp/whisper-src
cmake -B build \
  -DCMAKE_BUILD_TYPE=Release \
  -DBUILD_SHARED_LIBS=OFF \
  -DGGML_CUDA=ON \
  -DWHISPER_BUILD_EXAMPLES=OFF \
  -DWHISPER_BUILD_TESTS=OFF
cmake --build build --config Release -j$(nproc)
```

Then set `C_INCLUDE_PATH` and `LIBRARY_PATH` to point at the built artifacts and rebuild Glyphoxa with `make build`.
Note: The Docker multi-stage build (`Dockerfile`) compiles whisper.cpp in CPU-only mode. For GPU-accelerated whisper in containers, build a custom image with the CUDA flag enabled and the appropriate NVIDIA base image.
## PostgreSQL Production Setup
Glyphoxa uses PostgreSQL with the pgvector extension for semantic memory and knowledge graph storage.
### Required extension

The pgvector extension must be available in your PostgreSQL installation. Glyphoxa runs `CREATE EXTENSION IF NOT EXISTS vector` automatically during schema migration.

For the Docker Compose setup, the `pgvector/pgvector:pg17` image includes the extension pre-installed and the `init.sql` script enables it:

```sql
CREATE EXTENSION IF NOT EXISTS vector;
```
### Schema initialisation

Glyphoxa auto-migrates its schema on startup using idempotent DDL (`CREATE TABLE IF NOT EXISTS` / `CREATE INDEX IF NOT EXISTS`). The following tables are created:

| Table | Purpose |
|---|---|
| `session_entries` | Short-term conversation log (L1 memory) |
| `entities` | Knowledge graph nodes (NPCs, locations, items) |
| `relationships` | Knowledge graph edges between entities |
| `chunks` | Semantic retrieval vectors (L2 memory, pgvector) |
| `npc_definitions` | Persistent NPC configurations |
### Recommended indices

All indices are created automatically by the migration. Key indices for query performance:

```sql
-- Session entries
CREATE INDEX idx_session_entries_session_id ON session_entries (session_id);
CREATE INDEX idx_session_entries_timestamp ON session_entries (timestamp);
CREATE INDEX idx_session_entries_session_timestamp ON session_entries (session_id, timestamp);
CREATE INDEX idx_session_entries_fts ON session_entries USING GIN (to_tsvector('english', text));

-- Knowledge graph
CREATE INDEX idx_entities_type ON entities (type);
CREATE INDEX idx_entities_name ON entities (name);
CREATE INDEX idx_rel_source ON relationships (source_id);
CREATE INDEX idx_rel_target ON relationships (target_id);

-- Vector search (HNSW)
CREATE INDEX idx_chunks_embedding ON chunks USING hnsw (embedding vector_cosine_ops);
```

The HNSW index on the `chunks` table is critical for vector similarity search performance. It is created with the default pgvector HNSW parameters; for large deployments (>100k chunks), consider tuning `m` and `ef_construction`:

```sql
CREATE INDEX idx_chunks_embedding ON chunks
USING hnsw (embedding vector_cosine_ops)
WITH (m = 24, ef_construction = 200);
```
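Build-time parameters aside, recall at query time is governed by pgvector's `hnsw.ef_search` setting (default 40), which can be raised per connection when recall matters more than latency:

```sql
-- Per-connection: trade query latency for higher recall
SET hnsw.ef_search = 100;
```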
### Embedding dimensions

The `embedding_dimensions` config value must match the model configured in `providers.embeddings`. Common values:

| Model | Dimensions |
|---|---|
| `nomic-embed-text` (Ollama) | 768 |
| `text-embedding-3-small` (OpenAI) | 1536 |
| `text-embedding-3-large` (OpenAI) | 3072 |
### Backup considerations

- Regular backups: use `pg_dump` or continuous archiving (WAL-based PITR) for production deployments.
- Volume snapshots: for Docker deployments, back up the `pgdata` volume.
- The `session_entries` table grows proportionally to game session length. Consider partitioning by `session_id` or archiving old sessions for long-running campaigns.
- The `chunks` table contains embedding vectors that are expensive to regenerate. Prioritise this table in backup strategies.
### Connection string

Configure the DSN in your config file:

```yaml
memory:
  postgres_dsn: "postgres://glyphoxa:STRONG_PASSWORD@db-host:5432/glyphoxa?sslmode=require"
  embedding_dimensions: 768
```

For production, always use `sslmode=require` or `sslmode=verify-full`.
## TLS Configuration

Glyphoxa supports TLS natively via the `server.tls` configuration block. When configured, the server listens on HTTPS instead of plain HTTP.

```yaml
server:
  listen_addr: ":8443"
  log_level: info
  tls:
    cert_file: /etc/glyphoxa/tls/server.crt
    key_file: /etc/glyphoxa/tls/server.key
```

| Field | Description |
|---|---|
| `server.tls.cert_file` | Path to the PEM-encoded TLS certificate (or certificate chain) |
| `server.tls.key_file` | Path to the PEM-encoded TLS private key |

When `tls` is omitted or null, the server runs plain HTTP.
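For staging or internal environments without a CA-issued certificate, a self-signed pair can be generated with openssl (the `CN` below is an assumption for local testing; output filenames match the config paths above):

```shell
# Generate a self-signed certificate/key pair valid for one year
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -subj "/CN=localhost" \
  -keyout server.key -out server.crt
```

Browsers and clients will warn on self-signed certificates; use them only where you control both ends.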
For Docker deployments, mount the certificate files as read-only volumes:

```yaml
glyphoxa:
  volumes:
    - ./config.yaml:/etc/glyphoxa/config.yaml:ro
    - /etc/letsencrypt/live/example.com/fullchain.pem:/etc/glyphoxa/tls/server.crt:ro
    - /etc/letsencrypt/live/example.com/privkey.pem:/etc/glyphoxa/tls/server.key:ro
```
Tip: In most production setups, TLS is terminated at a reverse proxy (nginx, Caddy, Traefik) or cloud load balancer rather than in Glyphoxa itself. Use the built-in TLS when Glyphoxa is exposed directly or when end-to-end encryption is required.
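As an illustration of proxy-terminated TLS (hostname and upstream address are placeholders, not part of the Glyphoxa configuration), a minimal Caddyfile that obtains certificates automatically and forwards plain HTTP to Glyphoxa would be:

```
example.com {
    reverse_proxy glyphoxa:8080
}
```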
## Resource Requirements
Recommendations vary by deployment profile. All values are per-host minimums.
### Minimal (cloud providers only)
Glyphoxa + PostgreSQL. All inference handled by cloud APIs.
| Resource | Recommendation |
|---|---|
| CPU | 2 cores |
| Memory | 1 GB |
| Storage | 5 GB (database + config) |
| GPU | Not required |
| Network | Low latency to cloud API endpoints |
### Standard (mixed local + cloud)
Glyphoxa + PostgreSQL + Ollama (LLM). STT/TTS via cloud APIs.
| Resource | Recommendation |
|---|---|
| CPU | 4 cores |
| Memory | 8 GB (4 GB for Ollama with 3B model) |
| Storage | 15 GB (models + database) |
| GPU | Recommended (6+ GB VRAM for 3B model) |
### Full local (no cloud APIs)
Glyphoxa + PostgreSQL + Ollama + Coqui TTS. Whisper.cpp runs in-process.
| Resource | Recommendation |
|---|---|
| CPU | 8 cores |
| Memory | 16 GB |
| Storage | 30 GB (LLM models ~2 GB, embedding model ~300 MB, TTS models, whisper models, database) |
| GPU | Strongly recommended (8+ GB VRAM) |
| Shared memory | 256 MB+ for PostgreSQL (shm_size in Docker) |
### Scaling notes

- Concurrent sessions: each active voice session maintains a WebSocket connection and a VAD pipeline. Budget ~200 MB per concurrent session for audio buffers and inference state.
- Ollama: memory usage scales with model size. The 3B llama3.2 model uses ~2 GB of VRAM (GPU) or ~4 GB of RAM (CPU). Larger models (7B, 13B) require proportionally more.
- whisper.cpp: the `base.en` model uses ~150 MB; the `large-v3` model uses ~3 GB. Memory usage is per-inference, not persistent.
- PostgreSQL: for campaigns with thousands of session entries, increase `shared_buffers` and `work_mem` from the PostgreSQL defaults.
## Health Checks

Glyphoxa exposes two health endpoints for load balancer integration and container orchestration.

### Liveness probe: `/healthz`

Always returns `200 OK` when the process is running and serving HTTP:

```shell
curl http://localhost:8080/healthz
{"status": "ok"}
```
### Readiness probe: `/readyz`

Returns `200 OK` only when all registered dependency checks pass, and `503 Service Unavailable` if any check fails:

```shell
curl http://localhost:8080/readyz
{"status": "ok", "checks": {"database": "ok", "providers": "ok"}}
```

Failed check example:

```json
{"status": "fail", "checks": {"database": "fail: connection refused", "providers": "ok"}}
```

Each individual check has a 5-second timeout.
### Kubernetes / Docker configuration

```yaml
# Kubernetes liveness probe
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10

# Kubernetes readiness probe
readinessProbe:
  httpGet:
    path: /readyz
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5

# Docker Compose healthcheck
glyphoxa:
  healthcheck:
    test: ["CMD", "wget", "--spider", "-q", "http://localhost:8080/healthz"]
    interval: 10s
    timeout: 3s
    retries: 5
```
Note: The release Docker image is based on distroless and does not include `curl` or `wget`. For Docker healthchecks, use a sidecar or the Glyphoxa binary itself as a health probe client if needed. Kubernetes probes work natively via `httpGet`.
## Production Checklist

### Security hardening

- Use strong, unique PostgreSQL credentials (not the default `glyphoxa`/`glyphoxa`)
- Set `sslmode=require` or `sslmode=verify-full` on the PostgreSQL DSN
- Enable TLS on Glyphoxa or terminate TLS at a reverse proxy
- Store API keys and secrets outside the config file (environment variables or a secrets manager)
- Run the container as non-root (the distroless image uses `nonroot` by default)
- Restrict network access: Ollama, Coqui TTS, and PostgreSQL should not be exposed publicly
- Set `providers.audio.options.dm_role_id` to restrict privileged slash commands
### Log configuration

- Set `server.log_level` to `info` for production (use `debug` only for troubleshooting)
- Forward logs to a centralised log aggregator (Glyphoxa logs to stdout, so Docker logging drivers, Fluentd, and Loki all work)
### Monitoring

- Configure Prometheus to scrape the `/metrics` endpoint
- Set up Grafana dashboards (pre-built dashboards in `deployments/compose/grafana/`)
- Enable the `alpha` Compose profile for built-in Prometheus + Grafana: `docker compose --profile alpha up`
- Configure alerts for `/readyz` failures and high latency on voice pipeline stages
- See observability.md for detailed metrics, tracing, and alerting setup
### Backup & recovery

- Schedule regular `pg_dump` runs or configure WAL-based continuous archiving
- Test restore procedures before going live
- Back up whisper model files and Ollama models (or document download steps for recovery)
### Performance tuning

- Tune PostgreSQL `shared_buffers`, `work_mem`, and `effective_cache_size` for your workload
- For large vector tables (>100k rows), tune the HNSW index parameters (`m`, `ef_construction`)
- If running Ollama on CPU, consider a quantised model variant (e.g., `llama3.2:q4_0`)
- Set Docker `shm_size: 256m` or higher for PostgreSQL
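The shared-memory setting from the last item is applied per service in `docker-compose.yml`; the sketch below assumes the PostgreSQL service is named `postgres`, as in the Compose stack described earlier:

```yaml
postgres:
  image: pgvector/pgvector:pg17
  shm_size: 256m  # PostgreSQL parallel workers allocate from /dev/shm
```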
## See Also

- getting-started.md: developer setup and first run
- configuration.md: full configuration reference
- observability.md: metrics, tracing, and alerting
- troubleshooting.md: common issues and solutions
- deployments/compose/README.md: Docker Compose setup details