Overview
The integration test harness verifies the full Glyphoxa voice pipeline without a real voice platform (Discord, WebRTC, etc.). It uses a loopback audio connection that feeds pre-recorded PCM frames as input and captures all output frames for verification.
What it tests:
Audio in → VAD → STT → Orchestrator routing → NPC agent → Mixer → Audio out
What it doesn't need:
- Discord bot token or voice channel
- Real STT/LLM/TTS API keys
- Running Kubernetes cluster
- Human in a voice channel
Architecture
Loopback Connection (pkg/audio/loopback/)
A test implementation of audio.Connection that:
- Streams pre-loaded PCM frames as participant input (simulating players speaking)
- Captures all output frames written by the mixer (NPC responses)
- Supports mid-session participant joins via AddParticipant()
- Provides WaitForOutput(n, timeout) for synchronization
Mock Provider Stack
| Component | Mock | Behaviour |
|---|---|---|
| VAD | sequenceVADSession | Follows a scripted event sequence (silence → speech → silence) |
| STT | echoSTTProvider | Returns a fixed transcript after receiving audio |
| NPC Agent | respondingNPCAgent | Records calls + enqueues test audio to mixer |
| Mixer | mixer.PriorityMixer | Real mixer (not mocked) |
| Connection | loopback.Connection | Real connection (loopback variant) |
The real mixer and real audio pipeline (audioPipeline) are used. Only external dependencies (VAD model, STT service, LLM, TTS) are mocked.
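To illustrate how a scripted VAD mock of this kind can work, here is a minimal sketch. The vadEvent type, the ProcessFrame method, and the speechScript helper are hypothetical names chosen for the example; the real sequenceVADSession and the project's VAD interface may look different. The script mirrors the end-to-end case of 20 frames with speech on frames 2 to 15, assuming zero-based, inclusive indices.

```go
package main

import "fmt"

// vadEvent is a hypothetical event type; the real VAD interface may differ.
type vadEvent int

const (
	vadSilence vadEvent = iota
	vadSpeechStart
	vadSpeechContinue
	vadSpeechEnd
)

// scriptedVAD ignores the audio it receives and replays a fixed event list,
// one event per frame. Once the script is exhausted it reports silence.
type scriptedVAD struct {
	script []vadEvent
	next   int
}

func (v *scriptedVAD) ProcessFrame(_ []byte) vadEvent {
	if v.next >= len(v.script) {
		return vadSilence
	}
	ev := v.script[v.next]
	v.next++
	return ev
}

// speechScript builds a script of total frames with one speech segment
// covering frames first..last (inclusive). It assumes first < last.
func speechScript(total, first, last int) []vadEvent {
	script := make([]vadEvent, total)
	for i := range script {
		switch {
		case i < first || i > last:
			script[i] = vadSilence
		case i == first:
			script[i] = vadSpeechStart
		case i == last:
			script[i] = vadSpeechEnd
		default:
			script[i] = vadSpeechContinue
		}
	}
	return script
}

func main() {
	vad := &scriptedVAD{script: speechScript(20, 2, 15)}
	speech := 0
	for i := 0; i < 20; i++ {
		if vad.ProcessFrame(nil) != vadSilence {
			speech++
		}
	}
	fmt.Println(speech) // 14 frames fall inside the speech segment
}
```

Because the mock never inspects the audio, the test input can be arbitrary bytes, and segment boundaries stay identical from run to run.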
Running the Tests
# Run all loopback integration tests
go test -race -count=1 -v -run 'TestPipelineLoopback' ./internal/app/
# Run a specific test
go test -race -count=1 -v -run 'TestPipelineLoopback_EndToEnd' ./internal/app/
# Run the loopback connection unit tests
go test -race -count=1 -v ./pkg/audio/loopback/...
# Run as part of the full suite
make test
Test Cases
TestPipelineLoopback_EndToEnd
Full pipeline test with a single participant. Verifies:
- 20 PCM frames are fed through the pipeline
- VAD detects a speech segment (frames 2-15)
- STT produces a transcript
- Orchestrator routes to the correct NPC
- NPC agent receives the transcript with correct speaker ID
- Mixer produces 5 output audio frames
TestPipelineLoopback_MultipleParticipants
Two simultaneous participants. Verifies:
- Both participants are processed independently
- Barge-in behaviour works correctly (one speaker may interrupt the other)
- At least one participant's transcript reaches the NPC
TestPipelineLoopback_NoSpeech
All-silence input. Verifies:
- No STT sessions are opened
- No transcripts are routed
- No output audio is produced
TestPipelineLoopback_MidSessionJoin
Participant joins after the pipeline has started. Verifies:
- OnParticipantChange callback triggers worker creation
- Late-joining participant's audio is processed normally
- NPC responds to the newcomer
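The join flow this test exercises can be pictured with the sketch below. It is an assumption-laden illustration, not the harness code: the room type, the participantEvent struct, the Frames accessor, and the callback signature are all hypothetical, and only the names AddParticipant and OnParticipantChange come from this document.

```go
package main

import (
	"fmt"
	"sync"
)

// participantEvent is a hypothetical payload for the change callback.
type participantEvent struct {
	ID     string
	Joined bool
}

// room is a toy stand-in for the loopback connection's participant handling.
type room struct {
	mu       sync.Mutex
	inputs   map[string][][]byte
	onChange func(participantEvent)
}

func newRoom() *room {
	return &room{inputs: make(map[string][][]byte)}
}

// OnParticipantChange registers the callback invoked on joins.
func (r *room) OnParticipantChange(fn func(participantEvent)) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.onChange = fn
}

// AddParticipant stores the participant's frames, then fires the callback
// outside the lock so the callback may call back into the room.
func (r *room) AddParticipant(id string, frames [][]byte) {
	r.mu.Lock()
	r.inputs[id] = frames
	fn := r.onChange
	r.mu.Unlock()
	if fn != nil {
		fn(participantEvent{ID: id, Joined: true})
	}
}

// Frames returns the pre-loaded input frames for a participant.
func (r *room) Frames(id string) [][]byte {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.inputs[id]
}

func main() {
	r := newRoom()
	var (
		wg        sync.WaitGroup
		mu        sync.Mutex
		processed = map[string]int{}
	)
	// The pipeline side: start one worker per joining participant.
	r.OnParticipantChange(func(ev participantEvent) {
		if !ev.Joined {
			return
		}
		frames := r.Frames(ev.ID)
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			processed[ev.ID] = len(frames)
			mu.Unlock()
		}()
	})
	// The test side: a participant joins after the callback is registered.
	r.AddParticipant("late-player", make([][]byte, 3))
	wg.Wait()
	fmt.Println(processed["late-player"]) // 3
}
```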
Extending the Test Harness
Adding Real Providers
To test with real STT/LLM/TTS (requires API keys), replace the mock providers:
// Example: use real energy VAD instead of sequence mock
vadEng, _ := energy.New()
// Example: use real Deepgram STT
sttProv, _ := deepgram.New(deepgram.Config{APIKey: os.Getenv("DEEPGRAM_API_KEY")})
The loopback connection works with any provider stack; just swap the mocks.
Custom Audio Input
Generate test PCM frames from a WAV file:
# Convert speech.wav to 16kHz mono PCM (little-endian int16)
ffmpeg -i speech.wav -f s16le -acodec pcm_s16le -ar 16000 -ac 1 speech.pcm
Then load in Go:
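// t is the enclosing *testing.T.
pcmData, err := os.ReadFile("testdata/speech.pcm")
if err != nil {
	t.Fatalf("read PCM fixture: %v", err)
}
// 16000 samples/s * 30 ms * 2 bytes per int16 sample = 960 bytes per frame.
frameSize := 16000 * 30 / 1000 * 2
var frames []audio.AudioFrame
// A trailing partial frame (fewer than frameSize bytes) is dropped.
for i := 0; i+frameSize <= len(pcmData); i += frameSize {
	frames = append(frames, audio.AudioFrame{
		Data:       pcmData[i : i+frameSize],
		SampleRate: 16000,
		Channels:   1,
	})
}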
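When no recording is at hand, frames in the same format (16 kHz, mono, little-endian int16, 30 ms per frame) can also be synthesized in Go. The sketch below generates a sine tone; the toneFrames helper and its parameters are hypothetical, and it returns raw byte slices that would still need wrapping in audio.AudioFrame. A pure tone is useful for exercising an energy-based VAD or the mixer, but it will not produce a meaningful transcript from a real STT provider.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

const (
	sampleRate    = 16000
	frameMs       = 30
	bytesPerFrame = sampleRate * frameMs / 1000 * 2 // 960 bytes per 30 ms frame
)

// toneFrames returns n frames of a sine tone at freqHz, encoded as
// little-endian int16 PCM. amplitude is a fraction of full scale in [0, 1];
// an amplitude of 0 yields digital silence.
func toneFrames(n int, freqHz, amplitude float64) [][]byte {
	samplesPerFrame := bytesPerFrame / 2
	frames := make([][]byte, n)
	for f := range frames {
		buf := make([]byte, bytesPerFrame)
		for s := 0; s < samplesPerFrame; s++ {
			// Use the absolute sample index so the phase is continuous
			// across frame boundaries.
			t := float64(f*samplesPerFrame+s) / sampleRate
			v := int16(amplitude * math.MaxInt16 * math.Sin(2*math.Pi*freqHz*t))
			binary.LittleEndian.PutUint16(buf[2*s:], uint16(v))
		}
		frames[f] = buf
	}
	return frames
}

func main() {
	frames := toneFrames(20, 440, 0.5)
	fmt.Println(len(frames), len(frames[0])) // 20 960
}
```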
Testing with Real Audio + Real VAD
For a true smoke test with real speech recognition:
- Record a short WAV file with a test phrase
- Convert to PCM as above
- Use energy.New() or silero.New() for VAD
- Use deepgram.New() for STT
- Use the echoSTTProvider pattern but with the real provider
- Verify the transcript matches the expected phrase
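Real STT output often differs from the expected phrase in casing and punctuation, so an exact string comparison tends to be brittle. One possible approach, sketched below, is to normalize both strings before comparing. The normalizeTranscript helper is hypothetical and not part of the harness; note that it also drops hyphens, so "well-known" becomes "wellknown".

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// normalizeTranscript lowercases the text, removes punctuation other than
// apostrophes, and collapses runs of whitespace into single spaces.
func normalizeTranscript(s string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(s) {
		switch {
		case unicode.IsLetter(r), unicode.IsDigit(r), r == '\'':
			b.WriteRune(r)
		case unicode.IsSpace(r):
			b.WriteRune(' ')
		}
	}
	return strings.Join(strings.Fields(b.String()), " ")
}

func main() {
	got := "Hello, Innkeeper!  What's for dinner?"
	want := "hello innkeeper what's for dinner"
	fmt.Println(normalizeTranscript(got) == normalizeTranscript(want)) // true
}
```

For longer phrases a word-error-rate threshold may be more forgiving than equality, at the cost of a less strict smoke test.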
File Layout
pkg/audio/loopback/
├── connection.go          # Loopback Connection implementation
└── connection_test.go     # Unit tests for the connection
internal/app/
└── pipeline_loopback_test.go   # Integration tests (4 test cases)
Design Decision: Why Option B (Loopback)?
We evaluated four approaches:
| Option | Approach | Verdict |
|---|---|---|
| A | Mock at gRPC boundary | Too coupled to gateway internals |
| B | Loopback in full mode | Most practical: tests full pipeline, no external deps |
| C | Second Discord bot | Realistic but complex, needs extra bot token |
| D | Gateway debug endpoint | Requires modifying production code |
Option B was chosen because:
- The audio.Connection interface is clean and mockable
- The real mixer and pipeline wiring are tested (not just mocks)
- No network, no Discord, no API keys needed for CI
- Can be extended to use real providers when API keys are available