fix: Barge-in, Mute Interruption, and Transcript Truncation
Overview
Three related bugs share a root cause β the mixer has no feedback path to upstream components when playback is interrupted:
- Barge-in doesnβt stop NPCs reliably β
audio_pipeline.go:208callsmixer.Interrupt(PlayerBargeIn)but nevermixer.BargeIn(speakerID), so theOnBargeIncallback never fires. - Muting doesnβt stop current speech β
MuteAgent/MuteAllonly set a boolean flag; they donβt interrupt the mixer. - Transcript records full text β the cascade engine emits the complete LLM response regardless of whether audio was cut short. The agent also records
resp.Textina.messagesbefore audio even starts playing.
Brainstorm: docs/brainstorms/2026-03-15-barge-in-fix-brainstorm.md
Problem Statement
When a player talks over an NPC or a GM mutes an NPC mid-sentence, the NPC should stop talking immediately and the transcript should reflect only what was actually heard. Currently none of this works β audio may continue playing, and transcripts always contain the full generated text.
Proposed Solution
A coordinated fix across four layers: mixer interface, audio segment metadata, cascade engine feedback, and mute command wiring.
Key Architecture Decision: Engine-Side Text Tracking
The SpecFlow analysis identified a critical mismatch: the cascade engine produces one AudioSegment per NPC turn β all sentences stream through a single Audio <-chan []byte channel. The mixer has no concept of sentence boundaries in the audio stream.
Chosen approach: Track committed text in the cascade engine, not the mixer. The engine already knows sentence boundaries (it sends sentences to TTS one at a time via textCh). On interrupt, the engine snapshots its committed text and appends ....
This avoids refactoring the TTS streaming pipeline (one stream per turn) into multiple streams per sentence, which would be a much larger change for marginal benefit.
Data flow on interrupt:
Player speaks β VAD β mixer.BargeIn(speakerID)
βββ stops audio (closes cancelPlaying)
βββ clears queue (drains queued segments)
βββ fires OnBargeIn callback
βββ signals cascade engine via cancel channel
βββ engine snapshots committedText
βββ emits truncated transcript: committedText + "..."
βββ agent updates a.messages with truncated text
Scoping Decision: S2S Engine
The S2S engine (internal/engine/s2s/) has a separate interrupt path via SessionHandle.Interrupt(). Fixing it requires different plumbing (the S2S provider handles barge-in natively). Deferred to a follow-up issue to keep this change focused on the cascade path, which is the primary engine used in production.
Technical Approach
Phase 1: Mixer Interface β Add BargeIn and NPC-Scoped Interrupt
Problem: BargeIn(speakerID) exists on PriorityMixer but not on the audio.Mixer interface. Also, Interrupt(DMOverride) is NPC-agnostic β muting NPC-A while NPC-B is speaking would stop NPC-B.
Changes:
pkg/audio/mixer.go
- Add
BargeIn(speakerID string)to theMixerinterface. Semantics: interrupt current + clear queue + fireOnBargeIncallback. - Add
InterruptNPC(npcID string, reason InterruptReason)to theMixerinterface. Semantics: only interrupt if the currently playing segmentβsNPCIDmatches; remove queued segments with matchingNPCID. No-op if a different NPC is playing.
type Mixer interface {
Enqueue(segment *AudioSegment, priority int)
Interrupt(reason InterruptReason)
BargeIn(speakerID string) // NEW
InterruptNPC(npcID string, reason InterruptReason) // NEW
OnBargeIn(handler func(speakerID string))
SetGap(d time.Duration)
}
pkg/audio/mixer/mixer.go
InterruptNPC: checkm.playing.NPCID == npcIDbefore interrupting. Remove matching entries from queue.
pkg/audio/mock/mock.go
- Add
BargeInandInterruptNPCto the mock with call recording.
internal/app/audio_pipeline.go
- Line 208: change
p.mixer.Interrupt(audio.PlayerBargeIn)βp.mixer.BargeIn(speakerID).
Tests:
pkg/audio/mixer/mixer_test.go: testInterruptNPConly interrupts matching NPC; testBargeInfires handler + clears queue.
Phase 2: Playback Completion Callback on AudioSegment
Problem: The mixer knows whether a segment finished naturally or was interrupted, but has no way to communicate this upstream.
Changes:
pkg/audio/mixer.go β AudioSegment
Add a completion callback:
type AudioSegment struct {
NPCID string
Audio <-chan []byte
SampleRate int
Channels int
Priority int
streamErr atomic.Pointer[error]
// OnDone is called when the mixer finishes with this segment.
// interrupted is true if playback was cut short (barge-in or mute),
// false if the segment played to completion.
// Called on the mixer's dispatch goroutine; must not block.
// May be nil.
OnDone func(interrupted bool) // NEW
}
pkg/audio/mixer/mixer.go
- In
play(): callseg.OnDone(false)on natural completion (channel closed), callseg.OnDone(true)on cancel/done. - In
interruptLocked(): for queued segments being drained, callseg.OnDone(true)before draining. - Guard all calls with nil check.
Tests:
mixer_test.go: verifyOnDone(false)on natural playback,OnDone(true)on interrupt,OnDone(true)on queued segment drain.
Phase 3: Cascade Engine β Deferred Transcript with Truncation
Problem: The cascade engine emits the full generated text to the transcript channel unconditionally, and the agent records resp.Text in a.messages before audio plays.
Changes:
internal/engine/engine.go β Response
Add a field for the engine to communicate the final (possibly truncated) text:
type Response struct {
Text string
Audio <-chan []byte
SampleRate int
Channels int
ToolCalls []llm.ToolCall
streamErr atomic.Pointer[error]
// FinalText is closed when the definitive response text is available.
// Read FinalTextValue after <-FinalText returns.
// If playback completes naturally, FinalTextValue == Text.
// If interrupted, FinalTextValue is the truncated text with "..." suffix.
FinalText chan struct{} // NEW
FinalTextValue string // NEW β read after FinalText closes
}
internal/engine/cascade/cascade.go
- Track committed text atomically. Each time a sentence is sent to
textCh(TTS input), append it to acommittedTextaccumulator. - Create a cancel channel per Process call. Set
seg.OnDoneto signal this channel. - Single-model path: Set
OnDoneon the AudioSegment. On natural completion, setFinalTextValue = Textand closeFinalText. On interrupt, setFinalTextValue = committedText + "..."and closeFinalText. - Dual-model path: Same pattern. The background goroutine tracks
committedTextas it sends sentences totextCh. TheOnDonecallback snapshotscommittedText. - Transcript emission: Move transcript emission to happen after
OnDonefires (either path). EmitFinalTextValueinstead of the full generated text.
internal/agent/npc.go
- Defer message history recording. Instead of appending
resp.Textimmediately at line 304, launch a goroutine that waits on<-resp.FinalTextand then appendsresp.FinalTextValue. - This goroutine must hold
a.mubriefly to append β use a targeted lock acquisition, not holding the lock across the wait. - The goroutine must also respect context cancellation to avoid leaking.
Tests:
cascade_test.go: test that transcript emits truncated text whenOnDone(true)fires mid-generation. Test that transcript emits full text on natural completion. Test committed text tracking across sentence boundaries.npc_test.go: test thata.messagescontains truncated text after interrupt.
Phase 4: Mute Commands Interrupt Audio
Problem: MuteAgent/MuteAll only set flags; they donβt stop current speech.
Changes:
internal/agent/orchestrator/orchestrator.go
Add a mixer reference to the orchestrator:
type Orchestrator struct {
// ... existing fields ...
mixer audio.Mixer // NEW β may be nil (text-only mode)
}
- New option:
WithMixer(m audio.Mixer) Option. MuteAgent(id): after settingentry.muted = true, callo.mixer.InterruptNPC(id, audio.DMOverride)if mixer is non-nil.MuteAll(): after setting all flags, callo.mixer.Interrupt(audio.DMOverride)if mixer is non-nil.
Wiring β internal/app/ or internal/session/
- Pass the mixer when constructing the orchestrator (wherever
orchestrator.New()is called).
Tests:
orchestrator_test.go: testMuteAgentcallsInterruptNPCon the mock mixer. TestMuteAllcallsInterrupt(DMOverride). Test nil mixer is safe.
Phase 5: Barge-In Cancels In-Flight Generation
Problem: When barge-in fires, the cascade engineβs background goroutine (strong model + TTS) keeps running, wasting compute.
Changes:
internal/agent/npc.go
- Store a
context.CancelFuncfor the in-flight Process call. - Register the mixerβs
OnBargeInhandler to cancel this context. - On barge-in: cancel the generation context β strong model stream closes β TTS stops β audio channel closes.
type liveAgent struct {
// ... existing fields ...
genCancel context.CancelFunc // NEW β cancels in-flight generation
genMu sync.Mutex // NEW β guards genCancel
}
- In
HandleUtterance: create a child context with cancel, store cancel ina.genCancel. - Register
mixer.OnBargeIninNewAgentto calla.genCancel(). - Clear
genCancelafter Process completes naturally.
Tests:
npc_test.go: test that barge-in cancels in-flight context.
Acceptance Criteria
Functional Requirements
- [x]Talking over an NPC (VAD fires) stops the NPCβs audio immediately
- [x]
/npc mute <name>stops that NPCβs current speech immediately - [x]
/npc muteallstops all NPC speech immediately - [x]Muting NPC-A while NPC-B is speaking does NOT interrupt NPC-B
- [x]Interrupted NPC transcript shows truncated text with
...suffix - [x]Naturally completed NPC speech transcript shows full text (no
...) - [x]Agent message history (
a.messages) reflects truncated text after interrupt - [x]Barge-in cancels in-flight LLM/TTS generation (no wasted compute)
- [x]Queued NPC segments that were never played are drained and their
OnDone(true)fires
Non-Functional Requirements
- [x]All new code has
t.Parallel()on tests and subtests - [x]Race detector passes:
go test -race -count=1 ./... - [x]No locks held during blocking I/O (especially the
OnDonecallback) - [x]Compile-time interface assertions for updated
Mixerinterface - [x]Mock implementations updated for new interface methods
Quality Gates
- [x]
make checkpasses (fmt + vet + test) - [x]
make lintpasses - [x]Existing mixer tests still pass
- [x]Existing cascade engine tests still pass
- [x]Existing orchestrator tests still pass
Dependencies & Prerequisites
- No external dependencies. All changes are internal to the codebase.
- The
audio.Mixerinterface change affects:pkg/audio/mixer/mixer.go,pkg/audio/mock/mock.go, and any test doubles.
Risk Analysis & Mitigation
| Risk | Impact | Mitigation |
|---|---|---|
OnDone callback blocks mixer dispatch goroutine | Audio playback stalls | Document βmust not blockβ contract; keep callback lightweight (signal a channel) |
| Race between natural completion and interrupt | Double OnDone call | Use sync.Once wrapper in mixerβs play loop |
Agent a.messages append races with HandleUtterance | Corrupted history | Use targeted mutex acquisition in the deferred goroutine |
| Deferred transcript emission leaks goroutine | Resource leak | Respect context cancellation in the wait goroutine |
InterruptNPC iterates queue under lock | Latency spike on large queues | Queue is typically small (<16 segments); acceptable |
References & Research
Internal References
- Brainstorm:
docs/brainstorms/2026-03-15-barge-in-fix-brainstorm.md - Mixer interface:
pkg/audio/mixer.go:91-119 - PriorityMixer:
pkg/audio/mixer/mixer.go(BargeIn at line 176, Interrupt at line 151) - Audio pipeline barge-in:
internal/app/audio_pipeline.go:208 - Cascade transcript emission:
internal/engine/cascade/cascade.go:287-291 - Agent message recording:
internal/agent/npc.go:302-312 - Mute commands:
internal/discord/commands/npc.go:136-258 - Orchestrator mute:
internal/agent/orchestrator/orchestrator.go:175-343
Follow-Up Work
- S2S engine barge-in: wire
SessionHandle.Interrupt()on barge-in, implement transcript truncation for S2S providers - Queued-but-never-played NPC responses: decide whether to mark as βcancelledβ in transcript or silently drop