Date: 2026-03-22
Status: Ready for implementation
Based on: Deep analysis of disgo v0.19.3 internals, Discord API docs, Glyphoxa codebase
Executive Summary
The voice state proxy approach is fully feasible using disgo's existing public API, with no fork or patches required. The voice WebSocket and UDP connections are 100% independent from the main Discord gateway: the main gateway is only needed to send Opcode 4 (join/leave voice channel) and to receive the VOICE_STATE_UPDATE / VOICE_SERVER_UPDATE dispatch events.
1. How disgo Voice Actually Works (Source-Level Analysis)
1.1 The Voice Connection Lifecycle
From voice/conn.go, the normal flow through conn.Open() is:
conn.Open(ctx, channelID, selfMute, selfDeaf)
  │
  ├── voiceStateUpdateFunc(ctx, guildID, &channelID, selfMute, selfDeaf)
  │     └── This is bot.Client.UpdateVoiceState() → sends Opcode 4 via main gateway
  │
  └── blocks on c.openedChan until SessionDescription arrives
        └── SessionDescription = final step of voice WebSocket handshake
Two main gateway dispatch events must arrive (routed by bot/handlers/voice_handlers.go):
- VOICE_STATE_UPDATE → conn.HandleVoiceStateUpdate(event):
  - Sets state.SessionID and state.ChannelID (conn.go:173-195)
  - If ChannelID is nil: closes gateway, UDP, signals closedChan
- VOICE_SERVER_UPDATE → conn.HandleVoiceServerUpdate(event):
  - Sets state.Token and state.Endpoint (conn.go:197-213)
  - Launches goroutine: c.gateway.Open(ctx, c.state) → connects voice WebSocket
1.2 The Voice Gateway (Separate from Main Gateway)
From voice/gateway.go, gateway.Open() connects to wss://{endpoint}?v=8:
gateway.open(ctx, state)
  │
  ├── Dials wss://{endpoint}?v=8
  ├── Starts listen goroutine
  │     │
  │     ├── Receives OpcodeHello → starts heartbeat(), sends identify()
  │     │     └── Identify payload: {server_id, user_id, session_id, token, max_dave_protocol_version}
  │     │
  │     ├── Receives OpcodeReady → sets SSRC, signals readyChan
  │     │     ├── conn.handleMessage() opens UDP: c.udp.Open(ctx, d.IP, d.Port, d.SSRC)
  │     │     └── conn.handleMessage() sends OpcodeSelectProtocol with our IP/port
  │     │
  │     ├── Receives OpcodeSessionDescription → sets secret key, signals openedChan
  │     │     └── Voice connection is now READY
  │     │
  │     └── Handles all DAVE opcodes (21-31), entirely on the voice gateway
  │
  └── Returns nil on success
Key insight: The voice Identify payload (gateway.go:416-438) uses:
- GuildID ← state.GuildID (from NewConn)
- UserID ← state.UserID (from NewConn)
- SessionID ← state.SessionID (from HandleVoiceStateUpdate)
- Token ← state.Token (from HandleVoiceServerUpdate)
- MaxDaveProtocolVersion ← from DAVE session
None of these require a main gateway connection on the machine running the voice connection.
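The point that Identify only depends on locally held state can be illustrated with a small stand-alone model. This is a sketch, not disgo's code: voiceState, identifyPayload, and buildIdentify are hypothetical names, snowflakes are simplified to uint64, and the DAVE version field is left out.

```go
package main

import "errors"

// voiceState is a hypothetical stand-in for the per-connection state;
// the real disgo type may differ.
type voiceState struct {
	GuildID   uint64 // known at construction (NewConn)
	UserID    uint64 // known at construction (NewConn)
	SessionID string // arrives with VOICE_STATE_UPDATE
	Token     string // arrives with VOICE_SERVER_UPDATE
	Endpoint  string // arrives with VOICE_SERVER_UPDATE
}

// identifyPayload mirrors the Identify fields listed above
// (max_dave_protocol_version omitted for brevity).
type identifyPayload struct {
	ServerID  uint64
	UserID    uint64
	SessionID string
	Token     string
}

var errIncomplete = errors.New("voice state incomplete")

// buildIdentify succeeds once both events have been fed in, no matter
// which process originally received them from Discord.
func buildIdentify(s voiceState) (identifyPayload, error) {
	if s.SessionID == "" || s.Token == "" || s.Endpoint == "" {
		return identifyPayload{}, errIncomplete
	}
	return identifyPayload{
		ServerID:  s.GuildID,
		UserID:    s.UserID,
		SessionID: s.SessionID,
		Token:     s.Token,
	}, nil
}
```

Nothing in this model needs a gateway handle, which is the property the proxy design relies on.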
1.3 DAVE Encryption
DAVE is handled entirely within the voice gateway and UDP connection:
- daveSession is created in NewConn() from the DaveSessionCreate config option (conn_config.go:13)
- All DAVE opcodes (PrepareTransition, ExecuteTransition, PrepareEpoch, MLS*) are handled in gateway.listen() (gateway.go:575-611)
- DAVE encryption/decryption happens in udpConnImpl.Write() / ReadPacket() via daveSession.Encrypt() / Decrypt() (udp_conn.go:259-284, 295-396)
- DAVE works without the main gateway: just pass voice.WithConnDaveSessionCreateFunc(golibdave.NewSession)
1.4 Voice Heartbeat
Self-contained in gateway.heartbeat() (gateway.go:364-413):
- Gets interval from OpcodeHello
- Runs in its own goroutine with heartbeatCancel context
- Reconnects on missed ACK
- No main gateway involvement
1.5 What NewConn Needs (and Doesn't Need)
From voice/conn.go:63-85:
func NewConn(guildID snowflake.ID, userID snowflake.ID,
voiceStateUpdateFunc StateUpdateFunc,
removeConnFunc func(),
opts ...ConnConfigOpt) Conn
- guildID: guild snowflake
- userID: bot's user snowflake
- voiceStateUpdateFunc: sends Opcode 4 via main gateway. For proxy: no-op function
- removeConnFunc: cleanup callback
- opts: config options (DAVE, logging, event handlers)
Does NOT need: bot.Client, gateway.Gateway, Discord token, HTTP client, cache
2. Data Flow in the Proxy Architecture
┌──────────────────────────────────────────────────────────────┐
│ Gateway Pod                                                  │
│                                                              │
│ ┌─ Main Discord Gateway (wss://gateway.discord.gg) ────────┐ │
│ │                                                          │ │
│ │ 1. /start slash command arrives                          │ │
│ │ 2. Gateway sends Opcode 4 (join voice channel)           │ │
│ │ 3. Discord replies with two dispatch events:             │ │
│ │    • VOICE_STATE_UPDATE → session_id                     │ │
│ │    • VOICE_SERVER_UPDATE → token, endpoint               │ │
│ │ 4. captureVoiceCredentials() collects all three          │ │
│ │ 5. Gateway stays connected for slash commands            │ │
│ └──────────────────────────────────────────────────────────┘ │
│                                                              │
│ VoiceCredentials{session_id, token, endpoint, bot_user_id}   │
│         │                                                    │
└─────────┼────────────────────────────────────────────────────┘
          │ gRPC StartSession (includes voice credentials)
          ▼
┌──────────────────────────────────────────────────────────────┐
│ Worker Pod (NO main Discord gateway connection)              │
│                                                              │
│ ┌─ VoiceProxyPlatform ─────────────────────────────────────┐ │
│ │                                                          │ │
│ │ 1. voice.NewConn(guildID, botUserID, noopFunc, ...)      │ │
│ │ 2. conn.HandleVoiceStateUpdate({SessionID, ChannelID})   │ │
│ │ 3. conn.HandleVoiceServerUpdate({Token, Endpoint})       │ │
│ │    └─ Triggers goroutine: gateway.Open(state)            │ │
│ │       ├─ Connects wss://{endpoint}?v=8                   │ │
│ │       ├─ Identify → Ready → UDP → SelectProtocol         │ │
│ │       └─ SessionDescription → openedChan signaled        │ │
│ │ 4. Voice ready! Audio flows directly:                    │ │
│ │    Worker ←UDP→ Discord Voice Server                     │ │
│ │                                                          │ │
│ │ Pipeline: VAD → STT → LLM → TTS → Mixer → UDP            │ │
│ └──────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────┘
2.1 Mid-Session Voice Server Migration
Discord can migrate voice servers mid-session, sending a new VOICE_SERVER_UPDATE:
Discord → Gateway: VOICE_SERVER_UPDATE {new_token, new_endpoint, guild_id}
Gateway → Worker: UpdateVoiceServer gRPC {session_id, new_token, new_endpoint}
Worker: conn.HandleVoiceServerUpdate({Token: new_token, Endpoint: new_endpoint})
  └─ voice gateway reconnects to new endpoint automatically
2.2 Session Teardown
Gateway: Receives /stop slash command
Gateway → Worker: StopSession gRPC {session_id}
Worker: conn.Close(ctx) → closes voice gateway + UDP
  (voiceStateUpdateFunc is no-op, so no Opcode 4 sent)
Gateway: client.UpdateVoiceState(ctx, guildID, nil, false, false)
  └─ Sends Opcode 4 with channelID=nil → bot leaves voice
3. Exact Implementation Plan
3.1 Extend StartSessionRequest (contract.go + proto)
In internal/gateway/contract.go, add the voice credential fields:
type StartSessionRequest struct {
SessionID string
TenantID string
CampaignID string
GuildID string
ChannelID string
LicenseTier string
NPCConfigs []NPCConfigMsg
BotToken string
// Voice proxy credentials (populated by gateway in distributed mode).
// When set, the worker connects directly to the Discord voice server
// without opening its own bot gateway connection.
VoiceSessionID string // from VOICE_STATE_UPDATE
VoiceToken string // from VOICE_SERVER_UPDATE
VoiceEndpoint string // from VOICE_SERVER_UPDATE
BotUserID string // bot's user snowflake (for voice.NewConn)
}
In proto/glyphoxa/v1/session.proto, add the fields to the proto message:
message StartSessionRequest {
// ... existing fields 1-8 ...
string voice_session_id = 9; // Discord voice session ID
string voice_token = 10; // Discord voice server token
string voice_endpoint = 11; // Discord voice server endpoint
string bot_user_id = 12; // Bot's Discord user snowflake
}
3.2 Add UpdateVoiceServer RPC (proto + contract)
proto/glyphoxa/v1/session.proto:
message UpdateVoiceServerRequest {
string session_id = 1;
string token = 2;
string endpoint = 3;
}
message UpdateVoiceServerResponse {}
service SessionWorkerService {
// ... existing RPCs ...
rpc UpdateVoiceServer(UpdateVoiceServerRequest) returns (UpdateVoiceServerResponse);
}
In internal/gateway/contract.go, extend WorkerClient:
type WorkerClient interface {
StartSession(ctx context.Context, req StartSessionRequest) error
StopSession(ctx context.Context, sessionID string) error
GetStatus(ctx context.Context) ([]SessionStatus, error)
UpdateVoiceServer(ctx context.Context, sessionID, token, endpoint string) error
}
3.3 Voice Credential Capture on Gateway (sessionctrl.go)
New method on GatewaySessionController:
// captureVoiceCredentials joins the voice channel via the gateway bot and
// captures the voice server credentials (session_id, token, endpoint) from
// the resulting VOICE_STATE_UPDATE and VOICE_SERVER_UPDATE dispatch events.
func (gc *GatewaySessionController) captureVoiceCredentials(
ctx context.Context, guildID, channelID string,
) (sessionID, token, endpoint, botUserID string, err error) {
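// Fail fast on malformed IDs instead of joining with a zero snowflake.
gID, err := snowflake.Parse(guildID)
if err != nil {
return "", "", "", "", fmt.Errorf("parse guild ID %q: %w", guildID, err)
}
chID, err := snowflake.Parse(channelID)
if err != nil {
return "", "", "", "", fmt.Errorf("parse channel ID %q: %w", channelID, err)
}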
type creds struct {
sessionID string
token string
endpoint string
}
credsCh := make(chan creds, 1)
var (
mu sync.Mutex
c creds
gotState bool
gotServer bool
)
// Temporary event listeners, removed after capture.
stateListener := bot.NewListenerFunc(func(e *events.GuildVoiceStateUpdate) {
if e.GuildID != gID || e.UserID != gc.gwBot.Client().ID() {
return
}
mu.Lock()
defer mu.Unlock()
c.sessionID = e.SessionID
gotState = true
if gotServer {
select {
case credsCh <- c:
default:
}
}
})
serverListener := bot.NewListenerFunc(func(e *events.VoiceServerUpdate) {
if e.GuildID != gID || e.Endpoint == nil {
return
}
mu.Lock()
defer mu.Unlock()
c.token = e.Token
c.endpoint = *e.Endpoint
gotServer = true
if gotState {
select {
case credsCh <- c:
default:
}
}
})
gc.gwBot.Client().AddEventListeners(stateListener, serverListener)
defer gc.gwBot.Client().RemoveEventListeners(stateListener, serverListener)
// Send Opcode 4 to join voice channel.
if err := gc.gwBot.Client().UpdateVoiceState(ctx, gID, &chID, false, false); err != nil {
return "", "", "", "", fmt.Errorf("send voice state update: %w", err)
}
select {
case vc := <-credsCh:
return vc.sessionID, vc.token, vc.endpoint,
gc.gwBot.Client().ID().String(), nil
case <-ctx.Done():
return "", "", "", "", fmt.Errorf("capture voice credentials: %w", ctx.Err())
}
}
Ordering: Discord normally dispatches VOICE_STATE_UPDATE before VOICE_SERVER_UPDATE (the state update creates the voice session, the server update assigns a voice server). The code does not rely on this: from disgo's event dispatch perspective the two events may reach the listeners in either order, and captureVoiceCredentials() handles both orderings with the gotState/gotServer flags.
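The order-independent capture can be isolated from disgo and exercised on its own. The sketch below uses hypothetical names (credCollector, onState, onServer) and plain strings in place of disgo event types; it shows only the flag-and-publish logic.

```go
package main

import "sync"

type creds struct{ sessionID, token, endpoint string }

// credCollector gathers the two halves of the voice credentials and
// publishes them once, regardless of which half arrives first.
type credCollector struct {
	mu        sync.Mutex
	c         creds
	gotState  bool
	gotServer bool
	done      chan creds // buffered so publishing never blocks a listener
}

func newCredCollector() *credCollector {
	return &credCollector{done: make(chan creds, 1)}
}

// onState records the session ID from VOICE_STATE_UPDATE.
func (cc *credCollector) onState(sessionID string) {
	cc.mu.Lock()
	defer cc.mu.Unlock()
	cc.c.sessionID = sessionID
	cc.gotState = true
	cc.publishLocked()
}

// onServer records token and endpoint from VOICE_SERVER_UPDATE.
func (cc *credCollector) onServer(token, endpoint string) {
	cc.mu.Lock()
	defer cc.mu.Unlock()
	cc.c.token, cc.c.endpoint = token, endpoint
	cc.gotServer = true
	cc.publishLocked()
}

// publishLocked must be called with mu held.
func (cc *credCollector) publishLocked() {
	if !cc.gotState || !cc.gotServer {
		return
	}
	select {
	case cc.done <- cc.c:
	default: // already published; drop duplicates
	}
}
```

A test along the lines of TestCaptureVoiceCredentials_OrderIndependent could call onServer before onState and then read from done.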
3.4 Modify GatewaySessionController.Start()
Replace the suspend/resume dance with credential capture:
func (gc *GatewaySessionController) Start(ctx context.Context, req SessionStartRequest) error {
// ... existing validation + ValidateAndCreate ...
if gc.dispatcher != nil {
// Capture voice credentials BEFORE dispatching to worker.
// The gateway bot joins voice and stays connected for slash commands.
voiceCtx, voiceCancel := context.WithTimeout(ctx, 10*time.Second)
defer voiceCancel()
vsID, vToken, vEndpoint, botUserID, err := gc.captureVoiceCredentials(
voiceCtx, req.GuildID, req.ChannelID)
if err != nil {
_ = gc.orch.Transition(ctx, sessionID, SessionEnded, err.Error())
return fmt.Errorf("gateway: capture voice credentials: %w", err)
}
// Register ongoing listener for mid-session voice server changes.
gc.registerVoiceServerForwarder(sessionID, req.GuildID)
startReq := StartSessionRequest{
SessionID: sessionID,
TenantID: gc.tenantID,
CampaignID: gc.campaignID,
GuildID: req.GuildID,
ChannelID: req.ChannelID,
LicenseTier: gc.tier.String(),
BotToken: gc.botToken,
NPCConfigs: gc.npcConfigs,
VoiceSessionID: vsID,
VoiceToken: vToken,
VoiceEndpoint: vEndpoint,
BotUserID: botUserID,
}
// NOTE: No SuspendGateway() call! Gateway stays connected.
starter := func(callCtx context.Context, addr string) error {
// ... same as before ...
}
result, dispErr := gc.dispatcher.Dispatch(ctx, sessionID, gc.tenantID, starter)
if dispErr != nil {
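// Leave voice on dispatch failure. The guild ID is parsed here because
// the code shown above only carries it as a string.
if gID, parseErr := snowflake.Parse(req.GuildID); parseErr == nil {
_ = gc.gwBot.Client().UpdateVoiceState(ctx, gID, nil, false, false)
}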
gc.unregisterVoiceServerForwarder(sessionID)
// ... existing error handling (NO ResumeGateway needed) ...
}
// ...
}
// ...
}
3.5 Modify GatewaySessionController.Stop()
Remove ResumeGateway, add voice leave:
func (gc *GatewaySessionController) Stop(ctx context.Context, sessionID string) error {
// ... existing dispatcher.Stop() + orch.Transition() ...
// Leave the voice channel (send Opcode 4 with channelID=nil).
gc.mu.Lock()
for guildID, sid := range gc.active {
if sid == sessionID {
delete(gc.active, guildID)
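// Skip the leave if the stored guild ID does not parse, rather than
// sending Opcode 4 for a zero snowflake.
if gID, err := snowflake.Parse(guildID); err == nil && gc.gwBot != nil {
_ = gc.gwBot.Client().UpdateVoiceState(ctx, gID, nil, false, false)
}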
break
}
}
gc.mu.Unlock()
// Clean up the voice server forwarder.
gc.unregisterVoiceServerForwarder(sessionID)
// NOTE: No ResumeGateway() β gateway never suspended!
return nil
}
3.6 Voice Server Forwarder (Mid-Session Reconnection)
New methods on GatewaySessionController:
// voiceForwarders tracks active VOICE_SERVER_UPDATE listeners per session.
// Added to GatewaySessionController struct:
// voiceForwarders map[string]func() // sessionID -> unregister func
// voiceForwardersMu sync.Mutex
// activeWorkers map[string]WorkerClient // sessionID -> worker client
func (gc *GatewaySessionController) registerVoiceServerForwarder(sessionID, guildID string) {
gID, _ := snowflake.Parse(guildID)
listener := bot.NewListenerFunc(func(e *events.VoiceServerUpdate) {
if e.GuildID != gID || e.Endpoint == nil {
return
}
gc.mu.Lock()
worker := gc.activeWorkers[sessionID]
gc.mu.Unlock()
if worker != nil {
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
if err := worker.UpdateVoiceServer(ctx, sessionID, e.Token, *e.Endpoint); err != nil {
slog.Error("gateway: failed to forward voice server update",
"session_id", sessionID, "err", err)
}
}
})
gc.gwBot.Client().AddEventListeners(listener)
gc.voiceForwardersMu.Lock()
gc.voiceForwarders[sessionID] = func() {
gc.gwBot.Client().RemoveEventListeners(listener)
}
gc.voiceForwardersMu.Unlock()
}
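Start() and Stop() call unregisterVoiceServerForwarder, which is not shown above. One way it might look, reduced to the bookkeeping and kept independent of disgo, is the registry sketch below. forwarderRegistry and its methods are hypothetical names; the stored func() corresponds to the closure that calls RemoveEventListeners.

```go
package main

import "sync"

// forwarderRegistry maps session IDs to the function that removes the
// session's VOICE_SERVER_UPDATE listener.
type forwarderRegistry struct {
	mu sync.Mutex
	m  map[string]func()
}

func newForwarderRegistry() *forwarderRegistry {
	return &forwarderRegistry{m: make(map[string]func())}
}

// register stores the unregister func for a session. If one is already
// present it is run first, so an older listener is replaced, not leaked.
func (r *forwarderRegistry) register(sessionID string, unregister func()) {
	r.mu.Lock()
	prev := r.m[sessionID]
	r.m[sessionID] = unregister
	r.mu.Unlock()
	if prev != nil {
		prev()
	}
}

// unregister removes the session's listener exactly once and reports
// whether anything was removed. Calling it again is a harmless no-op.
func (r *forwarderRegistry) unregister(sessionID string) bool {
	r.mu.Lock()
	fn, ok := r.m[sessionID]
	delete(r.m, sessionID)
	r.mu.Unlock()
	if ok {
		fn() // run outside the lock; it may call back into the bot client
	}
	return ok
}
```

Making unregister idempotent matters here because both the dispatch-failure path in Start() and Stop() may call it for the same session.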
3.7 New VoiceProxyPlatform (pkg/audio/discord/voice_proxy.go)
package discord
import (
"context"
"fmt"
"log/slog"
"sync"
"time" // used by Close() for its teardown timeout
"github.com/MrWong99/glyphoxa/pkg/audio"
botgateway "github.com/disgoorg/disgo/gateway"
"github.com/disgoorg/disgo/voice"
"github.com/disgoorg/snowflake/v2"
)
var _ audio.Platform = (*VoiceProxyPlatform)(nil)
// VoiceProxyPlatform connects to a Discord voice server using pre-captured
// credentials (session_id, token, endpoint) from the gateway pod. The worker
// does NOT need its own Discord gateway connection.
type VoiceProxyPlatform struct {
conn voice.Conn
guildID snowflake.ID
botUserID snowflake.ID
readyCh chan struct{}
closeOnce sync.Once
}
// NewVoiceProxyPlatform creates a voice platform that connects using
// pre-captured credentials rather than its own Discord gateway.
func NewVoiceProxyPlatform(
guildIDStr, botUserIDStr string,
opts ...voice.ConnConfigOpt,
) (*VoiceProxyPlatform, error) {
guildID, err := snowflake.Parse(guildIDStr)
if err != nil {
return nil, fmt.Errorf("discord: parse guild ID %q: %w", guildIDStr, err)
}
botUserID, err := snowflake.Parse(botUserIDStr)
if err != nil {
return nil, fmt.Errorf("discord: parse bot user ID %q: %w", botUserIDStr, err)
}
vp := &VoiceProxyPlatform{
guildID: guildID,
botUserID: botUserID,
readyCh: make(chan struct{}, 1),
}
// No-op: the gateway pod handles Opcode 4 (join/leave voice channel).
noopStateUpdate := func(ctx context.Context, guildID snowflake.ID,
channelID *snowflake.ID, selfMute, selfDeaf bool) error {
return nil
}
allOpts := append([]voice.ConnConfigOpt{
voice.WithConnEventHandlerFunc(func(_ voice.Gateway, op voice.Opcode,
_ int, data voice.GatewayMessageData) {
if _, ok := data.(voice.GatewayMessageDataSessionDescription); ok {
select {
case vp.readyCh <- struct{}{}:
default:
}
}
}),
}, opts...)
vp.conn = voice.NewConn(guildID, botUserID, noopStateUpdate, func() {}, allOpts...)
return vp, nil
}
// Connect feeds pre-captured voice credentials into the connection, triggering
// the voice WebSocket + UDP handshake. The ctx governs the setup phase only.
func (vp *VoiceProxyPlatform) Connect(
ctx context.Context,
channelIDStr, voiceSessionID, voiceToken, voiceEndpoint string,
) (audio.Connection, error) {
channelID, err := snowflake.Parse(channelIDStr)
if err != nil {
return nil, fmt.Errorf("discord: parse channel ID: %w", err)
}
slog.Info("discord: voice proxy connecting",
"guild_id", vp.guildID,
"channel_id", channelID,
"endpoint", voiceEndpoint,
)
// Feed the credentials that the gateway captured.
// Order matters: HandleVoiceStateUpdate sets SessionID,
// HandleVoiceServerUpdate triggers gateway.Open() which needs SessionID.
vp.conn.HandleVoiceStateUpdate(botgateway.EventVoiceStateUpdate{
VoiceState: discord.VoiceState{
GuildID: vp.guildID,
ChannelID: &channelID,
UserID: vp.botUserID,
SessionID: voiceSessionID,
},
})
vp.conn.HandleVoiceServerUpdate(botgateway.EventVoiceServerUpdate{
Token: voiceToken,
GuildID: vp.guildID,
Endpoint: &voiceEndpoint,
})
// Wait for the voice WebSocket handshake to complete.
select {
case <-vp.readyCh:
slog.Info("discord: voice proxy connected", "guild_id", vp.guildID)
return newConnection(vp.conn, vp.guildID), nil
case <-ctx.Done():
// ctx is already done here, so use Close(), which creates its own timeout.
_ = vp.Close()
return nil, fmt.Errorf("discord: voice proxy connect: %w", ctx.Err())
}
}
// UpdateVoiceServer handles mid-session voice server changes. Discord sends
// a new VOICE_SERVER_UPDATE when migrating voice servers. The gateway
// forwards this to the worker, which calls this method.
func (vp *VoiceProxyPlatform) UpdateVoiceServer(token, endpoint string) {
vp.conn.HandleVoiceServerUpdate(botgateway.EventVoiceServerUpdate{
Token: token,
GuildID: vp.guildID,
Endpoint: &endpoint,
})
}
// Close tears down the voice connection. It is safe to call more than once.
func (vp *VoiceProxyPlatform) Close() error {
vp.closeOnce.Do(func() {
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
vp.conn.Close(ctx)
slog.Info("discord: voice proxy closed", "guild_id", vp.guildID)
})
return nil
}
Note on imports: The Connect method references discord.VoiceState from github.com/disgoorg/disgo/discord. The actual import will be:
import discodiscord "github.com/disgoorg/disgo/discord"
And use discodiscord.VoiceState{...}.
3.8 Update workerFactory.CreateRuntime() (cmd/glyphoxa/worker_factory.go)
Replace step 3 (Discord voice connection):
// ── 3. Discord voice connection ──────────────────────────────────────────
if req.BotToken == "" {
// ... existing error handling ...
}
var platform interface{ Close() error }
var conn audio.Connection
if req.VoiceSessionID != "" && req.VoiceToken != "" && req.VoiceEndpoint != "" {
// Distributed mode with voice proxy: use pre-captured credentials.
proxyPlatform, err := discord.NewVoiceProxyPlatform(
req.GuildID, req.BotUserID,
voice.WithConnDaveSessionCreateFunc(golibdave.NewSession),
)
if err != nil {
if storeCloser != nil { _ = storeCloser() }
return nil, fmt.Errorf("worker: create voice proxy platform: %w", err)
}
conn, err = proxyPlatform.Connect(sessionCtx,
req.ChannelID, req.VoiceSessionID, req.VoiceToken, req.VoiceEndpoint)
if err != nil {
_ = proxyPlatform.Close()
if storeCloser != nil { _ = storeCloser() }
return nil, fmt.Errorf("worker: voice proxy connect to %s: %w", req.ChannelID, err)
}
platform = proxyPlatform
// Store the proxy platform for mid-session voice server updates.
// The WorkerHandler will call proxyPlatform.UpdateVoiceServer() when
// it receives an UpdateVoiceServer gRPC call.
} else {
// Full mode: open own gateway (existing code).
voicePlatform, err := discord.NewVoiceOnlyPlatform(sessionCtx, req.BotToken, req.GuildID,
discord.WithVoiceManagerOpts(voice.WithDaveSessionCreateFunc(golibdave.NewSession)),
)
if err != nil {
if storeCloser != nil { _ = storeCloser() }
return nil, fmt.Errorf("worker: create voice platform: %w", err)
}
conn, err = voicePlatform.Connect(sessionCtx, req.ChannelID)
if err != nil {
_ = voicePlatform.Close()
if storeCloser != nil { _ = storeCloser() }
return nil, fmt.Errorf("worker: connect to voice channel %s: %w", req.ChannelID, err)
}
platform = voicePlatform
}
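The branch above silently falls back to full mode if any one of the three credentials is empty. A stricter alternative, shown here only as a sketch, is to treat a partially filled request as an error so a gateway bug does not cause the worker to open a second gateway connection. selectVoiceMode, voiceMode, and errPartialCreds are hypothetical names, not part of the existing code.

```go
package main

import "errors"

type voiceMode int

const (
	modeFull  voiceMode = iota // worker opens its own gateway
	modeProxy                  // worker uses pre-captured credentials
)

var errPartialCreds = errors.New("partial voice credentials")

// selectVoiceMode decides between full and proxy mode from the request
// fields. Proxy mode requires all three credentials plus the bot user ID.
func selectVoiceMode(sessionID, token, endpoint, botUserID string) (voiceMode, error) {
	set := 0
	for _, v := range []string{sessionID, token, endpoint} {
		if v != "" {
			set++
		}
	}
	switch {
	case set == 0:
		return modeFull, nil
	case set == 3 && botUserID != "":
		return modeProxy, nil
	default:
		return modeFull, errPartialCreds
	}
}
```

Whether a partial request should fail or fall back is a design decision for the implementation; the sketch only makes the choice explicit.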
3.9 Update gRPC Transport (grpctransport/client.go and server.go)
Client: add UpdateVoiceServer and pass the voice credentials in StartSession:
func (c *Client) StartSession(ctx context.Context, req gateway.StartSessionRequest) error {
// ... existing NPC config mapping ...
return c.breaker.Execute(func() error {
_, err := c.client.StartSession(ctx, &pb.StartSessionRequest{
// ... existing fields ...
VoiceSessionId: req.VoiceSessionID,
VoiceToken: req.VoiceToken,
VoiceEndpoint: req.VoiceEndpoint,
BotUserId: req.BotUserID,
})
return err
})
}
func (c *Client) UpdateVoiceServer(ctx context.Context, sessionID, token, endpoint string) error {
return c.breaker.Execute(func() error {
_, err := c.client.UpdateVoiceServer(ctx, &pb.UpdateVoiceServerRequest{
SessionId: sessionID,
Token: token,
Endpoint: endpoint,
})
return err
})
}
Server: add a handler for UpdateVoiceServer:
func (s *Server) UpdateVoiceServer(ctx context.Context, req *pb.UpdateVoiceServerRequest) (*pb.UpdateVoiceServerResponse, error) {
if err := s.handler.UpdateVoiceServer(ctx, req.GetSessionId(), req.GetToken(), req.GetEndpoint()); err != nil {
return nil, status.Errorf(codes.Internal, "update voice server: %v", err)
}
return &pb.UpdateVoiceServerResponse{}, nil
}
3.10 Update WorkerHandler for Voice Server Updates
The session.WorkerHandler needs a method to route UpdateVoiceServer to the right session's proxy platform. This requires the VoiceProxyPlatform to be accessible from the runtime.
Add to session.Runtime:
type Runtime struct {
// ... existing fields ...
voiceProxy *discord.VoiceProxyPlatform // nil in full mode
}
func (r *Runtime) UpdateVoiceServer(token, endpoint string) {
if r.voiceProxy != nil {
r.voiceProxy.UpdateVoiceServer(token, endpoint)
}
}
Add to session.WorkerHandler:
func (h *WorkerHandler) UpdateVoiceServer(ctx context.Context, sessionID, token, endpoint string) error {
h.mu.RLock()
rt, ok := h.sessions[sessionID]
h.mu.RUnlock()
if !ok {
return fmt.Errorf("session %s not found", sessionID)
}
rt.UpdateVoiceServer(token, endpoint)
return nil
}
4. Potential Blockers & Mitigations
4.1 godave: Key Ratchet Race Condition (Open Issue)
Issue: "Fix key ratchet race condition during DAVE epoch transitions" on disgoorg/godave.
Impact: Could cause audio drops during DAVE epoch transitions, regardless of architecture (proxy or direct).
Mitigation: This is a pre-existing issue not specific to the proxy approach. Monitor the godave repo for fixes. If it becomes critical, we can temporarily disable DAVE with voice.WithConnDaveSessionCreateFunc(godave.NewNoopSession).
4.2 godave: "failed to read packet" (Open Issue)
Impact: Another pre-existing DAVE reliability issue.
Mitigation: Same as above; not proxy-specific.
4.3 disgo: "Data race in gateway implementation" (Open Issue)
Impact: Affects the main gateway, not the voice gateway. Our gateway pod would be affected regardless of voice architecture.
Mitigation: Monitor the issue. The voice proxy actually reduces exposure since the worker doesn't run the main gateway.
4.4 Gateway bot shows as "in voice" while worker handles audio
Impact: Cosmetic only. The gateway bot appears to be in the voice channel because it sent the Opcode 4 join. The worker connects to the voice WebSocket/UDP directly.
Mitigation: Not a problem. This is the correct behavior from Discord's perspective: the bot IS in the voice channel; the proxy architecture just moves the audio processing to a different machine.
4.5 External voice state changes
Scenario: Someone disconnects the bot from voice externally (admin kicks, etc.).
Flow: Discord sends VOICE_STATE_UPDATE with channelID=nil to the main gateway. The gateway receives this and should notify the worker to stop.
Implementation: The gateway registers a listener for VOICE_STATE_UPDATE where the bot user is removed from voice. When detected, it stops the session:
// In captureVoiceCredentials or registerVoiceServerForwarder:
disconnectListener := bot.NewListenerFunc(func(e *events.GuildVoiceStateUpdate) {
if e.GuildID != gID || e.UserID != gc.gwBot.Client().ID() {
return
}
if e.ChannelID == nil {
// Bot was disconnected from voice externally.
slog.Info("gateway: bot disconnected from voice externally",
"session_id", sessionID, "guild_id", guildID)
go gc.Stop(context.Background(), sessionID)
}
})
4.6 connImpl.Close() no-op for Opcode 4
Issue: connImpl.Close() calls voiceStateUpdateFunc(ctx, guildID, nil, false, false) to send Opcode 4 (leave voice). With our no-op function, this does nothing.
Impact: None. The gateway handles leaving voice in Stop(); the no-op is intentional.
Caveat: If the worker crashes without the gateway calling Stop(), the gateway bot will remain in the voice channel. The gateway should detect worker crash (via heartbeat timeout) and leave voice.
5. DAVE Protocol Details
5.1 DAVE Handshake Flow (All on Voice Gateway)
Voice Gateway Connect
  → Identify (with max_dave_protocol_version=1)
  ← Ready (SSRC, IP, Port, Modes)
  ← DavePrepareTransition (transition_id, protocol_version)
  → DaveMLSKeyPackage
  ← DaveMLSExternalSenderPackage
  ← DaveMLSProposals
  ← DaveMLSPrepareCommitTransition (transition_id, commit_message)
  → DaveMLSCommitWelcome
  ← DaveMLSWelcome (transition_id, welcome_message)
  → DaveTransitionReady (transition_id)
  ← DaveExecuteTransition (transition_id)
  ← SessionDescription (with dave_protocol_version)
All of this happens within voice/gateway.go:listen() and the godave.Session interface. The main Discord gateway is not involved at any point.
5.2 DAVE Mandatory Deadline
Per Discord API docs: "We will only support E2EE calls starting on March 1st, 2026 for all audio and video conversations." That date has already passed, so DAVE is mandatory. The proxy approach supports DAVE fully: just pass voice.WithConnDaveSessionCreateFunc(golibdave.NewSession) to NewVoiceProxyPlatform.
6. What to Delete After Migration
- GatewayBot.SuspendGateway(): no longer called
- GatewayBot.ResumeGateway(): no longer called
- Suspend/resume logic in sessionctrl.go: replaced by credential capture
- VoiceOnlyPlatform: keep for --mode=full backward compatibility, but it's no longer used in distributed mode
7. Testing Strategy
7.1 Unit Tests
VoiceProxyPlatform:
- TestVoiceProxyPlatform_Connect_Success: mock voice.Conn, verify HandleVoiceStateUpdate and HandleVoiceServerUpdate are called with the correct args, simulate SessionDescription event
- TestVoiceProxyPlatform_Connect_ContextTimeout: verify cleanup on timeout
- TestVoiceProxyPlatform_UpdateVoiceServer: verify HandleVoiceServerUpdate is forwarded
- TestVoiceProxyPlatform_Close_Idempotent: verify double-close is safe
captureVoiceCredentials:
- TestCaptureVoiceCredentials_BothEvents: simulate both events arriving
- TestCaptureVoiceCredentials_Timeout: context cancellation
- TestCaptureVoiceCredentials_OrderIndependent: server update before state update
GatewaySessionController.Start():
- TestStart_DistributedMode_CapturesCredentials: verify SuspendGateway is not called
- TestStart_FailedCapture_TransitionsToEnded: verify cleanup
7.2 Integration Test
Manual test with a real Discord bot:
- Start gateway with --mode=gateway
- Start worker with --mode=worker
- Use /start command → verify bot joins voice, NPCs respond
- Verify gateway still handles slash commands while voice is active
- Use /stop → verify bot leaves voice
- Test: start session, then have admin move bot out of voice → verify cleanup
7.3 Load Test
Start multiple concurrent sessions across different guilds to verify:
- Each session gets its own voice credentials
- Voice server forwarders don't leak
- Cleanup is complete on stop
8. Implementation Order
| Step | File(s) | Estimated Effort |
|---|---|---|
| 1 | proto/glyphoxa/v1/session.proto + make proto | Small |
| 2 | internal/gateway/contract.go: extend types | Small |
| 3 | pkg/audio/discord/voice_proxy.go: new file | Medium |
| 4 | pkg/audio/discord/voice_proxy_test.go: tests | Medium |
| 5 | internal/gateway/sessionctrl.go: captureVoiceCredentials + modify Start/Stop | Medium |
| 6 | internal/gateway/sessionctrl_test.go: tests | Medium |
| 7 | internal/gateway/grpctransport/client.go: pass voice creds, add UpdateVoiceServer | Small |
| 8 | internal/gateway/grpctransport/server.go: add UpdateVoiceServer handler | Small |
| 9 | cmd/glyphoxa/worker_factory.go: use VoiceProxyPlatform when creds present | Small |
| 10 | internal/session/runtime.go: add voiceProxy field + UpdateVoiceServer method | Small |
| 11 | internal/session/worker_handler.go: route UpdateVoiceServer | Small |
| 12 | Remove SuspendGateway/ResumeGateway calls | Small |
| 13 | Manual integration test | - |
9. Key Disgo API Surface Used
| API | File | Purpose |
|---|---|---|
| voice.NewConn(guildID, userID, stateUpdateFunc, removeFunc, opts...) | voice/conn.go:63 | Create voice connection without bot.Client |
| conn.HandleVoiceStateUpdate(event) | voice/conn.go:173 | Feed session_id from gateway |
| conn.HandleVoiceServerUpdate(event) | voice/conn.go:197 | Feed token+endpoint, triggers voice gateway connect |
| voice.WithConnDaveSessionCreateFunc(fn) | voice/conn_config.go:106 | Enable DAVE E2EE |
| voice.WithConnEventHandlerFunc(fn) | voice/conn_config.go:99 | Listen for SessionDescription |
| gateway.EventVoiceStateUpdate | gateway/gateway_events.go:796 | Struct for state update event |
| gateway.EventVoiceServerUpdate | gateway/gateway_events.go:804 | Struct for server update event |
| bot.Client.UpdateVoiceState(ctx, guildID, channelID, mute, deaf) | bot/client.go:103 | Send Opcode 4 via main gateway |
| bot.Client.ID() | bot/client.go:53 | Get bot's user snowflake |
| bot.Client.AddEventListeners(listeners...) | bot package | Register event listeners |
| bot.Client.RemoveEventListeners(listeners...) | bot package | Unregister event listeners |
All of these are exported public APIs β no internal/unexported access needed.