Key Data Flows:

Audio Flow:

  • Microphone → Audio Manager → WebSocket Manager → OpenAI
  • OpenAI → WebSocket Manager → Audio Manager → Speakers
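
The uplink half of this flow (Microphone → Audio Manager → WebSocket Manager) can be sketched with a bounded asyncio queue between the two managers. This is only an illustration: the function names, the 4800-byte chunk size (100 ms of 24 kHz, 16-bit mono PCM), and the list standing in for the socket are all assumptions, not part of the design above.

```python
import asyncio

CHUNK_BYTES = 4800  # assumed: 100 ms of 24 kHz, 16-bit mono PCM


def split_into_chunks(pcm: bytes, size: int = CHUNK_BYTES) -> list:
    """Split a PCM byte string into fixed-size chunks (the last may be shorter)."""
    return [pcm[i:i + size] for i in range(0, len(pcm), size)]


async def audio_manager(mic_chunks, outbound: asyncio.Queue) -> None:
    """Stand-in for the Audio Manager: forward captured chunks downstream."""
    for chunk in mic_chunks:
        await outbound.put(chunk)
    await outbound.put(None)  # sentinel: capture finished


async def websocket_manager(outbound: asyncio.Queue, sent: list) -> None:
    """Stand-in for the WebSocket Manager: drain the queue and 'send' each chunk."""
    while (chunk := await outbound.get()) is not None:
        sent.append(chunk)  # a real implementation would write to the socket here


async def run_uplink(pcm: bytes) -> list:
    # Bounded queue: if the socket is slow, the producer waits instead of
    # buffering audio without limit.
    outbound: asyncio.Queue = asyncio.Queue(maxsize=8)
    sent: list = []
    await asyncio.gather(
        audio_manager(split_into_chunks(pcm), outbound),
        websocket_manager(outbound, sent),
    )
    return sent
```

The downlink (OpenAI → Speakers) would mirror this with a second queue running in the opposite direction.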

Event Flow:

  • OpenAI → WebSocket Manager → Event Handler → Appropriate Component
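
One way to route events to the "appropriate component" is a registry keyed by each event's `type` field. The sketch below assumes events arrive as JSON text with a `type` key; the class and method names are hypothetical.

```python
import json
from typing import Callable, Dict, List


class EventHandler:
    """Route each incoming event to the callback registered for its type."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[dict], None]] = {}
        self.unhandled: List[dict] = []  # kept for inspection or logging

    def on(self, event_type: str, handler: Callable[[dict], None]) -> None:
        """Register the component callback for one event type."""
        self._handlers[event_type] = handler

    def dispatch(self, raw_message: str) -> None:
        """Parse one message and pass it to the matching handler, if any."""
        event = json.loads(raw_message)
        handler = self._handlers.get(event.get("type"))
        if handler is None:
            self.unhandled.append(event)
        else:
            handler(event)
```

Recording unknown event types instead of raising keeps the receive loop alive if the server starts sending a type this client does not know.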

Function Call Flow:

  • OpenAI → WebSocket Manager → Event Handler → Function Call Handler → Custom Tools
  • Custom Tools → Function Call Handler → WebSocket Manager → OpenAI
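
The round trip above can be sketched as a lookup table of tools plus one handler function. The event shapes are assumptions modeled on OpenAI's Realtime API function-calling events (an incoming event carrying `name`, `call_id`, and a JSON string in `arguments`, and a `conversation.item.create` reply holding a `function_call_output` item). Check them against the current API reference before relying on them. `get_weather` is a hypothetical tool.

```python
import json

TOOLS = {}  # tool name -> Python callable


def tool(fn):
    """Decorator that registers a function as a custom tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def get_weather(city: str) -> dict:
    # Hypothetical custom tool that returns canned data.
    return {"city": city, "forecast": "sunny"}


def handle_function_call(event: dict) -> dict:
    """Run the requested tool and build the reply event for the WebSocket Manager."""
    try:
        args = json.loads(event.get("arguments") or "{}")
        result = TOOLS[event["name"]](**args)
    except Exception as exc:  # unknown tool, bad JSON, or a failure inside the tool
        result = {"error": f"{type(exc).__name__}: {exc}"}
    return {
        "type": "conversation.item.create",  # assumed event name
        "item": {
            "type": "function_call_output",
            "call_id": event["call_id"],
            "output": json.dumps(result),
        },
    }
```

Errors are returned as tool output rather than raised, so the model gets a reply for every call it makes.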

State Management:

  • All components read shared state from, and write updates to, the State Manager
  • The State Manager keeps that state consistent across components
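
A minimal sketch of such a State Manager, assuming a lock-protected dictionary with change listeners. The class and method names are hypothetical, and a real implementation may need asyncio-aware notification instead of plain callbacks.

```python
import threading
from typing import Any, Callable, Dict, List


class StateManager:
    """Shared state store: components read, update, and subscribe to changes."""

    def __init__(self, **initial: Any) -> None:
        self._lock = threading.Lock()
        self._state: Dict[str, Any] = dict(initial)
        self._listeners: List[Callable[[Dict[str, Any]], None]] = []

    def get(self, key: str, default: Any = None) -> Any:
        with self._lock:
            return self._state.get(key, default)

    def subscribe(self, listener: Callable[[Dict[str, Any]], None]) -> None:
        with self._lock:
            self._listeners.append(listener)

    def update(self, **changes: Any) -> None:
        """Apply all changes atomically, then notify listeners outside the lock."""
        with self._lock:
            self._state.update(changes)
            listeners = list(self._listeners)
        for listener in listeners:
            listener(dict(changes))
```

Notifying listeners after the lock is released lets a listener call `get` or `update` itself without deadlocking.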

Architecture Diagram (Mermaid):

graph TB
    subgraph Client Application
        Main[Main Application]
    end

    subgraph RealtimeAPI Core
        WSM[WebSocket Manager]
        EH[Event Handler]
        AM[Audio Manager]
        FCH[Function Call Handler]
        State[State Manager]
    end

    subgraph External Services
        OpenAI[OpenAI WebSocket API]
        Audio[Audio I/O Hardware]
        Tools[Custom Tools/Functions]
    end

    %% WebSocket Flow
    WSM <-->|WebSocket Messages| OpenAI
    
    %% Audio Flow
    AM <-->|Audio Stream| Audio
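    AM <-->|Audio Data| WSM

    %% Event and Function Call Flow
    WSM -->|Events| EH
    EH -->|Function Calls| FCH
    FCH <-->|Tool Calls and Results| Tools
    FCH -->|Function Results| WSM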

    %% State Updates
    State -.->|State Updates| WSM
    State -.->|State Updates| AM
    State -.->|State Updates| FCH

    classDef core fill:#f9f,stroke:#333,stroke-width:2px
    classDef external fill:#bbf,stroke:#333,stroke-width:2px
    class WSM,EH,AM,FCH,State core
    class OpenAI,Audio,Tools external

Basics of WebSockets in Python
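
A WebSocket is a single long-lived connection, opened with an HTTP upgrade handshake, over which both sides can send messages at any time. In Python this is usually driven from asyncio: send a message, then await a reply. The sketch below shows that pattern against any object that exposes `send` and `recv` coroutines. `EchoConnection` is a hypothetical in-memory stand-in so the example runs offline.

```python
import asyncio
import json


async def exchange(ws, request: dict) -> dict:
    """Send one JSON message and wait for one JSON reply."""
    await ws.send(json.dumps(request))
    return json.loads(await ws.recv())


class EchoConnection:
    """In-memory stand-in for a real connection: echoes back whatever is sent."""

    def __init__(self) -> None:
        self._inbox: asyncio.Queue = asyncio.Queue()

    async def send(self, message: str) -> None:
        await self._inbox.put(message)

    async def recv(self) -> str:
        return await self._inbox.get()


async def demo() -> dict:
    return await exchange(EchoConnection(), {"type": "ping", "seq": 1})


# With the third-party `websockets` package (pip install websockets), the same
# function should work against a real server; the URL below is a placeholder:
#
#     import websockets
#     async with websockets.connect("wss://example.invalid/socket") as ws:
#         reply = await exchange(ws, {"type": "ping", "seq": 1})
```

Running `asyncio.run(demo())` returns the message that was sent, because the stand-in echoes it.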

How are WebSockets used for audio in Python?
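
WebSockets can carry audio either as binary frames or, for JSON-based APIs, as base64 text inside a JSON message. The sketch below takes the second approach. The 16-bit little-endian PCM format and the event names (`input_audio_buffer.append` with an `audio` field for outgoing audio, a `delta` field on incoming audio events) are assumptions based on OpenAI's Realtime API documentation and should be checked against the current reference.

```python
import base64
import json
import struct


def pcm16_from_floats(samples) -> bytes:
    """Convert floats in [-1.0, 1.0] to little-endian 16-bit PCM bytes."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]  # clamp out-of-range samples
    return struct.pack(f"<{len(clipped)}h", *(int(s * 32767) for s in clipped))


def encode_audio_event(pcm: bytes) -> str:
    """Wrap a PCM chunk in a JSON text message for sending (assumed event shape)."""
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm).decode("ascii"),
    })


def decode_audio_event(message: str) -> bytes:
    """Extract PCM bytes from an incoming audio event (assumed `delta` field)."""
    event = json.loads(message)
    return base64.b64decode(event["delta"])
```

Base64 makes the payload roughly a third larger, which is the cost of keeping every message in one JSON text format.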