This section provides examples for integrating and using cloud-based AI endpoints, such as OpenAI and DeepSeek, for voice input processing (ASR), text-to-speech (TTS), and emotion detection. Whether you need to convert spoken language into text or generate natural-sounding speech from text, these examples will help you interact with the different cloud providers.

Voice to Text Processing with OpenAI

This example uses your default audio input (microphone) and your default audio output (speaker). Please test both your microphone and speaker in your system settings to make sure they are connected and working. On a Mac, the system may ask for permission to access your microphone; allow it.
uv run src/run.py conversation
Audio support can be marginal on Linux, especially on Ubuntu 20.04 on the Nvidia Orin. Expect some audio inputs and outputs to not work correctly, or to advertise incorrect hardware capabilities, such as USB microphones that report zero input channels.
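As a quick sanity check of the default devices, you can record a short clip and play it back. The sketch below is a minimal example and assumes the sounddevice and numpy packages are installed; they are not necessarily part of this project's dependencies.
import numpy as np
import sounddevice as sd

SAMPLE_RATE = 16000   # Hz; a common rate for ASR pipelines
DURATION = 3          # seconds to record

# Record from the default input device, then play back on the default output.
print("Recording...")
clip = sd.rec(int(DURATION * SAMPLE_RATE), samplerate=SAMPLE_RATE, channels=1, dtype="float32")
sd.wait()
print("Playing back...")
sd.play(clip, SAMPLE_RATE)
sd.wait()

# An all-zero clip or silent playback usually points to a device or permission problem.
print("Peak amplitude:", float(np.max(np.abs(clip))))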

Enumerating your Audio

You can enumerate the available audio devices with the test script in /system_hw_test:
python test_audio.py
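If you just want to see what the operating system reports, the following sketch lists the devices directly; it assumes the sounddevice package is available and is not a substitute for test_audio.py.
import sounddevice as sd

# List every audio device the OS exposes, with its channel counts.
for index, device in enumerate(sd.query_devices()):
    print(f"{index}: {device['name']} "
          f"(in={device['max_input_channels']}, out={device['max_output_channels']})")

# Devices that report zero input channels (a common issue with some USB
# microphones on Linux) cannot be used as a voice input source.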

Testing Audio

You can provide test sentences for the system to speak by adding a MockInput block to the config file:
{
    "type": "MockInput",
    "config": {
        "input_name": "Voice Input"
    }
}
Then connect to the WebSocket (wscat -c ws://localhost:8765) and type the words you want the system to speak. This is useful for debugging audio output issues and related settings such as chunk values. A scripted alternative to wscat is sketched below.
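If you would rather script the test sentences than type them into wscat, a minimal client along the following lines also works. It assumes the websockets package is installed and uses the default port 8765; adjust the URL to match your setup.
import asyncio
import websockets

async def speak(sentences):
    # Connect to the MockInput websocket and send each test sentence;
    # the system should speak them through the configured TTS output.
    async with websockets.connect("ws://localhost:8765") as ws:
        for sentence in sentences:
            await ws.send(sentence)
            await asyncio.sleep(5)  # leave time for the audio to play

asyncio.run(speak(["Hello there.", "Testing audio output and chunk settings."]))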