OM1

v1.0.1-beta.2

What's included

Beta release of the Docker image openmindagi/om1, which provides the full OM1 service setup in one go without requiring the user to install dependencies separately.

Features

  • Significant overall performance improvements.

  • Standardized the codebase to only support multimode configuration, removing the separate single-mode structure and related folders. Single-mode setups are still supported and are now automatically converted to multimode via the new runtime infrastructure.

  • Added support for monitoring and reporting the charging status of the Unitree Go2 robot.

  • Standardized generic sensor input types with robot-specific variants, and added separate background processes for the Unitree G1, Go2, and TurtleBot 4.

  • The Docker image now accepts OM_COMMAND to switch the configuration to run (see the example in the Docker image section below).

  • Refactored ApproachingPerson background plugin to use Zenoh for person-approaching events.

  • Updated the ElevenLabs TTS integration to reduce latency by switching from JSON/base64 audio responses to live audio streaming output, and changed the default ElevenLabs output format to 16 kHz PCM.

  • OM1 now supports Isaac Sim.

  • Improved test coverage across plugins.

  • Added support for LimX TRON.

  • Added Ollama support for local inference.

  • The latest config version is now v1.0.2.

  • Documentation updates

    • We've updated the full autonomy documentation for G1 and Go2

    • Added documentation for Gazebo setup

    • Fixed typos and broken links across documentation

    • Refreshed docstrings throughout the codebase

  • Updated API endpoint documentation

  • Updated API pricing documentation and information regarding the new subscription plans

  • Introduced support for 'concurrent', 'sequential', and 'dependencies' action execution modes in the orchestrator and configuration schemas (see the sketch after this list).

  • Added greeting conversation mode and state management

  • Added local support for the Kokoro and Riva models.

  • Added person following mode

  • Improved unit test coverage for provider and input plugins

  • OpenRouter support for LLaMA and Anthropic: Added compatibility with the OpenRouter API, enabling seamless access to more AI providers, including Meta’s LLaMA and Anthropic Claude models. This allows flexible model selection for natural language processing, reasoning, and control tasks depending on performance or cost preferences.

  • Support multiple modes: We now support five different modes with Unitree Go2 full autonomy.

    • Welcome - Initial greeting and user information gathering

    • Conversation - Focused conversation and social interaction mode

    • Slam - Autonomous navigation and mapping mode

    • Navigation - Autonomous navigation mode

    • Guard - Patrol and security monitoring mode

  • Support face blurring and detection: The OpenMind Privacy System is a real-time, on-device face detection and blurring module designed to protect personal identity during video capture and streaming. It runs entirely on the Unitree Go2 robot’s edge device, requiring no cloud or network connectivity. All frame processing happens locally; raw frames never leave the device. Only the processed, blurred output is stored or streamed. The module operates offline and maintains low latency suitable for real-time applications (see the sketch after this list).

  • Support multiple RTSP inputs: The OpenMind RTSP Ingest Pipeline manages multiple RTSP inputs, supporting three camera feeds and one microphone input for synchronized streaming. The top camera feed is processed through the OpenMind face recognition module for detection, overlay, and FPS monitoring, while the microphone (default_mic_aec) handles audio capture and streaming. All processed video and audio streams are ingested through the OpenMind API RTSP endpoint, enabling multi-source real-time data flow within the system.

  • Support echo cancellation and remote video streaming: Use our portal to remotely display your face in our dog backpack and talk to people directly.

  • Support navigation and mapping: Navigation and mapping enable OM1 to move intelligently within its environment using two core modes: Navigation Mode and Slam Mode. In Slam Mode, the robot explores its surroundings autonomously, using onboard sensors to build and continuously update internal maps for spatial awareness and future navigation. This mode is typically used during initial setup or when operating in new or changing environments. In Navigation Mode, the robot travels between predefined points within an existing mapped area, leveraging maps generated in Slam Mode for path planning, obstacle avoidance, and safe movement to target locations.

  • Refactor AI control messaging: We now use function calls for taking actions. Here's the new flow: Actions -> Function call params -> LLM -> Function calls -> JSON structure (CortexOutputModel). See the sketch after this list.

  • Support Nvidia Thor: We now support Nvidia Thor for Unitree Go2 full autonomy.

  • Added release notes to our docs: The official documentation now includes a dedicated Release Notes section, making it easier to track feature updates, improvements, and bug fixes over time. This also improves transparency for developers and users integrating new releases.

  • Introducing Lifecycle: Each operational mode in OM1 follows a defined lifecycle, representing the complete process from entry to exit of that mode. A mode lifecycle ensures predictable behavior, safe transitions, and consistent data handling across all system states (see the sketch after this list).

  • Downgraded Python to 3.10 for better Jetson support.

  • Integrated Nav2 for state feedback and target publishing, with auto AI-mode disable after localization.

  • Zenoh configs/sessions moved to zenoh_msgs, now preferring the local network before multicast (see the sketch after this list).

  • Added an avatar background server to communicate with the OM1-avatar.

  • Improved avatar animation with thinking behavior and ASR response injection into prompts.

  • Added support for long range control of humanoids and quadrupeds using the TBS_TANGO2 radios.

  • Added a sleep mode for ASR: if there is no voice input for 5 minutes, it goes to sleep (see the sketch after this list).

  • Added support for custom camera indices and enabled both microphone and speaker functionality in Docker.

  • Multiple LLM provider integrations (OpenAI, Gemini, DeepSeek, xAI).

  • Google ASR model for speech-to-text.

  • Riva and ElevenLabs for TTS.

  • Preconfigured support for the Unitree Go2, G1, TurtleBot, and UBTECH Yanshee.

  • Simulator support with Gazebo for Go2.

  • Multi-arch support - AMD64 and ARM64.
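
The three action execution modes map naturally onto async task scheduling. The sketch below is a minimal illustration, assuming a toy run_action coroutine and a hypothetical dependency table rather than OM1's actual orchestrator schema.

```python
# Minimal sketch of the three execution modes, using asyncio.
# Function names and the dependency layout are illustrative, not OM1's schema.
import asyncio

async def run_action(name: str) -> str:
    await asyncio.sleep(0.1)          # stand-in for a real robot action
    return f"{name} done"

async def run_concurrent(actions):
    # "concurrent": launch every action at once and wait for all of them
    return await asyncio.gather(*(run_action(a) for a in actions))

async def run_sequential(actions):
    # "sequential": run actions one after another, in list order
    return [await run_action(a) for a in actions]

async def run_with_dependencies(actions, deps):
    # "dependencies": an action starts only after its prerequisites finish
    done: dict[str, asyncio.Task] = {}

    async def run_after(name):
        await asyncio.gather(*(done[d] for d in deps.get(name, [])))
        return await run_action(name)

    for name in actions:
        done[name] = asyncio.create_task(run_after(name))
    return await asyncio.gather(*done.values())

if __name__ == "__main__":
    actions = ["look", "speak", "move"]
    deps = {"move": ["look"], "speak": ["look"]}   # hypothetical dependency graph
    print(asyncio.run(run_with_dependencies(actions, deps)))
```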
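
For a sense of the detect-then-blur pattern described in the face blurring item above, here is a minimal local-only sketch using OpenCV's bundled Haar cascade. It is illustrative only and is not the OpenMind Privacy System's actual detector or pipeline.

```python
# Illustrative on-device face blurring: detect faces locally, blur only the
# face regions, and never send raw frames off the device.
import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def blur_faces(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        roi = frame[y:y + h, x:x + w]
        # Heavy Gaussian blur on the face region only; the rest is untouched.
        frame[y:y + h, x:x + w] = cv2.GaussianBlur(roi, (51, 51), 0)
    return frame

if __name__ == "__main__":
    cap = cv2.VideoCapture(0)          # local camera; frames stay in this process
    ok, frame = cap.read()
    if ok:
        cv2.imwrite("blurred.jpg", blur_faces(frame))
    cap.release()
```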
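
The function-call control flow can be sketched roughly as follows. The tool schema and the fields on CortexOutputModel shown here are hypothetical placeholders; only the overall Actions -> function calls -> structured JSON shape comes from the release notes.

```python
# Sketch of: Actions -> function-call params -> LLM -> function calls -> JSON
# structure. CortexOutputModel's real fields are not documented here, so the
# fields below are placeholders.
from dataclasses import dataclass, field
import json

# 1. Actions are described as function-call parameters (tool-schema style).
ACTIONS = [
    {
        "name": "move",
        "description": "Move the robot in a direction",
        "parameters": {
            "type": "object",
            "properties": {"direction": {"type": "string"}},
            "required": ["direction"],
        },
    }
]

@dataclass
class CortexOutputModel:
    # Hypothetical structured output assembled from the LLM's function calls.
    actions: list[dict] = field(default_factory=list)

def parse_function_calls(llm_response: dict) -> CortexOutputModel:
    # 2. The LLM answers with function calls; arguments arrive as JSON strings.
    calls = llm_response.get("tool_calls", [])
    parsed = [
        {"name": c["name"], "arguments": json.loads(c["arguments"])} for c in calls
    ]
    return CortexOutputModel(actions=parsed)

if __name__ == "__main__":
    fake_response = {"tool_calls": [{"name": "move", "arguments": '{"direction": "forward"}'}]}
    print(parse_function_calls(fake_response))
```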
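
A minimal sketch of what a mode lifecycle can look like, assuming hypothetical on_enter/run/on_exit hook names rather than OM1's actual API:

```python
# Every mode goes through the same enter -> run -> exit sequence, with exit
# guaranteed so transitions stay safe and state handoff stays consistent.
from abc import ABC, abstractmethod

class ModeLifecycle(ABC):
    def execute(self) -> None:
        self.on_enter()
        try:
            self.run()
        finally:
            self.on_exit()

    @abstractmethod
    def on_enter(self) -> None: ...
    @abstractmethod
    def run(self) -> None: ...
    @abstractmethod
    def on_exit(self) -> None: ...

class WelcomeMode(ModeLifecycle):
    def on_enter(self) -> None:
        print("entering welcome mode")
    def run(self) -> None:
        print("greeting the user and gathering information")
    def on_exit(self) -> None:
        print("leaving welcome mode, handing state to the next mode")

if __name__ == "__main__":
    WelcomeMode().execute()
```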
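
As a rough illustration of preferring the local network before multicast, a Zenoh session can be opened with an explicit local endpoint while multicast scouting stays available as a fallback. The endpoint address below is a placeholder, and this is not the zenoh_msgs module itself.

```python
# Sketch of a Zenoh session that connects to a known local peer first and
# keeps multicast scouting as a fallback.
import zenoh

conf = zenoh.Config()
# Try the local network peer first (placeholder address)...
conf.insert_json5("connect/endpoints", '["tcp/192.168.1.10:7447"]')
# ...while leaving multicast scouting enabled as a fallback.
conf.insert_json5("scouting/multicast/enabled", "true")

session = zenoh.open(conf)
session.close()
```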
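
A minimal sketch of the five-minute inactivity rule behind the ASR sleep mode; the class and method names are illustrative, not OM1's actual ASR plugin.

```python
# Timing logic only: if no voice input arrives for five minutes, the
# recogniser goes to sleep until audio shows up again.
import time

SLEEP_AFTER_SECONDS = 5 * 60

class ASRSleepGuard:
    def __init__(self) -> None:
        self.last_voice = time.monotonic()
        self.sleeping = False

    def on_voice_detected(self) -> None:
        self.last_voice = time.monotonic()
        if self.sleeping:
            self.sleeping = False
            print("ASR waking up")

    def tick(self) -> None:
        # Called periodically from the audio loop.
        idle = time.monotonic() - self.last_voice
        if not self.sleeping and idle >= SLEEP_AFTER_SECONDS:
            self.sleeping = True
            print("no voice input for 5 minutes, ASR going to sleep")
```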

Docker image

Set up the API key

For Bash: vim ~/.bashrc or ~/.bash_profile.

For Zsh: vim ~/.zshrc.

Add your OpenMind API key as an environment variable.

Update the docker-compose file. Replace "unitree_go2_autonomy_advance" with the agent you want to run.
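
As a rough, Python-level illustration of how these variables fit together, the sketch below reads OM_COMMAND to pick an agent configuration. The OM_API_KEY variable name and the config path layout are assumptions for illustration, not the image's actual entrypoint.

```python
# Sketch of an entrypoint-style script that selects the agent config from the
# environment. OM_API_KEY and the config path layout are assumptions.
import os
import sys

def main() -> None:
    api_key = os.environ.get("OM_API_KEY")          # assumed variable name
    if not api_key:
        sys.exit("OM_API_KEY is not set; add it to your shell profile first")

    # OM_COMMAND selects which agent configuration to run inside the container.
    agent = os.environ.get("OM_COMMAND", "unitree_go2_autonomy_advance")
    config_path = f"config/{agent}.json5"           # hypothetical layout
    print(f"starting OM1 with {config_path}")

if __name__ == "__main__":
    main()
```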

The OM1 service is provided as a Docker image for easy setup.

The Docker image is also available on Docker Hub.

For more technical details, please refer to the docs.
