The OM1 Video Processor is a Docker-based solution that enables real-time video streaming, face recognition, and audio capture for OM1 robots. It’s designed to work seamlessly with your robot’s hardware while providing intelligent audio and video processing capabilities.

Key Features

  • Real-time Face Recognition: Identify and label faces in the video stream with bounding boxes
  • Audio-Visual Streaming: Simultaneous video and audio capture and streaming
  • Hardware Acceleration: Optimized for NVIDIA Jetson platforms with CUDA support
  • Easy Integration: Simple Docker-based deployment
  • Performance Monitoring: Built-in FPS counter and system metrics
  • Configurable devices: Supports multiple camera and microphone configurations
  • Direct RTSP streaming: Streams directly to OpenMind’s API without intermediate relay

What is RTSP?

  • RTSP (Real Time Streaming Protocol) is a network control protocol designed to manage multimedia streaming sessions.
  • It functions as a “remote control” for media servers, establishing and controlling one or more time-synchronized streams of continuous media such as audio and video.

Key characteristics:

  • Control Protocol: RTSP manages streaming sessions but does not typically transport the media data itself
  • Session Management: Establishes, maintains, and terminates streaming sessions
  • Time Synchronization: Coordinates multiple media streams (audio/video) to play in sync
  • Network Remote Control: Provides VCR-like commands (play, pause, stop, seek) for media playback over a network
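To make the protocol concrete, the sketch below assembles a minimal RTSP/1.0 request by hand in Python. RTSP is text-based (much like HTTP), so a request is just method, URL, version, and headers; the URL and CSeq values here are illustrative and have nothing to do with the OM1 endpoints.

```python
def build_rtsp_request(method, url, cseq, headers=None):
    """Assemble a plain-text RTSP/1.0 request (RTSP is text-based, like HTTP)."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"

# The VCR-like session control maps onto methods such as
# OPTIONS, DESCRIBE, SETUP, PLAY, PAUSE, and TEARDOWN.
request = build_rtsp_request("OPTIONS", "rtsp://example.com/stream", 1)
print(request)
```

A real client would send such requests over TCP and negotiate transport for the media streams (typically carried separately over RTP), which is why RTSP is described above as a control protocol rather than a transport.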

Architecture Diagram

[Architecture diagram image]

System Requirements

  • Docker and Docker Compose installed
  • NVIDIA Jetson device with JetPack 6.1 (or compatible NVIDIA GPU system)
  • Access to a video capture device via USB camera or built-in webcam (default: /dev/video0)
  • Microphone for audio streaming (default: default_mic_aec)
  • OpenMind API credentials
  • Linux system with V4L2 and ALSA support

Installation

  1. Clone the repository:
    git clone https://github.com/OpenMind/OM1-video-processor.git
    cd OM1-video-processor
    
  2. Set environment variables: Get OM_API_KEY and OM_API_KEY_ID from the OpenMind portal. After generating a new API key, copy it into the OM_API_KEY environment variable. For OM_API_KEY_ID, use the 16-digit ID shown alongside your API key in the portal.
    export OM_API_KEY_ID="your_api_key_id"
    export OM_API_KEY="your_api_key"
    
  3. Configure your devices (Optional):
    export CAMERA_INDEX="/dev/video6"          # Camera device (default: /dev/video0)
    export MICROPHONE_INDEX="default_mic_aec"  # Microphone device (default: default_mic_aec)
    
  4. Ensure devices are accessible:
    # Check available video devices
    ls /dev/video*
    
    # List video devices with v4l2
    v4l2-ctl --list-devices
    
    # Check available audio devices
    pactl list sources short
    pactl list sinks short
    

Quick Start

  1. Start the streaming service:
    docker-compose up -d
    
  2. The system will automatically:
    • Initialize the camera and microphone
    • Start face recognition processing
    • Stream to the configured RTSP endpoint

Configuration

Environment Variables

Variable         Description                 Default
CAMERA_DEVICE    Camera device path          /dev/video0
AUDIO_DEVICE     Audio input device          hw:3,0
RTSP_URL         RTSP server endpoint        rtsp://your-rtsp-server/stream
FPS              Target frames per second    30

Configuration Details

The system is configured through the following components:

Docker Compose Configuration

The docker-compose.yml file configures:
  • NVIDIA runtime: GPU acceleration for face recognition processing
  • Network mode: Host networking for direct device access
  • Privileged mode: Required for camera and audio device access
  • Device mapping: Camera (default /dev/video0) and audio (/dev/snd) devices
  • Environment variables: OpenMind API credentials, device indices, and PulseAudio configuration
  • Shared memory: 4GB allocated for efficient video processing
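A docker-compose.yml matching these settings might look like the sketch below. The service name and image tag are assumptions for illustration, not the actual repository values; refer to the docker-compose.yml in the repository for the real configuration.

```yaml
# Hypothetical sketch; service and image names are illustrative.
services:
  video-processor:
    image: om1-video-processor:latest   # assumed image tag
    runtime: nvidia                     # GPU acceleration for face recognition
    network_mode: host                  # direct network access
    privileged: true                    # camera and audio device access
    shm_size: "4gb"                     # shared memory for video processing
    devices:
      - /dev/video0:/dev/video0
      - /dev/snd:/dev/snd
    environment:
      - OM_API_KEY_ID=${OM_API_KEY_ID}
      - OM_API_KEY=${OM_API_KEY}
      - CAMERA_INDEX=${CAMERA_INDEX:-/dev/video0}
      - MICROPHONE_INDEX=${MICROPHONE_INDEX:-default_mic_aec}
```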

Processing Pipeline

The streaming pipeline consists of two processes managed by Supervisor:
  • MediaMTX: RTSP server for stream routing and management
  • OM Face Recognition Stream: Main processing service that:
    • Captures video from the specified camera device
    • Performs real-time face recognition with GPU acceleration
    • Overlays bounding boxes, names, and FPS information
    • Captures audio from the specified microphone
    • Streams directly to OpenMind’s RTSP ingestion endpoint

Environment Variables

We need to configure the following environment variables:
  • OM_API_KEY_ID: Your OpenMind API key ID (required)
  • OM_API_KEY: Your OpenMind API key (required)
  • CAMERA_INDEX: Camera device path (default: /dev/video0)
  • MICROPHONE_INDEX: Microphone device identifier (default: default_mic_aec)
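As a sketch of how a service might consume these variables, the snippet below reads them with the documented defaults; the function and validation logic are illustrative, not the project’s actual code.

```python
import os

def load_config(env=os.environ):
    """Read OM1 video-processor settings, applying the documented defaults."""
    required = ("OM_API_KEY_ID", "OM_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return {
        "api_key_id": env["OM_API_KEY_ID"],
        "api_key": env["OM_API_KEY"],
        "camera": env.get("CAMERA_INDEX", "/dev/video0"),
        "microphone": env.get("MICROPHONE_INDEX", "default_mic_aec"),
    }

# With only the required keys set, device settings fall back to the defaults.
cfg = load_config({"OM_API_KEY_ID": "abc123", "OM_API_KEY": "secret"})
print(cfg["camera"])  # /dev/video0
```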

How It Works

  1. Video Capture: Captures video from the specified camera device
  2. Face Processing: Uses AI to detect and recognize faces in real-time
  3. Audio Capture: Simultaneously records audio from the microphone
  4. Streaming: Combines video and audio into an RTSP stream
  5. Monitoring: Provides real-time performance metrics
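The monitoring step (the built-in FPS counter) can be sketched as a rolling estimate over recent frame timestamps. This is a stand-alone illustration, not the project’s actual implementation.

```python
import time

class FPSCounter:
    """Rolling frames-per-second estimate over a fixed window of frame timestamps."""
    def __init__(self, window=30):
        self.window = window
        self.timestamps = []

    def tick(self, now=None):
        """Record one processed frame; return the current FPS estimate."""
        now = time.monotonic() if now is None else now
        self.timestamps.append(now)
        if len(self.timestamps) > self.window:
            self.timestamps.pop(0)
        if len(self.timestamps) < 2:
            return 0.0
        span = self.timestamps[-1] - self.timestamps[0]
        return (len(self.timestamps) - 1) / span if span > 0 else 0.0

fps = FPSCounter()
for i in range(10):
    estimate = fps.tick(now=i / 30.0)   # simulate frames arriving at 30 Hz
print(round(estimate, 1))               # 30.0
```

In the real pipeline, tick() would be called once per processed frame and the estimate overlaid on the video alongside the bounding boxes and names.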

Ports

The following ports are used internally:
  • 8554: RTSP (MediaMTX local server)
  • 1935: RTMP (MediaMTX local server)
  • 8889: HLS (MediaMTX local server)
  • 8189: WebRTC (MediaMTX local server)

Development

Build the image:

docker-compose build

Customize the processing settings:

To modify the om_face_recog_stream parameters, edit the command in video_processor/supervisord.conf.
vim video_processor/supervisord.conf
The following parameters can be modified:
  • --device: Camera device path
  • --rtsp-mic-device: Microphone device identifier
  • --draw-boxes: Enable/disable bounding box overlays
  • --draw-names: Enable/disable name overlays
  • --show-fps: Enable/disable FPS display
  • --no-window: Run in headless mode
  • --remote-rtsp: OpenMind RTSP ingestion endpoint
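For example, a program entry in video_processor/supervisord.conf might combine these flags as in the sketch below. The program section name, flag values, and placeholder endpoint are assumptions for illustration; check the file in the repository for the real command.

```ini
; Hypothetical sketch of a supervisord program entry; values are illustrative.
[program:om_face_recog_stream]
command=om_face_recog_stream --device /dev/video0 --rtsp-mic-device default_mic_aec --draw-boxes --draw-names --show-fps --no-window --remote-rtsp <openmind-ingest-endpoint>
autostart=true
autorestart=true
```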
# Install dependencies locally
uv sync --all-extras

# Run the face recognition stream locally
uv run om_face_recog_stream --help

Troubleshooting

Common Issues

  • No video feed:
    • Verify camera device permissions
    • Check if the camera is being used by another application
  • Audio not working:
    • Verify the correct audio device is specified
    • Check PulseAudio configuration
  • Performance issues:
    • Ensure hardware acceleration is properly configured
    • Reduce resolution or FPS if needed