System Requirements

Operating System

  • macOS 12.0+
  • Linux (Ubuntu 20.04+)

Hardware

  • Memory (RAM): 8GB
  • Storage: 16GB
  • Camera, speakers, and microphone (used as the robot's sensors)

Software

Prerequisites

Ensure you have the following installed on your machine:

Package Manager

Mac
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Linux
sudo apt-get update && sudo apt-get upgrade -y

uv (a Python package manager written in Rust)

Mac
brew install uv
Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
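
To confirm the installation worked, a minimal Python check like the sketch below (illustrative only, not part of OM1) verifies that uv is on your PATH and prints its version:

check uv (optional)
# check_uv.py - illustrative sketch, not part of OM1
import shutil
import subprocess

uv_path = shutil.which("uv")
if uv_path is None:
    print("uv not found on PATH - re-run the installer or restart your shell")
else:
    # `uv --version` prints a short version banner
    result = subprocess.run([uv_path, "--version"], capture_output=True, text=True)
    print(f"Found uv at {uv_path}: {result.stdout.strip()}")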

PortAudio Library

PortAudio lets you speak to the LLM and hear its spoken responses. On both macOS and Linux, you need the portaudio library.

Mac
brew install portaudio
Linux
sudo apt-get update
sudo apt-get install portaudio19-dev python3-all-dev
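
To check that PortAudio can see your microphone and speakers, the sketch below lists the available audio devices. It assumes the pyaudio bindings (which wrap PortAudio) are installed, for example with `uv pip install pyaudio`:

list audio devices (optional)
# list_audio_devices.py - illustrative sketch; assumes pyaudio is installed
import pyaudio

pa = pyaudio.PyAudio()
try:
    for i in range(pa.get_device_count()):
        info = pa.get_device_info_by_index(i)
        kind = "input" if info["maxInputChannels"] > 0 else "output"
        print(f"[{i}] {info['name']} ({kind})")
finally:
    pa.terminate()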

ffmpeg

FFmpeg is a multimedia framework that can decode, encode, transcode, mux, demux, stream, filter, and play nearly any audio or video format.

Mac
brew install ffmpeg
Linux
sudo apt-get update
sudo apt-get install ffmpeg
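
As a quick sanity check, the short sketch below (plain Python standard library, not part of OM1) confirms that ffmpeg is on your PATH and prints the first line of its version banner:

check ffmpeg (optional)
# check_ffmpeg.py - illustrative sketch, not part of OM1
import shutil
import subprocess

ffmpeg_path = shutil.which("ffmpeg")
if ffmpeg_path is None:
    print("ffmpeg not found on PATH - check the installation step above")
else:
    # `ffmpeg -version` prints the build and version information
    result = subprocess.run([ffmpeg_path, "-version"], capture_output=True, text=True)
    print(result.stdout.splitlines()[0])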

Installation and Setup

  1. Clone the repository

Run the following commands to clone the repository and set up the environment:

clone repo
git clone https://github.com/OpenmindAGI/OM1.git
cd OM1
git submodule update --init
uv venv
  2. Set the configuration variables

Locate the config folder and add your Openmind API key to /config/spot.json. If you do not already have one, you can obtain a free access key at https://portal.openmind.org/. Note: using the placeholder key openmind-free will generate errors.

set api key
# /config/spot.json
...
"api_key": "om1_live_e4252f1cf005af..."
...
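
Before launching the agent, you can sanity-check the key with a short script such as the sketch below. The path and the api_key field come from the snippet above; the script itself is illustrative and not part of OM1 (run it from the repository root):

check api key (optional)
# check_config.py - illustrative sketch, not part of OM1
import json
from pathlib import Path

config_path = Path("config/spot.json")  # relative to the OM1 repository root
config = json.loads(config_path.read_text())

api_key = config.get("api_key", "")
if not api_key or api_key == "openmind-free":
    print("Please set a real Openmind API key in config/spot.json")
else:
    print("API key is set:", api_key[:12] + "...")
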
  3. Run the Spot Agent

Run the following command to start the Spot Agent:

uv run src/run.py spot
  4. Use WebSim to check input and output

Go to http://localhost:8000 to see real-time logs alongside the input and output shown in the terminal. For easier debugging, add the --debug flag to see additional logging information.
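
If you are unsure whether WebSim started, a small standard-library check like the sketch below (illustrative only, not part of OM1) confirms that something is responding on port 8000:

check websim (optional)
# check_websim.py - illustrative sketch, not part of OM1
import urllib.request

try:
    with urllib.request.urlopen("http://localhost:8000", timeout=5) as response:
        print(f"WebSim responded with HTTP {response.status}")
except OSError as exc:
    print(f"WebSim does not appear to be running yet: {exc}")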

Congratulations! - you just got started with OM1 and can now explore its capabilities.

Some required packages will be installed the first time you run the command, which may take a little while; please be patient. Then you will see the system come to life:

Object Detector INPUT
// START
You see a person in front of you. You also see a laptop.
// END

AVAILABLE ACTIONS:
command: move
    A movement to be performed by the agent.
    Effect: Allows the agent to move.
    Arguments: Allowed values: 'stand still', 'sit', 'dance', 'shake paw', 'walk', 'walk back', 'run', 'jump', 'wag tail'

command: speak
    Words to be spoken by the agent.
    Effect: Allows the agent to speak.
    Arguments: <class 'str'>

command: emotion
    A facial expression to be performed by the agent.
    Effect: Performs a given facial expression.
    Arguments: Allowed values: 'cry', 'smile', 'frown', 'think', 'joy'

What will you do? Command: 

INFO:httpx:HTTP Request: POST https://api.openmind.org/api/core/openai/chat/completions "HTTP/1.1 200 OK"
INFO:root:OpenAI LLM output: commands=[Command(type='move', value='wag tail'), Command(type='speak', value="Hi there! I see you and I'm excited!"), Command(type='emotion', value='joy')]

Explanation of the log data

The output above shows how the Spot agent processes its environment and decides on its next actions.

  • First, it detects a person (and a laptop) using vision.
  • Next, it decides on a friendly action (wagging its tail and speaking).
  • It then expresses an emotion ('joy') via a facial display.
  • It logs latency and processing times to monitor system performance.
  • It communicates with an external AI API (the Openmind endpoint) for response generation.

Overall, the system follows a simple pattern: seeing a person triggers friendly, joyful interactions, driven by the LLM-assisted decision-making process.
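
To make the command structure in the log above concrete, here is a hypothetical sketch of how a list of (type, value) commands could be dispatched to action handlers. The Command shape mirrors the log line; the handlers and dispatcher are illustrative and are not OM1's actual implementation:

command dispatch (illustrative)
# dispatch_commands.py - hypothetical sketch; not OM1's actual implementation
from dataclasses import dataclass
from typing import List


@dataclass
class Command:
    type: str   # e.g. "move", "speak", "emotion"
    value: str  # e.g. "wag tail", "Hi there!", "joy"


def handle_move(value):
    print(f"[move] performing movement: {value}")


def handle_speak(value):
    print(f"[speak] saying: {value}")


def handle_emotion(value):
    print(f"[emotion] showing expression: {value}")


HANDLERS = {"move": handle_move, "speak": handle_speak, "emotion": handle_emotion}


def dispatch(commands: List[Command]):
    # Route each command from the LLM to the matching handler.
    for command in commands:
        handler = HANDLERS.get(command.type)
        if handler is None:
            print(f"[warn] unknown command type: {command.type}")
        else:
            handler(command.value)


# The same command list as in the log output above.
dispatch([
    Command(type="move", value="wag tail"),
    Command(type="speak", value="Hi there! I see you and I'm excited!"),
    Command(type="emotion", value="joy"),
])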