LLM

Multi-Provider Large Language Model API

The OpenMind LLM API provides unified access to multiple leading large language model providers through a single, consistent interface. This endpoint enables chat completions across OpenAI, Anthropic (via OpenRouter), Google Gemini, X.AI, DeepSeek, NEAR.AI, and more.

Base URL: https://api.openmind.org

Authentication: Requires an OpenMind API key in the Authorization header as a Bearer token.

Endpoints Overview

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /api/core/{provider}/chat/completions | Send chat completion requests to the specified LLM provider |

Supported Providers

OpenMind supports the following LLM providers:

| Provider | Endpoint Path | Description |
| --- | --- | --- |
| OpenAI | openai | GPT-4, GPT-5, and other OpenAI models |
| DeepSeek | deepseek | DeepSeek chat models |
| Google Gemini | gemini | Gemini Pro and Flash models |
| X.AI | xai | Grok models from X.AI |
| NEAR.AI | nearai | Qwen and other NEAR.AI-hosted models |
| OpenRouter | openrouter | Multi-provider access, including Anthropic Claude and Meta Llama |

Supported Models

OpenAI Models

  • gpt-4o
  • gpt-4o-mini
  • gpt-4.1
  • gpt-4.1-mini
  • gpt-4.1-nano
  • gpt-5
  • gpt-5-mini
  • gpt-5-nano

DeepSeek Models

  • deepseek-chat

Google Gemini Models

  • gemini-3-pro
  • gemini-2.5-flash-lite

X.AI Models

  • grok-4

NEAR.AI Models

  • Qwen models hosted on NEAR.AI (see the usage example below)

OpenRouter Models

  • claude-opus-4.1
  • claude-sonnet-4.5
  • Meta Llama models

Model names are validated using prefix matching. For example, "gpt-4o" will match "gpt-4o", "gpt-4o-2024-08-06", and other dated variants. Check the OpenMind portal for the authoritative list of currently supported models.

Chat Completions

Send a chat completion request to any supported LLM provider.

Endpoint: POST /api/core/{provider}/chat/completions

Path Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| provider | string | Yes | The LLM provider name (e.g., "openai", "gemini", "openrouter") |

Request Headers

| Header | Required | Description |
| --- | --- | --- |
| Authorization | Yes | Bearer token with your OpenMind API key |
| Content-Type | Yes | Must be application/json |
| Accept | No | Recommended: application/json |

Request Body

The request body follows the OpenAI Chat Completions API format:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model identifier (must match the supported models for the provider) |
| messages | array | Yes | Array of message objects with role and content |
| temperature | float | No | Sampling temperature (0.0 to 2.0) |
| max_tokens | integer | No | Maximum number of tokens to generate |
| top_p | float | No | Nucleus sampling parameter |
| stream | boolean | No | Whether to stream responses |
| frequency_penalty | float | No | Frequency penalty (-2.0 to 2.0) |
| presence_penalty | float | No | Presence penalty (-2.0 to 2.0) |

Message Format
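
Each element of messages is an OpenAI-style message object with a role ("system", "user", or "assistant") and a content string. A minimal example (prompt text is illustrative):

```json
[
  {"role": "system", "content": "You are a helpful assistant."},
  {"role": "user", "content": "What is the capital of France?"}
]
```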

Basic Request Example
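
A minimal sketch using Python's requests library. The URL, headers, and body fields follow the tables above; the API key value and prompt are placeholders:

```python
import requests

API_KEY = "YOUR_OPENMIND_API_KEY"  # placeholder: use your real OpenMind API key

response = requests.post(
    "https://api.openmind.org/api/core/openai/chat/completions",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
        "Accept": "application/json",
    },
    json={
        "model": "gpt-4o",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is the capital of France?"},
        ],
        "temperature": 0.7,
        "max_tokens": 150,
    },
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```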

Response

Success (200 OK):
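
A representative body following the OpenAI chat.completion shape described in the field table below (all values are illustrative):

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1735689600,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 24,
    "completion_tokens": 8,
    "total_tokens": 32
  }
}
```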

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier for the completion |
| object | string | Object type, always "chat.completion" |
| created | integer | Unix timestamp of creation |
| model | string | Model used for the completion |
| choices | array | Array of completion choices |
| choices[].message | object | Generated message with role and content |
| choices[].finish_reason | string | Reason for completion ("stop", "length", etc.) |
| usage | object | Token usage statistics |

Error Responses: see the Error Handling section below for status codes and the error payload format.

Usage Examples

OpenAI GPT-4
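
POST https://api.openmind.org/api/core/openai/chat/completions with a body such as (prompt is illustrative):

```json
{
  "model": "gpt-4o",
  "messages": [{"role": "user", "content": "Explain quantum entanglement in one paragraph."}]
}
```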

Anthropic Claude (via OpenRouter)
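
Claude is reached through the openrouter provider path: POST https://api.openmind.org/api/core/openrouter/chat/completions. The model name below follows the tier table on this page; confirm the exact identifier the openrouter path expects (OpenRouter model IDs are often vendor-prefixed):

```json
{
  "model": "claude-sonnet-4.5",
  "messages": [{"role": "user", "content": "Give a detailed analysis of this contract clause: ..."}]
}
```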

Google Gemini
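
POST https://api.openmind.org/api/core/gemini/chat/completions, for example with the fast Flash tier (prompt is illustrative):

```json
{
  "model": "gemini-2.5-flash-lite",
  "messages": [{"role": "user", "content": "List three practical uses of transfer learning."}]
}
```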

DeepSeek
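
POST https://api.openmind.org/api/core/deepseek/chat/completions (prompt is illustrative):

```json
{
  "model": "deepseek-chat",
  "messages": [{"role": "user", "content": "Write a Python function that merges two sorted lists."}]
}
```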

X.AI Grok
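
POST https://api.openmind.org/api/core/xai/chat/completions (prompt is illustrative):

```json
{
  "model": "grok-4",
  "messages": [{"role": "user", "content": "What are the major AI research trends this year?"}]
}
```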

NEAR.AI Qwen
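
POST https://api.openmind.org/api/core/nearai/chat/completions. The model identifier below is a placeholder; substitute a Qwen model ID from your OpenMind portal:

```json
{
  "model": "<nearai-qwen-model-id>",
  "messages": [{"role": "user", "content": "Translate 'good morning' into Mandarin, Japanese, and Korean."}]
}
```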

Multi-Turn Conversation
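
Include earlier turns in the messages array so the model keeps conversational context. The body below (illustrative) can be sent to any provider endpoint:

```json
{
  "model": "gpt-4o",
  "messages": [
    {"role": "system", "content": "You are a concise math tutor."},
    {"role": "user", "content": "What is a derivative?"},
    {"role": "assistant", "content": "A derivative measures how a function changes as its input changes."},
    {"role": "user", "content": "Give me a concrete example."}
  ]
}
```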

With Environment Variables
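
Read the key from the environment rather than hard-coding it. A Python sketch; OPENMIND_API_KEY is an assumed variable name, so use whatever your deployment defines:

```python
import os
import requests

# Assumed environment variable name; never commit the key itself.
api_key = os.environ["OPENMIND_API_KEY"]

resp = requests.post(
    "https://api.openmind.org/api/core/openai/chat/completions",
    headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
    json={"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello!"}]},
)
print(resp.json())
```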

Advanced Parameters

Temperature Control

Control randomness in responses (0.0 = deterministic, 2.0 = very random):
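
A request body fragment with a higher temperature for more varied creative output (prompt and value are illustrative):

```json
{
  "model": "gpt-4o",
  "messages": [{"role": "user", "content": "Write a haiku about the sea."}],
  "temperature": 1.2
}
```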

Token Limits

Limit the maximum number of tokens in the response:
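
For example, capping a summary at 100 tokens (values are illustrative):

```json
{
  "model": "gpt-4o-mini",
  "messages": [{"role": "user", "content": "Summarize photosynthesis."}],
  "max_tokens": 100
}
```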

Top-P Sampling

Use nucleus sampling for controlled randomness:
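
With top_p set to 0.9, sampling is restricted to the smallest set of tokens whose cumulative probability reaches 90% (values are illustrative):

```json
{
  "model": "gpt-4o",
  "messages": [{"role": "user", "content": "Suggest a name for a robotics startup."}],
  "top_p": 0.9
}
```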

Frequency and Presence Penalties

Reduce repetition in responses:
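
Positive values discourage repeated tokens (frequency_penalty) and already-mentioned topics (presence_penalty). Illustrative values:

```json
{
  "model": "gpt-4o",
  "messages": [{"role": "user", "content": "Brainstorm ten distinct blog post ideas."}],
  "frequency_penalty": 0.5,
  "presence_penalty": 0.6
}
```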

Model Selection Guide

When to Use Each Provider

OpenAI (GPT-4, GPT-5):

  • General-purpose tasks

  • Complex reasoning

  • Code generation

  • Creative writing

Anthropic Claude (via OpenRouter):

  • Long context understanding

  • Detailed analysis

  • Safety-critical applications

  • Nuanced conversations

Google Gemini:

  • Multimodal capabilities

  • Fast inference (Flash models)

  • Cost-effective solutions

  • Real-time applications

X.AI Grok:

  • Real-time information

  • Current events

  • Conversational AI

  • Research tasks

DeepSeek:

  • Code-focused tasks

  • Technical documentation

  • Algorithm design

  • Cost-efficient reasoning

NEAR.AI (Qwen):

  • Vision-language tasks

  • Multilingual support

  • Open-source model access

  • Specialized applications

Performance vs. Cost

| Model Tier | Examples | Use Case |
| --- | --- | --- |
| High Performance | gpt-5, claude-opus-4.1, gemini-3-pro | Complex reasoning, production applications |
| Balanced | gpt-4o, claude-sonnet-4.5, grok-4 | General-purpose, most tasks |
| Fast/Economical | gpt-4o-mini, gemini-2.5-flash-lite, deepseek-chat | High-volume, simple tasks |

Error Handling

HTTP Status Codes

| Code | Description |
| --- | --- |
| 200 | Success - Completion generated successfully |
| 400 | Bad Request - Invalid JSON or malformed request |
| 404 | Not Found - Unsupported provider or model |
| 500 | Internal Server Error - Server-side processing error |
| 503 | Service Unavailable - Provider API unavailable or not configured |

Error Response Format

All errors follow this format:
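
The exact payload is defined by the server; an OpenAI-style error object of roughly this shape is typical (field names and values here are illustrative, not verbatim server output):

```json
{
  "error": {
    "message": "Description of what went wrong",
    "type": "invalid_request_error",
    "code": 404
  }
}
```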

Common Errors

Invalid Provider: the {provider} path segment is not one of the supported provider names listed above; the API returns 404.

Invalid Model: the model field does not prefix-match any supported model for the chosen provider; the API returns 404.

Missing API Key: the Authorization header is absent or does not carry a valid Bearer token; the request is rejected before being forwarded to the provider.

Best Practices

API Key Management

  • Store API keys in environment variables, never in code

  • Rotate API keys regularly

  • Use separate keys for development and production

  • Monitor key usage through the OpenMind portal

Request Optimization

Efficient Message Design:
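
Keep the system message short and specific, and put only the necessary context in the user message. An illustrative example:

```json
{
  "model": "gpt-4o-mini",
  "messages": [
    {"role": "system", "content": "You are a SQL expert. Answer with a single query and no commentary."},
    {"role": "user", "content": "Count orders per customer in table orders(customer_id, id)."}
  ],
  "max_tokens": 120
}
```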

Token Management:

  • Set appropriate max_tokens to control costs

  • Use cheaper models for simple tasks

  • Monitor token usage in responses

  • Truncate conversation history when appropriate

Error Handling in Code

Python Example:
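
A minimal sketch with the requests library that maps the status codes above to exceptions (the function name, timeout, and retry advice are illustrative choices, not part of the API):

```python
import requests

API_URL = "https://api.openmind.org/api/core/openai/chat/completions"

def chat(api_key: str, payload: dict) -> dict:
    """Send a chat completion request and surface HTTP errors cleanly."""
    try:
        resp = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
            json=payload,
            timeout=30,
        )
        resp.raise_for_status()  # raises HTTPError on 400/404/500/503
        return resp.json()
    except requests.exceptions.HTTPError as err:
        # See the HTTP Status Codes table above for what each code means.
        print(f"API error {err.response.status_code}: {err.response.text}")
        raise
    except requests.exceptions.Timeout:
        print("Request timed out; consider retrying with backoff")
        raise
```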

Performance Tips

  1. Choose the Right Model:

    • Use mini/flash models for simple tasks

    • Reserve premium models for complex reasoning

    • Test multiple providers for your specific use case

  2. Optimize Prompts:

    • Be specific and concise

    • Use system messages to set behavior

    • Provide examples for few-shot learning

  3. Control Token Usage:

    • Set max_tokens appropriately

    • Use shorter system prompts

    • Truncate long conversation histories

  4. Leverage Caching:

    • Cache responses for identical queries

    • Reuse common system prompts

    • Store frequent model outputs

Security Considerations

  • Never expose API keys in client-side code

  • Validate and sanitize user inputs

  • Implement rate limiting in your application

  • Monitor for unusual usage patterns

  • Use HTTPS for all requests

Cost Optimization

Model Selection Strategy
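
One simple approach is a lookup table keyed by task complexity, following the tier table above. The mapping below is an illustrative sketch, not an official recommendation:

```python
# Map task complexity to (provider, model), mirroring the Performance vs. Cost tiers.
MODEL_BY_TIER = {
    "simple":   ("openai", "gpt-4o-mini"),  # high-volume, economical
    "standard": ("openai", "gpt-4o"),       # balanced default
    "complex":  ("openai", "gpt-5"),        # heavy reasoning
}

def pick_model(tier: str) -> tuple[str, str]:
    """Return the (provider, model) pair for a task tier."""
    return MODEL_BY_TIER[tier]
```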

Token Usage Tips

  • Use max_tokens to cap response length

  • Implement conversation pruning for long chats

  • Monitor token usage via the usage field in responses

  • Consider streaming for real-time applications

Batch Processing

For multiple independent requests, process them in parallel:
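
A Python sketch using a thread pool (model choice, prompts, and the OPENMIND_API_KEY variable name are illustrative):

```python
import os
import requests
from concurrent.futures import ThreadPoolExecutor

API_KEY = os.environ["OPENMIND_API_KEY"]  # assumed variable name
URL = "https://api.openmind.org/api/core/openai/chat/completions"

def complete(prompt: str) -> str:
    resp = requests.post(
        URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"model": "gpt-4o-mini", "messages": [{"role": "user", "content": prompt}]},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

prompts = ["Summarize text A", "Summarize text B", "Summarize text C"]

# Fan the independent requests out over a small pool;
# keep max_workers within your plan's rate limits.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(complete, prompts))
```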

Integration Examples

Python SDK
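
Because the request body follows the OpenAI Chat Completions format, the official openai Python package can often be pointed at a provider path as its base URL. This is a sketch under that assumption, not an officially documented integration; verify against your deployment:

```python
from openai import OpenAI  # pip install openai

client = OpenAI(
    base_url="https://api.openmind.org/api/core/openai",  # SDK appends /chat/completions
    api_key="YOUR_OPENMIND_API_KEY",
)

completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello from the SDK!"}],
)
print(completion.choices[0].message.content)
```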

JavaScript/Node.js
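
The same endpoint can be called from Node.js with any HTTP client (for example, fetch in Node 18+), using the same headers and JSON body shown in the Python examples above.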

Streaming Responses

Some providers support streaming responses. Set "stream": true in your request:
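
A Python sketch that reads the stream; it assumes OpenAI-style SSE chunks prefixed with "data: " and a "[DONE]" sentinel, which may vary by provider:

```python
import json
import requests

resp = requests.post(
    "https://api.openmind.org/api/core/openai/chat/completions",
    headers={"Authorization": "Bearer YOUR_OPENMIND_API_KEY"},
    json={
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "Tell me a short story."}],
        "stream": True,
    },
    stream=True,
)

for line in resp.iter_lines():
    if not line or not line.startswith(b"data: "):
        continue
    data = line[len(b"data: "):]
    if data == b"[DONE]":  # OpenAI-style end-of-stream sentinel
        break
    chunk = json.loads(data)
    delta = chunk["choices"][0]["delta"].get("content", "")
    print(delta, end="", flush=True)
```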

Streaming responses are sent as Server-Sent Events (SSE) with multiple data chunks.

Rate Limits

Rate limits vary by provider and your OpenMind subscription plan. Monitor your usage through:

  • Response headers (when provided by upstream providers)

  • OpenMind portal dashboard

  • API key usage reports

Additional Resources

Multi-Agent System

For advanced robotics applications, OpenMind also provides a multi-agent system that coordinates multiple LLMs for complex robotics tasks. This endpoint fuses sensor data and routes requests to specialized agents. For more information about the multi-agent robotics endpoint, refer to its documentation, which is still in development.
