ElevenLabs Text to Speech (TTS)

The ElevenLabs TTS API converts text into natural-sounding speech using ElevenLabs' advanced text-to-speech models. This endpoint provides high-quality voice synthesis with customizable voice selection, speech speed, and output formats.

Base URL: https://api.openmind.org

Authentication: An OpenMind API key is required. Include it in either the x-api-key header or the Authorization header.

Endpoints Overview

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /elevenlabs/tts | Generate speech from text using ElevenLabs TTS |

Generate Speech

Convert text to speech using the ElevenLabs TTS engine with customizable voice and output options.

Endpoint: POST /elevenlabs/tts

Request

curl -X POST https://api.openmind.org/elevenlabs/tts \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY" \
  -d '{
    "text": "Hello, this is a test of the ElevenLabs text to speech API."
  }'

Request Body

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| text | string | Yes | - | The text to convert to speech |
| voice_id | string | No | JBFqnCBsd6RMkjVDRZzb | ElevenLabs voice ID for the desired voice |
| model_id | string | No | eleven_flash_v2_5 | ElevenLabs model ID to use for synthesis |
| output_format | string | No | mp3_44100_128 | Audio output format specification |
| speed | float | No | 1.0 | Speech speed multiplier (0.5 to 2.0) |
| elevenlabs_api_key | string | No | - | Optional ElevenLabs API key override |

Response

Success (200 OK):

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| response | string | Base64-encoded audio data ready for decoding and playback |
| format | string | Audio format of the returned data (e.g., "mp3_44100_128") |

Error Responses: see the HTTP Status Codes table under Error Handling.

The returned audio is base64-encoded. You must decode it before playback or saving to a file.

Usage Examples

Basic Text-to-Speech

Convert simple text to speech using default settings:
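A minimal Python sketch of this request, using only the standard library. The URL, header, and `text` field come from the reference above; the function names `build_request` and `synthesize` are illustrative and not part of any OpenMind SDK, and the sketch assumes the response body is JSON containing the fields listed under Response Fields.

```python
import json
import urllib.request

API_URL = "https://api.openmind.org/elevenlabs/tts"


def build_request(text: str, api_key: str) -> urllib.request.Request:
    """Build a POST request that sends only the required `text` field."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def synthesize(text: str, api_key: str) -> dict:
    """Send the request and return the parsed JSON response (assumed shape)."""
    with urllib.request.urlopen(build_request(text, api_key), timeout=30) as resp:
        return json.load(resp)
```

Calling `synthesize("Hello", "YOUR_API_KEY")` should return a dictionary with the `response` and `format` fields if the request succeeds.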

Custom Voice and Speed

Use a specific voice with faster speech rate:
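One way to build such a request body in Python is sketched below. The helper name `voice_speed_payload` is hypothetical, and `YOUR_VOICE_ID` is a placeholder for a real ElevenLabs voice ID; only the field names `text`, `voice_id`, and `speed` come from the Request Body table.

```python
import json


def voice_speed_payload(text: str, voice_id: str, speed: float) -> str:
    """Return the JSON body for a request that overrides voice and speed."""
    return json.dumps({"text": text, "voice_id": voice_id, "speed": speed})


# "YOUR_VOICE_ID" is a placeholder; a speed of 1.5 is 50% faster than normal.
body = voice_speed_payload("Welcome back!", "YOUR_VOICE_ID", 1.5)
```

The resulting string can be sent as the POST body with the same headers as the curl request shown earlier.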

Full Configuration

Customize all available parameters:
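As a sketch, the full set of parameters can be modelled as a small data class whose defaults mirror the Request Body table. The class name `TTSRequest` and its `to_payload` method are assumptions made for illustration; the API itself only sees the resulting JSON object.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class TTSRequest:
    """All documented request fields, with the documented defaults."""

    text: str
    voice_id: str = "JBFqnCBsd6RMkjVDRZzb"
    model_id: str = "eleven_flash_v2_5"
    output_format: str = "mp3_44100_128"
    speed: float = 1.0
    elevenlabs_api_key: Optional[str] = None  # left out of the payload when None

    def to_payload(self) -> dict:
        """Return a dict ready for JSON encoding, dropping unset optional fields."""
        return {key: value for key, value in asdict(self).items() if value is not None}
```

Sending the defaults explicitly is redundant but makes the configuration visible in one place; fields left as `None` are omitted so the server-side setting applies.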

Save Audio to File

Generate speech and save directly to an MP3 file:
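A possible Python version of the saving step is sketched here. It assumes `result` is the already-parsed JSON response with the `response` field described under Response Fields; the function name `save_audio` is illustrative.

```python
import base64
from pathlib import Path


def save_audio(result: dict, path: str) -> int:
    """Decode the base64 `response` field, write it to `path`, and return the byte count."""
    audio = base64.b64decode(result["response"])
    return Path(path).write_bytes(audio)
```

With the default output format, a file name ending in `.mp3` is the natural choice; other formats would need a matching extension.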

With Environment Variables

Store your configuration in environment variables for easier management:
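The sketch below shows one way to read such settings in Python. The variable names `OPENMIND_API_KEY`, `TTS_VOICE_ID`, and `TTS_SPEED` are hypothetical, chosen only for this example; the API does not prescribe any environment variable names.

```python
import os
from typing import Mapping


def load_tts_config(env: Mapping[str, str] = os.environ) -> dict:
    """Read the API key and optional request overrides from environment variables.

    All variable names here are hypothetical; rename them to suit your setup.
    """
    api_key = env.get("OPENMIND_API_KEY")
    if not api_key:
        raise RuntimeError("OPENMIND_API_KEY is not set")

    payload = {}
    # Include only the overrides that are set, so API defaults apply otherwise.
    if env.get("TTS_VOICE_ID"):
        payload["voice_id"] = env["TTS_VOICE_ID"]
    if env.get("TTS_SPEED"):
        payload["speed"] = float(env["TTS_SPEED"])

    return {"headers": {"x-api-key": api_key}, "payload": payload}
```

Keeping the key out of source code this way also makes it easier to use different keys per environment.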

Voice Configuration

Default Voice

The default voice ID is JBFqnCBsd6RMkjVDRZzb. This voice provides clear, natural-sounding English speech suitable for most applications.

Custom Voices

You can use any ElevenLabs voice ID by specifying it in the voice_id parameter. Visit the ElevenLabs Voice Library to explore available voices.

Speed Control

The speed parameter accepts values between 0.5 (half speed) and 2.0 (double speed):

  • 0.5 - 50% slower (more deliberate)

  • 1.0 - Normal speed (default)

  • 1.5 - 50% faster

  • 2.0 - Double speed (maximum)
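A client-side range check along these lines can catch bad values before a request is sent. This is a sketch only: how the server treats out-of-range speeds is not documented here, and the name `validate_speed` is an assumption.

```python
MIN_SPEED = 0.5  # half speed
MAX_SPEED = 2.0  # double speed


def validate_speed(speed: float) -> float:
    """Return `speed` as a float, or raise ValueError if it is outside 0.5 to 2.0."""
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}, got {speed}")
    return float(speed)
```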

Output Formats

The default output format is mp3_44100_128, which provides high-quality audio at a reasonable file size. The format string indicates:

  • Codec: MP3

  • Sample Rate: 44,100 Hz

  • Bitrate: 128 kbps

Other formats may be supported depending on your ElevenLabs API configuration. Consult the ElevenLabs documentation for available format options.
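The format string can be split into its parts with a few lines of Python. The sketch below follows the codec, sample rate, bitrate layout described above; accepting two-part names without a bitrate is an assumption about how other formats might be written, and the names `AudioFormat` and `parse_output_format` are illustrative.

```python
from typing import NamedTuple, Optional


class AudioFormat(NamedTuple):
    codec: str
    sample_rate_hz: int
    bitrate_kbps: Optional[int]


def parse_output_format(fmt: str) -> AudioFormat:
    """Split a string such as "mp3_44100_128" into codec, sample rate, and bitrate."""
    parts = fmt.split("_")
    if len(parts) not in (2, 3):
        raise ValueError(f"unrecognized output format: {fmt!r}")
    # Assumption: a two-part name carries no bitrate component.
    bitrate = int(parts[2]) if len(parts) == 3 else None
    return AudioFormat(parts[0], int(parts[1]), bitrate)
```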

Error Handling

All endpoints follow consistent error response patterns:

HTTP Status Codes

| Code | Description |
| --- | --- |
| 200 | Success - Audio generated successfully |
| 400 | Bad Request - Missing required fields or invalid JSON |
| 503 | Service Unavailable - ElevenLabs API unavailable or not configured |
| 500 | Internal Server Error - Server-side processing error |
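A client might translate these status codes into readable errors roughly as follows. This is a sketch using Python's standard library; the names `describe_status` and `send` are assumptions, and the structure of the error response body is not relied on.

```python
import json
import urllib.error
import urllib.request

# Meanings taken from the status code table above.
STATUS_MEANINGS = {
    200: "Success: audio generated successfully",
    400: "Bad Request: missing required fields or invalid JSON",
    500: "Internal Server Error: server-side processing error",
    503: "Service Unavailable: ElevenLabs API unavailable or not configured",
}


def describe_status(code: int) -> str:
    """Map a documented status code to its meaning, with a fallback for others."""
    return STATUS_MEANINGS.get(code, f"Unexpected status {code}")


def send(request: urllib.request.Request) -> dict:
    """Send a prepared request; raise RuntimeError with a readable message on HTTP errors."""
    try:
        with urllib.request.urlopen(request, timeout=30) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        raise RuntimeError(describe_status(err.code)) from err
```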

Error Response Format

Common Error Scenarios

Missing Text Field:

API Key Not Configured: If the server-side ElevenLabs API key is not configured and you don't provide one in the request, you'll receive:

Connection Issues: If the service cannot reach the ElevenLabs API:

Best Practices

Audio Decoding

The API returns base64-encoded audio data. Always decode it before use:
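A defensive decoding step might look like the sketch below. It assumes the parsed response is a dictionary with a `response` field holding plain base64 without line breaks; if the service ever wraps the string, drop `validate=True`. The function name `decode_audio` is illustrative.

```python
import base64
import binascii


def decode_audio(result: dict) -> bytes:
    """Return the raw audio bytes from a parsed response, or raise ValueError."""
    try:
        # validate=True rejects characters outside the base64 alphabet.
        return base64.b64decode(result["response"], validate=True)
    except (KeyError, binascii.Error) as exc:
        raise ValueError("response did not contain valid base64 audio") from exc
```

The returned bytes can be written to a file or handed to an audio player that understands the format named in the `format` field.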

Note the following best practices when using the ElevenLabs TTS API:

  • Audio responses are base64-encoded and must be decoded before playback

  • The ElevenLabs API key can be configured server-side or provided per-request

  • Default voice and model settings are optimized for English speech

  • Large text inputs may take longer to process
