A reference guide for building AI agents: every method, how to authenticate, and the permissions each one needs.
The ElevenLabs API is how an app or AI agent works with an ElevenLabs account: turning text into spoken audio, transcribing speech back into text, managing the voices in the account, and dubbing media into other languages. Access is granted through an API key, and a key can be restricted to specific product areas and given a credit cap so it reaches only what it was scoped to. Long jobs like asynchronous transcription can push a signed event to a registered endpoint when they finish, rather than requiring an app to poll.
How an app or AI agent connects to ElevenLabs determines what it can reach. There is a route for making calls with an API key, a route for receiving events when long jobs finish, and a first-party server that exposes ElevenLabs audio tools to agents, and each is governed by the key behind it and the access that key carries.
The REST API takes JSON or multipart request bodies and returns audio or JSON, served under a single v1 namespace at https://api.elevenlabs.io. A call authenticates with an API key passed in the xi-api-key header. Regional base URLs are offered for data residency.
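As a rough illustration of the call shape described above, the sketch below builds (but does not send) a text-to-speech request with the xi-api-key header. The body fields `text` and `model_id`, the model ID `eleven_multilingual_v2`, and the helper name `build_tts_request` are assumptions for illustration; check the official reference for the exact request schema.

```python
import json
import urllib.request

BASE_URL = "https://api.elevenlabs.io"  # base URL named in the text above


def build_tts_request(api_key: str, voice_id: str, text: str,
                      model_id: str = "eleven_multilingual_v2") -> urllib.request.Request:
    """Build, but do not send, a POST to /v1/text-to-speech/{voice_id}.

    The JSON field names are assumptions, not confirmed by this guide.
    """
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/text-to-speech/{voice_id}",
        data=body,
        method="POST",
        # HTTP header names are case-insensitive; urllib normalizes their casing.
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
    )


req = build_tts_request("YOUR_API_KEY", "VOICE_ID", "Hello there.")
# Sending is left to the caller, for example:
#   audio_bytes = urllib.request.urlopen(req).read()
```

Keeping request construction separate from sending makes it easy to inspect the headers and body before any credits are spent.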
A WebSocket interface at /v1/text-to-speech/{voice_id}/stream-input accepts text as a stream and returns audio with low latency, and a realtime speech-to-text socket transcribes audio as it arrives. A WebSocket connection authenticates with the same API key as the REST API, or with a short-lived single-use token for client-side use.
ElevenLabs POSTs a signed payload to a registered HTTPS endpoint when a long job finishes, like an asynchronous transcription or a conversational agent call ending. The receiver verifies an HMAC signature against the webhook's secret to confirm the request came from ElevenLabs. A webhook that fails repeatedly is auto-disabled.
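The verification step above can be sketched as follows. This is a minimal example under stated assumptions: it assumes the signature arrives in a header of the form `t=<unix timestamp>,v0=<hex digest>` and that the digest is an HMAC-SHA256 over `<timestamp>.<raw body>`. The header layout, the 30-minute tolerance, and the function name `verify_webhook` are illustrative and should be checked against the official webhook documentation.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_webhook(raw_body: bytes, signature_header: str, secret: str,
                   tolerance_s: int = 1800, now: Optional[float] = None) -> bool:
    """Return True only if the signature matches and the timestamp is recent.

    Assumed header layout (not confirmed by this guide): "t=<unix>,v0=<hex>".
    """
    # Parse "key=value" pairs; anything malformed simply fails closed below.
    parts = dict(p.strip().split("=", 1) for p in signature_header.split(",") if "=" in p)
    timestamp, received = parts.get("t"), parts.get("v0")
    if not timestamp or not received or not timestamp.isdigit():
        return False

    # Reject stale or future-dated requests to limit replay attacks.
    current = time.time() if now is None else now
    if abs(current - int(timestamp)) > tolerance_s:
        return False

    # Recompute the HMAC over "<timestamp>.<body>" with the webhook secret.
    signed = timestamp.encode("utf-8") + b"." + raw_body
    expected = hmac.new(secret.encode("utf-8"), signed, hashlib.sha256).hexdigest()

    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))
```

The check must run on the raw request bytes, before any JSON parsing, because re-serializing the payload can change the bytes and invalidate the signature.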
ElevenLabs publishes a first-party Model Context Protocol server (the elevenlabs/elevenlabs-mcp repository) that exposes ElevenLabs audio tools to MCP clients like Claude and Cursor. It runs locally and forwards calls to the ElevenLabs API, covering text to speech, voice search, transcription, and outbound agent calls. It authenticates with an ElevenLabs API key.
An API key is passed in the xi-api-key header on every request. A key can be named, restricted to specific product areas to limit which endpoints it can call, and given a monthly credit cap, so a leaked or shared key reaches only what it was scoped to. Enterprise accounts can additionally restrict a key to specific IP ranges, a feature currently in preview.
A single-use token is a short-lived credential minted server-side and handed to a client, so a browser or device can open a streaming connection without ever holding the long-lived API key. It is the recommended way to authenticate client-side use.
A workspace service account acts as a member with no access to any resources until it is added to a group or has resources shared with it. API keys created under a service account inherit that scoped access, which is enforced at the backend rather than only in the dashboard.
The ElevenLabs API is split into areas an agent can act on, like turning text into speech, transcribing audio, managing the voices in an account, dubbing media into other languages, and running conversational agents. Some areas only generate audio, while others change account state, like adding or deleting a voice.
Text to Speech: Methods for turning text into spoken audio, including streaming and character-level timestamps.
Speech to Text: Methods for transcribing audio and video into text, synchronously or asynchronously.
Voices: Methods for listing, reading, editing, and deleting the voices in an account.
Voice design & cloning: Methods for designing a voice from a prompt and creating instant or professional voice clones.
Audio tools: Methods for changing a voice, generating sound effects, and isolating speech from background noise.
Dubbing: Methods for translating and voicing media into another language and checking the job's status.
History: Methods for listing, reading, and deleting previously generated audio.
Models: Methods for listing the available models and their capabilities.
User & subscription: Methods for reading the account profile and its subscription, usage, and limits.
Conversational AI agents: Methods for creating, listing, and reading conversational agents (Agents Platform).
Text to Speech (4 methods): Methods for turning text into spoken audio, including streaming and character-level timestamps. Standard rate limits apply to each method.
| Method | Endpoint | What it does | Access | Permission | Version | Acts on | Webhook event | Notes |
|---|---|---|---|---|---|---|---|---|
| POST | /v1/text-to-speech/{voice_id} | Convert text into speech using a chosen voice and return the audio. | write | None required | Current (since base version) | audio | None | Generates audio and spends credits; a key restricted by product needs Text to Speech access. |
| POST | /v1/text-to-speech/{voice_id}/stream | Convert text into speech and stream the audio back as it is generated. | write | None required | Current (since base version) | audio | None | Same generation as convert, delivered as a stream for lower latency. |
| POST | /v1/text-to-speech/{voice_id}/with-timestamps | Convert text into speech and return character-level timing alongside the audio. | write | None required | Current (since base version) | audio | None | Returns base64 audio plus alignment data for syncing text to audio. |
| POST | /v1/text-to-speech/{voice_id}/stream-with-timestamps | Stream generated speech together with character-level timing data. | write | None required | Current (since base version) | audio | None | Combines streaming delivery with alignment data. |
Speech to Text (2 methods): Methods for transcribing audio and video into text, synchronously or asynchronously. Standard rate limits apply to each method.
| Method | Endpoint | What it does | Access | Permission | Version | Acts on | Webhook event | Notes |
|---|---|---|---|---|---|---|---|---|
| POST | /v1/speech-to-text | Transcribe an audio or video file into text, synchronously or asynchronously via webhook. | write | None required | Current (since base version) | transcript | speech_to_text_transcription | Setting webhook to true processes the job asynchronously and POSTs the result when done. Supports diarization and per-channel transcripts. |
| GET | /v1/speech-to-text/{transcript_id} | Retrieve a previously created transcript by its ID. | read | None required | Current (since base version) | transcript | None | Read-only. |
Voices (5 methods): Methods for listing, reading, editing, and deleting the voices in an account. Standard rate limits apply to each method.
| Method | Endpoint | What it does | Access | Permission | Version | Acts on | Webhook event | Notes |
|---|---|---|---|---|---|---|---|---|
| GET | /v1/voices | List the voices available in the account. | read | None required | Current (since base version) | voice | None | Read-only. |
| GET | /v1/voices/{voice_id} | Retrieve the metadata and settings for a single voice. | read | None required | Current (since base version) | voice | None | Read-only. |
| POST | /v1/voices/{voice_id}/edit | Edit an existing voice, such as its name, description, or sample files. | write | None required | Current (since base version) | voice | None | Changes a voice the whole account uses. |
| DELETE | /v1/voices/{voice_id} | Permanently delete a voice from the account. | write | None required | Current (since base version) | voice | None | Irreversible; removes the voice for the whole account. |
| GET | /v1/voices/{voice_id}/settings | Retrieve the generation settings for a voice, like stability and similarity. | read | None required | Current (since base version) | voice | None | Read-only. |
Voice design & cloning (4 methods): Methods for designing a voice from a prompt and creating instant or professional voice clones. Standard rate limits apply to each method.
| Method | Endpoint | What it does | Access | Permission | Version | Acts on | Webhook event | Notes |
|---|---|---|---|---|---|---|---|---|
| POST | /v1/text-to-voice/design | Design a new voice from a text prompt and return previews to choose from. | write | None required | Current (since base version) | voice_preview | None | Returns previews with a generated voice ID; it does not yet add a voice to the account. |
| POST | /v1/text-to-voice/create | Create a saved voice from a previously generated design preview. | write | None required | Current (since base version) | voice | None | Adds a new voice to the account using a generated_voice_id from a design. |
| POST | /v1/voices/add/ivc | Create an instant voice clone from uploaded audio samples. | write | None required | Current (since base version) | voice | None | Adds a cloned voice to the account; clones a real person's voice from samples. |
| POST | /v1/voices/add/pvc | Create a professional voice clone trained on a larger set of audio. | write | None required | Current (since base version) | voice | None | Adds a higher-fidelity cloned voice; may be gated by plan and verification. |
Audio tools (3 methods): Methods for changing a voice, generating sound effects, and isolating speech from background noise. Standard rate limits apply to each method.
| Method | Endpoint | What it does | Access | Permission | Version | Acts on | Webhook event | Notes |
|---|---|---|---|---|---|---|---|---|
| POST | /v1/speech-to-speech/{voice_id} | Transform recorded audio so it is spoken in a different voice, keeping timing and delivery. | write | None required | Current (since base version) | audio | None | Generates audio from an uploaded recording and spends credits. |
| POST | /v1/sound-generation | Generate a sound effect from a text description. | write | None required | Current (since base version) | audio | None | Generates audio and spends credits; controls include duration and prompt influence. |
| POST | /v1/audio-isolation | Isolate speech from background noise in an uploaded audio file. | write | None required | Current (since base version) | audio | None | Generates cleaned audio and spends credits. |
Dubbing (2 methods): Methods for translating and voicing media into another language and checking the job's status. Standard rate limits apply to each method.
| Method | Endpoint | What it does | Access | Permission | Version | Acts on | Webhook event | Notes |
|---|---|---|---|---|---|---|---|---|
| POST | /v1/dubbing | Dub an audio or video file into a target language. | write | None required | Current (since base version) | dubbing | None | Starts a billed dubbing job; runs asynchronously and is polled for status. |
| GET | /v1/dubbing/{dubbing_id} | Retrieve metadata about a dubbing project, including whether it is still in progress. | read | None required | Current (since base version) | dubbing | None | Read-only; returns status, languages, and any error. |
History (3 methods): Methods for listing, reading, and deleting previously generated audio. Standard rate limits apply to each method.
| Method | Endpoint | What it does | Access | Permission | Version | Acts on | Webhook event | Notes |
|---|---|---|---|---|---|---|---|---|
| GET | /v1/history | List previously generated audio items in the account. | read | None required | Current (since base version) | history_item | None | Read-only. |
| GET | /v1/history/{history_item_id} | Retrieve a single generated history item by its ID. | read | None required | Current (since base version) | history_item | None | Read-only. |
| DELETE | /v1/history/{history_item_id} | Permanently delete a generated history item. | write | None required | Current (since base version) | history_item | None | Irreversible. |
Models (1 method): Methods for listing the available models and their capabilities. Standard rate limits apply.
| Method | Endpoint | What it does | Access | Permission | Version | Acts on | Webhook event | Notes |
|---|---|---|---|---|---|---|---|---|
| GET | /v1/models | List the available models and their capabilities and languages. | read | None required | Current (since base version) | model | None | Read-only. |
User & subscription (2 methods): Methods for reading the account profile and its subscription, usage, and limits. Standard rate limits apply to each method.
| Method | Endpoint | What it does | Access | Permission | Version | Acts on | Webhook event | Notes |
|---|---|---|---|---|---|---|---|---|
| GET | /v1/user | Get information about the authenticated account, including subscription status. | read | None required | Current (since base version) | user | None | Read-only; the response includes the account's API key, so it exposes that credential. |
| GET | /v1/user/subscription | Get extended subscription details, including tier, character usage, and limits. | read | None required | Current (since base version) | subscription | None | Read-only; reports remaining credits and the reset date. |
Conversational AI agents (3 methods): Methods for creating, listing, and reading conversational agents (Agents Platform). Standard rate limits apply to each method.
| Method | Endpoint | What it does | Access | Permission | Version | Acts on | Webhook event | Notes |
|---|---|---|---|---|---|---|---|---|
| POST | /v1/convai/agents/create | Create a conversational AI agent from a configuration object. | write | None required | Current (since base version) | agent | None | Provisions a live agent with its prompt, model, tools, and voice settings. |
| GET | /v1/convai/agents | List the conversational AI agents in the account and their metadata. | read | None required | Current (since base version) | agent | None | Read-only. |
| GET | /v1/convai/agents/{agent_id} | Retrieve the configuration for a single conversational AI agent. | read | None required | Current (since base version) | agent | post_call_transcription | Read-only; an active agent call can later POST a post-call transcription webhook. |
ElevenLabs can notify an app when a long-running job finishes, like an asynchronous transcription completing or a conversational agent call ending. It POSTs a signed payload describing what happened, so an integration learns about the result without polling.
| Event | What it signals | Triggered by |
|---|---|---|
| speech_to_text_transcription | An asynchronous transcription job finished. The payload carries the completed transcript, so an app learns the result without polling. Sent only when a speech-to-text request set webhook to true. | /v1/speech-to-text |
| post_call_transcription | A conversational agent call ended. The payload contains the full conversation data, including the transcript, analysis results, and metadata. Retries are supported for this event. | /v1/convai/agents/{agent_id} |
| post_call_audio | A conversational agent call ended. The payload carries minimal data with base64-encoded audio of the full conversation. | /v1/convai/agents/{agent_id} |
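A receiver typically routes each event in the table above to its own handler. The sketch below assumes the payload is JSON with a top-level `type` field naming the event and a `data` field carrying the details; both field names, and the handler names, are assumptions for illustration rather than a confirmed payload schema.

```python
import json
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any]], str]


def on_transcript(data: Dict[str, Any]) -> str:
    return "store transcript"  # e.g. persist the completed transcript


def on_call_summary(data: Dict[str, Any]) -> str:
    return "store conversation"  # transcript, analysis results, metadata


def on_call_audio(data: Dict[str, Any]) -> str:
    return "store audio"  # base64-encoded audio of the conversation


# Event names come from the table above; the mapping itself is illustrative.
HANDLERS: Dict[str, Handler] = {
    "speech_to_text_transcription": on_transcript,
    "post_call_transcription": on_call_summary,
    "post_call_audio": on_call_audio,
}


def dispatch(raw_body: bytes) -> str:
    """Route a verified webhook body to its handler; ignore unknown events."""
    event = json.loads(raw_body)
    handler = HANDLERS.get(event.get("type"))
    if handler is None:
        return "ignored"
    return handler(event.get("data", {}))
```

Ignoring unknown event types, rather than raising, keeps the endpoint returning success if new events are added later, which matters because a webhook that fails repeatedly is auto-disabled.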
ElevenLabs limits how many requests run at once through a concurrency ceiling set by the subscription plan, and meters generation against a monthly credit balance, measured in characters for speech.
ElevenLabs caps how many requests run at the same time with a concurrency limit set by the subscription plan, rather than a fixed requests-per-second rate. The documented concurrency ceilings are 2 on Free, 3 on Starter, 5 on Creator, 10 on Pro, and 15 on Scale and Business, with custom limits on Enterprise. Going over returns HTTP 429: too_many_concurrent_requests means the plan ceiling was hit, while system_busy means temporary platform congestion. Separately, generation is metered against a monthly credit balance, not a request count.
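Because the limit is on simultaneous requests rather than requests per second, a client can stay under it by gating calls locally. The sketch below uses a semaphore sized to the plan ceilings quoted above; the class name `ConcurrencyGate` and the plan keys are hypothetical, and Enterprise is omitted because its limits are custom.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")

# Ceilings as quoted in the paragraph above.
PLAN_CONCURRENCY = {
    "free": 2, "starter": 3, "creator": 5,
    "pro": 10, "scale": 15, "business": 15,
}


class ConcurrencyGate:
    """Blocks callers so that at most `limit` calls run at the same time."""

    def __init__(self, plan: str) -> None:
        self.limit = PLAN_CONCURRENCY[plan]
        self._slots = threading.BoundedSemaphore(self.limit)

    def run(self, fn: Callable[..., T], *args, **kwargs) -> T:
        with self._slots:  # waits here while the ceiling is reached
            return fn(*args, **kwargs)


gate = ConcurrencyGate("creator")
result = gate.run(len, "hello")  # len() stands in for a real API call
```

Queueing locally like this avoids most too_many_concurrent_requests responses, though it cannot prevent system_busy, which reflects platform congestion rather than the plan ceiling.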
List endpoints that can return many items, like voices and history, are paginated: a page_size parameter sets the page length, and the response returns a cursor for the next page, such as last_history_item_id on history, when more results remain. Smaller list endpoints return the full set in one response.
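The cursor loop for a paginated list can be sketched as below. To keep the example runnable without network access, the page fetch is passed in as a function. The response fields `history` and `has_more` are assumptions; only `last_history_item_id` and `page_size` are named in the text above, and how the cursor is passed back on the next request should be taken from the official reference.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iter_history(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[Dict[str, Any]]:
    """Yield every item across pages, following the cursor until none remain.

    `fetch_page(cursor)` is a hypothetical callable that performs the request
    for one page, starting after `cursor` (None for the first page).
    """
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page.get("history", [])
        cursor = page.get("last_history_item_id")
        # Stop when the server reports no more results or gives no cursor.
        if not page.get("has_more") or cursor is None:
            break


# A fake two-page backend standing in for GET /v1/history.
FAKE_PAGES: Dict[Optional[str], Page] = {
    None: {"history": [{"id": "a"}, {"id": "b"}], "has_more": True,
           "last_history_item_id": "b"},
    "b": {"history": [{"id": "c"}], "has_more": False,
          "last_history_item_id": "c"},
}

items = list(iter_history(lambda cursor: FAKE_PAGES[cursor]))
```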
Speech generation is billed by character, and the maximum text length per request depends on the model; for example, Multilingual v2 handles up to roughly 10,000 characters of long-form input. Speech-to-text accepts uploaded audio or video files, and the monthly credit balance, not a single request, is the binding limit. Audio output is MP3 by default, with PCM, Opus, and µ-law available, and some higher-quality formats require a paid tier.
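Input longer than a model's per-request limit has to be split before it is sent. The sketch below is one simple way to do that, breaking at sentence boundaries where possible; the function name `chunk_text` and the default limit of 10,000 (taken from the rough Multilingual v2 figure above) are illustrative, and the real limit should be checked for the model in use.

```python
import re
from typing import List


def chunk_text(text: str, limit: int = 10_000) -> List[str]:
    """Split text into chunks of at most `limit` characters.

    Prefers sentence boundaries; a single sentence longer than the limit
    is hard-split as a last resort.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any sentence that cannot fit in one chunk by itself.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


parts = chunk_text("One. Two. Three.", limit=10)
```

Since billing is by character, splitting does not change the total cost; it only keeps each request within the model's length limit.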
The status codes an agent should handle, and what to do about each.
| Status | Code | Meaning | What to do |
|---|---|---|---|
| 400 | validation_error / invalid_request | The request was bad: a parameter is missing or invalid. The detail.param field names the offending field. | Read detail.message and detail.param, fix the request, and resend. It is not retryable as-is. |
| 401 | authentication_error | No valid API key was provided, for example a missing or invalid xi-api-key header. | Confirm a valid API key is being sent, and rotate the key if it may be compromised. |
| 402 | payment_required | The account has insufficient credits to complete the request. | Top up credits or wait for the monthly reset, then retry. |
| 403 | authorization_error | The key or member lacks permission for this request, for example a key restricted away from this product area, or a feature the plan does not include. | Grant the product-area access on the key, or use a key with the needed permission. |
| 404 | not_found | The requested object does not exist or is not visible to this key, for example a voice or transcript ID that is wrong. | Verify the ID and confirm it belongs to this account. |
| 422 | validation_error | The request body failed schema validation. detail.param identifies the field. | Correct the field named in detail.param and resend. |
| 429 | rate_limit_error | Too many requests at once or a concurrency limit was hit. The body is either too_many_concurrent_requests, meaning the plan's concurrency ceiling, or system_busy, meaning temporary platform congestion. | For too_many_concurrent_requests, queue requests or upgrade the plan. For system_busy, retry with exponential backoff. |
| 500 | internal_error | An error on the ElevenLabs side, which can also surface as 503 service_unavailable. It is rare. | Retry with backoff, and contact support if it persists. |
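The table above reduces to a small decision function: which responses are worth retrying, and how. The sketch below is one possible encoding; it takes the status and error code as already-extracted values, since where the code sits in the response body is not specified here. The names `Action`, `decide`, and `backoff_delay`, and the backoff constants, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    retry: bool
    advice: str


# Non-retryable statuses and the remedy suggested in the table above.
NO_RETRY_ADVICE = {
    400: "fix the field named in detail.param and resend",
    401: "send a valid API key; rotate it if it may be compromised",
    402: "top up credits or wait for the monthly reset",
    403: "grant product-area access or use a key with the needed permission",
    404: "verify the ID and that it belongs to this account",
    422: "fix the field named in detail.param and resend",
}


def decide(status: int, code: str = "") -> Action:
    """Map an error response to a retry decision."""
    if status == 429:
        if code == "too_many_concurrent_requests":
            return Action(True, "queue the request until a slot frees up")
        return Action(True, "retry with exponential backoff")  # e.g. system_busy
    if status in (500, 503):
        return Action(True, "retry with exponential backoff")
    advice = NO_RETRY_ADVICE.get(status, "unexpected status; do not retry blindly")
    return Action(False, advice)


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Exponential backoff in seconds for a zero-based attempt number, capped."""
    return min(cap, base * (2 ** attempt))
```

In practice a small random jitter is often added to the delay so that many clients do not retry at the same instant.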
ElevenLabs serves a single namespace and signals model and feature changes through dated release notes rather than a new namespace, and it deprecates older speech and transcription models on announced dates.
ElevenLabs serves a single v1 namespace and ships capability changes as new model IDs and dated release notes rather than minting a new namespace. The entries below are notable dated changes from the changelog.
A dated release with model deprecations and conversational agent additions.
A dated release adding audio isolation history endpoints and agent security scoping.
A dated release in the published changelog timeline.
Pin model IDs explicitly and move off deprecated models before their removal date.
Source: ElevenLabs changelog.
Bollard AI sits between a team's AI agents and ElevenLabs. Grant each agent exactly the access it needs, read or write, area by area, and every call is checked and logged.