Everything an AI agent can do with the ElevenLabs API.

A reference guide for building AI agents: every method, how to authenticate, and the permissions each one needs.

Endpoints: 29
API version: v1
Last updated: 23 June 2026
Orientation

How the ElevenLabs API works.

The ElevenLabs API is how an app or AI agent works with an ElevenLabs account: turning text into spoken audio, transcribing speech back into text, managing the voices in the account, and dubbing media into other languages. Access is granted through an API key, and a key can be restricted to specific product areas and given a credit cap so it reaches only what it was scoped to. Long jobs like asynchronous transcription can push a signed event to a registered endpoint when they finish, rather than requiring an app to poll.

29 endpoints
10 capability groups
12 read
17 write
0 permissions
Authentication
ElevenLabs authenticates with an API key sent in the xi-api-key header, not OAuth. A key can be named, restricted to specific product areas so it only reaches certain endpoints, and given a monthly credit cap. For client-side use, a server mints a short-lived single-use token so the long-lived key never leaves the backend. The GET user endpoint returns the account's API key, so a read of it exposes that credential.
Permissions
Access is governed by what a key is scoped to rather than by a per-call OAuth scope on every request. A request made with a key restricted away from a product area, or for a feature the plan does not include, returns HTTP 403 authorization_error. In a shared workspace, a service account starts with no access and is granted resources through groups, and that scoping is enforced at the backend, not only in the dashboard.
Limits
Two limits apply at once. A concurrency ceiling set by the plan caps how many requests run together, from 2 on Free up to 15 on Scale and Business, and exceeding it returns 429 too_many_concurrent_requests. Separately, generation spends a monthly credit balance measured in characters for speech, and running out returns 402 payment_required.
Models and versioning
The API serves a single v1 namespace, and capability changes ship as new model IDs rather than a new namespace, like eleven_flash_v2_5 for low-latency speech, eleven_multilingual_v2 for long-form, and scribe for transcription. Older models are deprecated on announced dates, so an integration pins a model ID and migrates before removal. The list models endpoint reports each model's capabilities and languages.
Connect & authenticate

Connection & authentication methods.

How an app or AI agent connects to ElevenLabs determines what it can reach. There is a route for making calls with an API key, a route for receiving events when long jobs finish, and a first-party server that exposes ElevenLabs audio tools to agents, and each is governed by the key behind it and the access that key carries.

Ways to connect

REST API

The REST API takes JSON or multipart request bodies and returns audio or JSON, served under a single v1 namespace at https://api.elevenlabs.io. A call authenticates with an API key passed in the xi-api-key header. Regional base URLs are offered for data residency.

Best for: Connecting an app or AI agent to ElevenLabs.
Governed by: The API key and the product areas and credit cap it is restricted to.
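
A minimal sketch of an authenticated REST call, built with the Python standard library. The base URL and the xi-api-key header come from this guide. The route /v1/text-to-speech/{voice_id}, the JSON field names text and model_id, and the helper name build_tts_request are assumptions for illustration and should be checked against the official reference. The request is only constructed here, since sending it would spend credits.

```python
import json
import urllib.request

BASE_URL = "https://api.elevenlabs.io"  # base URL stated in this guide


def build_tts_request(voice_id: str, text: str, model_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a text-to-speech request.

    Assumptions: the route /v1/text-to-speech/{voice_id} and the JSON
    field names "text" and "model_id" are illustrative.
    """
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/text-to-speech/{voice_id}",
        data=body,
        method="POST",
        headers={
            "xi-api-key": api_key,  # the API key header named in this guide
            "Content-Type": "application/json",
        },
    )
```

Passing the result to urllib.request.urlopen would send it. Keeping construction separate from sending makes the header and body easy to inspect in tests without touching the account's credit balance.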

WebSocket streaming

A WebSocket interface streams text into speech and streams audio back with low latency at /v1/text-to-speech/{voice_id}/stream-input, and a realtime speech-to-text socket transcribes audio as it arrives. It authenticates with the same API key, or with a short-lived single-use token for client-side use.

Best for: Real-time voice, like conversational agents and live captioning.
Governed by: The API key or single-use token behind the connection.
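
A rough sketch of preparing a streaming connection, without opening a socket. The stream-input path is the one named above. The wss host, the model_id query parameter, and the frame shape (a JSON object with a text field, closed by an empty string) are assumptions made for illustration, so verify them against the official WebSocket reference.

```python
import json
from urllib.parse import urlencode

WS_BASE = "wss://api.elevenlabs.io"  # assumed: the REST host with a wss scheme


def stream_input_url(voice_id: str, model_id: str) -> str:
    """Return the WebSocket URL for the stream-input path named in this guide."""
    query = urlencode({"model_id": model_id})  # assumed query parameter
    return f"{WS_BASE}/v1/text-to-speech/{voice_id}/stream-input?{query}"


def text_frames(chunks: list[str]) -> list[str]:
    """Serialize text chunks as JSON frames, ending with an empty-text frame.

    Assumption: each frame is {"text": ...} and an empty string signals
    end of input.
    """
    frames = [json.dumps({"text": chunk}) for chunk in chunks]
    frames.append(json.dumps({"text": ""}))
    return frames
```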

Webhooks

ElevenLabs POSTs a signed payload to a registered HTTPS endpoint when a long job finishes, like an asynchronous transcription or a conversational agent call ending. The receiver verifies an HMAC signature against the webhook's secret to confirm the request came from ElevenLabs. A webhook that fails repeatedly is auto-disabled.

Best for: Receiving the result of asynchronous jobs without polling.
Governed by: The HMAC signing secret on the webhook.

MCP server

ElevenLabs publishes a first-party Model Context Protocol server (the elevenlabs/elevenlabs-mcp repository) that exposes ElevenLabs audio tools to MCP clients like Claude and Cursor. It runs locally and forwards calls to the ElevenLabs API, covering text to speech, voice search, transcription, and outbound agent calls. It authenticates with an ElevenLabs API key.

Best for: Connecting an AI agent or LLM client to ElevenLabs through MCP.
Governed by: The API key the server is configured with, and the access that key carries.
Authentication

API key

An API key is passed in the xi-api-key header on every request. A key can be named, restricted to specific product areas to limit which endpoints it can call, and given a monthly credit cap, so a leaked or shared key reaches only what it was scoped to. Enterprise accounts can additionally restrict a key to specific IP ranges, a feature currently in preview.

Token: API key (xi-api-key header)
Best for: Server-side calls from an app or AI agent.

Single-use token

A single-use token is a short-lived credential minted server-side and handed to a client, so a browser or device can open a streaming connection without ever holding the long-lived API key. It is the recommended way to authenticate client-side use.

Token: Short-lived single-use token
Best for: Client-side and in-browser connections.

Service account

A workspace service account acts as a member with no access to any resources until it is added to a group or has resources shared with it. API keys created under a service account inherit that scoped access, which is enforced at the backend rather than only in the dashboard.

Token: API key tied to a service account
Best for: Granular, group-scoped access in a shared workspace.
Capability map

What an AI agent can do in ElevenLabs.

The ElevenLabs API is split into areas an agent can act on, like turning text into speech, transcribing audio, managing the voices in an account, dubbing media into other languages, and running conversational agents. Some areas only generate audio, while others change account state, like adding or deleting a voice.

Text to Speech

4 endpoints

Methods for turning text into spoken audio, including streaming and character-level timestamps.

Each call spends credits against the account's monthly balance.

Speech to Text

2 endpoints

Methods for transcribing audio and video into text, synchronously or asynchronously.

A delete here permanently removes a stored transcript.

Voices

5 endpoints

Methods for listing, reading, editing, and deleting the voices in an account.

A write here changes or removes a voice the whole account shares.

Voice design & cloning

4 endpoints

Methods for designing a voice from a prompt and creating instant or professional voice clones.

A write here adds a new voice to the account and spends a voice slot.

Audio tools

3 endpoints

Methods for changing a voice, generating sound effects, and isolating speech from background noise.

Each call spends credits against the account's monthly balance.

Dubbing

2 endpoints

Methods for translating and voicing media into another language and checking the job's status.

A create here starts a billed job and produces new media.

History

3 endpoints

Methods for listing, reading, and deleting previously generated audio.

A delete here permanently removes a generated item.

Models

1 endpoint

Methods for listing the available models and their capabilities.

Read-only; lists models without changing anything.

User & subscription

2 endpoints

Methods for reading the account profile and its subscription, usage, and limits.

Read-only, but the response includes the account's API key and usage.

Conversational AI agents

3 endpoints

Methods for creating, listing, and reading conversational agents (Agents Platform).

A create here provisions a live agent that can place and take calls.
Endpoint reference

Every ElevenLabs API method.


Text to Speech

Methods for turning text into spoken audio, including streaming and character-level timestamps. (4 endpoints)

Generates audio and spends credits; the key needs Text to Speech access if the key is restricted by product.

Acts on: audio
Permission (capability): None required
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Same generation as convert, delivered as a stream for lower latency.

Acts on: audio
Permission (capability): None required
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Returns base64 audio plus alignment data for syncing text to audio.

Acts on: audio
Permission (capability): None required
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Combines streaming delivery with alignment data.

Acts on: audio
Permission (capability): None required
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Speech to Text

Methods for transcribing audio and video into text, synchronously or asynchronously. (2 endpoints)

Setting webhook to true processes the job asynchronously and POSTs the result when done. Supports diarization and per-channel transcripts.

Acts on: transcript
Permission (capability): None required
Version: Available since the API’s base version
Webhook event: speech_to_text_transcription
Rate limit: Standard limits apply

Read-only.

Acts on: transcript
Permission (capability): None required
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Voices

Methods for listing, reading, editing, and deleting the voices in an account. (5 endpoints)

Read-only.

Acts on: voice
Permission (capability): None required
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Read-only.

Acts on: voice
Permission (capability): None required
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Changes a voice the whole account uses.

Acts on: voice
Permission (capability): None required
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Irreversible; removes the voice for the whole account.

Acts on: voice
Permission (capability): None required
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Read-only.

Acts on: voice
Permission (capability): None required
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Voice design & cloning

Methods for designing a voice from a prompt and creating instant or professional voice clones. (4 endpoints)

Returns previews with a generated voice ID; it does not yet add a voice to the account.

Acts on: voice_preview
Permission (capability): None required
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Adds a new voice to the account using a generated_voice_id from a design.

Acts on: voice
Permission (capability): None required
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Adds a cloned voice to the account; clones a real person's voice from samples.

Acts on: voice
Permission (capability): None required
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Adds a higher-fidelity cloned voice; may be gated by plan and verification.

Acts on: voice
Permission (capability): None required
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Audio tools

Methods for changing a voice, generating sound effects, and isolating speech from background noise. (3 endpoints)

Generates audio from an uploaded recording and spends credits.

Acts on: audio
Permission (capability): None required
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Generates audio and spends credits; controls include duration and prompt influence.

Acts on: audio
Permission (capability): None required
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Generates cleaned audio and spends credits.

Acts on: audio
Permission (capability): None required
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Dubbing

Methods for translating and voicing media into another language and checking the job's status. (2 endpoints)

Starts a billed dubbing job; runs asynchronously and is polled for status.

Acts on: dubbing
Permission (capability): None required
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Read-only; returns status, languages, and any error.

Acts on: dubbing
Permission (capability): None required
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

History

Methods for listing, reading, and deleting previously generated audio. (3 endpoints)

Read-only.

Acts on: history_item
Permission (capability): None required
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Read-only.

Acts on: history_item
Permission (capability): None required
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Irreversible.

Acts on: history_item
Permission (capability): None required
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Models

Methods for listing the available models and their capabilities. (1 endpoint)

Read-only.

Acts on: model
Permission (capability): None required
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

User & subscription

Methods for reading the account profile and its subscription, usage, and limits. (2 endpoints)

Read-only; the response includes the account's API key, so it exposes that credential.

Acts on: user
Permission (capability): None required
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Read-only; reports remaining credits and the reset date.

Acts on: subscription
Permission (capability): None required
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Conversational AI agents

Methods for creating, listing, and reading conversational agents (Agents Platform). (3 endpoints)

Provisions a live agent with its prompt, model, tools, and voice settings.

Acts on: agent
Permission (capability): None required
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Read-only.

Acts on: agent
Permission (capability): None required
Version: Available since the API’s base version
Webhook event: None
Rate limit: Standard limits apply

Read-only; an active agent call can later POST a post-call transcription webhook.

Acts on: agent
Permission (capability): None required
Version: Available since the API’s base version
Webhook event: post_call_transcription
Rate limit: Standard limits apply
Webhooks

Webhook events.

ElevenLabs can notify an app when a long-running job finishes, like an asynchronous transcription completing or a conversational agent call ending. It POSTs a signed payload describing what happened, so an integration learns about the result without polling.

Event: speech_to_text_transcription
What it signals: An asynchronous transcription job finished. The payload carries the completed transcript, so an app learns the result without polling. Sent only when a speech-to-text request set webhook to true.
Triggered by: /v1/speech-to-text

Event: post_call_transcription
What it signals: A conversational agent call ended. The payload contains the full conversation data, including the transcript, analysis results, and metadata. Retries are supported for this event.
Triggered by: /v1/convai/agents/{agent_id}

Event: post_call_audio
What it signals: A conversational agent call ended. The payload carries minimal data with base64-encoded audio of the full conversation.
Triggered by: /v1/convai/agents/{agent_id}
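
A minimal sketch of verifying a signed delivery with HMAC-SHA256. This guide states only that an HMAC signature is checked against the webhook's secret. The header layout used here ("t=<timestamp>,v0=<hex digest>" over "<timestamp>.<raw body>"), the 30-minute tolerance, and the function name are assumptions for illustration, so confirm the exact format in the official webhook documentation.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_signature(payload: bytes, header: str, secret: str,
                     tolerance_s: int = 1800, now: Optional[float] = None) -> bool:
    """Check an HMAC-SHA256 webhook signature.

    Assumed header layout (hypothetical): "t=<unix timestamp>,v0=<hex digest>",
    where the digest signs "<timestamp>.<raw body>" with the webhook secret.
    """
    parts = dict(item.strip().split("=", 1) for item in header.split(",") if "=" in item)
    timestamp, received = parts.get("t"), parts.get("v0")
    if not timestamp or not received or not timestamp.isdigit():
        return False
    current = time.time() if now is None else now
    if abs(current - int(timestamp)) > tolerance_s:
        return False  # reject stale or replayed deliveries
    signed = timestamp.encode() + b"." + payload
    expected = hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), received.encode())
```

The check runs on the raw request body before any JSON parsing, because re-serializing the payload can change its bytes and break the digest.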
Rate limits & pagination

Rate limits, pagination & request size.

ElevenLabs limits how many requests run at once by a concurrency ceiling set by the subscription plan, and meters generation against a monthly credit balance measured in characters for speech.

Request rate

ElevenLabs caps how many requests run at the same time with a concurrency limit set by the subscription plan, rather than a fixed requests-per-second rate. The documented concurrency ceilings are 2 on Free, 3 on Starter, 5 on Creator, 10 on Pro, and 15 on Scale and Business, with custom limits on Enterprise. Going over returns HTTP 429: too_many_concurrent_requests means the plan ceiling was hit, while system_busy means temporary platform congestion. Separately, generation is metered against a monthly credit balance, not a request count.
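
One way to stay under the plan ceiling is to cap in-flight work on the client side. The sketch below uses the ceilings listed above. The run_bounded helper and the lowercase plan keys are hypothetical names, and the jobs are plain callables standing in for real API calls.

```python
from concurrent.futures import ThreadPoolExecutor

# Concurrency ceilings as listed in this guide; Enterprise limits are custom.
PLAN_CONCURRENCY = {
    "free": 2, "starter": 3, "creator": 5,
    "pro": 10, "scale": 15, "business": 15,
}


def run_bounded(jobs, plan: str):
    """Run zero-argument callables with at most the plan's ceiling in flight.

    Results come back in the same order as the jobs were given.
    """
    limit = PLAN_CONCURRENCY[plan]
    with ThreadPoolExecutor(max_workers=limit) as pool:
        return list(pool.map(lambda job: job(), jobs))
```

Sizing the worker pool to the ceiling queues the excess locally, so a burst of work waits its turn instead of returning 429 too_many_concurrent_requests.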

Pagination

List endpoints that can return many items, like voices and history, paginate with a page_size parameter and a cursor. When more results remain, the response returns the cursor for the next page, such as last_history_item_id on history. Smaller list endpoints return the full set in one response.
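
A sketch of walking a cursor-paged list. The HTTP call is left to a caller-supplied fetch_page function so the loop stays independent of any client library. The response keys history and has_more are assumptions for illustration; last_history_item_id is the cursor named above.

```python
from typing import Callable, Iterator, Optional


def iter_history(fetch_page: Callable[[Optional[str], int], dict],
                 page_size: int = 100) -> Iterator[dict]:
    """Yield every history item by following the cursor until no pages remain.

    fetch_page(cursor, page_size) performs the HTTP call. Assumed response
    shape (hypothetical): a "history" list, a "has_more" flag, and the
    "last_history_item_id" cursor.
    """
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor, page_size)
        yield from page.get("history", [])
        if not page.get("has_more"):
            return
        cursor = page["last_history_item_id"]
```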

Request size

Speech generation is billed by character, and the per-request text length depends on the model. For example, Multilingual v2 handles up to roughly 10,000 characters of long-form input. Speech-to-text accepts uploaded audio or video files, and the monthly credit balance, not a single request, is the binding limit. Audio output is MP3 by default, with PCM, Opus, and µ-law available, and some higher-quality formats require a paid tier.
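
Long input can be split client-side so each request stays under the model's character limit. The sketch below prefers sentence boundaries and falls back to a hard cut for an oversized sentence. The 10,000 default mirrors the rough Multilingual v2 figure above and is a placeholder, and split_text is a hypothetical helper name.

```python
import re


def split_text(text: str, max_chars: int = 10_000) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends.

    Joining the chunks reproduces the input exactly.
    """
    # Each match is a sentence with its trailing whitespace, or a final
    # fragment that has no closing punctuation.
    sentences = re.findall(r"[^.!?]*[.!?]+\s*|[^.!?]+$", text)
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any single sentence longer than the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if len(current) + len(sentence) > max_chars:
            chunks.append(current)
            current = ""
        current += sentence
    if current:
        chunks.append(current)
    return chunks
```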

Errors

Status codes & error handling.

The status codes an agent should handle, and what to do about each.

Status: 400
Code: validation_error / invalid_request
Meaning: The request was bad: a parameter is missing or invalid. The detail.param field names the offending field.
What to do: Read detail.message and detail.param, fix the request, and resend. It is not retryable as-is.

Status: 401
Code: authentication_error
Meaning: No valid API key was provided, for example a missing or invalid xi-api-key header.
What to do: Confirm a valid API key is being sent, and rotate the key if it may be compromised.

Status: 402
Code: payment_required
Meaning: The account has insufficient credits to complete the request.
What to do: Top up credits or wait for the monthly reset, then retry.

Status: 403
Code: authorization_error
Meaning: The key or member lacks permission for this request, for example a key restricted away from this product area, or a feature the plan does not include.
What to do: Grant the product-area access on the key, or use a key with the needed permission.

Status: 404
Code: not_found
Meaning: The requested object does not exist or is not visible to this key, for example a voice or transcript ID that is wrong.
What to do: Verify the ID and confirm it belongs to this account.

Status: 422
Code: validation_error
Meaning: The request body failed schema validation. detail.param identifies the field.
What to do: Correct the field named in detail.param and resend.

Status: 429
Code: rate_limit_error
Meaning: Too many requests at once or a concurrency limit was hit. The body is either too_many_concurrent_requests, meaning the plan's concurrency ceiling, or system_busy, meaning temporary platform congestion.
What to do: For too_many_concurrent_requests, queue requests or upgrade the plan. For system_busy, retry with exponential backoff.

Status: 500
Code: internal_error
Meaning: An error on the ElevenLabs side, which can also surface as 503 service_unavailable. It is rare.
What to do: Retry with backoff, and contact support if it persists.
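
The guidance in the table above can be sketched as a small decision function plus a backoff schedule. The action labels, the function names, and the backoff parameters (0.5 s base, 30 s cap, full jitter) are illustrative choices, not values documented by ElevenLabs.

```python
import random

# Actions keyed by HTTP status, following the table above.
ACTIONS = {
    400: "fix_request",
    401: "check_api_key",
    402: "top_up_credits",
    403: "grant_access",
    404: "verify_id",
    422: "fix_request",
    500: "retry_with_backoff",
    503: "retry_with_backoff",
}


def next_action(status: int, code: str = "") -> str:
    """Map a response status (and the 429 body code) to a handling strategy."""
    if status == 429:
        # The plan ceiling calls for queueing; platform congestion for backoff.
        if code == "too_many_concurrent_requests":
            return "queue_request"
        return "retry_with_backoff"
    return ACTIONS.get(status, "raise")


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter; attempt starts at 0."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```

Only the backoff branches are worth retrying automatically. The others need a change to the request, the key, or the account before a resend can succeed.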
Versioning & freshness

Version history.

ElevenLabs serves a single namespace and signals model and feature changes through dated release notes rather than a new namespace, and it deprecates older speech and transcription models on announced dates.

Version history

What changed, and when

Latest version: v1
v1: Current version
Current API namespace (v1)

ElevenLabs serves a single v1 namespace and ships capability changes as new model IDs and dated release notes rather than minting a new namespace. The entries below are notable dated changes from the changelog.

What changed
  • Music v2 made available via the API, with chunk-based composition plans.
  • Flash text-to-speech (eleven_flash_v2, eleven_flash_v2_5) generating speech in roughly 75ms.
  • Conversation tags added for organizing and filtering conversations.
2026-06-08: Requires migration
June 2026 release notes

A dated release with model deprecations and conversational agent additions.

What changed
  • eleven_monolingual_v1 and eleven_multilingual_v1 speech models scheduled for removal on July 9, 2026.
  • scribe_v1 transcription model deprecated; default ASR provider shifts toward the realtime scribe.
  • Agents: turn-detection model selection (turn_v2 / turn_v3) and soft-timeout handling.
  • Text to Dialogue: enable_logging parameter for zero-retention mode on enterprise.
2026-04-27: Feature update
April 2026 release notes

A dated release adding audio isolation history endpoints and agent security scoping.

What changed
  • New audio isolation history endpoints (list and delete).
  • Agents: trust context levels (unknown, low, high) for security scoping.
  • Agents: MCP response timeout configuration and a pre-tool speech mode.
  • Dynamic variables refactored to a unified, typed schema.
2026-04-01: Feature update
April 2026 release notes

A dated release in the published changelog timeline.

What changed
  • Continued model and Agents Platform updates documented in the changelog.

Pin model IDs explicitly and move off deprecated models before their removal date.
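
A small sketch of checking a pinned model ID against the deprecations in the June 2026 release notes above. The removal dates and model IDs come from that entry. The table and function names are hypothetical, and the data would need updating by hand as new release notes are published.

```python
from datetime import date
from typing import Optional

# Removal dates taken from the June 2026 release notes above.
REMOVAL_DATES = {
    "eleven_monolingual_v1": date(2026, 7, 9),
    "eleven_multilingual_v1": date(2026, 7, 9),
}
# Deprecated, with no removal date stated in this guide.
DEPRECATED_NO_DATE = {"scribe_v1"}


def is_deprecated(model_id: str) -> bool:
    """Return True if the model appears in either deprecation list."""
    return model_id in REMOVAL_DATES or model_id in DEPRECATED_NO_DATE


def days_until_removal(model_id: str, today: date) -> Optional[int]:
    """Return days left before a pinned model is removed, or None if no date is known."""
    removal = REMOVAL_DATES.get(model_id)
    return (removal - today).days if removal else None
```

Running a check like this in CI flags a pinned model well before its removal date, which leaves time to test the replacement.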

ElevenLabs changelog ↗
Questions

ElevenLabs API, answered.

How does an app authenticate with the ElevenLabs API?
Every request carries an API key in the xi-api-key header. A key is created in the dashboard, can be named and restricted to specific product areas, and can be given a monthly credit cap. For browser or device use, a server mints a short-lived single-use token instead, so the long-lived key stays on the backend.
Can an API key be limited to only some features?
Yes. A key can be restricted to specific product areas so it only reaches certain endpoints, and it can carry a credit cap. A request the key is not scoped for returns HTTP 403 authorization_error. ElevenLabs does not publish a fixed list of per-call scope strings, so the boundary is set on the key, not requested per call.
How are rate limits and concurrency handled?
ElevenLabs limits how many requests run at once rather than a fixed rate per second. The concurrency ceiling depends on the plan, from 2 on Free up to 15 on Scale and Business. A 429 with too_many_concurrent_requests means the ceiling was hit, so queue requests or upgrade; a 429 with system_busy is temporary congestion, so retry with backoff.
How is usage billed?
Generation spends a monthly credit balance, measured in characters for speech. The user subscription endpoint reports the remaining balance and the reset date, and running out returns HTTP 402 payment_required. Concurrency and credits are separate limits, so an account can have credits left yet still hit the concurrency ceiling.
How does an app receive results without polling?
Long jobs can push a signed payload to a registered HTTPS endpoint when they finish. An asynchronous speech-to-text request with webhook set to true sends the transcript on completion, and a conversational agent call sends a post-call transcription or audio webhook when it ends. The receiver verifies an HMAC signature against the webhook secret to confirm the request came from ElevenLabs.
Does ElevenLabs have an official MCP server?
Yes. ElevenLabs publishes a first-party Model Context Protocol server, the elevenlabs/elevenlabs-mcp repository, that exposes its audio tools to MCP clients like Claude and Cursor. It runs locally, forwards calls to the ElevenLabs API, and authenticates with an ElevenLabs API key, so it inherits whatever that key is scoped to.
How does ElevenLabs version its API?
The API serves a single v1 namespace and signals change through new model IDs and dated release notes rather than a new path. Older speech and transcription models are deprecated on announced dates. For example, eleven_monolingual_v1 and eleven_multilingual_v1 are scheduled for removal on July 9, 2026, and scribe_v1 is deprecated, so an integration pins a model ID and migrates before the removal date.

What is Bollard AI?

Control what every AI agent can do in ElevenLabs.

Bollard AI sits between a team's AI agents and ElevenLabs. Grant each agent exactly the access it needs, read or write, area by area, and every call is checked and logged.

  • Set read, write, or full access per agent, never a shared ElevenLabs key.
  • Denied by default, so an agent reaches only what has been explicitly allowed.
  • Every call recorded in plain English: who, what, where, and the decision.