Resources/The Agent API Atlas/AI/Google Gemini

Everything an AI agent can do with the Google Gemini API.

A reference guide for building AI agents: every method, how to authenticate, and the permissions each one needs.

Endpoints26

API versionv1beta

Last updated23 June 2026

Orientation

How the Google Gemini API works.

The Google Gemini API is how an app or AI agent works with Google's Gemini models: generating text and images from a prompt, turning text into embeddings for search, uploading files for a model to read, caching context to reuse, and fine-tuning a model. Access is granted through an account-wide API key from Google AI Studio, which has no per-endpoint permissions, so any call the key can reach, it can make. The model is the versioned, dated thing rather than the API path, and the API can notify an endpoint when asynchronous batch work finishes.

26Endpoints

7Capability groups

11Read

15Write

0Permissions

Authentication

Gemini authenticates calls with an API key created in Google AI Studio, sent in the x-goog-api-key header or as a query parameter. The key is account-wide and is the credential behind nearly every call. Some operations, like calling a tuned model, can instead use an OAuth 2.0 access token tied to a Google account. This is the Gemini Developer API on generativelanguage.googleapis.com, which is separate from Gemini on Google Cloud Vertex AI, where the same models are reached through Google Cloud authentication and project-level controls.

Permissions

The API key has no granular per-endpoint permissions. It is not scoped to specific methods or resources, so any call the key can reach, it can make, from generating content to uploading files, caching context, and creating or deleting tuned models. There is no built-in way on the key itself to allow reads but block writes, or to limit it to one area. This is the gap Bollard fills, by deciding per agent which methods a key is allowed to call. Vertex AI, by contrast, layers Google Cloud Identity and Access Management roles on top, which the Developer API key does not have.

Versioning

The API exposes a stable surface under v1 and a broader preview surface under v1beta that carries newer features first, and the SDKs default to v1beta. There is no dated version string on the API path itself. Instead the model is the versioned, dated thing: each model has a name and version, models are promoted from preview to general availability and later retired, and release notes track those model changes by date.

Data model

Gemini is resource-oriented JSON over HTTPS at generativelanguage.googleapis.com. Generation calls address a model by name, like /v1beta/models/{model}:generateContent, while files, cached content, batches, and tuned models are stored resources at /v1beta/files, /v1beta/cachedContents, /v1beta/batches, and /v1beta/tunedModels. The Live API runs over a separate WebSocket connection for real-time audio and video, and there is no general webhook for ordinary generation, only completion notifications for asynchronous batch and long-running work.

Connect & authenticate

Connection & authentication methods.

How an app or AI agent connects to Gemini determines what it can reach. There is a route for making calls, a real-time route for live audio and video, and a hosted documentation server, and each is governed by the API key behind it.

Ways to connect

REST API

The REST API answers at https://generativelanguage.googleapis.com, with the stable surface under /v1 and the broader preview surface under /v1beta. A call authenticates with an API key from Google AI Studio, sent in the x-goog-api-key header or as a query parameter.

Best forConnecting an app or AI agent to Gemini.

Governed byThe API key behind the call.

Docs ↗

Live API (WebSocket)

The Live API uses a bidirectional WebSocket connection for real-time, low-latency interaction, streaming audio and video in and audio and text out. It authenticates with the same API key and suits voice and live multimodal agents rather than single request-and-response calls.

Best forReal-time voice and live multimodal agents.

Governed byThe API key behind the connection.

Docs ↗

MCP server (documentation)

Google hosts a public Model Context Protocol server at https://gemini-api-docs-mcp.dev that exposes a search_documentation function, so an agent can pull current Gemini API definitions and patterns into its context. It serves documentation lookup, not calls against the generative API itself, which are made over REST or the Live API.

Best forGiving a coding agent current Gemini API documentation.

Governed byPublic documentation access; it makes no account calls.

Docs ↗

Authentication

API key

A Gemini API key from Google AI Studio authenticates every call, sent in the x-goog-api-key header or as a query parameter. The key is account-wide and carries no per-endpoint scopes, so any call it can reach, it can make. A key must never be exposed in client code.

TokenAPI key (x-goog-api-key)

Best forServer-side access to the Gemini Developer API

Docs ↗

OAuth 2.0

Some operations, like calling a tuned model, can use OAuth 2.0 access tokens tied to a Google account rather than a plain API key. OAuth suits flows that act on behalf of a user, while most generation work uses an API key.

TokenOAuth 2.0 access token

Best forActing on behalf of a user, such as tuned-model access

Docs ↗

Capability map

What an AI agent can do with Google Gemini.

The Gemini API is split into areas an agent can act on, like generating content, creating embeddings, counting tokens, uploading files, caching context, running batches, and tuning models. Each area has its own methods, and some create lasting resources or spend against the account's quota.

Models & content

4 endpoints

Generate content from a model, stream it back as it is produced, and list or read the available models.

Each generation call spends against the account's token quota.

View endpoints →

Embeddings

2 endpoints

Turn text into embedding vectors, one input at a time or in a batch, for search and similarity work.

Each embedding call spends against the account's token quota.

View endpoints →

Token counting

1 endpoint

Run a model's tokenizer over input to count tokens before sending a generation call.

A read-only utility that does not generate content.

View endpoints →

Files

4 endpoints

Upload files for a model to read, then list, read, or delete them. Files are held for 48 hours.

Uploaded files are readable by any call using the same key until they expire or are deleted.

View endpoints →

Context caching

5 endpoints

Save precomputed input tokens as cached content and reuse them across calls, then list, read, update, or delete the cache.

Cached content persists until its time-to-live expires or it is deleted.

View endpoints →

Batch

5 endpoints

Submit many generation requests as one asynchronous job at half the standard cost, then read, list, or cancel it.

A batch job runs asynchronously and spends against quota when it completes.

View endpoints →

Tuned models

5 endpoints

Create a fine-tuned model from training data, then list, read, generate from, or delete it.

A tuned model is a lasting resource created on the account.

View endpoints →

Endpoint reference

Every Google Gemini API method.

Filter by method, access, or permission, or search any path. Select a row for version detail, rate limits, the related webhook event, and the source.

Hide deprecated

Method	Endpoint	What it does	Access	Permission	Version
Models & content Generate content from a model, stream it back as it is produced, and list or read the available models.4
POST	`/v1beta/{model=models/*}:generateContent`	Generate a single model response from an input request, which can include text, images, audio, code, and tool calls.	write	—	Current
The API key carries no per-method scope. Marked as a write because the call consumes token quota and can run tools, though it creates no stored resource. Acts onmodel Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
POST	`/v1beta/{model=models/*}:streamGenerateContent`	Generate a model response that streams back in chunks as it is produced, rather than waiting for the full output.	write	—	Current
Takes the same request as generateContent and returns a stream of partial responses. The API key carries no per-method scope. Acts onmodel Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
GET	`/v1beta/models`	List the models available to the key, with their token limits and metadata.	read	—	Current
Read-only. The API key carries no per-method scope. Acts onmodel Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
GET	`/v1beta/{name=models/*}`	Read the details of one model, including its version and token limits.	read	—	Current
Read-only. The API key carries no per-method scope. Acts onmodel Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
Embeddings Turn text into embedding vectors, one input at a time or in a batch, for search and similarity work.2
POST	`/v1beta/{model=models/*}:embedContent`	Generate one text embedding vector from input content using an embedding model.	write	—	Current
Marked as a write because it consumes token quota; it creates no stored resource. The API key carries no per-method scope. Acts onembedding Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
POST	`/v1beta/{model=models/*}:batchEmbedContents`	Generate many embedding vectors in one call from a batch of input content.	write	—	Current
Each input in the batch is an EmbedContentRequest. The API key carries no per-method scope. Acts onembedding Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
Token counting Run a model's tokenizer over input to count tokens before sending a generation call.1
POST	`/v1beta/{model=models/*}:countTokens`	Run a model's tokenizer over input content and return the token count, without generating anything.	read	—	Current
A read-only utility used to size a prompt before a generation call. The API key carries no per-method scope. Acts onmodel Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
Files Upload files for a model to read, then list, read, or delete them. Files are held for 48 hours.4
POST	`/upload/v1beta/files`	Upload a file for a model to read later, such as an image, audio clip, video, or document.	write	—	Current
Uses a separate upload host. Stored files are held for 48 hours, up to 2 GB per file and 20 GB per project. The API key carries no per-method scope. Acts onfile Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
GET	`/v1beta/files`	List the files uploaded with the key.	read	—	Current
Read-only. The API key carries no per-method scope. Acts onfile Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
GET	`/v1beta/{name=files/*}`	Read the metadata of one uploaded file.	read	—	Current
Read-only. The API key carries no per-method scope. Acts onfile Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
DELETE	`/v1beta/{name=files/*}`	Delete an uploaded file before its 48-hour expiry.	write	—	Current
Irreversible. The API key carries no per-method scope. Acts onfile Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
Context caching Save precomputed input tokens as cached content and reuse them across calls, then list, read, update, or delete the cache.5
POST	`/v1beta/cachedContents`	Create cached content, saving precomputed input tokens to reuse across later calls.	write	—	Current
A time-to-live is set with the ttl field or an expireTime. The API key carries no per-method scope. Acts oncachedContent Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
GET	`/v1beta/cachedContents`	List the cached content created with the key.	read	—	Current
Read-only. The API key carries no per-method scope. Acts oncachedContent Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
GET	`/v1beta/{name=cachedContents/*}`	Read one cached content resource.	read	—	Current
Read-only. The API key carries no per-method scope. Acts oncachedContent Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
PATCH	`/v1beta/{cachedContent.name=cachedContents/*}`	Update a cached content resource, such as extending its time-to-live.	write	—	Current
The API key carries no per-method scope. Acts oncachedContent Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
DELETE	`/v1beta/{name=cachedContents/*}`	Delete a cached content resource.	write	—	Current
Irreversible. The API key carries no per-method scope. Acts oncachedContent Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
Batch Submit many generation requests as one asynchronous job at half the standard cost, then read, list, or cancel it.5
POST	`/v1beta/{batch.model=models/*}:batchGenerateContent`	Submit many generation requests as one asynchronous batch job, at half the standard cost.	write	—	Current
Targets a 24-hour turnaround and can notify a registered endpoint on completion. The API key carries no per-method scope. Acts onbatch Permission (capability)None required VersionAvailable since the API’s base version Webhook event`batch-completed` Rate limitStandard limits apply SourceOfficial documentation ↗
GET	`/v1beta/{name=batches/*}`	Read the status and results of one batch job.	read	—	Current
Read-only. The API key carries no per-method scope. Acts onbatch Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
GET	`/v1beta/batches`	List the batch jobs created with the key.	read	—	Current
Read-only. The API key carries no per-method scope. Acts onbatch Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
POST	`/v1beta/{name=batches/*}:cancel`	Cancel a batch job that is still running.	write	—	Current
The API key carries no per-method scope. Acts onbatch Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
DELETE	`/v1beta/{name=batches/*}`	Delete a batch job record.	write	—	Current
The API key carries no per-method scope. Acts onbatch Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
Tuned models Create a fine-tuned model from training data, then list, read, generate from, or delete it.5
POST	`/v1beta/tunedModels`	Create a fine-tuned model from training data.	write	—	Current
Starts a long-running tuning job that produces a lasting tuned model. The API key carries no per-method scope. Acts ontunedModel Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
GET	`/v1beta/tunedModels`	List the tuned models created with the key.	read	—	Current
Read-only. The API key carries no per-method scope. Acts ontunedModel Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
GET	`/v1beta/{name=tunedModels/*}`	Read the details of one tuned model, including its tuning state.	read	—	Current
Read-only. The API key carries no per-method scope. Acts ontunedModel Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
POST	`/v1beta/{model=tunedModels/*}:generateContent`	Generate a response from a tuned model.	write	—	Current
Calling a tuned model can need proper authentication beyond a plain API key. The key carries no per-method scope. Acts ontunedModel Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗
DELETE	`/v1beta/{name=tunedModels/*}`	Delete a tuned model.	write	—	Current
Irreversible. The API key carries no per-method scope. Acts ontunedModel Permission (capability)None required VersionAvailable since the API’s base version Webhook eventNone Rate limitStandard limits apply SourceOfficial documentation ↗

No endpoints match those filters.

Webhooks

Webhook events.

Gemini has no general webhook system for content generation, so an app reads results back from the call it made. The exception is asynchronous work, like a batch job or a long-running operation, which can notify a registered endpoint when it finishes.

Event	What it signals	Triggered by
`Batch / long-running operation completion`	Fires when an asynchronous batch job or long-running operation finishes, so an integration learns the result is ready without polling. This event-driven notification was introduced for the Batch API and long-running operations in May 2026.	`/v1beta/{batch.model=models/*}:batchGenerateContent`

No events match that search.

Rate limits & pagination

Rate limits, pagination & request size.

Gemini limits how fast and how much an app or AI agent can call, through per-model ceilings on requests per minute, tokens per minute, and requests per day, with the ceilings rising as the billing account moves up a usage tier.

Request rate

Gemini sets per-model limits across three dimensions at once: requests per minute, tokens per minute, and requests per day. Usage is checked against each, and exceeding any one returns HTTP 429 with the status RESOURCE_EXHAUSTED. The ceilings rise as the billing account moves up a usage tier, decided by cumulative spend: a Free tier, then Tier 1 once billing is set up, Tier 2 after 100 US dollars of spend and 30 days, and Tier 3 after 1,000 US dollars and 30 days. Higher tiers and newer or larger models carry higher ceilings, and a separate, much larger token allowance applies to enqueued batch work, for example several million enqueued tokens for a flash model at Tier 1 rising into the billions at Tier 3.

Pagination

List endpoints page through results with a pageSize parameter that caps the page and a pageToken parameter that requests the next page. A response returns a nextPageToken when more results remain, and an empty nextPageToken means the last page has been reached.

Request size

Limits are set by the model and the resource rather than one global cap. The Files API holds up to 20 GB per project, with a per-file maximum of 2 GB and files retained for 48 hours; uploaded PDFs are capped at 50 MB. Each model has its own input and output token limits, readable on the model resource, and context caching has its own minimum token count to be eligible.

Errors

Status codes & error handling.

The status codes an agent should handle, and what to do about each.

Status	Code	Meaning	What to do
400	`INVALID_ARGUMENT`	The request body is malformed, with a typo, a missing required field, or an invalid value.	Check the request against the API reference, correct the named field, and resend.
400	`FAILED_PRECONDITION`	The free tier is not available in the caller's region, or billing is not enabled for a request that needs it.	Enable a paid plan or billing on the project in Google AI Studio.
403	`PERMISSION_DENIED`	The API key lacks permission for the request, often a wrong key or missing authentication for a tuned model.	Confirm the right key is sent and use the correct authentication for the resource.
404	`NOT_FOUND`	The requested resource was not found, such as a model, file, or tuned model that does not exist for this key or version.	Check the resource name and the API version in the path, then retry.
429	`RESOURCE_EXHAUSTED`	A rate limit was exceeded for the model, on requests per minute, tokens per minute, or requests per day.	Back off and retry, smooth the request rate, or request a quota increase or a higher usage tier.
500	`INTERNAL`	An unexpected error on Google's side, sometimes triggered by an unusually long input context.	Retry with backoff, and reduce the input context if the error persists.
503	`UNAVAILABLE`	The service is temporarily overloaded or down.	Retry with backoff, or switch to another model for the moment.
504	`DEADLINE_EXCEEDED`	The request could not finish within the deadline, often because the prompt or context is too large.	Raise the client timeout, or reduce the prompt and context size.

Versioning & freshness

Version history.

Gemini exposes a stable v1 surface and a broader v1beta surface that carries preview features first, and the model itself is the thing that is versioned and dated, not the API path.

Version history

What changed, and when

Latest versionv1beta

v1betaCurrent version

Preview surface (SDK default)

The API exposes a stable v1 surface and a broader v1beta preview surface that carries newer features first, and the SDKs default to v1beta. The API path itself has no dated version string; the model is the versioned, dated thing, promoted from preview to general availability and later retired. The dated entries below are notable model and platform changes from the Gemini API release notes.

What changed

v1beta exposes preview features ahead of the stable v1 surface
Both v1 and v1beta support the Interactions API as of June 2026

2026-06-17Feature update

Streaming speech generation

Streaming support was added for the text-to-speech preview model.

What changed

Streaming output for the text-to-speech preview model

2026-05-28Feature update

Native visual models reach general availability

The native visual models known as Nano Banana 2 and Pro were released as generally available versions.

What changed

Native visual models released as GA

2026-05-19Feature update

Gemini 3.5 Flash GA and managed agents preview

Gemini 3.5 Flash was released as generally available, and managed agents launched in public preview.

What changed

Gemini 3.5 Flash reached general availability
Managed agents launched in public preview

2026-05-04Feature update

Event-driven webhooks for asynchronous work

Event-driven webhook support was introduced for the Batch API and long-running operations, so an integration can be notified on completion rather than polling.

What changed

Webhook completion notifications added for the Batch API and long-running operations

2026-04-22Feature update

Embedding model 2 reaches general availability

The second-generation embedding model was released as generally available.

What changed

Embedding model 2 reached general availability

2026-04-01Feature update

Flex and Priority inference tiers

New Flex and Priority inference tiers were introduced to trade off cost against latency.

What changed

Flex and Priority inference tiers introduced

An integration can target the stable surface or opt into the preview surface for newer features.

Gemini API release notes ↗

Questions

Google Gemini API, answered.

How do I authenticate, and where does the API key go?+

A call authenticates with an API key created in Google AI Studio. The key is sent in the x-goog-api-key request header, or as a key query parameter on the URL. It is account-wide, so it must stay on a server and never appear in client code, where anyone could read and reuse it. A few operations, like calling a tuned model, can use an OAuth 2.0 access token instead.

Can an API key be limited to read-only, or to one part of the API?+

Not on the key itself. A Gemini API key carries no per-endpoint scopes, so it cannot be set to allow reads but block writes, or to reach only one area like files or tuning. Any method the key can call, it can call. Limiting an agent to a subset of methods is exactly what a gateway like Bollard adds in front of the key. The Vertex AI route does support Google Cloud roles, but the Developer API key does not.

What is the difference between the Gemini Developer API and Vertex AI?+

Both serve the same Gemini models, but through different front doors. The Gemini Developer API on generativelanguage.googleapis.com uses a simple API key and is quick to start with. Vertex AI is part of Google Cloud, reached through Google Cloud authentication, with project-level Identity and Access Management roles, regional controls, and enterprise billing. Teams already on Google Cloud or needing those controls tend to use Vertex AI; this page covers the Developer API.

How do the rate limits work?+

Each model has limits on requests per minute, tokens per minute, and requests per day, and usage is checked against all three at once, so exceeding any one returns HTTP 429 with the status RESOURCE_EXHAUSTED. The ceilings rise with the account's usage tier, which is based on cumulative spend, from a Free tier up through Tier 1, Tier 2, and Tier 3. The fix for a 429 is to back off and retry, smooth the call rate, or request a quota increase.

Does Gemini send webhooks?+

Not for ordinary content generation, where the result comes back on the call that was made. The exception is asynchronous work: a batch job or a long-running operation can notify a registered endpoint when it completes, support that was added in May 2026, so an integration learns the result is ready without polling for it.

How long are uploaded files kept, and how big can they be?+

Files uploaded through the Files API are held for 48 hours, then deleted automatically, and can also be deleted sooner. A project can store up to 20 GB at once, with a maximum of 2 GB per file; uploaded PDFs are capped at 50 MB. While a file exists, any call using the same key can reference it.

What is context caching and how much can it save?+

Context caching saves precomputed input tokens so they can be reused across calls, for example asking several questions about the same large document or media file. Instead of resending and reprocessing that content each time, a call points at the cached content. Each cache carries a time-to-live, set with the ttl field or an expireTime, and expires or can be deleted when it is no longer needed.

What is Bollard AI?

Control what every AI agent can do with Gemini.

Bollard AI sits between a team's AI agents and Google Gemini. Grant each agent exactly the access it needs, read or write, area by area, and every call is checked and logged.

Set read, write, or full access per agent, never a shared Gemini key.
Denied by default, so an agent reaches only what has been explicitly allowed.
Every call recorded in plain English: who, what, where, and the decision.

Control Gemini access in Bollard Browse all APIs →

Google Gemini

Drafting Agent

Generate text and images ActionOffReadFull use

Read uploaded files ResourceOffReadFull use

Fine-tune models ActionOffReadFull use

Per-agent access, set in Bollard AI, not in Gemini

How the Google Gemini API works.

Connection & authentication methods.

REST API

Live API (WebSocket)

MCP server (documentation)

API key

OAuth 2.0

What an AI agent can do with Google Gemini.

Models & content

Embeddings

Token counting

Files

Context caching

Batch

Tuned models

Every Google Gemini API method.

Models & content

Embeddings

Token counting

Files

Context caching

Batch

Tuned models

Webhook events.

Rate limits, pagination & request size.

Request rate

Pagination

Request size

Status codes & error handling.

Version history.

What changed, and when

Google Gemini API, answered.

More ai API guides for agents

Hugging Face

ElevenLabs

Replicate

OpenAI

Anthropic

Cohere

Control what every AI agent can do with Gemini.