https://aiplatform.googleapis.com/v1/publishers/google/models/{model}:streamGenerateContent?key={API_KEY}
https://aiplatform.googleapis.com/v1/publishers/google/models/{model}:streamGenerateContent?key={API_KEY}
https://{location}-aiplatform.googleapis.com/v1/projects/{project}/locations/{location}/publishers/google/models/{model}:streamGenerateContent
You can try out several models in express mode, including the Gemini 2.0 Flash models. The following table lists the models that are available in express mode, along with their rate limits:
Model category | Available models | Requests per minute |
---|---|---|
Gemini | gemini-2.0-flash-001 | 30 |
gemini-2.0-flash-lite-001 | 30 | |
gemini-2.5-pro-preview-03-25 | 30 | |
gemini-2.5-flash-preview-04-17 | 30 |
For Gemini 2.0 models, the Multimodal Live API isn't available in the Console in express mode. To use the Multimodal Live API in express mode, use the Vertex AI API or the Google Gen AI SDK.
You can start sending requests from your application to Vertex AI APIs in three steps:
Use Vertex AI Studio in express mode to quickly try Vertex AI features.
For example, in the Google Cloud console in express mode, select Vertex AI > Freeform and use the Freeform page to create and optimize multimodal prompts using a variety of Gemini models.
Get the code for what you implemented with the UI.
On the Freeform page, click Google Colab to try the Python code.
Get code. A panel opens showing code that programmatically sends the same requests that you implemented in the UI. You can get the code for a programming language or curl. You can useUse your API key to authenticate with the Vertex AI API.
In the Google Cloud console in express mode, click "YOUR_API_KEY"
. For example:
The Google Gen AI SDK for Python is available on PyPI and GitHub:
To learn more, see the Python SDK reference (opens in a new tab).
Vertex AI in express mode provides a subset of the features for Generative AI on Vertex AI. Therefore, some of the Vertex AI documentation is not relevant if you signed up in express mode. For details on the available API endpoints in express mode, see the Vertex AI in express mode REST API reference.
In addition, customers in Google Cloud typically use organizations and projects to work with resources (for example, to call an API endpoint). When using Vertex AI in express mode, you don't need to worry about organizations or projects. However, you might see them mentioned in some of the Google Cloud documentation that you reference while you're using Vertex AI in express mode. You can still use the documentation, but ignore concepts and instructions that refer to organizations and projects. In addition, the location you selected when signing up in express mode is used throughout your experience.
When calling REST API endpoints in express mode, you'll use the endpoint format for express mode and specify your API key. For example:
Standard endpoint URL | https://{location}-aiplatform.googleapis.com/v1/projects/{project}/locations/{location}/publishers/google/models/{model}:streamGenerateContent |
Endpoint URL in express mode | https://aiplatform.googleapis.com/v1/publishers/google/models/{model}:streamGenerateContent?key={API_KEY} |
To authenticate with Vertex AI API endpoints that support express mode, use the API key that was created for you during sign-up or any key that you've created in express mode. An API key is an encrypted string that is auto-generated for you when you sign up in express mode. These API keys can be viewed and managed on the API Keys page.
To learn more about the best practices for managing API keys, see Best practices for managing API keys.
To view and manage your API keys, do the following:
Go to the Vertex AI Studio Overview page in express mode.
In Google Cloud console in express mode, click
Menu.Select API Keys.
The API Keys page opens and you can use it to manage your API keys.
Your free use of Vertex AI in express mode is restricted by quotas. These quotas restrict the rate at which you can use Vertex AI in express mode at no cost. A quota limits how much of a Google Cloud resource you can use.
To view your current usage and quotas, do the following:
Go to the Vertex AI Studio Overview page in express mode.
In Google Cloud console in express mode, click
Menu.Select Quotas.
You can increase your quotas and remove the 90 day limit by enabling billing.
After enabling billing, you pay only for what you use. You can also save your prompts and access additional settings in the console that are that grayed out when billing isn't enabled.
To view your current usage and quotas, do the following:
Go to the Vertex AI Studio Overview page in express mode.
In Google Cloud console in express mode, click
Menu.Select Billing.
You can start using all the capabilities and services available in Google Cloud in your project by graduating from express mode.
To graduate from express mode, do the following:
Go to the Vertex AI Studio Overview page in express mode.
In Google Cloud console in express mode, click
Menu.Select Billing.
In the Access all Google Cloud section, click Learn more and get started.
After you graduate from express mode, specify your project ID and location instead of your API key when you call the REST API endpoints. For example:https://{location}-aiplatform.googleapis.com/v1/projects/{projectid}/locations/{location}/publishers/google/models/{model}:streamGenerateContent
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-04-25 UTC.