Hugging Face’s Inference Providers give developers streamlined, unified access to hundreds of machine learning models, powered by our serverless inference partners. This new approach builds on our previous Serverless Inference API, offering more models, improved performance, and greater reliability thanks to world-class providers.
To learn more about the launch of Inference Providers, check out our announcement blog post.
Here is the complete list of partners integrated with Inference Providers, and the supported tasks for each of them:
| Provider | Chat completion (LLM) | Chat completion (VLM) | Feature Extraction | Text to Image | Text to Video |
| --- | --- | --- | --- | --- | --- |
| Cerebras | ✅ | | | | |
| Cohere | ✅ | ✅ | | | |
| Fal AI | | | | ✅ | ✅ |
| Fireworks | ✅ | ✅ | | | |
| HF Inference | ✅ | ✅ | ✅ | ✅ | |
| Hyperbolic | ✅ | ✅ | | | |
| Nebius | ✅ | ✅ | | ✅ | |
| Novita | ✅ | ✅ | | | ✅ |
| Replicate | | | | ✅ | ✅ |
| SambaNova | ✅ | | ✅ | | |
| Together | ✅ | ✅ | | ✅ | |
Inference Providers offers a fast and simple way to explore thousands of models for a variety of tasks. Whether you're experimenting with ML capabilities or building a new application, this API gives you instant access to high-performing models across multiple domains:
- Text Generation: Generate and experiment with high-quality responses from large language models, including tool-calling prompts.
- Image and Video Generation: Easily create customized images, including LoRAs for your own styles.
- Document Embeddings: Build search and retrieval systems with SOTA embeddings (a short sketch follows this list).
- Classical AI Tasks: Ready-to-use models for text classification, image classification, speech recognition, and more.
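To make the embeddings bullet concrete, here is a minimal sketch using the `huggingface_hub` Python client (introduced below). The model and provider shown are illustrative choices, not fixed defaults:

```python
from huggingface_hub import InferenceClient

# Minimal embeddings sketch. Assumes a valid token is available
# (via the HF_TOKEN environment variable or `huggingface-cli login`);
# the provider and model below are illustrative picks.
client = InferenceClient(provider="hf-inference")

embedding = client.feature_extraction(
    "Today is a sunny day and I will get some ice cream.",
    model="intfloat/multilingual-e5-large-instruct",
)
print(embedding.shape)  # output shape depends on the model's pooling
```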
- ⚡ Fast and Free to Get Started: Inference Providers comes with a free tier and additional included credits for PRO users, as well as Enterprise Hub organizations.
- 🎯 All-in-One API: A single API for text generation, image generation, document embeddings, NER, summarization, image classification, and more.
- 🔀 Multi-Provider Support: Easily run models from top-tier providers like fal, Replicate, SambaNova, Together AI, and others.
- 🚀 Scalable & Reliable: Built for high availability and low-latency performance in production environments.
- 🔧 Developer-Friendly: Simple requests, fast responses, and a consistent developer experience across Python and JavaScript clients.
- 💰 Cost-Effective: No extra markup on provider rates.
To get started quickly with Chat Completion models, use the Inference Playground to easily test and compare models with your prompts.
You can use Inference Providers with your preferred tools, such as Python, JavaScript, or cURL. To simplify integration, we offer both a Python SDK (`huggingface_hub`) and a JavaScript SDK (`huggingface.js`).
In this section, we will demonstrate a simple example using deepseek-ai/DeepSeek-V3-0324, a conversational Large Language Model. For the example, we will use Novita AI as the Inference Provider.
Inference Providers requires passing a user token in the request headers. You can generate a token by signing up on the Hugging Face website and going to the settings page. We recommend creating a fine-grained token with the scope `Make calls to Inference Providers`.
For more details about user tokens, check out this guide.
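Rather than hardcoding the token in your scripts, you can read it from the environment. A minimal sketch, assuming the token has been exported as `HF_TOKEN` (the same variable the cURL example below relies on):

```python
import os

from huggingface_hub import InferenceClient

# Pull the fine-grained token from the environment instead of
# pasting it into source code.
client = InferenceClient(api_key=os.environ["HF_TOKEN"])
```

Note that if no `api_key` is passed, `huggingface_hub` falls back to the `HF_TOKEN` environment variable or your locally saved token.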
Let's start with a cURL command highlighting the raw HTTP request. You can adapt this request to be run with the tool of your choice.
```bash
curl https://router.huggingface.co/novita/v3/openai/chat/completions \
    -H "Authorization: Bearer $HF_TOKEN" \
    -H 'Content-Type: application/json' \
    -d '{
        "messages": [
            {
                "role": "user",
                "content": "How many G in huggingface?"
            }
        ],
        "model": "deepseek/deepseek-v3-0324",
        "stream": false
    }'
```
In Python, you can use the `requests` library to make raw requests to the API:
```python
import requests

API_URL = "https://router.huggingface.co/novita/v3/openai/chat/completions"
headers = {"Authorization": "Bearer hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}
payload = {
    "messages": [
        {
            "role": "user",
            "content": "How many 'G's in 'huggingface'?"
        }
    ],
    "model": "deepseek/deepseek-v3-0324",
}

response = requests.post(API_URL, headers=headers, json=payload)
print(response.json()["choices"][0]["message"])
```
For convenience, the Python library `huggingface_hub` provides an `InferenceClient` that handles inference for you. Make sure to install it with `pip install huggingface_hub`.
```python
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="novita",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxx",
)

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3-0324",
    messages=[
        {
            "role": "user",
            "content": "How many 'G's in 'huggingface'?"
        }
    ],
)

print(completion.choices[0].message)
```
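The cURL example above sets `"stream": false`; the same client call also supports token-by-token streaming. A minimal sketch with `stream=True`, reusing the model and provider from the example:

```python
from huggingface_hub import InferenceClient

client = InferenceClient(provider="novita", api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxx")

# With stream=True, the call returns an iterator of incremental chunks
# instead of a single completion object.
stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3-0324",
    messages=[{"role": "user", "content": "How many 'G's in 'huggingface'?"}],
    stream=True,
)

for chunk in stream:
    # Each chunk carries a delta with the newly generated text, if any.
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```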
In JS, you can use `fetch` to make raw requests to the API:
```js
import fetch from "node-fetch";

const response = await fetch(
    "https://router.huggingface.co/novita/v3/openai/chat/completions",
    {
        method: "POST",
        headers: {
            Authorization: `Bearer hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx`,
            "Content-Type": "application/json",
        },
        body: JSON.stringify({
            provider: "novita",
            model: "deepseek-ai/DeepSeek-V3-0324",
            messages: [
                {
                    role: "user",
                    content: "How many 'G's in 'huggingface'?",
                },
            ],
        }),
    }
);

console.log(await response.json());
```
For convenience, the JS library `@huggingface/inference` provides an `InferenceClient` that handles inference for you. You can install it with `npm install @huggingface/inference`.
```js
import { InferenceClient } from "@huggingface/inference";

const client = new InferenceClient("hf_xxxxxxxxxxxxxxxxxxxxxxxx");

const chatCompletion = await client.chatCompletion({
    provider: "novita",
    model: "deepseek-ai/DeepSeek-V3-0324",
    messages: [
        {
            role: "user",
            content: "How many 'G's in 'huggingface'?",
        },
    ],
});

console.log(chatCompletion.choices[0].message);
```
In this introduction, we've covered the basics of Inference Providers. To learn more about this service, check out our guides and API Reference:
- Pricing and Billing: everything you need to know about billing.
- Hub integration: how is Inference Providers integrated with the Hub?
- Register as an Inference Provider: everything about how to become an official partner.
- Hub API: high-level API for Inference Providers.
- API Reference: learn more about the parameters and task-specific settings.