Hugging Face’s Inference Providers give developers streamlined, unified access to hundreds of machine learning models, powered by our serverless inference partners. This new approach builds on our previous Serverless Inference API, offering more models, improved performance, and greater reliability thanks to world-class providers.
To learn more about the launch of Inference Providers, check out our announcement blog post.
Here is the complete list of partners integrated with Inference Providers, and the supported tasks for each of them:
| Provider | Chat completion (LLM) | Chat completion (VLM) | Feature Extraction | Text to Image | Text to Video |
| --- | --- | --- | --- | --- | --- |
| Cerebras | ✅ | | | | |
| Cohere | ✅ | ✅ | | | |
| Fal AI | | | | ✅ | ✅ |
| Fireworks | ✅ | ✅ | | | |
| HF Inference | ✅ | ✅ | ✅ | ✅ | |
| Hyperbolic | ✅ | ✅ | | | |
| Nebius | ✅ | ✅ | | ✅ | |
| Novita | ✅ | ✅ | | | ✅ |
| Replicate | | | | ✅ | ✅ |
| SambaNova | ✅ | | ✅ | | |
| Together | ✅ | ✅ | | ✅ | |
Inference Providers offers a fast and simple way to explore thousands of models for a variety of tasks. Whether you're experimenting with ML capabilities or building a new application, this API gives you instant access to high-performing models across multiple domains:
- Text Generation: Generate and experiment with high-quality responses from large language models, including tool-calling prompts.
- Image and Video Generation: Easily create customized images, including LoRAs for your own styles.
- Document Embeddings: Build search and retrieval systems with SOTA embeddings (a short sketch follows this list).
- Classical AI Tasks: Ready-to-use models for text classification, image classification, speech recognition, and more.
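To make the embeddings bullet concrete, here is a minimal sketch using the `huggingface_hub` Python client (introduced below). The model and provider shown are illustrative choices, not fixed defaults:

```python
from huggingface_hub import InferenceClient

# Minimal embeddings sketch. Assumes a valid token is available
# (via the HF_TOKEN environment variable or `huggingface-cli login`);
# the provider and model below are illustrative picks.
client = InferenceClient(provider="hf-inference")

embedding = client.feature_extraction(
    "Today is a sunny day and I will get some ice cream.",
    model="intfloat/multilingual-e5-large-instruct",
)
print(embedding.shape)  # output shape depends on the model's pooling
```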
- ⚡ Fast and Free to Get Started: Inference Providers comes with a free tier and additional included credits for PRO users, as well as Enterprise Hub organizations.
- 🎯 All-in-One API: A single API for text generation, image generation, document embeddings, NER, summarization, image classification, and more.
- 🔀 Multi-Provider Support: Easily run models from top-tier providers like fal, Replicate, SambaNova, Together AI, and others.
- 🚀 Scalable & Reliable: Built for high availability and low-latency performance in production environments.
- 🔧 Developer-Friendly: Simple requests, fast responses, and a consistent developer experience across Python and JavaScript clients.
- 💰 Cost-Effective: No extra markup on provider rates.
To get started quickly with Chat Completion models, use the Inference Playground to easily test and compare models with your prompts.
You can use Inference Providers with your preferred tools, such as Python, JavaScript, or cURL. To simplify integration, we offer both a Python SDK (`huggingface_hub`) and a JavaScript SDK (`huggingface.js`).
In this section, we will demonstrate a simple example using deepseek-ai/DeepSeek-V3-0324, a conversational Large Language Model. For the example, we will use Novita AI as the Inference Provider.
Inference Providers requires passing a user token in the request headers. You can generate a token by signing up on the Hugging Face website and going to the settings page. We recommend creating a fine-grained token with the scope `Make calls to Inference Providers`.
For more details about user tokens, check out this guide.
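Rather than hardcoding the token in your scripts, you can read it from the environment. A minimal sketch, assuming the token has been exported as `HF_TOKEN` (the same variable the cURL example below relies on):

```python
import os

from huggingface_hub import InferenceClient

# Pull the fine-grained token from the environment instead of
# pasting it into source code.
client = InferenceClient(api_key=os.environ["HF_TOKEN"])
```

Note that if no `api_key` is passed, `huggingface_hub` falls back to the `HF_TOKEN` environment variable or your locally saved token.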
Let's start with a cURL command highlighting the raw HTTP request. You can adapt this request to be run with the tool of your choice.
```bash
curl https://router.huggingface.co/novita/v3/openai/chat/completions \
    -H "Authorization: Bearer $HF_TOKEN" \
    -H 'Content-Type: application/json' \
    -d '{
        "messages": [
            {
                "role": "user",
                "content": "How many G in huggingface?"
            }
        ],
        "model": "deepseek/deepseek-v3-0324",
        "stream": false
    }'
```
In Python, you can use the `requests` library to make raw requests to the API:
```python
import requests

API_URL = "https://router.huggingface.co/novita/v3/openai/chat/completions"
headers = {"Authorization": "Bearer hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}
payload = {
    "messages": [
        {
            "role": "user",
            "content": "How many 'G's in 'huggingface'?"
        }
    ],
    "model": "deepseek/deepseek-v3-0324",
}

response = requests.post(API_URL, headers=headers, json=payload)
print(response.json()["choices"][0]["message"])
```
For convenience, the Python library `huggingface_hub` provides an `InferenceClient` that handles inference for you. Make sure to install it with `pip install huggingface_hub`.
```python
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="novita",
    api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxx",
)

completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3-0324",
    messages=[
        {
            "role": "user",
            "content": "How many 'G's in 'huggingface'?"
        }
    ],
)

print(completion.choices[0].message)
```
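The cURL example above sets `"stream": false`; the same client call also supports token-by-token streaming. A minimal sketch with `stream=True`, reusing the model and provider from the example:

```python
from huggingface_hub import InferenceClient

client = InferenceClient(provider="novita", api_key="hf_xxxxxxxxxxxxxxxxxxxxxxxx")

# With stream=True, the call returns an iterator of incremental chunks
# instead of a single completion object.
stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3-0324",
    messages=[{"role": "user", "content": "How many 'G's in 'huggingface'?"}],
    stream=True,
)

for chunk in stream:
    # Each chunk carries a delta with the newly generated text, if any.
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```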
In JS, you can use `fetch` to make raw requests to the API:
```js
import fetch from "node-fetch";

const response = await fetch(
    "https://router.huggingface.co/novita/v3/openai/chat/completions",
    {
        method: "POST",
        headers: {
            Authorization: `Bearer hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx`,
            "Content-Type": "application/json",
        },
        body: JSON.stringify({
            provider: "novita",
            model: "deepseek-ai/DeepSeek-V3-0324",
            messages: [
                {
                    role: "user",
                    content: "How many 'G's in 'huggingface'?",
                },
            ],
        }),
    }
);

console.log(await response.json());
```
For convenience, the JS library `@huggingface/inference` provides an `InferenceClient` that handles inference for you. You can install it with `npm install @huggingface/inference`.
```js
import { InferenceClient } from "@huggingface/inference";

const client = new InferenceClient("hf_xxxxxxxxxxxxxxxxxxxxxxxx");

const chatCompletion = await client.chatCompletion({
    provider: "novita",
    model: "deepseek-ai/DeepSeek-V3-0324",
    messages: [
        {
            role: "user",
            content: "How many 'G's in 'huggingface'?",
        },
    ],
});

console.log(chatCompletion.choices[0].message);
```
In this introduction, we've covered the basics of Inference Providers. To learn more about this service, check out our guides and API Reference:
- Pricing and Billing: everything you need to know about billing.
- Hub integration: how is Inference Providers integrated with the Hub?
- Register as an Inference Provider: everything about how to become an official partner.
- Hub API: high-level API for Inference Providers.
- API Reference: learn more about the parameters and task-specific settings.