Update inference-providers-groq.md by CharlesCNorton · Pull Request #2919 · huggingface/blog · GitHub

Update inference-providers-groq.md #2919


Merged
merged 1 commit on Jun 24, 2025
4 changes: 2 additions & 2 deletions inference-providers-groq.md
@@ -20,7 +20,7 @@ authors:
We're thrilled to share that **Groq** is now a supported Inference Provider on the Hugging Face Hub!
Groq joins our growing ecosystem, enhancing the breadth and capabilities of serverless inference directly on the Hub’s model pages. Inference Providers are also seamlessly integrated into our client SDKs (for both JS and Python), making it super easy to use a wide variety of models with your preferred providers.

- [Groq](https://groq.com) supports a wide variety of text and conversational models, including the latest open-source models such as [Meta's LLama 4](https://huggingface.co/meta-llama/Llama-4-Maverick-17B-128E-Instruct?inference_provider=groq), [Qwen's QWQ-32B](https://huggingface.co/Qwen/QwQ-32B?inference_provider=groq), ad many more.
+ [Groq](https://groq.com) supports a wide variety of text and conversational models, including the latest open-source models such as [Meta's Llama 4](https://huggingface.co/meta-llama/Llama-4-Maverick-17B-128E-Instruct?inference_provider=groq), [Qwen's QWQ-32B](https://huggingface.co/Qwen/QwQ-32B?inference_provider=groq), and many more.

At the heart of Groq's technology is the Language Processing Unit (LPU™), a new type of end-to-end processing unit system that provides the fastest inference for computationally intensive applications with a sequential component, such as Large Language Models (LLMs). LPUs are designed to overcome the limitations of GPUs for inference, offering significantly lower latency and higher throughput. This makes them ideal for real-time AI applications.

@@ -58,7 +58,7 @@ See the list of supported models [here](https://huggingface.co/models?inference_

#### from Python, using huggingface_hub

- The following example shows how to use Meta's LLama 4 using Groq as the inference provider. You can use a [Hugging Face token](https://huggingface.co/settings/tokens) for automatic routing through Hugging Face, or your own Groq API key if you have one.
+ The following example shows how to use Meta's Llama 4 using Groq as the inference provider. You can use a [Hugging Face token](https://huggingface.co/settings/tokens) for automatic routing through Hugging Face, or your own Groq API key if you have one.

Install `huggingface_hub` from source (see [instructions](https://huggingface.co/docs/huggingface_hub/installation#install-from-source)). Official support will be released soon in version v0.33.0.
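The example code itself is collapsed in this diff. For context, a minimal sketch of the kind of call described above, assuming the `InferenceClient` chat-completion interface from `huggingface_hub` (the prompt and token handling here are illustrative, not the post's exact snippet):

```python
import os

from huggingface_hub import InferenceClient

# Pass a Hugging Face token to route through Hugging Face,
# or a Groq API key to call Groq directly.
client = InferenceClient(
    provider="groq",
    api_key=os.environ["HF_TOKEN"],
)

completion = client.chat.completions.create(
    model="meta-llama/Llama-4-Maverick-17B-128E-Instruct",
    messages=[
        {"role": "user", "content": "What is the capital of France?"},
    ],
)

print(completion.choices[0].message.content)
```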
