Groq pricing - 2026
Groq API starts at $0.0000/1M input tokens. Prices effective since April 30, 2026.
| Model | Input $/1M | Output $/1M | Cached | Notes |
|---|---|---|---|---|
| Qwen Qwen3.32b | $0.290 | $0.590 | — | |
| Meta Llama Llama Guard 4.12b | $0.200 | $0.200 | — | |
| Meta Llama Llama 4 Maverick 17b 128e Instruct | $0.200 | $0.600 | — | |
| Meta Llama Llama 4 Scout 17b 16e Instruct | $0.110 | $0.340 | — | |
| Moonshotai Kimi K2 Instruct 0905 | $1.00 | $3.00 | $0.500 | |
| Openai Gpt Oss 120b | $0.150 | $0.600 | $0.0750 | |
| Openai Gpt Oss 20b | $0.0750 | $0.300 | $0.0375 | |
| Openai Gpt Oss Safeguard 20b | $0.0750 | $0.300 | $0.0370 | |
| Llama 3.1.8b Instant | $0.0500 | $0.0800 | — | |
| Llama 3.3.70b Versatile | $0.590 | $0.790 | — | |
| Gemma 7b It | $0.0500 | $0.0800 | — | |
| Llama 3.1 8B (Groq) | $0.0000 | $1.00 | — | |
| Llama 3.3 70B (Groq) | $0.0000 | $1.00 | — | |
| Llama 3.1 8B Instant | $0.0500 | $0.0800 | — | |
| Gemma 2 9B | $0.200 | $0.200 | — | |
| DeepSeek R1 Distill Llama 70B | $0.750 | $0.990 | — | Distilled reasoning model |
| Mixtral 8x7B | $0.240 | $0.240 | — | |
| Llama 3.3 70B Versatile | $0.590 | $0.790 | — |
Prices effective since April 30, 2026. Verified May 27, 2026. Confirm at LiteLLM before billing.
Cost calculator
Estimated monthly cost · 70% input / 30% output split
+12 more models not shown
Price history
Input price per 1M tokens - tracked from Jan 1, 2025
Prices scraped daily from official provider documentation. Chart shows input token pricing.
Groq usage limits by plan
| Rpm | 30 requests/min | Free tier |
| Tpm | 6000 tokens/min | Llama 3.3 70B |
| Rpd | 14400 requests/day | Free tier daily cap |
| Context Window | 128000 tokens | Llama 3.3 70B |
| Concurrent Requests | 5 requests | Simultaneous |
| Rpm | 1000 requests/min | Paid tier |
| Tpm | 500000 tokens/min | Higher throughput |
| Context Window | 128000 tokens | All supported models |
| Concurrent Requests | 50 requests | Simultaneous |
| Audio Hours Per Hour | 7200 seconds/hour | Whisper transcription |
Groq features and capabilities
Generation
| Image generation | ✕ No | Groq is an inference API — no image generation. | |
| Web search | ✕ No | No web search; fast inference only. | |
| Code generation | ✓ Yes | Supports Llama and Mixtral models for code. | |
| Reasoning mode | ✓ Yes | DeepSeek-R1 and Llama reasoning models available. |
Input & Context
| Audio input | ✓ Yes | Whisper transcription via API. | |
| Video input | ✕ No | No video support. | |
| Image input | ✓ Yes | Vision models available (Llama 3.2 Vision). | |
| File upload | ✕ No | No file upload; text and audio only via API. |
Integrations & API
| API access | ✓ Yes | OpenAI-compatible REST API — primary use case. | |
| Plugins/tools | ✓ Yes | Tool use and function calling supported. |
Languages
| English | ✓ Yes | Full English support across all models. | |
| Multilingual | ✓ Yes | Supports 100+ languages via Llama models. |
Memory
| Memory | ✕ No | No persistent memory; stateless API. | |
| Custom instructions | ✓ Yes | System prompt supported. |
Privacy & Security
| Training opt-out | ✓ Yes | Inputs not used for training by default. | |
| GDPR compliant | ✓ Yes | GDPR-compliant data processing. |
Related pages
About Groq API pricing
Groq API pricing is set by Groq and billed per million tokens processed. Input tokens (your prompt) and output tokens (the response) are priced separately. Cached input pricing applies when the same context is reused across requests, offering significant savings for repeated prompts.
Prices on this page are sourced from official Groq documentation and updated when Groq announces pricing changes. Check the official Groq pricing page for the most current rates.
Weekly AI pricing & uptime digest
Price drops, new model releases, and incident summaries - every Monday. Free.