Groq rate limits, context window & usage caps (2026)
Groq API rate limits and usage caps: requests per minute (RPM), tokens per minute (TPM), and requests per day (RPD) by plan, with the free tier (30 RPM) and paid tiers compared.
Context window: 128,000 tokens (~96K words)
Plans: 2 tiers tracked
API tiers: 3 rate limit tiers
Groq usage limits by plan
| Limit | Value | Notes |
|---|---|---|
| RPM | 30 requests/min | Free tier |
| TPM | 6,000 tokens/min | Llama 3.3 70B |
| RPD | 14,400 requests/day | Free tier daily cap |
| Context window | 128,000 tokens (~96K words) | Llama 3.3 70B |
| Concurrent requests | 5 requests | Simultaneous |
| RPM | 1,000 requests/min | Paid tier |
| TPM | 500,000 tokens/min | Higher throughput |
| Context window | 128,000 tokens (~96K words) | All supported models |
| Concurrent requests | 50 requests | Simultaneous |
| Audio per hour | 7,200 seconds/hour (2 hours of audio) | Whisper transcription |
Groq API rate limits by tier
API access uses a tiered rate limit system. Higher tiers unlock more requests per minute (RPM) and tokens per minute (TPM).
| Tier | RPM | TPM |
|---|---|---|
| Free | 30 | 6,000 |
| Dev | 100 | 20,000 |
| Prod | 600 | 120,000 |
RPM = requests per minute · TPM = tokens per minute. Limits shown are approximate and may vary by model.
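Because RPM and TPM apply together, which one you hit first depends on how large your requests are. The sketch below works through that arithmetic using the approximate tier values from the table above; the `TIERS` dictionary and the `binding_limit` helper are illustrative names, not part of any Groq SDK.

```python
# Tier values copied from the table above; the page calls them approximate.
TIERS = {
    "Free": {"rpm": 30, "tpm": 6_000},
    "Dev": {"rpm": 100, "tpm": 20_000},
    "Prod": {"rpm": 600, "tpm": 120_000},
}


def binding_limit(tier: str, tokens_per_request: int) -> tuple[str, float]:
    """Return which limit is reached first and the sustainable requests/min.

    `tokens_per_request` is the average total tokens one request consumes
    (an assumption the caller supplies; it is not published by Groq).
    """
    limits = TIERS[tier]
    # Requests per minute that the token budget alone would allow.
    by_tokens = limits["tpm"] / tokens_per_request
    if by_tokens < limits["rpm"]:
        return "TPM", by_tokens
    return "RPM", float(limits["rpm"])


if __name__ == "__main__":
    for name in TIERS:
        print(name, binding_limit(name, 1_000))
```

With these figures every tier works out to 200 tokens per request at the crossover point (for example 6,000 / 30 on Free), so requests averaging more than roughly 200 tokens would be constrained by TPM before RPM.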
What happens when you hit Groq's limits?
Check the reset window - most limits refresh within 1–60 minutes
Use exponential backoff: 1s → 2s → 4s up to 60s max
If you hit limits regularly, upgrade your plan to increase caps
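The backoff schedule above (1s → 2s → 4s, capped at 60s) can be sketched as follows. This is a minimal illustration, not Groq client code: `RateLimited` is a hypothetical exception that your own request function would raise on an HTTP 429, and the retry count of 8 is an arbitrary choice.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical exception: raise it from your request code on HTTP 429."""


def backoff_delays(base: float = 1.0, cap: float = 60.0, retries: int = 8) -> list[float]:
    """Delays of 1s, 2s, 4s, ... doubling each attempt and capped at `cap`."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def call_with_backoff(
    fn: Callable[[], T],
    retries: int = 8,
    sleep: Callable[[float], object] = time.sleep,
) -> T:
    """Call `fn`, sleeping through the backoff schedule after each RateLimited."""
    for delay in backoff_delays(retries=retries):
        try:
            return fn()
        except RateLimited:
            sleep(delay)
    # Final attempt: if this also fails, let RateLimited propagate.
    return fn()
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting. Many clients also add a small random jitter to each delay so that parallel workers do not retry in lockstep; that refinement is left out here to match the schedule as stated.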
Groq limit reset schedule
⚡ Per minute: API RPM limits - reset every 60 seconds
🕐 Per hour: Short rolling windows for message quotas
⏱ Per 5 hours: Common for consumer plan message limits
📅 Per day / month: Image gen credits and file storage caps
Exact reset period per limit type is shown in the "Notes" column of the plan table above. Groq uses rolling-window resets - quotas refresh continuously, not at a fixed midnight cutoff.
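To illustrate what a rolling window means in practice, here is a small client-side limiter that only counts requests sent within the last 60 seconds. It is a sketch of the general technique under the assumption that you want to stay below a known RPM value; how Groq counts requests on the server side is not described on this page, and `RollingWindowLimiter` is a hypothetical name.

```python
import time
from collections import deque
from typing import Callable


class RollingWindowLimiter:
    """Allow at most `max_requests` in any trailing `window_seconds` span."""

    def __init__(
        self,
        max_requests: int,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self.sent: deque[float] = deque()  # timestamps of accepted requests

    def try_acquire(self) -> bool:
        """Record a request and return True if it fits in the current window."""
        now = self.clock()
        # Drop timestamps that have aged out of the window. This is the
        # "continuous refresh": capacity returns one request at a time.
        while self.sent and now - self.sent[0] >= self.window_seconds:
            self.sent.popleft()
        if len(self.sent) < self.max_requests:
            self.sent.append(now)
            return True
        return False
```

For the free tier you might construct `RollingWindowLimiter(30)` and call `try_acquire()` before each request, waiting briefly whenever it returns `False`. Unlike a fixed cutoff, capacity comes back gradually as each earlier request passes the 60-second mark.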
Limits sourced from Groq's official documentation. Updated when plan changes are announced.
Groq limits - frequently asked questions
What is the Groq message limit?
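This page tracks Groq limits as requests and tokens rather than messages, and they vary by plan: the free tier allows 30 RPM and 14,400 RPD, while the paid tier allows 1,000 RPM. See the full breakdown by tier in the table above.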
Does Groq have a file upload limit?
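Yes, Groq enforces file upload limits that vary by plan, but the table above does not list a specific file size cap. The closest tracked limit is 7,200 seconds of audio per hour for Whisper transcription.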
When do Groq limits reset?
Reset periods vary by limit type - many Groq limits reset on a rolling window (e.g., per 5 hours or per 24 hours). Check the notes column in the table above for specific reset schedules.
What happens when you hit Groq's rate limit?
Groq returns HTTP 429 with x-ratelimit-remaining-requests and x-ratelimit-reset-requests headers. Free tier is 30 RPM / 14,400 RPD - upgrade to a paid plan for higher limits.
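A client can use those headers to decide how long to pause before retrying. The sketch below assumes the reset header holds either a plain number of seconds or a duration string such as `2m59.56s`; verify the exact format against Groq's documentation before relying on it. `parse_reset` and `wait_seconds` are hypothetical helper names, and the 1-second fallback is an arbitrary default.

```python
import re

# Matches one "<number><unit>" piece of a duration such as "2m59.56s".
_DURATION_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}


def parse_reset(value: str) -> float:
    """Convert a reset header value to seconds (assumed formats, see above)."""
    value = value.strip()
    try:
        return float(value)  # plain seconds, e.g. "7.5"
    except ValueError:
        pass
    parts = _DURATION_PART.findall(value)
    # Reject anything that is not made up entirely of number+unit pieces.
    if not parts or "".join(num + unit for num, unit in parts) != value:
        raise ValueError(f"unrecognized reset value: {value!r}")
    return sum(float(num) * _UNIT_SECONDS[unit] for num, unit in parts)


def wait_seconds(status: int, headers: dict[str, str]) -> float:
    """Return how long to wait before retrying; 0.0 if not rate limited."""
    if status != 429:
        return 0.0
    lowered = {key.lower(): val for key, val in headers.items()}
    reset = lowered.get("x-ratelimit-reset-requests")
    # Fallback of 1 second when the header is missing (an assumption).
    return parse_reset(reset) if reset is not None else 1.0
```

Header names are lowercased before lookup because HTTP header names are case-insensitive and different HTTP libraries expose them with different casing.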
What is Groq's context window?
Groq's context window is 128000 tokens (~96K words). This is the maximum amount of text - including your conversation history - the model can process in a single request.
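As a rough pre-flight check, the 128,000 tokens ≈ 96K words figure implies about 0.75 words per token. The sketch below uses that ratio to estimate whether a conversation history still fits; it is only a heuristic (real token counts depend on the model's tokenizer), and the `reserved_for_reply` budget is an assumed allowance for the model's output, not a documented Groq parameter.

```python
import math

CONTEXT_WINDOW_TOKENS = 128_000
# Implied by "128,000 tokens ≈ 96K words"; a rough average, not a tokenizer.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> int:
    """Estimate token count from whitespace-separated word count."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


def fits(history: list[str], reserved_for_reply: int = 1_000) -> bool:
    """Return True if the history plus a reply budget fits the window."""
    used = sum(estimate_tokens(message) for message in history)
    return used + reserved_for_reply <= CONTEXT_WINDOW_TOKENS
```

If `fits` returns `False`, a common approach is to drop or summarize the oldest messages until it passes.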