Replicate rate limits, context window & usage caps (2026)
Replicate API rate limits and usage caps, covering cold starts, concurrent predictions, and credit-based usage caps.
Replicate usage limits by plan
| Limit | Value | Notes |
|---|---|---|
| Predictions | 50 predictions | Free credits for new users |
| Models | Unlimited | Public models only |
| Private | Not available | No private deployments |
| GPU access | Not available | Limited GPU on free |

| Limit | Value | Notes |
|---|---|---|
| Predictions | Unlimited | Per-second GPU billing |
| Models | Unlimited | All public + private models |
| Private | Unlimited | Private model deployments |
| GPU access | Unlimited | A100, H100, T4 available |

| Limit | Value | Notes |
|---|---|---|
| Predictions | Unlimited | Shared billing |
| Users | Unlimited | |
| Private | Unlimited | |
| Spend limit | Unlimited | Configurable spend cap |
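Because paid usage is billed per second of GPU time and can be bounded by a configurable spend cap, a rough budget check is simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the price and runtime figures in the usage note are placeholders, not Replicate's actual rates.

```python
from decimal import Decimal


def gpu_seconds_within_cap(spend_cap_usd: str, price_per_second_usd: str) -> Decimal:
    """Return how many billed GPU seconds fit under a spend cap.

    Both arguments are decimal strings so the arithmetic avoids float rounding.
    The price is an assumption supplied by the caller, not a published rate.
    """
    cap = Decimal(spend_cap_usd)
    price = Decimal(price_per_second_usd)
    if price <= 0:
        raise ValueError("price_per_second_usd must be positive")
    return cap / price


def predictions_within_cap(
    spend_cap_usd: str,
    price_per_second_usd: str,
    seconds_per_prediction: str,
) -> int:
    """Return the number of whole predictions that fit under the cap,
    assuming every prediction runs for the same number of billed seconds."""
    seconds = gpu_seconds_within_cap(spend_cap_usd, price_per_second_usd)
    per_prediction = Decimal(seconds_per_prediction)
    if per_prediction <= 0:
        raise ValueError("seconds_per_prediction must be positive")
    return int(seconds // per_prediction)
```

For example, with a placeholder cap of 50 USD, a placeholder price of 0.001 USD per second, and 8 billed seconds per prediction, `predictions_within_cap("50", "0.001", "8")` gives the number of whole predictions that fit before the cap is reached.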
What happens when you hit Replicate's limits?
Check the reset window: most limits refresh within 1–60 minutes.
Retry the request (or reload the page) once the reset window has passed.
If you hit limits regularly, upgrade your plan to raise the caps.
Replicate limit reset schedule
Per minute: API RPM limits, reset every 60 seconds
Per hour: short rolling windows for message quotas
Per 5 hours: common for consumer plan message limits
Per day / month: image generation credits and file storage caps
Where a reset period is documented for a limit, it appears in the "Notes" column of the plan table above. Replicate uses rolling-window resets: quotas refresh continuously rather than at a fixed midnight cutoff.
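To make the difference between a rolling window and a fixed cutoff concrete, here is a minimal client-side sketch of a rolling-window counter. It is not Replicate's server-side implementation; the class name and the limit values are hypothetical and serve only to show how capacity frees up continuously as old requests age out.

```python
import time
from collections import deque
from typing import Deque, Optional


class RollingWindowLimiter:
    """Allow at most `max_requests` within any trailing `window_seconds`."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._timestamps: Deque[float] = deque()

    def _evict(self, now: float) -> None:
        # Drop requests that have aged out of the trailing window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()

    def try_acquire(self, now: Optional[float] = None) -> bool:
        """Record a request and return True if the window has room."""
        now = time.monotonic() if now is None else now
        self._evict(now)
        if len(self._timestamps) < self.max_requests:
            self._timestamps.append(now)
            return True
        return False

    def seconds_until_free(self, now: Optional[float] = None) -> float:
        """Return how long until the oldest request leaves the window."""
        now = time.monotonic() if now is None else now
        self._evict(now)
        if len(self._timestamps) < self.max_requests:
            return 0.0
        return self._timestamps[0] + self.window_seconds - now
```

With a hypothetical limit of 2 requests per 60 seconds, requests sent at t=0 and t=10 fill the window; a third request is refused until t=60, when the first request ages out, rather than waiting for a fixed clock boundary.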
Limits sourced from Replicate's official documentation. Updated when plan changes are announced.
Replicate limits - frequently asked questions
What is the Replicate message limit?
Replicate message limits vary by plan - see the full breakdown by tier in the table above.
Does Replicate have a file upload limit?
Yes, Replicate enforces file upload limits that vary by plan. See the detailed breakdown above.
When do Replicate limits reset?
Reset periods vary by limit type. Many Replicate limits reset on a rolling window (e.g., per 5 hours or per 24 hours). Check the "Notes" column in the table above for specific reset schedules.
What happens when you hit Replicate's rate limit?
Replicate will temporarily block new requests when you exceed your plan's limits. You may see an in-app message or receive an HTTP 429 response. Wait for the reset window to pass or upgrade your plan.
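One common way to handle an HTTP 429 in client code is to retry with exponential backoff. The sketch below is a generic pattern, not Replicate's official client behavior: `call_with_backoff` and its `send` callback are hypothetical names, and honoring a `Retry-After` header assumes the server sends one, which this page does not confirm.

```python
import random
import time
from typing import Callable, Mapping, Tuple

# A response is modeled as (status_code, headers, body).
Response = Tuple[int, Mapping[str, str], bytes]


def call_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send` until it returns a non-429 status or attempts run out.

    `send` is a caller-supplied function that performs one HTTP request.
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        if attempt == max_attempts - 1:
            break  # No point sleeping after the final attempt.
        retry_after = headers.get("Retry-After")
        if retry_after is not None and retry_after.isdigit():
            # Assumption: the server states the wait in whole seconds.
            delay = float(retry_after)
        else:
            # Exponential backoff (1x, 2x, 4x, ...) plus random jitter so
            # that many clients do not retry at the same instant.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
        sleep(min(delay, max_delay))
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

The `sleep` parameter is injectable so the function can be exercised without real waiting. The header lookup is case-sensitive here, so a real HTTP client's case-insensitive header mapping should be passed through as-is.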