OpenAI Whisper API Limits 2026: File Size, Rate Limits, and Workarounds

The OpenAI Whisper API enforces a 25MB file size limit, supports seven audio formats (mp3, mp4, mpeg, mpga, wav, webm, m4a), and charges $0.006 per minute. Rate limits vary by tier, starting at 50 RPM for Tier 1. There's no audio duration cap, so a compressed file can hold hours of speech within the 25MB boundary.
What you'll need:
- An OpenAI API key with billing enabled
- FFmpeg installed (for splitting large audio files)
- Basic Python or Node.js knowledge for API calls
- Time estimate: 15-30 minutes to read and implement
- Skill level: Beginner to intermediate
Quick overview of the key topics covered:
- Complete limits table -- Every Whisper API restriction in one place
- The 25MB file size limit -- What it means and how to work around it
- Supported audio formats -- Technical specs for accepted file types
- Rate limits and pricing -- Tier-based RPM/TPM limits and cost breakdown
- Model parameters -- Whisper's architecture and capabilities
- Free vs. paid usage -- What's actually free and what costs money
- Large file handling -- Production-tested approaches for files over 25MB
- Whisper vs. Azure AI -- Which deployment option fits your use case
OpenAI Whisper API Limits in 2026: Complete Overview
The Whisper API is OpenAI's automatic speech recognition (ASR) endpoint, built on a model trained on 680,000 hours of multilingual audio data from the web. It converts spoken language into written text, and it does it well -- but every API has boundaries you need to know before building production workflows around it.
Here's the full picture of every limit that applies in 2026:
| Parameter | Limit | Source |
|---|---|---|
| File Size | 25 MB maximum per request | OpenAI Speech-to-Text Docs |
| Supported Formats | mp3, mp4, mpeg, mpga, wav, webm, m4a | OpenAI Speech-to-Text Docs |
| Audio Duration | No explicit limit (file size dependent) | OpenAI Community Forum |
| Streaming | Not supported (complete files only) | OpenAI API documentation |
| Model | whisper-1 | OpenAI API documentation |
| Pricing | $0.006 per minute ($0.36/hour) | Brass Transcripts |
| Rate Limits | Tier-based (50-2000+ RPM) | OpenAI Rate Limits Guide |
That table answers the most common question I see developers ask: "What are the actual Whisper API limits?" In my experience building TranscribeTube's transcription pipeline, the 25MB file size cap is the restriction that trips up most teams -- not because it's unreasonable, but because people don't realize it applies to the raw file upload, not the audio duration itself.
What Changed Since 2024?
OpenAI hasn't raised the 25MB ceiling since the Whisper API launched. The rate limit tiers have been adjusted slightly, and pricing remains stable at $0.006/min. The main change is that more developers now know the workarounds (file splitting, compression, alternative providers), so it's easier to build around these constraints.
The 25MB File Size Limit and How to Work Around It
The limit you'll hit most often is the 25MB file size cap. According to the OpenAI developer documentation, "the Transcriptions API only supports files that are less than 25 MB. If you have an audio file that is longer than that, you will need to break it up into chunks of under 25MB."
Here's the thing most guides miss: the 25MB limit is about file size, not audio duration. A 25MB file could be a 3-minute WAV recording or a 2-hour low-bitrate MP3. This distinction matters when you're planning your approach.
Step 1: Check Your File Size and Format
Before doing anything, check what you're working with:
```bash
# Check file size
ls -lh your-audio-file.mp3

# Check audio details with FFmpeg
ffprobe -i your-audio-file.mp3 -show_format -show_streams
```
If your file is under 25MB, you're good -- send it directly. If it's over, you have three options.
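If you're automating this, the same check is one line of Python. A minimal sketch (the function name is mine; the 24 MB threshold leaves headroom for multipart encoding overhead, which is covered later in this guide):

```python
import os

MAX_BYTES = 24 * 1024 * 1024  # stay under 25 MB with headroom for form encoding

def needs_splitting(path: str) -> bool:
    """Return True if the file is too large to send in one request."""
    return os.path.getsize(path) > MAX_BYTES
```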
Step 2: Compress the Audio First
Often you can avoid splitting altogether by compressing:
```bash
# Convert to low-bitrate mono MP3 (usually enough for speech)
ffmpeg -i input.wav -ac 1 -b:a 64k -ar 16000 output.mp3
```
Speech transcription doesn't need CD-quality audio. A 16kHz sample rate with 64kbps bitrate works fine for Whisper and cuts file size way down. I've seen WAV files go from 180MB to 8MB with this approach, without any measurable accuracy loss.
Step 3: Split Large Files Into Chunks
When compression isn't enough, split the file:
```bash
# Split into 10-minute chunks (usually under 25MB at reasonable bitrates)
ffmpeg -i large-file.mp3 -f segment -segment_time 600 -c copy chunk_%03d.mp3
```
Watch out for: Splitting at arbitrary points can cut words in half. Use silence detection to find cleaner split points:

```bash
# Detect silent points (minimum 0.5s silence, -30dB threshold).
# This command doesn't split anything -- it prints silence_start/silence_end
# timestamps to stderr, which you then use as cut points.
ffmpeg -i large-file.mp3 -af silencedetect=noise=-30dB:d=0.5 -f null -
```
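The silencedetect filter only reports timestamps; you still have to turn its output into cut points. One way to do that is a small parser like this sketch (function name and regex are my own, assuming ffmpeg's default silencedetect log lines):

```python
import re

def silence_midpoints(ffmpeg_stderr: str) -> list[float]:
    """Extract midpoints of detected silences from ffmpeg silencedetect output.
    Midpoints of silent stretches are natural places to cut without clipping words."""
    starts = [float(m) for m in re.findall(r"silence_start: ([\d.]+)", ffmpeg_stderr)]
    ends = [float(m) for m in re.findall(r"silence_end: ([\d.]+)", ffmpeg_stderr)]
    return [(s + e) / 2 for s, e in zip(starts, ends)]
```

Feed the resulting timestamps to ffmpeg's `-ss`/`-to` options (or the segment muxer's `segment_times`) to perform the actual cuts.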
Step 4: Transcribe Each Chunk and Merge
Send each chunk to the Whisper API separately, then concatenate the results in order. Here's a Python example:
```python
import openai
from pathlib import Path

client = openai.OpenAI()
chunks = sorted(Path("chunks/").glob("chunk_*.mp3"))
full_transcript = ""

for chunk in chunks:
    with open(chunk, "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file
        )
    full_transcript += transcript.text + " "

print(full_transcript.strip())
```
Pro tip: After processing thousands of audio files through TranscribeTube, I've found that 10-minute chunks hit the sweet spot between staying under 25MB and maintaining context for accurate transcription. Shorter chunks (under 2 minutes) sometimes produce worse results because Whisper loses contextual cues.
You'll know it's working when: Each chunk returns a JSON response with a text field, and your merged transcript reads naturally without obvious gaps at chunk boundaries.
Common mistakes:
- Not handling overlapping context: When you split mid-sentence, the chunk boundary creates an awkward break. Add a 2-second overlap between chunks and deduplicate during merge.
- Ignoring file format during split: Splitting a WAV file produces even larger chunks. Always compress to MP3 first, then split.
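The overlap-and-deduplicate merge can be sketched as a word-level match at the seam. This is a simplified version (function name and the 20-word search window are my own choices; production merges often use Whisper's timestamps instead):

```python
def merge_with_overlap(left: str, right: str, max_overlap_words: int = 20) -> str:
    """Join two chunk transcripts, dropping text duplicated by the audio overlap.
    Finds the longest run of words that ends `left` and also begins `right`."""
    lw, rw = left.split(), right.split()
    for n in range(min(max_overlap_words, len(lw), len(rw)), 0, -1):
        if lw[-n:] == rw[:n]:
            return " ".join(lw + rw[n:])
    return " ".join(lw + rw)  # no overlap found: plain concatenation
```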
If you'd rather skip the manual splitting process entirely, tools like TranscribeTube's audio to text converter handle large files automatically by managing chunking, parallel processing, and transcript merging behind the scenes.
Supported Audio Formats and Technical Specifications
Whisper accepts seven audio formats. They vary quite a bit in file size efficiency:
| Format | Typical File Size (per hour) | Best Use Case | Compression |
|---|---|---|---|
| mp3 | 30-60 MB | General purpose | Lossy |
| mp4 | 40-80 MB | Video audio tracks | Lossy |
| mpeg | 30-60 MB | Legacy systems | Lossy |
| mpga | 30-60 MB | MPEG audio layer | Lossy |
| wav | 300-600 MB | Uncompressed source | None |
| webm | 20-50 MB | Web recordings | Lossy |
| m4a | 25-50 MB | Apple/mobile | Lossy (AAC) |
Which Format Should You Use?
For the Whisper API specifically, MP3 at 64-128kbps gives you the best balance of quality and size. WAV files burn through the 25MB limit in minutes of audio, while MP3s at speech-optimized bitrates let you pack more content into each upload.
According to n8n's workflow documentation, "Whisper's 25 MB file size limit" translates to roughly 20 minutes of audio at standard quality settings.
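The bitrate-to-duration relationship is simple arithmetic, so you can sanity-check these figures yourself. A quick sketch (function name is mine):

```python
def max_minutes_under_limit(bitrate_kbps: int, limit_mb: float = 25.0) -> float:
    """Approximate minutes of audio that fit under the upload limit at a given bitrate."""
    bytes_per_minute = bitrate_kbps * 1000 / 8 * 60  # kbps -> bytes per minute
    return limit_mb * 1_000_000 / bytes_per_minute
```

At 64 kbps that works out to roughly 52 minutes under 25 MB; at 128 kbps, roughly 26 minutes, which is in the same ballpark as n8n's ~20-minute figure for standard-quality audio.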
Formats NOT supported: FLAC, OGG, AAC (standalone), AIFF, and WMA won't work. Convert them first:
```bash
# Convert FLAC to MP3
ffmpeg -i recording.flac -codec:a libmp3lame -b:a 128k recording.mp3

# Convert OGG to MP3
ffmpeg -i recording.ogg -codec:a libmp3lame -b:a 128k recording.mp3
```
If you're working with audio transcription regularly, keeping your source files in MP3 format from the start prevents these conversion headaches.
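A cheap pre-flight check saves a wasted upload. This sketch (constant and function name are mine) validates the extension against the accepted set from the table above:

```python
from pathlib import Path

# The seven formats the Whisper API accepts
SUPPORTED = {".mp3", ".mp4", ".mpeg", ".mpga", ".wav", ".webm", ".m4a"}

def is_whisper_supported(filename: str) -> bool:
    """Return True if the file extension is in Whisper's accepted format list."""
    return Path(filename).suffix.lower() in SUPPORTED
```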
OpenAI Whisper Rate Limits, Pricing, and Token Usage
Rate limits control how many requests you can send per minute and how much total audio you can process. According to the OpenAI rate limits documentation, "rate limits are restrictions that our API imposes on the number of times a user or client can access our services within a specified period of time."
Rate Limits by Tier
OpenAI uses a tier system based on your account's spending history:
| Tier | Requests Per Minute (RPM) | Qualification |
|---|---|---|
| Free | 3 RPM | New accounts |
| Tier 1 | 50 RPM | $5+ spent |
| Tier 2 | 100 RPM | $50+ spent |
| Tier 3 | 500 RPM | $100+ spent |
| Tier 4 | 1,000 RPM | $250+ spent |
| Tier 5 | 2,000+ RPM | $1,000+ spent |
Pricing Breakdown
Whisper API pricing is simple. According to Brass Transcripts, the rate is $0.006 per minute of audio, which works out to $0.36 per hour.
Here's what that looks like at scale:
| Monthly Volume | Cost | Cost Per Hour |
|---|---|---|
| 10 hours/month | $3.60 | $0.36 |
| 100 hours/month | $36.00 | $0.36 |
| 1,000 hours/month | $360.00 | $0.36 |
| 10,000 hours/month | $3,600.00 | $0.36 |
There are no volume discounts on the Whisper API. Whether you process 1 hour or 10,000 hours, the per-minute rate stays the same. For high-volume operations, self-hosting the open-source Whisper model or using a managed service like TranscribeTube's audio transcription API can cut costs by a wide margin.
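Because pricing is flat, estimating a monthly bill is one multiplication. A trivial helper (name is mine) makes that explicit:

```python
def whisper_api_cost(audio_minutes: float, rate_per_minute: float = 0.006) -> float:
    """Flat per-minute Whisper API pricing -- no volume discounts at any tier."""
    return round(audio_minutes * rate_per_minute, 2)
```

For example, 100 hours is `whisper_api_cost(100 * 60)`, which matches the $36.00 row in the table above.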
Handling Rate Limit Errors
When you hit a rate limit, you'll get a 429 Too Many Requests response. Handle it with exponential backoff:
```python
import time
import openai

def transcribe_with_retry(file_path, max_retries=5):
    client = openai.OpenAI()
    for attempt in range(max_retries):
        try:
            with open(file_path, "rb") as audio_file:
                return client.audio.transcriptions.create(
                    model="whisper-1",
                    file=audio_file
                )
        except openai.RateLimitError:
            wait_time = 2 ** attempt
            print(f"Rate limited. Waiting {wait_time}s...")
            time.sleep(wait_time)
    raise Exception("Max retries exceeded")
```
Pro tip: In our production environment at TranscribeTube, we queue transcription jobs and process them at a controlled rate rather than blasting the API. This approach avoids rate limit errors entirely and costs the same.
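The queue-and-pace idea can be sketched in a few lines. This is a deliberately simplified, synchronous version (function name and the 45 RPM default are my own; a real pipeline would be concurrent and persist the queue):

```python
import time
from collections import deque

def process_queue(jobs, max_rpm: int = 45):
    """Drain a job queue at a fixed pace below the tier's RPM ceiling,
    so the API never returns 429 in the first place."""
    interval = 60.0 / max_rpm  # seconds between requests
    queue = deque(jobs)
    results = []
    while queue:
        job = queue.popleft()
        results.append(job())  # each job performs one API call
        if queue:
            time.sleep(interval)
    return results
```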
How Many Parameters Does OpenAI Whisper Have?
OpenAI trained Whisper in five model sizes, ranging from 39 million to over 1.5 billion parameters. According to Deepgram's technical analysis, "OpenAI offers Whisper in five model sizes, ranging from 39 million to over 1.5 billion parameters. Larger models tend to provide higher accuracy."
Here's the breakdown:
| Model | Parameters | Relative Speed | English-Only | Multilingual |
|---|---|---|---|---|
| tiny | 39M | ~32x | tiny.en | tiny |
| base | 74M | ~16x | base.en | base |
| small | 244M | ~6x | small.en | small |
| medium | 769M | ~2x | medium.en | medium |
| large | 1,550M | 1x | N/A | large-v3 |
The API (whisper-1) uses the large model variant, which gives you the best accuracy. If you self-host, you can choose smaller models for faster processing at the cost of some accuracy.
large-v3-turbo: The Latest Addition
OpenAI released large-v3-turbo, which is much faster than the standard large-v3 while keeping comparable accuracy. It's available for self-hosting but not yet through the API -- the API still runs whisper-1 (based on large-v2/v3).
This matters if you're deciding between the API and self-hosting. The API gives you simplicity but locks you into one model. Self-hosting gives you model flexibility and potentially better performance with large-v3-turbo.
If you're interested in how different models compare in real-world accuracy, check out our breakdown of AI transcription accuracy.
Is OpenAI Whisper Free? Understanding Commercial Usage
This is one of the most common questions, and the answer is: it depends on how you use it.
The open-source Whisper model is free. You can download it from GitHub, run it locally, and process as much audio as your hardware can handle. No API key needed, no per-minute charges, no rate limits. The trade-off is that you need a machine with a decent GPU (at least 4GB VRAM for the small model, 10GB+ for large).
The Whisper API is paid. Every minute of audio processed through the API costs $0.006. There's no free tier for ongoing use -- new accounts get a small credit, but it runs out quickly with any real workload.
Cost Comparison: API vs. Self-Hosted
| Factor | Whisper API | Self-Hosted Whisper |
|---|---|---|
| Setup Cost | $0 | $50-500/month (GPU server) |
| Per-Minute Cost | $0.006 | ~$0 (after hardware) |
| Break-Even Point | N/A | ~140 hours/month |
| Maintenance | None | Updates, GPU management |
| Accuracy | High (large model) | Configurable (any model) |
| Rate Limits | Tier-based | None (limited by hardware) |
For most teams processing under 100 hours per month, the API is cheaper when you factor in server costs and engineering time. Above that threshold, self-hosting starts making financial sense -- but it brings operational complexity.
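The break-even figure in the table falls out of the flat hourly rate. A one-line sketch (function name is mine):

```python
def break_even_hours(monthly_server_cost: float, api_rate_per_hour: float = 0.36) -> float:
    """Monthly audio hours at which a self-hosted GPU server matches the API bill."""
    return monthly_server_cost / api_rate_per_hour
```

A $50/month GPU server breaks even near 139 hours of audio, which is where the ~140 hours/month figure comes from; a $500/month server would need nearly 1,400 hours.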
For a practical walkthrough of setting up local Whisper, see our guide on how to transcribe audio with Whisper.
Best Practices for Handling Large Audio Files in Production
Working within the 25MB limit in production takes some planning. Here's what actually works at scale, based on processing millions of minutes of audio through TranscribeTube.
Pre-Processing Pipeline
Before any file hits the Whisper API, run it through this pipeline:
- Format normalization: Convert to MP3 at 16kHz mono, 64kbps
- Size check: If under 25MB, send directly. If over, proceed to step 3
- Intelligent splitting: Split at silence points using VAD (Voice Activity Detection)
- Parallel transcription: Send all chunks concurrently (respect rate limits)
- Merge and post-process: Concatenate transcripts, fix chunk boundary artifacts
Audio Quality Optimization
Poor audio quality affects transcription accuracy more than any API limit. Before sending files to Whisper:
- Reduce background noise: Use a noise gate or spectral subtraction. FFmpeg's `anlmdn` filter works well for basic denoising.
- Normalize audio levels: Inconsistent volume causes Whisper to miss quieter sections. Run `ffmpeg -i input.mp3 -af loudnorm output.mp3`.
- Remove silence padding: Long silences waste processing time and cost. Trim them.
Error Handling for Production
The OpenAI Community Forum has multiple threads about the "request too large" error appearing even for files seemingly under 25MB. According to one community thread, this typically happens when multipart form encoding adds overhead to the request size. Keep files at 24MB or below to avoid edge cases.
Also monitor for:
- 413 Payload Too Large: File exceeds 25MB limit
- 429 Rate Limited: Too many requests per minute
- 500 Internal Server Error: Retry with exponential backoff
- Timeout errors: Long files may timeout; chunk them shorter
Pro tip: We've found that providing the language parameter when you know the source language (e.g., language="en") improves both speed and accuracy. Without it, Whisper auto-detects the language from the first 30 seconds of audio -- an extra step that's wasted when you already know the answer.
For podcast workflows specifically, you can check our detailed guides on transcribing Spotify podcasts and Apple Podcasts.
Whisper API vs Azure AI: Which Should You Choose in 2026?
If you're hitting Whisper API limits regularly, Azure AI Speech Services is another way to deploy the same Whisper model, with different trade-offs. According to Microsoft's documentation, "the file size limit for the Azure OpenAI Whisper model is 25 MB" -- so the file size limit is the same.
Here's a head-to-head comparison:
| Feature | OpenAI Whisper API | Azure AI Speech (Whisper) |
|---|---|---|
| File Size Limit | 25 MB | 25 MB |
| Batch Transcription | No | Yes |
| Real-Time Streaming | No | Yes (preview) |
| SLA | No formal SLA | 99.9% uptime SLA |
| Data Residency | US-based | Regional deployment |
| Pricing | $0.006/min | Pay-as-you-go (varies by region) |
| HIPAA Compliance | No | Available |
| Model Options | whisper-1 only | Multiple Whisper versions |
When to Choose Azure Over OpenAI Direct
Pick Azure if you need:
- Batch transcription for processing large volumes of pre-recorded files asynchronously
- Data residency requirements (GDPR, data sovereignty)
- Enterprise SLA with guaranteed uptime
- HIPAA compliance for healthcare transcription
Pick OpenAI direct if you need:
- Simpler integration with fewer configuration steps
- Lower barrier to entry for prototyping
- Consistent pricing without regional variations
Some teams reported on Reddit that alternative services support much higher file size limits (up to 600 MB), which could be worth exploring if the 25MB cap is your main bottleneck.
For a broader comparison of speech-to-text options beyond Whisper, see our best speech-to-text API comparison.
Tools Mentioned in This Guide
| Tool | Purpose | Cost | Best For |
|---|---|---|---|
| OpenAI Whisper API | Cloud speech-to-text | $0.006/min | Teams processing < 100 hrs/month |
| FFmpeg | Audio conversion and splitting | Free (open source) | File preparation and compression |
| TranscribeTube | Managed transcription platform | See pricing page | Teams wanting zero-config transcription |
| Azure AI Speech | Enterprise Whisper deployment | Pay-as-you-go | Enterprise with compliance needs |
Frequently Asked Questions About OpenAI Whisper API Limits
What is the Whisper limit in OpenAI?
The Whisper API enforces a 25MB file size limit per request. There's no explicit audio duration limit -- the restriction is purely on file size. A compressed MP3 at 64kbps can hold roughly 50 minutes of audio within 25MB, while an uncompressed WAV would max out in about 2 minutes. Rate limits range from 3 RPM (free tier) to 2,000+ RPM (Tier 5), depending on your account's cumulative spending.
What are the limitations of Whisper AI?
Beyond the 25MB file size limit, Whisper has several practical limitations. It doesn't support real-time streaming -- you must upload complete files. Heavily accented speech and less common languages produce lower accuracy. Background noise degrades results, though Whisper handles moderate noise well thanks to its training on 680,000 hours of diverse audio data. The API also lacks speaker diarization (identifying who said what), which requires additional processing. For that capability, see our guide on AI transcription with speaker identification.
How many parameters does OpenAI Whisper have?
The Whisper model family ranges from 39 million parameters (tiny) to 1.55 billion parameters (large). The API uses the large variant for maximum accuracy. Self-hosted users can choose smaller models for faster processing -- the tiny model runs roughly 32x faster than large, making it suitable for real-time applications where speed matters more than perfect accuracy.
What is the OpenAI Whisper file size limit?
25 MB per upload. This applies to the raw file being sent in the multipart form request. The actual multipart encoding adds a small overhead, so keeping files at 24MB or below is safer. If your file exceeds this limit, compress it (convert to MP3 at 64kbps) or split it into smaller chunks using FFmpeg.
Is OpenAI Whisper free?
The open-source Whisper model is completely free to download and run locally. The Whisper API charges $0.006 per minute of audio processed. New OpenAI accounts receive a small credit, but it depletes quickly with regular use. For teams processing more than 140 hours of audio monthly, self-hosting on a GPU server becomes more cost-effective than the API.
How to transcribe large audio files with Whisper API?
Split the file into chunks under 25MB using FFmpeg: `ffmpeg -i large-file.mp3 -f segment -segment_time 600 -c copy chunk_%03d.mp3`. Then transcribe each chunk separately through the API and concatenate the results in order. For production environments, use silence-based splitting to avoid cutting words mid-sentence, and process chunks in parallel to reduce total processing time. Alternatively, use a managed service like TranscribeTube that handles large file processing automatically.
Key Takeaways
The Whisper API's limits are straightforward once you know them: 25MB file size, tier-based rate limits, $0.006/min pricing, seven supported audio formats. The 25MB restriction is the one you'll run into most often, and the solution is always some combination of compression and file splitting.
For teams just getting started, the API is the fastest path to accurate transcription. As your volume grows, evaluate self-hosting or managed alternatives that handle large files without manual chunking.
If you're building a transcription workflow and want to skip the infrastructure work, TranscribeTube's audio to text converter handles file size limits, format conversion, and large file splitting automatically -- so you can focus on what you're actually building.