MP3, WAV, M4A, FLAC, OGG, AAC, WMA, Opus
Every common audio format, one transcriber
This page accepts the audio formats people actually have: MP3 and AAC from podcasts and downloads, WAV and AIFF from recording gear, M4A from iPhone and Android voice memo apps, FLAC from archival rips, OGG and Opus from messaging apps, and WMA from older Windows recorders. There is no separate converter step — the transcriber decodes the file itself, so you never need to convert WAV to MP3 first.
The spoken language is detected automatically from among 90+ languages. If your recording mixes languages and detection guesses wrong, you can pin the language in the options.
From recording to usable text
Once processed, the transcript opens in an editor with per-sentence timestamps. Click any line to correct it, search for a phrase, then export: TXT for documents and notes, SRT or VTT if the audio belongs under a video. Signed-in users also get DOCX and PDF for sharing polished documents.
Frequently asked questions
Which audio formats are supported?
MP3, WAV, M4A, FLAC, OGG, AAC, WMA, Opus, AIFF, and AMR. If your recorder produces something rarer, try uploading the original first, since the decoder covers most containers. If that fails, converting to WAV or MP3 with any free converter should work.
Will noisy or low-quality recordings work?
Usually, with caveats. The model is trained on a lot of imperfect real-world audio, so hiss, room echo, and compression artifacts rarely break it. What genuinely hurts accuracy is speech that is quiet relative to the noise — a recorder far from the speaker, or music louder than the voice. If you can, record closer to the source; the editor makes fixing the remaining errors quick.
Can it handle multiple speakers on one track?
Yes — everyone's words are transcribed regardless of how many people speak. On the free tier the transcript doesn't say who said what; speaker labels (diarization) are a Pro feature that tags each segment SPEAKER 1, SPEAKER 2, etc., and lets you rename them in the editor.
Do I get timestamps?
Yes. Every segment carries start and end times. In the editor you can toggle timestamp display on or off, and the SRT/VTT exports embed them in standard subtitle timing format. The TXT export is clean prose without timestamps.
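As a rough illustration of what "standard subtitle timing format" means, the sketch below turns a segment's start and end times into SRT and VTT timestamps. SRT writes HH:MM:SS,mmm with a comma before the milliseconds, while WebVTT uses a period. The function names and the (start, end, text) tuple layout are assumptions made for this example, not the exporter's actual code.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def vtt_timestamp(seconds: float) -> str:
    """WebVTT uses the same fields but a period before the milliseconds."""
    return srt_timestamp(seconds).replace(",", ".")


def to_srt(segments) -> str:
    """Build SRT text from (start_seconds, end_seconds, text) tuples.

    The tuple layout is hypothetical; it only mirrors the idea that
    every segment carries a start and an end time.
    """
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    # Cue blocks are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([(0.0, 2.5, "Hello"), (2.5, 5.0, "and welcome back.")]))
```

A VTT file would additionally start with a WEBVTT header line before the first cue.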
Can I edit the transcript before exporting?
Yes — that's the default workflow. The transcript opens in an editor where each segment is click-to-edit with autosave. Fix names, jargon, or misheard words, then export; your edits are included in every format.
How long can my audio file be?
Anonymous: 30 minutes and 100 MB per file, 3 files a day. Free account: 1 hour and 500 MB per file, 5 files a day. Pro or a credit pack: up to 10 hours and 5 GB per file. Duration is usually the limit you reach first: a long, heavily compressed MP3 can exceed the time cap while staying far below the size cap.
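If you want to check a file against these limits before uploading, a minimal sketch could look like the following. The tier names, the check_upload helper, and the choice to count 5 GB as 5000 MB are all assumptions made for illustration; the sketch also assumes that both the duration cap and the size cap apply to each file. It does not model the daily file count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    max_minutes: int
    max_megabytes: int


# Per-file limits as listed in the answer above.
# Tier keys are hypothetical labels; 5 GB is assumed to mean 5000 MB.
LIMITS = {
    "anonymous": Limits(max_minutes=30, max_megabytes=100),
    "free": Limits(max_minutes=60, max_megabytes=500),
    "pro": Limits(max_minutes=600, max_megabytes=5000),
}


def check_upload(tier: str, duration_minutes: float, size_mb: float) -> list:
    """Return a list of reasons the file exceeds the tier's per-file limits.

    An empty list means the file fits within both caps.
    """
    limits = LIMITS[tier]
    problems = []
    if duration_minutes > limits.max_minutes:
        problems.append(
            f"duration {duration_minutes:g} min exceeds {limits.max_minutes} min"
        )
    if size_mb > limits.max_megabytes:
        problems.append(
            f"size {size_mb:g} MB exceeds {limits.max_megabytes} MB"
        )
    return problems


if __name__ == "__main__":
    # A 45-minute MP3 of about 40 MB: small file, but too long for anonymous use.
    print(check_upload("anonymous", 45, 40))
    # A 10-minute WAV of about 450 MB fits within the free-account limits.
    print(check_upload("free", 10, 450))
```

The first call shows the case described above: the file is well under the size cap but still rejected on duration.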