This page is pinned to Mandarin (zh): 普通话/国语 as spoken in mainland China, Taiwan, Singapore, and Malaysia. Standard Mandarin sits in our high accuracy tier, and that includes accented Mandarin from southern speakers.
Cantonese is NOT a Mandarin dialect for speech-recognition purposes — it's a separate language with its own code (yue), its own vocabulary, and its own written form. A Cantonese recording pushed through this page will produce garbage-to-mediocre Mandarin text. We deliberately don't index a Cantonese page yet because output quality doesn't meet our bar; for Cantonese audio, expect to look at dedicated Cantonese tooling.
Simplified output, punctuation, and mixed speech
Transcripts are produced in simplified characters by default (the dominant training convention) with full-width Chinese punctuation (,。?) inserted. Chinese is written without spaces, and the output follows that. Taiwanese Mandarin speakers should note that the transcript will typically be simplified, not traditional. Converting to traditional is a one-click step in many tools, though not a perfectly lossless one: a handful of simplified characters map to more than one traditional character, so phrase-aware converters give the best results. English words and brand names inside Mandarin speech are transcribed inline in Latin script.
Frequently asked questions
Does this work for Cantonese?
No — and it's important to know why: Cantonese (yue) is a different language from Mandarin (zh), not an accent. This page pins Mandarin. Cantonese audio here will come out badly. The underlying model has some Cantonese ability under its separate yue code, but we don't consider it good enough to ship a page for yet.
Simplified or traditional characters in the output?
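Simplified, as a rule. That's the dominant convention in the model's Mandarin training data, even for Taiwanese speakers. If you need traditional characters, run the exported TXT through a simplified→traditional converter. The conversion is mostly mechanical, but this is the ambiguous direction: several simplified characters correspond to more than one traditional character (后 is 後 "after" or 后 "empress"; 发 is 發 "to send" or 髮 "hair"). Prefer a phrase-aware converter such as OpenCC, and spot-check names and rare terms.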
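To show why phrase awareness matters, here is a minimal sketch of a simplified→traditional converter that tries phrase matches before falling back to single characters. The two lookup tables are tiny, hand-picked illustrations (not a real dictionary and not what this page runs); production converters use much larger phrase tables.

```python
# Toy lookup tables, for illustration only. Real converters ship
# dictionaries with many thousands of phrase entries.
PHRASES = {"头发": "頭髮", "发现": "發現", "皇后": "皇后", "以后": "以後"}
CHARS = {"头": "頭", "发": "發", "现": "現", "后": "後"}


def naive_convert(text: str) -> str:
    """Character-by-character mapping; gets ambiguous characters wrong."""
    return "".join(CHARS.get(ch, ch) for ch in text)


def phrase_convert(text: str) -> str:
    """Greedy longest-match: try known phrases first, then single characters."""
    max_len = max(len(p) for p in PHRASES)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate phrase first, down to length 2.
        for size in range(min(max_len, len(text) - i), 1, -1):
            chunk = text[i:i + size]
            if chunk in PHRASES:
                out.append(PHRASES[chunk])
                i += size
                break
        else:
            # No phrase matched here: fall back to the per-character table.
            out.append(CHARS.get(text[i], text[i]))
            i += 1
    return "".join(out)


print(naive_convert("头发"))        # 頭發 (wrong: 發 is "to send", not "hair")
print(phrase_convert("头发"))       # 頭髮
print(phrase_convert("皇后以后"))   # 皇后以後
```

The naive version picks one traditional form per character regardless of context, which is exactly where conversion errors come from; the phrase table resolves the cases it knows about and leaves everything else to the character fallback.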
How accurate is Mandarin transcription?
High tier. Clear Mandarin (broadcast, meetings, lectures) transcribes very reliably. That includes choosing between homophones and syllables that differ only in tone, because the model picks characters from context rather than from sound alone. Strong regional accents (Sichuan-flavored Mandarin, Taiwanese-accented Mandarin) still work; actual regional languages (Shanghainese, Hokkien) do not.
How does punctuation and spacing work in the transcript?
Chinese is written without spaces, and the transcript follows suit, with full-width punctuation (,。!?、) inserted at natural boundaries. Segment/SRT cue splits land on phrase boundaries, so subtitles read naturally rather than breaking mid-word.
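If you post-process an exported TXT yourself, the same idea can be approximated by splitting only at full-width punctuation. The sketch below is an assumption about how one might do this, not the segmenter this page uses; the function name split_cues and the max_chars limit are hypothetical.

```python
import re

# Full-width punctuation treated as a safe place to end a cue.
PUNCT = ",。!?、;"

# A piece is a run of text ending in punctuation, or trailing text without any.
PIECE = re.compile(rf"[^{PUNCT}]*[{PUNCT}]+|[^{PUNCT}]+")


def split_cues(text: str, max_chars: int = 16) -> list[str]:
    """Group punctuation-delimited pieces into cues of roughly max_chars.

    A single piece longer than max_chars is kept whole, so a cue never
    breaks in the middle of a phrase.
    """
    cues: list[str] = []
    current = ""
    for piece in PIECE.findall(text):
        if current and len(current) + len(piece) > max_chars:
            cues.append(current)
            current = ""
        current += piece
    if current:
        cues.append(current)
    return cues


transcript = "大家好,今天我们讨论部署流程。首先看一下配置文件,然后再上线。"
for cue in split_cues(transcript):
    print(cue)
# 大家好,今天我们讨论部署流程。
# 首先看一下配置文件,然后再上线。
```

Because cues only ever end where punctuation was inserted, joining them back together reproduces the original transcript exactly.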
Mandarin audio to English text — possible?
This page transcribes Mandarin into Chinese text; that's the high-accuracy step. For English, export TXT and machine-translate: zh→en translation on clean text is strong, and the two-step result usually beats one-shot speech translation. Built-in transcript translation is planned.
What about English words mixed into Mandarin speech?
Very common in tech and business Mandarin, and handled well: English terms, acronyms, and brand names are transcribed inline in Latin script (e.g. 我们用 Kubernetes 部署). Long English-only passages transcribe better via the auto-detect uploader on the home page.
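Since the English terms arrive in Latin script, they are easy to pull out of an exported transcript, for example to build a glossary or to spot-check brand names. A minimal sketch, assuming plain exported text; the helper name latin_terms and the exact pattern are illustrative choices, not part of this page's tooling.

```python
import re

# A Latin-script term: starts with a letter, may contain digits, and may
# include internal hyphens or dots (e.g. "Node.js", "gRPC-web").
LATIN_TERM = re.compile(r"[A-Za-z][A-Za-z0-9]*(?:[-.][A-Za-z0-9]+)*")


def latin_terms(text: str) -> list[str]:
    """Return Latin-script terms in the order they appear in the transcript."""
    return LATIN_TERM.findall(text)


print(latin_terms("我们用 Kubernetes 部署,API 跑在 AWS 上"))
# ['Kubernetes', 'API', 'AWS']
```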