Japanese Text Converter

Overview

This tool takes Japanese text containing kanji, hiragana, or katakana and converts it to hiragana, katakana, or romaji. Conversion runs on the server, using the Kuroshiro library with the Kuromoji morphological analyzer. Four conversion modes control how much pronunciation information is added alongside or in place of the original characters.

What each conversion mode actually outputs

Normal replaces every character directly. 東京に行きます → とうきょうにいきます (hiragana target). No spaces, no original kanji retained.

Segmented inserts spaces at word boundaries. The same input becomes とうきょう に いきます, which is useful when you need to see where one word ends and the next begins.

Okurigana keeps the kanji and adds the reading of the kanji stem in parentheses, leaving the trailing kana (the okurigana) as written. 食べます → 食(た)べます. The kanji itself stays, and the part that changes with conjugation is already in kana, so it needs no annotation.

Furigana annotates every kanji with its full reading in parentheses. 東京に行きます → 東京(とうきょう)に行(い)きます. This is what textbook ruby annotation looks like in plain text.
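The relationship between the modes can be sketched in a few lines of Python. This is an illustration only, not the tool's implementation: it assumes the analyzer has already split the input into words with hiragana readings, and the names Token, annotate, and render are hypothetical. Because the okurigana and furigana examples above share the same parenthetical shape, a single "annotated" helper stands in for both.

```python
from dataclasses import dataclass

# Offset between the katakana and hiragana Unicode blocks (ァ U+30A1, ぁ U+3041).
KATA_TO_HIRA = 0x60


def to_hiragana(text: str) -> str:
    """Shift katakana letters (ァ..ヶ) to hiragana; leave everything else alone."""
    return "".join(
        chr(ord(ch) - KATA_TO_HIRA) if "ァ" <= ch <= "ヶ" else ch
        for ch in text
    )


def is_kana(ch: str) -> bool:
    return "ぁ" <= ch <= "ゖ" or "ァ" <= ch <= "ヶ" or ch == "ー"


@dataclass
class Token:
    surface: str  # the word as written, e.g. "行きます"
    reading: str  # its hiragana reading, e.g. "いきます"


def annotate(token: Token) -> str:
    """Keep the kanji stem, add its reading in parentheses, keep trailing kana."""
    if all(is_kana(ch) for ch in token.surface):
        return token.surface
    # Count the trailing kana that the surface form and the reading share.
    n = 0
    while (n < len(token.surface) - 1 and n < len(token.reading) - 1
           and is_kana(token.surface[-1 - n])
           and to_hiragana(token.surface[-1 - n]) == token.reading[-1 - n]):
        n += 1
    cut = len(token.surface) - n
    stem, tail = token.surface[:cut], token.surface[cut:]
    stem_reading = token.reading[:len(token.reading) - n]
    return f"{stem}({stem_reading}){tail}"


def render(tokens: list[Token], mode: str) -> str:
    if mode == "normal":      # readings only, no spaces
        return "".join(t.reading for t in tokens)
    if mode == "segmented":   # readings separated at word boundaries
        return " ".join(t.reading for t in tokens)
    if mode == "annotated":   # kanji kept, readings in parentheses
        return "".join(annotate(t) for t in tokens)
    raise ValueError(f"unknown mode: {mode}")


# Hand-written tokens standing in for the analyzer's output.
tokens = [Token("東京", "とうきょう"), Token("に", "に"), Token("行きます", "いきます")]
print(render(tokens, "normal"))     # とうきょうにいきます
print(render(tokens, "segmented"))  # とうきょう に いきます
print(render(tokens, "annotated"))  # 東京(とうきょう)に行(い)きます
```

The sketch makes one point visible: all modes start from the same word-level analysis, so a wrong reading in one mode will be wrong in the others too.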

Choosing a romaji system

When the target is romaji, three spelling standards are available:

Hepburn (most common)

  • Closest to English phonics
  • ちょっと → chotto, 新幹線 → shinkansen
  • Basis of the spelling used on Japanese passports, and still standard for place names and textbooks internationally

Nippon / Passport

  • Nippon: maps strictly to the 50-sound chart — ちょっと → tyotto, さ行 → sa/si/su/se/so
  • Passport: the Hepburn-based spelling used on Japanese passports. It handles long vowels differently, e.g. 大野 → Ono (or Ohno) rather than Oono
  • Nippon-style is rarely encountered outside academic linguistics
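To make the Hepburn and Nippon difference concrete, here is a minimal Python sketch. It is not the engine's romanization code: the tables cover only the kana used in the examples above, the function name romanize is hypothetical, and Passport-style long-vowel handling is not modeled.

```python
# Kana spelled the same in both systems, plus the spellings where they differ.
# Only the kana needed for the examples; a full table covers the whole chart.
SHARED = {"さ": "sa", "す": "su", "せ": "se", "そ": "so", "と": "to"}
DIFFERENT = {
    "hepburn": {"し": "shi", "ちょ": "cho"},
    "nippon": {"し": "si", "ちょ": "tyo"},
}


def romanize(kana: str, system: str) -> str:
    """Romanize hiragana with the tiny tables above (unknown kana raise KeyError)."""
    table = {**SHARED, **DIFFERENT[system]}
    out = []
    double_next = False  # set when the previous character was a small っ
    i = 0
    while i < len(kana):
        if kana[i] == "っ":
            double_next = True
            i += 1
            continue
        # Prefer a two-character match so ちょ is read as one syllable.
        chunk = kana[i:i + 2] if kana[i:i + 2] in table else kana[i]
        syllable = table[chunk]
        if double_next:
            # っ doubles the next consonant; Hepburn writes it as t before ch.
            if system == "hepburn" and syllable.startswith("ch"):
                syllable = "t" + syllable
            else:
                syllable = syllable[0] + syllable
            double_next = False
        out.append(syllable)
        i += len(chunk)
    return "".join(out)


print(romanize("ちょっと", "hepburn"))  # chotto
print(romanize("ちょっと", "nippon"))   # tyotto
print([romanize(k, "nippon") for k in "さしすせそ"])  # ['sa', 'si', 'su', 'se', 'so']
```

The two systems share most of the table; they diverge only on the handful of syllables where Japanese pronunciation departs from the regular consonant-plus-vowel grid.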

Okurigana and furigana modes are of limited use with romaji output: the parenthetical readings are romanized, but the kanji they annotate stay as kanji. Hiragana is the more useful target when annotating.

When kanji readings are wrong

The engine uses the Kuromoji dictionary for morphological analysis. Common vocabulary and standard grammar patterns usually parse correctly. Errors cluster around:

  • Homographs (今日 is きょう when it means "today" but こんにち in the greeting 今日は): context usually resolves these, but short inputs give less context
  • Proper nouns — place names and personal names often have non-standard readings the dictionary doesn't cover
  • Neologisms and internet vocabulary — recent coinages may be missing from the dictionary

If a reading looks wrong, switching to segmented mode lets you see the word boundaries the engine chose, which often reveals where the misparse happened.
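When the segmented output is long, comparing it word by word against the reading you expected can locate the misparse quickly. The helper below is a small sketch of that comparison; the function name first_divergence is hypothetical, and the "actual" string is an invented example of a misparse, not real output from the tool.

```python
from itertools import zip_longest


def first_divergence(actual: str, expected: str):
    """Return (index, actual_word, expected_word) for the first differing word, or None."""
    pairs = zip_longest(actual.split(), expected.split())
    for i, (a, e) in enumerate(pairs):
        if a != e:
            return i, a, e
    return None


# Invented example: 今日はいい天気です read as the greeting instead of "today".
actual = "こんにちは いい てんき です"
expected = "きょう は いい てんき です"
print(first_divergence(actual, expected))  # (0, 'こんにちは', 'きょう')
```

The first differing word is usually where the engine chose a different word boundary, so that is the place to rephrase the input or add context.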

Getting clean output for different purposes

  • Input method practice: use normal mode → hiragana. Short sentences work better than paragraph-length blocks.
  • Textbook annotation: furigana mode → hiragana gives the ruby-style markup. Longer texts produce more context, which improves homograph accuracy.
  • Non-Japanese readers: normal mode → romaji (Hepburn), or segmented mode when spaces at word boundaries make the text easier to read.
  • Checking a specific kanji reading: paste just that word, not the full sentence. With no surrounding tokens, the output shows the dictionary's default reading for that word. For a context-dependent homograph, also check the word inside its sentence.