Tag Archives: Japanese

How the Japanese communicate, part 1: Alphabet

This is the first of a few articles introducing the way Japanese people communicate with each other. They won’t teach you how to read, write or speak Japanese but they should hopefully be enlightening for people with no previous knowledge. Let’s kick off by looking at the alphabets (yes, plural) that Japanese people use.

Just a bunch of squiggles

[Photo: sumo flags]

Unlike many others, I wasn’t drawn to Japanese by manga, anime or robots. It was sumo wrestling and squiggles. At first sight, Japanese looked like random squiggles, but the thought of being able to glance at it and actually know what’s written seemed so cool. I’d be like a spy cracking a secret code! Even now, many years later, I still get a thrill from seeing a bunch of squiggles and understanding (most of) it.

Characters all the way down

It’s commonly said that Japanese has thousands of characters. Yep. But that’s only in one alphabet. There are two further alphabets with far fewer characters. Here’s how they break down:

Alphabet name | No. of characters | Style | Example: kuruma (car)
Hiragana | 48 | curvy | くるま
Katakana | 48 | angular | クルマ
Kanji | Up to 50,000, but 2,000-3,000 used daily | fancy | 車

Let’s look at these one by one.


Hiragana

[Photo: hiragana characters]

Japan is still a pretty sexist country but 1,300 years ago it was much worse. Women were allowed only a lower level of education than men and consequently didn’t use the thousands of Chinese characters (kanji) that men did. Instead, they started using a cursive, simplified form of a handful of kanji to represent sounds, ignoring their meaning. For instance, to write kumo (cloud), they could have used derivations of 久 (long time) and 毛 (hair, fur) which, over time, became the hiragana く (ku) and も (mo). This spread from ladies of the imperial court so that eventually men also used hiragana for unofficial writing. It’s now the foundation for reading and writing in modern Japan, so if you only learn one Japanese alphabet, make it hiragana.


Katakana

Collectively known as kana, hiragana and katakana are phonetic alphabets and it’s no coincidence that they both have exactly the same number of characters (48): each hiragana has a katakana counterpart with the same pronunciation, and sometimes the two even look similar. For example, the hiragana か (ka) is カ (ka) in katakana. This can be confusing but katakana are easy to spot because they look very angular whereas hiragana have curves.

But why have two similar alphabets at all? Good question. Katakana developed at about the same time as hiragana but are now used when writing foreign words, primarily from Western cultures, such as ロケット (roh-ke-tto) for rocket. They are also used for onomatopoeia and for adding emphasis or a “cool” factor. For example, the word for fashionable is オシャレ (o-sha-re) which, despite being an originally Japanese word, is often written in katakana to seem, well, fashionable.


Kanji

The two characters for the word kanji (漢字) mean “Chinese character” and that is indeed where they came from. Korean also uses various Chinese characters. However, because of a decree by Mao Zedong in the 1950s to simplify characters to improve literacy in China, modern Chinese characters (except in places such as Taiwan and Hong Kong) can be quite different to those found in Korean and Japanese.

Pronunciation is also a little complex. Some kanji retain their sound from Chinese, e.g. three (三) is pronounced san in both Mandarin Chinese and Japanese. Some have evolved slightly over the centuries, e.g. famous (有名) is you-ming in Mandarin Chinese and yuu-mei in Japanese. In other cases there seems to be no similarity. And as if that’s not enough, most kanji have at least two correct ways of pronouncing them in Japanese. More about that in a later article.

But despite these challenges and having thousands of them to learn, for me, kanji are the most beautiful aspect of the Japanese language both in their form and their logic. They range from very simple to very complex and can be works of art in themselves. Some are figurative, for example the character for tree (木) looks sort of like a tree, and some are clearly abstract. Characters can be combined to make a more complex character and can be chained together, most commonly in pairs, to make words. Let’s see a quick example of this.

  • The kanji for sun is 日 (pronounced hi, ka, nichi or ni).
  • We’ve already met the kanji for tree (木) but if we add a small line at the bottom to make 本 it now means root or origin (or book, incidentally).
  • If we chain these two kanji together we get 日本 (ni-hon) meaning origin of the sun, or land of the rising sun, i.e. Japan.
  • Adding the delightfully simple kanji for person (人) to the end gives us 日本人 (ni-hon-jin), meaning Japanese person.

Isn’t that beautiful?


Is there an “alphabetical order” for hiragana and katakana?
Yes. The hiragana/katakana order sounds like this: a i u e o, ka ki ku ke ko, sa shi su se so, etc. Notice a pattern?
Is there an “alphabetical order” for kanji?
Not really. They can be ordered by their number of strokes (how many lines each one has) and there’s also a fixed order that children learn them in at school.
Speaking of which, at what age do children learn the alphabets?
Hiragana and katakana are learnt first at around age five or six. Children then spend several years learning the kanji — 1,006 in elementary school and another 1,130 in junior high and high school.
How do children learn them all? Are they geniuses?
Hah, they wish! They have to sit and memorise and test and moan just like kids all over the world.
Is there an ABC song equivalent?
Unfortunately not. Feel free to write one.


Photo of sumo flags by David Steadman: flickr.com/photos/90949166@N00/4274125768/
Photo of hiragana by Antonio González Tajuelo: flickr.com/photos/antoniotajuelo/4782563135

How to show Japanese text in Evince

A quick tip that might help others (or me, the next time I forget)…

I found that Evince, Ubuntu’s default PDF viewer, doesn’t display Japanese characters, at least on my non-Japanese system. After a quick search it seems the answer is nice and simple – install the poppler-data package.

So, either use the Synaptic Package Manager or the following terminal command:

sudo apt-get install poppler-data

And that’s it!

This should work for displaying Chinese and Korean characters as well.

The HTML5 <ruby> element in words of one syllable or less

Opera colleague Bruce Lawson thought it might be spiffing if the description of the <ruby> element that appears in the HTML5 spec was clarified a bit, so here’s my attempt. I’m using Japanese as an example although it applies to Chinese and possibly other languages as well. Please note my definition of one syllable may differ from yours.

Step 1: The Japanese Writing System

In Japanese there are three alphabets—one semantic (thousands of characters) and two phonetic (roughly 50 each).

The semantic alphabet is called kanji, based on traditional Chinese characters, and each character has a meaning (although sometimes quite abstract). When you read words written with kanji you may understand their meaning but there are no clues as to their pronunciation.

日 = sun
本 = origin
日本 = land of the rising sun = Japan

The first phonetic alphabet is hiragana. This comes from highly stylised Chinese characters developed around the 5th century when only men were educated enough (or deemed intelligent enough) to use Chinese characters (kanji). Literate ladies of the ruling class used the easy-to-write hiragana, each representing one syllable and having no meaning, to write letters, poetry and novels (the original chick lit). When you read words written with hiragana you can pronounce them but you may not know their meaning.

に = ni
ほ = ho
ん = n
にほん = nihon = Japan

The second phonetic alphabet is katakana. It was apparently developed by monks, who also used Chinese characters as the basis for a highly simplified alphabet. Whereas hiragana characters are more rounded, katakana characters are sharper and more angular. They are used primarily for foreign words that don’t have a Japanese translation (e.g. “browser”) or for making a word stand out or appear modern. Like hiragana, when you read words written with katakana you can pronounce them but you may not know their meaning.

ニ = ni
ホ = ho
ン = n
ニホン = nihon = Japan

Any piece of Japanese text (banner ad, article, legal doc, etc.) uses a combination of kanji, hiragana and katakana. Sometimes readers can’t read the kanji, especially because kanji characters can have more than one pronunciation. Names of people and places are one example of kanji having numerous or irregular pronunciations.

日 = can be pronounced "nichi", "hi" or "ka"
本 = can be pronounced "hon" or "moto"
日本 = can be pronounced "nihon" or "nippon" = Japan

Step 2: What does this have to do with a pink gemstone?

To help the reader, sometimes the pronunciation is written above the kanji using the hiragana alphabet. This is called furigana in Japanese and ruby in English (from the name of small type with a height of 5.5 points). It is often used in newspapers and books but not so much on websites, due to the difficulty of squeezing miniature text above larger text on a single line. The <ruby> element aims to solve this.

Step 3: Tell us how it works

According to the current HTML5 spec, the <ruby> element is an inline element and is placed around the word or character you’d like to clarify, like so:
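
A minimal sketch of that first step, assuming the word 日本 (Japan) from Step 1 as the text to clarify and a plain paragraph as its container (both are illustrative choices):

```html
<!-- The word to clarify is wrapped in <ruby>; no pronunciation is attached yet -->
<p><ruby>日本</ruby></p>
```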


By itself this does nothing, so we add the pronunciation either for each character or, as in this case and my personal preference, for the word as a whole. For this, we use the <rt> tag, meaning ruby text.
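
As a sketch, assuming 日本 as the example word and its hiragana reading にほん from Step 1, the two options might look like this (the surrounding paragraphs are only for illustration):

```html
<!-- Reading attached to the word as a whole -->
<p><ruby>日本<rt>にほん</rt></ruby></p>

<!-- Alternative: one reading per character (日 = に, 本 = ほん) -->
<p><ruby>日<rt>に</rt>本<rt>ほん</rt></ruby></p>
```

Each <rt> annotates the base text that comes directly before it inside the same <ruby> element.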


We could leave it like that and supporting browsers would show the hiragana pronunciation above the kanji text, but non-supporting browsers would ignore the tags and show both the text and its pronunciation side-by-side. To solve this, the masters of the HTML5 universe have given us another tag, <rp> meaning ruby parentheses, which cleverly hides characters (namely parentheses) in supporting browsers. This means we can write the pronunciation in parentheses which non-supporting browsers will show, and supporting browsers will continue to show the pronunciation without parentheses above the main text.
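
A minimal sketch of the fallback, again assuming 日本 and its reading にほん as the example (the paragraph wrapper is illustrative):

```html
<!-- <rp> wraps the fallback parentheses placed either side of the <rt> reading -->
<p><ruby>日本<rp>(</rp><rt>にほん</rt><rp>)</rp></ruby></p>
```

A browser without ruby support simply ignores the unknown tags and shows the text content in order, so the reader sees 日本(にほん).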


Step 4: Say what?

  • Supporting browsers → ruby text is shown above main text
  • Non-supporting browsers → ruby text is shown next to main text but in parentheses.

[Image: text using the HTML5 ruby element]

Overall, the <ruby> element is a great help not just for learners of Japanese (or Chinese, etc.) but also for when uncommon characters are used in a piece of text. It could also be used in any language simply for clarifying a term or unfamiliar concept in an unobtrusive and accessible manner.
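
For instance, a hedged sketch in English, assuming an abbreviation as the term to clarify (the sentence and the choice of abbreviation are illustrative, not taken from the spec):

```html
<!-- The expansion appears as ruby text above the abbreviation, or in parentheses as a fallback -->
<p>This page is written in <ruby>HTML<rp> (</rp><rt>HyperText Markup Language</rt><rp>)</rp></ruby>.</p>
```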

For reference:

HTML5 Doctor Oli Studholme has written a much more thorough <ruby> explanation with examples of usage in other languages.

Also, see this page for an example of how hiragana.jp uses the <ruby> element (and CSS for non-supporting browsers) to provide ruby text above all kanji words. Perfect for people learning to read Japanese. Note that they also use the <rb> tag to mark up the kanji words but at the time of writing this is not officially part of HTML5.

How to input Japanese in Opera on Linux

UPDATE: As of 2010, Ubuntu and Fedora both use iBus and I must say this is a great addition to the IME landscape. As Jacob7908 suggested in the comments, this is my new recommendation.
Like many Linux users I’ve never really had much luck with SCIM and Opera but I’m pleased to say I’ve finally found a solution: use UIM.

I’ve seen suggestions to install scim-bridge-qt (most promising), uninstall scim-bridge and start a second instance of SCIM just for Opera. None of these worked for me but I found a suggestion to use UIM instead which did the trick. Here’s what you do:

1. Install Anthy and UIM

2. Add this to the top of your .xinitrc file or equivalent (e.g. .xsession) in your home directory.

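# Route GTK and Qt applications through UIM
export GTK_IM_MODULE="uim"
export QT_IM_MODULE="uim"
# Start UIM's XIM bridge and point XIM clients at it
uim-xim &
export XMODIFIERS="@im=uim"
# Optional: toolbar for switching modes and editing preferences
uim-toolbar-gtk &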

The optional last line launches a toolbar which you right-click to edit preferences, for example your key-bindings.

3. Logout and login again, launch Opera and enjoy.

If you have a Japanese keyboard you may need to add this line to the top of the .xinitrc file to make UIM detect the Hankaku-Zenkaku key on the left.

This works at least on Arch Linux. Please leave feedback if it doesn’t work for your distro or if you have other suggestions.