A better way to input Vietnamese

Earlier this year I had the pleasure of implementing for Firefox OS an input method for Vietnamese (a language I have some familiarity with). After being dissatisfied with the Vietnamese input methods on other smartphones, I was eager to do something better.

I believe Firefox OS is now the easiest smartphone on the market for out-of-the-box typing of Vietnamese.

The Challenge of Vietnamese

Vietnamese uses the Latin alphabet, much like English, but it has an additional 7 letters with diacritics (Ă, Â, Đ, Ê, Ô, Ơ, Ư). In addition, each word can carry one of five tone marks. The combination of diacritics and tone marks means that the character set required for Vietnamese gets quite large. For example, there are 18 different Os (O, Ô, Ơ, Ò, Ồ, Ờ, Ỏ, Ổ, Ở, Õ, Ỗ, Ỡ, Ó, Ố, Ớ, Ọ, Ộ, Ợ). The letters F, J, W, and Z are unused. The language is (orthographically, at least) monosyllabic, so each syllable is written as a separate word.

This makes entering Vietnamese a little more difficult than most other Latin-based languages. Whereas languages like French benefit from dictionary lookup, where the user can type A-R-R-E-T-E and the system can from prompt for the options ARRÊTE or ARRÊTÉ, that is much less useful for Vietnamese, where the letters D-O can correspond to one of 25 different Vietnamese words (do, , , , dỗ, , dở, dỡ, dợ, đo, đò, đỏ, đó, đọ, đô, đồ, đổ, đỗ, đố, độ, đơ, đờ, đỡ, đớ, or đợ).

Other smartphone platforms have not dealt with this situation well. If you’ve tried to enter Vietnamese text on an iPhone, you’ll know how difficult it is. The user has two options. One is to use the Telex input method, which involves memorizing an arbitrary mapping of letters to tone marks. (It was originally designed as an encoding for sending Vietnamese messages over the Telex telegraph network.) It is user-unfriendly in the extreme, and not discoverable. The other option is to hold down a letter key to see variants with diacritics and tone marks. For example, you can hold down A for a second and then scroll through the 18 different As that appear. You do that every time you need to type a vowel, which is painfully slow.

Fortunately, this is not an intractable problem. In fact, it’s an opportunity to do better. (I can only assume that the sorry state of Vietnamese input on the iPhone speaks to a lack of concern about Vietnamese inside Apple’s hallowed walls, which is unfortunate because it’s not like there’s a shortage of Vietnamese people in San José.)

Crafting a Solution

To some degree, this was already a solved problem. Back in the days of typewriters, there was a Vietnamese layout called AĐERTY. It was based on the French AZERTY, but it moved the F, J, W, and Z keys to the periphery and added keys for Ă, Đ, Ơ, and Ư. It also had five dead keys. The dead keys contained:

  • a circumflex diacritic for typing the remaining letters (Â, Ê, and Ô);
  • the five tone marks; and
  • four glyphs each representing the kerned combination of the circumflex diacritic with a tone mark, needed where the two marks would otherwise overlap

Photo of a Vietnamese typewriter

My plan was to make a smartphone version of this typewriter. Already it would be an improvement over the iPhone. But since this is the computer age, there were more improvements I could make.

Firstly, I omitted F, J, W, and Z completely. If the user needs to type them — for a foreign word, perhaps — they can switch layouts to French. (Gaia will automatically switch to a different keyboard if you need to type a web address.) And obviously I could omit the glyphs that represent kerned pairs of diacritic & tone marks, since kerning is no longer a mechanical process.

The biggest change I made is that, rather than having keys for the five tone marks, words with tones appear as candidates after typing the letters. This has numerous benefits. It eliminates five weird-looking keys from the keyboard. It eliminates confusion about when to type the tone mark. (Tone marks are visually positioned in the middle of the word, but when writing Vietnamese by hand, tone marks are usually added last after writing the rest of the word.) It also saves a keystroke too, since we can automatically insert a space after the user selects the candidate. (For a word without a tone mark, the user can just hit the space bar. Think of the space bar as meaning “no tone”.)

This left just 26 letter keys plus one key for the circumflex diacritic. Firefox OS’s existing AZERTY layout had 26 letter keys plus one key for the apostrophe, so I put the circumflex where the apostrophe was. (The apostrophe is unused in Vietnamese.)

Screenshot of Vietnamese input method in use

In order to generate the tone candidates, I had to detect when the user had typed a valid Vietnamese syllable, because I didn’t want to display bizarre-looking nonsense as a candidate. Vietnamese has rules for what constitutes a valid syllable, based on phonotactics. And although the spelling isn’t purely phonetic (in particular, it inherits some peculiarities from Portuguese), it follows strict rules. This was the hardest part of writing the input method. I had to do some research about Vietnamese phonotactics and orthography. A good chunk of my code is dedicated to encoding these rules.

Knowing about the limited set of valid Vietnamese syllables, I was able to add some convenience to the input method. For example, if the user types V-I-E, a circumflex automatically appears on E because VIÊ is a valid sequence of letters in Vietnamese while VIE is not. If the user types T to complete the partial word VIÊT, only two tone candidates appear (VIẾT and VIỆT), because the other three tone marks can’t appear on a word ending with T.

Using it yourself

You can try the keyboard for yourself at Timothy Guan‑tin Chien’s website.

The keyboard is not included in the default Gaia builds. To include it, add the following line to the Makefile in the gaia root directory:


The code is open source. Please steal it for your own open source project.