Errors reveal characteristics of the languages we use and the
technologies we use to communicate them -- everything from scripts and
letter forms (which, while fundamental to written communication, are
technologies nonetheless) to the computer software we use to create and
communicate text.
I've spent the last few weeks in Japan. In the process, I've learned a
bit about the Japanese language, no small part of it through errors.
Here's one error that taught me quite a lot. The sentence is shown in
Japanese and then followed by a translation into English:
今年から貝が胃に棲み始めました。
This year, a clam started living in my stomach.
Needless to say, this was an error. It was supposed to say:
今年から海外に住み始めました。
This year, I started living abroad.
When the sentences are transliterated into romaji (i.e., Japanese
written in a Roman script), the similarity becomes much clearer to
readers who don't understand Japanese:
Kotoshikara kaiga ini sumihajimemashita.
Kotoshikara kaigaini sumihajimemashita.
Kotoshikara means "since this year." Sumihajimemashita means "has
started living." The word kaigaini means "abroad" or "overseas."
Kaiga ini (two words) means "clam in stomach." When written
phonetically in romaji, the only difference between the two sentences
lies in the introduction of a word break in the middle of "kaigaini."
Written out in Japanese, the sentences are quite different; even
without understanding Japanese, one can see that more than a few of
the characters in the sentences differ.
In English, word spacing plays an essential role in making written
language understandable. Japanese, however, is normally written without
spaces between words.
This isn't a problem in Japanese because the Japanese script uses a
combination of logograms -- called kanji -- and
phonetic characters -- called hiragana and
katakana or simply kana -- to delimit words and to
describe structure. The result, to Japanese readers, is unambiguous.
Phonetically and without spaces, the two sentences are identical in
either kana or romaji:
ことしからかいがいにすみはじめました。
Kotoshikarakaigainisumihajimemashita.
In purely phonetic form, the sentence is ambiguous. Using kanji, as
shown in the opening examples, removes this ambiguity. While
phonetically identical, "kaigaini" (abroad) and "kaiga ini" (clam in
stomach) are very different when kanji are used; they are written
"海外に" and "貝が胃に" respectively and are not easily confused by
Japanese readers.
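To make the collision concrete, here is a minimal Python sketch. The
reading tables and the names INTENDED, MISTAKEN, written, and reading
are hypothetical, made up for illustration; they simply pair each
written chunk of the two sentences with its kana reading and show that
both collapse to the same phonetic string.

```python
# Hypothetical reading tables: (written form, kana reading) for each chunk.
# The chunking is an illustrative assumption, not a linguistic analysis.
INTENDED = [
    ("今年", "ことし"), ("から", "から"), ("海外", "かいがい"),
    ("に", "に"), ("住み", "すみ"), ("始めました", "はじめました"),
]
MISTAKEN = [
    ("今年", "ことし"), ("から", "から"), ("貝", "かい"), ("が", "が"),
    ("胃", "い"), ("に", "に"), ("棲み", "すみ"), ("始めました", "はじめました"),
]


def written(segments):
    """Join the written (kanji and kana) forms into one sentence."""
    return "".join(form for form, _ in segments)


def reading(segments):
    """Join the kana readings into one unspaced phonetic string."""
    return "".join(kana for _, kana in segments)


print(written(INTENDED), "->", reading(INTENDED))
print(written(MISTAKEN), "->", reading(MISTAKEN))
# The written sentences differ, but their kana readings are the same string.
print(reading(INTENDED) == reading(MISTAKEN))
```

The only thing that distinguishes the two tables is where the chunk
boundaries fall and which kanji each chunk selects, which is exactly
the information that disappears in the phonetic form.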
This error, and many others like it, stems from the way that Japanese
text is input into computers. Because there are more than 2,000 kanji
in regular use in Japan, there simply are not enough keys on a keyboard
to input kanji directly. Instead, text in Japanese is input into
computers phonetically (i.e., in kana) without spaces or explicit word
boundaries. Once the kana is input, users then transform the phonetic
representation of their sentence or phrase into a version using the
appropriate kanji logograms. To do so, Japanese computer users employ
special software that contains a database of mappings of kana to kanji.
In the process, this software makes educated guesses about where word
boundaries are. Usually, computers guess correctly. When computers get
it wrong, users need to go back and tweak the conversion by hand or
select from other options in a list. Sometimes, when users are in a
rush, they accept an incorrect kana-to-kanji conversion. It would be
obvious to any Japanese computer user that this is precisely what
happened in the sentence above.
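The guessing step can be sketched in a few lines of Python. This is a
toy under stated assumptions, not how any real input software works:
the nine-entry TOY_DICT, the conversions function, and the "fewest
words wins" rule are all hypothetical and chosen only to show how one
kana string yields several candidate conversions. Real converters
typically rely on far larger dictionaries and statistical models.

```python
# Hypothetical toy dictionary: kana reading -> candidate written forms.
TOY_DICT = {
    "ことし": ["今年"],
    "から": ["から"],
    "かいがい": ["海外"],
    "かい": ["貝"],
    "が": ["が"],
    "い": ["胃"],
    "に": ["に"],
    "すみ": ["住み", "棲み"],
    "はじめました": ["始めました"],
}


def conversions(kana):
    """Return every way to split `kana` into dictionary entries.

    Each result is a list of written words, one per guessed word boundary.
    """
    if not kana:
        return [[]]
    results = []
    for end in range(1, len(kana) + 1):
        prefix = kana[:end]
        for word in TOY_DICT.get(prefix, []):
            for rest in conversions(kana[end:]):
                results.append([word] + rest)
    return results


candidates = conversions("ことしからかいがいにすみはじめました")
for words in candidates:
    print("".join(words))

# Assumed heuristic for illustration only: prefer the split with fewest words.
best = "".join(min(candidates, key=len))
print("guess:", best)
```

With this toy dictionary the unspaced kana has four candidate
conversions, and both the clam sentence and the intended one are among
them. The fewest-words rule happens to pick the intended sentence here;
a user who accepts a different candidate from the list in a hurry ends
up with the clam.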
This type of error has few parallels in English but is extremely common
in Japanese writing. The effects, like this one, are often confusing or
hilarious. For a Japanese reader, this error reveals the kana-to-kanji
mapping system and the computer software that implements it -- nobody
would make such a mistake with a pen and paper. For a person less
familiar with Japanese, the error reveals a number of technical
particularities about the Japanese writing system and, in the process,
about the ways in which Japanese differs from other languages they
might speak.