I recently heard that “Bucklame,” apparently a nickname for New Zealand’s largest city Auckland, has its source in a technical error that is dear to my heart. It seems that it stems from the fact that many mobile phones’ predictive text input software will suggest the term “Bucklame” if a user tries to input “Auckland” — the latter of which was apparently not in its list of valid words.
In my initial article on revealing errors, I wrote a little about the technology at the source of this error: Tegic’s (now Nuance’s) T9 predictive text technology. T9 is a common way for users of mobile phones with a normal keypad (9-12 keys) to quickly type text messages that draw on 50+ letters, numbers and symbols. Here is how I described the system:
Tegic’s popular T9 software allows users to type in words by pressing the number associated with each letter of each word in quick succession. T9 uses a database to pick the most likely word that maps to that sequence of numbers. While the system allows for quick input of words and phrases on a phone keypad, it also allows for the creation of new types of errors. A user trying to type me might accidentally write of because both words are mapped to the combination of 6 and 3 and because of is a more common word in English. T9 might confuse snow and pony while no human, and no other input method, would.
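To make the collisions concrete, here is a minimal sketch of the key mapping, assuming the standard letter layout printed on phone keypads. The names `KEYPAD` and `to_keys` are mine, for illustration only; this is not Tegic's code.

```python
# Standard letter layout found on phone keypads (keys 2-9).
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}

# Invert the layout so each letter points at the key that carries it.
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}


def to_keys(word):
    """Return the digit sequence a user presses to enter `word`."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.lower())


# Each pair below collapses onto a single digit sequence.
for first, second in [("me", "of"), ("snow", "pony"), ("auckland", "bucklame")]:
    print(first, to_keys(first), second, to_keys(second))
```

Because eight keys carry twenty-six letters, the mapping from words to digits cannot be reversed without some extra source of information, which is where the word database comes in.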
Mappings of number-sequences to words are based on a database that offers words in order of relative frequency. These word frequency lists are based on a corpus of text in the target language and are pre-programmed into the phone. These corpora, at least initially, were not based on the words people use to communicate using SMS but on a more readily available data source (e.g., emails, memos, or fiction). This leads to problems common to many systems that are built on shaky probabilistic models: what is likely in one context may not be as likely in another. For example, while “but” is an extremely common English word, it might be much less common in SMS, where more complex sentence structures are often eschewed due to economy of space (160 character messages) and laborious data-entry. The word “pony” might be more common than “snow” in some situations but it’s certainly not in my usage!
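A rough sketch of that frequency-ordered lookup follows. The word counts are invented purely for illustration (I do not have the real corpora), and `build_index` is a hypothetical helper rather than anything from T9 itself. The point is only that the same key sequence yields a different first suggestion depending on which corpus supplied the counts.

```python
from collections import defaultdict

KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}


def to_keys(word):
    """Return the digit sequence a user presses to enter `word`."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.lower())


def build_index(word_counts):
    """Map each digit sequence to its candidate words, most frequent first."""
    index = defaultdict(list)
    # Visit words from most to least frequent (ties broken alphabetically),
    # so every candidate list ends up ordered by frequency.
    for word in sorted(word_counts, key=lambda w: (-word_counts[w], w)):
        index[to_keys(word)].append(word)
    return dict(index)


# Invented counts: one set standing in for a memo-like corpus,
# one standing in for actual SMS usage.
MEMO_COUNTS = {"of": 900, "me": 120, "pony": 3, "snow": 2}
SMS_COUNTS = {"of": 150, "me": 400, "pony": 1, "snow": 6}

memo_index = build_index(MEMO_COUNTS)
sms_index = build_index(SMS_COUNTS)

print(memo_index["63"])    # ['of', 'me']
print(sms_index["63"])     # ['me', 'of']
print(memo_index["7669"])  # ['pony', 'snow']
```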
Of course, proper nouns, of which there are many, are often excluded from these systems as well. Since the T9 system does not “know” the word “Auckland”, the nonsensical compound-word “bucklame” seems to be an appropriate mapping for the same number-sequence. Apparently, people liked the error so much they kept using it and, perhaps, with time it stopped being an error at all.
As users move to systems with keyboards like Blackberries, Treos, Sidekicks, and iPhones (which use a dual-mode system) these errors become impossible. As a result, the presence of these types of errors (e.g., a swapped “me” and “of”) can tell communicators quite a lot about the type of device they are communicating with.
From my own experience, I can say that some modern phones keep their own frequency databases, built from your own usage of words. Let’s use your example of ‘of’/’me’. If the first suggestion is ‘of’ and I change it to ‘me’, the next time I type that sequence it is automatically ‘me’.
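The behaviour described here can be sketched as a simple "most recently chosen word first" rule. This is only one way to get that effect, and an assumption on my part; real phones may weight usage counts rather than reorder on a single correction. The class name `AdaptiveT9` and its methods are hypothetical.

```python
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}


def to_keys(word):
    """Return the digit sequence a user presses to enter `word`."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.lower())


class AdaptiveT9:
    """Toy dictionary that promotes whichever word the user last accepted."""

    def __init__(self, words):
        # `words` is assumed to arrive in factory order, most likely first.
        self.candidates = {}
        for word in words:
            self.candidates.setdefault(to_keys(word), []).append(word)

    def suggest(self, keys):
        """Return the candidates for a digit sequence, best guess first."""
        return list(self.candidates.get(keys, []))

    def accept(self, word):
        """Record the user's choice by moving `word` to the front.

        Words the dictionary has never seen are added, which is how a
        proper noun could be learned.
        """
        ranked = self.candidates.setdefault(to_keys(word), [])
        if word in ranked:
            ranked.remove(word)
        ranked.insert(0, word)


phone = AdaptiveT9(["of", "me"])
print(phone.suggest("63"))  # ['of', 'me']
phone.accept("me")          # the user corrects 'of' to 'me' once
print(phone.suggest("63"))  # ['me', 'of']
```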
And also the 160 character limit is obsolete. All modern phones can send combined messages, length varying from 3 combined messages up to several thousand characters.
Maybe T9 shouldn’t be used for critical communication, e.g. telling police officers whom to arrest. Aside from the Buttle-Tuttle confusion in ‘Brazil’ I know of no such problems, however.
Once a friend sent me an SMS from a workshop or so: “Hier sind alle total voll.” (German, “Everybody here is totally drunk.”). I was a little bit irritated at first. However, what she obviously wanted to write was “toll” instead of “voll”, meaning “Everybody here is absolutely great.”
Thanks Mats for your comment! I realize that these systems “learn” based on the usage. This allows them both to pick up new words and to adjust their idea of word frequencies by weighting them heavily toward the person using the phone. I probably should have mentioned this in the article. Thanks for pointing it out!
I also realize that most phones will split messages (mine does this) but they then arrive as separate messages, sometimes even out of order!
I can only speak to my experience on an eighteen-month-old phone in the United States. Perhaps things are more advanced elsewhere. :)
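For the curious, the arithmetic behind split messages can be sketched as follows. This assumes plain text in the default 7-bit GSM alphabet, where a standalone message holds 160 characters and each part of a concatenated message holds 153 because a small header takes up the rest. The function name `sms_segments` is mine, and other encodings have different limits.

```python
import math

SINGLE_LIMIT = 160   # characters in a standalone 7-bit message
SEGMENT_LIMIT = 153  # characters per part once a concatenation header is added


def sms_segments(char_count):
    """Return how many SMS parts a text of `char_count` characters needs."""
    if char_count <= SINGLE_LIMIT:
        return 1
    return math.ceil(char_count / SEGMENT_LIMIT)


for length in (160, 161, 306, 307):
    print(length, sms_segments(length))
```

The header in each part carries a sequence number, so a receiving phone that supports concatenation can put the parts back in order; one that does not will show them as separate messages.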
Great example Raphael! That’s very funny.
I also thought about this issue recently, in connection with the rise of microblogging via text message / SMS (read: twitter). I found myself occasionally sending tweets with “mistakes” that were due to the predictive text entry choosing a word other than the one I intended, sometimes errors that were as small as one character.
I thought these should be called Twypos. :)
On my trip back to China a few years ago, my cousin was teaching me about Chinese internet slang and a lot of it was based on this sort of thing. You see, most people in China type Chinese characters by typing in the pinyin and then selecting, by number, one of the characters that pop up. Fast-typing teens in China often just pick the first one whether it’s right or not and the result is visually nonsensical but audibly sort of correct.
Incidentally, I propose the word “homotexts” to refer to two words that have the same cell phone input.
Christina: Methinks it should be “homotxts”
In the Dutch language the sentence:
“Ik hou van jou” (I love you) becomes
“Ik hou van lot” (I love lot)…
So thanks to T9 you tell your girlfriend that you love another girl named ‘Lot’ instead of her…
Another one: The Dutch word for kiss is “kus”, which T9 will interpret as “jus”, which means you want to give her gravy instead of a kiss…
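The “homotexts” proposed above are easy to find mechanically: group words by their key sequence and keep the groups with more than one member. Here is a small sketch using only word pairs mentioned in this thread; the function name `homotexts` is borrowed from that proposal, and the word list is just an illustration, not a real dictionary.

```python
from collections import defaultdict

KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {
    letter: digit for digit, letters in KEYPAD.items() for letter in letters
}


def to_keys(word):
    """Return the digit sequence a user presses to enter `word`."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.lower())


def homotexts(words):
    """Group words that share a key sequence, dropping unique sequences."""
    groups = defaultdict(set)
    for word in words:
        groups[to_keys(word)].add(word.lower())
    return {
        keys: sorted(members)
        for keys, members in groups.items()
        if len(members) > 1
    }


WORDS = ["me", "of", "snow", "pony", "toll", "voll",
         "jou", "lot", "kus", "jus", "hou", "van"]

for keys, members in sorted(homotexts(WORDS).items()):
    print(keys, members)
```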
Check this, it’s hilarious:
http://digg.com/comedy/What_Life_Would_Be_Like_Without_Text_Swearing
Thanks anonymous! The video really is very funny. I mentioned it here once, and have even incorporated it into my talks!