Sıkısınca

Last week, some older news spread widely: a misunderstanding, prompted by an error rooted in the technical limitations of a mobile phone, that resulted in two deaths and three imprisonments. The whole sad story took place in Turkey. You can read the original story in the Turkish-language newspaper Hürriyet.

Basically, Emine and her husband Ramazan Çalçoban had recently separated and were feuding daily over their mobile phones and by SMS text message. At one point, Ramazan sent a message saying, “you change the subject every time you get backed into a corner.” The word for “backed into a corner” is sıkısınca. Notice the lack of dots on the i’s in the word. The very similar sikisince, spelled with dots, means “getting fucked.” Ramazan’s mobile phone could not produce the “closed,” dotless ı, so he wrote the word with dotted i’s and sent it anyway. Reading quickly, Emine misinterpreted the message, thinking that Ramazan was saying, “you change the subject every time they are fucking you.” Emine showed the message to her father and sisters who, outraged that Ramazan was calling Emine a whore, attacked Ramazan with knives when he later showed up at the house. In the fight, Ramazan fought back and stabbed Emine, who later bled to death. Ramazan committed suicide in jail, and Emine’s father and sisters were all arrested.

This is certainly the gravest example of a revealing error I’ve looked at yet, and it shows the degree to which tiny technological constraints can have profound and unanticipated consequences. In this case, the lack of technological support for characters used in Turkish resulted in the creation of text that was deeply, even fatally, ambiguous.

Of course, many messages sent with SMS, email, or chat systems are ambiguous. Emoticons are one tool that society has created to disambiguate phrases in text-based chat, and their popularity can tell us a lot about what purely word-based chat fails to convey easily. For example, a particular emoticon might be used to signal sarcasm in a statement where tone of voice would have made it obvious. One can think of face-to-face communication as happening over many channels (e.g., voice, facial expressions, posture, and words). Text-based communication technologies provide new channels that may be good at conveying certain types of messages but not others. Emoticons, and accents or diacritical marks for that matter, are attempts to concisely supply information that is taken for granted in spoken conversation.

Every communication technology provides a set of channels that convey some types of messages well and others poorly. As more of our communication shifts from face-to-face speech to these technologies, the result is lost channels and increased ambiguity.

In spoken Turkish, the open and closed i sounds are easily distinguishable. In written communication, however, things become more difficult. Some writing systems are better than others at conveying these vowel differences; Hebrew, for example, was historically written with no vowels at all! And yet, the consequences of not conveying these differences can be profound. As a result, written Turkish uses diacritics, including the open and closed i notation, to disambiguate phrases like the one at the center of this saga. Unfortunately, the open and closed i technology is not always available to communicators. Notably, it was not available on Ramazan’s mobile phone.
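To make the constraint concrete: the dotless ı and the dotted i are separate characters with separate Unicode code points. The Python sketch below assumes, hypothetically, that a phone lacking ı simply substitutes i; `fold_for_limited_phone` is an illustrative name, not any real handset’s behavior. Using the spellings from the story, it shows the substitution collapsing the stem of sıkısınca onto the stem of the obscene reading.

```python
import unicodedata

DOTLESS_I = "\u0131"  # ı, the "closed" i
DOTTED_I = "\u0069"   # i, the "open" i


def fold_for_limited_phone(text: str) -> str:
    """Hypothetical fallback: a phone without 'ı' substitutes the nearest
    letter it does have, the dotted 'i'. The distinction is lost for good."""
    return text.replace(DOTLESS_I, DOTTED_I)


for ch in (DOTLESS_I, DOTTED_I):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+0131 LATIN SMALL LETTER DOTLESS I
# U+0069 LATIN SMALL LETTER I

intended_stem = "sıkıs"  # stem of sıkısınca, "backed into a corner"
obscene_stem = "sikis"   # stem of sikisince, the obscene reading
print(intended_stem == obscene_stem)                          # False
print(fold_for_limited_phone(intended_stem) == obscene_stem)  # True
print(fold_for_limited_phone("sıkısınca"))                    # sikisinca
```

The substitution only runs one way: once the dots are added, nothing in the received text says which i’s were meant to be dotless.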

People in Turkey have ways of coping with the lack of accents and diacritical marks. For example, some people choose to write sıkısınca as SIKISINCA because the capital I has no dot and, in Turkish, is the uppercase form of the dotless ı (the capital of the dotted i is İ). Emoticons are similar in that they are created by users to work around the system’s limitations, both to convey certain messages and to disambiguate others. In these ways and others, users of technologies find creative ways of working with and around the limitations and affordances imposed on them.
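The all-caps trick works because of how Turkish pairs its letters: the capital of ı is I, and the capital of i is İ. Python’s built-in `str.upper()` and `str.lower()` apply Unicode’s default, locale-independent mappings, which send both small letters to the same I, so the sketch below maps the two i’s by hand first. The function names are hypothetical, and this is a minimal illustration rather than a complete Turkish locale implementation.

```python
def turkish_upper(text: str) -> str:
    """Uppercase with Turkish rules: i -> İ (dotted capital), ı -> I (dotless)."""
    return text.replace("i", "\u0130").replace("\u0131", "I").upper()


def turkish_lower(text: str) -> str:
    """Lowercase with Turkish rules: I -> ı, İ -> i.
    Mapped by hand first, because str.lower() would turn I into a dotted i."""
    return text.replace("I", "\u0131").replace("\u0130", "i").lower()


# Python's default mapping sends both small letters to the same capital...
print("ı".upper(), "i".upper())    # I I
# ...whereas Turkish casing keeps them apart in both directions.
print(turkish_upper("sıkısınca"))  # SIKISINCA
print(turkish_upper("sikisince"))  # SİKİSİNCE
print(turkish_lower("SIKISINCA"))  # sıkısınca
```

Software that lowercases SIKISINCA with the default mapping instead would produce the dotted sikisinca, quietly undoing the workaround.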

With time, though, the users of emoticons and all-caps Turkish words stop seeing and thinking about the limitations in their technology that these tactics expose. In fact, it is only through errors that these limitations become visible again. While we cannot undo the damage done by Ramazan, Emine, and her family, we can “learn from their errors” and reflect on the ways that the limits imposed by our communication technologies frame and affect our communications and our lives.

Wordlists and Profanity

Revealing errors rest on the observation that a technology’s failure to deliver a message can tell us a lot. In this way, there’s an intriguing analogy to be drawn between revealing errors and censorship.

Censorship doesn’t usually keep people from saying or writing something; it just keeps them from communicating it. When censorship is effective, an audience doesn’t realize that any speech ever occurred or that any censorship has happened. They simply don’t know something and, perhaps more importantly, don’t know that they don’t know. As with invisible technologies, a censored community might never realize that its information and interaction with the world are being shaped by someone else’s design.

I was once in a cafe with a large SMS/text message “board.” Patrons could send an SMS to a particular number, and it would be displayed on a wall-mounted flat-panel television that everyone in the cafe could read. I tested to see if there was a content filter and, sure enough, any message that contained a four-letter word was silently dropped; it simply never showed up on the screen. To me, the censored party, the failure of my message to appear on the board revealed a censor. Further testing, and my success in posting messages with creatively spelled profanity, numbers instead of letters, and crude ASCII drawings, revealed the censor to be a piece of software with a blacklist of terms; no human charged with blocking profanity would have let “sh1t” through. Throughout the whole process, the other patrons in the cafe remained none the wiser; they never realized that the blocked messages had been sent.
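The behavior described above is consistent with a simple exact-match blacklist. Here is a minimal Python sketch under that assumption; the word list, `passes_filter`, and `post` are hypothetical, since the cafe’s actual software is unknown. It reproduces both observations: blocked messages vanish without notice, and “sh1t” slips through because the digit breaks the word into fragments that match nothing on the list.

```python
import re

# Hypothetical blacklist; the cafe's real list is unknown.
BLACKLIST = {"shit", "fuck"}

board = []  # messages shown on the wall-mounted screen


def passes_filter(message: str) -> bool:
    """True unless some alphabetic word in the message is blacklisted."""
    words = re.findall(r"[a-z]+", message.lower())
    return not any(word in BLACKLIST for word in words)


def post(message: str) -> None:
    """Display the message if it passes; otherwise drop it silently.
    Neither the sender nor the audience is told that anything was blocked."""
    if passes_filter(message):
        board.append(message)


post("this coffee is great")
post("this coffee is shit")
post("this coffee is sh1t")  # the digit splits the word into "sh" and "t"
print(board)  # ['this coffee is great', 'this coffee is sh1t']
```

A human moderator would read “sh1t” as the obvious word; a literal string comparison cannot, and that gap is exactly what gives the software away.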

This desire to create barriers to profanity is widespread in communication technologies. For example, consider how many times you have been prompted by a spell-checker to review and “fix” a swear word. Offensive as they may be, “fuck” and “shit” are correctly spelled English words. It seems highly unlikely that they were left out of the spell-checker’s wordlist because the compiler forgot them. They were excluded, quite simply, because they were deemed obscene or inappropriate. While intentional, the omission of these words results in all cursing being falsely flagged as misspelling, errors we’ve grown so accustomed to that they hardly seem like errors at all!

Unlike a book or website that impressionable children might read, a spell-checker’s wordlist is not something anyone reads, so nobody can be expected to stumble across a four-letter word in it. These words are left out simply because spell-checker makers think we shouldn’t use them. The result is that every user who writes a four-letter word must add that word, by hand, to their “personal” dictionary; they must take explicit credit for using the term. The hope, perhaps, is that we’ll be reminded to use a different, more acceptable word. Every time this happens, the paternalism of the wordlist compiler is revealed.
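The personal-dictionary workflow can be sketched in a few lines of Python. This assumes a spell-checker that flags any word absent from both a system wordlist and the user’s personal list; the class and the tiny wordlist are hypothetical and far simpler than real spell-checkers, which also handle inflection and suggestions.

```python
# Hypothetical system wordlist: a correctly spelled swear word is simply absent.
SYSTEM_WORDLIST = {"that", "is", "a", "correctly", "spelled", "word"}


class SpellChecker:
    def __init__(self, wordlist):
        self.wordlist = set(wordlist)
        self.personal = set()  # words the user has explicitly vouched for

    def misspelled(self, text: str) -> list:
        """Flag every word found in neither the system nor the personal list."""
        return [w for w in text.lower().split()
                if w not in self.wordlist and w not in self.personal]

    def add_to_personal(self, word: str) -> None:
        """The user takes explicit credit for the word, one word at a time."""
        self.personal.add(word.lower())


checker = SpellChecker(SYSTEM_WORDLIST)
print(checker.misspelled("shit is a correctly spelled word"))  # ['shit']
checker.add_to_personal("shit")
print(checker.misspelled("shit is a correctly spelled word"))  # []
```

The word is never “wrong” in any orthographic sense; it is flagged only because of what the list leaves out.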

Connecting back to my recent post on predictive text, here’s a very funny video of Armstrong and Miller lampooning the omission of four-letter words from predictive text databases, which makes it more difficult to input profanity on mobile phones (e.g., are you sure you did not mean “shiv” and “ducking”?). You can also download the video in Ogg Theora if you have trouble watching it in Flash.

There’s a great line in there: “Our job … is to offer people not the words that they do use but the words that they should use.”

Most of the errors described on this blog reveal the design of technical systems. While the errors in this case do not stem from technical decisions, they reveal a set of equally human choices. Perhaps more interestingly, the errors themselves are fully intended! The goal of swear-word omission is, in part, the moment of reflection that a revealing error introduces. In that moment, the censors hope, we might reflect on the “problems” in our coarse choice of language and consider communicating differently.

These technologies don’t keep us from swearing any more than other technology designers can control our actions; we usually have the option of using or designing different technologies. But every technology offers affordances that make certain things easier and others more difficult. This may or may not be intended, but it is always important. Through errors like those made by our prudish spell-checkers and predictive text input systems, some of these affordances, and their sources, are revealed.

Bucklame and Predictive Text Input

I recently heard that “Bucklame,” apparently a nickname for New Zealand’s largest city, Auckland, has its source in a technical error that is dear to my heart. It seems to stem from the fact that many mobile phones’ predictive text input software will suggest the term “Bucklame” when a user tries to input “Auckland,” a word that was apparently not in its list of valid words.

In my initial article on revealing errors, I wrote a little about the technology at the source of this error: Tegic’s (now Nuance’s) T9 predictive text technology, a common way for users of mobile phones with a standard keypad (9 to 12 keys) to quickly type text messages using 50+ letters, numbers, and symbols. Here is how I described the system:

Tegic’s popular T9 software allows users to type in words by pressing the number associated with each letter of each word in quick succession. T9 uses a database to pick the most likely word that maps to that sequence of numbers. While the system allows for quick input of words and phrases on a phone keypad, it also allows for the creation of new types of errors. A user trying to type “me” might accidentally write “of” because both words are mapped to the combination of 6 and 3 and because “of” is a more common word in English. T9 might confuse “snow” and “pony” while no human, and no other input method, would.
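The collisions in the quoted passage follow directly from the keypad layout. The Python sketch below uses the standard letter assignment of a phone keypad (as in ITU-T E.161) and groups words that produce the same key sequence; the helper names are illustrative and not part of T9 itself.

```python
from collections import defaultdict

# Letter layout of a standard phone keypad.
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}


def to_keys(word: str) -> str:
    """The digits a T9 user presses: one press per letter."""
    return "".join(LETTER_TO_KEY[ch] for ch in word.lower())


def collisions(words):
    """Group words by key sequence; any group of 2+ words is ambiguous."""
    groups = defaultdict(list)
    for word in words:
        groups[to_keys(word)].append(word)
    return {keys: ws for keys, ws in groups.items() if len(ws) > 1}


print(to_keys("me"), to_keys("of"))  # 63 63
print(collisions(["me", "of", "snow", "pony", "cat"]))
# {'63': ['me', 'of'], '7669': ['snow', 'pony']}
```

Because each key carries three or four letters, the keypad alone cannot tell these words apart; only the database’s ranking decides which one appears.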

Mappings of number sequences to words are based on a database that offers words in order of relative frequency. These word-frequency lists are derived from a corpus of text in the target language and are pre-programmed into the phone. These corpora, at least initially, were not based on the words people use when communicating by SMS but on more readily available data sources (e.g., emails, memos, or fiction). This leads to a problem common to many systems built on shaky probabilistic models: what is likely in one context may not be as likely in another. For example, while “but” is an extremely common English word, it might be much less common in SMS, where complex sentence structures are often eschewed due to limited space (160-character messages) and laborious data entry. The word “pony” might be more common than “snow” in some situations, but it’s certainly not in my usage!
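To see how the training corpus decides which word wins, here is a minimal Python sketch assuming a frequency-ranked index built from simple word counts. The two corpora are invented toy examples, and real T9 databases are far larger and more elaborate; the point is only that the same key sequence yields a different first suggestion depending on the text the list was built from.

```python
from collections import Counter

KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}


def to_keys(word: str) -> str:
    return "".join(LETTER_TO_KEY[ch] for ch in word.lower())


def build_index(corpus: str) -> dict:
    """Map each key sequence to its words, most frequent in the corpus first."""
    index = {}
    for word, _count in Counter(corpus.lower().split()).most_common():
        index.setdefault(to_keys(word), []).append(word)
    return index


def suggest(index: dict, keys: str):
    """Offer the top-ranked word for a key sequence, or None if unknown."""
    candidates = index.get(keys)
    return candidates[0] if candidates else None


# Two toy corpora, invented for illustration.
fiction = "the pony ran past the pony and the pony stopped in the snow"
sms = "snow again lol more snow tmrw no pony"

print(suggest(build_index(fiction), "7669"))  # pony
print(suggest(build_index(sms), "7669"))      # snow
```

Nothing about the words themselves changes between the two runs; only the statistics of the source text do.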

Of course, proper nouns, of which there are many, are often excluded from these systems as well. Since the T9 system does not “know” the word “Auckland,” it offers the nonsensical compound word “bucklame,” which maps to the same number sequence. Apparently, people liked the error so much that they kept using it and, with time, perhaps it stops being an error at all.
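The two words really do share a key sequence, which the sketch below checks. How the phone arrived at a compound is not documented here, so the fallback in `suggest`, which glues together two dictionary words that cover the key sequence, is a hypothetical mechanism, and the small wordlist is invented for illustration.

```python
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}


def to_keys(word: str) -> str:
    return "".join(LETTER_TO_KEY[ch] for ch in word.lower())


def suggest(dictionary, keys: str):
    """Return a dictionary word for the keys; failing that (hypothetical
    fallback), join two dictionary words whose keys cover the sequence."""
    by_keys = {}
    for word in dictionary:
        by_keys.setdefault(to_keys(word), word)
    if keys in by_keys:
        return by_keys[keys]
    for split in range(1, len(keys)):
        head, tail = keys[:split], keys[split:]
        if head in by_keys and tail in by_keys:
            return by_keys[head] + by_keys[tail]
    return None


# A hypothetical wordlist with no "auckland" in it.
wordlist = ["buck", "lame", "city", "the"]
print(to_keys("auckland"), to_keys("bucklame"))  # 28255263 28255263
print(suggest(wordlist, to_keys("auckland")))    # bucklame
```

Whatever the real mechanism, the underlying fact is the shared sequence 2-8-2-5-5-2-6-3: with “Auckland” missing from the list, any word offered for those keys has to be something else.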

As users move to devices with full keyboards, like BlackBerrys, Treos, Sidekicks, and iPhones (which use a dual-mode system), these errors become impossible. As a result, the presence of these types of errors (e.g., a swapped “me” and “of”) can tell communicators quite a lot about the type of device on the other end.