The Case of the Welsh Autoresponder

Last year, I talked about some of the dangers of machine translation, which produced a Chinese restaurant advertised as “Translate Server Error” and another restaurant serving “Stir Fried Wikipedia.” This article from the BBC a couple of months ago shows that embarrassing translation errors are hardly limited to China or to machine translation systems.

Mistranslated Welsh road sign

The English half of the sign is printed correctly and says, “No entry for heavy goods vehicles. Residential site only.” Clearly enough, the point of the sign is to prohibit truck drivers from entering a residential neighborhood.

Since the sign was posted in Swansea, Wales, the bottom half of the sign is written in Welsh. The translation of the Welsh is, “I am not in the office at the moment. Send any work to be translated.”

It’s not too hard to piece together what happened. The bottom half of the sign was supposed to be a translation of the English. Unfortunately, the person ordering the sign didn’t speak Welsh. When they sent the text off to be translated, what came back was a quick reply from an email autoresponder explaining, in Welsh, that the intended recipient was out of the office and asking that any work for translation be sent along.

Unfortunately, the representative of the Swansea council assumed that the autoresponse message, which happened to be about the right length, was the translation. And onto the sign it went. The blunder exposed the autoresponder clearly, and very publicly.

One thing we can learn from this mishap is simply to be wary of hidden intermediaries. Our communication chains are long and complex; every message passes through dozens of computers, with a possibility of error, interception, surveillance, or manipulation at every step. Although the council’s representative thought they were getting a human translation, they never, in fact, talked to a human at all. Because they didn’t expect a computerized autoresponse, it never occurred to them that the reply had not been sent by the intended recipient.

Another important lesson, one also present in the Chinese examples, is that software output can only be interpreted correctly when it is in a language its users understand. In translation, where users plan to use, but may not understand, their program’s output, this is often impossible. When a person has someone, or some system, translate into a language they do not speak, they open themselves up to exactly these kinds of errors. A user who cannot understand the output of a system is completely at that system’s whim. The fact that we usually do understand our technology’s output gives us a set of “sanity checks” that keeps this power in check; in translation, those checks are necessarily removed, which is why we are so susceptible to errors like these.

Show Me the Code

A while ago, Mark Pilgrim wrote about being prompted with a license agreement that looked like this.

Adobe Reader 8 license agreement showing HTML code.

If, like most people, you have trouble parsing the agreement, that’s because what’s being shown is not the text of the license agreement but its “marked up” XHTML source: the raw tags interleaved with the prose. Users are only supposed to see the rendered output of that markup, never the markup itself. Something went wrong here, and Mark was shown everything. The result is useless.
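
To see the difference concretely, here is a minimal, hypothetical PHP sketch (my own invented example, not Adobe’s code) contrasting raw markup with the plain text a reader is meant to get out of it:

<?php
// A made-up fragment of a license agreement, stored as XHTML markup.
$license_markup = '<p>By clicking <strong>Accept</strong>, you agree to these terms.</p>';

// Roughly what a reader is supposed to see: the text, not the tags.
echo strip_tags($license_markup) . "\n";   // By clicking Accept, you agree to these terms.

// What Mark saw instead: the markup itself, tags and all.
echo $license_markup . "\n";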

Conceptually, computer science can be boiled down to a process of abstraction. In an introductory undergraduate computer science course, students are first taught syntax or the mechanics of writing code that computers can understand. After that, they are taught abstraction. They’ll continue to be taught abstraction, in one way or another, until they graduate. In this sense, programming is just a process of taking complex tasks and then hiding — abstracting — that complexity behind a simplified set of interfaces. Then, programmers build increasingly complex tools on top of these interfaces and the whole cycle repeats. Through this process of abstracting abstractions, programmers build up systems of almost unfathomable complexity. The work of any individual programmer becomes like a tiny cog in a massive, intricate machine.
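
As a toy illustration of what that layering looks like in practice, here is a short, hypothetical PHP sketch (the functions and file path are my own inventions): each function hides the details of the one below it, and a caller of the top layer never sees the machinery underneath.

<?php
// Bottom layer: raw file handling.
function append_line($path, $line) {
    $handle = fopen($path, 'a');
    fwrite($handle, $line . "\n");
    fclose($handle);
}

// Middle layer: logging, which hides the file handling.
function log_event($message) {
    append_line('/tmp/events.log', date('c') . ' ' . $message);
}

// Top layer: application code, which hides the logging details;
// a caller of record_visit() never touches files or timestamps.
function record_visit($user_id) {
    log_event("user $user_id viewed the page");
}

record_visit(42);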

Mark’s error is interesting because it shows a ruptured black box: an acute failure of abstraction. Of course, many errors, like the dialog shown below, show us very little about the software we’re using.

Unknown Error dialog

With errors like Mark’s, however, users are quite literally presented with a view of parts of the system that the programmer was trying to hide.

Here’s another photo I’ve been showing in my talks: a crashed ATM displaying bits of the source code of the application that runs on it; a bit of unintentional “open sourcing.”

Crashed ATM displaying source code

These examples are embarrassing for authors of the software that caused them but are reasonably harmless. Sometimes, however, the window we get into a broken black box can be shocking.

In talks, I’ve mentioned a configuration error on Facebook that resulted in the accidental publication of some of Facebook’s source code. Apparently, people looking at the code found little pieces like these (the comments, written by Facebook’s authors, are the lines starting with // and the block between /* and */):

$monitor = array( '42107457' => 1, '9359890' => 1);
// Put baddies (hotties?) in here

/* Monitoring these people's profile viewage.
Stored in central db on profile_views.
Helpful for law enforcement to monitor stalkers and stalkees.
*/

The first block describes a list of “baddies” and “hotties” represented by user ID numbers that Facebook’s authors have singled out for monitoring. The second stanza should be self-explanatory.

Facebook has since taken steps to avoid future errors like this. As a result, we’re much less likely to get further views into their code. Of course, we have every reason to believe that this code, or other code like it, still runs on Facebook. But as long as Facebook’s black box works better than it has in the past, we may never again know exactly what’s going on.

Like Facebook’s authors, many technologists don’t want us knowing what our technology is doing. Sometimes, as with Facebook, for good reason: the technology we use is doing things that we would be shocked and unhappy to hear about. Errors like these give us a view of some of what we might be missing, and reasons to be discomforted by the fact that technologists work so hard to keep us in the dark.