Transparency Posted Sat, 17 Oct 2009

I caught this revealing error on the always entertaining Photoshop Disasters and thought it was too good to resist pointing out here:

Bag of Jasmin Rice

The picture, of course, is a bag of Tao brand jasmine rice for sale in Germany. The error is pretty obvious if you understand a little German: the phrase transparentes sichtfeld literally means transparent field of view. In this case, the phrase is a note written by the graphic designer of the rice bag's packaging that was never meant to be read by a consumer. The phrase is supposed to indicate to someone involved in the bag's manufacture that the pink background on which the text is written is supposed to remain unprinted (i.e., as transparent plastic) so that customers get a view directly onto the rice inside the bag.

The error, of course, is that the pink background and the text were never removed. This was possible, in part, because the pink background doesn't look horribly out of place on the bag. A more important factor, however, is the fact that the person printing the bag and bagging the rice almost certainly didn't speak German.

In this sense, this bears a lot of similarity with some errors I've written up before --- e.g., the Welsh autoresponder and the Translate server error restaurant. And as in those cases, there are takeaways here about all the things we take for granted when communicating using technology --- things we often don't realize until language barriers make errors like this thrust hidden processes into view.

This error revealed a bit of the processes through which these bags of rice are produced and a little bit about the people and the division of labor that helped bring it to us. Ironically, this error is revealing precisely through the way that the bag fails to reveal its contents.

Akamai and SSL Posted Wed, 12 Aug 2009

SSL stands for "Secure Sockets Layer" and refers to a protocol for using the web in a secure, encrypted manner. Every time you connect to a website with an address that begins with https://, instead of just http://, you're connecting over SSL. Almost all banks and e-commerce sites, for example, use SSL exclusively.

SSL helps provide security for users in at least two ways. First, it helps keep communication encoded in such a way that only you and the site you are communicating with can read it. The Internet is designed in a way that makes messages susceptible to eavesdropping; SSL helps prevent this. But sending coded messages only offers protection if you trust that the person you are communicating in code with really is who they say they are. For example, if I'm banking, I want to make sure the website I'm using really is my bank's and not some phisher trying to get my account information. The fact that we're talking in a secret code will protect me from eavesdroppers but won't help me if I can't trust the person I'm talking in code with.

To address this, web browsers come with a list of trusted organizations that verify or vouch for websites. When one of these trusted organizations vouches that a website really is who they say they are, they offer what is called a "certificate" that attests to this fact. A certificate for revealingerrors.com would help users verify that they really are viewing Revealing Errors, and not some intermediary, impostor, or stand-in. If someone were to redirect traffic meant for Revealing Errors to an intermediary, users connecting using SSL would get an error message warning them that the certificate offered is invalid and that something might be awry.
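The name check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular browser implements it: real browsers also verify that the certificate was signed by a trusted organization, which is skipped here. The function names and the host intermediary.example are made up for the example.

```python
def name_matches(pattern: str, hostname: str) -> bool:
    """Return True if a name listed in a certificate covers the hostname.

    Simplified assumption: only exact matches and a single leading "*."
    wildcard (standing in for exactly one label) are supported.
    """
    pattern = pattern.lower()
    hostname = hostname.lower()
    if pattern.startswith("*."):
        # "*.example.com" covers "www.example.com" but not "example.com"
        # or "a.b.example.com".
        label, _, rest = hostname.partition(".")
        return bool(label) and bool(rest) and rest == pattern[2:]
    return pattern == hostname


def check_certificate(requested_host: str, certificate_names: list[str]) -> str:
    """Mimic the browser's decision: proceed quietly or warn the user."""
    if any(name_matches(name, requested_host) for name in certificate_names):
        return "ok"
    return (f"warning: certificate is for {', '.join(certificate_names)}, "
            f"not {requested_host}")


# The site presents its own certificate: nothing to see.
print(check_certificate("revealingerrors.com", ["revealingerrors.com"]))
# A hypothetical intermediary presents its certificate instead: warning.
print(check_certificate("revealingerrors.com", ["intermediary.example"]))
```

The second call is the interesting one: the encryption would work fine, but the name on the certificate does not match the name the user asked for, so the hidden intermediary is pushed into view.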

That bit of background provides the first part of the explanation for this error message.

whitehouse.gov error message claiming the host is a248.e.akamai.net

In this image, a user attempted to connect to the Whitehouse.gov website over SSL --- visible from the https in the URL bar. Instead of a secure version of the White House website, however, the user saw an error explaining that the certificate attesting to the identity of the website was not from the United States White House, but rather from some other website called a248.e.akamai.net.

This is a revealing error, of course. The SSL system, normally represented by little more than a lock icon in the status bar of a browser, is thrust awkwardly into view. But this particularly revealing error has more to tell. Who is a248.e.akamai.net? Why is their certificate being offered to someone trying to connect to the White House website?

a248.e.akamai.net is the name of a server that belongs to a company called Akamai. Akamai, while unfamiliar to most Internet users, serves between 10 and 20 percent of all web traffic. The company operates a vast network of servers around the world and rents space on these servers to customers who want their websites to work faster. Rather than serving content from their own computers in centralized data centers, Akamai's customers can distribute content from locations close to every user. When a user goes to, say, Whitehouse.gov, their computer is silently redirected to one of Akamai's copies of the Whitehouse website. Often, the user will receive the web page much more quickly than if they had connected directly to the Whitehouse servers. And although Akamai's network delivers more than 650 gigabits of data per second around the world, it is almost entirely invisible to the vast majority of its users. Nearly anyone reading this uses Akamai repeatedly throughout the day and never realizes it. Except when Akamai doesn't work.

Akamai is an invisible Internet intermediary on a massive scale. But because SSL is designed to detect and highlight hidden intermediaries, Akamai has struggled to make SSL work with their service. Although Akamai offers a service designed to let their customers use Akamai's service with SSL, many customers do not take advantage of this. The result is that SSL remains one place where, through error messages like the one shown above, Akamai's normally hidden network is thrust into view. An attempt to connect to a popular website over SSL will often reveal Akamai. The White House is hardly the only victim; Microsoft's Bing search engine launched with an identical SSL error revealing Akamai's behind-the-scenes role.

Akamai plays an important role as an intermediary for a large chunk of all activity online. Not unlike Google, Akamai has an enormous power to monitor users' Internet usage and to control or even alter the messages that users send and receive. But while Google is repeatedly --- if not often enough --- held to the fire by privacy and civil liberties advocates, Akamai is mostly ignored.

We appreciate the power that Google has because they are visible --- right there in our URL bar --- every time we connect to Google Search, GMail, Google Calendar, or any of Google's growing stable of services. On the other hand, Akamai's very existence is hidden and their power is obscured. But Akamai's role as an intermediary is no less important due to its invisibility. Errors provide one opportunity to highlight Akamai's role and the power they retain.

Deals, Failure, and Fun Posted Sun, 05 Jul 2009

I've found that the always entertaining FAILblog is a rich source for revealing errors. Here's a nice example.

Every reader on FAILblog can chuckle at the idea that an item is being offered for $69.98 instead of an original $19.99 as part of a clearance sale. The idea that one can "Save $-49" is icing on the cake. Of course, most readers will immediately assume that no human was involved in the production of this sign; it's hard to imagine that any human even read the sign before it went up on the shelf!

The sign was made by a computer program working from a database or a spreadsheet with a column for the name of the product, a column for the original price, and a column for the sale price. Subtracting the sale price from the original gives the "savings" and, with that data in hand, the sign is printed. The idea of negative savings is a mistake that only a computer will make and, with the error, the sign-producing computer program is revealed.
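Here is a rough sketch, in Python, of the kind of program that might sit behind such a sign. The actual program is unknown; the function names, the wording of the sign, and the product name are invented for illustration, and the prices are the ones from the photo.

```python
from decimal import Decimal


def make_sign(product: str, original: str, sale: str) -> str:
    """Build sign text straight from three columns, with no sanity check."""
    original_price = Decimal(original)
    sale_price = Decimal(sale)
    # The "savings" is plain subtraction; nothing stops it going negative.
    savings = original_price - sale_price
    return (f"{product}: was ${original_price}, now ${sale_price}. "
            f"Save ${savings}!")


def make_sign_checked(product: str, original: str, sale: str) -> str:
    """Same sign, but refuse to print a "sale" that costs more."""
    if Decimal(sale) >= Decimal(original):
        raise ValueError(f"sale price {sale} is not below original {original}")
    return make_sign(product, original, sale)


print(make_sign("Clearance item", "19.99", "69.98"))
```

The unchecked version happily prints negative savings. The checked version shows how little it would take to catch the mistake, which is exactly the kind of check a human reading the sign performs without thinking.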

Errors like this, and FAILblog's work in general, highlight one of the reasons that I think that errors are such a great way to talk about technology. FAILblog is incredibly popular with millions of people checking in to see the latest pictures and videos of screw-ups, mistakes, and failures. For whatever reason --- sadism, schadenfreude, reflection on things that are surprisingly out of place, or the comfort of knowing that others have it worse --- we all know that a good error can be hilarious and entertaining.

My own goal with Revealing Errors centers on a type of technology education. I want to reveal hidden technology as a way of giving people insight into the degree and the way that our lives are technologically mediated. In the process, I hope to lay the groundwork for talking about the power that this technology has.

But if people are going to want to read anything I write, it should also be entertaining. Errors are appropriate for a project like mine because they give a view into closed systems, hidden intermediaries and technological black boxes. But they are great for the project because they are also intrinsically interesting!

Quorum of the Twelve Apostates Posted Sun, 12 Apr 2009

A number of people (including the New York Times) wrote about a costly error at Brigham Young University last week that was originally reported by the Utah Valley Daily Herald. The error itself was subtle. First, it is important to realize that Brigham Young is a private university owned by the Church of Jesus Christ of Latter-day Saints (i.e., the Mormon Church or LDS for short). The front of the Daily Universe -- the BYU university newspaper -- featured a photograph of a group of men who form one of the most important governing bodies in the LDS church with the heading, "Quorum of the Twelve Apostates."

Quorum of the Twelve Apostates

The caption should have said the "Quorum of the Twelve Apostles" which is the name of the governing body in question. An apostle, of course, is a messenger or ambassador although the term is most often used to refer to Jesus' twelve closest disciples. The term apostle is used in LDS to refer to a special high rank of priest within the church. An apostate is something else entirely; the term refers to a person who is disloyal and unfaithful to a cause -- particularly to a religion.

Because the paper appeared to label the highest priests in the church as disloyal and unfaithful, thousands of copies of the paper (18500 by one report) were pulled from news stands around campus. New copies of the paper with a fixed caption were printed and put out in their place at what must have been enormous cost to BYU and the Daily Universe.

The source of the error, says the university's spokesperson, was in a spellchecker. Working under a tight deadline, the person spell-checking the captions ran across a misspelled version of "apostles" in the text. In a rush, they clicked the first term in the suggestion list which, unfortunately, happened to be a similarly spelled near-antonym of the word they wanted.

From a technical perspective, this error is a version of the Cupertino effect although the impact was much more strongly felt than most examples of Cupertino. Like Cupertino, BYU's small disaster can teach us a whole lot about the power and effect of technological affordances. The spell-checking algorithm made it easier for the Daily Universe's copy editor to write "apostate" than it was to write "apostle" and, as a result, they did exactly that. A system with different affordances would have had different effects.
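To make the affordance concrete, here is a small Python sketch of a suggestion list. Everything in it is an assumption made for illustration: the actual spellchecker, its dictionary, its ranking, and the misspelling the copy editor typed are not known. The sketch uses the standard difflib module to find near matches and then, as one plausible (hypothetical) design, lists them alphabetically.

```python
import difflib

# A tiny, made-up dictionary; a real spellchecker has many thousands of words.
DICTIONARY = ["apostate", "apostle", "apostles", "epistle", "postal"]


def suggestions(word: str, dictionary: list[str] = DICTIONARY) -> list[str]:
    """Return near matches for a misspelled word, listed alphabetically."""
    close = difflib.get_close_matches(word, dictionary, n=5, cutoff=0.6)
    # Assumption: this hypothetical checker shows candidates in alphabetical
    # order rather than ordering them by similarity.
    return sorted(close)


def accept_first(word: str) -> str:
    """What a rushed editor does: take whatever is at the top of the list."""
    options = suggestions(word)
    return options[0] if options else word


# "apostels" is a hypothetical misspelling of "apostles".
print(suggestions("apostels"))
print(accept_first("apostels"))
```

The right word is in the list, but it is not at the top, and the top of the list is where a hurried click lands. A checker that ordered its suggestions by similarity instead would put "apostles" first for this input, which is the sense in which a system with different affordances would have had different effects.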

The affordances in our technological systems are constantly pushing us toward certain choices and actions over others. In an important way, the things we produce and say and the ways we communicate are the product of these affordances. Through errors like BYU's, we get a glimpse of these usually-hidden affordances in everyday technologies.

The Case of the Welsh Autoresponder Posted Sat, 21 Feb 2009

Last year, I talked about some of the dangers of machine translation that resulted in a Chinese restaurant advertised as "Translate Server Error" and another restaurant serving "Stir Fried Wikipedia." This article from the BBC a couple months ago shows that embarrassing translation errors are hardly limited to either China or to machine translation systems.

Mistranslated Welsh road sign

The English half of the sign is printed correctly and says, "No entry for heavy goods vehicles. Residential site only." Clearly enough, the point of the sign is to prohibit truck drivers from entering a residential neighborhood.

Since the sign was posted in Swansea, Wales, the bottom half of the sign is written in Welsh. The translation of the Welsh is, "I am not in the office at the moment. Send any work to be translated."

It's not too hard to piece together what happened. The bottom half of the sign was supposed to be a translation of the English. Unfortunately, the person ordering the sign didn't speak Welsh. When he or she sent it off to be translated, they received a quick response from an email autoresponder explaining that the email's intended recipient was temporarily away and that they would be back soon --- in Welsh.

Unfortunately, the representative of the Swansea council thought that the autoresponse message --- which is, coincidentally, about the right length --- was the translation. And onto the sign it went. The autoresponse system was clearly, and widely, revealed by the blunder.

One thing we can learn from this mishap is simply to be wary of hidden intermediaries. Our communication systems are long and complex; every message passes through dozens of computers with a possibility of error, interception, surveillance, or manipulation at every step. Although the representative of the Swansea council thought they were getting a human translation, they, in fact, never talked to a human at all. Because the Swansea council didn't expect a computerized autoresponse, they didn't consider that the response was not sent by the recipient.
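One way software can make this kind of intermediary visible is by checking whether a reply was machine-generated. The sketch below, in Python, looks at the Auto-Submitted header defined in RFC 3834, which well-behaved autoresponders set. It is only a sketch: whether the autoresponder in the Swansea case set this header is not known, many autoresponders do not, and the sample messages and addresses are invented.

```python
from email import message_from_string


def looks_automated(raw_message: str) -> bool:
    """Return True if the message declares itself machine-generated.

    RFC 3834 defines the Auto-Submitted header; any value other than
    "no" (for example "auto-replied") marks an automatic message.
    """
    msg = message_from_string(raw_message)
    auto_submitted = (msg.get("Auto-Submitted") or "no").strip().lower()
    return auto_submitted != "no"


# Invented sample messages; the bodies are placeholders.
AUTO_REPLY = (
    "From: translator@example.org\n"
    "Subject: Out of office\n"
    "Auto-Submitted: auto-replied\n"
    "\n"
    "(out-of-office text, in Welsh)\n"
)
HUMAN_REPLY = (
    "From: translator@example.org\n"
    "Subject: Re: sign text\n"
    "\n"
    "(the requested translation)\n"
)

print(looks_automated(AUTO_REPLY))
print(looks_automated(HUMAN_REPLY))
```

A mail program that flagged the first message as automatic would have given the council a hint in a language they could read, even though the body of the message was not.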

Another important lesson, and one also present in the Chinese examples, is that software needs to give users responses in the language they are interacting in to be interpreted correctly. In the translation context where users plan to use, but may not understand, their program's output, this is often impossible. That's why when a person has someone, or some system, translate into a language they do not speak, they open themselves up to these types of errors. If a user does not understand the output of a system they are using, they are put completely at the whim of that system. The fact that we usually do understand our technology's output provides a set of "sanity checks" that can keep this power in check. We are so susceptible to translation errors because these checks are necessarily removed.

Show Me the Code Posted Sun, 01 Feb 2009

A while ago, Mark Pilgrim wrote about being prompted with a license agreement that looked like this.

Adobe Reader 8 license agreement showing HTML code.

If, like most people, you have trouble parsing the agreement, that's because it's not the text of the license agreement that's being shown but the "marked up" XHTML code. Of course, users are only supposed to see the processed output of the code and not the code itself. Something went wrong here and Mark was shown everything. The result is useless.
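The difference between the code and its processed output can be shown with a few lines of Python. The markup below is an invented stand-in, not the text of the actual license, and stripping tags with the standard html.parser module is only a crude approximation of what a real renderer does.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect the text between tags and ignore the tags themselves."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def render_text(markup: str) -> str:
    """Return roughly what a user is meant to see: the text without markup."""
    parser = TextOnly()
    parser.feed(markup)
    parser.close()
    # Collapse runs of whitespace the way a browser would.
    return " ".join("".join(parser.chunks).split())


# Hypothetical markup standing in for the license agreement.
markup = "<p>By using this software you accept the <b>license agreement</b>.</p>"

print(markup)               # what Mark was shown: the raw code
print(render_text(markup))  # what he was supposed to see
```

When the processing step is skipped, the first line is what reaches the screen.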

Conceptually, computer science can be boiled down to a process of abstraction. In an introductory undergraduate computer science course, students are first taught syntax or the mechanics of writing code that computers can understand. After that, they are taught abstraction. They'll continue to be taught abstraction, in one way or another, until they graduate. In this sense, programming is just a process of taking complex tasks and then hiding -- abstracting -- that complexity behind a simplified set of interfaces. Then, programmers build increasingly complex tools on top of these interfaces and the whole cycle repeats. Through this process of abstracting abstractions, programmers build up systems of almost unfathomable complexity. The work of any individual programmer becomes like a tiny cog in a massive, intricate machine.

Mark's error is interesting because it shows a ruptured black box -- an acute failure of abstraction. Of course, many errors, like the dialog shown below, show us very little about the software we're using.

Unknown Error dialog

With errors like Mark's, however, users are quite literally presented with a view of parts of the system that the programmer was trying to hide.

Here's another photo I've been showing in my talks that shows a crashed ATM displaying bits of the source code of the application running on the ATM; a bit of unintentional "open sourcing."

Crashed ATM displaying source code

These examples are embarrassing for authors of the software that caused them but are reasonably harmless. Sometimes, however, the window we get into a broken black box can be shocking.

In talks, I've mentioned a configuration error on Facebook that resulted in the accidental publication of the Facebook source code. Apparently, people looking at the code found little pieces like these (comments, written by Facebook's authors, are bolded):

$monitor = array( '42107457' => 1, '9359890' => 1);
// Put baddies (hotties?) in here

/* Monitoring these people's profile viewage.
Stored in central db on profile_views.
Helpful for law enforcement to monitor stalkers and stalkees.
*/

The first block describes a list of "baddies" and "hotties" represented by user ID numbers that Facebook's authors have singled out for monitoring. The second stanza should be self-explanatory.

Facebook has since taken steps to avoid future errors like this. As a result, we're much less likely to get further views into their code. Of course, we have every reason to believe that this code, or other code like it, still runs on Facebook. As long as Facebook's black box works better than it has in the past, we may never again know exactly what's going on.

Like Facebook's authors, many technologists don't want us knowing what our technology is doing. Sometimes, like Facebook, for good reason: the technology we use is doing things that we would be shocked and unhappy to hear about. Errors like these provide a view into some of what we might be missing and reasons to be discomforted by the fact that technologists work so hard to keep us in the dark.

Lorem Ipsum Dolor Sit Amet Posted Wed, 28 Jan 2009

I was browsing this store for worker clothes in Germany a few weeks back when I noticed something funny in the bottom corner. I've highlighted the snafu in the screenshot below with a big red arrow.

lorem ipsum screenshot

The arrow points to a paragraph that is definitely not in German. In fact, it's Latin. Well, almost Latin.

The paragraph is a famous piece of Latin nonsense text that starts with, and is usually referred to as, lorem ipsum. Lorem ipsum has apparently been in existence (in one form or another), and in use by the printing and publishing industry, for centuries. Although it's originally derived from a text by Cicero, the Latin is meaningless.

The story behind lorem ipsum is rooted in the fact that when presented with text, people tend to read it. For this reason, and because sometimes text for a document doesn't exist until late in the process, many text and layout designers do what's called Greeking. In Greeking, a designer inserts fake or "dummy" text that looks like real text but, because it doesn't make any sense, lets viewers focus on the layout without the distraction of "real" words. Lorem ipsum was the printing industry's standard dummy text. It continues to be popular in the world of desktop and web publishing.

In fact, lorem ipsum is increasingly popular. The rise of computers and computer-based web and print publishing has made it much easier and more common for text layout and design to be prototyped and much more likely that a document's designer is not the same person or firm that publishes the final version. While both design and publishing would have been done in print houses half a century ago, today's norm is for web, graphic, print and layout designers to give their clients pages or layouts with dummy text -- often the lorem ipsum text itself. Clients -- the "real" text's producers, that is -- are expected to replace the dummy text with the real text before printing or uploading their document to the web.

We can imagine what happened in this example. The clothing shop hired a web design firm who turned over the "greeked" layout to the store owners and managers. The store managers replaced the greeked text with information about their products and services. Not being experts -- or just because they were careless -- they missed a few spots and some of the greeked text ended up published to the world by mistake.
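A check for leftover dummy text is simple enough to sketch. The Python below scans a page's text for a couple of tell-tale lorem ipsum phrases before it is published; the function name, the phrase list, and the sample page are all invented for the example, and other dummy texts would need their own patterns.

```python
import re

# Assumption: these two phrases are enough to spot the standard dummy text.
PLACEHOLDER_PATTERNS = [r"lorem\s+ipsum", r"dolor\s+sit\s+amet"]


def find_placeholders(page_text: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs that still contain dummy text."""
    hits = []
    for number, line in enumerate(page_text.splitlines(), start=1):
        if any(re.search(p, line, re.IGNORECASE) for p in PLACEHOLDER_PATTERNS):
            hits.append((number, line.strip()))
    return hits


# A made-up page where one spot was missed.
page = """Work clothes for every trade
Order today and we ship tomorrow.
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Contact us by phone or email."""

for number, line in find_placeholders(page):
    print(f"line {number}: {line}")
```

That such a check is easy to write but was evidently not part of the handoff says something about how loosely the designer's and the publisher's halves of the job are joined.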

A quick look around the web shows that this shop is in good company. Although lorem ipsum is often preferred because the spacing makes the text "look like" English from a distance, many other dummy texts are both used and abused. Here's an example from an auto advertisement.

car advertisement with dummy text

Due to rapidly and radically changed roles introduced by desktop publishing -- changes in structure and division of labor that are usually invisible -- you can see accidentally published lorem ipsum text all over the web and in all sorts of places in the printed world as well. We don't often reflect on the changes in the human and technological systems behind web and desktop publishing. Errors like these give an opportunity to do so.

Posted by mako
Faces of Google Street View Posted Sun, 07 Dec 2008

This error was revealed and written up by Fred Benenson and first published on his blog.

Google Streetview
Blurred Face Example

After receiving criticism for the privacy-violating "feature" of Google Street View that enabled anyone to easily identify people who happened to be on the street as Google's car drove by, the search giant started blurring faces.

What is interesting, and what Mako would consider a "Revealing Error", is when the auto-blur algorithm cannot distinguish between an advertisement's face and a regular human's face. For the ad, the model has been compensated to have his likeness (and privacy) commercially exploited for the brand being advertised. On the other hand, there is a legal gray area as to whether Google can do the same for random people on the street, and rather than face more privacy criticism, Google chooses to blur their identities to avoid raising the issue of whether it is their right to do so, at least in America.

So who cares that the advertisement has been modified? The advertiser, probably. If a 2002 case is any indication, advertisers do not like it when their carefully placed and expensive Manhattan advertisements get digitally altered. While the advertisers lost a case against Sony for changing (and charging for) advertisements in the background of Spiderman scenes located in Times Square, it's clear that they were expecting their ads to actually show up in whatever work happened to be created in that space. There are interesting copyright implications here, too, as it demonstrates an implicit desire by big media for work like advertising to be reappropriated and recontextualized because it serves the point of getting a name "out there."

To put my undergraduate philosophy degree to use, I believe these cases bring up deep ethical and ontological questions about the right to control and exhibit realities (Google Street View being one reality, Spiderman's Times Square being another) as they obtain to the real reality. Is it just the difference between a fiction and a non-fiction reality? I don't think so, as no one uses Google Maps expecting to retrieve information that is fictional. Regardless, expect these kinds of issues to come up more and more frequently as Google increases its resolution and virtual worlds merge closer to real worlds.

Beef Panties Posted Sun, 26 Oct 2008

Many of the gems from the newspaper correction blog Regret the Error qualify as revealing errors. One particularly entertaining example was this Reuters syndicated wire story on the recall of beef whose opening paragraph explained that (emphasis mine):

Quaker Maid Meats Inc. on Tuesday said it would voluntarily recall 94,400 pounds of frozen ground beef panties that may be contaminated with E. coli.

ABC News Beef Panties Article

Of course the article was talking about beef patties, not beef panties.

This error can be blamed, at least in part, on a spellchecker. I talked about spellcheckers before when I discussed the Cupertino effect, which happens when someone spells a word correctly but is prompted to change it to an incorrect word because the spellchecker does not contain the correct word in its dictionary. The Cupertino effect explains why the New Zealand Herald ran a story with Saddam Hussein's name rendered as Saddam Hussies and Reuters ran a story referring to Pakistan's Muttahida Quami Movement as the Muttonhead Quail Movement.

What's going on in the beef panties example seems to be a little different and more subtle. Both "patties" and "panties" are correctly spelled words that are one letter apart. The typo that changes patties to panties is, unlike swapping Cupertino in for cooperation, an easy one for a human to make. Single letter typos in the middle of a word are easy to make and easy to overlook.

As nearly all word processing programs have come to include spellcheckers, writers have become accustomed to them. We look for the red squiggly lines underneath words indicating a typo and, if we don't see them, we assume we've got things right. We do so because this is usually a correct assumption: spelling errors or typos that result in them are the most common type of error that writers make.

In a sense though, the presence of spellcheckers has made one class of misspellings -- those that result in correctly spelled but incorrect words -- more likely than before. Because most errors are now easier to catch, we spend less time proofreading and, in the process, make a smaller class of errors -- in this case, swapped words -- more likely than they used to be. The result is errors like "beef panties."
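The blind spot is easy to demonstrate with a toy spellchecker in Python. The sketch assumes the simplest possible design, flagging any word that is not in a dictionary, and uses a tiny made-up word list; real spellcheckers are more elaborate, but the dictionary lookup at their core has the same limitation.

```python
import re

# A tiny, made-up dictionary. Both "patties" and "panties" are real words,
# so both belong in it.
DICTIONARY = {
    "recall", "pounds", "of", "frozen", "ground", "beef",
    "patties", "panties",
}


def flag_unknown_words(text: str, dictionary: set[str] = DICTIONARY) -> list[str]:
    """Return the words a dictionary-based checker would underline in red."""
    words = re.findall(r"[a-z]+", text.lower())
    return [word for word in words if word not in dictionary]


# A typo that produces a non-word is caught...
print(flag_unknown_words("recall of frozen ground beef pattys"))
# ...but a typo that produces another real word sails through.
print(flag_unknown_words("recall of frozen ground beef panties"))
```

The second call returns an empty list. As far as the checker is concerned, there is nothing to underline, and a writer who trusts the absence of red squiggles has no reason to look again.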

Although we're not always aware of them, the affordances of technology change the way we work. We proofread differently when we have a spellchecker to aid us. In a way, the presence of a successful error-catching technology makes certain types of errors more likely.

One could make an analogy with the arguments made against some security systems. There's a strong argument in the security community that creation of a bad security system can actually make people less safe. If one creates a new high-tech electronic passport validator, border agents might stop checking the pictures as closely or asking tough questions of the person in front of them. If the system is easy to game, it can end up making the border less safe.

Error-checking systems eliminate many errors. In doing so, they can create affordances that make others more likely! If the error checking system is good enough, we might stop looking for errors as closely as we did before and more errors of the type that are not caught will slip through.

Send in the Clones Posted Thu, 02 Oct 2008

Earlier in the summer, Iran released this image to the international community -- purportedly a photograph of rocket tests carried out recently.

Iran missiles (original image)

There was an interesting response from a number of people who pointed out that the image appeared to have been manipulated. Eventually, the image ended up on the blog Photoshop Disasters (PsD), which released this marked up image highlighting the fact that certain parts of the image seemed similar to each other. Identical in fact; they had been cut and pasted.

Iran missile image marked up by PsD

The blog joked that the photos revealed a "shocking gap in that nation's ability to use the clone tool."

The clone tool -- sometimes called the "rubber stamp tool" -- is a feature available in a number of photo-manipulation programs including Adobe Photoshop, GIMP and Corel Photopaint. The tool lets users easily replace part of a picture with information from another part. The Wikipedia article on the tool offers a good visual example and this description:

The applications of the cloning tool are almost unlimited. The most common usage, in professional editing, is to remove blemishes and uneven skin tones. With a click of a button you can remove a pimple, mole, or a scar. It is also used to remove other unwanted elements, such as telephone wires, an unwanted bird in the sky, and a variety of other things.

Of course, the clone tool can also be used to add things in -- like the clouds of dust and smoke at the bottom of the images of the Iranian test. Used well, the clone tool can be invisible and leave little or no discernible mark. This invisible manipulation can be harmless or, as in the case of the Iranian missiles, it can be used for deception.

The clone tool makes perfect copies. Too perfect. And these impossibly perfect reproductions can become revealing errors. Through its introduction of unnatural similarity within an image, the clone tool introduces errors. In doing so, it can reveal both the person manipulating the image and their tools. Through their careless use of the tool, the Iranian government's deception, and their methods, were revealed to the world.
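The idea that perfect copies give the game away can be sketched in Python. The function below looks for small square blocks of pixels that appear, value for value, in more than one place in an image. It is a toy under generous assumptions: the image is a small grid of made-up grayscale numbers, and matches must be exact. Real photographs have flat areas (sky, shadow) that repeat innocently, and compression blurs cloned regions slightly, so real detection tools have to be far more tolerant and far more careful.

```python
from collections import defaultdict


def find_cloned_blocks(image: list[list[int]], size: int = 2) -> list[list[tuple[int, int]]]:
    """Return groups of (row, column) positions whose size-by-size blocks
    of pixel values are exactly identical."""
    seen = defaultdict(list)
    height, width = len(image), len(image[0])
    for y in range(height - size + 1):
        for x in range(width - size + 1):
            # Use the block's pixel values as a dictionary key.
            block = tuple(tuple(image[y + dy][x:x + size]) for dy in range(size))
            seen[block].append((y, x))
    return [positions for positions in seen.values() if len(positions) > 1]


# A made-up 4x6 "photograph" in which every pixel is different, except that
# the top-left 2x2 block has been pasted over the bottom-right corner.
image = [
    [1,  2,  3,  4,  5,  6],
    [7,  8,  9,  10, 11, 12],
    [13, 14, 15, 16, 1,  2],
    [19, 20, 21, 22, 7,  8],
]

print(find_cloned_blocks(image))
```

In a scene captured by a camera, two patches of smoke or dust should essentially never match pixel for pixel. When they do, the match points both to the edit and to the tool that made it.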

But the Iranian government is hardly the only one caught manipulating images through careless use of the clone tool. Here's an image, annotated by PsD again, of the 20th Century Fox Television logo with "evident clone tool abuse!"

20th Century Fox Image Manipulation

And here's an image from Brazilian Playboy where an editor using a clone tool has become a little overzealous in their removal of blemishes.

Missing navel on Playboy Brazil model

Now we're probably not shocked to find out that Playboy deceptively manipulates images of their models -- although the resulting disregard for anatomy drives the extreme artificiality of their productions home in a rather stark way.

In aggregate though, these images (a tiny sample of what I could find with a quick look) help speak to the extent of image manipulation in photographs that, by default, most of us tend to assume are unadulterated. Looking for the clone tool, and for other errors introduced by the process of image manipulation, we can get a hint of just how mediated the images through which we view the world are -- and we have reason to be shocked.

Here's a final example from Google Maps that shows the clear marks of the clone tool in a patch of trees -- obviously cloned to the trained eye -- on what is supposed to be an unadulterated satellite image of land in the Netherlands.

Cloned trees in Google Maps satellite image of the Netherlands

Apparently, the surrounding area is full of similar artifacts. Someone has edited out and papered over much of the area -- by hand -- with the clone tool because someone with power is trying to hide something visible on that satellite photograph. Perhaps they have a good reason for doing so. Military bases, for example, are often hidden in this way to avoid enemy or terrorist surveillance. But it's only through the error revealed by sloppy use of the clone tool that we're in any position to question the validity of these reasons or realize the images have been edited at all.

Google News and the UAL Stock Fiasco Posted Mon, 15 Sep 2008

I've beat up on Google News before but something happened this week that made me (and many of you who emailed me) believe it worth revisiting the topic.

On September 9th, a glitch in the Google News crawler caused Google News to redisplay an old article from 2002 that announced that UAL -- the company that owns and runs United Airlines -- was filing for bankruptcy. The re-publication of this article as news started off a chain-reaction that caused UAL's stock price to plummet from more than USD$11 per share to nearly $3 in 13 minutes! After trading was halted and the company allowed to make a statement, the stock mostly (but not completely) recovered by the end of the day. During that period, USD$1.14 billion of shareholder wealth evaporated.

Initially, officials suspected stock manipulation but it seems to be traced back to a set of automated systems and "honest" technical mistakes. There was no single error behind the fiasco but rather several broken systems working in concert.

The mess started when the Chicago Tribune, which published an article about UAL's bankruptcy back in 2002, started getting increased traffic to that old article for reasons that are not clear. As a result, the old article became listed as a "popular news story" on their website. Seeing the story on the popular stories list, a program running on computers at Google downloaded the article. For reasons Google tried to explain, their program (or "crawler") was not able to correctly identify the article as coming from 2002 and, instead, classified it as a new story and listed it on their website accordingly. Elsewhere, the Tribune claimed that they had already notified Google of this issue. Google denies this.

What happens next is somewhat complicated but was carefully detailed by the Sun-Times. It seems that a market research firm called Income Securities Advisers, Inc. was monitoring Google News, saw the story (or, in all probability, just the headline "UAL files for bankruptcy") and filed an alert which was then picked up by the financial news company Bloomberg. At any point, clicking on and reading the article would have made it clear that the story was old. Of course, enough people didn't click and check before starting a sell-off that snow-balled, evaporating UAL's market capital before anyone realized what was actually going on. The president of the research firm, Richard Lehmann, said, "It says something about our capital markets that people make a buy-sell decision based on a headline that flashes across Bloomberg."

Even more intriguing, there's a Wall Street Journal report that claims that the sell-off was actually kick-started by automated trading programs that troll news aggregators -- like Bloomberg and Google News. These programs look for key words and phrases and start selling a company's shares when they sense "bad" news. Such programs exist and, almost certainly, would have been duped by this chain of events.

While UAL has mostly recovered, the market and many outside of it learned quite a few valuable lessons about the technology that they are trusting their investments and their companies to. Investors understand that the computer programs they use to manage and coordinate their markets are hugely important; financial services companies spend billions of dollars building robust, error-resistant systems. Google News, on the other hand, quietly became part of this market infrastructure without Google, most investors, or companies realizing it -- that's why officials initially suspected intentional market manipulation and why Google and the Tribune were so surprised.

There were several automated programs -- including news-reading automated trading systems -- that have become very powerful market players. Most investors and the public never knew about these because they are designed to work just like humans do -- just faster. When they work, they make money for the people running them because they can be just ahead of the pack in known market moves. These systems were revealed because they made mistakes that no human would make. In the process they lost (if only temporarily) more than a billion dollars!

Our economy is mediated by and, in many ways, resting in the hands of technologies -- many of which we won't know about until they fail. If we're wise, we'll learn from errors and think hard about the way that we use technology and about the power, and threat, that invisible and unaccountable technologies might pose to our economy and beyond.

Google Miscalculator Posted Sat, 06 Sep 2008

This post on a search engine blog pointed out a series of very strange and incorrect search results returned by Google's search engine. Google's search engine is a very complicated "black box," and many of the errors described highlight and reveal some aspect of its technology.

My favorite was this error from Google Calculator:

Error showing 1.16 as a result for eight days a week

The error, which has been fixed, occurred when users searched for the phrase "eight days a week" -- the name of a Beatles song, film, and sitcom.

Google Calculator is a feature of Google's search engine that looks at search strings and, if it thinks you are trying to ask a math question or a units conversion, will give you the answer. You can, for example, search for 5000 times 23 or 10 furlongs per fortnight in kph or 30 miles per gallon in inverse square millimeters -- Google Calculator will give you the right answers. While it would be obvious to any human that "eight days a week" was a figure of speech, Google thought it was a math problem! It happily converted 1 week to 7 days and then divided 8 by 7: roughly 1.14.
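The arithmetic is easy to reproduce. Here's a minimal sketch in Python, assuming the query gets read as a quantity in days divided by a quantity in weeks. The `DAYS_PER` table and the `ratio` function are names I made up for illustration; they say nothing about how Google's software is actually built.

```python
# Hypothetical lookup table: how many days each unit contains.
DAYS_PER = {"day": 1, "week": 7}

def ratio(amount_a, unit_a, amount_b, unit_b):
    """Divide one duration by another after converting both to days."""
    return (amount_a * DAYS_PER[unit_a]) / (amount_b * DAYS_PER[unit_b])

# "eight days a week" read literally as 8 days / 1 week
print(round(ratio(8, "day", 1, "week"), 7))  # 1.1428571
```

Nothing in this kind of calculation knows that the phrase is a song title; it only sees two quantities with compatible units.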

Clearly, the error reveals the absence of human judgment -- but we knew that about Google's search engine already. More intriguing is what this, combined with a series of other Google Calculator errors, might reveal about Google's black box software.

When Google launched its Calculator feature, it reminded me of GNU Units -- a piece of free/open source software written by volunteers and distributed with an expectation that those who modify it will share with the community. After playing with Google Calculator for a little while, I tried a few "bugs" that had always bothered me in Units. In particular, I tried converting between Fahrenheit and Celsius. Units converts between numbers of degrees (for example, a change in temperature). It does not take into account the fact that the units have a different zero point so it often gives people an unexpected (and apparently incorrect) answer. Sure enough, Google Calculator had the same bug.
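The difference between the two kinds of conversion is small in code but large in effect. This is a sketch of the general idea only, with function names of my own choosing; it is not the code from Units or from Google.

```python
def delta_f_to_c(delta_f):
    """Convert a temperature *difference* from Fahrenheit to Celsius degrees."""
    return delta_f * 5 / 9

def temp_f_to_c(temp_f):
    """Convert an actual temperature reading, accounting for the 32-degree offset."""
    return (temp_f - 32) * 5 / 9

print(delta_f_to_c(212))  # about 117.8 -- a change of 212 Fahrenheit degrees
print(temp_f_to_c(212))   # 100.0 -- the boiling point of water
```

Someone who types in "212 Fahrenheit in Celsius" almost always wants the second answer, so a tool that only does the first looks broken.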

Now it's possible that Google implemented their system similarly and ran into similar bugs. But it's also quite likely that Google just took GNU Units and, without telling anyone, plugged it into their system. Google might look bad for using Units without credit and without assisting the community but how would anyone ever find out? Google's Calculator software ran on Google's private servers!

If Google had released a perfect calculator, nobody would have had any reason to suspect that Google might have borrowed from Units. One expects unit conversion by different pieces of software to be similar -- even identical -- when it's working. Identical bugs and idiosyncratic behaviors, however, are much less likely and much more suspicious.

Given the phrase "eight days a week", Units says "1.1428571."

Speed Camera Posted Tue, 26 Aug 2008

In the past, I've talked about how certain errors can reveal a human in what we may imagine is an entirely automated process. I've also shown quite a few errors that reveal the absence of a human just as clearly. Here's a photograph attached to a speeding ticket given by an automated speed camera that shows the latter.

Photograph of a tow-truck
towing a car down a road.

The Daily WTF published this photograph which was sent in by Thomas, one of their readers. The photograph came attached to this summons which arrived in the mail and explained that Thomas had been caught traveling 72 kilometers per hour in a 60 KPH speed zone. The photograph above was attached as evidence of his crime. He was asked to pay a fine or show up in court to contest it.

Obviously, Thomas should never have been fined or threatened. It's obvious from the picture that Thomas' car is being towed. Somebody was going 72 KPH but it was the tow-truck driver, not Thomas! Anybody who looked at the image could see this.

In fact, Thomas was the first person to see the image. The photograph was taken by a speed camera: a radar gun measured a vehicle moving in excess of the speed limit and triggered a camera which took a photograph. A computer subsequently analyzed the image to read the license plate number and look up the driver in a vehicle registration database. The system then printed a fine notice and summons notice and mailed it to the vehicle's owner. The Daily WTF editor points out that proponents of these automated systems often guarantee human oversight in the implementation of these systems. This error reveals that the human oversight in the application of this particular speed camera is either very little or none at all.

Of course, Thomas will be able to avoid paying the fine -- the evidence that exonerates him is literally printed on his court summons. But it will take work and time. The completely automated nature of this system, revealed by this error, has deep implications for the way that justice is carried out. The system is one where people are watched, accused, fined, and processed without any direct human oversight. That has some benefits -- e.g., computers are unlikely to let people of a certain race, gender, or background off easier than others.

But in addition to creating the possibilities of new errors, the move from a human to a non-human process has important economic, political, and social consequences. Police departments can give more tickets with cameras -- and generate more revenue -- than they could ever do with officers in squad cars. But no camera will excuse a man speeding to the hospital with a wife in labor or a hurt child in the passenger seat. As work to rule or "rule-book slowdowns" -- types of labor protests where workers cripple production by following rules to the letter -- show, many rules are only productive for society because they are selectively enforced. The complex calculus that goes into deciding when to not apply the rules, second nature to humans, is still impossibly out of reach for most computerized expert systems. This is an increasingly important fact we are reminded of by errors like the one described here.

More Google News Posted Thu, 14 Aug 2008

In the very first thing I wrote about Revealing Errors -- an article published in the journal Media/Culture -- one of my core examples was Google News. In my discussion, I described how the fact that Google News aggregates articles without human intervention can become quite visible through the site's routine mistakes -- errors that human editors would never commit. I gave the example of the aggregation of two articles: one from Al Jazeera on how, "Iran offers to share nuclear technology," and another from the Guardian on how, "Iran threatens to hide nuclear program." Were they really discussing the same event? Maybe. But few humans would have made the same call that Google News did.

Google News Share/Hide

Yesterday, I saw this article from Network World that described an error that is even more egregious and that was, apparently, predicted by the article's author ahead of time.

In this case, Google listed a parody by McNamara as the top story about the recent lawsuit filed by the MBTA (the Boston mass transit authority) against security researchers at MIT. In the past, McNamara has pointed to other examples of Google News being duped by obvious spoofs. This long list of possible examples includes a story about Congress enlisting the help of YouTube to grill the Attorney General (it was listed as the top story on Google News) and this story (which I dug up) about Paris Hilton's genitals being declared a wonder of the modern world!

McNamara has devoted an extraordinary amount of time to finding and discussing other shortcomings of Google News. For example, he's talked about the fact that GN has trouble filtering out highly-local takes on stories that are of broader interest when presenting them to the general Google News audience, about its sluggishness and inability to react to changing news circumstances, and about the sometimes hilarious and wildly inappropriate mismatches of images on the Google News website. Here's one example I dug up. Imagine what it looked like before it was censored!

Google News Porn

As McNamara points out repeatedly, all of these errors are only possible because Google News employs no human editors. Computers remain pretty horrible at sorting images for relevance to news stories and discerning over-the-top parody from the real thing -- two tasks that most humans don't have too much trouble with. The more generally inappropriate errors wouldn't have made it past a human for multiple reasons!

As I mentioned in my original Revealing Errors article, the decision to use a human editor is an important one with profound effects on the way that users are exposed to news and, as a result, experience and understand one important part of the world around them. Google News' frequent mistakes give us repeated opportunity to consider the way that our choice of technology -- and of editors -- frames this understanding.

Olympics Blue Screen of Death Posted Tue, 12 Aug 2008

Thanks to everyone who pointed me to the Blue Screen of Death (BSoD) that could be seen projected onto part of the roof of the Bird's Nest stadium in the opening ceremony of the Beijing Olympics this week right next to and during the torch lighting! Here's a photograph of the opening ceremony from an article on Gizmodo (there are more photos there) that shows the snafu pretty clearly.

BSOD at Olympics Opening

In the not-so-recent past, a stadium like the Bird's Nest would have been lit up using a large number of lights with gels to add color and texture. As the need for computer control grew, expensive specialized computer-controlled theatrical lighting equipment was introduced to help automate the use of these systems.

Of course, another way to maximize flexibility, coordination, and programmability at a low cost is to skip the lighting control systems altogether and to just hook up a computer to a powerful general purpose video projector. Then, if you want a green light projected, all you have to do is change the background on the screen being projected to green. If you want a blue green gradient, it's just as easy and there are no gels to change. Apparently, that's exactly what the Bird's Nest's designers did.

Unfortunately, with that added flexibility comes the opportunity for new errors. If the computer controlling your light is running Windows, for example, your lighting systems will be susceptible to all the same modes of failure as any other Windows computer. Apparently, using a video projector for this type of lighting is an increasingly common trick. If it had worked correctly for the Olympic organizers, we might never have known!

Lost in Machine Translation Posted Mon, 21 Jul 2008

While I've been traveling over the last week or so, loads of people sent me a link to this wonderful image of a sign in China reading "Translate Server Error" which has been written up all over the place. Thanks everyone!

Billboard saying

It's pretty easy to imagine the chain of events that led to this revealing error. The sign is describing a restaurant (the Chinese text, 餐厅, means "dining hall"). In the process of making the sign, the producers tried to translate Chinese text into English with a machine translation system. The translation software did not work and produced the error message, "Translate Server Error." Unfortunately, because the software's user didn't know English, they thought that the error message was the translation and the error text went onto the sign.

This class of error is extremely widespread. When users employ machine translation systems, it's because they want to communicate to people with whom they do not have a language in common. What that means is that the users of these systems are often in no position to understand the output (or input, depending on which way the translation is going) of such systems and have to trust the translation technology and its designers to get things right.

Here's another one of my favorite examples that shows a Chinese menu selling stir-fried Wikipedia.

Billboard saying

It's not entirely clear how this error came about but it seems likely that someone did a search for the Chinese word for a type of edible fungus and its translation into English. The most relevant and accurate page very well might have been an article on the fungus on Wikipedia. Unfamiliar with Wikipedia, the user then confused the name of the article with the name of the website. There have been several distinct sightings of "wikipedia" on Chinese menus.

There are a few errors revealed in these examples. Of course, there are errors in the use of language and the broken translation server itself. Machine translation tools are powerful intermediaries that determine (often with very little accountability) the content of one's messages. The authors of the translation software might design their tool to favor certain terminology and word choices over others or to silently censor certain messages. When the software is generating reasonable sounding translations, the authors and readers of machine translated texts are usually unaware of the ways in which messages are being changed. By revealing the presence of a translation system or process, this power is hinted at.

Of course, one might be able to recognize a machine translation system simply by the roughness and nature of a translation. In this particular case, the server itself came explicitly into view; it was mentioned by name! In that sense, the most serious failure was not that the translation server broke or that Wikipedia was used incorrectly, but rather that each system failed to communicate the basic fact that there was an error in the first place.

Tyson Homosexual Posted Mon, 30 Jun 2008

Thanks to everyone who pointed me to the flub below. It was reported all over the place today.

Screenshot showing Tyson Homosexual instead of Tyson Gay

The error occurred on One News Now, a news website run by the conservative Christian American Family Association. The site provides Christian conservative news and commentary. One of the things they do, apparently, is offer a version of the standard Associated Press news feed. Rather than just republishing it, they run software to clean up the language so it more accurately reflects their values and choice of terminology. They do so with a computer program.

The error is a pretty straightforward variant of the clbuttic effect -- a run-away filter trying to clean up text by replacing offensive terms with theoretically more appropriate ones. Among other substitutions, AFA/ONN replaced the term "gay" with "homosexual." In this case, they changed the name of champion sprinter and U.S. Olympic hopeful Tyson Gay to "Tyson Homosexual." In fact, they did it quite a few times as you can see in the screenshot below.

Screenshot showing Tyson Homosexual instead of Tyson Gay.

Now, from a technical perspective, the technology this error reveals is identical to the clbuttic mistake. What's different, however, is the values that the error reveals.

AFA doesn't advertise the fact that it changes words in its AP stories -- it just does it. Most of its readers probably never know the difference or realize that the messages and terminology they read are being intentionally manipulated. AFA prefers the term "homosexual," which sounds clinical, to "gay" which sounds much less serious. Their substitution, and the error it created, reflects a set of values that AFA and ONN have about the terminology around homosexuality.

It's possible that the AFA/ONN readers already know about AFA's values. Even so, this error provides an important reminder and shows, quite clearly, the importance that AFA gives to terminology. It reveals their values and some of the actions they are willing to take to protect them.

Medireview Posted Fri, 27 Jun 2008

Medireview is a reference to what has become a classic revealing error. The error was noticed in 2001 and 2002 when people started seeing a series of implausibly misspelled words on a wide variety of websites. In particular, website authors were frequently swapping the nonsense word medireview for medieval. Eventually, the errors were traced back to Yahoo: each webpage containing medireview had been sent as an attachment over Yahoo's free email system.

The explanation of this error shares a lot in common with previous discussions of the difficulty of searching for AppleScript on Apple's website and my recent description of the term clbuttic. Medireview was caused, yet again, by an overzealous filter. Like the AppleScript error, the filter was an attempt to defeat cross-site scripting. Nefarious users were sending HTML attachments that, when clicked, might run scripts and cause bad things to happen -- for example, they might gain access to passwords or data without a user's permission or knowledge. To protect its users, Yahoo scanned through all HTML attachments and simply removed any references to "problematic" terms frequently used in cross-site scripting. Yahoo made the following changes to HTML attachments -- each line shows a term that can be used to invoke a script and the "safe" synonym it was replaced with:

  • javascript → java-script
  • jscript → j-script
  • vbscript → vb-script
  • livescript → live-script
  • eval → review
  • mocha → espresso
  • expression → statement

This caused problems because, like in the Clbuttic error, Yahoo didn't check for word boundaries. This meant that any word containing eval (for example) would be changed to review. The term evaluate was rendered reviewuate. The term medieval was rendered medireview.
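The behavior is simple to reproduce. The sketch below applies the substitutions from the list above as plain substring replacements. The table comes from the list; everything else, including the function name and the order the replacements run in, is my assumption and not Yahoo's actual code.

```python
# Substitution table taken from the list above. The order of
# application is an assumption made for this sketch.
REPLACEMENTS = [
    ("javascript", "java-script"),
    ("jscript", "j-script"),
    ("vbscript", "vb-script"),
    ("livescript", "live-script"),
    ("eval", "review"),
    ("mocha", "espresso"),
    ("expression", "statement"),
]

def naive_filter(text):
    """Replace every occurrence of each term, even inside longer words."""
    for term, safe in REPLACEMENTS:
        text = text.replace(term, safe)
    return text

print(naive_filter("medieval"))  # medireview
print(naive_filter("evaluate"))  # reviewuate
```

Because `str.replace` matches anywhere in the text, the filter never asks whether eval is a script call or just four letters in the middle of an ordinary word.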

Of course, neither sender nor receiver knew that their attachments had been changed! Many people emailed webpages or HTML fragments which, complete with errors introduced by Yahoo, were then put online. The Indian newspaper The Hindu published an article referring to "medireview Mughal emperors of India." Hundreds of others made similar mistakes.

The flawed script was in effect on Yahoo's email system from at least March 2001 through July 2002 before the error was reported by the BBC, New Scientist and others.

Like a growing number of errors that I've covered here, this error pointed out the presence and power of an often hidden intermediary. The person who controls the technology one uses to write, send, and read email has power over one's messages. This error forced some users of Yahoo's system to consider this power and to make a choice about their continued use of the system. Quite a few stopped using Yahoo after this news hit the press. Others turned to other technologies, like public-key cryptography, to help themselves and others verify that their future messages' integrity could be protected from accidental or intentional corruption.

Clbuttic Posted Wed, 18 Jun 2008

Revealing errors are often most powerful when they reveal the presence of or details about a technology's designer. One of my favorite classes of revealing errors consists of those that go one step further and reveal the values of the designers of systems. I've touched on these twice before in my post about T9 input systems and when I talked about profanity in wordlists.

Another wonderful example surfaced in this humorous anecdote about what was supposed to be an invisible anti-profanity system that instead filled a website with nonsensical terms like "clbuttic."

Basically, the script in question tried to look through user input and to swap out instances of profanity with less offensive synonyms. For example, "ass" might become "butt", "shit" might become "poop" or "feces", and so on. To work correctly, the script should have looked for instances of profanity between word boundaries -- i.e., profanity surrounded on both sides by spaces or punctuation. The script in question did not.

The result was hilarious. Not only was "ass" changed to "butt," but any word that contained the letters "ass" was transformed as well! The word "classic" was mangled as "clbuttic."
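To make the difference concrete, here is a minimal sketch of both approaches in Python. The function names are invented for illustration, and I'm assuming a regular-expression word boundary (`\b`) as the fix; the scripts in the anecdote may have been written quite differently.

```python
import re

def naive_clean(text):
    """Replace the letters wherever they appear, even inside other words."""
    return text.replace("ass", "butt")

def word_boundary_clean(text):
    """Replace the term only when it stands alone as a whole word."""
    return re.sub(r"\bass\b", "butt", text)

print(naive_clean("a classic pass"))          # a clbuttic pbutt
print(word_boundary_clean("a classic pass"))  # a classic pass
print(word_boundary_clean("what an ass"))     # what an butt
```

The careful version is one extra line, which is part of why the mistake is so easy to make and so easy to miss.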

The mistake was an easy one to make. In fact, other programmers made the same mistake and searches for "clbuttic" turn up thousands of instances of the term on dozens of independent websites. Searching around, one can find references to a mbuttive music quiz, a mbuttive multiplayer online game, references to how the average consumer is a pbutterby, a transit pbuttenger executed by Singapore, Fermin Toro Jimenez (Ambbuttador of Venezuela), the correct way to deal with an buttailant armed with a banana, and much, much more.

You can even find a reference to how Hinckley tried to buttbuttinate Ronald Reagan!

Each error reveals the presence of an anti-profanity script; obviously, no human would accidentally misspell or mistake the words in question in any other situation! In each case, the existence of a designer and an often hidden intermediary is revealed. What's perhaps more shocking than this error is the fact that most programmers won't make this mistake when implementing similar systems. On thousands of websites, our posts and messages and interactions are "cleaned-up" and edited without our consent or knowledge. As a matter of routine, our words are silently and invisibly changed by these systems. Few of us, and even fewer of our readers, ever know the difference. While switching "ass" to "butt" may be harmless enough, it's a stark reminder of the power that technology gives the designers of technical systems to force their own values on their users and to frame -- and perhaps to substantively change -- the messages that their technologies communicate.

Okami Watermark Posted Mon, 02 Jun 2008

The blog Photoshop Disasters recently wrote a story about a small fiasco regarding cover art for the popular video game Okami.

Okami was originally released for the Sony PlayStation 2 (PS2) in 2006. The developers of the game, Clover Studios, closed up shop several months later. Here is the cover art for the PS2 game which is indicative of the unique sumi-e inspired game art.

Original Okami Cover

Despite Clover's failure, Okami won many awards and was a commercial success. It was ported (i.e., made to run on a different platform) to the Nintendo Wii by a video game production house called Ready at Dawn and by the PS2 version's distributor Capcom. The Wii version was released in April, 2008. Here is the cover art for that version:

Okami Cover for Wii version

People looking closely at the cover of the Wii game noticed something strange right near the wolf's mouth. Here's a highlight with the area circled.

Watermark highlight

The blurry symbol near Okami's mouth was a watermark -- an artifact intentionally added to an image to denote the source of the picture and often to prevent others from taking undue credit. In fact, it was the logo for IGN -- a very large video game website and portal. As part of writing reviews, IGN frequently takes screenshots of games, watermarks them, and posts them on their website.

Sure enough, a little bit of digging on the IGN website revealed this high-resolution image from the cover, complete with IGN watermark in the appropriate place. Apparently, a designer working for Capcom had found it easier to use the images posted by IGN than to go and get the original art from the game itself.

Source image from the IGN website

This error revealed quite a bit about the process and constraints that the cover designers for the Wii version were working under. Rather than getting original source images -- which Capcom presumably owned -- they found it easier to take them from an Internet-available source. Through the error, the usually invisible process, people, and technologies involved in this type of artwork preparation were revealed.

Embarrassed by the whole affair, Capcom offered to replace the covers with non-watermarked ones -- free of charge.

Posted by mako

Sıkısınca Posted Mon, 28 Apr 2008

Last week saw the popularization of some older news about a misunderstanding, prompted by an error caused by technological limitations of mobile phones, that resulted in two deaths and three imprisonments. The whole sad story took place in Turkey. You can read the original story in the Turkish-language newspaper Hürriyet.

Basically, Emine and her husband Ramazan Çalçoban had recently separated and were feuding daily on their mobile phones and over SMS text messages. At one point, Ramazan sent a message saying, "you change the subject every time you get backed into a corner." The word for "backed into a corner" is sıkısınca. Notice the lack of dots on the i's in the word. The very similar sikisince -- spelled with dots -- means "getting fucked." Ramazan's mobile phone could not produce the "closed" dotless ı so he wrote the word with dots and sent it anyway. Reading quickly, Emine misinterpreted the message thinking that Ramazan was saying, "you change the subject every time they are fucking you." Emine showed the message to her father and sisters who, outraged that Ramazan was calling Emine a whore, attacked Ramazan with knives when he showed up at the house later. In the fight, Ramazan stabbed Emine back and she later died of bleeding. Ramazan committed suicide in jail and Emine's father and sisters were all arrested.

This is certainly the gravest example of a revealing error I've looked at yet and it stands as an example of the degree to which tiny technological constraints can have profound unanticipated consequences. In this case, the lack of technological support for characters used in Turkish resulted in the creation of text that was deeply, even fatally, ambiguous.

Of course, many messages sent with SMS, email, or chat systems are ambiguous. Emoticons are an example of a tool that society has created to disambiguate phrases in text-based chatting and their popularity can tell us a lot about what purely word-based chatting fails to convey easily. For example, a particular emoticon might be employed to help convey sarcasm in a statement that would have been obvious through tone of voice. One can think of verbal communication as happening over many channels (e.g., voice, facial expressions, posture, words, etc). Text-based communication technologies provide certain new channels that may be good at conveying certain types of messages but not others. Emoticons, and accents or diacritical marks for that matter, are an attempt to concisely provide missing information that might be taken for granted in spoken conversation.

Any communication technology conveys certain things better than others. Each provides a set of channels that convey some types of messages but not others. The result of a shift toward many technologies is lost channels and an increase in ambiguity.

In spoken Turkish, the open and closed i sounds are easily distinguishable. In written communication, however, things become more difficult. Some writing systems are better at conveying these tonal differences than others. Hebrew, for example, historically contained no vowels at all! And yet, the consequence of not conveying these differences can be profound. As a result, Turkish speakers frequently use diacritics and the open and closed i notation to disambiguate phrases like the one at the center of this saga. Unfortunately the open and closed i technology is not always available to communicators. Notably, it was not available on Ramazan's mobile phone.

People in Turkey have ways of coping with the lack of accents and diacritical marks. For example, some people would choose to write sıkısınca as SIKISINCA because the capital I in the Roman alphabet has no dot. Emoticons are similar in that they are created by users to work around limitations of the system to convey certain messages and to disambiguate others. In these ways and others, users of technologies find creative ways of working with and around the limitations and affordances imposed on them.
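The all-caps workaround is easy to see in a few lines of Python. For the sake of illustration I'm assuming a device that can only handle plain ASCII characters; real handsets used other character sets, so treat this as a sketch of the idea and not a description of any particular phone.

```python
word = "sıkısınca"  # spelled with the dotless ı (U+0131)

# The lowercase word cannot be written with plain ASCII characters...
print(word.isascii())          # False

# ...but its uppercase form can, because the capital of ı is a plain I.
print(word.upper())            # SIKISINCA
print(word.upper().isascii())  # True
```

The capitalized word survives the limited character set intact, and the reader is left to infer the missing dots from convention.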

With time though, the users of emoticons and all-caps Turkish words stop seeing and thinking about the limitations that these tactics expose in their technology. In fact, it is only through errors that these limitations become visible again. While we cannot undo the damage done by Ramazan, Emine and her family, we can "learn from their errors" and reflect on the ways that the limits imposed by our communication technology frame and affect our communications and our lives.

Posted by mako
Interpolation Posted Sun, 20 Apr 2008

One set of errors that almost everyone has seen -- even if they don't know it -- involves the failure of a very common process in computer programming called interpolation. While they look quite different, both of the following errors -- each taken from the Daily WTF's Error'd Series -- represent an error whose source would be obvious to most computer programmers.

You Saved a total of {@Total-Tkt-Discount} off list prices.

The file

The term interpolation, of course, is not unique to programmers. It is a much older term that was historically used to describe errors in hand-copied documents. Interpolation in a manuscript refers to text not written by an original author that was inserted over time -- either through nefarious adulteration or just by accident. As texts were copied by hand, this type of error ended up happening quite frequently! In its article on manuscript interpolation, Wikipedia describes one way that these errors occurred:

If a scribe made an error when copying a text and omitted some lines, he would have tended to include the omitted material in the margin. However, margin notes made by readers are present in almost all manuscripts. Therefore a different scribe seeking to produce a copy of the manuscript perhaps many years later could find it very difficult to determine whether a margin note was an omission made by the previous scribe (which should be included in the text), or simply a note made by a reader (which should be ignored or kept in the margin).

But while manuscript interpolation described a type of error, interpolation in computer programming refers to a type of text swapping that is fully intentional.

Computer interpolation happens when computers create customized and contextualized messages -- and they do so constantly. Whereas a newspaper or a book will be the same for each of its readers, computers create custom pages designed for each user -- you see these all the time as most messages that computers print are, in some way, dynamic. In many cases, these dynamic messages are created through a process called string or variable interpolation. For those who are unfamiliar with the process, an explanation of the errors above can reveal the details.

In the first example, the receipt read (emphasis mine):

You Saved a total of {@Total-Tkt-Discount} off list prices.

In fact, the computer is supposed to swap out the phrase {@Total-Tkt-Discount} for the value of a variable called Total-Tkt-Discount. The {@SOMETHING} syntax is one programming language's way of signifying to the computer, "take the variable called SOMETHING and use its value in this string instead of everything between (and including) the curly braces." Of course, something didn't quite work right and the unprocessed -- or uninterpolated -- text was spit out instead. With this error, the computer program that is supposed to be computing our ticket price was revealed. Additionally, we have a glimpse into the program, its variable names, and even its programming language.
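
To make the mechanism concrete, here is a minimal sketch in Python. Only the {@NAME} syntax comes from the receipt; the interpolate function, the regular expression, and the sample value are assumptions made for illustration and are not the ticketing system's actual code. When no value is available, the placeholder is left untouched, which is one plausible way to end up with the error shown above.

```python
import re

# Matches placeholders such as {@Total-Tkt-Discount}. The pattern is an
# assumption modelled on the receipt text, not the real system's grammar.
PLACEHOLDER = re.compile(r"\{@([A-Za-z0-9-]+)\}")


def interpolate(template, values):
    """Swap each {@NAME} for values[NAME]; leave unknown names untouched."""
    def swap(match):
        name = match.group(1)
        # Fall back to the raw placeholder when no value is available.
        return str(values.get(name, match.group(0)))
    return PLACEHOLDER.sub(swap, template)


receipt = "You Saved a total of {@Total-Tkt-Discount} off list prices."

# Hypothetical discount value: the placeholder is replaced.
print(interpolate(receipt, {"Total-Tkt-Discount": "$4.50"}))

# No value supplied: the template leaks through to the customer.
print(interpolate(receipt, {}))
```

The second call prints the template exactly as it was written, curly braces and all.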

The second error from a (not very helpful) dialog box in Mozilla Firefox is a more complicated but fundamentally similar example (emphasis mine):

The file "#3" is of type #2 (#1), and #4 does not know how to handle this file type.

The numbers, in this case, reflect a series of variables. The dialog is supposed to be passed a list of values including the file name (#3), the file type (#2 and #1), and the name of the program that is trying to open it (#4). These values are supposed to be swapped in for the placeholders -- interpolated -- before any user sees the message. Again, something went wrong here and a user was presented with the empty template that only the programmer and the program are ever supposed to see.
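
Numbered placeholders like these can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than Firefox's code: the fill function and every value in args are hypothetical, and only the template text is taken from the dialog.

```python
import re


def fill(template, args):
    """Replace #1, #2, ... with args[0], args[1], ...; keep unmatched ones."""
    def swap(match):
        index = int(match.group(1)) - 1  # placeholders count from 1
        if 0 <= index < len(args):
            return args[index]
        return match.group(0)  # no value for this slot: leave it visible
    return re.sub(r"#(\d+)", swap, template)


template = ('The file "#3" is of type #2 (#1), and #4 does not know '
            'how to handle this file type.')

# Hypothetical values, listed in placeholder order: #1, #2, #3, #4.
args = ["application/x-example", "Example Document", "report.xyz", "Firefox"]

print(fill(template, args))  # the message a user is meant to see
print(fill(template, []))    # the bare template, as in the error
```

Note that the template decides the order in which the values appear, so the same list of values could be dropped into a differently ordered sentence.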

Nearly every message a computer or a computerized system presents us will be processed and interpolated in this way. In this sense, computer programs act as powerful intermediaries processing and displaying data. Perhaps more importantly, interpolation reveals just how limited computers' expression really is. These messages are not more complicated than simple fill-in-the-blank messages. Simple as they may be, they are entirely typical of the way that computers communicate with us.

From a user's perspective, it's easy to imagine sophisticated systems creating and presenting highly dynamic messages to us -- or to simply not think about it at all. In reality, few computer programs' ability to communicate with us is more sophisticated than a game of Mad Libs. The simplicity of these systems, the limitations that they impose on what computers can and can't say, and the limitations they place on what we can and can't say with computers, are revealed through these simple, common, interpolation errors. To understand all of this, we need only recognize these errors and reflect on what they might reveal.

The Cupertino Effect Posted Mon, 10 Mar 2008

I recently wrote about spellcheckers and profanity. Of course, spellcheckers are the site of many other notable revealing errors.

One well-known class of errors is called the Cupertino Effect. The effect is named after an error caused by the fact that some early spellchecker wordlists contained the hyphenated co-operation but not cooperation (both are correct while the former is less common). The ultimate effect, due to the fact that spellchecking algorithms treat hyphenated words as separate words, was that several spellcheckers would suggest Cupertino as a substitute for the "misspelled" cooperation. As the lone suggestion, some people "corrected" cooperation to Cupertino in haste. The weblog Language Log noticed that quite a few people made the mistake in official documents from the UN, EU, NATO and more! These included the following examples found in real documents:

Within the GEIT BG the Cupertino with our Italian comrades proved to be very fruitful. (NATO Stabilisation Force, "Atlas raises the world," 14 May 2003)

Could you tell us how far such policy can go under the euro zone, and specifically where the limits of this Cupertino would be? (European Central Bank press conference, 3 Nov. 1998)

While Language Log authors were incredulous about the idea that there might be spellchecking dictionaries that contain the word Cupertino and not the unhyphenated cooperation, a reader sent in this screenshot from Microsoft Outlook Express circa 1996 using a Microsoft word list from Houghton Mifflin Company. Sure enough, they'd found the culprit.

Cupertino spellchecker screenshot.

Of course, the Cupertino effect is by no means limited to the word cooperation. The Oxford University Press also points out how the Cupertino Effect can rear its head when foreign words and proper nouns are involved. This led to Reuters referring to Pakistan's Muttahida Quami Movement as the Muttonhead Quail Movement and to Rocky Mountain News naming Leucadia National as La-De-Da National instead. To top that off, Language Log found examples of confusion that led to discussion of copulation which make Cupertino look entirely excusable:

The Western Balkan countries confirmed their intention to further liberalise trade amongst each other. They requested that they be included in the pan-european system of diagonal copulation, which would benefit trade and economic development. (International Organization for Migration, Foreign Ministers Meeting, 22 Nov. 2004)

Of course, the Cupertino Effect is possible every time any spellchecking correction is suggested and the top result is incorrect. As a result, many common misspellings open the door to humorous errors. In a follow-up post, Language Log pointed out that if one leaves the "i" off "identified", Microsoft Word 97 will give exactly one suggestion: denitrified, which describes the state of having nitrogen removed. That has led newspapers to report that, "Police denitrified the youths and seized the paintball guns." Which seems unlikely. Similarly, if you leave out the "c" from acquainted, spellcheckers frequently suggest aquatinted as a substitute. As the Oxford University Press blogs pointed out, folks who want to get aquatinted do not often want to be etched with nitric acid!
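
How a wrong word ends up as the top suggestion depends on how candidates are ranked. The following Python sketch ranks a toy wordlist by edit distance (the number of single-letter insertions, deletions, and substitutions needed to turn one word into another). The wordlist is made up, and the same-first-letter rule is an assumption chosen because it reproduces the denitrified result; it is not a documented description of how Word 97 or any other spellchecker works.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def suggest(word, wordlist, same_first_letter=True):
    """Return the closest wordlist entry, or None if there is no candidate.

    same_first_letter is an assumed heuristic: it trusts that the writer
    got the first letter right and discards every other candidate.
    """
    candidates = [w for w in wordlist
                  if not same_first_letter
                  or w[:1].lower() == word[:1].lower()]
    if not candidates:
        return None
    return min(candidates,
               key=lambda w: edit_distance(word.lower(), w.lower()))


# Toy wordlist (an assumption, far smaller than any real one).
WORDS = ["identified", "denitrified", "dignified", "dentist"]

print(suggest("dentified", WORDS))                           # denitrified
print(suggest("dentified", WORDS, same_first_letter=False))  # identified
```

Under the first-letter assumption the intended word is never even considered, so the nearest remaining neighbor wins no matter how little sense it makes.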

You can find parallels to the Cupertino effect in the Bucklame Effect I discussed previously. Many of the take-away lessons are the same. Spellcheckers make it easier to say some things correctly and place an additional cost on others. The effect on our communication may be subtle but it's real. For example, a spelling mistake might be less forgivable in an era of spellcheckers. Like many communication technologies, spellcheckers are normally invisible in the documents they create; nobody is reminded of spellcheckers by a perfectly spelled document. It is only through errors like the Cupertino effect that spellcheckers are revealed.

Further, these nonsensical suggestions are made only because of the particular way that spellcheckers are built. Microsoft's Natural Language team is apparently working on "contextual" spellcheckers that will be smart enough to guess that you probably don't mean "Cupertino" when you mean cooperation. Of course other errors will remain and new ones will be introduced.

Mojibake Posted Mon, 25 Feb 2008

One of my favorite Japanese words is mojibake (文字化け) which literally translates as "character changing." The term is used to describe an error experienced frequently by computer users who read and write non-Latin scripts -- like Japanese. When readers of non-Latin scripts open a document, email, web page, or some other text, the text is sometimes displayed mangled and unreadable. Japanese speakers refer to the resulting garbage as "mojibake." Here's a great example from the mojibake article in Wikipedia (the image is supposed to be in Japanese and to display the Mojibake article itself).

The UTF-8-encoded Japanese Wikipedia article for mojibake, as
displayed in the Windows-1252 ('ISO-8859-1') encoding.

The problem has been so widespread in Japanese that webpages would often place small images in the top corners of pages that say "mojibake." If a user cannot read the content on the page, the image links to pages which will try to fix the problem for the user.

From a more technical perspective, mojibake might be better described as, "incorrect character decoding," and it hints at a largely hidden part of the way our computers handle text that we usually take for granted.

Of course, computers don't understand Latin or Japanese characters. Instead they operate on bits and bytes -- ones and zeros that represent numbers. In order to input or output text, computer scientists created mappings of letters and characters to numbers represented by bits and bytes. These mappings end up forming a sequence of characters or letters in a particular order often called a character set. To display two letters, a computer might ask for the fifth and tenth characters from a particular set. These character sets are codes; they map numbers (i.e., positions in the list) to letters just as Morse code maps dots and dashes to letters. Letters can be converted to numbers by a computer for storage and then converted back to be redisplayed. The process is called character encoding and decoding and it happens every time a computer inputs or outputs text.
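
The idea of a character set as a numbered list can be sketched in a few lines of Python. The 27-character CHARSET below is a toy invented for illustration, not a real encoding; the last two lines use Python's built-in ord, chr, and str.encode to show the same idea with real code values.

```python
# A toy character set: a character's code is its position in the list.
# This list is made up for illustration and is not a real encoding.
CHARSET = list("abcdefghijklmnopqrstuvwxyz ")


def encode(text):
    """Turn each character into its position in CHARSET."""
    return [CHARSET.index(ch) for ch in text]


def decode(codes):
    """Turn each position back into the character stored there."""
    return "".join(CHARSET[n] for n in codes)


codes = encode("hi mom")
print(codes)            # [7, 8, 26, 12, 14, 12]
print(decode(codes))    # hi mom

# Counting from zero, positions 4 and 9 hold the fifth and tenth characters.
print(decode([4, 9]))   # ej

# Real character sets work the same way, only with bigger tables.
print(ord("A"), chr(65))            # 65 A
print(list("Hi".encode("ascii")))   # [72, 105]
```

Nothing about the numbers is meaningful on its own: the same list of codes read against a differently ordered table would spell something else entirely.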

While there may be some natural orderings, (e.g., A through Z), there are many ways to encode or map a set of letters and numbers (e.g., Should one put numbers before letters in the set? Should capital and lowercase letters be interspersed?). The most important computer character encoding is ASCII, which was first defined in 1963 and is the de facto standard for almost all modern computers. It defines 128 characters including the letters and numbers used in English. But ASCII says nothing about how one should encode accented Latin characters, scientific symbols, or the characters in any other scripts -- they are simply not in the list of letters and numbers ASCII provides and no mapping is available. Users of ASCII can only use the characters in the set.

Left with computers unable to represent their languages, many non-English speakers have added to and improved on ASCII to create new encodings -- different mappings of bits and bytes to different sets of letters. Japanese text can frequently be found in encodings with obscure technical names like EUC-JP, ISO-2022-JP, Shift_JIS, and UTF-8. It's not important to understand how they differ -- although I'll come back to this in a future blog post. It's merely important to realize that each of these represents a different way to map a set of bits and bytes into letters, numbers, and punctuation.

For example, the set of bytes that says "文字化け" (the word for "mojibake" in Japanese) encoded in UTF-8 would show up as "��絖�����" in EUC-JP, "������������" in ISO-2022-JP, and "æ–‡å­—åŒ–ã‘" in ISO-8859-1. Each of the strings above is a valid decoding of identical data -- the same ones and zeros. But of course, only the first is correct and comprehensible by a human. Although the others are displaying the same data, the data is unreadable by humans because it is decoded according to a different character set's mapping! This is mojibake.
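
The effect is easy to reproduce. The sketch below uses only Python's built-in str.encode and bytes.decode with the standard codec names "utf-8", "iso-8859-1", and "euc_jp"; the variable names are mine. It prints the garbled string with ascii() because several of the resulting characters are invisible control codes.

```python
text = "文字化け"
data = text.encode("utf-8")     # the bytes that are actually stored or sent

print(len(text), len(data))     # 4 characters become 12 bytes
print(data.hex())               # e69687e5ad97e58c96e38191

# Decoding the same bytes with the wrong character set gives mojibake.
# ISO-8859-1 assigns a character to every byte value, so this never fails.
garbled = data.decode("iso-8859-1")
print(len(garbled))             # 12: one wrong character per byte
print(ascii(garbled))           # escapes make the invisible ones visible

# EUC-JP rejects some of these byte sequences, so ask Python to substitute
# the replacement character instead of raising an error.
as_euc_jp = data.decode("euc_jp", errors="replace")

# Nothing was lost along the way: undoing the wrong decoding and applying
# the right one recovers the original text.
recovered = garbled.encode("iso-8859-1").decode("utf-8")
print(recovered == text)        # True
```

The last step is why mojibake is often repairable: as long as the wrong decoding did not discard any bytes, the original data is still there underneath.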

For every scrap of text that a computer shows to or takes from a human, the computer needs to keep track of the encoding the data is in. Every web browser must know the encoding of the page it is receiving and the encoding that it will be displayed to the user in. If the data sent is in a different format than the one that will be displayed, the computer must convert the text from one encoding to another. Although we don't notice it, encoding metadata is passed along with almost every webpage we read and every email we send. Data is being converted between encodings millions of times each day. We don't even notice that text is encoded -- until it doesn't decode properly.

Mojibake makes this usually invisible process extremely visible and provides an opportunity to understand that our text is coded -- and how. Encoding introduces important limitations -- it limits our expression to the things that are listed in pre-defined character sets. Until the creation of an encoding called Unicode, one couldn't mix Japanese and Thai in the same document; while there were encodings for both, there were no character sets that encoded the letters for both. Apparently, in Chinese, there are older more obscure characters that no computers can encode yet. Computer users simply can't write these letters on computers. I've seen computer users in Ethiopia emailing each other in English because support for Amharic encodings at the time was so poor and uneven! All of these limits, and many more, are part and parcel of our character encoding systems. They become visible only when the usually invisible process of character encoding is thrust into view. Mojibake provides one such opportunity.

Bad Signs Posted Wed, 13 Feb 2008

I caught another revealing crash screen over on The Daily WTF.

Travelex Crash Screen

Although the folks at WTF did not draw attention to the fact, a close examination revealed that the dialog box on the crashed screen is rotated 90 degrees.

If you step back and look at the sign, it makes sense. The folks at Travelex wanted a tall poster-sized electronic bulletin board to display currency information and promotions. Unfortunately, long screens are rare and LCD screens of unusual sizes are extremely expensive. Travelex appears to have done the very sensible thing: they took a readily available and low-cost wide-screen LCD television, turned it on its side, and hooked it up to a computer.

Of course, screens have tops and bottoms. To display correctly on a sideways screen, a computer needs to be configured to display information sideways -- a non-trivial task on many systems. If you look at the Windows "Start" menu and task-bar along the right side (i.e., bottom) of the screen and the shape of the dialog, it seems that Travelex simply didn't bother. They used the screen to display images, or sequences of images, and found it easy enough to simply rotate each of the images to be displayed 90 degrees as well. They simply showed a full-screen slide-show of sideways images on their sideways screen. And no user ever noticed until the system crashed.

It's a neat trick that many users might find useful but most would not think to do. Although they might after seeing this crash!

A close-up of the screen reveals even more.

Travelex Crash Screen Closeup

Apparently, the dialog has popped up because the computer running the sign has a virus! Viruses are usually acquired through user interaction with a computer (e.g., opening a bad attachment) or through the Internet. It seems likely that the computer is plugged into the Internet -- perhaps the slide-show is updated automatically -- or that the image is being displayed from a computer used to do other things. In any case, it's a worrying "sign" from a financial services company.

Picture of a Process Posted Thu, 07 Feb 2008

I enjoyed seeing this image in an article in The Register.

finger shown in Google book

The picture is a screen shot from Google Books viewing a page from an 1855 issue of The Gentleman's Magazine. The latex-clad fingers belong to one of the people whose job it is to scan the books for Google's book project.

Information technologies often hide the processes that bring us the information we interact with. Revealing errors give a picture of what these processes look like or involve. In an extremely literal way, this error shows us just such a picture.

We can learn quite a lot from this image. For example, since the fingers are not pressed against glass, we might conclude that Google is not using a traditional flatbed scanner. Instead, it is likely that they are using a system similar to the one that the Internet Archive has built that is designed specifically for scanning books.

But perhaps the most important thing that this error reveals is something we know, but often take for granted -- the human involved in the process.

The decision on where to automate a process, and where to leave it up to a human, is sometimes a very complicated one. Human involvement in a process can prevent and catch many types of errors but can cause new ones. Both choices introduce risks and benefits. For example, an automated bank transaction system may allow a human to catch obvious errors and to detect suspicious use that a computer without "common sense" might miss. On the other hand, a human banker might commit fraud to try to enrich themselves with others' money -- something a machine would never do.

In our interaction with technological systems, we rarely reflect on the fact, and the ways, that the presence of humans in these areas is important to determining the behavior, quality, reliability, and the nature and degree of trust that we have in a technology.

In our interactions with complex processes through simple and abstract user interfaces, it is often only through errors -- distinctly human errors, if not usually quite as clearly human as this one -- that information workers' important presence is revealed.

Wordlists and Profanity Posted Tue, 29 Jan 2008

Revealing errors are a way of looking at the fact that a technology's failure to deliver a message can tell us a lot. In this way, there's an intriguing analogy one can draw between revealing errors and censorship.

Censorship doesn't usually keep people from saying or writing something -- it just keeps them from communicating it. When censorship is effective, however, an audience doesn't realize that any speech ever occurred or that any censorship has happened -- they simply don't know something and, more importantly perhaps, don't know that they don't know. As with invisible technologies, a censored community might never realize their information and interaction with the world is being shaped by someone else's design.

I once was in a cafe with a large SMS/text message "board." Patrons could send an SMS to a particular number and it would be displayed on a flat-panel television mounted on the wall that everyone in the restaurant could read. I tested to see if there was a content filter and, sure enough, any message that contained a four-letter word was silently dropped; it simply never showed up on the screen. As the censored party, the failure of my message to show up on the board revealed a censor. Further testing and my success in posting messages with creatively spelled profanity, numbers instead of letters, and the construction of crude ASCII drawings revealed the censor as a piece of software with a blacklist of terms; no human charged with blocking profanity would have allowed "sh1t" through. Through the whole process, the other patrons in the cafe remained none-the-wiser; they never realized that the blocked messages had been sent.
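
A filter that behaves the way this one did can be sketched in a few lines of Python. Everything here is a guess at the general shape of such software: the BLACKLIST contents, the function names, and the word-splitting rule are all assumptions, since I never saw the cafe's actual system.

```python
import re

# Hypothetical blacklist; the contents of the cafe's real list are unknown.
BLACKLIST = {"shit", "damn"}


def should_display(message):
    """Return False if any whole word of the message is on the blacklist."""
    words = re.findall(r"[a-z0-9]+", message.lower())
    return not any(word in BLACKLIST for word in words)


def post(message, board):
    """Add the message to the board, or drop it without telling anyone."""
    if should_display(message):
        board.append(message)
    # Blocked messages fall through silently: no error reaches the sender.


board = []
for text in ["hello cafe", "this coffee is shit", "this coffee is sh1t"]:
    post(text, board)

print(board)  # ['hello cafe', 'this coffee is sh1t']
```

An exact-match list like this one has no notion of what a message means, which is why a single swapped character is enough to get past it.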

This desire to create barriers to profanity is widespread in communication technologies. For example, consider the number of times you have been prompted by a spellchecker to review and "fix" a swear word. Offensive as they may be, "fuck" and "shit" are correctly spelled English words. It seems highly unlikely that they were excluded from the spell-checker's wordlist because the compiler forgot them. They were excluded, quite simply, because they were deemed obscene or inappropriate. While intentional, these words' omission results in the false identification of all cursing as misspelling -- errors we've grown so accustomed to that they hardly seem like errors at all!

Now, unlike a book or website which more impressionable children might read, nobody can be expected to find a four-letter word while reading their spell-checking wordlist. These words are not included simply because our spell-checker makers think we shouldn't use them. The result is that every user who writes a four-letter-word must add that word, by hand, to their "personal" dictionary -- they must take explicit credit for using the term. The hope, perhaps, is that we'll be reminded to use a different, more acceptable word. Every time this happens, the paternalism of the wordlist compiler is revealed.

Connecting back to my recent post on predictive text, here's a very funny video of Armstrong and Miller lampooning the omission of four-letter words from predictive text databases that make it more difficult to input profanity onto mobile phones (e.g., are you sure you did not mean "shiv" and "ducking"?). You can also download the video in OGG Theora if you have trouble watching it in Flash.

There's a great line in there: "Our job ... is to offer people not the words that they do use but the words that they should use."

Most of the errors described on this blog reveal the design of technical systems. While the errors in this case do not stem from technical decisions, they reveal a set of equally human choices. Perhaps more interestingly, the errors themselves are fully intended! The goal of swear-word omission is, in part, the moment of reflection that a revealing error introduces. In that moment, the censors hope, we might reflect on the "problems" in our coarse choice of language and consider communicating differently.

These technologies don't keep us from swearing any more than other technology designers can control our actions -- we usually have the option of using or designing different technologies. But every technology offers affordances that make certain things easier and others more difficult. This may or may not be intended but it's always important. Through errors like those made by our prudish spell-checker and predictive text input systems, some of these affordances, and their sources, are revealed.

Bucklame and Predictive Text Input Posted Sun, 27 Jan 2008

I recently heard that "Bucklame," apparently a nickname for New Zealand's largest city Auckland, has its source in a technical error that is dear to my heart. It seems that it stems from the fact that many mobile phones' predictive text input software will suggest the term "Bucklame" if a user tries to input "Auckland" -- the latter of which was apparently not in its list of valid words.

In my initial article on revealing errors, I wrote a little about the technology at the source of this error: Tegic's (now Nuance's) T9 predictive text technology, which is a common way for users of mobile phones with a normal keypad (9-12 keys) to quickly type text messages with 50+ letters, numbers and symbols. Here is how I described the system:

Tegic’s popular T9 software allows users to type in words by pressing the number associated with each letter of each word in quick succession. T9 uses a database to pick the most likely word that maps to that sequence of numbers. While the system allows for quick input of words and phrases on a phone keypad, it also allows for the creation of new types of errors. A user trying to type me might accidentally write of because both words are mapped to the combination of 6 and 3 and because of is a more common word in English. T9 might confuse snow and pony while no human, and no other input method, would.

Mappings of number-sequences to words are based on a database that offers words in order of relative frequency. These word frequency lists are based on a corpus of text in the target language pre-programmed into the phone. These corpora, at least initially, were not based on the words people use to communicate using SMS but on a more readily available data source (e.g., emails, memos, or fiction). This leads to problems common to many systems built on shaky probabilistic models: what is likely in one context may not be as likely in another. For example, while "but" is an extremely common English word, it might be much less common in SMS where more complex sentence structures are often eschewed due to economy of space (160 character messages) and laborious data-entry. The word "pony" might be more common than "snow" in some situations but it's certainly not in my usage!

Of course, proper nouns, of which there are many, are often excluded from these systems as well. Since the T9 system does not "know" the word "Auckland", the nonsensical compound-word "bucklame" seems to be an appropriate mapping for the same number-sequence. Apparently, people liked the error so much they kept using it and, with time perhaps, it stops being an error at all.
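
The collisions described above can be checked with a small Python sketch. The keypad layout is the standard one printed on phone keys; everything else is an assumption made for illustration: the word counts in FREQUENCIES are invented, and the fallback that glues two known words together is only my guess at how a compound like "bucklame" could arise, not a description of Tegic's software.

```python
from collections import defaultdict

# Standard phone keypad letters.
KEYS = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
        "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
LETTER_TO_DIGIT = {ch: d for d, letters in KEYS.items() for ch in letters}


def to_digits(word):
    """Return the key presses needed to type word, one press per letter."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.lower())


def build_index(frequencies):
    """Map each digit sequence to its words, most frequent first."""
    index = defaultdict(list)
    for word in frequencies:
        index[to_digits(word)].append(word)
    for words in index.values():
        words.sort(key=lambda w: -frequencies[w])
    return index


# Invented counts standing in for a corpus; "auckland" is deliberately absent.
FREQUENCIES = {"of": 900, "me": 300, "pony": 20, "snow": 15,
               "buck": 5, "lame": 5}
INDEX = build_index(FREQUENCIES)


def predict(digits):
    """Return the most frequent word for digits, or a two-word compound."""
    if digits in INDEX:
        return INDEX[digits][0]
    # Assumed fallback: split the sequence into two known words.
    for cut in range(1, len(digits)):
        head, tail = digits[:cut], digits[cut:]
        if head in INDEX and tail in INDEX:
            return INDEX[head][0] + INDEX[tail][0]
    return None


print(to_digits("me"), to_digits("of"))      # 63 63
print(predict(to_digits("me")))              # of
print(predict(to_digits("snow")))            # pony
print(to_digits("auckland"))                 # 28255263
print(predict(to_digits("auckland")))        # bucklame
```

Change the counts in FREQUENCIES and the "errors" change with them, which is the point: the suggestions say as much about the corpus as about the person typing.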

As users move to systems with keyboards like Blackberries, Treos, Sidekicks, and iPhones (which use a dual-mode system) these errors become impossible. As a result, the presence of these types of errors (e.g., a swapped "me" and "of") can tell communicators quite a lot about the type of device they are communicating with.

Creating Kanji Posted Tue, 15 Jan 2008

Errors reveal characteristics of the languages we use and the technologies we use to communicate them -- everything from scripts and letter forms (which while very fundamental to written communication are technologies nonetheless) to the computer software we use to create and communicate text.

I've spent the last few weeks in Japan. In the process, I've learned a bit about the Japanese language; no small part of this through errors. Here's one error that taught me quite a lot. The sentence is shown in Japanese and then followed by a translation into English:

今年から貝が胃に棲み始めました。
This year, a clam started living in my stomach.

Needless to say perhaps, this was an error. It was supposed to say:

今年から海外に住み始めました。
This year, I started living abroad.

When the sentences are translated into romaji (i.e., Japanese written in a Roman script) the similarity becomes much more clear to readers that don't understand Japanese:

Kotoshikara kaiga ini sumihajimemashita.
Kotoshikara kaigaini sumihajimemashita.

Kotoshikara means "since this year." Sumihajimemashita means, "has started living." The word kaigaini means "abroad" or "overseas." Kaiga ini (two words) means "clam in stomach." When written phonetically in romaji, the only difference in the two sentences lies in the introduction of a word-break in the middle of "kaigaini." Written out in Japanese, the sentences are quite different; even without understanding, one can see that more than a few of the characters in the sentences differ.

In English, word spacing plays an essential role in making written language understandable. Japanese, however, is normally written without spaces between words.

This isn't a problem in Japanese because the Japanese script uses a combination of logograms -- called kanji -- and phonetic characters -- called hiragana and katakana or simply kana -- to delimit words and to describe structure. The result, to Japanese readers, is unambiguous. Phonetically and without spaces, the two sentences are identical in either kana or romaji:

ことしからかいがいにすみはじめました。
Kotoshikarakaigainisumihajimemashita.

In purely phonetic form, the sentence is ambiguous. Using kanji, as shown in the opening examples, this ambiguity is removed. While phonetically identical, "kaigaini" (abroad) and "kaiga ini" (clam in stomach) are very different when kanji is used; they are written "海外に" and "貝が胃に" respectively and are not easily confusable by Japanese readers.

This error, and many others like it, stems from the way that Japanese text is input into computers. Because there are more than 4,000 kanji in frequent use in Japan, there simply are not enough keys on a keyboard to input kanji directly. Instead, text in Japanese is input into computers phonetically (i.e., in kana) without spaces or explicit word boundaries. Once the kana is input, users then transform the phonetic representation of their sentence or phrase into a version using the appropriate kanji logograms. To do so, Japanese computer users employ special software that contains a database of mappings of kana to kanji. In the process, this software makes educated guesses about where word boundaries are. Usually, computers guess correctly. When computers get it wrong, users need to go back and tweak the conversion by hand or select from other options in a list. Sometimes, when users are in a rush, they use an incorrect kana to kanji conversion. It would be obvious to any Japanese computer user that this is precisely what happened in the sentence above.
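
The ambiguity the software has to resolve can be shown with a toy converter in Python. This is a sketch under heavy assumptions: the five-entry DICTIONARY and the conversions function are invented for illustration, and real input software relies on far larger dictionaries plus frequency and grammar information to pick one candidate.

```python
# Toy kana-to-kanji dictionary (an assumption made for this example).
DICTIONARY = {
    "かいがい": "海外",  # kaigai: abroad
    "かい": "貝",        # kai: clam
    "い": "胃",          # i: stomach
    "が": "が",          # ga: particle, stays in kana
    "に": "に",          # ni: particle, stays in kana
}


def conversions(kana):
    """Return every way to split kana into dictionary words, converted."""
    if not kana:
        return [""]  # one way to convert nothing: the empty string
    results = []
    for end in range(1, len(kana) + 1):
        prefix = kana[:end]
        if prefix in DICTIONARY:
            # Guess a word boundary here, then convert whatever is left.
            for rest in conversions(kana[end:]):
                results.append(DICTIONARY[prefix] + rest)
    return results


for candidate in conversions("かいがいに"):
    print(candidate)
```

Both "海外に" and "貝が胃に" come out as candidates for the same kana. The program has no way to know which one the writer meant; it can only rank them and hope the user checks.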

This type of error has few parallels in English but is extremely common in Japanese writing. The effects, like this one, are often confusing or hilarious. For a Japanese reader, this error reveals the kana to kanji mapping system and the computer software that implements it -- nobody would make such a mistake with a pen and paper. For a person less familiar with Japanese, the error reveals a number of technical particularities about the Japanese writing system and, in the process, about the ways in which Japanese differs from other languages they might speak.

Precision Expiration Posted Sat, 22 Dec 2007

Here is a photograph (and a closeup) of a bag of pretzels I was given on a cross-country plane trip today.

Bag of Snyder's Pretzels Big and Closeup

When I first saw "May 11 DC20 2008 00:12," I thought, "Wow! That's an extremely precise expiration date!" In transit over several time zones I then thought, what time zone do they mean?

Of course, expiration dates are ballpark figures that mark thresholds in the gradual process of product degradation. They are arbitrary, of course. It's not as if these pretzels will be great on May 10th and inedible two days later. Unless the pretzels have been set to self-destruct, the addition of an expiration hour and an expiration minute seems, well, unnecessary.

What's happened here is a design error. The label is, in fact, two different types of data printed in two separate columns. "May 11 2008" is the expiration date. "DC20 00:12" is the number of the machine or production line that produced the bag and the time at which the pretzels were made. Taken together, the information can be used by the producer, Snyder's of Hanover, for quality control purposes to find out what machines, workers, and batches of supplies produced a particular bag of pretzels. In all likelihood, Snyder's prints these labels with a system that, for cost reasons, tries to minimize the amount of printed area on each bag.

For Snyder's employees familiar with the system, the labels are completely clear. But those of us not familiar with the system are left confused. Error can be thought of as the chasm between user expectations and technical interaction. Like most of the errors I discuss here, this flub represents failed communication and reveals the mediating technologies.

Cross Site Descripting Posted Wed, 28 Nov 2007

Blogger Jordan Wiens recently noticed a funny thing about the Apple website. When one tries to search for "applescript" (Apple's scripting and automation product) on Apple's website, they end up with this search result:

Applescript search results from Apple.com

Until the issue is fixed, you can see for yourself by navigating to http://www.apple.com/search/?q=applescript.

On the search result page, the Apple search software seems to change the term "applescript" into "apple." A search for the term "apple" on the Apple website is, as one might imagine, not a particularly useful way to find information about Applescript. To most users, this error is confounding. To a trained eye, it reveals an overzealous security system attempting to prevent what's called cross-site scripting or XSS -- a way that spammers, phishers, and nefarious system-crackers can sneakily work around privacy and security systems in web browsers by exploiting two features of modern web browsers.

First, through the use of a programming language called Javascript, many web pages run small computer programs inside users' browsers. These Javascript programs allow for applications that are more responsive than would have been possible before (think Google Maps for a good example). Running random programs is risky, of course. To protect users and their privacy, web browsers limit Javascript programs in several ways. One common technique is to limit a Javascript program's access to information from the site from which the program originated. This security system is designed to bar one website's programs from accessing and relaying sensitive information, like login information or credit card numbers, from another website.

Second, a large number of applications allow input from users that is subsequently displayed on web pages. This can come in the form of edits and additions to Wikipedia pages, comments on forums, articles, or blogs, or even the fact that when you run a web search, the search terms are displayed back to you at the top of your page.

A security vulnerability, it turns out, lies in the combination of the two features. This vulnerability, XSS, happens when a nefarious user embeds small Javascript programs in input (e.g., a comment) which is run each time a page is subsequently viewed. Masquerading to the browser as a legitimate script created by the website creator, these programs can access sensitive information from the website stored on the user's computer (e.g., login information) and then send this information to the author of the script without the violated user's permission or knowledge.
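The restriction that XSS sidesteps can be sketched as a comparison of origins. What follows is an illustrative Python sketch, not code from any browser: I am assuming the usual definition of an origin as the combination of scheme, host, and port, and the example hostnames are made up. The point is that an injected script is served as part of the victim site's own page, so a check like this one sees nothing wrong with it.

```python
from urllib.parse import urlsplit

# Default ports, so that https://site/ and https://site:443/ compare equal.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that identifies a URL's origin."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def same_origin(script_page_url, data_url):
    """True if a script running on one page may read data from the other URL."""
    return origin(script_page_url) == origin(data_url)


# A script embedded in a comment on bank.example runs with that page's origin,
# so it passes the check that a script hosted on evil.example would fail.
injected_ok = same_origin("https://bank.example/comments", "https://bank.example/account")
outsider_ok = same_origin("https://evil.example/", "https://bank.example/account")
```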

When an attacker executes an XSS attack, they do so by trying to include Javascript in input that will be displayed to the user. This usually comes in the form of:

    <script>some code to send private information</script>

In HTML, the "<script>" and "</script>" tags signify to the web browser that the text between is a program to be run.

XSS has become a large problem. To combat and prevent it, web developers take great care to protect their users and their applications from attacks by blocking, removing, or disabling attempts to include programs in user input. One frequently employed method of doing so is to simply remove the "<script>" tags that cause programs to be run. Without the tags, malicious code may remain, but will never be executed on users' computers.

With this knowledge of XSS we can begin to understand the puzzling behavior of Apple's website. By trying several other searches, we can confirm that Apple's search engine is, in fact, removing all mentions of the term "script" from input to the site. The system is almost certainly designed to block XSS. While it is likely to succeed in doing so, the side effects, in the case of users searching for Applescript, are extremely inconvenient.
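A minimal Python sketch reproduces the side effect. I am guessing at the behavior from the outside: the function below is a stand-in for whatever Apple's search software really does, not a description of it. For comparison, the sketch also shows escaping, another widely used way of disabling scripts in displayed input, which leaves ordinary words alone.

```python
import html
import re


def strip_script(text):
    """Assumed filter: delete every occurrence of "script", in any case."""
    return re.sub(r"script", "", text, flags=re.IGNORECASE)


def escape_for_display(text):
    """Alternative: turn <, >, and & into entities so tags display as text."""
    return html.escape(text)


stripped_query = strip_script("applescript")
stripped_attack = strip_script("<script>alert(1)</script>")
escaped_query = escape_for_display("applescript")
escaped_attack = escape_for_display("<script>alert(1)</script>")
```

Both approaches keep the browser from running the attack as a program, but only the first one mangles an innocent search for Applescript.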

Through the error, Apple reveals their overzealous system designed to prevent XSS. Those who dig deeper to understand the source of this initially baffling behavior can gain new respect for the implicit trust that our browsers give to code on the websites we visit and the ways in which this trust can be abused.

In all likelihood, we have all been the victims of XSS attacks as users -- although most of us have been lucky enough to avoid divulging sensitive information in the process. Apple's error represents "collateral damage" in a war fought between crackers, spammers, and phishers on one side and web application developers on the other. While we are rarely aware of it, this battle affects the way our web applications are designed and the features they do, and do not, include. We are, indirectly, affected by XSS even when we're not looking for information on Applescript. By revealing one anti-XSS security system, Apple's misstep points to that fact.

Thunderbird and the Nature of Spam Posted Tue, 20 Nov 2007

I found this beautiful and simple example of a revealing error featured in the fantastic (and very funny) Error'd series on Worse Than Failure:

Thunderbird showing its own welcome message as spam.

My guess is that before most users start the Mozilla Thunderbird email client for the first time, they don't know that the software has a spam detection feature. That said, when the welcome message that automatically shows up in the inbox of every new Thunderbird user is prefixed by a notice that the message in question might be "junk," users' ignorance on the matter will quickly be put to rest!

Of course, much more than the simple existence of the spam-flagging system is revealed by this error. With a little reflection, we can infer some of the criteria that Thunderbird must be using to sort spam or junk from legitimate email. Most mail systems, including Thunderbird, use a variety of methods which, in aggregate, are used to determine the likelihood of a message being spam. Thunderbird's welcome message is not addressed directly to the user in question and it makes extensive use of rich-text HTML and images -- both characteristics common to spam.

Central to most modern spam-checkers is a statistical analysis of words used in the content of the email. Since spammers are trying to communicate a message, a prevalence of certain words and an absence of others is usually sufficient to sort out the junk. Sure enough, the Thunderbird welcome message is written using rather impersonal and marketing-speak terms that would be less likely in personal email (e.g., offering "product information").
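To give a feel for this kind of word statistics, here is a toy sketch in Python. It is not Thunderbird's algorithm: the training messages are invented, the scoring is a simple naive-Bayes-style log-odds with add-one smoothing, and it only looks at the words that are present in a message, ignoring the ones that are absent.

```python
import math
from collections import Counter


def train(messages):
    """Count, for each word, how many of the messages contain it."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts, len(messages)


def spam_score(text, spam_model, ham_model):
    """Log-odds that `text` is spam; positive leans spam, negative leans legitimate."""
    spam_counts, n_spam = spam_model
    ham_counts, n_ham = ham_model
    score = math.log(n_spam / n_ham)  # prior odds from the training set sizes
    for word in set(text.lower().split()):
        p_spam = (spam_counts[word] + 1) / (n_spam + 2)  # add-one smoothing
        p_ham = (ham_counts[word] + 1) / (n_ham + 2)
        score += math.log(p_spam / p_ham)
    return score


# Invented training data, purely for illustration.
spam_model = train(["free product information offer",
                    "special offer on product upgrades"])
ham_model = train(["lunch tomorrow with mom",
                   "notes from the meeting tomorrow"])

welcome = spam_score("welcome product information", spam_model, ham_model)
personal = spam_score("lunch meeting tomorrow", spam_model, ham_model)
```

With this data the marketing-flavored message scores above zero and the personal one below it, which is all a filter needs to make the call.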

From the perspective of the Thunderbird developers, the flagging of this message as spam seems to be in error. From the perspective of the user though, it is not quite as clear. The Thunderbird message is both unsolicited and commercial in nature -- essentially the definition of spam. In the "looks like a duck" sense, it uses words that make it "read" like spam.

While this simple error can teach Thunderbird users about the existence and the nature of their spam-checker, it might also teach the folks responsible for the Thunderbird welcome message something about the way their messages might seem to their users.

Identity Crisis Posted Fri, 09 Nov 2007

This error was revealed and written up by Karl Fogel.

Yesterday I received email from a hotel, confirming a reservation for a room. But it wasn't meant for me; it was meant for "Kathy Fogel" (whom I've never met), and was sent to "k.fogel@gmail.com".

Now, I do have the account "kfogel@gmail.com", but I'd never received email for "k.fogel" before. As I'd always thought "." was a significant character in email addresses, I didn't see how I could have gotten this mail. It turns out, though, that Google ignores "." when it's in the username portion of a gmail address. My friend Brian Fitzpatrick knew this already, and pointed me to Google's explanation. (I learned later that others have been surprised by this behavior too.)

So the error revealed a feature -- at least, I'm fairly sure Google would consider it a feature, although the exact motivation for it is still not clear to me. It might be a technical requirement caused by merging several legacy user databases, or it might simply be to prevent confusion among addresses that only differ by dots.
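The behavior described above can be sketched in a few lines of Python. This is only an illustration of the rule as I understand it (dots in the username part are ignored for gmail.com addresses), not Google's actual code, and the function name is made up.

```python
def canonical_gmail(address):
    """Drop dots from the username of a gmail.com address; leave others alone."""
    local, _, domain = address.rpartition("@")
    domain = domain.lower()
    if domain == "gmail.com":
        local = local.replace(".", "")
    return f"{local}@{domain}"


# Both spellings land in the same mailbox...
same_mailbox = canonical_gmail("k.fogel@gmail.com") == canonical_gmail("kfogel@gmail.com")
# ...but at a provider where dots are significant, the address is left untouched.
elsewhere = canonical_gmail("k.fogel@yahoo.com")
```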

Anyway, I called the hotel, and eventually managed to make them understand that I had no idea who Kathy Fogel was, and that I'd accidentally gotten an email intended for her. They said they'd resend, and of course I said "Wait, no, it'll just come to me again!" But they swore they had a different email address on file for her, and indeed, I haven't gotten a second email.

Which raises another question: how did they send the mail to "k.fogel@gmail.com" in the first place? Clearly, Kathy Fogel cannot have that address, because Google will not allow any other "dot variants" of an address to be registered after the first. (Besides, if she did have that address, we'd be getting each other's mail all the time, and we're not.) It's also unlikely that she mistakenly gave them that address herself, since they already had another address in place by the time I called.

A computer wouldn't substitute domain names in an email address like that. The only thing I can think of is that somehow, humans are, at least in some cases, intimately involved in sending out confirmation emails from DoubleTree hotels. I say "intimately" because this was no mere cut-and-paste mistake. Someone had to transcribe an email address by hand, and accidentally put "gmail.com" where the original said "yahoo.com" or "aol.com" or whatever.

I hope Kathy has a nice trip.

Posted by mako | Permalink | , | View/Add Comments: 9
Computer Generated Crossword Puzzles Posted Wed, 31 Oct 2007

There are two free daily newspapers in Boston: the Boston Metro and the Boston Now. Both run crossword puzzles. The Now runs a puzzle edited by Stanley Newman. The Metro's puzzle is unattributed. When my friend Seth Schoen was in town for several days, he did several crossword puzzles in the Metro. He pointed out to me that a clue in the crossword was repeated on two consecutive days. The crosswords in the Metro, he concluded, were computer generated.

I picked up the Metro each day for several weeks and, sure enough, there was a large amount of overlap in answers. "ALSO" and "NIL" were answers three times in two weeks. More suggestive, however, were the clues. In all three instances of each repeated answer, the clues were the same. The clue for "ALSO" was always, "Part of a.k.a.," while the clue for "NIL" was "Zilch." Capitalization and punctuation, even for the uncapitalized "a.k.a.", was consistent. There was some variation in clues: I found some answers with different clues on different days. Even so, the high degree of consistency was undeniable.

Unassisted by a computer, no human editor would use the same clue for puzzles two days in a row. Frequent reuse of clues makes puzzles too easy for regular players and slight variation in clues is easy for a human puzzle editor to do. But even if the puzzles had been written in a different order than they were run in the paper, it is unlikely that a puzzle maker would repeatedly have come up with the same clues. The chance of capitalization, phrasing, and style all lining up to produce identical clue text is smaller still. Humans simply aren't that consistent. Computers are. Through the reuse of the clues, a computerized provenance is revealed.
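The check I did by hand over those weeks is easy to sketch in Python. The data below is hypothetical: the "ALSO" and "NIL" clues are the ones I saw, but the "ERA" entries and the way the puzzles are grouped into days are invented for illustration.

```python
from collections import defaultdict


def repeated_clues(puzzles):
    """Return answers that recur across puzzles with byte-for-byte identical clues."""
    clues_seen = defaultdict(list)
    for puzzle in puzzles:
        for answer, clue in puzzle.items():
            clues_seen[answer].append(clue)
    return {
        answer: clues[0]
        for answer, clues in clues_seen.items()
        if len(clues) > 1 and len(set(clues)) == 1
    }


# Hypothetical sample: one answer-to-clue mapping per day's puzzle.
puzzles = [
    {"ALSO": "Part of a.k.a.", "NIL": "Zilch", "ERA": "Historic period"},
    {"ALSO": "Part of a.k.a.", "ERA": "Pitcher's stat"},
    {"NIL": "Zilch", "ALSO": "Part of a.k.a."},
]

suspicious = repeated_clues(puzzles)
```

An answer like the invented "ERA", clued differently each time, is what one would expect from a human editor; the exact repeats are the giveaway.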

Perhaps a little ignorant, I'd always assumed that crosswords were human generated. In fact, computer generated crosswords are widespread. There have been published papers on computer generation of crosswords since the 1970s and a New York Times article on the subject was published in 1996 when the practice was beginning to take off. Computers are able to generate puzzles quickly and in quantity and, as a result, are in common use in magazines and on websites.

There's resistance, however, from both human crossword editors and from solvers who find computer generated puzzles unsatisfying. Great crossword puzzles, they argue, showcase wit and creativity with language; answers are often tied together by themes and wordplay. Computers excel at taking a database of answers and creating grids that match up correctly; they are much faster and more accurate than humans. But as the error that revealed the computer to my friend Seth illustrates, computers are less adept at varying when or how they employ answers and clues in puzzles.

Quoted in an article in Tulsa World, Mark Lagasse, senior executive editor with Dell Magazines, justified his magazine's choice to fund the more laborious human methods of crossword production saying, "with themes and the better, larger puzzles, it's best to have a constructor working them out and filling in the diagrams. A lot of the words are a bit more dry and boring when done with computers." Ultimately, he concludes, computer-generated puzzles simply are not as entertaining as those made by humans.

I did the crossword puzzles in both the Now and the Metro for a couple weeks and I agree with Lagasse. The human generated puzzles are less repetitive, more interesting, and ultimately more satisfying. The computer generated puzzles almost never use word play and have no thematic connections between answers or clues. Of course, I did both Metro and Now puzzles in the past and I always preferred the Now puzzles and found them more fun. But I would have been hard-pressed to justify my feelings. It was not until Seth pointed out the repeated clues, an error, that I was able to understand why I felt the way I did.

Only Yesterday Posted Thu, 25 Oct 2007

I only recently stumbled across this old revealing error in the wonderful Doh, The Humanity! weblog:

It may seem like only yesterday
(Wednesday, 26 July) when...

In the days of newspapers and broadcast media, a reader was only likely to read a news article on the day it was published. If the publication were weekly or monthly, it would be reasonable to expect that readers got to it within the week or month. While libraries and others might keep archived versions, it was always clear to readers of archived material that their material -- and any relative dates mentioned therein -- was out of date.

Even today, news is still written primarily to be consumed immediately and the vast majority of readers of an article will read it while it is fresh. But websites have made archived material live on for months and years. While this is generally a good thing, it creates all sorts of problems for people who use relative dates in articles. The point of reference -- today -- becomes unstable. As a result, if an entertainment reporter describes a show as happening "next Tuesday," it might appear to refer to any number of incorrect Tuesdays depending on when someone has stumbled across the archived version.

News companies have responded by converting relative dates into absolute ones. No doubt, this was often done by editors but today is also done by computer programs. These programs parse each news story looking for relative dates. When they find one, they compute the corresponding absolute date from the relative one, and add it into the text of the article in a parenthetical aside.
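A naive version of such a program is easy to sketch in Python. Everything here is my own guess at the approach, not any news company's actual code: the function name is made up, only three relative words are handled, and the 2006 publication date is an assumption chosen so that the day before falls on Wednesday, 26 July, as in the quoted line.

```python
import re
from datetime import date, timedelta

# Relative words this sketch understands, as day offsets from publication.
OFFSETS = {"yesterday": -1, "today": 0, "tomorrow": 1}
PATTERN = re.compile(r"\b(yesterday|today|tomorrow)\b", re.IGNORECASE)


def absolutify(text, published):
    """Append an absolute date in parentheses after each relative date word."""
    def annotate(match):
        day = published + timedelta(days=OFFSETS[match.group(1).lower()])
        return f"{match.group(0)} ({day:%A}, {day.day} {day:%B})"
    return PATTERN.sub(annotate, text)


# Assumed publication date: Thursday, 27 July 2006.
published = date(2006, 7, 27)
literal = absolutify("The mayor spoke yesterday.", published)
figurative = absolutify("It may seem like only yesterday when...", published)
```

The script has no way of telling the literal "yesterday" from the figurative one, so it dutifully annotates both.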

Most people, including myself, never knew or even imagined that articles were being parsed like this until the system screwed up as it did in the screenshot above. No human editor would have thought to provide an absolute date for "yesterday" in the phrase, "it may seem like only yesterday." With this misstep, the script at work is revealed. With the mistake, the program's previous work -- hopefully more accurate and less noticeable in old articles -- becomes visible as well. Since seeing this image, I've noticed these date absolutefiers at work everywhere.
