<?xml version="1.0" encoding="utf-8"?>

<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
<title type="text">Revealing Errors</title>
<subtitle type="html"><![CDATA[
looking below the water
]]></subtitle>
<id>http://revealingerrors.com/</id>
<link rel="alternate" type="text/html" href="http://revealingerrors.com" />
<link rel="self" type="text/xml" href="http://revealingerrors.com/" />

<author>
<name>Benjamin Mako Hill</name>
<uri>http://revealingerrors.com/</uri>
<email>mako@atdot.cc</email>
</author>
<rights>Copyright 2006-7 Benjamin Mako Hill</rights>
<generator uri="http://pyblosxom.sourceforge.net/" version="1.3.2 2/13/2006">
PyBlosxom http://pyblosxom.sourceforge.net/ 1.3.2 2/13/2006
</generator>

<updated>2008-04-28T18:56:44Z</updated>
<!-- icon?  logo?  -->

<entry>
<title type="html">Sıkısınca</title>
<category term="" />
<id>http://revealingerrors.com/2008/04/28/turkish_sms_disaster</id>
<updated>2008-04-28T18:56:44Z</updated>
<published>2008-04-28T18:56:44Z</published>
<link rel="alternate" type="text/html" href="http://revealingerrors.com/turkish_sms_disaster" />
<content type="html">
&lt;p&gt;Last week saw the popularization of &lt;a href=&quot;http://gizmodo.com/382026/a-cellphones-missing-dot-kills-two-people-puts-three-more-in-jail&quot;&gt;some older news&lt;/a&gt; about a
   misunderstanding, prompted by an error caused by technological
   limitations of mobile phones, that resulted in two deaths and three 
   imprisonments. The whole sad story took place in Turkey. You can read
   &lt;a href=&quot;http://www.hurriyet.com.tr/gundem/8748359.asp?gid=229&amp;sz=65966&quot;&gt;the original story&lt;/a&gt; in the Turkish-language Hürriyet.
&lt;/p&gt;
&lt;p&gt;Basically, Emine and her husband Ramazan Çalçoban had recently been
   separated and were feuding daily on their mobile phones and over SMS
   text messages. At one point, Ramazan sent a message saying, &amp;quot;you change
   the subject every time you get backed into a corner.&amp;quot; The word for
   &amp;quot;backed into a corner&amp;quot; is &lt;em&gt;sıkısınca&lt;/em&gt;. Notice the lack of dots on the
   &lt;em&gt;i&apos;s&lt;/em&gt; in the word. The very similar &lt;em&gt;sikisince&lt;/em&gt; -- spelled with dots --
   means &amp;quot;getting fucked.&amp;quot; Ramazan&apos;s mobile phone could not produce the
   &amp;quot;closed&amp;quot; dotless &lt;em&gt;ı&lt;/em&gt; so he wrote the word with dots and sent it
   anyway. Reading quickly, Emine misinterpreted the message thinking that
   Ramazan was saying, &amp;quot;you change the subject every time &lt;em&gt;they are fucking
you&lt;/em&gt;.&amp;quot; Emine showed the message to her father and sisters who, outraged
   that Ramazan was calling Emine a whore, attacked Ramazan with knives
   when he showed up at the house later.  In the fight, Ramazan stabbed
   Emine in return, and she later bled to death. Ramazan committed
   suicide in jail and Emine&apos;s father and sisters were all arrested.
&lt;/p&gt;
&lt;p&gt;This is certainly the gravest example of a revealing error I&apos;ve looked
   at yet and it stands as an example of the degree to which tiny
   technological constraints can have profound unanticipated consequences.
   In this case, the lack of technological support for characters used in
   Turkish resulted in the creation of text that was deeply, even &lt;em&gt;fatally&lt;/em&gt;,
   ambiguous.
&lt;/p&gt;
&lt;p&gt;Of course, many messages sent with SMS, email, or chat systems are
   ambiguous. &lt;a href=&quot;http://en.wikipedia.org/wiki/Emoticon&quot;&gt;Emoticons&lt;/a&gt; are an example of a tool that society
   has created to disambiguate phrases in text-based chatting and their
   popularity can tell us a lot about what purely word-based chatting fails
   to convey easily. For example, a particular emoticon might be employed
   to help convey sarcasm in a statement that would have been obvious
   through tone of voice. One can think of verbal communication as
   happening over many channels (e.g., voice, facial expressions, posture,
   words, etc.).  Text-based communication technologies provide certain new
   channels that may be good at conveying certain types of messages but not
   others. Emoticons, and accents or diacritical marks for that matter, are
   an attempt to concisely provide missing information that might be
   taken for granted in spoken conversation.
&lt;/p&gt;
&lt;p&gt;Any communication technology conveys certain things better than others.
   Each provides a set of channels that convey some types of messages but
   not others. The result of a shift toward many technologies is lost
   channels and an increase in ambiguity.
&lt;/p&gt;
&lt;p&gt;In spoken Turkish, the open and closed &lt;em&gt;i&lt;/em&gt; sounds are easily
   distinguishable. In written communication, however, things become more
   difficult. Some writing systems are better at conveying these tonal
   differences. Hebrew, for example, historically contained no vowels at
   all! And yet, the consequence of not conveying these differences can be
   profound.  As a result, Turkish speakers frequently use diacritics and
   the open and closed &lt;em&gt;i&lt;/em&gt; notation to disambiguate phrases like the one at
   the center of this saga. Unfortunately, the open and closed &lt;em&gt;i&lt;/em&gt; technology
   is not always available to communicators. Notably, it was not available
   on Ramazan&apos;s mobile phone.
&lt;/p&gt;
&lt;p&gt;People in Turkey have ways of coping with the lack of accents and
   diacritical marks. For example, some people would choose to write
   sıkısınca as SIKISINCA because the capital &lt;em&gt;I&lt;/em&gt; in the Roman alphabet has
   no dot.  Emoticons are similar in that they are created by users to work
   around limitations of the system to convey certain messages and to
   disambiguate others. In these ways and others, users of technologies
   find creative ways of working with and around the limitations and
   affordances imposed on them.
&lt;/p&gt;
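&lt;p&gt;The collapse of dotless into dotted letters, and the all-caps workaround,
   can be sketched in a few lines of Python. This is only an illustration: the
   folding table and the name &lt;em&gt;fold_to_ascii&lt;/em&gt; are my own assumptions and
   do not describe how any particular phone behaved.
&lt;/p&gt;
&lt;pre&gt;
```python
# Hypothetical folding table for a device that cannot produce dotless i.
ASCII_FOLD = str.maketrans({"ı": "i", "İ": "I"})


def fold_to_ascii(text):
    """Replace Turkish-specific i variants with their plain ASCII look-alikes."""
    return text.translate(ASCII_FOLD)


intended = "sıkısınca"

# The dotless letters silently become dotted ones.
print(fold_to_ascii(intended))

# The all-caps workaround: uppercasing dotless i yields a plain ASCII "I",
# so the capitalized word needs no special characters at all.
print(intended.upper())
print(intended.upper().isascii())
```
&lt;/pre&gt;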
&lt;p&gt;With time though, the users of emoticons and all-caps Turkish words stop
   seeing and thinking about the limitations that these tactics expose in
   their technology.  In fact, it is only through errors that these
   limitations become familiar again. While we cannot undo the damage done
   by Ramazan, Emine and her family, we can &amp;quot;learn from their errors&amp;quot; 
   and reflect on the ways that the limits imposed by our communication
   technology frame and affect our communications and our lives.
&lt;/p&gt;

</content>
</entry>

<entry>
<title type="html">Interpolation</title>
<category term="" />
<id>http://revealingerrors.com/2008/04/20/interpolation</id>
<updated>2008-04-20T15:33:10Z</updated>
<published>2008-04-20T15:33:10Z</published>
<link rel="alternate" type="text/html" href="http://revealingerrors.com/interpolation" />
<content type="html">
&lt;p&gt;One set of errors that almost everyone has seen -- even if they don&apos;t
   know it -- involves the failure of a very common process in computer
   programming called interpolation. While they look quite
   different, both of the following errors -- each taken from the &lt;a href=&quot;http://thedailywtf.com/&quot;&gt;Daily
WTF&apos;s&lt;/a&gt; &lt;em&gt;Error&apos;d&lt;/em&gt; Series -- represent an error whose source
   would be obvious to most computer programmers.
&lt;/p&gt;
&lt;p&gt; &lt;img style=&quot;border: 1px black solid;&quot; src=&quot;/images/interpolated_recpt.png&quot;
alt=&quot;You Saved a total of {@Total-Tkt-Discount} off list prices.&quot; /&gt; 
&lt;/p&gt;
&lt;p&gt; &lt;a href=&quot;/images/interpolated_dialog_full.png&quot;&gt;
   &lt;img style=&quot;border: 1px black solid;&quot; src=&quot;/images/interpolated_dialog.png&quot;
alt=&quot;The file &amp;quot;#3&amp;quot; is of type #2 (#1), and #4 does not know how to
handle this file type.&quot; /&gt;
   &lt;/a&gt; 
&lt;/p&gt;
&lt;p&gt;The term &lt;em&gt;interpolation&lt;/em&gt;, of course, is not unique to programmers. It is a
   much older term that was historically used to describe errors in
   hand-copied documents. Interpolation in a manuscript refers to text not
   written by an original author that was inserted over time -- either
   through nefarious adulteration or just by accident.  As texts were
   copied by hand, this type of error ended up happening quite frequently!
   In its article on &lt;a href=&quot;http://en.wikipedia.org/wiki/Interpolation_(manuscripts)&quot;&gt;manuscript interpolation&lt;/a&gt;, Wikipedia
   describes one way that these errors occurred:
&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;If a scribe made an error when copying a text and omitted some lines,
   he would have tended to include the omitted material in the margin.
   However, margin notes made by readers are present in almost all
   manuscripts. Therefore a different scribe seeking to produce a copy of
   the manuscript perhaps many years later could find it very difficult
   to determine whether a margin note was an omission made by the
   previous scribe (which should be included in the text), or simply a
   note made by a reader (which should be ignored or kept in the margin).
&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;But while manuscript interpolation described a type of error,
   interpolation in computer programming refers to a type of text swapping
   that is fully intentional.
&lt;/p&gt;
&lt;p&gt;Computer interpolation happens when computers create customized and
   contextualized messages -- and they do so constantly. Whereas a
   newspaper or a book will be the same for each of its readers, computers
   create custom pages designed for each user -- you see these all the time
   as most messages that computers print are, in some way, dynamic. In many
   cases, these dynamic messages are created through a process called
   string or variable interpolation. For those who are unfamiliar with the
   process, an explanation of the errors above can reveal the details.
&lt;/p&gt;
&lt;p&gt;In the first example, the receipt read (emphasis mine):
&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;You Saved a total of &lt;em&gt;{@Total-Tkt-Discount}&lt;/em&gt; off list prices.
&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;In fact, the computer is supposed to swap out the phrase
   &lt;em&gt;{@Total-Tkt-Discount}&lt;/em&gt; for the &lt;em&gt;value&lt;/em&gt; of a variable called
   &lt;em&gt;Total-Tkt-Discount&lt;/em&gt;. The &lt;em&gt;{@SOMETHING}&lt;/em&gt; syntax is one programming
   language&apos;s way of signifying to the computer, &amp;quot;take the variable called
   SOMETHING and use its value in this string instead of everything
   between (and including) the curly braces.&amp;quot; Of course, something didn&apos;t
   quite work right and the unprocessed -- or uninterpolated -- text was
   spit out instead. With this error, the computer program that is supposed
   to be computing our ticket price was revealed.  Additionally, we have a
   glimpse into the program, its variable names, and even its programming
   language.
&lt;/p&gt;
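&lt;p&gt;For readers who would like to see the mechanics, here is a minimal sketch
   of this kind of interpolation in Python. The placeholder syntax is copied
   from the receipt, but the pattern, the function name &lt;em&gt;interpolate&lt;/em&gt;,
   and the discount value are assumptions made up for illustration; I do not
   know what language or code the real ticketing system uses.
&lt;/p&gt;
&lt;pre&gt;
```python
import re

# Matches placeholders written as {@Name}. The syntax is taken from the
# receipt; this pattern and function are illustrative assumptions.
PLACEHOLDER = re.compile(r"\{@([A-Za-z0-9-]+)\}")


def interpolate(template, variables):
    """Swap each {@Name} placeholder for the value of the variable Name."""
    return PLACEHOLDER.sub(lambda match: str(variables[match.group(1)]), template)


template = "You Saved a total of {@Total-Tkt-Discount} off list prices."

# What the customer should have seen (the discount value is made up):
print(interpolate(template, {"Total-Tkt-Discount": "$4.50"}))

# What gets printed when the interpolation step never runs:
print(template)
```
&lt;/pre&gt;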
&lt;p&gt;The second error, from a (not very helpful) dialog box in Mozilla Firefox,
   is a more complicated but fundamentally similar example (emphasis mine):
&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;The file &amp;quot;&lt;em&gt;#3&lt;/em&gt;&amp;quot; is of type &lt;em&gt;#2&lt;/em&gt; (&lt;em&gt;#1&lt;/em&gt;), and &lt;em&gt;#4&lt;/em&gt; does not know how to
   handle this file type.
&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;The numbers, in this case, reflect a series of variables. The dialog is
   supposed to be passed a list of values including the file name (#3), the
   file type (#2 and #1), and the name of the program that is trying to
   open it (#4). These values are supposed to be swapped in for the
   placeholders -- interpolated -- before any user sees the message. Again, something went
   wrong here and a user was presented with the empty template that only
   the programmer and the program are ever supposed to see.
&lt;/p&gt;
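&lt;p&gt;Numbered placeholders like these can be filled from a list in much the
   same way. The sketch below is an assumption about the general technique,
   not Firefox&apos;s actual code; the function name &lt;em&gt;fill_positions&lt;/em&gt; and all
   four values are hypothetical.
&lt;/p&gt;
&lt;pre&gt;
```python
import re


def fill_positions(template, values):
    """Replace #1, #2, ... with values[0], values[1], ... in a single pass."""
    return re.sub(r"#(\d+)", lambda match: values[int(match.group(1)) - 1], template)


template = ('The file "#3" is of type #2 (#1), and #4 does not know how to '
            'handle this file type.')

# Hypothetical values, ordered so that index 0 fills #1, index 1 fills #2, etc.
values = ["application/x-example", "Example Document", "report.xyz", "Firefox"]

print(fill_positions(template, values))
```
&lt;/pre&gt;
&lt;p&gt;Doing the replacement in a single pass matters: a value that happened to
   contain something like #2 would otherwise be interpolated a second time.
&lt;/p&gt;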
&lt;p&gt;Nearly every message a computer or a computerized system presents to us
   is processed and interpolated in this way. In this sense, computer
   programs act as powerful intermediaries processing and
   displaying data. Perhaps more importantly, interpolation reveals just
   how limited computers&apos; expression really is. These messages are not more
   complicated than simple fill-in-the-blank messages.  Simple as they may
   be, they are entirely typical of the way that computers communicate with
   us. 
&lt;/p&gt;
&lt;p&gt;From a user&apos;s perspective, it&apos;s easy to imagine sophisticated systems
   creating and presenting highly dynamic messages to us -- or to simply
   not think about it at all. In reality, few computer programs&apos; ability to
   communicate with us is more sophisticated than a game of Mad Libs. The
   simplicity of these systems, the limitations that they impose on what
   computers can and can&apos;t say, and the limitations they place on what &lt;em&gt;we&lt;/em&gt; can
   and can&apos;t say with computers, are revealed through these simple, common,
   interpolation errors. To understand all of this, we need only recognize
   these errors and reflect on what they might reveal.
&lt;/p&gt;

</content>
</entry>

<entry>
<title type="html">The Cupertino Effect</title>
<category term="" />
<id>http://revealingerrors.com/2008/03/10/cupertino_effect</id>
<updated>2008-03-10T17:54:10Z</updated>
<published>2008-03-10T17:54:10Z</published>
<link rel="alternate" type="text/html" href="http://revealingerrors.com/cupertino_effect" />
<content type="html">
&lt;p&gt;I recently wrote about &lt;a href=&quot;http://revealingerrors.com/wordlist_profanity&quot;&gt;spellcheckers and profanity&lt;/a&gt;. Of
   course, spellcheckers are the site of many other notable revealing
   errors.
&lt;/p&gt;
&lt;p&gt;One well-known class of errors is called the &lt;em&gt;Cupertino Effect&lt;/em&gt;.  The
   effect is named after an error caused by the fact that some early
   spellchecker wordlists contained the hyphenated &lt;em&gt;co-operation&lt;/em&gt; but not
   &lt;em&gt;cooperation&lt;/em&gt; (both are correct while the former is less common). The
   ultimate effect, due to the fact that spellchecking algorithms treat
   hyphenated words as separate words, was that several spellcheckers
   would suggest &lt;em&gt;Cupertino&lt;/em&gt; as a substitute for the &amp;quot;misspelled&amp;quot;
   &lt;em&gt;cooperation&lt;/em&gt;. Because it was the lone suggestion, some people &amp;quot;corrected&amp;quot;
   &lt;em&gt;cooperation&lt;/em&gt; to &lt;em&gt;Cupertino&lt;/em&gt; in haste.  The weblog &lt;a href=&quot;http://itre.cis.upenn.edu/~myl/languagelog&quot;&gt;Language Log&lt;/a&gt;
   noticed that &lt;a href=&quot;http://itre.cis.upenn.edu/~myl/languagelog/archives/002911.html&quot;&gt;quite a few people made the mistake&lt;/a&gt; in official
   documents from the UN, EU, NATO and more! These included the following
   examples found in real documents:
&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Within the GEIT BG the Cupertino with our Italian comrades proved to
   be very fruitful. (&lt;a href=&quot;http://www.nato.int/sfor/indexinf/articles/030514b/t030514b.htm&quot;&gt;NATO Stabilisation Force&lt;/a&gt;, &amp;quot;Atlas raises the
   world,&amp;quot; 14 May 2003)
&lt;/p&gt;
&lt;p&gt;Could you tell us how far such policy can go under the euro zone, and
   specifically where the limits of this Cupertino would be? (&lt;a href=&quot;http://www.ecb.int/press/key/date/1998/html/sp981103.en.html&quot;&gt;European
Central Bank press conference&lt;/a&gt;, 3 Nov. 1998)
&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;While Language Log authors were incredulous about the idea that there
   might be spellchecking dictionaries that contain the word &lt;em&gt;Cupertino&lt;/em&gt;
   and not the unhyphenated &lt;em&gt;cooperation&lt;/em&gt;, a reader sent in this
   screenshot from Microsoft Outlook Express circa 1996 using a Microsoft
   word list from Houghton Mifflin Company. Sure enough, they&apos;d found the
   culprit.
&lt;/p&gt;
&lt;p&gt; &lt;img style=&quot;border: 1px black solid;&quot; src=&quot;/images/cupertino_screenshot.jpg&quot;
 alt=&quot;Cupertino spellchecker screenshot.&quot; /&gt; 
&lt;/p&gt;
&lt;p&gt;Of course, the Cupertino effect is by no means limited to the word
   &lt;em&gt;cooperation&lt;/em&gt;.  The Oxford University Press also &lt;a href=&quot;http://blog.oup.com/2007/11/spellchecker/&quot;&gt;points out&lt;/a&gt;
   how the Cupertino Effect can rear its head when foreign words and proper
   nouns are involved. This led to Reuters &lt;a href=&quot;http://blogs.reuters.com/blog/2007/05/14/muttonhead-quail/&quot;&gt;referring&lt;/a&gt; to Pakistan&apos;s
   &lt;em&gt;Muttahida Quami Movement&lt;/em&gt; as the &lt;em&gt;Muttonhead Quail Movement&lt;/em&gt; and to
   Rocky Mountain News &lt;a href=&quot;http://www.rockymountainnews.com/drmn/opinion_columnists/article/0,2777,DRMN_23972_4388141,00.html&quot;&gt;naming&lt;/a&gt; &lt;em&gt;Leucadia National&lt;/em&gt; as &lt;em&gt;La-De-Da
National&lt;/em&gt; instead. To top that off, Language Log found examples of
   confusion that led to discussion of &lt;em&gt;copulation&lt;/em&gt; that make &lt;em&gt;Cupertino&lt;/em&gt;
   look entirely excusable:
&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;The Western Balkan countries confirmed their intention to further
   liberalise trade amongst each other. They requested that they be
   included in the pan-european system of diagonal copulation, which
   would benefit trade and economic development. (&lt;a href=&quot;http://www.belgium.iom.int/News_Details.asp?sm=315&quot;&gt;International
Organization for Migration&lt;/a&gt;, Foreign Ministers Meeting, 22 Nov. 2004) 
&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;Of course, the Cupertino Effect is possible every time &lt;em&gt;any&lt;/em&gt;
   spellchecking correction is suggested and the top result is incorrect.
   As a result, many common misspellings open the door to humorous errors.
   In a follow-up post, Language Log &lt;a href=&quot;http://itre.cis.upenn.edu/~myl/languagelog/archives/003629.html&quot;&gt;pointed out&lt;/a&gt; that if one leaves
   the &amp;quot;i&amp;quot; off &amp;quot;identified&amp;quot;, Microsoft Word 97 will give exactly one
   suggestion: &lt;em&gt;denitrified&lt;/em&gt;, which describes the state of having
   nitrogen removed.  That has led newspapers to report that, &amp;quot;Police
   &lt;strong&gt;denitrified&lt;/strong&gt; the youths and seized the paintball guns.&amp;quot; Which seems
   unlikely. Similarly, if you leave out the &amp;quot;c&amp;quot; from &lt;em&gt;acquainted&lt;/em&gt;,
   spellcheckers frequently suggest &lt;em&gt;aquatinted&lt;/em&gt; as a substitute. As the
   Oxford University Press blogs pointed out, folks who want to &lt;a href=&quot;http://www.google.com/search?q=get%7Cgot%7Cgetting-aquatinted&quot;&gt;get
aquatinted&lt;/a&gt; do not often want to be &lt;a href=&quot;http://en.wikipedia.org/wiki/Aquatint&quot;&gt;etched with nitric
acid&lt;/a&gt;!
&lt;/p&gt;
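&lt;p&gt;A toy suggester makes it easy to see how a word list with a gap produces
   a confident but wrong suggestion. The sketch below is only an assumption
   about the general shape of the problem: it uses the similarity matching in
   Python&apos;s standard &lt;em&gt;difflib&lt;/em&gt; module as a stand-in, the tiny word list
   is made up, and real spellcheckers rank their suggestions differently.
&lt;/p&gt;
&lt;pre&gt;
```python
from difflib import get_close_matches

# Hypothetical word list that, like the early lists described above,
# has no entry for the unhyphenated "cooperation".
WORDLIST = ["Cupertino", "effect", "spellchecker"]


def suggest(word, wordlist=WORDLIST):
    """Return the closest word-list entry, or None if the word is known
    or nothing in the list is similar enough."""
    if word in wordlist:
        return None
    matches = get_close_matches(word, wordlist, n=1, cutoff=0.5)
    return matches[0] if matches else None


# The closest entry is offered even though it is the wrong word.
print(suggest("cooperation"))

# A word that is in the list produces no suggestion at all.
print(suggest("effect"))
```
&lt;/pre&gt;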
&lt;p&gt;You can find parallels to the Cupertino effect in the &lt;a href=&quot;http://revealingerrors.com/bucklame&quot;&gt;Bucklame
Effect&lt;/a&gt; I discussed previously. Many of the take-away lessons
   are the same. Spellcheckers make it easier to say some things correctly
   and place an additional cost on others. The effect on our communication may
   be subtle but it&apos;s real. For example, a spelling mistake might be less
   forgivable in an era of spellcheckers. Like many communication
   technologies, spellcheckers are normally invisible in the documents they
   create; nobody is reminded of spellcheckers by a perfectly spelled
   document. It is only through errors like the Cupertino effect that
   spellcheckers are revealed.
&lt;/p&gt;
&lt;p&gt;Further, these nonsensical suggestions are made only because of the
   particular way that spellcheckers are built. &lt;a href=&quot;http://blogs.msdn.com/naturallanguage/default.aspx&quot;&gt;Microsoft&apos;s Natural
Language team&lt;/a&gt; is apparently
   working on &amp;quot;contextual&amp;quot; spellcheckers that will be smart enough to
   guess that you probably don&apos;t mean &amp;quot;Cupertino&amp;quot; when you mean
   cooperation. Of course other errors will remain and new ones will be
   introduced.
&lt;/p&gt;

</content>
</entry>

<entry>
<title type="html">Mojibake</title>
<category term="" />
<id>http://revealingerrors.com/2008/02/25/mojibake</id>
<updated>2008-02-25T15:44:36Z</updated>
<published>2008-02-25T15:44:36Z</published>
<link rel="alternate" type="text/html" href="http://revealingerrors.com/mojibake" />
<content type="html">
&lt;p&gt;One of my favorite Japanese words is &lt;em&gt;mojibake&lt;/em&gt; (文字化け) which
   literally translates as &amp;quot;character changing.&amp;quot; The term is used to
   describe an error experienced frequently by computer users who read and
   write non-Latin scripts -- like Japanese. When readers of non-Latin
   scripts open a document, email, web page, or some other text, text
   is sometimes displayed mangled and unreadable. Japanese speakers refer
   to the resulting garbage as &amp;quot;mojibake.&amp;quot; Here&apos;s a great example from the
   &lt;a href=&quot;http://en.wikipedia.org/wiki/Mojibake&quot;&gt;mojibake article&lt;/a&gt; in Wikipedia (the image is supposed to be in
   Japanese and to display &lt;a href=&quot;http://en.wikipedia.org/wiki/Mojibake&quot;&gt;the Mojibake article&lt;/a&gt; itself).
&lt;/p&gt;
&lt;p&gt; &lt;a href=&quot;http://en.wikipedia.org/wiki/Image:Mojibake.png&quot;&gt;
   &lt;img src=&quot;/images/mojibake-thumb.png&quot;
alt=&quot;The UTF-8-encoded Japanese Wikipedia article for mojibake, as
displayed in the Windows-1252 (&apos;ISO-8859-1&apos;) encoding.&quot; /&gt;
   &lt;/a&gt; 
&lt;/p&gt;
&lt;p&gt;The problem has been so widespread in Japanese that webpages would often
   place small images in the top corners of pages that say &amp;quot;mojibake.&amp;quot; If a
   user cannot read the content on the page, the image links to pages which
   will try to fix the problem for the user.
&lt;/p&gt;
&lt;p&gt;From a more technical perspective, mojibake might be better described
   as, &amp;quot;incorrect character decoding,&amp;quot; and it hints at a largely hidden
   part of the way our computers handle text that we usually take for
   granted.
&lt;/p&gt;
&lt;p&gt;Of course, computers don&apos;t understand Latin or Japanese characters.
   Instead they operate on bits and bytes -- ones and zeros that represent
   numbers. In order to input or output text, computer scientists
   created mappings of letters and characters to numbers represented by
   bits and bytes. These mappings end up forming a sequence of characters
   or letters in a particular order often called a character set. To
   display two letters, a computer might ask for the fifth and tenth
   characters from a particular set.  These character sets are codes; they
   map numbers (i.e., positions in the list) to letters just as &lt;a href=&quot;http://en.wikipedia.org/wiki/Morse_code&quot;&gt;Morse
code&lt;/a&gt; maps dots and dashes to letters. Letters can be converted
   to numbers by a computer for storage and then converted back to be
   redisplayed. The process is called &lt;a href=&quot;http://en.wikipedia.org/wiki/Character_encoding&quot;&gt;character encoding&lt;/a&gt; and
   decoding and it happens every time a computer inputs or outputs text.
&lt;/p&gt;
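&lt;p&gt;The round trip from letters to numbers and back is easy to try in
   Python, whose built-in &lt;em&gt;ord&lt;/em&gt; and &lt;em&gt;chr&lt;/em&gt; functions expose the
   mapping directly. This is just a small illustration; the sample text is
   arbitrary.
&lt;/p&gt;
&lt;pre&gt;
```python
text = "Hi"

# Encoding: each character becomes its number in the character set.
numbers = [ord(character) for character in text]
print(numbers)  # [72, 105]

# Decoding: the same numbers are mapped back to characters.
print("".join(chr(number) for number in numbers))  # Hi

# The bytes type performs both steps for a named encoding.
stored = text.encode("ascii")
print(list(stored), stored.decode("ascii"))
```
&lt;/pre&gt;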
&lt;p&gt;While there may be some natural orderings, (e.g., A through Z), there
   are many ways to encode or map a set of letters and numbers (e.g.,
   Should one put numbers before letters in the set? Should capital and
   lowercase letters be interspersed?). The most important computer
   character encoding is &lt;a href=&quot;http://en.wikipedia.org/wiki/ASCII&quot;&gt;ASCII&lt;/a&gt;, which was first defined in 1963
   and is the de facto standard for almost all modern computers. It defines
   128 characters including the letters and numbers used in English. But
   ASCII says nothing about how one should encode accented characters in
   Latin, scientific symbols, or the characters in any other scripts --
   they are simply not in the list of letters and numbers ASCII provides
   and no mapping is available. Users of ASCII can only use the characters
   in the set.
&lt;/p&gt;
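&lt;p&gt;That limit is simple to demonstrate. In the sketch below, the helper
   name &lt;em&gt;fits_in_ascii&lt;/em&gt; is my own invention; it relies on the fact that
   Python refuses to encode a character that the target set does not contain.
&lt;/p&gt;
&lt;pre&gt;
```python
def fits_in_ascii(text):
    """Return True if every character in text has an ASCII mapping."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


# Only the unaccented English word can be written with ASCII alone.
for sample in ["cooperation", "Hürriyet", "文字化け"]:
    print(sample, fits_in_ascii(sample))
```
&lt;/pre&gt;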
&lt;p&gt;Left with computers unable to represent their languages, many
   non-English speakers have added to and improved on ASCII to create new
   encodings -- different mappings of bits and bytes to different sets of
   letters.  Japanese text can frequently be found in encodings with
   obscure technical names like EUC-JP, ISO-2022-JP, Shift_JIS, and UTF-8.
   It&apos;s not important to understand how they differ -- although I&apos;ll come
   back to this in a future blog post.  It&apos;s merely important to realize
   that each of these represents a different way to map a set of bits and bytes
   into letters, numbers, and punctuation.
&lt;/p&gt;
&lt;p&gt;For example, the set of bytes that says &amp;quot;文字化け&amp;quot; (the word for
   &amp;quot;mojibake&amp;quot; in Japanese) encoded in UTF-8 would show up as &amp;quot;��絖�����&amp;quot; in
   EUC-JP, &amp;quot;������������&amp;quot; in ISO-2022-JP, and &amp;quot;æ–‡å­—åŒ–ã‘&amp;quot; in
   ISO-8859-1.  Each of the strings above is a valid decoding of identical
   data -- the same ones and zeros. But of course, only the first is
   correct and comprehensible to a human. Although the others are
   displaying the same data, the data is unreadable by humans because it is
   decoded according to a different character set&apos;s mapping! This is mojibake.
&lt;/p&gt;
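&lt;p&gt;Mojibake is easy to reproduce on purpose. The following is a small
   sketch in Python using codec names from its standard library; exactly how
   the garbled text looks will depend on the terminal or browser showing it,
   so I have left the output out.
&lt;/p&gt;
&lt;pre&gt;
```python
original = "文字化け"
data = original.encode("utf-8")  # the bytes that are stored or sent

# Decoded with the matching mapping, the text survives intact.
print(data.decode("utf-8"))

# Decoded as ISO-8859-1, every byte becomes some character, but the wrong one.
garbled = data.decode("latin-1")
print(repr(garbled))

# Decoded as EUC-JP, bytes that are invalid in that encoding are shown
# as replacement characters.
print(data.decode("euc_jp", errors="replace"))

# No data was lost: reversing the wrong step recovers the text.
print(garbled.encode("latin-1").decode("utf-8"))
```
&lt;/pre&gt;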
&lt;p&gt;For every scrap of text that a computer shows to or takes from a human,
   the computer needs to keep track of the encoding the data is in. Every
   web browser must know the encoding of the page it is receiving and the
   encoding that it will be displayed to the user in. If the data sent is in a
   different encoding than the one that will be displayed, the computer must
   convert the text from one encoding to another.  Although we don&apos;t notice
   it, encoding metadata is passed along with almost every webpage we read
   and every email we send. Data is being converted between encodings 
   millions of times each day. We don&apos;t even notice that text is encoded --
   until it doesn&apos;t decode properly.
&lt;/p&gt;
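&lt;p&gt;Converting between encodings amounts to decoding with one mapping and
   encoding again with another. Here is a minimal sketch; the function name
   &lt;em&gt;transcode&lt;/em&gt; and the EUC-JP page are hypothetical, and it assumes the
   source encoding is known, which is exactly what the metadata is for.
&lt;/p&gt;
&lt;pre&gt;
```python
def transcode(data, source_encoding, target_encoding):
    """Decode bytes using one encoding and re-encode them using another."""
    return data.decode(source_encoding).encode(target_encoding)


# A page stored as EUC-JP (hypothetical) and re-sent as UTF-8.
page = "文字化け".encode("euc_jp")
converted = transcode(page, "euc_jp", "utf-8")

# The bytes differ, but the text they stand for is the same.
print(page != converted)
print(converted.decode("utf-8"))
```
&lt;/pre&gt;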
&lt;p&gt;Mojibake makes this usually invisible process extremely visible and
   provides an opportunity to understand that our text is coded -- and how.
   Encoding introduces important limitations -- it limits our expression to
   the things that are listed in pre-defined character sets. Until the
   creation of an encoding called &lt;a href=&quot;http://en.wikipedia.org/wiki/Unicode&quot;&gt;Unicode&lt;/a&gt;, one couldn&apos;t mix
   Japanese and Thai in the same document; while there were encodings for
   both, there were no character sets that encoded the letters for both.
   Apparently, in Chinese, there are older more obscure characters that no
   computers can encode yet. Computer users simply can&apos;t write these
   letters on computers. I&apos;ve seen computer users in Ethiopia emailing
   each other in English because support for Amharic encodings at the time
   was so poor and uneven! All of these limits, and many more, are part and
   parcel of our character encoding systems.  They become visible only when
   the usually invisible process of character encoding is thrust into view.
   Mojibake provides one such opportunity.
&lt;/p&gt;

</content>
</entry>

<entry>
<title type="html">Bad Signs</title>
<category term="" />
<id>http://revealingerrors.com/2008/02/13/sideways_monitor</id>
<updated>2008-02-14T04:34:08Z</updated>
<published>2008-02-14T04:34:08Z</published>
<link rel="alternate" type="text/html" href="http://revealingerrors.com/sideways_monitor" />
<content type="html">
&lt;p&gt;I caught &lt;a href=&quot;http://thedailywtf.com/Articles/Getting-Past-Security.aspx&quot;&gt;another revealing crash screen&lt;/a&gt; over on &lt;em&gt;The Daily
WTF&lt;/em&gt;.
&lt;/p&gt;
&lt;p&gt; &lt;img style=&quot;border: 1px black solid;&quot; src=&quot;/images/travelex_dialog_1.jpg&quot; alt=&quot;Travelex Crash Screen&quot; /&gt; 
&lt;/p&gt;
&lt;p&gt;Although the folks at WTF did not draw attention to the fact, a close
   examination revealed that the dialog box on the crashed screen is
   rotated 90 degrees.
&lt;/p&gt;
&lt;p&gt;If you step back and look at the sign, it makes sense. The folks at
   Travelex wanted a tall poster-sized electronic bulletin board to display
   currency information and promotions. Unfortunately, long screens are rare
   and LCD screens of unusual sizes are extremely expensive.  Travelex
   appears to have done the very sensible thing of taking a readily
   available and low-cost wide-screen LCD television, turning it on its
   side, and hooking it up to a computer.
&lt;/p&gt;
&lt;p&gt;Of course, screens have tops and bottoms. To display correctly on a
   sideways screen, a computer needs to be configured to display
   information sideways -- a non-trivial task on many systems.  If you
   look at the Windows &amp;quot;Start&amp;quot; menu and task-bar along the right side (i.e.,
   bottom) of the screen and the shape of the dialog, it seems that
   Travelex simply didn&apos;t bother. They used the screen to display images,
   or sequences of images, and found it easy enough to simply rotate each
   of the images to be displayed 90 degrees as well.  They simply showed a
   full-screen slide-show of sideways images on their sideways screen.  And
   no user ever noticed until the system crashed. 
&lt;/p&gt;
&lt;p&gt;It&apos;s a neat trick that many users might find useful but most would not
   think to do. Although they might after seeing this crash!
&lt;/p&gt;
&lt;p&gt;A close-up of the screen reveals even more.
&lt;/p&gt;
&lt;p&gt; &lt;img style=&quot;border: 1px black solid;&quot; src=&quot;/images/travelex_dialog_2.jpg&quot; alt=&quot;Travelex Crash Screen Closeup&quot; /&gt; 
&lt;/p&gt;
&lt;p&gt;Apparently, the dialog has popped up because the computer running the
   sign has a virus! Viruses are usually acquired through user interaction
   with a computer (e.g., opening a bad attachment) or through the
   Internet. It seems likely that the computer is plugged into the Internet
   -- perhaps the slide-show is updated automatically -- or that the image
   is being displayed from a computer used to do other things. In any case,
   it&apos;s a worrying &amp;quot;sign&amp;quot; from a financial services company.
&lt;/p&gt;

</content>
</entry>

<entry>
<title type="html">Picture of a Process</title>
<category term="" />
<id>http://revealingerrors.com/2008/02/07/google_books_hands</id>
<updated>2008-02-07T14:24:24Z</updated>
<published>2008-02-07T14:24:24Z</published>
<link rel="alternate" type="text/html" href="http://revealingerrors.com/google_books_hands" />
<content type="html">
&lt;p&gt;I enjoyed seeing this image in &lt;a href=&quot;http://www.theregister.co.uk/2007/12/05/google_books/&quot;&gt;an article&lt;/a&gt; in &lt;a href=&quot;http://www.theregister.co.uk&quot;&gt;The
Register&lt;/a&gt;.
&lt;/p&gt;
&lt;p&gt; &lt;img style=&quot;border: 1px black solid;&quot; src=&quot;/images/google_books_finger.jpg&quot; alt=&quot;finger shown in Google book&quot; /&gt; 
&lt;/p&gt;
&lt;p&gt;The picture is a screen shot from &lt;a href=&quot;http://books.google.com&quot;&gt;Google Books&lt;/a&gt; viewing a page
   from an 1855 issue of &lt;em&gt;The Gentleman&apos;s Magazine&lt;/em&gt;.  The latex-clad fingers
   belong to one of the people whose job it is to scan the books for
   Google&apos;s book project.
&lt;/p&gt;
&lt;p&gt;Information technologies often hide the processes that bring us the
   information we interact with. Revealing errors give a picture of what
   these processes look like or involve. In an extremely literal way, this
   error shows us just such a picture.
&lt;/p&gt;
&lt;p&gt;We can learn quite a lot from this image. For example, since the fingers
   are not pressed against glass, we might conclude that Google is not
   using a traditional flatbed scanner. Instead, it is likely that they are
   using a system similar to &lt;a href=&quot;http://redjar.org/jared/blog/archives/2005/10/28/internet-archives-book-scanner/&quot;&gt;the one&lt;/a&gt; that the &lt;a href=&quot;http://www.archive.org&quot;&gt;Internet
Archive&lt;/a&gt; has built specifically for scanning books.
&lt;/p&gt;
&lt;p&gt;But perhaps the most important thing that this error reveals is
   something we know, but often take for granted -- the human involved in
   the process.
&lt;/p&gt;
&lt;p&gt;The decision on where to automate a process, and where to leave it up to a
   human, is sometimes a very complicated one. Human involvement in a
   process can prevent and catch many types of errors but can cause new
   ones. Both choices introduce risks and benefits. For example, keeping a
   human in the loop of a bank transaction system may allow that person to
   catch obvious errors and to detect suspicious use that a computer
   without &amp;quot;common sense&amp;quot; might miss. On the other hand, a human banker
   might commit fraud to try to enrich themselves with others&apos; money --
   something a machine would never do.
&lt;/p&gt;
&lt;p&gt;In our interaction with technological systems, we rarely reflect on the
   fact, and the ways, that the presence of humans in these areas is
   important to determining the behavior, quality, reliability, and the
   nature and degree of trust that we have in a technology.
&lt;/p&gt;
&lt;p&gt;In our interactions with complex processes through simple and abstract
   user interfaces, it is often only through errors -- distinctly human
   errors, if not usually quite as clearly human as this one -- that
   information workers&apos; important presence is revealed.
&lt;/p&gt;

</content>
</entry>

<entry>
<title type="html">Wordlists and Profanity</title>
<category term="" />
<id>http://revealingerrors.com/2008/01/29/wordlist_profanity</id>
<updated>2008-01-29T17:19:22Z</updated>
<published>2008-01-29T17:19:22Z</published>
<link rel="alternate" type="text/html" href="http://revealingerrors.com/wordlist_profanity" />
<content type="html">
&lt;p&gt;Revealing errors are a way of looking at the fact that a technology&apos;s
   failure to deliver a message can tell us a lot. In this way, there&apos;s an
   intriguing analogy one can draw between revealing errors and censorship.
&lt;/p&gt;
&lt;p&gt;Censorship doesn&apos;t usually keep people from saying or writing something
   -- it just keeps them from communicating it.  When censorship is
   effective, however, an audience doesn&apos;t realize that any speech ever
   occurred or that any censorship has happened -- they simply don&apos;t know
   something and, more importantly perhaps, don&apos;t know that they don&apos;t
   know.  As with invisible technologies, a censored community might
   never realize their information and interaction with the world is being
   shaped by someone else&apos;s design.
&lt;/p&gt;
&lt;p&gt;I once was in a cafe with a large SMS/text message &amp;quot;board.&amp;quot; Patrons
   could send an SMS to a particular number and it would be displayed on a
   flat-panel television mounted on the wall that everyone in the
   restaurant could read. I tested to see if there was a content filter
   and, sure enough, any message that contained a &lt;a href=&quot;http://en.wikipedia.org/wiki/Four-letter_word&quot;&gt;four-letter word&lt;/a&gt;
   was silently dropped; it simply never showed up on the screen. As the
   censored party, the failure of my message to show up on the board
   revealed a censor. Further testing and my success in posting messages
   with creatively spelled profanity, numbers instead of letters, and the
   construction of crude ASCII drawings revealed the censor as a piece of
   software with a blacklist of terms; no human charged with blocking
   profanity would have allowed &amp;quot;sh1t&amp;quot; through. Through the whole process,
   the other patrons in the cafe remained none the wiser; they never
   realized that the blocked messages had been sent.
&lt;/p&gt;
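&lt;p&gt;To make that inference concrete, here is a minimal Python sketch of how a
   filter like this might behave. The cafe&apos;s actual software is unknown; the
   word list and the function name are assumptions made up for illustration.
&lt;/p&gt;

```python
# Hypothetical sketch: the cafe's real software is unknown.
# Assumed blocklist, using the two words discussed in this post.
BLOCKLIST = {"shit", "fuck"}

def is_displayed(message):
    """Return True if the message would reach the board."""
    words = message.lower().split()
    # Strip surrounding punctuation so that "shit!" still matches.
    cleaned = [w.strip(".,!?;:") for w in words]
    return not any(w in BLOCKLIST for w in cleaned)

print(is_displayed("well, shit!"))  # False: silently dropped
print(is_displayed("well, sh1t!"))  # True: creative spelling gets through
```

&lt;p&gt;An exact-match list like this one has no notion of what a word means,
   which is why a single swapped character is enough to slip past it.
&lt;/p&gt;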
&lt;p&gt;This desire to create barriers to profanity is widespread in
   communication technologies. For example, consider the number of times
   you have been prompted by a spellchecker to review and &amp;quot;fix&amp;quot; a swear
   word. Offensive as they may be, &amp;quot;fuck&amp;quot; and &amp;quot;shit&amp;quot; are correctly spelled
   English words.  It seems highly unlikely that they were excluded from
   the spell-checker&apos;s wordlist because the compiler forgot them.  They
   were excluded, quite simply, because they were deemed obscene or
   inappropriate.  While intentional, these words&apos; omission results in the
   false identification of all cursing as misspelling -- errors we&apos;ve grown
   so accustomed to that they hardly seem like errors at all! 
&lt;/p&gt;
&lt;p&gt;Now, unlike a book or website which more impressionable children might
   read, nobody can be expected to find a four-letter word while reading
   their spell-checking wordlist. These words are not included simply
   because our spell-checker makers think we &lt;em&gt;shouldn&apos;t&lt;/em&gt; use them.  The
   result is that every user who writes a four-letter word must add that
   word, by hand, to their &amp;quot;personal&amp;quot; dictionary -- they must take explicit
   credit for using the term.  The hope, perhaps, is that we&apos;ll be reminded
   to use a different, more acceptable word.  Every time this happens, the
   paternalism of the wordlist compiler is revealed. 
&lt;/p&gt;
&lt;p&gt;Connecting back to &lt;a href=&quot;/bucklame&quot;&gt;my recent post on predictive text&lt;/a&gt;, here&apos;s a
   very funny video of &lt;a href=&quot;http://en.wikipedia.org/wiki/Armstrong_and_Miller&quot;&gt;Armstrong and Miller&lt;/a&gt; lampooning the
   omission of four-letter words from predictive text databases that make
   it more difficult to input profanity onto mobile phones (e.g., are you
   sure you did not mean &amp;quot;shiv&amp;quot; and &amp;quot;ducking&amp;quot;?). You can also &lt;a href=&quot;/images/predictive_text_office.ogg&quot;&gt;download
the video&lt;/a&gt; in Ogg Theora if you have trouble watching it in
   Flash.
&lt;/p&gt;
&lt;p&gt; &lt;object width=&quot;425&quot; height=&quot;355&quot;&gt; &lt;param name=&quot;movie&quot; value=&quot;http://www.youtube.com/v/6hcoT6yxFoU&amp;rel=1&quot;&gt; &lt;/param&gt; &lt;param name=&quot;wmode&quot; value=&quot;transparent&quot;&gt; &lt;/param&gt; &lt;embed src=&quot;http://www.youtube.com/v/6hcoT6yxFoU&amp;rel=1&quot; type=&quot;application/x-shockwave-flash&quot; wmode=&quot;transparent&quot; width=&quot;425&quot; height=&quot;355&quot;&gt; &lt;/embed&gt; &lt;/object&gt; 
&lt;/p&gt;
&lt;p&gt;There&apos;s a great line in there: &amp;quot;Our job ... is to offer people not the
   words that they do use but the words that they should use.&amp;quot;
&lt;/p&gt;
&lt;p&gt;Most of the errors described on this blog reveal the design of technical
   systems. While the errors in this case do not stem from technical
   decisions, they reveal a set of equally human choices.  Perhaps more
   interestingly, the errors themselves are fully intended! The goal of
   swear-word omission is, in part, the moment of reflection that a
   revealing error introduces. In that moment, the censors hope, we might
   reflect on the &amp;quot;problems&amp;quot; in our coarse choice of language and consider
   communicating differently.
&lt;/p&gt;
&lt;p&gt;These technologies don&apos;t keep us from swearing any more than other
   technology designers can control our actions -- we usually have the
   option of using or designing different technologies. But &lt;em&gt;every&lt;/em&gt;
   technology offers affordances that make certain things easier and others
   more difficult.  This may or may not be intended but it&apos;s always important.
   Through errors like those made by our prudish spell-checker and
   predictive text input systems, some of these affordances, and their
   sources, are revealed.
&lt;/p&gt;

</content>
</entry>

<entry>
<title type="html">Bucklame and Predictive Text Input</title>
<category term="" />
<id>http://revealingerrors.com/2008/01/27/bucklame</id>
<updated>2008-01-27T22:36:36Z</updated>
<published>2008-01-27T22:36:36Z</published>
<link rel="alternate" type="text/html" href="http://revealingerrors.com/bucklame" />
<content type="html">
&lt;p&gt;I recently heard that &amp;quot;Bucklame,&amp;quot; apparently &lt;a href=&quot;http://www.urbandictionary.com/define.php?term=Bucklame&quot;&gt;a nickname&lt;/a&gt; for New
   Zealand&apos;s largest city Auckland, has its source in a technical error that is
   dear to my heart. It seems that it stems from the fact that many mobile
   phones&apos; predictive text input software will suggest the term &amp;quot;Bucklame&amp;quot;
   if a user tries to input &amp;quot;Auckland&amp;quot; -- the latter of which is
   apparently not in the software&apos;s list of valid words.
&lt;/p&gt;
&lt;p&gt;In my &lt;a href=&quot;http://journal.media-culture.org.au/0710/01-hill.php&quot;&gt;initial article on revealing errors&lt;/a&gt;, I wrote a little
   about the technology at the source of this error: Tegic&apos;s (now
   &lt;a href=&quot;http://www.nuance.com/&quot;&gt;Nuance&lt;/a&gt;&apos;s) &lt;a href=&quot;http://en.wikipedia.org/wiki/T9_%28predictive_text%29&quot;&gt;T9&lt;/a&gt; predictive text technology, which is a
   common way for users of mobile phones with a normal keypad (9-12 keys)
   to quickly type text messages using 50+ letters, numbers, and symbols.
   Here is how I described the system:
&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;Tegic’s popular T9 software allows users to type in words by pressing
   the number associated with each letter of each word in quick
   succession. T9 uses a database to pick the most likely word that maps
   to that sequence of numbers. While the system allows for quick input
   of words and phrases on a phone keypad, it also allows for the
   creation of new types of errors. A user trying to type me might
   accidentally write of because both words are mapped to the combination
   of 6 and 3 and because of is a more common word in English. T9 might
   confuse snow and pony while no human, and no other input method,
   would.
&lt;/p&gt;
&lt;/blockquote&gt;&lt;p&gt;Mappings of number-sequences to words are based on a database that offers
   words in order of relative frequency.  These word frequency lists are
   based on a corpus of text in the target language pre-programmed into the
   phone. These corpora, at least initially, were not based on the words
   people use to communicate using SMS but on a more readily available
   data source (e.g., emails, memos, or fiction). This leads to
   problems common to many systems built on shaky probabilistic
   models: what is likely in one context may not be as likely in another.
   For example, while &amp;quot;but&amp;quot; is an extremely common English word, it might
   be much less common in SMS where more complex sentence structures are
   often eschewed due to economy of space (160 character messages) and
   laborious data-entry. The word &amp;quot;pony&amp;quot; might be more common than &amp;quot;snow&amp;quot;
   in some situations but it&apos;s certainly not in my usage!
&lt;/p&gt;
&lt;p&gt;Of course, proper nouns, of which there are many, are often excluded
   from these systems as well. Since the T9 system does not &amp;quot;know&amp;quot; the word
   &amp;quot;Auckland&amp;quot;, the nonsensical compound-word &amp;quot;bucklame&amp;quot; seems to be an
   appropriate mapping for the same number-sequence. Apparently, people
   liked the error so much they kept using it and, with time perhaps, it
   stopped being an error at all.
&lt;/p&gt;
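&lt;p&gt;The collisions described above are easy to check. What follows is a
   minimal Python sketch, not T9&apos;s actual implementation: the keypad layout
   is the standard one, but the tiny word list and its ordering are
   assumptions chosen to mirror the examples in this post.
&lt;/p&gt;

```python
# Letters printed on each key of a standard phone keypad.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}

def key_sequence(word):
    """Return the digits pressed to enter a word, one press per letter."""
    return "".join(LETTER_TO_KEY[ch] for ch in word.lower())

# Hypothetical frequency-ordered word list (most common first).
WORDLIST = ["of", "me", "pony", "snow"]

def suggestions(digits):
    """Return the known words for a key sequence, in word-list order."""
    return [w for w in WORDLIST if key_sequence(w) == digits]

print(key_sequence("me"), suggestions("63"))    # 63 ['of', 'me']
print(key_sequence("snow"), suggestions("7669"))  # 7669 ['pony', 'snow']
# Both words are entered with exactly the same eight key presses.
print(key_sequence("Auckland"), key_sequence("bucklame"))  # 28255263 28255263
```

&lt;p&gt;Because &amp;quot;Auckland&amp;quot; is missing from the word list, the software has
   nothing to offer for 28255263 except whatever it can assemble from words
   it does know.
&lt;/p&gt;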
&lt;p&gt;As users move to systems with keyboards like Blackberries, Treos,
   Sidekicks, and iPhones (which use a dual-mode system) these errors
   become impossible. As a result, the presence of these types of errors
   (e.g., a swapped &amp;quot;me&amp;quot; and &amp;quot;of&amp;quot;) can tell communicators quite a lot about
   the type of device they are communicating with.
&lt;/p&gt;

</content>
</entry>

<entry>
<title type="html">Creating Kanji</title>
<category term="" />
<id>http://revealingerrors.com/2008/01/15/creating_kanji</id>
<updated>2008-01-15T18:25:25Z</updated>
<published>2008-01-15T18:25:25Z</published>
<link rel="alternate" type="text/html" href="http://revealingerrors.com/creating_kanji" />
<content type="html">
&lt;p&gt;Errors reveal characteristics of the languages we use and the
   technologies we use to communicate them -- everything from scripts and
   letter forms (which while very fundamental to written communication are
   technologies nonetheless) to the computer software we use to create and
   communicate text.
&lt;/p&gt;
&lt;p&gt;I&apos;ve spent the last few weeks in Japan. In the process, I&apos;ve learned a
   bit about the Japanese language; no small part of this through errors.
   Here&apos;s one error that taught me quite a lot. The sentence is shown in
   Japanese and then followed by a translation into English:
&lt;/p&gt;
&lt;blockquote&gt;今年から貝が胃に棲み始めました。&lt;br /&gt;
This year, a clam started living in my stomach.&lt;/blockquote&gt;

&lt;p&gt;Needless to say perhaps, this was an error. It was supposed to say:
&lt;/p&gt;
&lt;blockquote&gt;今年から海外に住み始めました。&lt;br /&gt;
This year, I started living abroad.&lt;/blockquote&gt;

&lt;p&gt;When the sentences are transliterated into &lt;a href=&quot;http://en.wikipedia.org/wiki/Romaji&quot;&gt;romaji&lt;/a&gt; (i.e., Japanese
   written in a Roman script) the similarity becomes much clearer to
   readers who don&apos;t understand Japanese:
&lt;/p&gt;
&lt;blockquote&gt;Kotoshikara kaiga ini sumihajimemashita.&lt;br /&gt;
Kotoshikara kaigaini sumihajimemashita.&lt;/blockquote&gt;

&lt;p&gt; &lt;em&gt;Kotoshikara&lt;/em&gt; means &amp;quot;since this year.&amp;quot; &lt;em&gt;Sumihajimemashita&lt;/em&gt; means, &amp;quot;has
   started living.&amp;quot; The word &lt;em&gt;kaigaini&lt;/em&gt; means &amp;quot;abroad&amp;quot; or &amp;quot;overseas.&amp;quot;
   &lt;em&gt;Kaiga ini&lt;/em&gt; (two words) means &amp;quot;clam in stomach.&amp;quot;  When written
   phonetically in romaji, the only difference between the two sentences lies in
   the introduction of a word-break in the middle of &amp;quot;kaigaini.&amp;quot; Written
   out in Japanese, the sentences are quite different; even without
   understanding, one can see that more than a few of the characters in the
   sentences differ.
&lt;/p&gt;
&lt;p&gt;In English, word spacing plays an essential role in making written
   language understandable. Japanese, however, is normally written without
   spaces between words.
&lt;/p&gt;
&lt;p&gt;This isn&apos;t a problem in Japanese because the Japanese script uses a
   combination of &lt;a href=&quot;http://en.wikipedia.org/wiki/Ideograms&quot;&gt;logograms&lt;/a&gt; -- called &lt;a href=&quot;http://en.wikipedia.org/wiki/Kanji&quot;&gt;kanji&lt;/a&gt; -- and
   phonetic characters -- called &lt;a href=&quot;http://en.wikipedia.org/wiki/Hiragana&quot;&gt;hiragana&lt;/a&gt; and
   &lt;a href=&quot;http://en.wikipedia.org/wiki/Katakana&quot;&gt;katakana&lt;/a&gt; or simply &lt;a href=&quot;http://en.wikipedia.org/wiki/Kana&quot;&gt;kana&lt;/a&gt; -- to delimit words and to
   describe structure. The result, to Japanese readers, is unambiguous.
   Phonetically and without spaces, the two sentences are identical in
   either kana or romaji:
&lt;/p&gt;
&lt;blockquote&gt;ことしからかいがいにすみはじめました。&lt;br /&gt;
Kotoshikarakaigainisumihajimemashita.&lt;/blockquote&gt;

&lt;p&gt;In purely phonetic form, the sentence is ambiguous. Using kanji, as
   shown in the opening examples, this ambiguity is removed.  While
   phonetically identical, &amp;quot;kaigaini&amp;quot; (abroad) and &amp;quot;kaiga ini&amp;quot; (clam in
   stomach) are very different when kanji is used; they are written
   &amp;quot;海外に&amp;quot; and &amp;quot;貝が胃に&amp;quot; respectively and are not easily confusable by
   Japanese readers.
&lt;/p&gt;
&lt;p&gt;This error, and many others like it, stems from the way that Japanese
   text is input into computers.  Because there are more than 4,000 kanji
   in frequent use in Japan, there simply are not enough keys on a keyboard
   to input kanji directly. Instead, text in Japanese is input into
   computers phonetically (i.e., in kana) without spaces or explicit word
   boundaries.  Once the kana is input, users then transform the phonetic
   representation of their sentence or phrase into a version using the
   appropriate kanji logograms. To do so, Japanese computer users employ
   special software that contains a database of mappings of kana to kanji.
   In the process, this software makes educated guesses about where word
   boundaries are. Usually, computers guess correctly.  When computers get
   it wrong, users need to go back and tweak the conversion by hand or
   select from other options in a list.  Sometimes, when users are in a
   rush, they use an incorrect kana to kanji conversion.  It would be
   obvious to any Japanese computer user that this is precisely what
   happened in the sentence above.
&lt;/p&gt;
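&lt;p&gt;The guessing step can be illustrated with a toy example. The Python
   sketch below is not how any real input method works internally; it uses
   a made-up five-entry dictionary and simply lists every way the kana can
   be split into known words. Real software also has to rank the
   candidates, which this sketch does not attempt.
&lt;/p&gt;

```python
# Hypothetical toy dictionary mapping kana readings to written forms.
DICTIONARY = {
    "かいがい": "海外",  # abroad
    "かい": "貝",  # clam
    "い": "胃",  # stomach
    "が": "が",  # particle, left in kana
    "に": "に",  # particle, left in kana
}

def conversions(kana):
    """Return every conversion that splits the kana into dictionary words."""
    if not kana:
        return [""]
    results = []
    # Try each possible first word, then convert the remainder recursively.
    for end in range(1, len(kana) + 1):
        head = kana[:end]
        if head in DICTIONARY:
            for rest in conversions(kana[end:]):
                results.append(DICTIONARY[head] + rest)
    return results

print(conversions("かいがいに"))  # ['貝が胃に', '海外に']
```

&lt;p&gt;Both candidates are valid readings of the same kana, so picking between
   them depends on context that the software can only estimate.
&lt;/p&gt;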
&lt;p&gt;This type of error has few parallels in English but is extremely common
   in Japanese writing.  The effects, like this one, are often confusing or
   hilarious.  For a Japanese reader, this error reveals the kana to kanji
   mapping system and the computer software that implements it -- nobody
   would make such a mistake with a pen and paper. For a person less
   familiar with Japanese, the error reveals a number of technical
   particularities about the Japanese writing system and, in the process,
   about the ways in which Japanese differs from other languages they might speak.
&lt;/p&gt;

</content>
</entry>

<entry>
<title type="html">Precision Expiration</title>
<category term="" />
<id>http://revealingerrors.com/2007/12/22/precision_expiration</id>
<updated>2007-12-22T19:34:15Z</updated>
<published>2007-12-22T19:34:15Z</published>
<link rel="alternate" type="text/html" href="http://revealingerrors.com/precision_expiration" />
<content type="html">
&lt;p&gt;Here is a photograph (and a closeup) of a bag of pretzels I was given on
   a cross-country plane trip today.
&lt;/p&gt;
&lt;p&gt; &lt;img src=&quot;/images/pretzel_bag-both.jpg&quot;
     alt=&quot;Bag of Snyder&apos;s Pretzels Big and Closeup&quot;
     style=&quot;border: 1px black solid;&quot; /&gt; 
&lt;/p&gt;
&lt;p&gt;When I first saw &amp;quot;May 11 DC20 2008 00:12,&amp;quot; I thought, &amp;quot;Wow! That&apos;s an
   &lt;em&gt;extremely&lt;/em&gt; precise expiration date!&amp;quot; In transit over several time zones
   I then thought, what time zone do they mean?
&lt;/p&gt;
&lt;p&gt;Of course, expiration dates are ballpark figures that mark thresholds in
   the gradual process of product degradation. They are, to some degree,
   arbitrary. It&apos;s not as if these pretzels will be great on May 10th and
   inedible two days later. Unless the pretzels have been set to
   self-destruct, the addition of an expiration hour and an expiration
   &lt;em&gt;minute&lt;/em&gt; seems, well, unnecessary.  &lt;br /&gt; 
&lt;/p&gt;
&lt;p&gt;What&apos;s happened here is a design error. The label is, in fact, two
   different types of data printed in two separate columns. &amp;quot;May 11 2008&amp;quot;
   is the expiration date. &amp;quot;DC20 00:12&amp;quot; is the number of the machine or
   production line that produced the bag and the time at which the pretzels
   were made. Taken together, the information can be used by the producer,
   &lt;em&gt;Snyder&apos;s of Hanover&lt;/em&gt;, for quality control purposes to find out what
   machines, workers, and batches of supplies produced a particular bag of
   pretzels.  In all likelihood, &lt;em&gt;Snyder&apos;s&lt;/em&gt; prints these labels with a
   system that, for cost reasons, tries to minimize the amount of printed
   area on each bag.
&lt;/p&gt;
&lt;p&gt;For &lt;em&gt;Snyder&apos;s&lt;/em&gt; employees familiar with the system, the labels are
   completely clear.  But those of us not familiar with the system are left
   confused. Error can be thought of as the chasm between user expectations
   and technical interaction.  Like most of the errors I discuss here, this
   flub represents failed communication and reveals the mediating
   technologies.
&lt;/p&gt;

</content>
</entry>

<entry>
<title type="html">Writing Type</title>
<category term="" />
<id>http://revealingerrors.com/2007/12/16/typewriter_fonts</id>
<updated>2007-12-16T20:01:44Z</updated>
<published>2007-12-16T20:01:44Z</published>
<link rel="alternate" type="text/html" href="http://revealingerrors.com/typewriter_fonts" />
<content type="html">
&lt;p&gt;You have probably seen text produced by computers in fonts that are
   meant to look like they were typed on typewriters. The word
   &amp;quot;bookselling&amp;quot; caught my eye in a presentation by Lawrence Lessig. I&apos;ve
   rendered a blown up version here in &lt;a href=&quot;http://www.p22.com/ihof/typewriter.html&quot;&gt;P22 Typewriter&lt;/a&gt;, the font he
   used in his presentation.
&lt;/p&gt;
&lt;p&gt; &lt;img src=&quot;/images/typefont_booksellers_p22.png&quot; alt=&quot;Bookselling in P22 Typewriter&quot; style=&quot;border: 1px black solid; padding: 10px;&quot; /&gt; 
&lt;/p&gt;
&lt;p&gt;Here is &amp;quot;bookselling&amp;quot; rendered in another typewriter font, &lt;a href=&quot;http://bit-fonts.com/ttf.html&quot;&gt;Old
Typewriter&lt;/a&gt;, which is a similar, but more extreme, example.
&lt;/p&gt;
&lt;p&gt; &lt;img src=&quot;/images/typefont_booksellers_ot.png&quot; alt=&quot;Bookselling in Old Typewriter&quot; style=&quot;border: 1px black solid; padding: 10px;&quot; /&gt; 
&lt;/p&gt;
&lt;p&gt;I was struck by the fact that while the font looked messy, it was
   &lt;em&gt;consistently&lt;/em&gt; messy. The back-to-back &lt;em&gt;o&apos;s&lt;/em&gt; and &lt;em&gt;l&apos;s&lt;/em&gt;  in &amp;quot;bookselling&amp;quot;
   are perfect copies of each other. No typewriter would have produced
   &lt;em&gt;identically&lt;/em&gt; messy letters. However, because they are produced using a
   computer, the distortion is perfectly consistent between instances of a
   given letter.
&lt;/p&gt;
&lt;p&gt;To appreciate the revealing error, you must understand that the process
   of printing with inked pieces of metal type is messy. In letterpress
   printing, ink is rolled onto type using rollers or inkballs. In
   typewriters, letters are inked individually or the ink is pressed onto
   the page through an ink soaked ribbon.  The result in both cases is
   letterforms that are slightly deformed due to irregular application of
   ink to type, globbing of the ink, the rough texture of the paper, and
   the splattering of ink across the page when the type hits the page. In
   part to prevent confusion due to these errors, typewriter typefaces
   employ exaggerated serifs to make each letter&apos;s form more distinct and
   resistant to distortion and noise.
&lt;/p&gt;
&lt;p&gt;However, on a computer screen or on a modern printer, letterforms are
   perfectly reproduced. Printers and screens build letters out of 
   patterns of dots in tiny grids.  The dots making up letters are
   precisely placed and microscopic.  Screens don&apos;t splatter ink. In order
   to present an accurate typewriter font on screen or to be printed by a
   modern printer, font designers must also represent the types of errors
   that typewriters would make. You can see the messiness clearly in
   typewriter font samples.
&lt;/p&gt;
&lt;p&gt; &lt;img src=&quot;/images/typefont_pangrams.png&quot; alt=&quot;Pangrams in a typewriter font&quot; style=&quot;border: 1px black solid; padding: 10px;&quot; /&gt; 
&lt;/p&gt;
&lt;p&gt;However, just as the sloppiness of typewritten documents reveals the
   typewriters that produced them, the computer reproduction of that error
   introduces another revealing mistake. While most letterforms produced by
   a typewriter are malformed, they are &lt;em&gt;uniquely&lt;/em&gt; malformed. Like
   snowflakes, each letter printed by a typewriter is subtly different from
   every other letter. The computer reveals itself by reproducing the &lt;em&gt;same&lt;/em&gt;
   messiness of a letterform each time it is rendered.
&lt;/p&gt;
&lt;p&gt;A typewriter might produce the first &lt;em&gt;o&lt;/em&gt;; in fact, a real typewriter
   was probably the source of that letterform. But no typewriter would
   produce that &lt;em&gt;o&lt;/em&gt; identically twice.  &lt;em&gt;That&lt;/em&gt; takes a computer. To be very
   convincing, a typewriter font would need to produce different versions
   of each character or to distort them randomly. I&apos;ve been told that there
   are now fonts that do exactly this.
&lt;/p&gt;
&lt;p&gt;While the imperfections of the typewritten characters reveal a
   typewriter, the reproduction of these errors with perfect verisimilitude
   reveals a computer. In the process of trying to emulate the errors
   created by a typewriter, the computer commits a new error and reveals
   the whole process.
&lt;/p&gt;

</content>
</entry>

<entry>
<title type="html">Cross Site Descripting</title>
<category term="" />
<id>http://revealingerrors.com/2007/11/28/apple_xss</id>
<updated>2007-11-28T15:54:39Z</updated>
<published>2007-11-28T15:54:39Z</published>
<link rel="alternate" type="text/html" href="http://revealingerrors.com/apple_xss" />
<content type="html">
&lt;p&gt;Blogger Jordan Wiens recently &lt;a href=&quot;http://wantingseed.com/sprout/2007/11/19/how-not-to-protect-your-webapp&quot;&gt;noticed a funny thing about the Apple
website&lt;/a&gt;. When one tries to search for &amp;quot;&lt;a href=&quot;http://www.apple.com/applescript/&quot;&gt;applescript&lt;/a&gt;&amp;quot;
   (Apple&apos;s scripting and automation product) on &lt;a href=&quot;http://www.apple.com/search/&quot;&gt;Apple&apos;s
website&lt;/a&gt;, they end up with this search result:
&lt;/p&gt;
&lt;p&gt; &lt;img style=&quot;border: 1px solid black&quot;
     src=&quot;/images/applescript_search_results.png&quot;
     alt=&quot;Applescript search results from Apple.com&quot; /&gt; 
&lt;/p&gt;
&lt;p&gt;Until the issue is fixed, you can see for yourself by navigating to
   &lt;a href=&quot;http://www.apple.com/search/?q=applescript&quot;&gt;http://www.apple.com/search/?q=applescript&lt;/a&gt;.
&lt;/p&gt;
&lt;p&gt;On the search result page, the Apple search software seems to change the
   term &amp;quot;applescript&amp;quot; into &amp;quot;apple.&amp;quot; A search for the term &amp;quot;apple&amp;quot; on the
   Apple website is, as one might imagine, not a particularly useful way to
   find information about Applescript. To most users, this error is
   confounding. To a trained eye, it reveals an overzealous security system
   attempting to prevent what&apos;s called &lt;a href=&quot;http://en.wikipedia.org/wiki/Cross-site_scripting&quot;&gt;cross-site scripting&lt;/a&gt; or XSS
   -- a way that spammers, &lt;a href=&quot;http://en.wikipedia.org/wiki/Phishing&quot;&gt;phishers&lt;/a&gt;, and nefarious system-crackers
   can sneakily work around privacy and security systems in web browsers by
   exploiting two features of modern web browsers.
&lt;/p&gt;
&lt;p&gt;First, through the use of a programming language called &lt;em&gt;Javascript&lt;/em&gt;,
   many web pages run small computer programs inside users&apos; browsers.
   These Javascript programs allow for applications that are more
   responsive than would have been possible before (think &lt;a href=&quot;http://maps.google.com&quot;&gt;Google
Maps&lt;/a&gt; for a good example).  Running random
   programs is risky, of course. To protect users and their privacy, web
   browsers limit Javascript programs in several ways. One common technique
   is to limit access granted to a Javascript program from a given website
   to information from the site the Javascript originated at. This security
   system is designed to bar one website&apos;s programs from accessing and
   relaying sensitive information, like login information or credit card
   numbers, from another website.
&lt;/p&gt;
&lt;p&gt;Second, a large number of applications allow input from users that is
   subsequently displayed on web pages. This can come in the form of edits
   and additions to Wikipedia pages, comments on forums, articles, or
   blogs, or even the fact that when you run a web search, the search terms
   are displayed back to you at the top of your page.
&lt;/p&gt;
&lt;p&gt;A security vulnerability, it turns out, lies in the combination of the
   two features. This vulnerability, XSS, happens when a nefarious user
   embeds small Javascript programs in input (e.g., a comment) which are run
   each time a page is subsequently viewed.  Masquerading to the browser as
   a legitimate script created by the website creator, these programs can
   access sensitive information from the website stored on the user&apos;s
   computer (e.g., login information) and then send this information to the
   author of the script without the violated user&apos;s permission or
   knowledge.
&lt;/p&gt;
&lt;p&gt;When an attacker executes an XSS attack, they do so by trying to include
   Javascript in input that will be displayed to the user. This usually
   comes in the form of:
&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;    &amp;lt;script&amp;gt;some code to send private information&amp;lt;/script&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;In HTML, the &amp;quot;&amp;lt;script&amp;gt;&amp;quot; and &amp;quot;&amp;lt;/script&amp;gt;&amp;quot; tags signify to the
   web browser that the text between is a program to be run.
&lt;/p&gt;
&lt;p&gt;XSS has become a large problem. To combat and prevent it, web developers
   take great care to protect their users and their applications from
   attacks by blocking, removing, or disabling attempts to include programs
   in user input.  One frequently employed method of doing so is to simply
   remove the &amp;quot;&amp;lt;script&amp;gt;&amp;quot; tags that cause programs to be run. Without
   the tags, malicious code may remain, but will never be executed on
   users&apos; computers.
&lt;/p&gt;
&lt;p&gt;With this knowledge of XSS we can begin to understand the puzzling
   behavior of Apple&apos;s website. By trying several other searches, we can
   confirm that Apple&apos;s search engine is, in fact, removing &lt;em&gt;all&lt;/em&gt; mentions
   of the term &amp;quot;script&amp;quot; from input to the site. The system is almost
   certainly designed to block XSS. While it is likely to succeed in doing
   so, the side effects, in the case of users searching for Applescript,
   are &lt;em&gt;extremely&lt;/em&gt; inconvenient.
&lt;/p&gt;
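&lt;p&gt;Apple&apos;s search code is not public, so the following Python sketch is
   only a guess at the kind of rule that would produce this behavior. The
   function names are hypothetical. For contrast, it also shows a gentler
   approach, escaping markup characters with the standard library&apos;s
   &lt;code&gt;html.escape&lt;/code&gt;, which leaves ordinary words alone.
&lt;/p&gt;

```python
import html
import re

def overzealous_sanitize(query):
    """Hypothetical filter: delete every occurrence of "script", ignoring case."""
    return re.sub("script", "", query, flags=re.IGNORECASE)

def escape_for_display(query):
    """Alternative: escape markup characters instead of deleting words."""
    return html.escape(query)

print(overzealous_sanitize("applescript"))          # apple
print(overzealous_sanitize("JavaScript tutorial"))  # Java tutorial
print(escape_for_display("applescript"))            # applescript
```

&lt;p&gt;Deleting the word blocks the attack but also mangles any honest query
   that happens to contain it; escaping keeps the query intact while still
   preventing the browser from treating user input as a program.
&lt;/p&gt;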
&lt;p&gt;Through the error, Apple reveals their overzealous system designed to
   prevent XSS. Those who dig deeper to understand the source of this
   initially baffling behavior can gain new respect for the implicit trust
   that our browsers give to code on the websites we visit and the ways in
   which this trust can be abused.
&lt;/p&gt;
&lt;p&gt;In all likelihood, we have all been the victims of XSS attacks as users
   -- although most of us have been lucky enough to avoid divulging
   sensitive information in the process.  Apple&apos;s error represents
   &amp;quot;collateral damage&amp;quot; in a war fought between crackers, spammers, and
   phishers on one side and web application developers on the other. While
   we are rarely aware of it, this battle affects the way our web
   applications are designed and the features they do, and do not, include.
   We are, indirectly, affected by XSS even when we&apos;re not looking for
   information on Applescript. By revealing one anti-XSS security system,
   Apple&apos;s misstep points to that fact.
&lt;/p&gt;

</content>
</entry>

<entry>
<title type="html">Thunderbird and the Nature of Spam</title>
<category term="" />
<id>http://revealingerrors.com/2007/11/20/thunderbird_spam</id>
<updated>2007-11-20T09:46:30Z</updated>
<published>2007-11-20T09:46:30Z</published>
<link rel="alternate" type="text/html" href="http://revealingerrors.com/thunderbird_spam" />
<content type="html">
&lt;p&gt;I found this beautiful and simple &lt;a href=&quot;http://worsethanfailure.com/Articles/Internal-Feud.aspx&quot;&gt;example&lt;/a&gt; of a revealing error
   featured in the fantastic (and very funny) &lt;a href=&quot;http://worsethanfailure.com/Series/Error_0x27_d.aspx&quot;&gt;Error&apos;d series&lt;/a&gt; on &lt;a href=&quot;http://www.worsethanfailure.com&quot;&gt;Worse
Than Failure&lt;/a&gt;:
&lt;/p&gt;
&lt;p&gt; &lt;img src=&quot;/images/thunderbird_spam.png&quot;
     alt=&quot;Thunderbird showing its own welcome message as spam.&quot;
     style=&quot;border: 1px black solid&quot; /&gt; 
&lt;/p&gt;
&lt;p&gt;My guess is that before most users start the Mozilla Thunderbird email
   client for the first time, they don&apos;t know that the software has a spam
   detection feature. That said, when the welcome message that
   automatically shows up in the inbox of every new Thunderbird user is
   prefixed by a notice that the message in question might be &amp;quot;junk,&amp;quot;
   users&apos; ignorance on the matter will quickly be put to rest!
&lt;/p&gt;
&lt;p&gt;Of course, much more than the simple existence of the spam-flagging
   system is revealed by this error. With a little reflection, we can infer
   some of the criteria that Thunderbird must be using to sort spam or junk
   from legitimate email. Most mail systems, including Thunderbird, use a
   variety of methods which, in aggregate, determine the
   likelihood of a message being spam. Thunderbird&apos;s welcome message is not
   addressed directly to the user in question and it makes extensive use of
   rich-text HTML and images -- both common characteristics of spam. 
&lt;/p&gt;
&lt;p&gt;Central to most modern spam-checkers is a statistical analysis of words
   used in the content of the email. Since spammers are trying to
   communicate a message, a prevalence of certain words and an absence of
   others is usually sufficient to sort out the junk. Sure enough, the
   Thunderbird welcome message is written using rather impersonal and
   marketing-speak terms that would be less likely in personal email (e.g.,
   offering &amp;quot;product information&amp;quot;).
&lt;/p&gt;
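&lt;p&gt;A minimal sketch of this kind of word-based scoring follows. It is not
   Thunderbird&apos;s implementation: the word list and its probabilities are
   invented for illustration, and the scoring simply treats words as
   independent and combines their odds.
&lt;/p&gt;

```python
import math

# Hypothetical per-word spam probabilities, invented for illustration.
# A real filter would learn these from mail the user has marked as junk.
WORD_SPAM_PROB = {
    "product": 0.90,
    "information": 0.75,
    "offer": 0.85,
    "lunch": 0.10,
    "tomorrow": 0.20,
}

def spam_probability(text):
    """Combine per-word probabilities, treating words as independent."""
    log_odds = 0.0
    for word in text.lower().split():
        p = WORD_SPAM_PROB.get(word)
        if p is not None:  # words the filter has never seen are skipped
            log_odds += math.log(p / (1.0 - p))
    # Convert the summed log-odds back into a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))

print(round(spam_probability("product information"), 3))
print(round(spam_probability("lunch tomorrow"), 3))
```

&lt;p&gt;With these made-up numbers, marketing vocabulary pushes the score
   toward one and everyday vocabulary pushes it toward zero; text with no
   known words scores exactly one half.
&lt;/p&gt;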
&lt;p&gt;From the perspective of the Thunderbird developers, the flagging of this
   message as spam seems to be in error. From the perspective of the user
   though, it is not quite as clear. The Thunderbird message is both
   unsolicited and commercial in nature -- essentially the definition of
   spam. In the &amp;quot;looks like a duck&amp;quot; sense, it uses words that make it
   &amp;quot;read&amp;quot; like spam.
&lt;/p&gt;
&lt;p&gt;While this simple error can teach Thunderbird users about the existence
   and the nature of their spam-checker, it might also teach the folks
   responsible for the Thunderbird welcome message something about the way
   their messages might seem to their users.
&lt;/p&gt;

</content>
</entry>

<entry>
<title type="html">Identity Crisis</title>
<category term="" />
<id>http://revealingerrors.com/2007/11/09/identity_crisis</id>
<updated>2007-11-09T18:36:35Z</updated>
<published>2007-11-09T18:36:35Z</published>
<link rel="alternate" type="text/html" href="http://revealingerrors.com/identity_crisis" />
<content type="html">
&lt;div style=&quot;margin-left: 3em; font-size: 0.8em; font-style: italic;&quot;&gt;

&lt;p&gt;This error was revealed and written up by &lt;a
href=&quot;http://www.red-bean.com/kfogel/&quot;&gt;Karl Fogel&lt;/a&gt;.
&lt;/p&gt;
&lt;p&gt; &lt;/div&gt; 
&lt;/p&gt;
&lt;p&gt;Yesterday I received email from a hotel, confirming a reservation for a
   room.  But it wasn&apos;t meant for me; it was meant for &amp;quot;Kathy Fogel&amp;quot; (whom
   I&apos;ve never met), and was sent to &amp;quot;k.fogel@gmail.com&amp;quot;.
&lt;/p&gt;
&lt;p&gt;Now, I do have the account &amp;quot;kfogel@gmail.com&amp;quot;, but I&apos;d never received
   email for &amp;quot;k.fogel&amp;quot; before.  As I&apos;d always thought &amp;quot;.&amp;quot; was a significant
   character in email addresses, I didn&apos;t see how I could have gotten this
   mail.  It turns out, though, that Google ignores &amp;quot;.&amp;quot; when it&apos;s in the
   username portion of a gmail address.  My
   friend Brian Fitzpatrick knew this already, and pointed me to &lt;a href=&quot;http://mail.google.com/support/bin/answer.py?ctx=%67mail&amp;hl=en&amp;answer=10313&quot;&gt;Google&apos;s
explanation&lt;/a&gt;.  (I learned later that 
   &lt;a href=&quot;http://arstechnica.com/news.ars/post/20060120-6022.html&quot;&gt;others have been surprised by this
behavior&lt;/a&gt; too.)
&lt;/p&gt;
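&lt;p&gt;The dot-insensitivity can be sketched as a normalization step applied
   before an address is looked up. This assumes only what Google&apos;s
   explanation states, that dots in the username are ignored; the function
   name canonical_gmail is hypothetical and the real mechanism is not
   documented here.
&lt;/p&gt;

```python
def canonical_gmail(address):
    """Return a Gmail address with dots removed from the username part."""
    username, _, domain = address.rpartition("@")
    if domain.lower() != "gmail.com":
        return address  # other providers may treat dots as significant
    return username.replace(".", "") + "@" + domain

print(canonical_gmail("k.fogel@gmail.com"))  # prints "kfogel@gmail.com"
```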
&lt;p&gt;So the error revealed a feature -- at least, I&apos;m fairly sure Google
   would consider it a feature, although the exact motivation for it is
   still not clear to me.  It might be a technical requirement caused by
   merging several legacy user databases, or it might simply be to
   prevent confusion among addresses that only differ by dots.
&lt;/p&gt;
&lt;p&gt;Anyway, I called the hotel, and eventually managed to make them
   understand that I had no idea who Kathy Fogel was, and that I&apos;d
   accidentally gotten an email intended for her.  They said they&apos;d
   resend, and of course I said &amp;quot;Wait, no, it&apos;ll just come to me again!&amp;quot;
   But they swore they had a different email address on file for her, and
   indeed, I haven&apos;t gotten a second email.
&lt;/p&gt;
&lt;p&gt;Which raises another question: how did they send the mail to
   &amp;quot;k.fogel@gmail.com&amp;quot; in the first place?  Clearly, Kathy Fogel cannot
   have that address, because Google will not allow any other &amp;quot;dot
   variants&amp;quot; of an address to be registered after the first.  (Besides,
   if she did have that address, we&apos;d be getting each other&apos;s mail all
   the time, and we&apos;re not.)  It&apos;s also unlikely that she mistakenly
   gave them that address herself, since they already had another
   address in place by the time I called.
&lt;/p&gt;
&lt;p&gt;A computer wouldn&apos;t substitute domain names in an email address like
   that.  The only thing I can think of is that somehow, humans are, at
   least in some cases, intimately involved in sending out confirmation
   emails from DoubleTree hotels.  I say &amp;quot;intimately&amp;quot; because this was no
   mere cut-and-paste mistake.  Someone had to transcribe an email
   address by hand, and accidentally put &amp;quot;gmail.com&amp;quot; where the original
   said &amp;quot;yahoo.com&amp;quot; or &amp;quot;aol.com&amp;quot; or whatever.
&lt;/p&gt;
&lt;p&gt;I hope Kathy has a nice trip.
&lt;/p&gt;

</content>
</entry>

<entry>
<title type="html">Computer Generated Crossword Puzzles</title>
<category term="" />
<id>http://revealingerrors.com/2007/10/31/computer_generated_crosswords</id>
<updated>2007-10-31T20:18:15Z</updated>
<published>2007-10-31T20:18:15Z</published>
<link rel="alternate" type="text/html" href="http://revealingerrors.com/computer_generated_crosswords" />
<content type="html">
&lt;p&gt;There are two free daily newspapers in Boston: the &lt;a href=&quot;http://boston.metro.us/&quot;&gt;Boston Metro&lt;/a&gt;
   and the &lt;a href=&quot;http://www.bostonnow.com/&quot;&gt;Boston Now&lt;/a&gt;. Both run crossword puzzles. The &lt;em&gt;Now&lt;/em&gt; runs a
   puzzle edited by &lt;a href=&quot;http://www.stanxwords.com/&quot;&gt;Stanley Newman&lt;/a&gt;. The &lt;em&gt;Metro&apos;s&lt;/em&gt; puzzle is
   unattributed.  When my friend &lt;a href=&quot;http://www.loyalty.org/~schoen/&quot;&gt;Seth Schoen&lt;/a&gt; was in town for
   several days, he did several crossword puzzles in the &lt;em&gt;Metro&lt;/em&gt;.  He
   pointed out to me that a clue in the crossword was repeated on two
   consecutive days. The crosswords in the &lt;em&gt;Metro&lt;/em&gt;, he concluded, were
   computer generated.
&lt;/p&gt;
&lt;p&gt;I picked up the &lt;em&gt;Metro&lt;/em&gt; each day for several weeks and, sure enough,
   there was a large amount of overlap in answers. &amp;quot;ALSO&amp;quot; and &amp;quot;NIL&amp;quot; were
   answers three times in two weeks.  More suggestive, however, were the
   clues. In all three instances of each repeated answer, the clues were
   the same.  The clue for &amp;quot;ALSO&amp;quot; was always, &amp;quot;Part of a.k.a.,&amp;quot; while the
   clue for &amp;quot;NIL&amp;quot; was &amp;quot;Zilch.&amp;quot; Capitalization and punctuation, even for the
   uncapitalized &amp;quot;a.k.a.&amp;quot;, were consistent. There was some variation: I
   found a few answers with different clues on different days. Even so, the
   high degree of consistency was undeniable.
&lt;/p&gt;
&lt;p&gt;Unassisted by a computer, no human editor would use the same clue for
   puzzles two days in a row. Frequent reuse of clues makes puzzles too
   easy for regular players and slight variation in clues is easy for a
   human puzzle editor to do. But even if the puzzles had been written in a
   different order than they were run in the paper, it is unlikely that a
   puzzle maker would repeatedly have come up with the same clues.  That
   capitalization, phrasing, and style would all match to produce identical
   clue text is even more improbable. Humans simply aren&apos;t that consistent.
   Computers are. Through the reuse of the clues, a computerized
   provenance is revealed.
&lt;/p&gt;
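&lt;p&gt;The check I did by hand can be sketched as a count of answer and clue
   pairs that recur verbatim. The sample data is hypothetical: only the
   &amp;quot;ALSO&amp;quot; and &amp;quot;NIL&amp;quot; clues come from the puzzles described above, and the
   other rows are made up for illustration.
&lt;/p&gt;

```python
from collections import Counter

# Hypothetical sample of (day, answer, clue) rows. Only the ALSO and NIL
# clues come from the post; the ERA rows are invented for contrast.
entries = [
    (1, "ALSO", "Part of a.k.a."),
    (1, "NIL", "Zilch"),
    (2, "ALSO", "Part of a.k.a."),
    (2, "ERA", "Historical period"),
    (3, "NIL", "Zilch"),
    (3, "ERA", "Long time"),
]

def repeated_clues(rows):
    """Return (answer, clue) pairs that appear verbatim more than once."""
    counts = Counter((answer, clue) for _, answer, clue in rows)
    return sorted(pair for pair, n in counts.items() if n != 1)

for answer, clue in repeated_clues(entries):
    print(answer, "/", clue)
```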
&lt;p&gt;Perhaps a little ignorantly, I&apos;d always assumed that crosswords were human
   generated. In fact, computer generated crosswords are widespread. There
   have been &lt;a href=&quot;http://portal.acm.org/citation.cfm?id=905884&quot;&gt;published&lt;/a&gt; &lt;a href=&quot;http://locus.siam.org/fulltext/.ICOMP/volume-05/0205004.pdf&quot;&gt;papers&lt;/a&gt; on computer generation of
   crosswords since the 1970s and a &lt;a href=&quot;http://www.crosswordtournament.com/articles/nyt081996.htm&quot;&gt;New York Times article&lt;/a&gt; on the
   subject was published in 1996 when the practice was beginning to take
   off.  Computers are able to generate puzzles quickly and in quantity
   and, as a result, are in common use in magazines and on websites.
&lt;/p&gt;
&lt;p&gt;There&apos;s resistance, however, from both human crossword editors and from solvers
   who find computer generated puzzles unsatisfying.  Great crossword puzzles,
   they argue, showcase wit and creativity with language; answers are often tied
   together by themes and wordplay.  Computers excel at taking a database of
   answers and creating grids that match up correctly; they are much faster and
   more accurate than humans.  But as the error that revealed the computer to my
   friend Seth illustrates, computers are less adept at varying when or how they
   employ answers and clues in puzzles.
&lt;/p&gt;
&lt;p&gt;Quoted in &lt;a href=&quot;http://www.tulsaworld.com/entertainment/article.aspx?articleID=071029_8_D1_hCros43382&quot;&gt;an article&lt;/a&gt; in &lt;em&gt;Tulsa World&lt;/em&gt;, Mark Lagasse, senior
   executive editor with &lt;em&gt;Dell Magazines&lt;/em&gt;, justified his magazine&apos;s choice to fund
   the more laborious human methods of crossword production saying, &amp;quot;with themes
   and the better, larger puzzles, it&apos;s best to have a constructor working them
   out and filling in the diagrams. A lot of the words are a bit more dry and
   boring when done with computers.&amp;quot; Ultimately, he concludes, computer-generated
   puzzles simply are not as entertaining as those made by humans.
&lt;/p&gt;
&lt;p&gt;I did the crossword puzzles in both the &lt;em&gt;Now&lt;/em&gt; and the &lt;em&gt;Metro&lt;/em&gt; for a
   couple weeks and I agree with Lagasse. The human generated puzzles are
   less repetitive, more interesting, and ultimately more satisfying. The
   computer generated puzzles almost never use word play and have no
   thematic connections between answers or clues.  Of course, I had done both
   &lt;em&gt;Metro&lt;/em&gt; and &lt;em&gt;Now&lt;/em&gt; puzzles in the past and had always preferred the &lt;em&gt;Now&lt;/em&gt;
   puzzles and found them more fun. But I would have been hard-pressed to
   justify my feelings.  It was not until Seth pointed out the repeated
   clues, an error, that I was able to understand why I felt the way I did.
&lt;/p&gt;

</content>
</entry>

<entry>
<title type="html">Only Yesterday</title>
<category term="" />
<id>http://revealingerrors.com/2007/10/25/only_yesterday</id>
<updated>2007-10-25T14:21:50Z</updated>
<published>2007-10-25T14:21:50Z</published>
<link rel="alternate" type="text/html" href="http://revealingerrors.com/only_yesterday" />
<content type="html">
&lt;p&gt;I only recently stumbled across &lt;a href=&quot;http://www.xcom2002.com/doh/index.php?s=06090112oth&quot;&gt;this old revealing error&lt;/a&gt; in the
   wonderful &lt;a href=&quot;http://www.xcom2002.com/doh/&quot;&gt;Doh, The Humanity!&lt;/a&gt; weblog:
&lt;/p&gt;
&lt;p&gt; &lt;img src=&quot;images/only_yesterday-cropped.png&quot; alt=&quot;It may seem like only yesterday
(Wednesday, 26 July) when...&quot; border=&quot;1&quot;/&gt; 
&lt;/p&gt;
&lt;p&gt;In the days of newspapers and broadcast media, someone was likely to
   read a news article only on the day it was published. If the
   publication were weekly or monthly, it would be reasonable to expect
   that readers got to it within the week or month. While libraries and others
   might keep archived versions, it was always clear to readers of archived
   material that their material -- and any relative dates mentioned therein
   -- were out of date.
&lt;/p&gt;
&lt;p&gt;Even today, news is still written primarily to be consumed immediately and
   the vast majority of readers of an article will read it while it is
   fresh. But websites have made archived material live on for months and
   years. While this is generally a good thing, it creates all sorts of
   problems for people who use relative dates in articles. The point of
   reference -- today -- becomes unstable. As a result, if an entertainment
   reporter describes a show as happening, &amp;quot;next Tuesday,&amp;quot; it might appear
   to refer to any number of incorrect Tuesdays depending on when someone
   has stumbled across the archived version.
&lt;/p&gt;
&lt;p&gt;News companies have responded by converting relative dates into absolute
   ones. No doubt, this was once done by editors but today it is also done by
   computer programs. These programs parse each news story looking for
   relative dates. When they find one, they compute the corresponding
   absolute date from the relative one, and add it into the text of the
   article in a parenthetical aside.
&lt;/p&gt;
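&lt;p&gt;A minimal sketch of such a program, written in Python, might look like
   the following. It is a guess at the general approach rather than any news
   site&apos;s actual code: the function name absolutify, the short word list, and
   the publication date used in the example are all assumptions.
&lt;/p&gt;

```python
import re
from datetime import date, timedelta

# Offsets in days for a few relative-date words (a deliberately tiny list).
RELATIVE_DAYS = {"yesterday": -1, "today": 0, "tomorrow": 1}

def absolutify(text, published):
    """Append an absolute date in parentheses after each relative date word."""
    def add_date(match):
        word = match.group(0)
        day = published + timedelta(days=RELATIVE_DAYS[word.lower()])
        label = "{}, {} {}".format(day.strftime("%A"), day.day, day.strftime("%B"))
        return "{} ({})".format(word, label)
    pattern = r"\b(?:yesterday|today|tomorrow)\b"
    return re.sub(pattern, add_date, text, flags=re.IGNORECASE)

# The publication date is an assumption chosen to match the screenshot.
print(absolutify("It may seem like only yesterday when...", date(2006, 7, 27)))
```

&lt;p&gt;Because the pattern matches the word and knows nothing about idiom, the
   figurative &amp;quot;yesterday&amp;quot; is annotated just like a literal one.
&lt;/p&gt;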
&lt;p&gt;Most people, including myself, never knew or even imagined that articles
   were being parsed like this until the system screwed up as it did in the
   screenshot above. No human editor would have thought to provide an
   absolute date for &amp;quot;yesterday&amp;quot; in the phrase, &amp;quot;it may seem like only
   yesterday.&amp;quot; With this misstep, the script at work is revealed. With the
   mistake, the program&apos;s previous work -- hopefully more accurate and
   less noticeable in old articles -- becomes visible as well.  Since
   seeing this image, I&apos;ve noticed these date absolutefiers at work
   everywhere.
&lt;/p&gt;

</content>
</entry>

<entry>
<title type="html">Technology-Assisted Deception</title>
<category term="" />
<id>http://revealingerrors.com/2007/10/18/technology_aided_deception</id>
<updated>2007-10-18T14:22:44Z</updated>
<published>2007-10-18T14:22:44Z</published>
<link rel="alternate" type="text/html" href="http://revealingerrors.com/technology_aided_deception" />
<content type="html">
&lt;p&gt;When I told a friend about this project, she (let&apos;s call her Alice) told me a story. 
&lt;/p&gt;
&lt;p&gt;Alice lives in a major city with fairly good cell coverage, except for a long tunnel that goes underground. Alice often gets calls from her mom. Alice likes her mom, of course, no problems there, but, as we all do, finds long phone calls with her a bit trying. So often when she&apos;s on the phone with her mom, Alice will say &amp;quot;Oops, sorry mom, got to go into the tunnel now!&amp;quot; Even when she&apos;s not going into the tunnel.
&lt;/p&gt;
&lt;p&gt;This is a lie that new technology makes possible. But it only makes it possible because there are known failures in the technology. If we only had old landline phones, you&apos;d never be able to claim such a thing -- landline phones don&apos;t randomly stop working. And if we had perfect cell coverage, it similarly wouldn&apos;t make sense. But since the technology only half-works, the unfinished space can be used for very human forms of deception.
&lt;/p&gt;

</content>
</entry>

<entry>
<title type="html">Welcome to Revealing Errors</title>
<category term="" />
<id>http://revealingerrors.com/2007/10/11/about</id>
<updated>2007-10-11T14:22:59Z</updated>
<published>2007-10-11T14:22:59Z</published>
<link rel="alternate" type="text/html" href="http://revealingerrors.com/about" />
<content type="html">
&lt;p&gt;Welcome to the &lt;em&gt;Revealing Errors&lt;/em&gt; weblog. Our goal is to reveal errors
   that reveal the technology around us to learn how technology affects our
   lives.
&lt;/p&gt;

&lt;h1&gt; Introduction&lt;/h1&gt;
&lt;div style=&quot;margin-left: 3em; font-size: 0.8em; font-style: italic;&quot;&gt;

&lt;p&gt;This introduction is adapted from the introduction to the &lt;a
href=&quot;http://journal.media-culture.org.au/0710/01-hill.php&quot;&gt;Revealing
   Errors article&lt;/a&gt; published as the featured article in the &lt;a
href=&quot;http://journal.media-culture.org.au/0710/&quot;&gt;Error issue (10.5)&lt;/a&gt;
   of the peer-reviewed &lt;a href=&quot;http://www.media-culture.org.au/&quot;&gt;
   Media/Culture Journal&lt;/a&gt; from the University of Melbourne.  Please read
   &lt;a href=&quot;http://journal.media-culture.org.au/0710/01-hill.php&quot;&gt;the
   article&lt;/a&gt; for a more in-depth treatment of the subject.
&lt;/p&gt;
&lt;p&gt; &lt;/div&gt; 
&lt;/p&gt;
&lt;p&gt;In &lt;em&gt;The World Is Not a Desktop&lt;/em&gt;, Mark Weiser, the principal scientist and
   manager of the computer science laboratory at Xerox PARC, stated that
   “a good tool is an invisible tool.” Weiser cited eyeglasses as an ideal
   technology because with spectacles, he argued, “you look at the world,
   not the eyeglasses.” Through repetition, and by design, technologies
   blend into our lives. While technologies, and communications
   technologies in particular, have a powerful mediating impact, many of
   the most pervasive effects are taken for granted by most users. When
   technology works smoothly, its nature and effects are invisible. But
   technologies do not always work smoothly. A tiny fracture or a smudge on
   a lens renders glasses quite visible to the wearer.
&lt;/p&gt;
&lt;div style=&quot;width: 480px; font-size: 0.9em; font-style: italic;&quot;&gt;
&lt;a href=&quot;http://commons.wikimedia.org/wiki/Image:Blue_screen_%28Windows_2000%2C_Seoul_Subway%29.jpg&quot;&gt;
&lt;img style=&quot;border: 1px black solid;&quot; src=&quot;/images/blue_screen_seoul_subway.jpg&quot; alt=&quot;Blue Screen of Death&quot; /&gt;
&lt;/a&gt;

&lt;p&gt;The Microsoft Windows “Blue Screen of Death” on a subway in Seoul.
   &lt;/div&gt; 
&lt;/p&gt;
&lt;p&gt;Anyone who has seen a famous “Blue Screen of Death”—the iconic signal of
   a Microsoft Windows crash—on a public screen or terminal knows how
   errors can thrust the technical details of previously invisible systems
   into view. Nobody knows that their ATM runs Windows until the system
   crashes. Of course, the operating system chosen for a sign or bank
   machine has important implications for its users. Windows, or an
   alternative operating system, creates affordances and imposes
   limitations. Faced with a crashed ATM, a consumer might ask herself whether,
   with its rampant viruses and security holes, she should really trust an
   ATM running Windows.
&lt;/p&gt;
&lt;p&gt;Technologies make previously impossible actions possible and many
   actions easier. In the process, they frame and constrain possible
   actions. They mediate. Communication technologies allow users to
   communicate in new ways but constrain communication in the process. In a
   very fundamental way, communication technologies define what their users
   can say, to whom they say it, and how they can say it—and what, to whom,
   and how they cannot.
&lt;/p&gt;
&lt;p&gt;Technology activists, like those at the &lt;a href=&quot;http://www.fsf.org&quot;&gt;Free Software Foundation
(FSF)&lt;/a&gt; and the &lt;a href=&quot;http://www.eff.org&quot;&gt;Electronic Frontier Foundation
(EFF)&lt;/a&gt;, understand the power, importance, and
   limitations of technology and technological mediation. Largely
   constituted by technical members, both organisations, like humanist
   scholars studying technology, have struggled to communicate their
   messages to a less-technical public. Before one can argue for the
   importance of individual control over who owns technology, as both FSF
   and EFF do, an audience must first appreciate the power and effect that
   their technology and its designers have. To understand the power that
   technology has on its users, users must first see the technology in
   question. Most users do not.
&lt;/p&gt;
&lt;p&gt;Errors are under-appreciated and under-utilised in their ability to
   reveal technology around us. By painting a picture of how certain
   technologies facilitate certain mistakes, one can better show how
   technology mediates. By revealing errors, scholars and activists can
   reveal previously invisible technologies and their effects more
   generally. Errors can reveal technology and its power, and can do so in
   ways that users of technologies confront daily and understand
   intimately.
&lt;/p&gt;

&lt;h1&gt; About Us&lt;/h1&gt;
&lt;p&gt;This weblog is maintained by &lt;a href=&quot;http://mako.cc&quot;&gt;Benjamin Mako Hill&lt;/a&gt;, a
   free/open source technology activist, developer and consultant. This
   project is done as part of a Fellowship at the &lt;a href=&quot;http://civic.mit.edu&quot;&gt;MIT Center for Future
Civic Media&lt;/a&gt;.
&lt;/p&gt;
&lt;p&gt;Contributions, whether suggestions, pointers to revealing
   errors, or full articles, are gratefully accepted. Please
   email &lt;a href=&quot;mailto:mako@atdot.cc&quot;&gt;mako@atdot.cc&lt;/a&gt; with any such suggestions.
&lt;/p&gt;

</content>
</entry>
</feed>
