Cross Site Descripting

Blogger Jordan Wiens recently noticed a funny thing about the Apple website. When one tries to search for “applescript” (Apple’s scripting and automation product) on Apple’s website, they end up with this search result:

Applescript search results from Apple.com

Until the issue is fixed, you can see for yourself by navigating to http://www.apple.com/search/?q=applescript.

On the search result page, the Apple search software seems to change the term “applescript” into “apple.” A search for the term “apple” on the Apple website is, as one might imagine, not a particularly useful way to find information about Applescript. To most users, this error is confounding. To a trained eye, it reveals an overzealous security system attempting to prevent what’s called cross-site scripting or XSS — a way that spammers, phishers, and nefarious system-crackers can sneakily work around privacy and security systems in web browsers by exploiting two features of modern web browsers.

First, through the use of a programming language called Javascript, many web pages run small computer programs inside users’ browsers. These Javascript programs allow for applications that are more responsive than would have been possible before (think Google Maps for a good example). Running random programs is risky, of course. To protect users and their privacy, web browsers limit Javascript programs in several ways. One common technique is to limit access granted to a Javascript program from a given website to information from the site the Javascript originated at. This security system is designed to bar one website’s programs from accessing and relaying sensitive information, like login information or credit card numbers, from another website.
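The restriction described above is commonly called the same-origin policy. As a rough sketch only, and assuming that an "origin" is the combination of scheme, host, and port, the comparison might look like the following. The helper names `origin` and `same_origin` are invented for illustration, and real browsers apply many more rules than this.

```python
from urllib.parse import urlsplit

# Default ports assumed when a URL does not name one explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443}

def origin(url):
    """Return the (scheme, host, port) triple for a URL."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)

def same_origin(a, b):
    """True when both URLs share scheme, host, and port."""
    return origin(a) == origin(b)

# A script loaded from Apple's search page may read other apple.com pages...
print(same_origin("http://www.apple.com/search/?q=applescript",
                  "http://www.apple.com/"))
# ...but not pages from an unrelated (hypothetical) site.
print(same_origin("http://www.apple.com/", "http://evil.example/"))
```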

Second, a large number of applications allow input from users that is subsequently displayed on web pages. This can come in the form of edits and additions to Wikipedia pages, comments on forums, articles, or blogs, or even the fact that when you run a web search, the search terms are displayed back to you at the top of your page.

A security vulnerability, it turns out, lies in the combination of the two features. This vulnerability, XSS, happens when a nefarious user embeds small Javascript programs in input (e.g., a comment) which is run each time a page is subsequently viewed. Masquerading to the browser as a legitimate script created by the website creator, these programs can access sensitive information from the website stored on the user’s computer (e.g., login information) and then send this information to the author of the script without the violated user’s permission or knowledge.

When an attacker executes an XSS attack, they do so by trying to include Javascript in input that will be displayed to the user. This usually comes in the form of:

    <script>some code to send private information</script>

In HTML, the “<script>” and “</script>” tags signify to the web browser that the text between is a program to be run.
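To make the problem concrete, here is a minimal sketch of a search page that echoes the user's query back without any filtering. The function `render_results` and the `sendToAttacker` call inside the query are hypothetical, not taken from any real site.

```python
def render_results(query):
    """Build a results page by pasting the query into the HTML verbatim."""
    return f"<p>Search results for {query}</p>"

# The attacker's "search term" is really a small program.
malicious_query = "<script>sendToAttacker(document.cookie)</script>"
page = render_results(malicious_query)

# The script tags survive intact, so a browser would run the code
# as if the website itself had written it.
print(page)
```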

XSS has become a large problem. To combat and prevent it, web developers take great care to protect their users and their applications from attacks by blocking, removing, or disabling attempts to include programs in user input. One frequently employed method of doing so is to simply remove the “<script>” tags that cause programs to be run. Without the tags, malicious code may remain, but will never be executed on users’ computers.
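A minimal sketch of the tag-removal approach might look like this. It assumes a simple regular expression is enough to find the tags, which is a simplification, and the name `strip_script_tags` is made up for the example.

```python
import re

# Matches opening and closing script tags, with or without attributes.
SCRIPT_TAG = re.compile(r"</?script\b[^>]*>", re.IGNORECASE)

def strip_script_tags(text):
    """Remove script tags while leaving the text between them in place."""
    return SCRIPT_TAG.sub("", text)

comment = "Nice post!<script>steal()</script>"
# The code remains as inert text, but nothing tells the browser to run it.
print(strip_script_tags(comment))
```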

With this knowledge of XSS we can begin to understand the puzzling behavior of Apple’s website. By trying several other searches, we can confirm that Apple’s search engine is, in fact, removing all mentions of the term “script” from input to the site. The system is almost certainly designed to block XSS. While it is likely to succeed in doing so, the side effects, in the case of users searching for Applescript, are extremely inconvenient.
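We cannot see Apple's code, so the following is only a guess at the kind of filter that would produce this behavior: one that deletes the word "script" wherever it appears. For contrast, the sketch also shows one common way of disabling scripts without deleting anything, which is to escape the angle brackets so the browser displays them as text. Both function names are invented for illustration.

```python
import html
import re

def strip_script_word(query):
    """Overzealous filter: delete every occurrence of 'script'."""
    return re.sub("script", "", query, flags=re.IGNORECASE)

def escape_for_display(query):
    """Gentler alternative: escape HTML special characters instead."""
    return html.escape(query)

print(strip_script_word("applescript"))    # the search term is mangled
print(escape_for_display("applescript"))   # the search term is untouched
print(escape_for_display("<script>"))      # the tag is shown, not run
```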

Through the error, Apple reveals their overzealous system designed to prevent XSS. Those who dig deeper to understand the source of this initially baffling behavior can gain new respect for the implicit trust that our browsers give to code on the websites we visit and the ways in which this trust can be abused.

In all likelihood, we have all been the victims of XSS attacks as users, although most of us have been lucky enough to avoid divulging sensitive information in the process. Apple’s error represents “collateral damage” in a war fought between crackers, spammers, and phishers on one side and web application developers on the other. While we are rarely aware of it, this battle affects the way our web applications are designed and the features they do, and do not, include. We are, indirectly, affected by XSS even when we’re not looking for information on Applescript. By revealing one anti-XSS security system, Apple’s misstep points to that fact.

Thunderbird and the Nature of Spam

I found this beautiful and simple example of a revealing error featured in the fantastic (and very funny) Error’d series on Worse Than Failure:

Thunderbird showing its own welcome message as spam.

My guess is that before most users start the Mozilla Thunderbird email client for the first time, they don’t know that the software has a spam detection feature. That said, when the welcome message that automatically shows up in the inbox of every new Thunderbird user is prefixed by a notice that the message in question might be “junk,” users’ ignorance on the matter will quickly be put to rest!

Of course, much more than the simple existence of the spam-flagging system is revealed by this error. With a little reflection, we can infer some of the criteria that Thunderbird must be using to sort spam or junk from legitimate email. Most mail systems, including Thunderbird, combine a variety of methods to estimate the likelihood that a message is spam. Thunderbird’s welcome message is not addressed directly to the user in question and it makes extensive use of rich-text HTML and images, both common characteristics of spam.

Central to most modern spam-checkers is a statistical analysis of words used in the content of the email. Since spammers are trying to communicate a message, a prevalence of certain words and an absence of others is usually sufficient to sort out the junk. Sure enough, the Thunderbird welcome message is written using rather impersonal and marketing-speak terms that would be less likely in personal email (e.g., offering “product information”).
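To illustrate the idea of word statistics, here is a toy sketch of a naive Bayes-style score. It is not Thunderbird's actual filter, and the four training messages are invented purely for the example. A positive score means the words look more like the spam examples than the legitimate ones.

```python
import math
from collections import Counter

def train(messages):
    """Count words per label from (text, is_spam) pairs."""
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for text, is_spam in messages:
        counts[is_spam].update(text.lower().split())
        totals[is_spam] += 1
    return counts, totals

def spam_log_odds(text, counts, totals):
    """Sum per-word log ratios, with add-one smoothing for unseen words."""
    vocab = set(counts[True]) | set(counts[False])
    n_spam = sum(counts[True].values())
    n_ham = sum(counts[False].values())
    score = math.log(totals[True] / totals[False])
    for word in text.lower().split():
        p_spam = (counts[True][word] + 1) / (n_spam + len(vocab))
        p_ham = (counts[False][word] + 1) / (n_ham + len(vocab))
        score += math.log(p_spam / p_ham)
    return score

# Invented training data, for illustration only.
examples = [
    ("free product information offer", True),
    ("special offer on product upgrades", True),
    ("see you at lunch tomorrow", False),
    ("thanks for the notes from class", False),
]
counts, totals = train(examples)

print(spam_log_odds("product information and support", counts, totals) > 0)
print(spam_log_odds("lunch tomorrow with the class", counts, totals) < 0)
```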

From the perspective of the Thunderbird developers, the flagging of this message as spam seems to be in error. From the perspective of the user though, it is not quite as clear. The Thunderbird message is both unsolicited and commercial in nature — essentially the definition of spam. In the “looks like a duck” sense, it uses words that make it “read” like spam.

While this simple error can teach Thunderbird users about the existence and the nature of their spam-checker, it might also teach the folks responsible for the Thunderbird welcome message something about the way their messages might seem to their users.

Identity Crisis

This error was revealed and written up by Karl Fogel.

Yesterday I received email from a hotel, confirming a reservation for a room. But it wasn’t meant for me; it was meant for “Kathy Fogel” (whom I’ve never met), and was sent to “k.fogel@gmail.com”.

Now, I do have the account “kfogel@gmail.com”, but I’d never received email for “k.fogel” before. As I’d always thought “.” was a significant character in email addresses, I didn’t see how I could have gotten this mail. It turns out, though, that Google ignores “.” when it’s in the username portion of a gmail address. My friend Brian Fitzpatrick knew this already, and pointed me to Google’s explanation. (I learned later that others have been surprised by this behavior too.)

So the error revealed a feature — at least, I’m fairly sure Google would consider it a feature, although the exact motivation for it is still not clear to me. It might be a technical requirement caused by merging several legacy user databases, or it might simply be to prevent confusion among addresses that only differ by dots.
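The dot-insensitive matching can be sketched in a few lines. This is only an illustration of the behavior described above, not Google's implementation, and the function name `canonical_gmail` is made up.

```python
def canonical_gmail(address):
    """Drop dots from the username of a gmail.com address."""
    local, _, domain = address.rpartition("@")
    domain = domain.lower()
    if domain == "gmail.com":
        # Dots in the username are ignored for gmail.com addresses only.
        local = local.replace(".", "")
    return f"{local}@{domain}"

# Both spellings collapse to the same mailbox.
print(canonical_gmail("k.fogel@gmail.com") == canonical_gmail("kfogel@gmail.com"))
```

Addresses at other domains pass through with their dots intact, since the rule described here is specific to Gmail.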

Anyway, I called the hotel, and eventually managed to make them understand that I had no idea who Kathy Fogel was, and that I’d accidentally gotten an email intended for her. They said they’d resend, and of course I said “Wait, no, it’ll just come to me again!” But they swore they had a different email address on file for her, and indeed, I haven’t gotten a second email.

Which raises another question: how did they send the mail to “k.fogel@gmail.com” in the first place? Clearly, Kathy Fogel cannot have that address, because Google will not allow any other “dot variants” of an address to be registered after the first. (Besides, if she did have that address, we’d be getting each other’s mail all the time, and we’re not.) It’s also unlikely that she mistakenly gave them that address herself, since they already had another address in place by the time I called.

A computer wouldn’t substitute domain names in an email address like that. The only thing I can think of is that somehow, humans are, at least in some cases, intimately involved in sending out confirmation emails from DoubleTree hotels. I say “intimately” because this was no mere cut-and-paste mistake. Someone had to transcribe an email address by hand, and accidentally put “gmail.com” where the original said “yahoo.com” or “aol.com” or whatever.

I hope Kathy has a nice trip.