Geek Activity Page: Web Libs

Build a content filter that rewrites the Web – your way, Mad Lib style!

If you are one of the many who feel that the media are unforgivably biased, the Web now has a solution for you. Greasemonkey, an add-on for the open-source Firefox browser, can act as a programmable content filter, sanitizing or scandalizing the news before you see it. For fun, we wrote a simple script (detailed below) that lets Greasemonkey rewrite the news ungrammatically, or render it politically incorrect or even offensive. No matter where you stand on the political spectrum, you’ll see that Greasemonkey and related technologies are destroying one of the last one-way streets in the media world. While the Internet may be interactive, many of the most trusted and reputable websites still treat readers as passive recipients of content. Pages are rendered on the computer screen more or less the way the publishers intended, and your job is to consume, not to participate.

But of course, Web pages are nothing more than large collections of bits, and bits are easy to flip, cut, and splice. Nothing can stop the data that the New York Times or MSNBC sends to your computer from being modified before it is displayed.

It used to be hard to write programs that hacked Web pages in real time. Mozilla Firefox changed that with a plug-in architecture and a series of extensions. One of the best-known Firefox extensions is Adblock, which lets you suppress any website advertisement you choose.

This story is part of our August 2005 Issue

More interesting for the programmer is Greasemonkey, a nifty extension by Aaron Boodman and Jeremy Dunck that lets you write JavaScript programs that can rip apart Web pages on the fly. Greasemonkey hooks JavaScript into the innards of the browser, making it much easier to hack a Web page. This frees you to concentrate on what’s fun – for example, writing a program that inverts a website’s stated intent.

That’s what we did with Doubletake, a wacky script that subverts a page’s original HTML with a list of specified substitutions. It’s like Mad Libs for the Web: Web Libs.

If you download Firefox, install Greasemonkey, and activate Doubletake, every Web page you view will be carefully rewritten using words of your own choosing. If a particular politician seems a bit mentally challenged, you can replace his name with “Village Idiot.” Or whatever.

Doubletake is engineered to take advantage of built-in JavaScript functions such as the string replace method, which can be applied to the HTML text held in the document object for a Web page. Calling replace once for each word on the list will rewrite the document, but this approach is sluggish: every call scans the whole page again, so the time required is proportional to the size of the document multiplied by the length of the list of words to be replaced.
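To make the cost concrete, here is a minimal sketch of that slow approach. It is an illustration, not Doubletake's actual code: the substitutions list and the names naiveRewrite and escapeRegExp are made up for this example, and it works on a plain string rather than a live page.

```javascript
// Hypothetical substitution list; not the list shipped with Doubletake.
var substitutions = [
  ["senator", "village idiot"],
  ["budget", "wish list"]
];

// Escape characters that have special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Naive approach: one full pass over the text for every entry in the list,
// so the work grows with (text length) x (list length).
function naiveRewrite(text, list) {
  for (var i = 0; i < list.length; i++) {
    // \b limits the match to whole words; "gi" means all matches, any case.
    var pattern = new RegExp("\\b" + escapeRegExp(list[i][0]) + "\\b", "gi");
    text = text.replace(pattern, list[i][1]);
  }
  return text;
}

console.log(naiveRewrite("The senator cut the budget.", substitutions));
```

Because each pass runs over the output of the previous one, a replacement can itself be rewritten by a later entry in the list, which is another reason to prefer a single pass.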

To create a snappier version, we used JavaScript’s built-in hash tables to store the list of words to be replaced. We preprocessed this list and built a table called matchTable, then broke the document apart and replaced every word appearing in the table.

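if (typeof matchTable[word] != "undefined") {
  // word is in the table: substitute matchTable[word]
} else {
  // word is not in the table: leave it unchanged
}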

However long the list of words to be replaced, a lookup in matchTable finds each match in a roughly constant amount of time, so the time required is proportional only to the size of the document.
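The table-lookup idea can be sketched in a few lines. This is an assumption-laden illustration rather than the Doubletake source: only the name matchTable comes from the article, while buildMatchTable, rewriteText, and the sample word pairs are hypothetical. It handles single words only and operates on a plain string; a real script would apply it to the text of the page.

```javascript
// Build the lookup table once. Keys are lowercased so matching ignores case.
function buildMatchTable(pairs) {
  // Object.create(null) gives a table with no inherited keys, so ordinary
  // words such as "constructor" are not mistaken for entries.
  var table = Object.create(null);
  for (var i = 0; i < pairs.length; i++) {
    table[pairs[i][0].toLowerCase()] = pairs[i][1];
  }
  return table;
}

// Single pass: split the text into words and separators, then look up each piece.
function rewriteText(text, matchTable) {
  // The capturing group keeps the separators in the result of split().
  var pieces = text.split(/(\W+)/);
  for (var i = 0; i < pieces.length; i++) {
    var word = pieces[i].toLowerCase();
    if (typeof matchTable[word] != "undefined") {
      pieces[i] = matchTable[word];
    }
  }
  return pieces.join("");
}

// Hypothetical word pairs, for illustration only.
var matchTable = buildMatchTable([
  ["senator", "village idiot"],
  ["budget", "wish list"]
]);

console.log(rewriteText("The senator cut the budget.", matchTable));
```

Each piece of the document costs one table lookup no matter how many entries the table holds, which is where the speedup comes from.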

The technologies at work here have more-practical applications as well. For example, Greasemonkey scripts can modify the style sheets that control how Web pages are displayed, so your browser could, say, display all text as black type on a white background in 14-point font size – just the thing for the 20 million Americans who have significant vision problems.
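A style-sheet override of the kind described above might look something like the following. Treat it as a sketch under stated assumptions: the script name and the function names highContrastCss and applyHighContrast are hypothetical, the rule uses only standard CSS and DOM calls rather than any Greasemonkey-specific helper, and a blanket rule like this will not suit every page.

```javascript
// ==UserScript==
// @name     High Contrast (hypothetical example)
// @include  *
// ==/UserScript==

// Build one CSS rule that forces black text on a white background
// at the given point size. !important lets it override the page's own rules.
function highContrastCss(pointSize) {
  return "* { color: black !important; " +
         "background: white !important; " +
         "font-size: " + pointSize + "pt !important; }";
}

// Append the rule to a document as a new <style> element.
function applyHighContrast(doc, pointSize) {
  var style = doc.createElement("style");
  style.textContent = highContrastCss(pointSize);
  (doc.head || doc.documentElement).appendChild(style);
  return style;
}

// Only touch the page when a DOM is actually available (that is, in a browser).
if (typeof document !== "undefined") {
  applyHighContrast(document, 14);
}
```

Keeping the rule text in its own function makes the size easy to change for readers who need something larger than 14 points.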

Firefox and Greasemonkey show the inherently democratizing power of open-source software. Giving everyone the ability to rewrite source code is upsetting the balance of power between programmers and users, and between publishers and readers. Of course, website authors who don’t want their artistic integrity eroded can fight back: one of the most common techniques for sabotaging end-user control is to put text inside graphics or multimedia Flash presentations. But these tricks make websites inaccessible for the blind (who rely on text readers) and impossible to navigate using cell phones. The battle for the future of mass communication is just beginning.

Code and instructions at

Simson Garfinkel is a programmer and researcher in the field of computer security and the author of Database Nation: The Death of Privacy in the 21st Century. Peter Wayner is a programmer and the author of Translucent Databases.
