Geek Activity Page: Web Libs

Build a content filter that rewrites the Web – your way, Mad Lib style!

If you are one of the many who feel that the media are unforgivably biased, the Web now has a solution for you. Greasemonkey, an add-on for the open-source Firefox browser, can act as a programmable content filter, sanitizing or scandalizing the news before you see it. For fun, we wrote a simple script (detailed below) that lets Greasemonkey rewrite the news ungrammatically, or render it politically incorrect or even offensive. No matter where you stand on the political spectrum, you’ll see that Greasemonkey and related technologies are destroying one of the last one-way streets in the media world. While the Internet may be interactive, many of the most trusted and reputable websites still treat readers as passive recipients of content. Pages are rendered on the computer screen more or less the way the publishers intended, and your job is to consume, not to participate.

But of course, Web pages are nothing more than large collections of bits, and bits are easy to flip, cut, and splice. Nothing can stop the data that the New York Times or MSNBC sends to your computer from being modified before it is displayed.

It used to be hard to write programs that hacked Web pages in real time. Mozilla Firefox changed that with a plug-in architecture and a series of extensions. One of the best-known Firefox extensions is Adblock, which lets you suppress any website advertisement you choose.

This story is part of our August 2005 Issue

More interesting for the programmer is Greasemonkey, a nifty extension by Aaron Boodman and Jeremy Dunck that lets you write JavaScript programs that can rip apart Web pages on the fly. Greasemonkey hooks JavaScript into the innards of the browser, making it much easier to hack a Web page. This frees you to concentrate on what’s fun – for example, writing a program that inverts a website’s stated intent.
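To give a rough sense of what such a script looks like, here is a minimal sketch of a Greasemonkey-style user script. The comment header follows Greasemonkey's user script metadata convention. The script name, the namespace URL, and the invertStatement helper are hypothetical and are not taken from Doubletake. The sketch also assumes a browser that supports document.createTreeWalker.

```javascript
// ==UserScript==
// @name        Hello Rewriter (hypothetical example)
// @namespace   http://example.com/userscripts
// @include     *
// ==/UserScript==

// Hypothetical helper: swap a couple of upbeat words for their opposites.
function invertStatement(text) {
  return text.replace(/\bgood\b/g, "bad").replace(/\bwin\b/g, "loss");
}

// Apply the helper to every text node under <body>. The guard lets the
// file load outside a browser, where `document` does not exist.
if (typeof document !== "undefined" && document.body) {
  var walker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT);
  var node;
  while ((node = walker.nextNode())) {
    node.nodeValue = invertStatement(node.nodeValue);
  }
}
```

Because the walker visits every text node, this sketch would also touch the contents of inline script and style elements. A more careful script would skip those.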

That’s what we did with Doubletake, a wacky script that subverts a page’s original HTML with a list of specified substitutions. It’s like Mad Libs for the Web: Web Libs.

If you download Firefox, install Greasemonkey, and activate Doubletake, every Web page you view will be carefully rewritten using words of your own choosing. If a particular politician seems a bit mentally challenged, you can replace his name with “Village Idiot.” Or whatever.

Doubletake is engineered to take advantage of built-in JavaScript functions such as the string replace method, which can act on the HTML that the document object holds for a Web page. Calling replace once for each word on the list will rewrite the document, but this approach is sluggish: every call scans the whole page, so the time required is proportional to the size of the document multiplied by the length of the list of words to be replaced.
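The one-pass-per-word approach described above might look something like the following sketch. It is not Doubletake's actual code: the substitution list, the escapeRegExp helper, and the rewriteNaive function are illustrative assumptions, and the sketch works on a plain string rather than a live page.

```javascript
// Hypothetical substitution list; these pairs are for illustration only.
var substitutions = [
  ["senator", "Village Idiot"],
  ["reported", "alleged"]
];

// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Naive rewrite: one full scan of the text for every word on the list,
// so the cost grows with (text length) x (list length).
function rewriteNaive(text, list) {
  for (var i = 0; i < list.length; i++) {
    var pattern = new RegExp("\\b" + escapeRegExp(list[i][0]) + "\\b", "g");
    var replacement = list[i][1];
    // A function replacement avoids special handling of "$" in the new text.
    text = text.replace(pattern, function () { return replacement; });
  }
  return text;
}
```

Calling rewriteNaive("The senator reported it.", substitutions) scans the sentence twice, once per listed word. With a list of several hundred words, a long page would be scanned several hundred times.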

To create a snappier version, we used JavaScript’s built-in hash tables to store the list of words to be replaced. We preprocessed this list and built a table called matchTable, then broke the document apart and replaced every word appearing in the table.

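if (typeof matchTable[word] != "undefined") {
  // word is listed: emit matchTable[word] in its place
} else {
  // word is not listed: emit it unchanged
}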

However long the list of words to be replaced, a lookup in matchTable finds each match in a roughly constant amount of time, so the time required is proportional only to the size of the document.
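A minimal sketch of the table-driven approach follows. Only the name matchTable comes from the article. The substitution list, the buildMatchTable and rewriteWithTable functions, the lowercase matching, and the choice to split on runs of word characters are all assumptions made for illustration. The sketch also uses Object.create, which is newer than the JavaScript available when this article was written.

```javascript
// Hypothetical substitution list, keyed by the word to replace.
var substitutions = {
  "senator": "Village Idiot",
  "reported": "alleged"
};

// Preprocess the list into a lookup table. Object.create(null) avoids
// accidental hits on inherited properties such as "constructor".
function buildMatchTable(list) {
  var table = Object.create(null);
  for (var word in list) {
    if (Object.prototype.hasOwnProperty.call(list, word)) {
      table[word.toLowerCase()] = list[word];
    }
  }
  return table;
}

// Break the text into words and the gaps between them, then look each
// piece up once. The capturing group keeps the gaps in the result.
function rewriteWithTable(text, matchTable) {
  var pieces = text.split(/(\w+)/);
  for (var i = 0; i < pieces.length; i++) {
    var key = pieces[i].toLowerCase();
    if (typeof matchTable[key] != "undefined") {
      pieces[i] = matchTable[key];
    }
  }
  return pieces.join("");
}
```

Here the text is traversed once, and each word costs a single table lookup no matter how many entries the table holds.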

The technologies at work here have more practical applications as well. For example, Greasemonkey scripts can modify the style sheets that control how Web pages are displayed, so your browser could, say, display all text as black type on a white background in a 14-point font – just the thing for the 20 million Americans who have significant vision problems.
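As one possible sketch of the style sheet idea, a user script could append its own rules to the page. The buildReadableCss function, the universal selector, and the use of !important are assumptions chosen for illustration. A real accessibility script would probably need finer-grained rules.

```javascript
// Build a CSS rule for high-legibility text. The "*" selector and the
// !important flags are assumptions meant to override most page styles.
function buildReadableCss(pointSize) {
  return "* { color: black !important; " +
         "background: white !important; " +
         "font-size: " + pointSize + "pt !important; }";
}

// Append the rule to the current page when running in a browser. The
// guard lets the file load outside a browser, where `document` is absent.
if (typeof document !== "undefined" && document.head) {
  var style = document.createElement("style");
  style.textContent = buildReadableCss(14);
  document.head.appendChild(style);
}
```

Setting the background shorthand to white also removes background images, which may or may not be what a given reader wants.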

Firefox and Greasemonkey show the inherently democratizing power of open-source software. Giving everyone the ability to rewrite source code is upsetting the balance of power between programmers and users, and between publishers and readers. Of course, website authors who don’t want their artistic integrity eroded can fight back: one of the most common techniques for sabotaging end-user control is to put text inside graphics or multimedia Flash presentations. But these tricks make websites inaccessible for the blind (who rely on text readers) and impossible to navigate using cell phones. The battle for the future of mass communication is just beginning.

Code and instructions at doubletake.ex.com.

Simson Garfinkel is a programmer and researcher in the field of computer security and the author of Database Nation: The Death of Privacy in the 21st Century. Peter Wayner is a programmer and the author of Translucent Databases.
