
Wikipedians Promise New Search Engine

But Google is likely to be a far harder target than Encyclopaedia Britannica.
March 16, 2007

It’s been no secret that Jimmy Wales, Wikipedia’s founder, has been thinking it’s time for a new style of Internet search engine. He has made it plain in public remarks, in postings to electronic mailing lists, and elsewhere over the past six months that he sees drawbacks to fully software-driven search engines such as Google and Yahoo. He has also made plain that he thinks the collaborative, decentralized publishing process behind Wikipedia might just be the answer.

Search solution? Jimmy Wales’s Wikia, which hosts hundreds of special-interest wikis, is getting into the search business with a project that will attempt to incorporate user feedback and Wikipedia-style editorial collaboration into the selection and ranking of search results.

Wales made his intentions official at a March 8 news conference in Tokyo, where he said that his new for-profit company, Wikia, would lead a project to launch a community-driven, open-source search site by the end of 2007. While the technical workings of Wikia Search are still being debated, it’s clear that the project will combine fully robotic Web exploration, or “spidering,” with Web-based tools that are in the hands of humans–both volunteer editors, who will organize and highlight the best content, and average users, who will vote on the usefulness of each search result, thereby influencing how high these results rank in future searches.

Wikia, based in San Mateo, CA, will host the search site and collect the advertising revenue it generates, but Wales believes that the site can be designed and built mainly by volunteers, through open-source collaboration of the type that gave rise to the Linux operating system. More than 700 volunteer developers are “already hacking away” at the problem using test servers donated to Wikia by supporters, according to Wikia CEO Gil Penchina.

The science of search is still an arcane one. It takes a deep understanding of file systems, index architectures, hard disk performance, networked storage, and fast query-time ranking–not to mention thousands of servers and an extreme amount of bandwidth–to build and run a major search engine. But today, Penchina and Wales argue, there is enough expertise outside the walls of the major search companies to design a competitive open-source search engine–one that they project could attract millions of users daily and capture as much as 5 percent of the $7 billion market for search-related advertising.

All that’s left is to build it. Not only does the Wikia search engine not exist yet; the company is still gathering technical suggestions at the most basic level (as can be seen by browsing the project’s mailing-list archive).

Wikia Search volunteers–the equivalent of Wikipedia’s core community of contributors and editors–might have jobs similar to those of the category editors at the Open Directory Project, a human-edited Web directory hosted by AOL for which each volunteer is in charge of tracking Web resources on a specific subject. Alternatively, all subjects might be open to all editors, who could use a Wikipedia-like system to rank or annotate search results and track other people’s revisions (and reverse them in cases of vandalism).

End users, meanwhile, might be asked to give a “thumbs up” or “thumbs down” vote about each individual search result, the way they can at the collaborative news aggregator Digg. Or they might be asked to “tag” or add descriptive words to results, helping others find them later, in the style of the social search sites TagWorld and Prefound and the photo-sharing community Flickr.

“We have a lot of things developing in parallel, and some of our projects are actually competing with each other, so we’ll have to see what the outcome is,” says Penchina. That will take patience, but Wikia is in no hurry to finalize a solution, he says. “We have this saying internally: ‘No a priori thinking.’ Communities evolve in their own special way, and anyone who thinks they know where the crowd is going to go generally doesn’t understand crowd psychology.”

Only a few matters are settled, according to Penchina. One is the basic premise: that Wikia Search will combine the strengths of software and people. “Computers are useful for large-scale problem solving like building an index, but machine judgment is usually never as good as human judgment, so you need a blend of the two,” he says.

Another settled matter is that Wikia Search’s developers won’t attempt to reinvent the wheel: the purely algorithmic components of Wikia Search will be built on top of the existing open-source search engines Nutch and Lucene, both initiated by independent software developer Doug Cutting.

Reaction to the news of Wikia’s ambitions is mixed. Some in the technical community say that Internet users deserve a search engine whose workings are open for all to examine, in contrast to the closely guarded ranking algorithms used by Google and its peers.

Others have underscored the huge challenges in going up against the likes of Google, which employs many of the world’s best brains in information-retrieval technology, owns a vast global infrastructure of servers, and dishes up results good enough that more than one in four Internet users make a stop at the search engine every day. “Google and Yahoo and MSN and Ask do a pretty damned good job,” remarked search-industry veteran Stavros Macrakis in a late February post to the Wikia Search mailing list. “It’s not as though the competition was a $2,000 Encyclopaedia Britannica which is always years out of date.”

Penchina acknowledges the scale of the challenges but says Wikia is in the search business for the long haul. “I don’t know that we expect massively impressive results from day one,” he says. “Wikipedia has taken six years to get where it is.”

Wikia Search has a somewhat confusing genealogy. Wikipedia, which melded the idea of an online encyclopedia with the collaborative-editing technology of wikis, has been controlled since 2003 by the nonprofit Wikimedia Foundation, which also operates Wiktionary, Wikinews, Wikiquote, and other collaborative projects.

Wikia, on the other hand, is a for-profit company cofounded in 2004 by Wales and British Internet entrepreneur Angela Beesley under the original name Wikicities. It hosts hundreds of special-interest wikis, including wikis for genealogy buffs. Wikia has no direct connection to Wikipedia; however, several of Wikipedia’s most dedicated contributors are now employees at Wikia, including Beesley.

Wikia raised at least $4 million in venture capital in 2006 from a group including Bessemer Venture Partners, Omidyar Network, Amazon.com, and angel investors.
