During the course of a day, the average person who works at a desk deals with torrents of information coming from many sources: e-mails, Web searches, calendars, notes, spreadsheets, documents, and presentations. Sorting through the information is tough, and for the most part, it’s done in an ad hoc manner. But in the next couple of months, there may be a better way. Radar Networks, based in San Francisco, is releasing a free Web-based tool, called Twine, that it hopes will change the way people organize their information.
Twine is a website where people can dump information that’s important to them, from strings of e-mails to YouTube videos. Or, if a user prefers, Twine can automatically collect all the Web pages she visited, e-mails she sent and received, and so on. Once Twine has some information, it starts to analyze it and automatically sort it into categories that include the people involved, concepts discussed, and places, organizations, and companies. This way, when a user is searching for something, she can have quick access to related information about it. Twine also uses elements of social networking so that a user has access to information collected by others in her network. All this creates a sort of “collective intelligence,” says Nova Spivack, CEO and founder of Radar Networks.
Spivack says that Twine leverages decades’ worth of work done in esoteric research fields such as machine learning and natural-language processing. “Twine helps you become smarter, more productive, and collaborate, share, and organize in a smarter way,” he says.
The idea underlying Twine’s technologies is known as the Semantic Web, a concept long discussed in research circles. It can be described as a sort of smart network of information in which data is tagged, sorted, and searchable. Spivack says that his company’s tool is “one of the first mainstream applications of the Semantic Web.”
To be sure, Twine is not the first Semantic Web product or tool. For years, companies have used database software that automatically puts information in certain categories and searches for it accordingly, with varying degrees of accuracy. Even today’s simple blogging tools have elements of the Semantic Web: people add tags to their posts, thereby creating useful metadata that can be searched. In addition, del.icio.us, the online bookmarking site where people add tags to links of saved Web pages, is an example of giving structure to previously unstructured data.
Thus, a hard-and-fast definition of the Semantic Web can be elusive, says Clay Shirky, professor in the Interactive Telecommunications Program at New York University. “There’s a range you’re playing in,” he says. At its most basic, says Shirky, the Semantic Web is a campaign to tag information with extra metadata that makes it easier to search. At the upper limit, he says, it is about “waiting for machines to become devastatingly intelligent.”
According to Spivack, Twine can be called a Semantic Web application because the software was written with Semantic Web standards, established by the World Wide Web Consortium (W3C), in mind. This means that its design follows certain conventions, and because of this, Twine is compatible with other Semantic Web applications, and its information can be shared across applications.
In addition to employing the Semantic Web standards, Twine uses machine-learning and natural-language-processing algorithms that are meant to give it capabilities beyond those of tools that rely on manual tagging. The tool uses a combination of natural-language algorithms to extract key concepts from collections of text, in effect tagging them automatically. According to Spivack, these algorithms handle ambiguous sets of words, determining from the context, for example, whether J.P. Morgan is a person or a company. And Twine can find the subject of a text even if a keyword is never mentioned, he says, by using statistical machine learning to compare the text with data sources such as Wikipedia. “We can determine when a document is about a subject even if the subject isn’t mentioned in the document,” Spivack says. “So we can add new paths and new ways to get to the document” during a search.
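To make the idea of finding a subject that is never named more concrete, here is a minimal Python sketch. It is not Twine's method, which the article does not describe in detail. It assumes a tiny hand-written reference table (a stand-in for a source such as Wikipedia) and scores a document against each entry with simple bag-of-words cosine similarity. The names `REFERENCE`, `bag_of_words`, `cosine`, and `guess_subject`, along with all the sample text, are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical reference descriptions, standing in for a large source
# such as Wikipedia. Note that neither description contains its own label.
REFERENCE = {
    "American football": "quarterback touchdown linebacker yards punt huddle",
    "Banking": "loan interest deposit credit mortgage branch",
}

def bag_of_words(text):
    """Count lowercase alphabetic words in the text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def guess_subject(text):
    """Return the best-matching reference subject, or None if nothing overlaps."""
    doc = bag_of_words(text)
    scores = {subject: cosine(doc, bag_of_words(ref)) for subject, ref in REFERENCE.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

# The word "football" never appears in this document.
document = "The quarterback threw a late touchdown after a long huddle."
subject = guess_subject(document)
print(subject)
```

A real system would presumably rely on far larger reference data and a trained statistical model. The sketch only shows why overlapping vocabulary can point to a subject the document never names.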
Another technique that Twine uses is graph analysis. This idea, explains Spivack, is similar to the thinking behind the “social graph” that Mark Zuckerberg, the founder of Facebook, extols: connections between people exist in the real world, and online social-networking tools simply collect those connections and make them visible. In the same way, Spivack says, Twine helps make the connections between people and their information more accessible. When data is tagged, it essentially becomes a node in a network. The connections that each node has to other nodes (which could be other data, people, places, organizations, projects, events, et cetera) depend on its tags and the statistical relevance they have to the tags of other nodes. This is how Twine determines relevance when a person searches through her information: the farther a node is in the network from what the user is searching for, the less relevant it is to the search.
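The node-and-distance idea can be illustrated with a short Python sketch. This is a simplification under stated assumptions, not Twine's actual algorithm: two items are linked whenever they share at least one tag (the article also mentions statistical relevance between tags, which is omitted here), and relevance is approximated by the number of hops found with a breadth-first search. The item names, tags, and the functions `build_graph` and `rank_by_distance` are invented for illustration.

```python
from collections import deque

# Hypothetical items and their tags, invented for illustration only.
ITEMS = {
    "email_from_alice": {"alice", "budget"},
    "budget_spreadsheet": {"budget", "q3"},
    "q3_presentation": {"q3", "sales"},
    "vacation_photos": {"hawaii"},
}

def build_graph(items):
    """Link two items whenever they share at least one tag."""
    graph = {name: set() for name in items}
    names = list(items)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if items[a] & items[b]:
                graph[a].add(b)
                graph[b].add(a)
    return graph

def rank_by_distance(graph, start):
    """Breadth-first search: list reachable nodes, nearest (most relevant) first."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in sorted(graph[node]):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    others = [(name, hops) for name, hops in distance.items() if name != start]
    return sorted(others, key=lambda pair: (pair[1], pair[0]))

graph = build_graph(ITEMS)
ranking = rank_by_distance(graph, "email_from_alice")
for name, hops in ranking:
    print(name, hops)
```

In this toy data, the spreadsheet sits one hop from the e-mail and the presentation two hops away, while the unrelated photos share no tags with anything and never appear in the ranking.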
It’s still too early to know if Twine will be successful with consumers, says Tony Shaw, president of Semantic Universe, an organization committed to raising awareness of semantic technologies in business and consumer settings. Success will depend not simply on making the technology work, but also on managing people’s expectations of the technology, he says. “It’s about fighting the hype problem.”
Twine will open up to invited users starting today. In the next couple of months, says Spivack, the tool will accept more users, and by the summer of 2008, it should be completely open. In addition, Twine will have an open platform that allows software developers to build tools on top of it, such as visualization software so that users can see their information in different ways. “But first, we’re starting with the basics,” Spivack says.