But if there is one software company that knows how to hire Young Turks and turn their ideas into market-dominating products, it’s Microsoft. Name any hot corner of computer science, and the company Bill built is likely to employ at least one or two of the field’s leading investigators: after all, the five Microsoft Research labs around the world employ more than 600 researchers. And when Microsoft smells a big market, it usually moves with full force to stake its claim.
There’s nothing blue-sky about Microsoft’s forays into information retrieval, the discipline from which the search engine sprang. The company has already won a 97 percent market share in PC operating systems and a 90 percent share in office software; search is one of the last big pieces of the computing landscape that Microsoft does not dominate. And a survey of R&D projects at the company confirms that it sees enhanced forms of search as key to its business growth. As the release of the next version of Windows, code-named Longhorn, grows nearer (a test version will be ready later this year), researchers and product developers are accelerating efforts to make Web searching an integral part of it.
One of the flashiest pieces of software in the works promises to allow you to enter your questions in simple English and get a direct answer back. The company believes search users shouldn’t have to worry about selecting the right keywords, linking them together with the right Boolean operators (and, or, not, etc.), and scrolling through page after page of search results. Instead, says Microsoft researcher Eric Brill, search engines should understand and answer questions in natural language.
Take Microsoft Research’s AskMSR program, which Brill and his colleagues have been testing on Microsoft’s internal network for more than a year. At its core is a simple search box where users can enter questions such as “Who killed Abraham Lincoln?” and, instead of getting back a list of sites that may have the information they seek, receive a plain answer: “John Wilkes Booth.” The software relies not on any advanced artificial-intelligence algorithm but rather on two surprisingly simple tricks. First, it uses language rules learned from a large database of sample sentences to rewrite the search phrase so that it resembles possible answers: for example, “___ killed Abraham Lincoln” or “Abraham Lincoln was killed by ___.” Those text strings are then used as the queries in a sequence of standard keyword-based Web searches. If the searches produce an exact match, the program is done, and it presents that answer to the user.
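The first trick, rewriting a question so that it resembles its own answer, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article says AskMSR learns its rewrite rules from sample sentences, whereas the single rule below is written by hand, and the names `REWRITE_RULES` and `rewrite_question` are hypothetical.

```python
import re

# Hypothetical hand-written rule; the real system is described as learning
# its rules from a large database of sample sentences.
REWRITE_RULES = [
    # "Who <verb> <object>?" -> "<verb> <object>" and "<object> was <verb> by"
    # (only sensible for verbs whose past tense equals the past participle).
    (re.compile(r"^who (\w+) (.+?)\??$", re.IGNORECASE),
     ['"{verb} {obj}"', '"{obj} was {verb} by"']),
]


def rewrite_question(question):
    """Return quoted phrase queries shaped like possible answers."""
    question = question.strip()
    for pattern, templates in REWRITE_RULES:
        match = pattern.match(question)
        if match:
            verb, obj = match.group(1), match.group(2)
            return [t.format(verb=verb, obj=obj) for t in templates]
    # No rule applies: fall back to a plain keyword query.
    return [" ".join(re.findall(r"\w+", question))]


print(rewrite_question("Who killed Abraham Lincoln?"))
```

Each returned string would then be sent to an ordinary keyword search engine as an exact-phrase query; the text adjacent to the phrase in a matching page is where the answer is expected to sit.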
In many cases, though, the program won’t find an exact match, but only oblique variations on the text strings, such as “John Wilkes Booth’s violent deed at the Ford Theater ended Lincoln’s second term before it had started.” That’s okay, too. As its second trick, AskMSR reasons that if “Booth” frequently appears in the same sentence as “Lincoln,” there must be an important relationship between them, which allows it to posit an answer, even if it’s not 100 percent confident (see “Q: How Does Question Answering Work?” below). “We are tapping into the redundancy of the Web,” explains Brill. “If you have a lot of places where you are somewhat certain that you have found the answer, the redundancy makes it more certain.” As the Web grows, so will its redundancy, making AskMSR ever more powerful, Brill reasons. While plans for AskMSR aren’t definite, Brill believes the code will see the light of day, perhaps as part of a future Microsoft search engine.
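The redundancy idea can be illustrated with a simple voting scheme: collect candidate answers from many search snippets and favor the one that turns up most often. The sketch below is an assumption-laden toy, not AskMSR’s actual method. It treats any run of capitalized words as a candidate, counts co-occurrence per snippet rather than per sentence, and reports each candidate’s share of the votes, which is not the confidence figure the real system computes. The function name `vote_for_answer` and the sample snippets are invented for illustration.

```python
import re
from collections import Counter


def vote_for_answer(question, snippets):
    """Rank candidate answers by the share of snippets that mention them."""
    question_words = {w.lower() for w in re.findall(r"[A-Za-z]+", question)}
    votes = Counter()
    for snippet in snippets:
        # Crude candidate detector: runs of capitalized words (likely names).
        candidates = set(re.findall(r"(?:[A-Z][a-z]+ )*[A-Z][a-z]+", snippet))
        for candidate in candidates:
            # Words already in the question cannot be the answer.
            kept = [w for w in candidate.split()
                    if w.lower() not in question_words]
            if kept:
                votes[" ".join(kept)] += 1  # one vote per snippet
    total = sum(votes.values())
    if total == 0:
        return []
    return [(answer, count / total) for answer, count in votes.most_common()]


snippets = [
    "John Wilkes Booth shot Abraham Lincoln in 1865.",
    "Abraham Lincoln was killed by John Wilkes Booth.",
    "Lincoln died after John Wilkes Booth fired at him.",
    "Mary Surratt was later tried as a conspirator.",
]
ranked = vote_for_answer("Who killed Abraham Lincoln?", snippets)
print(ranked)  # [('John Wilkes Booth', 0.75), ('Mary Surratt', 0.25)]
```

No single snippet has to state the answer outright; the name that recurs across independent pages wins, which is why a larger Web should make the approach more reliable.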
Another Microsoft Research effort is less concerned with how search engines work than with how and when users need information. “Right now, when you want to search for information, you basically stop everything you’re doing, pull up a separate application, run the search, then try to integrate the search result into whatever you were doing before,” says Microsoft information retrieval expert Susan Dumais. “We are trying to think about how search can be much more a part of the ongoing computing experience.”
Toward that end, Dumais is developing a program called Stuff I’ve Seen that’s designed to give computer users quick, easy access to everything they have viewed on their computers. The interface to the experimental program, which will influence the search capabilities in Longhorn, is an always-available search box inside the Windows taskbar. Enter a query into the box, and Stuff I’ve Seen will display an organized list of links to related e-mail messages, calendar appointments, address book contacts, office documents, or Web pages in a single, unified window. One emerging feature of Stuff I’ve Seen, called Implicit Query, would work in the background to retrieve information related to whatever the user is working on. If you’re reading an e-mail message, for example, Implicit Query might display a box with links to the titles and e-mail addresses of all the people whom the message mentions, and to all of your previous e-mail from the sender. To make the software even more useful, Dumais is working on adding an item to the two-button mouse’s standard Windows right-click menu that would be labeled “Find me stuff like this” and would search both personal and Web data for information related to a highlighted name or phrase.
AskMSR, Stuff I’ve Seen, and related projects are all part of a larger shift in technology strategy at Microsoft, one that could position the company to convert hundreds of millions of Windows users around the world to its own search technology, much as it wrested the Web browser market from Netscape back in the 1990s. The crux of this transformation is the new Windows File System, or WinFS, the very heart of Longhorn. Under the current Windows file system, each software application partitions its allotted storage space into its own peculiar hierarchy of folders. This makes it nearly impossible, for example, to link a chunk of information such as the name of the author of a Word document with the same person’s address or phone number in Outlook. WinFS, by contrast, has at its core a relational database: an orderly set of tables stored on your hard drive where all the data on your computer can be searched and modified by all applications using a standard set of commands.
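To see why a relational store makes that kind of cross-application link easy, consider a toy version built with Python’s standard `sqlite3` module. This is not WinFS or its schema; the table names, column names, and sample rows are all hypothetical and chosen only to mirror the Word-author-to-Outlook-contact example.

```python
import sqlite3

# In-memory database standing in for a shared, system-wide store.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (title TEXT, author TEXT);  -- e.g. word processor files
    CREATE TABLE contacts  (name TEXT, phone TEXT);    -- e.g. address book entries
    INSERT INTO documents VALUES ('Q3 budget.doc', 'Pat Example');
    INSERT INTO contacts  VALUES ('Pat Example', '555-0100');
""")


def author_phone(title):
    """Look up the phone number of a document's author with one join."""
    row = conn.execute(
        "SELECT c.phone FROM documents d "
        "JOIN contacts c ON c.name = d.author "
        "WHERE d.title = ?",
        (title,),
    ).fetchone()
    return row[0] if row else None


print(author_phone("Q3 budget.doc"))  # 555-0100
```

Because both kinds of record live in tables that any program can query with the same commands, the link is a single join rather than a custom bridge between two applications’ private folder hierarchies.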
If Longhorn includes tools based on Stuff I’ve Seen and allows them to communicate directly with a Web search engine, it could create the “single search box” dreamed of by software makers: the gateway to all the information you need, whether inside your PC or out on the network. Gartner’s Whit Andrews, for one, is looking forward to Microsoft’s new software. “Bring it on!” he says. “I am sitting here looking at my e-mail. If I want to look you up, I’ve got to remember to go Google you. But what I really want is to find out if I have talked with you in the past. So I want to right-click and search globally, search my e-mail and contact folders, search U.S. Search.com [which sells access to information stored in public records]. Who has that advantage? Microsoft is there, and for the low-price stuff that consumers aren’t going to throw a whole lot of money at, they are in a terrific position.”
Q: How Does Question Answering Work?
A: Like This
| 1. Question | How many eggs are in a baker’s dozen? |
| 2. Rewrite query | “There are” + “eggs in a” + “baker’s dozen”; “A baker’s dozen has” + “eggs”; “baker’s” + “dozen” + “eggs” |
| 3. Collect search results and filter (for example, ignore results that do not resemble an answer to a “how many” question) | “A dozen usually has 12 eggs, so how many eggs does a baker’s dozen have?”; “The Baker’s Dozen Cookbook”; “Why are 13 eggs called a baker’s dozen?”; “13 eggs make a baker’s dozen.” |
| 4. Extract answers from text and present most likely answers | 13 eggs (81 percent likely); 12 eggs (7 percent likely) |