Smart search: Yolink attempts to identify relevant information on Web pages. This example shows terms picked out on Craigslist.
TigerLogic Corporation
A new service mines the contents of Web pages looking for meaning and relevance.
Search is one of the main tasks performed online, and yet often it doesn't work as well as it should. Take, for example, the common experience of "back-clicking," when a user has to return to the results page several times before finding the information she's looking for. According to a 2009 comScore survey, 30 percent of searches are abandoned in frustration, and two-thirds of the rest require users to refine their queries before getting the desired result.
A new product called Yolink, which launched this week, aims to help users figure out which search results are most relevant. It does this by looking at the contents of the Web pages that a list of search results links to. The company bills the product as a step toward semantic search, because it attempts to find meaning in the contents of a Web page. And it can do this even though most pages aren't marked up in the formats typically used to help machines interpret content. The product is made by TigerLogic, a company based in Irvine, CA.
Yolink's technology is designed to look past the links on a search results page and perform the next few steps in assessing the value of information for the user. Brian Cheek, vice president of business development for TigerLogic, explains that the company has built several demonstrations of how its technology can be used to enhance the search functionality on a site. These include one for Craigslist that pulls out data from apartment rental listings such as whether a unit allows pets or has parking, using this to supplement the results that the site already returns.
Jeff Dexter, who is director of product development and technology chief architect at Yolink, explains that semantic search typically calls for the text on a page to be marked up by the publisher with tags that help identify different types of information. Yolink is designed to deal with data that hasn't been marked up. Dexter says the company tried to establish a common model for how data is typically presented across a website. That model is used to identify particular pieces of data.
Yolink will analyze a page--such as a Craigslist listing--looking for clues as to where certain pieces of content are shown. It might notice, for example, that location and price information tend to be near each other. Search engines such as Google present snippets of pages beneath the links they provide to perform a similar service for the user, but Dexter says Yolink's results go beyond simply identifying keywords and surfacing them.
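To make the idea concrete, here is a minimal sketch of pulling structured fields out of listing text that carries no semantic markup. It is not Yolink's actual method, which the article does not describe in detail. The function name `extract_listing_fields`, the regular expressions, and the 80-character window are all illustrative assumptions. The sketch borrows the one clue mentioned above: a location tends to appear close to the price.

```python
import re

# Hypothetical patterns for illustration only; real listings vary far more.
PRICE = re.compile(r"\$\s?(\d[\d,]*)")
LOCATION = re.compile(r"\(([^()]+)\)")
PETS = re.compile(r"\b(no pets|pets (?:ok|allowed|welcome)|(?:cats|dogs) ok)\b", re.I)
PARKING = re.compile(r"\b(no\s+)?(?:garage|carport|parking)\b", re.I)


def extract_listing_fields(text):
    """Guess price, location, pet policy, and parking from plain listing text."""
    fields = {}

    price = PRICE.search(text)
    if price:
        fields["price"] = int(price.group(1).replace(",", ""))
        # Proximity clue: look for a parenthesized place name shortly
        # after the price (the window size is an arbitrary assumption).
        window = text[price.end():price.end() + 80]
        location = LOCATION.search(window)
        if location:
            fields["location"] = location.group(1).strip()

    pets = PETS.search(text)
    if pets:
        fields["pets_allowed"] = not pets.group(1).lower().startswith("no")

    parking = PARKING.search(text)
    if parking:
        # A leading "no" (as in "no parking") flips the result.
        fields["parking"] = parking.group(1) is None

    return fields


listing = "$1,850 / 2br - Sunny flat (Mission District). Cats OK, street parking available."
print(extract_listing_fields(listing))
```

Fields that cannot be found are simply left out of the result, so a caller can tell "unknown" apart from "no."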
While I applaud anyone working to make search more semantically sophisticated, there is never going to be a great solution to the problem of irrelevant search results for two reasons.
The first is that there is an entire industry of very intelligent people actively working to sabotage search result relevance for the profit of their clients - the SEO industry. This is an arms race without an end.
The second is, frankly, that most searchers are stupid, umm, maybe I should say naive. They have such limited command of their own language that they don't perceive how the word (often searches are a single word!) or words they type could have multiple meanings (English is notorious for this). There is just no way a search engine can be smart enough to read their minds. The only solution is one that many people consider unacceptable: for the search engine to track their individual search/click behavior over time and personalize its interpretation of their searches, in order to gain a contextual understanding of what they want (e.g., whenever I search "dolphins", I'm always interested in the animal, never the football team).
As a librarian who struggles to improve our little search engines (e.g., the library catalog), I am very grateful that I don't have to deal with someone actively trying to manipulate my indexing data like the SEO folks. And as a reference librarian, I see university-level students type in the most unreasonable search strings all the time; unreasonable, that is, if they expect relevant results given the intent they have just articulated to me as we work together. Library science folks have pretty much reached the conclusion that we have to offer them what we call a faceted approach, which others may think of as a guided drill-down, to help them refine their first unhelpful set of keyword search results with suggestions of new terms and other limiters (e.g., date of publication, language, geographic region) to narrow in on what they want. You see this on Amazon all the time: those boxes on the left that help you narrow your product results by brand, price range, range of GB size (for storage devices), etc.
I'm surprised Google hasn't found a way to do this yet. Since they (unlike librarians and Amazon) are working with an entirely uncontrolled index set, it's an enormously more difficult problem for them, but I still expect they'll figure it out, hopefully soon.
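The faceted approach described in this comment can be sketched in a few lines, assuming the records already carry controlled fields such as language and year. That assumption is exactly what a library catalog has and an open Web index lacks. The function names `facet_counts` and `apply_facets` and the sample records are hypothetical.

```python
from collections import Counter


def facet_counts(records, facet_fields):
    """Count how many records fall under each value of each facet field."""
    counts = {field: Counter() for field in facet_fields}
    for record in records:
        for field in facet_fields:
            value = record.get(field)
            if value is not None:
                counts[field][value] += 1
    return counts


def apply_facets(records, selected):
    """Keep only the records that match every selected facet value."""
    return [
        record for record in records
        if all(record.get(field) == value for field, value in selected.items())
    ]


# Made-up results for a keyword search on "dolphins".
results = [
    {"title": "Dolphin Cognition", "language": "English", "year": 2008},
    {"title": "Les dauphins", "language": "French", "year": 2008},
    {"title": "Dolphins of the Pacific", "language": "English", "year": 1995},
]

# The counts are what a catalog shows in its sidebar of limiters.
print(facet_counts(results, ["language", "year"]))

# Choosing limiters narrows the original result set.
print(apply_facets(results, {"language": "English", "year": 2008}))
```

The counts tell the searcher which limiters are available before they commit to one, which is what makes the drill-down "guided."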
smithsomian
not a competitor to Google?
Don't quite buy that. Google is certainly working on similar enhancements to their search products, so whether these chaps see themselves as competition to Google or not, I would imagine Google will see it that way.