With Fire Phone, Amazon Could Popularize Visual Search

The Fire Phone’s Firefly feature could make visual search easier, better, and more popular.

Rachel Metz

June 19, 2014

Amazon is evidently on a quest to make it as fast as possible to buy whatever you want, whenever you want it (whether you need it is another matter), and the smartphone that the online retailer unveiled yesterday is its newest tool for making that happen. In doing so, however, it may also be creating a powerful new mobile search engine that could evolve into a simpler way to find all sorts of information on the fly.

During yesterday’s event in Seattle, Amazon founder and CEO Jeff Bezos introduced the Fire Phone, a black handset that runs a modified version of Google’s Android system and looks pretty similar to countless others already on the market. The phone goes on sale July 25, will initially be available only through AT&T, and will cost $199 with a two-year contract.

A few features do set the Fire apart, most notably a scanning technology called Firefly, which lets you not only shoot pictures of QR codes and barcodes to find products, but also capture images, video, and audio of all kinds of things you might want to buy. (In a demo, most of the physical objects Bezos showed were either flat or mostly flat with some curvature and featured some text.) Capture a few seconds of a song or a movie, or take a snapshot of a book, and up pops some information, such as the name of the song you’re hearing and its Amazon customer rating. The feat is not revolutionary on its own, but Amazon is combining numerous image, sound, and text recognition technologies in one feature, and in a demonstration it appeared to work quite rapidly.

To make Firefly work, Amazon is matching what the phone’s camera sees with information from its database of products. And, interestingly, it’s allowing developers to use Firefly in their own apps. That could mean anything from multimedia scavenger hunts to faster access to nutritional data. Bezos used the example of an app called MyFitnessPal, which could use Firefly to recognize something like a bag of Cheetos and show its nutrition data. Bezos also indicated that developers could use Firefly with their own image-recognition technology and databases of known objects.

The high-end smartphone market is already crowded, but given the rise of mobile e-commerce, it’s a plunge worth taking (see “Why Amazon Needs Its Own Phone”). It’s also clear that Amazon intends for Firefly to help it sell more stuff: 70 million of the more than 100 million things Firefly can currently recognize are products like books and video games, and the rest are songs, which you’ll be able to order on Amazon.com or add to your Amazon wish list. But beyond perhaps changing how we shop, the feature could change how we search, encouraging us to use images to find out more about the world around us—rather than typing words into a search box.

“I think it will be a very addictive capability,” says Ramneek Bhasin, general manager of mobile and vice president of product for shopping search engine TheFind.

Bhasin is interested in using Firefly to expand TheFind’s search capabilities. The company already includes barcode scanning in its mobile apps, for instance, but he imagines non-shopping scenarios where Firefly could be useful for finding information. In museums, it could pull up Wikipedia articles when focused on a piece of art (Amazon says that it will add image recognition for artwork to Firefly later in the year).

Nick Shiftan, chief technical officer and cofounder of Curalate, a startup that uses image recognition to tell companies when their products appear on social networks, thinks that having a physical button to access Firefly on the Fire Phone will help popularize visual search simply by making it easier to access. To use anything similar, users currently have to load a third-party app.

“I don’t really know quite yet what the long-term use case is going to be here, but I think we now assume everyone’s phone can recognize a song you hear on the radio or a song you hear at the bar, and I think we’ll grow to expect the same thing from visual search as well,” he says.

That may be, but a lot of work still needs to happen for this to become reality. While a smartphone may be able to recognize somewhat flat items like books, it’s still very difficult to discern objects like a purse or a stuffed animal. That’s because a captured image of an object varies with factors such as angles, shadows, and lighting, all of which have to be accounted for before the image can be matched against known objects in a database. And it can get trickier, and slower, to determine a match with authority as the database gets larger.

“There’s still a gap between science and fiction there; what we’d like to do and what the state of the art allows,” Shiftan says.
