Hao Li remembers watching Jurassic Park as a kid: “That moment of seeing something that didn’t exist in reality, but it looked so real—that was definitely the one that made me think about doing this,” he says. Li tells me the story one afternoon while we dine at the cafeteria of Industrial Light & Magic, the famed San Francisco visual-effects studio where he has been working on a way to digitally capture actors’ facial expressions for the upcoming Star Wars movies. When Jurassic Park came out, Li was 12 years old and living in what he calls the “boonie” town of Saarbrücken, Germany, where his Taiwanese parents had moved while his father completed a PhD in chemistry. Now, 20 years later, if all goes to plan, Li’s innovation will radically alter how effects-laden movies are made, blurring the line between human and digital actors.
Visual-effects artists typically capture human performances through small balls or tags that are placed on an actor’s face and body to track movement. The data capturing the motion of those markers is then converted into a digital file that can be manipulated. But markers are distracting and uncomfortable for actors, and they’re not very good at capturing subtle changes in facial expression. Li’s breakthrough involved depth sensors, the same technology used in motion gaming systems like the Xbox Kinect. When a camera with depth sensors is aimed at an actor’s face, Li’s software analyzes the digital data in order to figure out how the facial shapes morph between one frame and the next. As the actor’s lips curl into a smile, the algorithm keeps track of the expanding and contracting lines and shadows, essentially “identifying” the actor’s lips. Then the software maps the actor’s face onto a digital version. Li’s work improves the authenticity of digital performances while speeding up production.
Li is amiably brash, unembarrassed about proclaiming his achievements, his ambitions, and the possibilities of his software. His algorithm is already in use in some medical radiation scanners, where it keeps track of the precise location of a tumor as a patient breathes. In another project, the software has been used to create a digital model of a beating heart. Ask him whether his technology could be used to read human emotions, or raise some other far-off possibility, and he’s likely to say, “I’m working on that, too.”
When I ask if he speaks German, Li smiles and says he does—“French, German, Chinese, and English.” This fall, he will begin working in Los Angeles as an assistant professor in a University of Southern California computer graphics lab. But Hollywood movies are not the end game. “Visual effects are a nice sandbox for proof of concepts, but it’s not the ultimate goal,” Li says. Rather, he sees his efforts in data capture and real-time simulation as just a step on the way to teaching computers to better recognize what’s going on around them.
Problem: Fraud over the telephone costs banks and retailers more than $1.8 billion a year. Criminals who call customer service lines pretend to be legitimate customers and often dupe the operators into approving a transfer or divulging sensitive account information.
Solution: Vijay Balasubramaniyan can detect where a call is coming from by analyzing its audio quality and the noise on the line. If a call purportedly from one place has the audio signature of a call from the other side of the world, his technology can sound an alert. The company he founded, Pindrop Security, counts several banks and an online brokerage firm as customers.
The audio quality of a phone call is affected in subtle ways by many factors, including the networks and cables it travels through. Pindrop makes hundreds of phone calls per hour to build a database of what, for example, a cell phone on a particular network in India sounds like. The service can then compare those files with the audio patterns in calls to customer service centers to determine whether a call is coming from where it says it is.
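The comparison step described above can be illustrated with a nearest-profile lookup. This is only a minimal sketch, not Pindrop's method: the feature names, the reference values, and the functions `closest_origin` and `is_suspicious` are all hypothetical, and a real system would need far richer, properly normalized features.

```python
import math

# Hypothetical reference profiles (made-up numbers for illustration only).
# Each tuple is (noise floor in dB, packet-loss rate, audio bandwidth in kHz).
REFERENCE_PROFILES = {
    "landline/US": (-60.0, 0.00, 3.4),
    "mobile/India": (-45.0, 0.03, 3.1),
    "voip/unknown": (-50.0, 0.08, 7.0),
}


def distance(a, b):
    """Euclidean distance between two feature tuples (no normalization)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_origin(features):
    """Return the label of the reference profile nearest to the call's features."""
    return min(
        REFERENCE_PROFILES,
        key=lambda label: distance(features, REFERENCE_PROFILES[label]),
    )


def is_suspicious(claimed_origin, features):
    """Flag a call whose audio looks more like some other origin than the claimed one."""
    return closest_origin(features) != claimed_origin


# A call that claims to come from a US landline but sounds like a mobile network.
call_features = (-46.0, 0.02, 3.2)
print(closest_origin(call_features))
print(is_suspicious("landline/US", call_features))
```

The design choice here is the simplest possible one: treat each known origin as a single point and pick the nearest. Because the features are not rescaled, the noise-floor value dominates the distance, which is one reason this should be read as an illustration rather than a workable detector.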
David Fattal, a French-born quantum physicist who is now a researcher at HP Labs, is a master of nanoscale light tricks, and the feat he unveiled this year is his most impressive yet. It’s a new kind of display that can project colorful moving images, viewable in three dimensions from multiple angles without any special glasses.
Fattal’s invention, which he calls a “multidirectional backlight,” consists of a thin piece of glass (or plastic) with light-emitting diodes mounted on its edge. Thanks to its particular design, which governs the angle at which the light is propagated, the device takes advantage of total internal reflection—the same optical phenomenon used in fiber optics.
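Total internal reflection follows from Snell's law: light inside a denser material stays trapped when it strikes the surface at more than the critical angle, measured from the surface normal. The sketch below computes that angle. The refractive index of 1.5 is an assumed, typical value for glass, not a figure from Fattal's device, and the function names are illustrative.

```python
import math


def critical_angle_deg(n_inside, n_outside=1.0):
    """Critical angle (degrees from the surface normal) for light leaving a denser medium."""
    if n_inside <= n_outside:
        raise ValueError("total internal reflection requires n_inside > n_outside")
    return math.degrees(math.asin(n_outside / n_inside))


def is_trapped(incidence_deg, n_inside, n_outside=1.0):
    """True if a ray at this incidence angle is totally internally reflected."""
    return incidence_deg > critical_angle_deg(n_inside, n_outside)


# Assumed index of 1.5 for the glass sheet, with air (1.0) outside.
print(round(critical_angle_deg(1.5), 2))  # about 41.81 degrees
print(is_trapped(60.0, 1.5))              # a steep guided ray stays inside
print(is_trapped(30.0, 1.5))              # a shallow one would escape
```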
Light from the LEDs doesn’t escape from the material until it hits nanoscale features etched or imprinted on the surface—what Fattal calls “directional pixels.” Composed of grooves smaller than the wavelength of the light, the pixels allow for precise control over the direction in which individual rays are scattered, shooting the different colors of light in specific directions. The result is colorful images that “seem to come from nowhere,” says Fattal.
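One way to see how fine grooves can steer trapped light is the textbook grating equation, sin(theta_out) = n * sin(theta_in) + m * wavelength / period, for light leaving a guide of index n into air. The sketch below applies it with assumed numbers; it is a generic illustration of diffraction by a periodic groove pattern, not a model of the published device, and the function name and all parameter values are hypothetical.

```python
import math


def outcoupled_angle_deg(wavelength_um, period_um, guided_angle_deg,
                         n_guide=1.5, order=-1):
    """Exit angle in air (degrees from the normal) for one diffraction order.

    Returns None when that order cannot propagate in air.
    """
    s = (n_guide * math.sin(math.radians(guided_angle_deg))
         + order * wavelength_um / period_um)
    if abs(s) > 1.0:
        return None  # no real exit angle: the light stays trapped or evanescent
    return math.degrees(math.asin(s))


# Assumed values: green-ish light (0.5 um), grooves 0.4 um apart, ray guided at 60 degrees.
print(outcoupled_angle_deg(0.5, 0.4, 60.0))  # close to the surface normal
# Red-ish light (0.65 um) through the same grooves exits at a different angle.
print(outcoupled_angle_deg(0.65, 0.4, 60.0))
# Much finer grooves (0.2 um) couple nothing out in this order.
print(outcoupled_angle_deg(0.5, 0.2, 60.0))
```

Under these assumptions the exit angle depends on both the wavelength and the groove spacing, which is consistent with the article's point that the groove pattern can send different colors in specific directions.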
In a paper published in Nature in March, Fattal and colleagues presented prototypes capable of projecting static and moving images viewable from 200 angles. They performed the trick by overlaying their novel backlight with an ink-printed mask that blocked certain colors and allowed others through. One of the first images they produced was that of a turtle hovering immediately above the glass. Fattal has also used a modified liquid crystal display to produce simple moving images.
Since the setup creates realistic, hologram-like 3-D images without the need for bulky optical equipment, it could be attractive for use in smartphones, tablets, smart watches, and other mobile devices.
Projecting high-quality images, however, will require much larger and more complicated pixel arrays and advanced mechanisms for handling a huge number of data-rich images quickly. And creating 3-D content that can be enjoyed from all the many vantage points accommodated by this technology will be no small task either. But in his ingenious use of nanotechnology, Fattal has given us the possibility of seeing images and videos in a whole new light.
Christine Fleming is trying to give cardiologists a powerful new tool: high-resolution movies of the living, beating heart, available in real time during cardiac procedures. Such technology might also one day help physicians pinpoint the source of dangerous irregular heart rhythms without invasive biopsies. It could even help monitor treatment.
Her invention uses optical coherence tomography (OCT), a technique that captures three-dimensional images of biological tissue. A specialized catheter with a laser and small lens near its tip is threaded through the arteries. When the laser light reflects off the heart tissue, it is picked up and analyzed to create an image. OCT has a higher resolution than ultrasound and captures images faster than magnetic resonance imaging, or MRI. But today OCT has limited cardiac application, usually to search the arteries for plaques. Fleming, an electrical engineer who joined the faculty at Columbia University this year, has designed a new type of catheter capable of imaging heart muscle.
One of the primary uses of the technology will be to locate, and monitor treatment for, irregular heart rhythms that are typically caused by disruption of the heart’s regular tissue structure. In patients with arrhythmias, which can lead to heart failure, surgeons often burn away the affected tissue with targeted radio-frequency energy. Currently they perform the procedure somewhat blind, using their sense of touch to determine when they have come in contact with the muscle wall. “Since the physician doesn’t have a view of the heart wall, sometimes the energy is not actually being delivered to the muscle,” says Fleming, who adds that the procedure can last for hours. Fleming has shown in animal tests that her catheter, which uses a novel forward-facing lens, can successfully monitor the ablation in real time. Algorithms that help distinguish untreated from treated tissue offer further guidance.
Fleming is also developing algorithms to help improve the detection of arrhythmias by precisely measuring the three-dimensional organization of heart muscle. The technique works best when the tissue has been chemically treated to make it clearer, and thus easier to image. But her team at Columbia is now improving the algorithms so that the method works without this treatment. She hopes that in time the technology could supply an alternative to invasive biopsies, which are sometimes used to diagnose unexplained arrhythmias or to monitor heart health after transplants.
Fleming’s arrival at Columbia earlier this year was something of a homecoming. As a high-school student in New York City, she interned at the NASA Goddard Institute for Space Studies, which is down the street from her current lab. But in the intervening years her engineering interests have increasingly become tied to medicine; her inspiration for studying the electrical properties of the heart came when she studied electrical engineering and computer science as an undergraduate at MIT. Working with physicians is especially exciting, she says, because “you get the sense that one day your technology will be used.”
Markus Persson—better known as Notch to his millions of followers—is an unlikely technology megastar. A quiet, unassuming Swede, he looks like the typical video-game programmer, with thinning hair and a thickening torso; his defining features are twin dimples when he smiles and a jet-black fedora, an accessory he is rarely seen without. But Minecraft, an independent video game he created and released on the Internet in May 2009, has sold 30 million copies, making him rich and famous.
Persson is now a hero to a generation of young game players, who hang on his every tweet. Last year he earned more than $100 million from Minecraft and its associated merchandise. But the programmer appears largely unchanged by the money. While he routinely travels by private jet and is well known for hosting lavish parties in Minecraft’s name, his main material indulgence is ensuring he always has the latest computer.
Though Persson might be little changed by success, Minecraft has transformed video games. A rudimentary-looking Java game that doesn’t require the latest computer to run, it places its player in the middle of a pastoral landscape that represents a unique and randomly generated world. Trees, sand, gravel, and rocks are each represented by a different type of block, and these can be harvested and subsequently “crafted” into different objects and tools. One mouse button is used to harvest the blocks, the other to place them. In this way players are able to shape the game’s world to suit their whims. The blocks can be rearranged to create structures and settlements as elaborate as the player’s imagination permits.
Persson believes his success is a once-in-a-lifetime event, a freakish hit of the sort that strikes some creative people with unrepeatable fortune. Minecraft’s popularity has brought unfamiliar attention to the designer, whose every idea is now pored over by a watching world. Regardless of the scrutiny and accompanying creative jitters, Persson continues to be a prolific and ambitious game inventor. His next project is a resource-trading game set in space.
Three decades ago, the availability of many versions of DOS helped spark the boom in personal computers. Today, Robot Operating System, or ROS, is poised to do the same for robots. Morgan Quigley programmed the first iteration of what grew into ROS as a graduate student in 2006, and today his open-source code is redefining the practical limits of robotics. Since version 1.0 was released in 2010, ROS has become the de facto standard in robotics software.
To visit Quigley’s office at the Open Source Robotics Foundation in Mountain View, California, the organization he cofounded last summer to steward ROS, is to step into a future of robotics where hardware is cheap, and it’s quick and easy to snap together preëxisting pieces to create new machines. Quigley’s workspace is littered with dozens of mechanical fingers—modules that form a robotic hand. “The hands themselves can talk ROS,” Quigley says. His T-shirt is emblazoned with a programming joke: shirtcount++;.
Unlike more conventional robotic technology, Quigley’s four-fingered hand is not controlled by a central processor. Its fingers and palm distribute computing chores among 14 low-cost, low-power processors dedicated to controlling each joint directly. That greatly simplifies the internal communication and coördination required to execute a task such as picking up a pencil. Both the software and electronics are open source. Any robot builder can take Quigley’s design and use or improve upon it.
Ultimately, Quigley hopes, these innovations will lead to more agile, more capable robots that can perform a variety of jobs and don’t cost tens or hundreds of thousands of dollars. And no longer will engineers have to start from scratch to design the functions that go into a robot—they’ll have an open-source base of code and hardware. Already, engineers using ROS are working on robots that do everything from folding laundry to repetitive operations in advanced manufacturing. “It will allow applications we couldn’t dream of before,” Quigley says.
Unlike many children of the 1980s and 1990s, Quigley wasn’t enthralled by Star Wars’ C-3PO or Star Trek: The Next Generation. Rather, he was mesmerized by the far more mundane but real Apple II computer at his elementary school. In class, he typed commands in the Logo language to move an animated turtle around the screen—the ancestor of ROS’s turtle mascot. But it wasn’t until 1998, when he entered Brigham Young University in Provo, Utah, that he encountered robots. He was hooked. “Robots are the meeting place between electronics, software, and the real world,” he says. “They’re the way software experiences the world.”
When he arrived at Stanford for graduate study in machine learning, Quigley joined Andrew Ng’s lab, where the students were collaborating on the Stanford Artificial Intelligence Robot, or STAIR. Typical industrial robots execute a single tightly defined task in a controlled environment, like an advanced automobile factory. Ng, however, envisioned a general-purpose robot that could execute diverse tasks in an uncontrolled environment. The signature STAIR challenge was getting the robot to respond productively to the request “Fetch me a stapler.” To bring back the stapler, STAIR needed to understand the request, navigate hallways and elevators to an office, open the door, make its way to the desk, identify a stapler among other items of roughly the same size, pick it up, bring it back, and hand it off.
As Ng’s teaching assistant, Quigley realized that the class needed a software framework that could integrate contributions from a few dozen students working asynchronously without bringing down the robot when one of their programs crashed. ROS was his solution: a distributed peer-to-peer system designed to connect all the resources—technological and human—required to make a robot work.
In 2007, he began collaborating with Willow Garage, a Silicon Valley company that works on robots and open-source software. For the next two years, Quigley oversaw the ROS architecture while Willow Garage’s programmers extended his initial work. Released in 2010, ROS quickly became the dominant software framework for robotics.
Despite its name, ROS isn’t really an operating system. It’s a framework that enhances conventional operating systems (theoretically, any OS; in practice, Linux). It provides software modules for performing common robotics functions such as motion planning, object recognition, and physical manipulation. So if you want a robot to map its surroundings, you don’t have to write that code; you can simply plug in the ROS software module. As an open-source product that can be freely modified, it attracts a community of users who are constantly improving and extending its capabilities.
Any number of independent modules can run at a given time. Modules can be connected for testing, disconnected for debugging, and reinstated without destabilizing the network as a whole. In this way, ROS allows a robot to be controlled by any number of computers running any number of programs—a laptop focusing on navigation, a server performing image recognition, an Android phone issuing high-level instructions. It all happens in real time as the robot wanders about.
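The idea of independent modules that can be attached, detached, or crash without taking down the rest can be sketched with a tiny publish/subscribe bus. This is not the ROS API, and it is a deliberate simplification: everything runs in one process through a central dispatcher, whereas the article describes ROS as a distributed peer-to-peer system. The class `MessageBus` and the module names are hypothetical.

```python
from collections import defaultdict


class MessageBus:
    """Toy in-process message bus: modules subscribe to named topics."""

    def __init__(self):
        self._subscribers = defaultdict(dict)  # topic -> {module name: callback}
        self.errors = []                       # (module name, error text) pairs

    def subscribe(self, topic, module_name, callback):
        self._subscribers[topic][module_name] = callback

    def unsubscribe(self, topic, module_name):
        self._subscribers[topic].pop(module_name, None)

    def publish(self, topic, message):
        # Deliver to every subscriber; a failure in one is recorded, not propagated.
        for name, callback in list(self._subscribers[topic].items()):
            try:
                callback(message)
            except Exception as exc:
                self.errors.append((name, repr(exc)))


bus = MessageBus()
seen = []


def navigator(msg):
    seen.append(("nav", msg))


def buggy_vision(msg):
    raise RuntimeError("vision module crashed")


bus.subscribe("camera", "navigation", navigator)
bus.subscribe("camera", "vision", buggy_vision)
bus.publish("camera", "frame-1")     # vision fails, navigation still gets the frame
bus.unsubscribe("camera", "vision")  # detach the faulty module for debugging
bus.publish("camera", "frame-2")
print(seen)
print(bus.errors)
```

The point of the sketch is the isolation boundary: each module only sees messages, so one student's crashing program is logged and can be unplugged while the others keep running.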
The masterstroke in Quigley’s design is not strictly technical but social. Members of the community who produce a finished release can distribute it themselves, rather than having to house it on central servers. “That’s a big deal in terms of giving people the credit they deserve and allowing them to control their contributions,” Quigley says. “Their code isn’t lost in this beast called ROS.”
Quigley’s ambition is to make ROS a productive starting point for any kind of robotic system—large or small, expensive or cheap, academic or commercial, networked or stand-alone.
Adapting ROS for low-cost processors is critical if the software is to play a key role in next-generation designs. Cheap processors are becoming more capable, opening an opportunity to bring the intelligence that has been concentrated in desktop-class processors to the CPUs that manage robotic wheels, joints, and cameras. Where image recognition was once a function of a rack of servers, soon it might be managed within the camera.
Quigley also wants ROS, which was designed to control one robot at a time, to move into environments that use multiple robots. Settings such as warehouses or factory floors would benefit from squadrons of them operating in a coördinated way. Beyond that, it’s not hard to imagine robot fleets managed in the cloud: users could send ROS commands to a data center and from there to an automaton. “ROS might tie into an online knowledge base,” Quigley says, “so if someone says, ‘Get the stapler off my desk,’ it might retrieve a CAD model of a stapler from the cloud.”
In 2012, when Cuba suffered its first outbreak of cholera in 130 years, the government and medical experts there were shocked. But software created by Kira Radinsky had predicted it months earlier. Radinsky’s software had essentially read 150 years of news reports and huge amounts of data from sources such as Wikipedia, and spotted a pattern in poor countries: floods that occurred about a year after a drought in the same area often led to cholera outbreaks.
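The pattern itself, a flood roughly a year after a drought in the same area, is simple enough to express as a rule. The sketch below hard-codes that rule over a handful of invented event records. It does not reproduce how Radinsky's software learned the pattern from news archives, it ignores the article's qualifier about poor countries, and the gap thresholds, record format, and function name are assumptions.

```python
from datetime import date


def cholera_risk_flags(events, min_gap_days=180, max_gap_days=540):
    """Return (region, flood date) pairs where a flood follows a drought
    in the same region by roughly a year (window bounds are assumed)."""
    droughts = [(e["region"], e["date"]) for e in events if e["type"] == "drought"]
    flagged = []
    for e in events:
        if e["type"] != "flood":
            continue
        for region, drought_date in droughts:
            gap = (e["date"] - drought_date).days
            if region == e["region"] and min_gap_days <= gap <= max_gap_days:
                flagged.append((e["region"], e["date"]))
                break  # one matching drought is enough to flag this flood
    return flagged


# Invented records for illustration only; the regions are placeholders.
events = [
    {"region": "A", "type": "drought", "date": date(2010, 6, 1)},
    {"region": "A", "type": "flood", "date": date(2011, 7, 1)},
    {"region": "B", "type": "flood", "date": date(2011, 7, 1)},
    {"region": "C", "type": "drought", "date": date(2010, 1, 1)},
    {"region": "C", "type": "flood", "date": date(2010, 2, 1)},
]
print(cholera_risk_flags(events))
```

Only region A is flagged: region B has no preceding drought, and region C's flood comes too soon after its drought to fit the assumed window.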
The predictions made by Radinsky’s software are about as accurate as those made by humans. That digital prognostication ability would be extremely useful in automating many kinds of services.
Radinsky was born in Ukraine and immigrated to Israel with her parents as a preschooler. She developed the software with Eric Horvitz, co-director at Microsoft Research in Redmond, Washington, where she spent three months as an intern while studying for her PhD at the Technion-Israel Institute of Technology.
Radinsky then started SalesPredict, to advise salespeople on how to identify and handle promising leads. “My true passion,” she says, “is arming humanity with scientific capabilities to automatically anticipate, and ultimately affect, future outcomes based on lessons from the past.”
You and Tony Fadell, one of the creators of the iPhone and iPod, started Nest after both of you left Apple. Wasn’t being in charge of iPod and iPhone software development your dream job?
I had a Mac Plus when I was three years old, and I loved Apple as a company. I flew out to California [from Gainesville, Florida] with my grandparents on my 13th birthday to go out to Cupertino. And I told my grandparents then, “Yeah, I’m going to work at Apple, for sure.”
Then why leave at just 26 years old?
Basically, I pushed as hard as I could, worked incredibly hard, built tons of stuff, built teams, built products, and loved it. But somewhere around my four-and-a-half-year anniversary at Apple, we were working on another generation of iPods and another generation of iPhones and starting work on the third generation of iPads, and I was ready for something new.
Going from smartphones to smart thermostats isn’t an obvious jump.
Tony and I had lunch back in October of 2009. I told Tony, “I’m thinking about leaving Apple; I’m thinking about starting my own company, and I’m looking at smart-home stuff.” And he stops me right there. He goes, “You know what? A smart home is for geeks. No one wants a smart home—it’s a stupid idea. Focus on doing one thing and doing it really well.”
Programmable thermostats existed before the Nest, but they were awful.
The programming was tough. They were like the early ’80s VCRs, where you’d push a button 15 times to change it to Tuesday and change the temperature there. Part of it is that the product was designed to be sold to a contractor and not designed for a user.
In contrast, the Nest is a lot like the iPhone—it’s easy to figure out how to use.
The product that we built is basically a smartphone on the wall.
And there’s nothing I have to push 15 times. There aren’t even any buttons—you just turn the entire metallic case. That’s very Apple-like.
When we were building Nest, we were going to build it like any great product and design company. You’d have great industrial design, great hardware engineering, great software engineering, great services, great consumer marketing—all those things.
One way the Nest saves energy is by detecting when no one’s home. But there’s got to be much more you can do on the back end, to make plans based on weather forecasts and other data.
There’s always more. Since we’ve launched the product, we’ve done something like 21 software updates, of which I’d say five or six have included major energy-saving algorithm improvements, and we’re always finding more. The more detail we have, the more users we work with, the more homes we’re in, the more we’re learning. It’s a very long tail of things we could be doing. You can see multiple products.
Which brings us back to your original ideas about a smart home. The Nest could become a hub for controlling many things, not just heating and cooling.
It could be, yes.
But yet you guys say only geeks want smart homes—
Wait, wait. I don’t believe in networking connectivity just for the sake of having things connected. There’s got to be a really good reason why you’d want to do it. You don’t want to put networking in your microwave oven. What would it do?
So what does make sense? What might a home in the future do differently?
Today when you arrive home, the Nest sensor sees you and starts cooling your home so you’re comfortable. And if you extrapolate to the future, you’re driving home from work; your phone knows that you’re driving home, or your car itself knows you’re driving home, and lets Nest know, “Matt will be home in 15 minutes; we’ll start preparing the home for his arrival.” And then, as you get closer to the door, things might change—it might turn the song list on and play my favorite music, or turn the lights on—or, when I leave, do the opposite.
That sounds like a geek dream to me—less about reducing energy than increasing comfort.
These things go hand in hand, actually. Part of the promise of Nest is that we’re going to keep your home comfortable, and may actually even make you more comfortable, while also helping you save.