Black Hat: Google Glass Can Steal Your Passcodes

Footage of people unlocking their phones can be used to steal mobile passcodes even if the typing can’t be seen.

Tom Simonitearchive page

August 7, 2014

Criticism of Google Glass has often focused on the way its camera makes surreptitious video recording too easy. Now researchers have shown that footage captured by the face-mounted camera could also pose a security threat.

Software developed by the researchers can automatically recover the passcodes of people recorded on video as they type in their credentials, even when the screen itself is not visible to the camera. The attack works by watching the movement of the fingers to work out what keys they are touching. It also works on footage from camcorders, webcams, and smartphones, but Glass offers perhaps the subtlest way to stage it.

The work suggests that “shoulder surfing”—stealing passwords or other data by watching someone at a computer—could become more of a threat as digital cameras and powerful image processing software become more common.

In tests where people stood three meters away from the camera, the software was around 90 percent accurate at capturing four-character-long strings typed on the iPhone’s QWERTY keyboard. The researchers say that the method could theoretically reconstruct a short e-mail or SMS.

“With Glass it’s very sneaky,” says Qinggang Yue, a grad student at the University of Massachusetts, Lowell, who carried out the research with colleagues Xinwen Fu and Zhen Ling.

When Yue met with MIT Technology Review at the Black Hat security conference, where he had presented his findings on Wednesday, he glanced around the busy press room and instantly identified a handful of people pecking away on touch screens that might be vulnerable to such an attack.

Yue has also shown that video footage can be used to recover passcodes at some distance. In one set of experiments, a camcorder held by someone at a first-floor window was used to successfully capture the passcode of someone using an iPad just over 43 meters away. “With a long-focal-length camera it could be much further,” says Yue.

To capture a passcode, the software must identify the position and orientation of a device’s screen as well as the position of a person’s fingertips tapping on it. Yue and colleagues used machine learning to train software to tackle both those problems. The software runs on a PC, so footage captured with Google Glass must be downloaded to extract any passcodes.

The software automatically finds a device captured in a piece of footage. It then identifies the position of its screen’s four corners, and tracks the velocity of a person’s fingertip.

The researchers are currently testing ways to defend against such software-enhanced shoulder surfing. One countermeasure involves randomly swapping the keys on a standard keypad around, so that software can’t correctly translate each tap. Another involves having buttons drift around instead of staying fixed to a standard grid.

Keep Reading

Large language models can do jaw-dropping things. But nobody knows exactly why.

And that's a problem. Figuring it out is one of the biggest scientific puzzles of our time and a crucial step towards controlling more powerful future models.