While object recognition in video has advanced rapidly, scene detection, or knowing what’s actually happening on screen, has lagged behind. But being able to recognize actions in footage could prove useful for applications like video editing. So Amir Ziai, a Stanford student at the time of the research and now a senior data scientist at Netflix, set out to advance the state of the art, specifically in detecting Hollywood kissing scenes. The study may seem light-hearted, even silly, but it has important implications.
Ziai selected a subset of 100 movies and labeled kissing and non-kissing scenes in them, each between 10 and 20 seconds long. He then extracted a still image and the accompanying audio for every second of each scene, and used them to train a machine-learning algorithm. The resulting model was able to identify which seconds depicted kissing and group them into scenes, achieving a high level of accuracy.
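The last step, turning per-second predictions into scenes, can be illustrated with a short Python sketch. It is not Ziai's code, and the article does not describe how his model groups seconds. The function name `group_seconds_into_scenes`, the `min_length` parameter, and the sample predictions are all hypothetical. The sketch simply assumes that a classifier has already marked each second as kissing or not, and merges consecutive kissing seconds into one scene.

```python
from typing import List, Tuple


def group_seconds_into_scenes(
    flags: List[bool], min_length: int = 1
) -> List[Tuple[int, int]]:
    """Merge runs of consecutive True flags into (start, end) ranges.

    flags[i] is a hypothetical per-second prediction for second i.
    The end of each range is exclusive. Runs shorter than min_length
    seconds are dropped (an assumption, not something from the study).
    """
    scenes = []
    start = None  # start second of the run currently being tracked
    for second, is_kiss in enumerate(flags):
        if is_kiss and start is None:
            start = second  # a new run begins
        elif not is_kiss and start is not None:
            if second - start >= min_length:
                scenes.append((start, second))
            start = None  # the run has ended
    # Close a run that extends to the end of the footage.
    if start is not None and len(flags) - start >= min_length:
        scenes.append((start, len(flags)))
    return scenes


# Made-up predictions for ten seconds of footage.
predictions = [False, True, True, True, False, False, True, False, True, True]
scenes = group_seconds_into_scenes(predictions, min_length=2)
```

With `min_length=2`, the isolated positive at second 6 is treated as noise and discarded, which is one simple way to keep a single misclassified second from being reported as a scene.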
The study shows how quickly the means of analyzing footage for specific, even intimate, actions have advanced. Couple that with surveillance footage, and the implications quickly turn Orwellian. In fact, in a new report, the ACLU sounded the alarm on a future in which camera owners would be able to rapidly identify unusual behavior or seek out embarrassing moments. Like deepfakes, it’s another case in which technologists should think about the consequences of their work.