Andrew Weinert and his colleagues were deeply frustrated. After Hurricane Maria struck Puerto Rico, the researchers from MIT’s Lincoln Laboratory were hard at work trying to help the Federal Emergency Management Agency (FEMA) assess the damage. In hand they had the perfect data set: 80,000 aerial shots of the region taken by the Civil Air Patrol right after the disaster.
But there was an issue: there were too many images to sort through manually, and commercial image recognition systems were failing to identify anything meaningful. In one particularly egregious example, a system trained on ImageNet, the gold standard data set for image classification, recommended labeling an image of a major flooding zone as a toilet.
“There was this amazing information content, but it wasn’t accessible,” says Weinert.
They soon realized this problem isn’t unique. In any large-scale disaster scenario, teams of emergency responders like FEMA could save significant time and resources by reviewing details of on-the-ground conditions before their arrival. But most computer vision systems are trained on regular day-to-day imagery, so they can’t reliably pick out relevant details in disaster zones.
The realization compelled the team to compile and annotate a new set of photos and footage specific to emergency response scenarios. They released the data set along with a paper this week in the hopes that it will be used to train computer vision systems in the future.
The data set includes over 620,000 images and 96.5 hours of video that encompass imagery from all 50 states of the US. Most of the media were sourced from government databases or Creative Commons videos on YouTube; a small fraction were also filmed by the Lincoln Lab staff themselves.
To make it genuinely useful to emergency responders, the researchers considered various emergency scenarios that were likely to trip up common image classification systems. They compiled images of cars submerged in floodwater, for example; most systems would see the water and immediately label the vehicle as a boat, simply because their everyday training data associates the two.
They also spent a significant amount of time figuring out the best way to annotate the images. They wanted the annotations to offer emergency responders useful context for their missions, and also needed the annotation scheme to be simple enough for data labelers to perform quickly with minimal errors. So they mimicked ImageNet’s organizational structure, which groups photos into increasingly specific categories of objects, like animal, dog, then labrador retriever. Rather than object categories, however, the researchers clustered photos based on increasingly specific disaster characteristics: Is there damage? Yes or no? Is there water? Yes or no? Should the water be there? Yes or no?
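The hierarchical yes/no scheme described above can be sketched in a few lines of code. This is a minimal illustration, not the actual annotation schema: the label names and the order of the questions are assumptions made for clarity.

```python
def annotate(has_damage: bool, has_water: bool, water_expected: bool) -> list:
    """Return increasingly specific labels for one aerial image,
    following a nested yes/no questioning scheme."""
    labels = []
    # First question: is there damage?
    labels.append("damage" if has_damage else "no-damage")
    # Second question: is there water?
    if has_water:
        labels.append("water")
        # Third question: should the water be there? A river or lake
        # is expected; a flooded street is the disaster-relevant case.
        labels.append("water-expected" if water_expected else "flooding")
    return labels

# A flooded neighborhood: damage present, water present, water unexpected.
print(annotate(True, True, False))  # ['damage', 'water', 'flooding']
```

Because each answer narrows the category, labelers only ever face one simple question at a time, which keeps the task fast and the error rate low.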
Such qualifications will allow computer vision researchers to easily sort through the data set and pick relevant segments to train disaster-related image recognition systems. Those systems would then help an emergency responder quickly process imagery from new disaster scenarios to gain a sense of the worst areas of impact, the kinds of on-the-ground conditions to expect, and what supplies to prepare for their mission.
Weinert says it’s still a work in progress, but he’s excited about its potential. “If we could figure out a way to say, ‘This is how you should qualify disaster response imagery,’ Amazon, TaskRabbit, and all the other crowdsourcing entities” could start using it as an industry standard, he says, and start developing more disaster-aware recognition systems.
The researchers are now making the data set available to the National Institute of Standards and Technology and have begun to work with other organizations to set up image recognition competitions around its use. “We’re looking at ways to get this into the hands of computer vision researchers,” Weinert says.