Computer vision has come a long way since ImageNet, a large, open-source data set of labeled images, was released in 2009 for researchers to use to train AI, but images with tricky or bad lighting can still confuse algorithms. Researchers have tried either to hand-craft rules about how light interacts with objects or to train on a data set that covers as many lighting situations as possible. But the real world offers a nearly limitless number of combinations of objects and lighting, which handicaps both approaches.
A new paper by researchers from MIT and DeepMind details a process that can interpret images under different lighting without relying on hand-coded rules or training on a huge data set. The process, called a rendered intrinsics network (RIN), automatically separates an image into reflectance, shape, and lighting layers. It then recombines the layers into a reconstruction of the original image.
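The recombination step can be illustrated with a minimal sketch. It is not the researchers' implementation: it assumes a simple Lambertian shading rule (brightness proportional to the cosine of the angle between the surface normal and the light direction) in place of whatever shading model RIN actually uses, and every function name and pixel value below is hypothetical. It also shows why no labels are needed: the reconstruction can be scored against the input image itself.

```python
import math

def normalize(v):
    """Scale a 3-vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)

def lambertian_shading(normal, light_dir):
    """Assumed shading rule: clamped cosine between surface normal and light."""
    n = normalize(normal)
    l = normalize(light_dir)
    return max(0.0, sum(a * b for a, b in zip(n, l)))

def recombine(reflectance, normals, light_dir):
    """Rebuild an image from its layers: pixel = reflectance * shading."""
    return [
        tuple(channel * lambertian_shading(n, light_dir) for channel in rgb)
        for rgb, n in zip(reflectance, normals)
    ]

def reconstruction_error(image, reconstruction):
    """Mean squared error between two images; computing it requires no labels."""
    diffs = [
        (a - b) ** 2
        for px, rx in zip(image, reconstruction)
        for a, b in zip(px, rx)
    ]
    return sum(diffs) / len(diffs)

# Toy "image" of three pixels with hypothetical layer values.
reflectance = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.5, 0.5, 0.5)]  # surface color
normals = [(0.0, 0.0, 1.0), (0.0, 1.0, 0.0), (0.0, 0.0, -1.0)]     # shape
light_dir = (0.0, 0.0, 1.0)                                        # lighting

image = recombine(reflectance, normals, light_dir)

# Relighting: same reflectance and shape, different light direction.
relit = recombine(reflectance, normals, (0.0, 1.0, 0.0))
```

In this toy setup, only the first pixel faces the light, so it keeps its red reflectance while the other two render black. Swapping the light direction changes the image without touching the reflectance or shape layers, which is the property that makes the decomposition useful under varied lighting.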
To train RIN, the researchers created a data set of five shapes (cubes, spheres, cones, cylinders, and toruses) and rendered each in 10 different orientations and 500 different colors. As a proof of concept, they showed how breaking an image down into the three layers could help a computer identify an item in an image or infer its shape. For example, after being trained on the basic sample shapes, the model learned to spot much more complicated items, such as the classic 3-D test models known as the Stanford bunny, the Utah teapot, and Blender’s Suzanne, without ever seeing labeled examples of them.
Beyond offering a new way to overcome the problem of infinite lighting situations for an image, RIN is also an example of learning with unlabeled data. Most AI still needs labeled data to learn, and preparing it takes hours of repetitive human labor. Finding a way to learn from unlabeled data is one of the next frontiers in artificial intelligence.