
Unlimited computer fractals can help train AI to see

Large data sets like ImageNet have supercharged the last 10 years of AI vision, but they are laborious to produce and can contain bias. Computer-generated data sets offer an alternative.


Most image-recognition systems are trained using large databases that contain millions of photos of everyday objects, from snakes to shakes to shoes. With repeated exposure, AIs learn to tell one type of object from another. Now researchers in Japan have shown that AIs can start learning to recognize everyday objects by being trained on computer-generated fractals instead.

It’s a weird idea, but it could be a big deal. Generating training data automatically is an exciting trend in machine learning. And using an endless supply of synthetic images rather than photos scraped from the internet avoids the problems that dog existing hand-crafted data sets.

Training trouble: Pretraining is a phase in which an AI learns some basic skills before being trained on more specialized data. Pretrained models allow more people to use powerful AI. Instead of having to train a model from scratch, they can adapt an existing one to their needs. For example, a system for diagnosing medical scans might first learn to identify basic visual features, such as shape and outline, by being pretrained on a database of everyday objects—such as ImageNet, which contains more than 14 million photos. Then it will be fine-tuned on a smaller database of medical images until it recognizes subtle signs of disease.
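The pretrain-then-fine-tune pattern described above can be illustrated with a toy sketch. This is not the researchers' pipeline: the two synthetic tasks, the tiny logistic-regression "model," and all the numbers below are invented purely to show the warm-start idea, where weights learned on a large generic task initialize training on a small specialized one.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_linear(X, y, w=None, steps=200, lr=0.1):
    """Logistic-regression training loop; passing `w` warm-starts
    from previously learned ("pretrained") weights."""
    if w is None:
        w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-X @ w))           # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)       # logistic-loss gradient step
    return w

# "Pretraining": lots of data for a generic task
# (classify points as left or right of a vertical line).
X_pre = rng.normal(size=(1000, 2))
y_pre = (X_pre[:, 0] > 0).astype(float)
w_pre = train_linear(X_pre, y_pre)

# "Fine-tuning": a small, related task (a slightly tilted boundary),
# warm-started from the pretrained weights with far fewer steps.
X_fine = rng.normal(size=(50, 2))
y_fine = (X_fine[:, 0] + 0.2 * X_fine[:, 1] > 0).astype(float)
w_fine = train_linear(X_fine, y_fine, w=w_pre.copy(), steps=20)
```

The point of the sketch is the shape of the workflow: the expensive stage runs once on abundant generic data, and the cheap stage adapts the result to a related task with little data, which is exactly why pretrained models lower the barrier to using powerful AI.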

The trouble is, assembling a data set like ImageNet by hand takes a lot of time and effort. The images are typically labeled by low-paid crowdworkers. Data sets might also contain sexist or racist labels that can bias a model in hidden ways, as well as images of people who have been included without their consent. There’s evidence these biases can creep in even in pretraining.

Natural forms: Fractals can be found in everything from trees and flowers to clouds and waves. This made the team at Japan’s National Institute of Advanced Industrial Science and Technology (AIST), the Tokyo Institute of Technology, and Tokyo Denki University wonder if these patterns could be used to teach an automated system the basics of image recognition, instead of using photos of real objects.

The researchers created FractalDB, a data set of computer-generated fractals that can be extended without limit. Some look like leaves; others look like snowflakes or snail shells. Each group of similar patterns was automatically given a label. The team then used FractalDB to pretrain a convolutional neural network, a type of deep-learning model commonly used in image-recognition systems, before completing its training with a set of actual images. They found that it performed almost as well as models trained on state-of-the-art data sets, including ImageNet and Places, which contains 2.5 million images of outdoor scenes.
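Fractals of this kind are typically rendered from iterated function systems (IFS): a handful of affine maps applied over and over to a point, whose visited positions trace out the fractal. A minimal "chaos game" sketch, with an illustrative Sierpinski-triangle IFS (the exact parameter sampling used to build FractalDB is not described here, so treat the maps below as a stand-in):

```python
import random

def chaos_game(transforms, n_points=20000, warmup=100, grid=64):
    """Render a fractal attractor from a set of 2-D affine maps.

    Each transform (a, b, c, d, e, f) maps
    (x, y) -> (a*x + b*y + e, c*x + d*y + f).
    Returns a grid x grid occupancy histogram of visited points.
    """
    x, y = 0.0, 0.0
    pts = []
    for i in range(n_points + warmup):
        a, b, c, d, e, f = random.choice(transforms)
        x, y = a * x + b * y + e, c * x + d * y + f
        if i >= warmup:          # discard the transient before the attractor
            pts.append((x, y))
    # Normalize the visited points into the image grid.
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    xmin, xmax = min(xs), max(xs)
    ymin, ymax = min(ys), max(ys)
    img = [[0] * grid for _ in range(grid)]
    for px, py in pts:
        col = min(grid - 1, int((px - xmin) / (xmax - xmin + 1e-9) * grid))
        row = min(grid - 1, int((py - ymin) / (ymax - ymin + 1e-9) * grid))
        img[row][col] += 1
    return img

# Sierpinski-triangle IFS: three maps, each halving toward one corner.
sierpinski = [
    (0.5, 0.0, 0.0, 0.5, 0.0, 0.0),
    (0.5, 0.0, 0.0, 0.5, 0.5, 0.0),
    (0.5, 0.0, 0.0, 0.5, 0.25, 0.5),
]
img = chaos_game(sierpinski)
```

Because the maps are just numbers, an image generator like this can emit as many labeled categories as desired: sample a new set of affine parameters, give every image rendered from it the same label, and no human annotation is needed.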

Does it work? Anh Nguyen at Auburn University in Alabama, who wasn’t involved in the study, isn’t convinced that FractalDB is yet a match for the likes of ImageNet. He has studied how abstract patterns can confuse image-recognition systems. “There is a connection between this work and examples that fool machines,” he says. He would like to explore how this new approach works in more detail. But the Japanese researchers think that with tweaks to their approach, computer-generated data sets like FractalDB could replace existing ones.

Why fractals: The researchers also tried training their AI using other abstract images, including ones produced using Perlin noise, which creates speckled patterns, and Bezier curves, a type of curve used in computer graphics. But fractals gave the best results. “Fractal geometry exists in the background knowledge of the world,” says lead author Hirokatsu Kataoka at AIST.
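Of the alternatives the team tried, Bezier curves are the simplest to illustrate. A Bezier curve is defined by a few control points and can be evaluated with de Casteljau's algorithm, which repeatedly interpolates between neighboring control points until a single point remains. A minimal sketch (the control points are arbitrary examples, not values from the study):

```python
def bezier_point(control_points, t):
    """Evaluate a Bezier curve at parameter t in [0, 1] using
    de Casteljau's algorithm: linearly interpolate between each
    pair of neighboring points, repeating until one point remains."""
    pts = [tuple(p) for p in control_points]
    while len(pts) > 1:
        pts = [
            ((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
            for (x0, y0), (x1, y1) in zip(pts, pts[1:])
        ]
    return pts[0]

# A quadratic Bezier: starts at (0,0), ends at (2,0), pulled toward (1,2).
curve = [bezier_point([(0, 0), (1, 2), (2, 0)], i / 10) for i in range(11)]
```

Like the fractal maps, a handful of control-point coordinates fully determines each shape, so curves with labels can be generated automatically; the study's result is that such curves pretrain vision models less effectively than fractals do.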

