An AI algorithm inspired by how kids learn is harder to confuse

Karen Hao

May 6, 2020


Information firehose: The standard practice for teaching a machine-learning algorithm is to give it all the details at once. Say you’re building an image classification system to recognize different species of animals. You show it examples of each species and label them accordingly: “German shepherd” and “poodle” for dogs, for example.

But when a parent is teaching a child, the approach is entirely different. They start with much broader labels: any species of dog is at first simply “a dog.” Only after the child has learned how to distinguish these simpler categories does the parent break each one down into more specifics.

Dispelled confusion: Drawing inspiration from this approach, researchers at Carnegie Mellon University created a new technique that teaches a neural network to classify things in stages. In each stage, the network sees the same training data. But the labels start simple and broad, becoming more specific over time.
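
As a rough illustration of "same data, broader labels first," the sketch below builds one label list per training stage for a fixed set of examples. It is not the researchers' code: the helper names `relabel` and `label_schedule`, the cat breeds, and the dog/cat grouping are all assumptions made for the example, and no model is actually trained here.

```python
def relabel(fine_labels, coarse_of):
    """Map each fine label to its coarse label; unmapped labels stay unchanged."""
    return [coarse_of.get(label, label) for label in fine_labels]


def label_schedule(fine_labels, stage_maps):
    """Return one label list per stage, ending with the original fine labels.

    The examples themselves never change between stages; only the labels do.
    """
    stages = [relabel(fine_labels, mapping) for mapping in stage_maps]
    stages.append(list(fine_labels))  # final stage uses the detailed labels
    return stages


# Hypothetical data: four examples with their detailed labels.
fine = ["german shepherd", "poodle", "tabby", "siamese"]

# Hypothetical early stage: every breed is first just "dog" or "cat".
stage_maps = [
    {"german shepherd": "dog", "poodle": "dog", "tabby": "cat", "siamese": "cat"},
]

for stage, labels in enumerate(label_schedule(fine, stage_maps), start=1):
    print(f"stage {stage}: {labels}")
```

In a real pipeline, each stage's label list would be paired with the same images and used for one round of training before moving on to the next, finer stage.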

To determine this progression of difficulty, the researchers first showed the neural network the training data with the final detailed labels. They then computed what’s known as a confusion matrix, which shows the categories the model had the most difficulty telling apart. The researchers used this to determine the stages of training, grouping the least distinguishable categories together under one label in early stages and splitting them back up into finer labels with each iteration.
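
The grouping step can be sketched as follows. The article does not say how the researchers turned the confusion matrix into groups, so the greedy merge below is an assumption chosen for simplicity, and the function names and toy predictions are hypothetical. It counts how often each pair of classes is mistaken for one another, then repeatedly merges the two groups with the most mutual confusion.

```python
from collections import Counter


def confusion_counts(true_labels, predicted_labels):
    """Count misclassifications per unordered pair of distinct classes."""
    counts = Counter()
    for truth, guess in zip(true_labels, predicted_labels):
        if truth != guess:
            counts[frozenset((truth, guess))] += 1
    return counts


def group_confusable(classes, true_labels, predicted_labels, num_groups):
    """Greedily merge the most-confused groups until num_groups remain.

    Assumption: a simple greedy merge, not necessarily the paper's procedure.
    """
    pair_counts = confusion_counts(true_labels, predicted_labels)
    groups = [{c} for c in classes]
    while len(groups) > num_groups:
        best_pair, best_score = None, -1
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                # Total confusion between any member of group i and group j.
                score = sum(
                    pair_counts[frozenset((a, b))]
                    for a in groups[i]
                    for b in groups[j]
                )
                if score > best_score:
                    best_pair, best_score = (i, j), score
        i, j = best_pair
        groups[i] |= groups[j]
        del groups[j]
    return groups


# Hypothetical predictions from a model trained on the detailed labels.
classes = ["poodle", "german shepherd", "tabby", "siamese"]
truth = ["poodle", "poodle", "german shepherd", "tabby", "tabby", "siamese"]
guess = ["german shepherd", "poodle", "poodle", "siamese", "tabby", "tabby"]

print(group_confusable(classes, truth, guess, num_groups=2))
```

Each resulting group would share a single broad label in an early training stage; asking for more groups yields the finer labelings used in later stages.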

Better accuracy: In tests with several popular image-classification data sets, the approach almost always led to a final machine-learning model that outperformed one trained by the conventional method. In the best-case scenario, it increased classification accuracy by up to 7%.

Curriculum learning: While the approach is new, the idea behind it is not. The practice of training a neural network on increasing stages of difficulty is known as “curriculum learning” and has been around since the 1990s. But previous curriculum learning efforts focused on showing the neural network a different subset of data at each stage, rather than the same data with different labels. The latest approach was presented by the paper’s coauthor Otilia Stretcu at the International Conference on Learning Representations last week.

Why it matters: The vast majority of deep-learning research today emphasizes the size of models: if an image-classification system has difficulty distinguishing between different objects, it means it hasn’t been trained on enough examples. But by borrowing insight from the way humans learn, the researchers found a new method that allowed them to obtain better results with exactly the same training data. It suggests a way of creating more data-efficient learning algorithms.
