Data Mining 200 Years of Patent Office Records To Reveal The Nature of Invention

The elaborate records kept by the US Patent Office since 1790 are allowing researchers to study the nature of invention and how it has changed in 200 years.

One way to think about invention is as a process that combines technologies to fulfill some human need or purpose. In other words, inventions never come out of nowhere. They always build on earlier advances to create something new.

So, for example, the incandescent light bulb uses electricity, a heated filament, inert gas and a glass bulb; an inkjet printer relies on the ability to position matter with extreme precision and to pump ink in very small droplets; and the laser is based on the ability to make highly reflective optical cavities and so on. All these inventions stand on the shoulders of previous advances.

That’s why many technologists think about invention as a combinatorial process—a walk through the entire space of technological permutations. To find a new invention, simply combine various old technologies in a new way.

At least, that’s the idea. But how to test the extent to which it is true? Today, we find out thanks to the work of Hyejin Youn at the University of Oxford in the UK and a few pals. These guys have studied the nature of invention and say that there is good evidence that it is indeed a combinatorial process, at least in part.

Their work relies on data gathered by the US Patent Office, which uses an elaborate system of technology codes to classify the technologies responsible for an invention’s novelty. Inventions that rely on a single technology have a single code. But those that rely on several technologies are given a combination of codes.

That opens up the possibility of an interesting study, they say. Since the US Patent Office records go back to 1790, it ought to be possible to see how the combination of codes has changed over time. In particular, these records should reveal to what extent invention is the refinement of existing combinations of technologies and to what extent it is the result of new combinations of technologies.

And that’s exactly what these guys have done. “To do this we treat patented inventions as carriers of technologies and avail ourselves of the elaborate system of technology codes used by the US Patent Office to classify the technologies responsible for an invention’s novelty,” say Youn and co.

For each invention, they count the number of technology codes associated with it. That allows them to study how the number of inventions, and the combinations of technologies they rely on, have changed with time.

So, to what extent do inventions rely on completely new combinations of technology codes? If most inventions were entirely new, the percentage should be high.

On the other hand, if most inventions are merely revised and improved versions of existing technologies, then they would depend on previously existing combinations of technologies.

The results give an interesting insight into this question. They suggest that some 40 per cent of new inventions rely on previously existing combinations of technologies while about 60 per cent introduce entirely new combinations of technologies.
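The distinction between new and reused combinations can be sketched in a few lines of Python. This is only an illustration of the idea, not the researchers' actual method: it assumes each patent is reduced to an unordered set of technology codes, that patents arrive in chronological order, and that the very first appearance of a combination counts as "new". The function name and the toy codes are hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple


def count_new_and_reused(patents: Iterable[Iterable[str]]) -> Tuple[int, int]:
    """Count patents whose code combination is new vs. previously seen.

    `patents` is assumed to be in chronological order; each item is the
    collection of technology codes attached to one patent.
    """
    seen: Set[FrozenSet[str]] = set()
    new = reused = 0
    for codes in patents:
        combo = frozenset(codes)  # the order of codes does not matter
        if combo in seen:
            reused += 1
        else:
            new += 1
            seen.add(combo)
    return new, reused


# Hypothetical toy data, not real patent classification codes.
toy_patents = [
    {"A"},
    {"A", "B"},
    {"B", "A"},       # same combination as the previous patent
    {"C"},
    {"A", "B", "C"},
]
new, reused = count_new_and_reused(toy_patents)
share_new = new / (new + reused)
```

On real records, the share of new combinations computed this way would be the quantity to compare against the roughly 60 per cent figure reported above.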

That has important implications. One idea is that new inventions can come about through a random walk through the space of all possible permutations of technologies. But the fact that 40 per cent reuse previously existing combinations suggests that invention is not the result of this kind of random search.

Indeed, Youn and co point out that certain parts of the combinatorial space are excluded for reasons of practicality, thereby ruling out inventions such as exploding prosthetics or espresso-making toothbrushes.

What’s more, certain technology “phenotypes” (particular operating systems, dimensions of roads and so on) limit the types of technologies that can later be useful. And this places another important bound on the types of inventions likely to be useful.

For these and other reasons, the number of inventions is vastly smaller than the almost infinite space allowed by combinations of technologies. “The huge gap between the possible and the actual number of combinations indicates that only a small subset of combinations become inventions,” they say.
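To get a feel for how quickly the space of possible combinations grows, here is a rough sketch that counts the non-empty subsets of a set of codes. It assumes a combination is simply an unordered set of distinct codes; the function name and the example sizes are illustrative and do not come from the patent data.

```python
from math import comb
from typing import Optional


def possible_combinations(n_codes: int, max_size: Optional[int] = None) -> int:
    """Number of non-empty code sets drawn from `n_codes` distinct codes.

    If `max_size` is given, only sets with at most that many codes are counted.
    """
    if max_size is None:
        max_size = n_codes
    largest = min(max_size, n_codes)
    # Sum the binomial coefficients C(n, k) for k = 1 .. largest.
    return sum(comb(n_codes, k) for k in range(1, largest + 1))


# With every subset size allowed, the count is 2**n - 1.
small_space = possible_combinations(10)
# Limiting each invention to at most two codes shrinks the space sharply.
pairs_and_singles = possible_combinations(10, max_size=2)
```

Even a few dozen codes allow more combinations than there are patents on record, which is the gap between the possible and the actual that the quote describes.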

There is an interesting comparison here between the way inventions and DNA-based organisms have evolved. Biological evolution is another combinatorial process that relies on only a small number of building blocks, the protein-coding genes, combined together in lots of different ways. That’s not unlike the way inventions rely on a relatively small number of technologies combined in different ways.

What’s more, biological evolution is path-dependent since the success of an adaptation depends on the order in which it follows other changes. And it is a process that is ultimately determined by selection.

Youn and co say there is more work to be done in studying the link between these combinatorial processes. “Studying patent, comparative and systemic records of inventions, will open a way to make quantitative assessments for a counterpart of these features of biological evolution in technological evolution,” they say.

Perhaps. But either way, the use of big data to study the nature of invention has significant potential. There are surely more jewels to be found in them thar hills.

Ref: arxiv.org/abs/1406.2938 : Invention as a Combinatorial Process: Evidence from U.S. Patents
