The US military wants to understand the most important software on Earth

Open-source code runs on every computer on the planet—and keeps America’s critical infrastructure going. DARPA is worried about how well it can be trusted

Patrick Howell O'Neillarchive page

July 14, 2022

Ms Tech

It’s not much of an exaggeration to say that the whole world is built on top of the Linux kernel—although most people have never heard of it.

It is one of the very first programs that load when most computers power up. It enables the hardware running the machine to interact with the software, governs its use of resources, and acts as the foundation of the operating system.

It is the core building block of nearly all cloud computing, virtually every supercomputer, the entire internet of things, billions of smartphones, and more.

But the kernel is also open source, meaning anyone can write, read, and use its code. And that’s got cybersecurity experts inside the US military seriously worried. Its open-source nature means the Linux kernel—along with a host of other pieces of critical open-source software—is exposed to hostile manipulation in ways that we still barely understand.

“People are realizing now: wait a minute, literally everything we do is underpinned by Linux,” says Dave Aitel, a cybersecurity researcher and former NSA computer security scientist. “This is a core technology to our society. Not understanding kernel security means we can’t secure critical infrastructure.”

Now DARPA, the US military’s research arm, wants to understand the collision of code and community that makes these open-source projects work, in order to better understand the risks they face. The goal is to be able to effectively recognize malicious actors and prevent them from disrupting or corrupting crucially important open-source code before it’s too late.

DARPA’s “SocialCyber” program is an 18-month-long, multimillion-dollar project that will combine sociology with recent technological advances in artificial intelligence to map, understand, and protect these massive open-source communities and the code they create. It’s different from most previous research because it combines automated analysis of both the code and the social dimensions of open-source software.

“The open-source ecosystem is one of the grandest enterprises in human history,” says Sergey Bratus, the DARPA program manager behind the project.

“It’s now grown from enthusiasts to a global endeavor forming the basis of global infrastructure, of the internet itself, of critical industries and mission-critical systems pretty much everywhere,” he says. “The systems that run our industry, power grids, shipping, transportation.”

Threats to open source

Much of modern civilization now depends on an ever-expanding corpus of open-source code because it saves money, attracts talent, and makes a lot of work easier.

But while the open-source movement has spawned a colossal ecosystem that we all depend on, we do not fully understand it, experts like Aitel argue. There are countless software projects, millions of lines of code, numerous mailing lists and forums, and an ocean of contributors whose identities and motivation are often obscure, making it hard to hold them accountable.

That can be dangerous. For example, hackers have quietly inserted malicious code into open-source projects numerous times in recent years. Back doors can long escape detection, and, in the worst case, entire projects have been handed over to bad actors who take advantage of the trust people place in open-source communities and code. Sometimes there are disruptions or even takeovers of the very social networks that these projects depend on. Tracking it all has been mostly—though not entirely—a manual effort, which means it does not match the astronomical size of the problem.

Bratus argues that we need machine learning to digest and comprehend the expanding universe of code—meaning useful tricks like automated vulnerability discovery—as well as tools to understand the community of people who write, fix, implement, and influence that code.

The ultimate goal is to detect and counteract any malicious campaigns to submit flawed code, launch influence operations, sabotage development, or even take control of open-source projects.

To do this, the researchers will use tools such as sentiment analysis to analyze the social interactions within open-source communities such as the Linux kernel mailing list, which should help identify who is being positive or constructive and who is being negative and destructive.

The researchers want insight into what kinds of events and behavior can disrupt or hurt open-source communities, which members are trustworthy, and whether there are particular groups that justify extra vigilance. These answers are necessarily subjective. But right now there are few ways to find them at all.

Experts are worried that blind spots about the people who run open-source software make the whole edifice ripe for potential manipulation and attacks. For Bratus, the primary threat is the prospect of “untrustworthy code” running America’s critical infrastructure—a situation that could invite unwelcome surprises.

Unanswered questions

Here’s how the SocialCyber program works. DARPA has contracted with multiple teams of what it calls “performers,” including small, boutique cybersecurity research shops with deep technical chops.

One such performer is New York–based Margin Research, which has put together a team of well-respected researchers for the task.

“There is a desperate need to treat open-source communities and projects with a higher level of care and respect,” said Sophia d’Antoine, the firm’s founder. “A lot of existing infrastructure is very fragile because it depends on open source, which we assume will always be there because it’s always been there. This is walking back from the implicit trust we have in open-source code bases and software.”

Margin Research is focused on the Linux kernel in part because it’s so big and critical that succeeding here, at this scale, means you can make it anywhere else. The plan is to analyze both the code and the community in order to visualize and finally understand the whole ecosystem.

Margin’s work maps out who is working on what specific parts of open-source projects. For example, Huawei is currently the biggest contributor to the Linux kernel. Another contributor works for Positive Technologies, a Russian cybersecurity firm that—like Huawei—has been sanctioned by the US government, says Aitel. Margin has also mapped code written by NSA employees, many of whom participate in different open-source projects.

“This subject kills me,” says d’Antoine of the quest to better understand the open-source movement, “because, honestly, even the most simple things seem so novel to so many important people. The government is only just realizing that our critical infrastructure is running code that could be literally being written by sanctioned entities. Right now.”

This kind of research also aims to find underinvestment—that is critical software run entirely by one or two volunteers. It’s more common than you might think—so common that one common way software projects currently measure risk is the “bus factor”: Does this whole project fall apart if just one person gets hit by a bus?

While the Linux kernel’s importance to the world’s computer systems may be the most pressing issue for SocialCyber, it will tackle other open-source projects too. Certain performers will focus on projects like Python, an open-source programming language used in a huge number of artificial-intelligence and machine-learning projects.

The hope is that greater understanding will make it easier to prevent a future disaster, whether it’s caused by malicious activity or not.

“Pretty much everywhere you look, you find open-source software,” says Bratus.“Even when you look at proprietary software, a recent study showed it’s actually 70% or more open source.”

“This is a critical infrastructure problem,” Aitel says. “We don’t have a grip on it. We need to get a grip on it. The potential impact is that malicious hackers will always have access to Linux machines. That includes your phone. It’s that simple.”

Deep Dive

Computing

It’s time to retire the term “user”

The proliferation of AI means we need a new word.

Taylor Majewskiarchive page

How ASML took over the chipmaking chessboard

MIT Technology Review sat down with outgoing CTO Martin van den Brink to talk about the company’s rise to dominance and the life and death of Moore’s Law.

How Wi-Fi sensing became usable tech

After a decade of obscurity, the technology is being used to track people’s movements.

Meg Duffarchive page

Why it’s so hard for China’s chip industry to become self-sufficient

Chip companies from the US and China are developing new materials to reduce reliance on a Japanese monopoly. It won’t be easy.

Zeyi Yangarchive page

Stay connected

Illustration by Rose Wong

Get the latest updates from
MIT Technology Review

Discover special offers, top stories, upcoming events, and more.

The US military wants to understand the most important software on Earth

Threats to open source

Unanswered questions

Deep Dive

Computing

It’s time to retire the term “user”

How ASML took over the chipmaking chessboard

How Wi-Fi sensing became usable tech

Why it’s so hard for China’s chip industry to become self-sufficient

Stay connected

Get the latest updates from
MIT Technology Review

The latest iteration of a legacy

Advertise with MIT Technology Review

About

Help

Threats to open source

Unanswered questions

Deep Dive

Computing

It’s time to retire the term “user”

How ASML took over the chipmaking chessboard

How Wi-Fi sensing became usable tech

Why it’s so hard for China’s chip industry to become self-sufficient

Stay connected

Get the latest updates fromMIT Technology Review

Get the latest updates from
MIT Technology Review