

    With this tool, AI could identify new malware as readily as it recognizes cats

    A huge data set will help train algorithms to spot the nasty programs hiding in our computers.

    From ransomware to botnets, malware takes seemingly endless forms, and it’s forever proliferating. Try as we might, the humans who would defend our computers from it are drowning in the onslaught, so they are turning to AI for help.

    There’s just one problem: machine-learning tools need a lot of data. That’s fine for tasks like computer vision or natural-language processing, where large, open-source data sets are available to teach algorithms what a cat looks like, say, or how words relate to one another. In the world of malware, such a thing hasn’t existed—until now.

    This week, the cybersecurity firm Endgame released a large, open-source data set called EMBER (for “Endgame Malware Benchmark for Research”). EMBER is a collection of more than a million representations of benign and malicious Windows portable executable (PE) files, a format where malware often hides. A team at the company also released AI software that can be trained on the data set. The idea is that if AI is to become a potent weapon in the fight against malware, it needs to know what to look for.

    Security firms have a sea of potential data to train their algorithms on, but that’s a mixed blessing. The bad actors who make malware are constantly tweaking their code in an effort to stay ahead of detection, so training on malware samples that are out of date could prove an exercise in futility.


    “It’s a game of whack-a-mole,” says Charles Nicholas, a computer science professor at the University of Maryland, Baltimore County.

    EMBER is meant to help automated cybersecurity programs keep up.

    Instead of a collection of actual files, which could infect the computer of any researcher using them, EMBER contains a kind of avatar for each file, a digital representation that gives an algorithm an idea of the characteristics associated with benign or malicious files without exposing it to the genuine article. 
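    To make the idea of a file "avatar" concrete, here is a simplified, hypothetical sketch (not EMBER's own code) that reduces a file's raw bytes to a fixed-length list of numbers: the share of the file taken up by each of the 256 possible byte values. A data set of vectors like this can be shared and used to train a model without distributing the files themselves. The function name `byte_histogram` and the sample bytes are illustrative assumptions; EMBER's actual representations capture far more than this single feature.

```python
from collections import Counter


def byte_histogram(data: bytes) -> list[float]:
    """Hypothetical feature: fraction of the input made up of each byte value 0-255."""
    if not data:
        # An empty input gets an all-zero vector so the length stays fixed at 256.
        return [0.0] * 256
    counts = Counter(data)  # iterating over bytes yields ints in 0..255
    total = len(data)
    return [counts.get(value, 0) / total for value in range(256)]


# Illustrative input: Windows PE files begin with the two-byte "MZ" signature.
sample = b"MZ\x90\x00\x03\x00\x00\x00"
features = byte_histogram(sample)
print(len(features))   # 256
print(features[0x00])  # 0.5
```

    Because every file maps to a vector of the same length, a learning algorithm can compare benign and malicious samples directly, whatever their original size.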

    This should help those in the cybersecurity community quickly train and test out more algorithms, enabling them to construct better and more adaptable malware-hunting AI.

    Of course, making the data set open for anyone to use could also prove a liability if it were to fall into the wrong hands. Malware creators could use the data to design systems that virus-hunting AI won’t recognize, a problem that Hyrum Anderson, Endgame’s technical director of data science, says the company has thought through. Anderson, who worked on EMBER, says that he hopes the benefits of this openness outweigh the risks. Besides, cybercrime is so lucrative that the people behind malware are already well motivated to keep refining their attack tools.

    “The hacker will find an example anyway,” says Gerald Friedland, a computer science professor at the University of California, Berkeley.
