Harvard just put more than 6 million court cases online to give legal AI a boost

Erin Winick

October 31, 2018

After five years of work, nearly 6.5 million US court cases are now freely available online.

The news: The Library Innovation Lab at the Harvard Law School Library has completed its Caselaw Access Project, an endeavour to digitize every reported state and federal US legal case from the 1600s to last summer. The process involved scanning more than 40 million pages.

Why is this needed? One of the biggest hurdles to developing artificial intelligence for legal applications is the lack of access to data. To train their software, legal AI companies have often had to build their own databases by scraping whatever websites have made information public and making deals with companies for access to their private legal files.

What it means: Now that millions of cases are online for free, a good training source will be easily available. Programs will also be able to more easily search case text to provide lawyers with relevant background research for cases. As Adam Ziegler, the managing director of the Library Innovation Lab, told us last year: “I think there will be a lot more experimentation, and the progress will accelerate. It’s really hard to build a smart interface if you can’t get to the basic data.”
