MIT Technology Review

Harvard just put more than 6 million court cases online to give legal AI a boost

After five years of work, nearly 6.5 million US court cases are now freely available online.

The news: The Library Innovation Lab at the Harvard Law School Library has completed its Caselaw Access Project, an endeavour to digitize every reported state and federal US legal case from the 1600s to last summer. The process involved scanning more than 40 million pages.

Why is this needed? One of the biggest hurdles to developing artificial intelligence for legal applications is the lack of access to data. To train their software, legal AI companies have often had to build their own databases by scraping whatever websites have made information public and making deals with companies for access to their private legal files.

What it means: Now that millions of cases are online for free, a good source of training data is readily available. Programs will also be able to search case text more easily to provide lawyers with relevant background research for cases. As Adam Ziegler, the managing director of the Library Innovation Lab, told us last year: “I think there will be a lot more experimentation, and the progress will accelerate. It’s really hard to build a smart interface if you can’t get to the basic data.”
