The grassroots push to digitize India’s most precious documents

The Servants of Knowledge collection on the Internet Archive is an effort to make up for the lack of library resources in India.

October 25, 2023

On a bright sunny day in August, in a second-floor room at the Gandhi Bhavan Museum in Bengaluru, workers sit in front of five giant tabletop scanners, lining up books and flipping pages with foot pedals. The museum building houses the largest reference library for Gandhian philosophy in the state of Karnataka, and over the next year, the large assortment of books—including the collected works of Mahatma Gandhi, a translation of his autobiography, Experiments with Truth, into the Kannada language, and other rare items—will be digitized and their metadata recorded before they join the Servants of Knowledge (SoK) collection on the Internet Archive. 

This digitization push is just the latest for the SoK, which was established about four years ago with a volunteer effort to preserve hard-to-find resources. It has since expanded to include partnerships with various libraries and archives throughout India.

[Image: screenshot from the Internet Archive. The Servants of Knowledge digital collection aims to make up for the scarcity of library resources in India.]

Today, the SoK collection is a searchable library of books, speeches, magazines, newspapers, palm leaf manuscripts, audio, and film from and about India in over 15 languages. The collection is a truly open digital library containing public-domain and out-of-copyright works on science, literature, law, politics, history, religion, music, and folklore, among many other topics. All content is open access, searchable, downloadable, and accessible to visually challenged people using text-to-speech tools. Volunteers and staff continue to expand the collection, scanning about 1.4 million pages per month in various locations across Bengaluru, and more collaborations are in the works.

The collection is an effort to make up for the scarcity of library resources in India. There are about 50,000 publicly funded libraries in this country of over 1.4 billion people, according to the Raja Rammohun Roy Library Foundation, a group established by the Indian government to promote the public-library movement there. Village and tribal libraries may contain just a few thousand books, compared with a median of 77,000 books in each state’s central library and 24,000 in each district library, according to a 2018 report by the foundation. Some libraries have lost their collections to fire. A number of books have been ruined by neglect. Others have gone missing.

Moreover, most public libraries aren’t freely accessible to the public. “Getting access to many of our public libraries is so difficult, and after a point people will give up asking for access. That’s the case in many of our public-funded educational institutes too,” says Arul George Scaria, an associate professor at the National Law School of India University Bengaluru, who studies intellectual-property law. One of the best ways to liberate access to these libraries, he says, is through digitization.

Technologist Omshivaprakash H L felt the acute lack of such resources when he needed references for writing Wikipedia articles in Kannada, a southwestern Indian language. Around 2019, he heard that Carl Malamud, who runs Public Resource, a registered US charity, was already archiving books like Gandhi’s Hind Swaraj collection on Indian self-rule and works of the Indian government in the public domain. “I also knew that he used to buy a lot of these books from secondhand bookstores and take them to the US to get them digitized,” says Omshivaprakash. 

Public Resource had been working with the Indian Academy of Sciences, Bengaluru, to digitize its books using a scanner provided by the Internet Archive, but the efforts had tapered off. Omshivaprakash proposed engaging community members to help. On weekends, these volunteers began scanning books from Omshivaprakash’s own collection along with those Malamud had bought. “Carl really understood the idea of community collaboration, the idea of local language technology that we needed, and the kind of impact we were creating,” Omshivaprakash says.

The scanners use a V-shaped cradle to hold the books and two DSLR cameras to capture the pages in high resolution. The device is based on the Internet Archive’s scanner but was reengineered by Omshivaprakash and manufactured in India at a lower cost. Each worker can scan about 800 pages an hour. 
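To put the scanning rate in context, the following minimal sketch works through the arithmetic using the two figures reported in this article: about 800 pages per worker-hour and about 1.4 million pages per month. The 160 working hours per worker per month is purely an illustrative assumption, not a figure from the project, and the function names are hypothetical.

```python
import math

# Figures reported in the article.
PAGES_PER_WORKER_HOUR = 800
PAGES_PER_MONTH = 1_400_000

# Illustrative assumption only (8 hours a day, 20 days a month);
# the article does not say how many hours each worker scans.
ASSUMED_HOURS_PER_WORKER_PER_MONTH = 160


def worker_hours_needed(pages: int, pages_per_hour: int = PAGES_PER_WORKER_HOUR) -> float:
    """Total scanning hours required to capture the given number of pages."""
    return pages / pages_per_hour


def workers_needed(pages: int, hours_per_worker: int,
                   pages_per_hour: int = PAGES_PER_WORKER_HOUR) -> int:
    """Smallest whole number of workers that covers the required hours."""
    return math.ceil(worker_hours_needed(pages, pages_per_hour) / hours_per_worker)


if __name__ == "__main__":
    hours = worker_hours_needed(PAGES_PER_MONTH)
    staff = workers_needed(PAGES_PER_MONTH, ASSUMED_HOURS_PER_WORKER_PER_MONTH)
    print(f"Scanning hours per month: {hours:.0f}")
    print(f"Workers needed under the assumed schedule: {staff}")
```

Under these assumptions, the monthly page count corresponds to 1,750 scanning hours. The estimate covers page capture only and leaves out the metadata and text-recognition work that follows each scan.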

The more crucial parts of the operation happen after the scan. Volunteers apply accurate metadata so that the scans are findable on the Internet Archive, and optical character recognition, which has been fine-tuned to work better for a range of Indian-language scripts, makes the text searchable and accessible through text-to-speech programs.
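As a rough illustration of why that metadata matters, the sketch below builds a query URL for the Internet Archive's public advanced-search endpoint, which matches against item metadata rather than the recognized page text. The collection identifier `ServantsOfKnowledge` is an assumption and should be checked against the collection's page on archive.org; the function name and the search phrase are hypothetical examples. The code only constructs the URL and makes no network request.

```python
from urllib.parse import urlencode

# Public metadata search endpoint of the Internet Archive.
ADVANCED_SEARCH = "https://archive.org/advancedsearch.php"

# Assumed collection identifier; verify it on archive.org before relying on it.
ASSUMED_COLLECTION = "ServantsOfKnowledge"


def build_search_url(phrase: str, collection: str = ASSUMED_COLLECTION,
                     rows: int = 20) -> str:
    """Return a URL that searches one collection's metadata for a phrase."""
    query = f'collection:{collection} AND "{phrase}"'
    params = [
        ("q", query),
        ("fl[]", "identifier"),  # fields to return for each matching item
        ("fl[]", "title"),
        ("rows", str(rows)),     # maximum number of results
        ("output", "json"),
    ]
    return f"{ADVANCED_SEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical example: look up items whose metadata mentions an author.
    print(build_search_url("Pavem Acharya"))
```

An item scanned without a title, author, or language in its metadata would not surface in a query like this one, which is why the cataloging step carries so much weight.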

Public Resource funds the SoK project, and Omshivaprakash manages the operation, with the help of staff and volunteers. Collaborators have come through social media and word of mouth. For instance, a community member and Kannada teacher named Chaya Acharya approached Omshivaprakash with newspaper clippings of work by her grandfather, the renowned journalist and writer Pavem Acharya, who wrote articles on science and social issues as well as satirical essays. Unexpectedly, she found more articles by her grandfather in the existing Servants of Knowledge collection. “Simply by searching his name, I got many more articles from the archive,” she says. She began collecting copies of Kasturi, a prominent Kannada monthly magazine that Pavem Acharya had edited from 1952 to early 1975, and gave them to Omshivaprakash for digitizing. The old issues of the magazine contain rare writings and translations by popular Kannada authors, such as Indirabai by Gulavadi Venkata Rao, regarded as the first modern novel in Kannada, and a Kannada translation of Edgar Allan Poe’s famous short story “The Gold-Bug.”

This is all part of a vision of a public library on the internet as “a bottom-up, grassroots thing,” Malamud says. “It’s a bunch of people teaching each other. We just want to keep scanning and making [these materials] available to people. It’s not a grand goal or single aim. 

“It’s what we do for a living,” he says. “We have done it for years, and we are gonna keep doing it for years.”

Ananya is a freelance science and technology journalist based in Bengaluru, India.
