The next step, explains Barzilay, is to add structure to the transcribed words. Software was already available that could break up long strings of sentences into high-level concepts, but she found that it didn’t do the trick with the lectures. So her group designed its own. “One of the key distinctions,” she says, “is that, during a lecture, you speak freely; you ramble and mumble.”
To organize the transcribed text, her group created software that breaks the text into chunks that often correspond to individual sentences. The software places these chunks in a network structure; chunks that share similar words or were spoken close together in time are placed closer together in the network. The relative distance between chunks in the network lets the software decide which sentences belong with each topic or subtopic in the lecture.
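To make the idea concrete, here is a minimal sketch of how chunks might be linked by shared words and closeness in time, then grouped into topics. It is an illustration only, not the group's software: the `Chunk` type, the `affinity` formula (word overlap damped by an exponential time decay), the `time_scale` and `threshold` values, and the greedy grouping rule are all assumptions made for this example.

```python
import math
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    start: float  # seconds from the start of the lecture (hypothetical field)


def words(text):
    """Lowercased set of word tokens in a chunk."""
    return set(re.findall(r"[a-z']+", text.lower()))


def affinity(a, b, time_scale=60.0):
    """Edge weight between two chunks: Jaccard word overlap,
    damped by how far apart the chunks were spoken (assumed formula)."""
    wa, wb = words(a.text), words(b.text)
    if not wa or not wb:
        return 0.0
    overlap = len(wa & wb) / len(wa | wb)
    return overlap * math.exp(-abs(a.start - b.start) / time_scale)


def build_graph(chunks):
    """Symmetric matrix of affinities; higher weight means 'closer' in the network."""
    n = len(chunks)
    return [
        [affinity(chunks[i], chunks[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]


def segment(chunks, threshold=0.05):
    """Greedy grouping: start a new topic whenever a chunk's mean affinity
    to the current topic falls below the (assumed) threshold."""
    if not chunks:
        return []
    graph = build_graph(chunks)
    topics = [[0]]
    for i in range(1, len(chunks)):
        current = topics[-1]
        mean = sum(graph[i][j] for j in current) / len(current)
        if mean < threshold:
            topics.append([i])
        else:
            current.append(i)
    return topics


lecture = [
    Chunk("graphs have nodes and edges", 0),
    Chunk("edges connect nodes across graphs", 10),
    Chunk("sorting arranges numbers in order", 20),
    Chunk("quick sorting puts numbers in order fast", 30),
]
print(segment(lecture))  # prints [[0, 1], [2, 3]]
```

A production system would likely use a more principled graph-partitioning method than this greedy pass, but the sketch shows how word overlap and timing can jointly determine where one topic ends and the next begins.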
The result, she says, is a coherent transcription. When a person searches for a keyword, the browser offers results in the form of a video or audio timeline that is partitioned into sections. The section of the lecture that contains the keyword is highlighted; below it are snippets of text that surround each instance of the keyword. When a video is playing, the browser shows the transcribed text below it.
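The search behavior described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the lecture browser's code: the `Section` type, the `search` function, and the `context` parameter (the number of words shown on each side of a match) are hypothetical names chosen for this example.

```python
import re
from dataclasses import dataclass


@dataclass
class Section:
    start: float  # section start time in seconds (hypothetical field)
    end: float    # section end time in seconds (hypothetical field)
    text: str     # transcribed text for this section


def search(sections, keyword, context=3):
    """Return (section index, snippets) for each section that mentions the keyword.
    The index tells a player which part of the timeline to highlight."""
    results = []
    for idx, section in enumerate(sections):
        tokens = section.text.split()
        snippets = []
        for pos, token in enumerate(tokens):
            # Strip punctuation so "order." matches the keyword "order".
            if re.sub(r"\W+", "", token).lower() == keyword.lower():
                lo = max(0, pos - context)
                snippets.append(" ".join(tokens[lo:pos + context + 1]))
        if snippets:
            results.append((idx, snippets))
    return results


sections = [
    Section(0, 300, "Today we introduce graphs and their basic properties."),
    Section(300, 600, "A sorting algorithm puts items in order. Merge sorting is stable."),
]
for idx, snippets in search(sections, "sorting", context=2):
    print(sections[idx].start, sections[idx].end, snippets)
```

Each result pairs a section's position on the timeline with the snippets of text surrounding every instance of the keyword in that section.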
Barzilay says that the browser currently receives an average of 21,000 hits a day, and while it’s proving popular, there is still work to be done. Within the next few months, her team will add a feature that automatically attaches a text outline to lectures so users can jump to a desired section. Further ahead, the researchers will give users the ability to make corrections to the transcript in the same way that people contribute to Wikipedia. While such improvements seem straightforward, they pose technical challenges, Barzilay says. “It’s not a trivial matter, because you want an interface that’s not tedious, and you need to propagate the correction throughout the lecture and to other lectures.” She says that bringing people into the transcription loop could improve the accuracy of the system by a couple of percentage points, making the user experience even better.