Jott: Calling It In

Ever had a great idea while you’re stuck in traffic? No problem. Pick up your phone and Jott yourself an e-mail.

Wade Roush

March 30, 2007

For years, Google has aggressively resisted the verbification of its name, to the point of scolding individual technology reporters every time they write “Googled” when they mean “searched.” Now there’s a tech company with the temerity to verbify its name from the very beginning. It’s Jott, a Seattle startup founded by two former Microsoft employees who want to help people capture their thoughts and ideas electronically, even if they’re nowhere near a computer keyboard. You can “jott” by calling Jott’s toll-free number from your cell phone, specifying who should receive your message (for example, “myself” or “family”), and dictating for up to 30 seconds. Within minutes, your message or reminder is transcribed and e-mailed or text-messaged to the appropriate parties.

**Talk time:** With Jott, a user can call a toll-free number and dictate a message up to 30 seconds long. The resulting audio file is transcribed by workers in India, and its contents are sent to the user as an e-mail or text message.

“People have some of their greatest ideas when they’re away from their PCs,” says Jott CEO and cofounder John Pollard. “The only appliances they have with them all the time are their phones and their voices. So we said, let’s take advantage of that and help people put these thoughts they’re not otherwise remembering into a form where they can actually do something with them later.”

Jott launched its service in December and upgraded it this week, adding features such as the ability to address a message to multiple people. Say you’re stuck at the airport and you’re going to miss a business meeting. You can use the new feature, called JottCast, to notify all your colleagues with a single call (assuming that you’ve used Jott’s website beforehand to collect their e-mail addresses under a single alias such as “Team”).

Jott’s service is free, at least for now. “We didn’t feel comfortable rolling out a business model when we didn’t really know how people were going to use the product,” says Pollard. But he says it’s likely that Jott, like so many other Web 2.0 products, will eventually adopt the so-called freemium model endorsed by many venture capital firms: attaching ads to the basic free service, waiting for interest to spread through word of mouth, and then adding a premium service for a monthly fee.

Hands-free messaging is not a new concept, of course. Transcribing the medical notes that many doctors dictate every day by telephone is a $20 billion business. Cell phones that can store digital voice memos have been on the market for years.

And Jott is only one of several young companies experimenting with new services marrying voice, text messages, and the Internet. In November 2006, voice-over-Internet company ViaTalk introduced Braincast, which works much like Jott, except that it delivers the actual sound file recorded by the user rather than a text transcription. (Jott sends both.) QTech of Hyderabad, India, is testing a similar service called ReQall. Pinger lets users send voice mail without actually placing phone calls; the recipient gets a text message with a link to the audio file. British firm SpinVox works with cellular carriers to turn subscribers’ voice mails into e-mails, text messages, or blog entries. And for $9.99 a month, SimulScribe will convert up to 40 of your voice mails into text messages.

Jott’s strength may be its convenience. It doesn’t involve a software download (as ReQall does), a cellular-carrier middleman, or (as yet) any fees. Pollard says it was important to him and cofounder Shreedhar Madhavapeddi to build something simple. “The core philosophy of Jott is that we want to use stuff that’s already entrenched in your life; you don’t have to buy a new phone or download a bunch of new software,” he says.

Even the speech-to-text process at Jott is low-tech. Jott’s phone system makes sense of contact names such as “myself” using speech recognition software, but such software is still far too primitive to deal with the unrestricted vocabulary that callers use in their actual messages, not to mention rushed or garbled speech or audio junk such as ums and uhs. So Jott saves messages as sound clips on central servers. Human workers at a large call center in India log onto the servers, listen to the most recent clips, and transcribe them manually. In case a transcription is murky, every e-mail from Jott also contains a link to the original sound clip.

“Over time, speech recognition software will get good enough so that shorter, clearer Jotts will be transcribed in a completely automated fashion or will at worst be sent to a quality-assurance person who reads them, listens to the sound file, and says, ‘Yes, this is completely accurate,’” says Pollard. “But for now, 100 percent of Jotts go through a human being.”

In my tests, Jott transcriptions arrived in my e-mail in-box quickly (within about 10 minutes of my calling the service) and were remarkably accurate. Jott’s transcribers even corrected one of my own mistakes. I was leaving myself a reminder to pick up plaque remover for my dog’s water bowl, and in an attempt to be helpful, I spelled out the word “plaque.” But instead of “P-L-A-Q-U-E,” I said “P-L-A-Q-E.” Someone in India thoughtfully inserted the missing “u.”

Pollard won’t reveal how many people are using Jott so far, but he says the company is beating its own projections. Some users have told the company that they already depend on Jott for critical tasks such as coordinating in-home medical care for elderly relatives, Pollard says, making him optimistic that many users can be converted into premium customers. “We’ve had plenty of people say they would be willing to pay for something like this,” he says.
