Today, at universities and companies alike, everything from broken printers to failed Web transactions generates computer error messages. And while technicians pore over reams of system status “logs” (text files that document what’s happening in the network), everyone else loses precious time trying to access disconnected file servers, Internet links, and e-mails.
Now, a new tool is emerging to break the logs’ logjam. Researchers at San Jose, CA-based Cisco Systems and IBM Research in Yorktown Heights, NY, have developed software that scours logs, converts them to a standard format, and automatically extracts important information. The key is machine-learning algorithms that let system managers teach computers new tricks. If a log states that, say, a server is down, the system manager flags “down” as a keyword and instructs the software to search for the server name, time of failure, and any ripple effects in the network. The software can then apply this instruction to new messages, reducing the need for human intervention. “This is an important step in automating networks,” says Cynthia Hood, a computer scientist and expert in network management at the Illinois Institute of Technology. “Everyone knows how much money is spent on configuring networks and keeping them running.”
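To make the keyword idea concrete, here is a minimal sketch in Python. It is not the Cisco and IBM software, and it does not do any machine learning: it only shows the kind of rule a system manager might teach the system, applied unchanged to new messages. The log layout (timestamp, server name, free-text message), the `Event` record, and the `extract_event` helper are all assumptions invented for this illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical log line layout, assumed for illustration only:
#   "<timestamp> <server name> <free-text message>"
LINE_PATTERN = re.compile(r"^(?P<time>\S+)\s+(?P<server>\S+)\s+(?P<message>.*)$")


@dataclass
class Event:
    """A log message converted to one standard record (hypothetical format)."""
    keyword: str
    server: str
    time: str
    message: str


def extract_event(line: str, keywords: set[str]) -> Optional[Event]:
    """Return an Event if the message contains a flagged keyword, else None."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None  # line does not follow the assumed layout
    # Compare whole words so that "shutdown" does not count as "down".
    for word in re.findall(r"[a-z]+", match["message"].lower()):
        if word in keywords:
            return Event(word, match["server"], match["time"], match["message"])
    return None


# The manager flags "down" once; the same rule then applies to every new message.
flagged = {"down"}
logs = [
    "2003-05-01T09:14:02 fileserver01 link is down",
    "2003-05-01T09:14:05 printer07 paper tray empty",
    "2003-05-01T09:15:40 mailhost02 service down, retrying",
]
events = [e for line in logs if (e := extract_event(line, flagged)) is not None]
for event in events:
    print(event.server, event.time, event.keyword)
```

Only the two messages containing the flagged word are turned into records; the printer message is ignored. Tracing ripple effects through the network, which the researchers' software is said to do, would need knowledge of how machines depend on one another and is left out of this sketch.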
Large companies like Toshiba, Hewlett-Packard, and Computer Associates are currently evaluating the technology. This could mean users of computer networks will soon encounter fewer disruptions, and for shorter durations. The bottom line, says Alan Ganek, vice president of IBM’s autonomic-computing initiative, is that identifying problems quickly, or even before they occur, lets users “focus on their business and not their infrastructure.” It all starts with teaching old logs new tricks.