Technology Review
Specialized transistors track hardware bugs as they happen.
As microprocessors get smaller and more intricate, finding the hardware bugs that can cause a computer to crash requires more time, money, and engineering effort. But now engineers at Stanford University have proposed a shortcut that could help locate bugs in a fraction of the time.
Debugging normally involves putting a chip through a battery of tests to identify spots that are likely to fail and to give engineers a chance to fix problems before the chips go into mass production. As chip-making companies push the functionality of their hardware, this becomes increasingly complicated.
Subhasish Mitra, professor of electrical engineering and computer science at Stanford, and colleagues have developed a method that uses a small number (about 1 percent) of the transistors on a chip to record a log of chip activity--the instructions that pass through the chip's circuits. This log can be extracted from the chip, dumped into a computer, and analyzed to find out where the bugs are.
"It's enormously expensive to diagnose where chips are failing," says Rob Rutenbar, professor of computer science at the University of Illinois, who wasn't involved with the research. As the features on microprocessors get smaller, Rutenbar says, "people worry more about wear-out and reliability issues."
Engineers test for bugs throughout the making of a chip. First, they scour the designs to find any so-called functional or logic errors. Then, after the designs have been etched into silicon, engineers look for bugs that crop up under operating conditions such as playing video or browsing the Web. This process is called post-silicon debugging, and it accounts for 30 to 40 percent of the time and money that companies like Intel and AMD spend on making a new chip, says Mitra.
During the post-silicon phase, engineers pulse electrical signals through the chip, mimicking the electrical activity seen during normal operation. If a chip fails during these tests, engineers try to re-create the electrical signals that caused the problem. Next, they try to pinpoint the exact set of instructions and conditions responsible for the failure. But simulation takes time: a single second in silicon can be equivalent to days of simulation, says Mitra. Moreover, many of the errors occur due to operating temperatures and workloads that are difficult to re-create. "The trouble is, the whole electrical state of the system changes," Mitra says.
So Mitra and Stanford graduate student Sung-Boem Park decided to catch evidence of the bugs while they happen, eliminating much of the time spent doing electrical simulations. The challenge was finding the right way to record information about chip instructions without using too many transistors and without storing too much information. To do this, they built recording devices, or buffers, into chips. This is not a new idea. In fact, almost any kind of commercially available chip today has a small number of transistors whose job is to hold small amounts of data about chip activity--to ensure, for instance, that operations are synchronized across the chip.
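The idea of a small on-chip buffer that logs recent activity can be illustrated with a software analogy. The sketch below is not the Stanford team's design, which the article does not detail. It assumes a fixed-capacity log that keeps only the most recent instructions and is read out when a failure is seen. The names `TraceBuffer`, `run_with_trace`, and `is_failure` are hypothetical and exist only for this illustration.

```python
from collections import deque


class TraceBuffer:
    """Fixed-size log that keeps only the most recent entries (hypothetical)."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest entry when full,
        # which mimics a small buffer that cannot store everything.
        self.entries = deque(maxlen=capacity)

    def record(self, cycle, instruction):
        self.entries.append((cycle, instruction))

    def dump(self):
        # Copy the log out, oldest entry first, for offline analysis.
        return list(self.entries)


def run_with_trace(program, capacity, is_failure):
    """Step through a list of instruction names, logging each one.

    Returns the buffered log at the first instruction for which
    is_failure(instruction) is true, or None if no failure occurs.
    """
    trace = TraceBuffer(capacity)
    for cycle, instruction in enumerate(program):
        trace.record(cycle, instruction)
        if is_failure(instruction):
            return trace.dump()
    return None


if __name__ == "__main__":
    program = ["load", "add", "store", "mul", "div0", "jump"]
    log = run_with_trace(program, capacity=3, is_failure=lambda i: i == "div0")
    print(log)
```

With a capacity of 3, only the three instructions leading up to and including the failure survive in the log. That is the trade-off the paragraph describes: a smaller buffer costs fewer transistors but preserves less history.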
"designed to collect just the right amount of information about the chip's activity at just the right time."
This leaves me frustrated and confused. What information, and at what time? Without knowing that, it's like reading about magic. Decades ago I did work in logic diagnostics, and I'd like a little more than hand-waving when I read about it.
"When a failure or hint of an impending failure is detected..."
Again, how does the new circuitry sense "impending failure"? I feel like I've learned nothing from this article.
bmatichuk
Wireless transmission
A company in Edmonton called Scanimetrics has developed a technology called WiTAP (Wireless Test Access Port) to read results from a chip in process without touching the chip. This could be used with or without logging. The main advantage is reduced chip real estate because processing can all be done off chip.