The smaller a silicon transistor becomes, the more electrons it leaks. That can mean unreliable, battery-draining chips. Researchers at Intel have come up with a way of dealing with the problem that subverts the industry’s strong preference for precision. The company’s prototype chip operates in a low-power but error-prone mode, but it detects and corrects its errors. The researchers found that this approach saves 37 percent on power compared with running in conventional mode, with no loss of performance.
One way to ensure better performance, even as transistors get smaller and leakier, is to operate them at a relatively high voltage all the time. Most microprocessors today are designed to run at a level that represents a kind of worst-case scenario, says Wen-Hann Wang, director of circuits and systems research at Intel and vice president of Intel Labs in Hillsboro, OR. But it’s rare that a user is doing so many things at once (say, playing a graphics-rich game, uploading video to Facebook, and surfing the Web) that the microprocessor needs to be running in its highest range.
And the high-voltage, high-performance design strategy is becoming a problem for mobile devices, where battery life is important. One way to prolong battery life is to run the chip at a lower voltage, but this leads to errors.
“When a circuit operates at a low voltage, the system gets noisy,” says Wang. Circuits running at low voltages are particularly vulnerable to variations in temperature, and to a phenomenon called “voltage droop”: running a low level of electrical current through billions of transistors at the same time is like taking a shower while the washing machine and dishwasher are running. Just as this heavy water usage can cause a drop in water pressure, running many operations at low voltage can cause sudden drops in the voltage reaching an individual transistor, and this can lead to errors. Another source of errors that becomes more of a problem at low voltages is the inconsistency that emerges as a chip ages.
These errors are rare, but significant. For example, they might lead an image to freeze as it’s being rendered, forcing the user to restart the process. To cope with the errors that occur when running at low voltage, Intel is developing a strategy the company calls “resilient” circuits. “You don’t know how things will vary, and in which circuits errors will happen,” says Wang. “But if you don’t worry about it, it will be okay most of the time.”
The company’s prototype chip is based on the 45-nanometer transistors in its products today, but it incorporates resilient circuits. The chip is run at low voltage, and when an error-detection circuit detects a problem, the calculation is redone at high voltage to correct it. “When you have to correct an error, and reexecute a process more slowly, there is a tiny penalty,” says Wang. “But overall, you get a huge return.” Tests in the lab have shown that the chip can either save 37 percent on power consumption, or operate 21 percent faster at a given power level.
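The article doesn’t describe the error-detection circuitry itself, but the detect-and-reexecute idea can be sketched in software. The following is a rough analogy, not Intel’s design: `alu_add` is a hypothetical operation that occasionally produces a wrong answer in its fast, low-voltage mode, and `resilient_add` plays the role of the resilient circuit, redoing the work in reliable (high-voltage) mode only when an error is flagged.

```python
import random

def alu_add(a, b, low_voltage=True, error_rate=0.01):
    """Simulated adder: in low-voltage mode it occasionally flips a bit
    of the result, mimicking droop- and noise-induced errors."""
    result = a + b
    if low_voltage and random.random() < error_rate:
        result ^= 1 << random.randrange(8)  # corrupt a random low-order bit
    return result

def resilient_add(a, b):
    """Run in the fast, error-prone mode; on a detected error,
    re-execute the calculation in the slower, reliable mode."""
    result = alu_add(a, b, low_voltage=True)
    if result != a + b:  # stand-in for the hardware error-detection circuit
        result = alu_add(a, b, low_voltage=False)  # redo at high voltage
    return result
```

Because errors are rare, the retry penalty is paid only occasionally, which is why the average power savings can be large even though each individual correction is slower than a normal operation.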
“They push it as close to the danger zone as they can, and things sometimes go bad, and they correct for it, which is very clever,” says Krishna Palem, professor of computing at Rice University in Houston. “The number of times you do that ought to be few and far between.” Mathematicians have been developing such error-correction strategies for decades, but Palem says Intel seems to be the only company testing circuits that operate on these principles in the context of a product. Palem is developing low-voltage, low-power computing strategies that are even more laissez-faire about errors. Some of these errors, if they’re made in calculations that aren’t critical (such as a calculation that causes an undetectable distortion in an image but doesn’t freeze it), don’t need to be corrected. Palem believes a combination of his technique with Intel’s resilient circuits could help chips save even more power.
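Palem’s idea can be illustrated the same way. In this hypothetical sketch (again a software analogy, not his actual designs), an 8-bit pixel operation may suffer a bit flip; only errors in the high-order, perceptually important bits trigger the costly redo, while flips in the low-order bits are simply tolerated, since they produce at most a subtle distortion in the image.

```python
import random

def noisy_pixel_op(value, error_rate=0.05):
    """Simulated low-voltage operation: may flip one random bit
    of an 8-bit pixel value."""
    if random.random() < error_rate:
        value ^= 1 << random.randrange(8)
    return value

def tolerant_pixel_op(value, critical_bits=4):
    """Correct only errors in the high-order (perceptually important)
    bits; errors confined to the low-order bits are left uncorrected,
    avoiding the cost of re-execution."""
    result = noisy_pixel_op(value)
    mask = 0xFF ^ ((1 << (8 - critical_bits)) - 1)  # high-order bit mask
    if (result ^ value) & mask:  # stand-in detector for critical-bit errors
        result = value           # re-execute reliably (here: the exact value)
    return result
```

The design trade-off is that the fraction of errors requiring correction shrinks further, so the power spent on retries drops, at the price of accepting small, bounded inaccuracies in the output.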
Intel would not disclose when it will incorporate resilient circuits into its products. Its next generation of mobile processors, which will come to market in a few months and which is based on 45-nanometer transistors, won’t use this error-detection strategy. But error-generating leakiness becomes more of a problem as transistors shrink, so something like circuit resiliency may become a necessity in the next few years. “It will really begin to show at the 20-nanometer level,” says Palem.