TY - JOUR
T1 - Performance, area, and power evaluations of ultrafine-grained run-time power-gating routers for CMPs
AU - Matsutani, Hiroki
AU - Koibuchi, Michihiro
AU - Ikebuchi, Daisuke
AU - Usami, Kimiyoshi
AU - Nakamura, Hiroshi
AU - Amano, Hideharu
PY - 2011/4
Y1 - 2011/4
N2 - This paper proposes the ultrafine-grained run-time power gating of on-chip routers, in which the power supply to each router component (e.g., virtual-channel buffer, virtual-channel multiplexer, and crossbar multiplexer and output latch) can be individually controlled based on the applied workload. Since only the router components that are transferring a packet are activated, the leakage power of the on-chip network can be reduced to a near-optimal level. However, such techniques inherently increase the communication latency and degrade the application performance, since a certain amount of wakeup latency is required to activate the sleeping components. To mitigate this wakeup latency, an early wakeup method that can preliminarily detect the next packet arrival and activate the corresponding components is essential. We designed and implemented an ultrafine-grained power-gating router using a commercial 65 nm process. We propose four early wakeup methods and combine them with the power-gating router. The proposed router with the early wakeup methods is evaluated in terms of its application performance, area overhead, and leakage power reduction taking into account the on/off energy overhead. The simulation results showed that it reduces the leakage power by 54.4-59.9% on average even when the application programs are fully running, at the expense of 4.6% of the area and 0.7-3.7% of the performance overheads when we assume a 1 GHz operation.
AB - This paper proposes the ultrafine-grained run-time power gating of on-chip routers, in which the power supply to each router component (e.g., virtual-channel buffer, virtual-channel multiplexer, and crossbar multiplexer and output latch) can be individually controlled based on the applied workload. Since only the router components that are transferring a packet are activated, the leakage power of the on-chip network can be reduced to a near-optimal level. However, such techniques inherently increase the communication latency and degrade the application performance, since a certain amount of wakeup latency is required to activate the sleeping components. To mitigate this wakeup latency, an early wakeup method that can preliminarily detect the next packet arrival and activate the corresponding components is essential. We designed and implemented an ultrafine-grained power-gating router using a commercial 65 nm process. We propose four early wakeup methods and combine them with the power-gating router. The proposed router with the early wakeup methods is evaluated in terms of its application performance, area overhead, and leakage power reduction taking into account the on/off energy overhead. The simulation results showed that it reduces the leakage power by 54.4-59.9% on average even when the application programs are fully running, at the expense of 4.6% of the area and 0.7-3.7% of the performance overheads when we assume a 1 GHz operation.
KW - Low power
KW - on-chip networks
KW - power gating
KW - router architecture
UR - http://www.scopus.com/inward/record.url?scp=79953071090&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79953071090&partnerID=8YFLogxK
U2 - 10.1109/TCAD.2011.2110470
DO - 10.1109/TCAD.2011.2110470
M3 - Article
AN - SCOPUS:79953071090
SN - 0278-0070
VL - 30
SP - 520
EP - 533
JO - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
JF - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
IS - 4
M1 - 5737865
ER -