Ultra-Low Power Pipeline Structure Exploiting Noncritical Stage with Circuit-Level Timing Speculation


Journal of Electronic Science and Technology, 2013, Issue 3

Tao Luo, Ya-Juan He, Ping Luo, Yan-Ming He, and Feng Hu


Abstract—With the increase of clock frequency and silicon integration, power-aware computing has become a critical concern in the design of embedded processors and systems-on-chip (SoC). Dynamic voltage scaling (DVS) is an effective method for low-power design. However, traditional DVS methods have two deficiencies. First, they retain a conservative safety margin that is unnecessary most of the time. Second, they are concerned exclusively with the critical stage and ignore the significant free slack time available in the noncritical stages. Both factors lead to considerable power waste. In this paper, a novel pipeline structure with ultra-low power consumption is proposed. It removes the safety margin and exploits the noncritical stages at the same time. A prototype pipeline is designed and analyzed in a 0.13 μm technology. The results show that a large amount of energy can be saved with this structure: compared with the fixed-voltage case, 50% of the energy is saved, and compared with the traditional adaptive voltage scaling design, 37.8% is saved.

Index Terms—Adaptive circuits, dynamic voltage scaling, exploiting noncritical stage, ultra-low power.

1. Introduction

With the development of electronic products, power has become a first-class design constraint and is growing ever more important [1], [2]. As clock frequencies rise, power-aware computing becomes increasingly crucial in embedded system and system-on-chip (SoC) design. Dynamic voltage scaling (DVS) is an effective way to achieve large power savings [3]–[7], because dynamic energy scales quadratically with the supply voltage [8]. To save as much energy as possible, the supply voltage should be scaled as low as possible. To this end, traditional adaptive designs have used look-up tables [9], [10] or delay chains [11]–[17]. However, traditional DVS design is conservative, because the voltage is chosen to ensure that the processor operates correctly under the worst-case combination of conditions, which is very rare [18]. To remedy this and reclaim the safety margin, several techniques, such as the Razor structure, have been proposed [19]. However, the Razor structure is concerned exclusively with the critical stage and ignores the significant potential free slack time of the noncritical stages.
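The quadratic dependence cited above can be made concrete with the first-order switching-energy model E ≈ αCV². For fixed switching activity α and capacitance C, the energy ratio then depends only on the voltage ratio. The sketch below assumes this idealized model (leakage and short-circuit energy are ignored), and the voltages in the example are illustrative, not values from this paper.

```python
def dynamic_energy_ratio(v_scaled: float, v_nominal: float) -> float:
    """Ratio E(v_scaled) / E(v_nominal) under the first-order model E = alpha*C*V**2.

    Switching activity alpha and capacitance C cancel, so only the voltages matter.
    Leakage and short-circuit energy are ignored (simplifying assumption).
    """
    if v_nominal <= 0:
        raise ValueError("nominal voltage must be positive")
    return (v_scaled / v_nominal) ** 2


# Illustrative values only: scaling from 1.0 V down to 0.8 V.
ratio = dynamic_energy_ratio(0.8, 1.0)
print(f"energy ratio = {ratio:.2f}, saving = {1 - ratio:.0%}")
```

Under this model a 20% voltage reduction cuts dynamic energy by about 36%, which is why removing even part of a conservative voltage margin is attractive.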

In this paper, an ultra-low power pipeline structure is proposed. It combines the advantage of in-situ error detection and correction, which removes the safety margin, with the advantage of using the free slack time of the noncritical stages. The proposed pipeline differs from the traditional one mainly in its stage registers and its clock gating strategy. To detect and correct the timing errors caused by lowering the supply voltage, each stage register is augmented with a latch and some auxiliary logic that allow it to sample the data twice. Because the latch is high-level enabled, the stage register can still receive data while the clock is high. Thus, even if the data are not ready at the rising edge of the clock, that is, when a timing error occurs, the pipeline still operates correctly without a performance penalty. In addition, the timing overrun of one stage can propagate to the next: if the next stage is noncritical and has spare time, it can absorb the overrun of the preceding stage, which allows the supply voltage to be scaled even lower.

The rest of the paper is organized as follows. Section 2 presents the overall structure of the ultra-low power pipeline, including the structure of the stage register, its operating mechanism, and the strategy for tuning the supply voltage according to the error condition. Section 3 presents the simulation results of the whole pipeline, and Section 4 draws conclusions.

2. Structure of Ultra-Low Power Pipeline

The block diagram of the proposed ultra-low power pipeline structure is shown in Fig. 1.

As shown in Fig. 1, the ultra-low power pipeline adopts the classic five-stage pipeline structure. To control the critical paths precisely, the combinational blocks are replaced by delay chains that represent the critical-path delay of each stage. The first stage is instruction fetch (IF), in which the processor fetches the instruction code from the instruction register. The second is instruction decode (ID), in which the instructions delivered from the IF stage are decoded. The third is execution (EX), in which the processor executes the instruction decoded by the ID stage; the control signals from the ID stage direct the arithmetic logic unit (ALU) to perform operations such as addition and subtraction. In the fourth stage, the memory (MEM) stage, the processor stores data to or loads data from memory. The last stage is write back (WB), in which the processor writes the result to the data register. Because these stages differ, each is replaced by a delay chain with a different delay. In the classic pipeline structure, EX is the critical stage and the others are noncritical [19], [20]. In this design, the critical EX stage has a delay of 4.45 ns against a clock period of 5 ns, the ID stage has a delay of 3.6 ns, and the other stages each have a delay of 2.7 ns. The clock is gated by the err signal of the last stage register (see Fig. 1). This gating strategy, working together with the stage registers, allows the noncritical stages to be fully exploited.
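To quantify how "noncritical" each stage is, the free slack of every stage at the nominal 5 ns period can be tabulated from the delays given above. This is a small illustrative computation; the dictionary layout and the function name `stage_slack` are our own.

```python
PERIOD_NS = 5.0  # 200 MHz clock

# Modelled delay-chain delay of each stage, in ns (values stated in the text).
STAGE_DELAY_NS = {
    "IF": 2.7,
    "ID": 3.6,
    "EX": 4.45,  # critical stage
    "MEM": 2.7,
    "WB": 2.7,
}


def stage_slack(delays: dict[str, float], period_ns: float) -> dict[str, float]:
    """Return the free slack (clock period minus stage delay) of each stage."""
    return {name: round(period_ns - d, 3) for name, d in delays.items()}


for name, slack in stage_slack(STAGE_DELAY_NS, PERIOD_NS).items():
    print(f"{name:>3}: slack = {slack:.2f} ns")
```

EX is left with only 0.55 ns of slack, while IF, MEM, and WB each have 2.3 ns; this spare time in the noncritical stages is what the proposed stage register allows the critical stage to borrow.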

Fig. 1. Block diagram of the ultra-low power pipeline.

Fig. 2. Structure of stage register.

2.1 Pipeline Error Detection/Correction

As mentioned before, the pipeline achieves its energy savings by removing the safety margin and exploiting the noncritical stages. Efficient timing error detection and correction are the key to reaching this goal. The block diagram of the stage register is shown in Fig. 2. It consists of a flip-flop, a latch, an XOR gate, and a multiplexer (MUX).

As shown in Fig. 2, the main flip-flop is augmented with a high-level-enabled latch controlled by the clock. The operating voltage is constrained such that the worst-case delay is guaranteed to meet the setup time of the latch. When the clock goes high, the flip-flop captures the data at the rising edge, while the latch continues to receive data throughout the high phase of the clock. The values held by the two are then compared. If they differ, a timing error has occurred in the flip-flop, and the correct value held in the latch is used to correct it, ensuring that the data delivered to the next stage is correct. Passing the latched value directly to the next stage is an effective way to use the extra time of a noncritical stage to relieve the critical stage. The timing overrun of one stage can propagate to the next: if the next stage is noncritical and has spare time, it can absorb the overrun, which allows the supply voltage to be scaled lower. Under this mechanism, the system operates correctly as long as the MEM stage register does not assert its error signal. Once the MEM stage register asserts its error signal, the pipeline is recovered by global clock gating.
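The detect-and-correct behavior of the stage register can be captured in a small behavioral model. The sketch below is our own abstraction, not the authors' circuit: it assumes a single data transition per cycle, measures the data delay from the launching rising edge, uses the 5 ns period and 2/5 duty ratio given in Section 3, and treats the setup time `setup_ns` as a hypothetical parameter (zero by default).

```python
def stage_register(prev_value: int, new_value: int, delay_ns: float,
                   period_ns: float = 5.0, duty: float = 0.4,
                   setup_ns: float = 0.0) -> tuple[int, bool]:
    """Behavioral model of the flip-flop + latch + XOR + MUX stage register.

    The register input holds prev_value until delay_ns after the launching
    rising edge, then settles to new_value. Returns (output, error_h).
    """
    high_ns = duty * period_ns
    # Main flip-flop samples at the next rising edge (t = period_ns).
    ff_ok = delay_ns <= period_ns - setup_ns
    # The high-level-enabled latch stays transparent until the falling edge
    # (t = period_ns + high_ns). The supply voltage is assumed to be limited
    # so that the worst-case delay always meets this deadline.
    latch_ok = delay_ns <= period_ns + high_ns - setup_ns
    if not latch_ok:
        raise ValueError("delay exceeds the latch window: voltage scaled too far")

    ff_q = new_value if ff_ok else prev_value  # may be stale on a timing error
    latch_q = new_value                        # correct value in this model
    error_h = (ff_q ^ latch_q) != 0            # XOR comparator
    out = latch_q if error_h else ff_q         # MUX selects the latch on error
    return out, error_h


# A cycle that meets timing (EX nominal delay) vs. a late arrival inside the latch window.
print(stage_register(0b1010, 0b0110, delay_ns=4.45))
print(stage_register(0b1010, 0b0110, delay_ns=5.6))
```

In both calls the register outputs the new value (6); only the second asserts Error_h, because the flip-flop sampled the stale value and the MUX substituted the latch output.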

The operation of a stage register is illustrated in Fig. 3. In clock cycles 1 and 2, the combinational logic meets the setup time at the rising edge of the clock, and both the main flip-flop and the latch capture the correct data. In this condition, the signal Error_h stays low and the pipeline operates normally. A timing error occurs in cycle 3, as shown in Fig. 3: the combinational logic exceeds its intended delay because of sub-critical voltage scaling. In this case, the main flip-flop fails to capture the data at the rising edge of the clock, but since the latch is high-level enabled, the data is captured correctly by the latch in cycle 4. Because the values held by the main flip-flop and the latch differ, the comparator asserts Error_h. The MUX, controlled by Error_h, then selects the latch output as the output of the whole register, so the output of the register is correct.

Fig. 3. Operation of stage register.

Fig. 4. Critical stage borrowing time from a noncritical stage.

2.2 Exploiting the Noncritical Stage

Because the stage register can borrow time from the next stage, the timing pressure on the critical stage is relieved by exploiting the following noncritical stage. Fig. 4 shows the operation of a critical stage and the noncritical stage that follows it. At the first and second rising edges of the clock, both stages satisfy the timing requirement and Error_h remains low, so the pipeline operates normally. At the third rising edge, the critical stage fails to satisfy the timing constraint, that is, data4 does not arrive by the rising edge of the clock, and Error_h is asserted to indicate this timing error. However, owing to the error detection and correction mechanism described in Section 2.1, the correct data4 is still delivered to the noncritical stage after the rising edge of the clock.
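How borrowed time ripples through successive stages can be sketched as follows. The model is our simplification: a stage whose data arrives late simply delays the start of the next stage by the amount it overran, latch and MUX delays are ignored, and only the four registered stages with error signals (IF, ID, EX, MEM) are modelled. The uniform 1.2x slowdown in the example is hypothetical and merely stands in for a lowered supply voltage.

```python
def propagate_borrowing(delays_ns: list[float], period_ns: float = 5.0,
                        duty: float = 0.4) -> list[bool]:
    """Error_h flag of each stage register when late data borrows time downstream.

    delays_ns[i] is the logic delay of stage i. Data passed on late through a
    latch starts the next stage 'borrowed' ns after its launching edge.
    """
    high_ns = duty * period_ns
    borrowed = 0.0
    errors = []
    for d in delays_ns:
        arrival = borrowed + d                    # inherited lateness + own delay
        borrowed = max(0.0, arrival - period_ns)  # time taken from the next stage
        if borrowed > high_ns:
            raise ValueError("borrowed time exceeds the latch window")
        errors.append(borrowed > 0.0)
    return errors


nominal = [2.7, 3.6, 4.45, 2.7]           # IF, ID, EX, MEM delays from the text
slowed = [1.2 * d for d in nominal]        # hypothetical voltage-induced slowdown
print(propagate_borrowing(nominal))
print(propagate_borrowing(slowed))
```

With the slowdown, EX overruns by about 0.34 ns and flags an error, but MEM absorbs the overrun within its 2.3 ns of slack, so the last flag (the one that triggers clock gating in the proposed scheme) stays low.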

2.3 Short Path Constraints and Duty Ratio of Clock

The use of a high-level-enabled latch raises the possibility that a short path in the combinational logic corrupts the data in the latch. Fig. 5 shows the difference between the intended path and a short path.

Fig. 5 shows how a short path allows data launched at the start of a cycle to be captured by the latch in place of the data launched in the previous cycle. By design, the latch should hold the data from the previous cycle, as the main flip-flop does. However, if the delay of the stage is too short, the new data reaches the latch before its transparent window closes. As shown in Fig. 5, the minimum-path constraint equals the sum of $t_{\text{delay}}$ and the hold time $t_{\text{hold}}$ of the latch, which is typically small. The minimum path delay constraint can be expressed as

$$t_{\min} \ge t_{\text{delay}} + t_{\text{hold}} \qquad (1)$$

where $t_{\min}$ is the minimum path delay of the stage, $t_{\text{delay}}$ is the duration of the high phase of the clock, and $t_{\text{hold}}$ is the hold time of the latch.

Therefore, a minimum-path length constraint must be applied to the input of each register to avoid this corruption. Meeting these constraints requires adding buffers to slow down fast paths, which introduces some overhead. However, fast paths are rare in the pipeline stages, so the number of added buffers, and hence the overhead, is negligible.
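Constraint (1) also tells the designer how much delay padding a short path needs. The sketch below works in integer picoseconds to avoid rounding issues. The 2 ns high phase follows from the 5 ns period and 2/5 duty ratio, while the hold time (100 ps), the per-buffer delay (250 ps), and the example path delays are hypothetical placeholders, not values from this design.

```python
def min_path_ok(t_min_ps: int, t_delay_ps: int, t_hold_ps: int) -> bool:
    """Constraint (1): the shortest path must be at least t_delay + t_hold."""
    return t_min_ps >= t_delay_ps + t_hold_ps


def buffers_needed(t_min_ps: int, t_delay_ps: int, t_hold_ps: int,
                   t_buf_ps: int) -> int:
    """Fewest buffers of delay t_buf_ps that make a short path satisfy (1)."""
    shortfall = t_delay_ps + t_hold_ps - t_min_ps
    if shortfall <= 0:
        return 0
    return -(-shortfall // t_buf_ps)  # integer ceiling division


T_DELAY_PS = 2000  # high phase: 2/5 of the 5 ns period
T_HOLD_PS = 100    # hypothetical latch hold time
T_BUF_PS = 250     # hypothetical delay of one buffer

for path_ps in (1500, 2200, 2700):  # hypothetical shortest-path delays
    print(path_ps, min_path_ok(path_ps, T_DELAY_PS, T_HOLD_PS),
          buffers_needed(path_ps, T_DELAY_PS, T_HOLD_PS, T_BUF_PS))
```

Only the 1.5 ns path violates (1); it is 600 ps short and needs three 250 ps buffers, while paths already longer than 2.1 ns need none.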

Fig. 5. Short path constraint.

According to (1), the duty ratio of the clock determines how severe the minimum-path length constraint is. A large duty ratio makes the short-path constraint more severe and therefore increases the power overhead of the additional buffers. On the other hand, a small duty ratio reduces the timing margin between the main flip-flop and the latch, and hence reduces how far the supply voltage can be lowered below the critical supply voltage. The duty ratio thus represents a trade-off between the cost of the added buffers and the power saved by lowering the supply voltage. In this design, a duty ratio of 2/5 is adopted to balance the buffer overhead against the energy saved.
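With the values used in this design (a 5 ns period $T_{\text{clk}}$ and a duty ratio $D = 2/5$), both sides of the trade-off can be worked out directly. The borrowing bound $t_{\text{borrow}}$ is our own notation and an approximation that neglects latch and MUX propagation delays; $t_{\text{setup}}$ and $t_{\text{hold}}$ are left symbolic because their values are not given here.

```latex
\begin{aligned}
t_{\text{delay}} &= D\,T_{\text{clk}} = \tfrac{2}{5}\times 5\ \text{ns} = 2\ \text{ns},\\
t_{\min} &\ge t_{\text{delay}} + t_{\text{hold}} = 2\ \text{ns} + t_{\text{hold}},\\
t_{\text{borrow}} &\lesssim t_{\text{delay}} - t_{\text{setup}} = 2\ \text{ns} - t_{\text{setup}}.
\end{aligned}
```

Both bounds grow linearly with $D$: a larger duty ratio widens the window the critical stage may borrow but lengthens the shortest path every stage must have, which is exactly the trade-off described above.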

2.4 Supply Voltage Control Strategy

The supply voltage is adjusted according to the error status of the stage registers. There is no need to gate the clock when errors appear at the IF, ID, and EX stages, because errors at those stages do not affect the correctness of the pipeline; only errors at the MEM stage matter. If the error signal of the MEM stage is deasserted, the pipeline is operating correctly and the voltage should be decreased, regardless of the other error signals. When the MEM error signal is asserted, the circuits are not meeting the clock period constraint; the signal is then used to gate the clock so that the output can be corrected with the right data, and the whole pipeline is suspended for one cycle. The error signal also indicates that the supply voltage should be increased. As the supply voltage rises, the delay of the combinational circuits decreases, and the error signal is eventually deasserted, indicating that the pipeline is operating correctly again. The four error signals have different weights because they correspond to different pipeline stages, so a more sophisticated algorithm could be developed to generate the clock gating signal and control the supply voltage. For simplicity, the simplest scheme is adopted here: only the error signal of the MEM stage controls the supply voltage and the clock gating.
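The simplest policy described above can be sketched as a closed loop. This is our own illustration, not the authors' controller: the voltage is tracked in integer millivolts, the 10 mV step and the 800 to 1200 mV limits are hypothetical, and MEM-stage errors come from a toy threshold model that asserts err whenever the voltage is at or below 950 mV, the point at which Section 3 reports the err signal first becoming valid. The 1080 mV starting point is the traditional AVS point reported there.

```python
def control_step(v_mv: int, mem_err: bool, step_mv: int = 10,
                 v_min_mv: int = 800, v_max_mv: int = 1200) -> tuple[int, bool]:
    """One decision of the MEM-error-only policy.

    MEM error -> gate the clock for one cycle and raise the voltage;
    no MEM error -> lower the voltage. Returns (next voltage in mV, gate_clock).
    """
    if mem_err:
        return min(v_mv + step_mv, v_max_mv), True
    return max(v_mv - step_mv, v_min_mv), False


def simulate(v_start_mv: int, v_fail_mv: int, cycles: int) -> tuple[int, int]:
    """Run the loop with a toy error model: MEM err whenever V <= v_fail_mv.

    Returns (final voltage in mV, number of gated cycles).
    """
    v, gated = v_start_mv, 0
    for _ in range(cycles):
        v, gate = control_step(v, mem_err=v <= v_fail_mv)
        gated += gate
    return v, gated


print(simulate(v_start_mv=1080, v_fail_mv=950, cycles=40))
```

Under these assumptions the loop descends from 1080 mV and then dithers between 950 and 960 mV, gating the clock for one cycle each time it touches the error threshold; a practical controller might add hysteresis or a slower decrement to reduce the number of stalls.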

3. Simulation Result

To verify the validity and robustness of this pipeline structure, which exploits noncritical stages and uses timing error detection and correction, the structure was designed and simulated. The only difference between the ultra-low power pipeline and a regular pipeline is the stage register, which affects neither the circuit delay nor the supply voltage, so the ultra-low power pipeline exhibits all the characteristics of the regular pipeline. Therefore, the comparisons are made among the ultra-low power pipeline operating at different supply voltages. The pipeline is implemented in a 0.13 μm mixed-signal standard CMOS (complementary metal-oxide-semiconductor) process and is designed to operate at 200 MHz with a clock duty ratio of 2/5. Fig. 6 shows the relation between power and supply voltage.

As shown in Fig. 6, the power consumption of the design decreases as the supply voltage decreases. In the simulation, the first error occurs at the critical stage, namely the EX stage, when the supply voltage is scaled to 1.08 V; this point represents traditional adaptive voltage scaling (AVS) with a safety margin. When the supply voltage is scaled to 0.97 V, the ID stage first fails to meet the clock constraint, and at 0.95 V the err signal is first asserted, indicating that the supply voltage should be increased to prevent over-scaling from corrupting the whole pipeline. The key voltage points with the corresponding power and error conditions are shown in Table 1.

Fig. 6. Relation of power and supply voltage.

Table 1: Key voltage points

According to Table 1, a large amount of energy can be saved by this structure. Compared with the fixed voltage case, 50% of the energy can be saved, and compared with the traditional adaptive voltage scaling design, 37.8% of the energy can be saved.
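The two reported savings also fix how the traditional AVS design compares with the fixed-voltage case. Taking the percentages at face value, E_new = (1 − 0.5)·E_fixed = (1 − 0.378)·E_AVS, so the ratio E_AVS/E_fixed follows directly. This derived ratio is our own arithmetic on the reported figures, not a value stated in Table 1; the function name is illustrative.

```python
def implied_avs_ratio(saving_vs_fixed: float, saving_vs_avs: float) -> float:
    """E_AVS / E_fixed implied by E_new = (1 - s_fixed) E_fixed = (1 - s_avs) E_AVS."""
    return (1.0 - saving_vs_fixed) / (1.0 - saving_vs_avs)


r = implied_avs_ratio(0.50, 0.378)
print(f"E_AVS / E_fixed = {r:.3f}  (AVS alone saves about {1 - r:.1%})")
```

In other words, the reported figures imply that conventional AVS recovers only about a fifth of the fixed-voltage energy, and the proposed structure more than doubles that saving.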

4. Conclusions

In this paper, an ultra-low power pipeline structure has been proposed. Its key advantage over traditional voltage scaling techniques is that it makes the most of the free slack time of the noncritical stages and applies in-situ timing error detection and correction to eliminate the conservative safety margin. Simulation results show that this pipeline structure is very effective in reducing power consumption. Since the overhead logic consists only of the additions to the stage registers, which are negligible compared with the whole microprocessor system, the power consumed by the overhead logic is negligible compared with the power saved across the whole pipeline.

Acknowledgment

The authors would like to thank IPGoal Microelectronics (Sichuan) Co., Ltd for its support.

[1] A. Wang and S. Naffziger, Adaptive Techniques for Dynamic Process Optimization, New York: Springer, 2008, pp. 1–10.

[2] Y.-Q. Huo, Q.-C. Shao, and Z. Huai, “Adaptive power and bit allocation in multicarrier systems,” Journal of Electronic Science and Technology of China, vol. 5, no. 1, pp. 13–17, 2007.

[3] T. Pering, T. Burd, and R. Brodersen, “The simulation and evaluation of dynamic voltage scaling algorithms,” in Proc. of 1998 Int. Symposium on Low Power Electronics and Design, Monterey, 1998, pp. 76–81.

[4] T. Liu and S. Lu, “Performance improvement with circuit-level speculation,” in Proc. of the 33rd Annual Int. Symposium on Microarchitecture, Monterey, 2000, pp. 348–355.

[5] H. W. Lee, K. H. Kim, Y. K. Choi, J. H. Sohn, N. K. Park, K. W. Kim, C. Kim, Y. J. Choi, and B. T. Chung, “A 1.6V 1.4 Gb/s/pin consumer DRAM with self-dynamic voltage scaling technique in 44 nm CMOS technology,” IEEE Journal of Solid-State Circuits, vol. 47, no. 1, pp. 131–140, Jan. 2012.

[6] M. Elgebaly and M. Sachdev, “Variation-aware adaptive voltage scaling system,” IEEE Trans. on Very Large Scale Integration Systems, vol. 15, no. 5, pp. 560–570, May 2007.

[7] A. Gupta, R. Chauhan, V. Menezes, V. Narang, and H. M. Roopashree, “A robust level-shifter design for adaptive voltage scaling,” in Proc. of the 21st Int. Conf. on VLSI Design, Hyderabad, 2008, pp. 383–388.

[8] T. Mudge, “Power: A first class design constraint,” Computer, vol. 34, no. 4, pp. 52–57, Apr. 2001.

[9] J. Tschanz, N. S. Kim, S. Dighe, J. Howard, G. Ruhl, S. Vangal, S. Narendra, Y. Hoskote, H. Wilson, C. Lam, M. Shuman, C. Tokunaga, D. Somasekhar, S. Tang, D. Finan, T. Karnik, N. Borkar, N. Kurd, and V. De, “Adaptive frequency and biasing techniques for tolerance to dynamic temperature-voltage variations and aging,” in Digest of Technical Papers of IEEE Int. Solid-State Circuits Conf., San Francisco, 2007, pp. 292–293.

[10] B. Stackhouse, B. Cherkauer, M. Gowan, P. Gronowski, and C. Lyles, “A 65nm 2-billion-transistor quad-core Itanium® processor,” in Digest of Technical Papers of IEEE Int. Solid-State Circuits Conf., San Francisco, 2008, pp. 92–93.

[11] A. Drake, R. Senger, H. Deogun, G. Carpenter, S. Ghiasi, T. Ngyugen, N. James, and M. Floyd, “A distributed critical-path timing monitor for a 65nm high-performance microprocessor,” in Digest of Technical Papers of IEEE Int. Solid-State Circuits Conf., San Francisco, 2007, pp. 398–399.

[12] T. D. Burd, T. A. Pering, A. J. Stratakos, and R. W. Brodersen, “A dynamic voltage scaled microprocessor system,” IEEE Journal of Solid-State Circuits, vol. 35, no. 11, pp. 1571–1580, 2000.

[13] M. Nakai, S. Akui, K. Seno, T. Meguro, T. Seki, T. Kondo, A. Hashiguchi, H. Kawahara, K. Kumano, and M. Shimura, “Dynamic voltage and frequency management for a low power embedded micro-processor,” IEEE Journal of Solid-State Circuits, vol. 40, no. 1, pp. 28–35, Jan. 2005.

[14] K. J. Nowka, G. D. Carpenter, E. W. MacDonald, H. C. Ngo, B. C. Brock, K. I. Ishii, T. Y. Nguyen, and J. L. Burns, “A 32-bit PowerPC system-on-a-chip with support for dynamic voltage scaling and dynamic frequency scaling,” IEEE Journal of Solid-State Circuits, vol. 37, no. 11, pp. 1441–1447, Nov. 2002.

[15] S. Dhar, D. Maksimovic, and B. Kranzen, “Closed-loop adaptive voltage scaling controller for standard-cell ASICs,” in Proc. of 2002 Int. Symposium on Low Power Electronics and Design, Piscataway, 2002, pp. 103–107.

[16] A. K. Uht, “Uniprocessor performance enhancement through adaptive clock frequency control,” IEEE Trans. on Computers, vol. 54, no. 2, pp. 132–140, 2005.

[17] M. Miller, K. Janik, and S. L. Lu, “Non-stalling counterflow microarchitecture,” in Proc. of the 4th Int. Symposium on High Performance Computer Architecture, Las Vegas, 1998, pp. 334–341.

[18] S. Das, D. Roberts, S. Lee, S. Pant, D. Blaauw, T. Austin, K. Flautner, and T. Mudge, “A self-tuning DVS processor using delay-error detection and correction,” IEEE Journal of Solid-State Circuits, vol. 41, no. 4, pp. 792–804, 2006.

[19] D. Ernst, N. S. Kim, S. Das, S. Pant, T. Pham, R. Rao, C. Ziesler, D. Blaauw, T. Austin, T. Mudge, and K. Flautner, “Razor: A low-power pipeline based on circuit-level timing speculation,” in Proc. of the 36th Annual IEEE/ACM Int. Symposium on Microarchitecture, doi: 10.1109/MICRO.2003.1253179.

[20] D. Blaauw, S. Kalaiselvan, K. Lai, W.-H. Ma, S. Pant, C. Tokunaga, S. Das, and D. Bull, “Razor II: In situ error detection and correction for PVT and SER tolerance,” in Digest of Technical Papers of IEEE Int. Solid-State Circuits Conf., San Francisco, doi: 10.1109/ISSCC.2008.4523226.

Tao Luo was born in Sichuan Province, China in 1988. He received the B.S. degree from the Harbin Institute of Technology (HIT), Harbin in 2010. He is currently pursuing the M.S. degree with the School of Microelectronics and Solid State Electronics, University of Electronic Science and Technology of China (UESTC). His research interests include digital IC and low power circuit and techniques.

Ping Luo was born in Sichuan Province, China, in 1968. She received the B.S. and M.S. degrees from the Chongqing University in 1990 and 1993, respectively. She received the Ph.D. degree in electrical circuit and system from UESTC in 2004. She is now a professor with UESTC. As a scholar, she visited the Georgia Institute of Technology from 2002 to 2003. Her research interests include power management circuit for SoC/CPU and LED driver.

Yan-Ming He was born in Shanxi Province, China in 1990. He received the B.S. degree from the UESTC in 2012. He is currently pursuing the M.S. degree with the School of Microelectronics and Solid-State Electronics, UESTC. His research interests include power electronics and low power techniques.

Feng Hu was born in Hubei Province, China in 1988. He received the B.S. degree from the UESTC in 2010. He is currently pursuing the M.S. degree with the School of Microelectronics and Solid-State Electronics, UESTC. His research interests include power electronics and low power techniques.

her B.S. degree from East China Normal University, Shanghai, China in 2001, and the Ph.D. degree from Nanyang Technological University, Singapore in 2008. From 2001 to 2002, she was a research and training program digital IC designer with the Institute of Microelectronics, Singapore. In 2007, she joined Asia Pacific Design Center, STMicroelectronics, Singapore, working on the smartcard products. Since 2009, she has been with the School of Microelectronics and Solid-State Electronics, UESTC, where she is now an associate professor. Her current research interests include digital integrated circuits, low-power techniques, and power management IC design.

Manuscript received August 15, 2012; revised April 10, 2013. This work was supported by the Important National S&T Special Project of China under Grant No. 2011ZX01034-002-001-2, and the Fundamental Research Funds for the Central Universities under Grant No. ZYGX2009J026.

T. Luo and Y.-J. He are with the School of Microelectronics and Solid-State Electronics, University of Electronic Science and Technology of China, Chengdu 610054, China (Corresponding author e-mail: leto.luo@gmail.com; yjhe@uestc.edu.cn).

P. Luo, Y.-M. He, and F. Hu are with the School of Microelectronics and Solid-State Electronics, University of Electronic Science and Technology of China, Chengdu 610054, China (e-mail: pingl@uestc.edu.cn; heyanming1990@gmail.com; echofeng6@gmail.com).

Digital Object Identifier: 10.3969/j.issn.1674-862X.2013.03.012
