Low Power FPGA
Low Power System Design
Nishant Shah
Master Electrical Engineering and Information Technology
Technical University Munich

Abstract
Field Programmable Gate Arrays (FPGAs) are highly desirable for implementing digital systems due to their flexibility, programmability and short time-to-market. However, these advantages are offset in many cases by their high power consumption and large area. This report provides an overview of static and dynamic power dissipation in FPGAs. It also gives an overview of various low-power techniques used to reduce the power consumption of FPGAs, and of their outcomes, with a focus on glitch reduction techniques. Glitch reduction techniques are circuit-level techniques that reduce power in FPGAs by eliminating unnecessary logic transitions called glitches. One such technique adds programmable delay elements to the logic blocks of the FPGA to align the arrival times of the inputs of each look-up table (LUT), thereby preventing new glitches from being generated. On average, the proposed implementation eliminates 91% of the glitching, which reduces overall FPGA power by 18%. The added circuitry increases the overall FPGA area by 6% and the critical-path delay by less than 1%.

I. INTRODUCTION
Field-programmable gate arrays are ideal for adaptive systems, since they are reconfigurable and can be programmed to implement any digital logic. The main difference between FPGAs and conventional fixed-logic implementations, such as Application Specific Integrated Circuits (ASICs), is that the designer can program the FPGA on-site. Moreover, using an FPGA instead of a fixed-logic implementation eliminates the non-recurring engineering costs and significantly reduces time-to-market. The main drawback of FPGAs is that they are less efficient than ASICs due to the added circuitry needed to make them reconfigurable. In recent years, however, much of the focus has shifted to improving their energy efficiency.
This shift is due to process scaling and increased demand for low-power applications. Although process scaling reduces the energy needed to perform a given computation (since wires and transistors are smaller), it increases power dissipation per unit area and therefore the overall power for a given die size. At the same time, demand for low-power applications is increasing due to the proliferation of hand-held devices and rising energy costs. For hand-held and other battery-operated devices, reducing power increases battery life.

The following sections briefly cover the two main types of power consumption in FPGAs, along with various techniques to reduce it. The technique explained in most detail is the reduction of glitches in FPGAs. The report ends by summarizing the efficiency obtained with the described techniques.

II. POWER REDUCTION TECHNIQUES
The two main types of power consumption in FPGAs are static power consumption and dynamic power consumption:

Static power consumption: An FPGA contains a large amount of configuration memory to control every configurable logic block, and each configuration bit dissipates static power. The configurable logic blocks are implemented using look-up tables (LUTs), which contain a large number of transistors. Because of this high transistor count, static leakage power is high.

Dynamic power consumption: An FPGA contains a large number of programmable switches. These switches significantly increase the parasitic capacitance on the wire segments, and charging and discharging this parasitic capacitance consumes dynamic power. Glitches account for about one-fourth of total dynamic power.

There are a number of ways to reduce power in FPGAs. The dynamic power of the FPGA core can be reduced by decreasing the supply voltage, because dynamic power depends quadratically on the supply voltage (P = C·V^2·f, where C is the switched capacitance, V the supply voltage and f the switching frequency). At the architecture level, power can be reduced by using embedded memories, adders, and multipliers.
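The quadratic dependence of dynamic power on the supply voltage noted above can be made concrete with a short calculation. The Python sketch below evaluates P = alpha·C·V^2·f; the activity factor alpha is the usual extension of the C·V^2·f form given in the text, and the function name `dynamic_power` as well as the capacitance, voltage and frequency values are hypothetical, chosen only for illustration.

```python
def dynamic_power(capacitance_f: float, vdd_v: float, freq_hz: float,
                  activity: float = 1.0) -> float:
    """Switching power P = alpha * C * Vdd^2 * f.

    activity (alpha) is the fraction of the capacitance charged per clock cycle;
    alpha = 1.0 reduces this to the P = C * V^2 * f form used in the text.
    """
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


# Hypothetical operating point: 1 nF switched capacitance, 100 MHz clock.
p_nominal = dynamic_power(1e-9, 1.0, 100e6)
p_scaled = dynamic_power(1e-9, 0.9, 100e6)   # supply lowered by 10%
ratio = p_scaled / p_nominal

print(f"nominal: {p_nominal * 1e3:.1f} mW, scaled: {p_scaled * 1e3:.1f} mW")
print(f"reduction: {1 - ratio:.0%}")  # a 10% lower supply gives about 19% less power
```

Because the dependence is quadratic, a 10% lower supply gives roughly 19% less dynamic power with capacitance and frequency held constant. The sketch ignores that a lower supply also slows the circuit, which in practice limits how far the voltage can be reduced.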
Implementing embedded memories, adders, and multipliers as fixed-function embedded blocks is more power-efficient, since the circuitry needed to make them flexible is not required, and the blocks can be turned off when not used. Other methods include clock-aware placement and drowsy mode, which are described briefly below. For glitch reduction, delay insertion and don't-care methods are used; these are described later in this report.

A. Clock-aware placement
New FPGAs are sophisticated enough to implement large system-level applications, which often have many clock domains. Clock networks have a significant impact on power, since they connect to every flip-flop on the FPGA and toggle every clock cycle. Placing the logic so that these clocks map onto an efficient clock network reduces both area and power dissipation [2]. Clock gating can further reduce dynamic power consumption by disabling the clock in inactive regions, which prevents unnecessary signal transitions there. Using clock gating in Xilinx 7 series devices, dynamic power consumption can be reduced by 10% to 80% [4].

B. Drowsy mode
Fig. 1. Drowsy mode
This method provides the ability to connect a memory cell to two supply voltages, a high supply voltage VDDH and a low supply voltage VDDL, as seen in Figure 1. The flexibility to connect to either supply voltage is provided by two header PMOS devices that are controlled by two control signals. When the memory bit operates at the low supply voltage, it consumes less leakage power, since leakage power decreases as the supply voltage is lowered, while the cell still retains the stored data [3]. This is called drowsy mode, and it is also referred to as a partially sleeping or standby mode. Drowsy mode mainly targets cache memories, which have variable latencies and dynamic data placement. FPGA embedded memories have different characteristics from cache memories: their accesses are statically scheduled and their data is stored statically.
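The leakage argument for drowsy mode can be illustrated with a simple time-averaged model: the saving depends on how much leakage drops at VDDL (not to zero, because the cell stays powered to retain data) and on how much of the time a block can actually sit in drowsy mode. The Python sketch below is a hypothetical model, not the analysis from [3]; the function names and all voltages, leakage currents and the drowsy fraction are illustrative assumptions.

```python
def leakage_power(vdd_v: float, leak_current_a: float) -> float:
    """Static power of a block: supply voltage times its leakage current."""
    return vdd_v * leak_current_a


def drowsy_savings(p_active_w: float, p_drowsy_w: float,
                   drowsy_fraction: float) -> float:
    """Fractional static-power saving when a block spends drowsy_fraction
    of the time at VDDL and the rest at VDDH (simple time average)."""
    average = (1 - drowsy_fraction) * p_active_w + drowsy_fraction * p_drowsy_w
    return 1 - average / p_active_w


# Hypothetical operating points: leakage current also drops at the lower supply,
# but not to zero, because drowsy mode keeps the cell powered to retain its data.
p_high = leakage_power(1.0, 10e-6)   # VDDH = 1.0 V
p_low = leakage_power(0.5, 8e-6)     # VDDL = 0.5 V
saving = drowsy_savings(p_high, p_low, drowsy_fraction=0.25)
print(f"static power saved: {saving:.0%}")
```

Both factors cap the benefit: with statically scheduled accesses the drowsy fraction may be small, and with transistors never fully off the drowsy leakage stays well above zero.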
Moreover, drowsy mode does not fully turn off the transistors, so it does not reduce leakage power as much as a full shutdown would, but it preserves the data. As a result, the drowsy mode scheme is observed to offer only 10% static leakage power savings.

III. GLITCHES AND GLITCH REDUCTION TECHNIQUES
Glitching occurs when the values at the inputs of a LUT toggle at different times due to uneven propagation delays of those signals. If the arrival times are far enough apart, spurious transitions can be produced at the LUT output, and such glitches can occur multiple times during a clock cycle. The amount of glitching is greater in circuits with many levels of logic, uneven routing delays, and exclusive-or (XOR) logic. Glitches do not adversely affect the functionality of a synchronous circuit, since they settle before the next clock edge, but they have a significant effect on power consumption: glitch power comprises an average of 26.0% of total dynamic power.

A. Glitch reduction using delay insertion
Glitches can be reduced by adding programmable delay elements to the configurable logic blocks (CLBs) of the FPGA. These delay elements programmably align the arrival times of early-arriving signals at the inputs of the look-up tables (LUTs) to prevent glitches from being generated. The delay elements also act as filters that eliminate glitches generated by upstream logic or off-chip circuitry. Since the delays are configured after routing, this implementation requires little or no modification to the FPGA routing architecture or CAD flow. Furthermore, it can be combined with other low-power techniques.

In theory, this offers the potential to eliminate all glitching in FPGAs, thereby saving significant amounts of power. In practice, however, we must trade off the power saved against the area and speed overhead incurred by the additional circuitry required to implement it. Fortunately, the impact on circuit speed is small (apart from the increased parasitic capacitance), because only the early-arriving signals need to be delayed.
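The mechanism described above, an early-arriving input producing a spurious output transition that delay insertion removes, can be simulated in a few lines. The Python sketch below is a minimal model under stated assumptions: each input switches at most once, delays are ideal and continuous (real delay elements have limited resolution and range), and the 3-input LUT function, signal names and arrival times are hypothetical. The helper names `count_output_toggles` and `align_arrivals` are also illustrative, not taken from the cited work.

```python
from itertools import groupby
from typing import Callable, Dict, Tuple

Signals = Dict[str, int]
LutFn = Callable[[Signals], int]


def count_output_toggles(fn: LutFn, old: Signals, new: Signals,
                         arrival: Dict[str, float]) -> int:
    """Count LUT output transitions as each input switches from old to new
    at its arrival time."""
    values = dict(old)
    out = fn(values)
    toggles = 0
    # One event per input whose value actually changes, ordered by arrival time.
    events = sorted((arrival[name], name) for name in old if old[name] != new[name])
    # Inputs arriving at the same instant are applied together before re-evaluating.
    for _, group in groupby(events, key=lambda event: event[0]):
        for _, name in group:
            values[name] = new[name]
        next_out = fn(values)
        if next_out != out:
            toggles += 1
            out = next_out
    return toggles


def align_arrivals(arrival: Dict[str, float]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Ideal delay insertion: delay every input up to the latest arrival time."""
    latest = max(arrival.values())
    delays = {name: latest - t for name, t in arrival.items()}
    aligned = {name: latest for name in arrival}
    return delays, aligned


def lut(v: Signals) -> int:
    """Hypothetical 3-input LUT function: (a AND b) XOR c."""
    return (v["a"] & v["b"]) ^ v["c"]


old = {"a": 0, "b": 1, "c": 0}
new = {"a": 1, "b": 1, "c": 1}
arrival = {"a": 3.0, "b": 2.0, "c": 1.0}  # c arrives early

print(count_output_toggles(lut, old, new, arrival))  # 2: output goes 0 -> 1 -> 0, a glitch
delays, aligned = align_arrivals(arrival)
print(delays)  # {'a': 0.0, 'b': 1.0, 'c': 2.0}
print(count_output_toggles(lut, old, new, aligned))  # 0: the glitch is gone
print(max(aligned.values()) == max(arrival.values()))  # True: latest arrival unchanged
```

The last check mirrors the point made in the text: only the early-arriving inputs are delayed, so the latest arrival time at the LUT, and hence the critical path, does not change.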
However, the programmable delay elements do consume chip area, so we should expect a modest increase in the area of the device [1].

Fig. 2. Delaying early-arriving signal to remove glitch.
The technique is shown in Figure 2: by delaying input c, the output glitch can be eliminated. Note that the overall critical path of the circuit is not increased, since only the early-arriving inputs are delayed.

The implementation of this technique in the configurable logic block is shown in Figure 3.
Fig. 3. Delay elements at LUT inputs of CLB.
Here, the LUTs and flip-flops (FFs) are paired together into Basic Logic Elements (BLEs). Three parameters describe a CLB: I specifies the number of input pins, N specifies the number of BLEs and output pins, and K specifies the size of the LUTs. The local interconnect allows each BLE input to choose from any of the I CLB inputs and N BLE outputs, and each BLE output drives a CLB output [1].

In the scheme considered here, the delay elements are inserted at the inputs of the BLEs, as seen in Figure 3. This architecture allows each LUT input to be delayed independently. The more delay elements are added, the greater the glitch reduction, at the cost of a slight increase in area and delay overhead. Using this technique, 91.8% of the glitches can be eliminated, with overall power savings of 18.2%.

B. Glitch reduction using don't cares
Don't cares are entries in the truth table for which a LUT's output can be set to either logic-0 or logic-1 without affecting the correctness of the circuit. Choosing these values so that the LUT output is less likely to change while its inputs are switching suppresses glitches. This optimization has zero cost in terms of area and delay, and can be executed after timing closure is completed. Glitch power is reduced by up to 49.0%, while total dynamic power is reduced by up to 12.5%.

IV. CONCLUSION
Significant improvements have been made to the power and energy efficiency of FPGAs. Power management in FPGAs will be mandatory to ensure correct functionality, provide high reliability, and reduce packaging costs.
Furthermore, lower power is needed if FPGAs are to be a viable alternative to ASICs in low-power applications such as battery-powered electronics. For example, FPGAs can be used as coprocessors to perform compute-intensive tasks more efficiently than software. Because the FPGA is flexible, the hardware implementation of the coprocessor can be optimized for the given task and even for specific input parameters such as the media format.

This report summarizes work carried out at different FPGA levels to reduce power consumption, namely clock-aware placement, drowsy mode, delay insertion, and don't cares.

The delay insertion method works at the circuit level: programmable delay elements are added to the CLB architecture to align the edges of each LUT input, thereby preventing glitches from forming on the LUT outputs. The delay elements can also filter some glitches produced by upstream logic. This technique achieves an overall power reduction of 18.2%, compared with a total dynamic power reduction of up to 12.5% for the don't-care method.
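The don't-care method from Section III.B can be illustrated with a small experiment on a single truth table. The Python sketch below is not the algorithm of the cited work; it is a simple hypothetical heuristic that sets each don't-care entry to the majority value of its neighbours (entries differing in exactly one LUT input). As an assumed proxy for glitch-proneness it counts "sensitive pairs", neighbouring entries with different outputs, since an early-arriving input can flip the output transiently at each such pair. The truth table and the function names are illustrative.

```python
from typing import List, Optional


def fill_dont_cares(table: List[Optional[int]], k: int) -> List[int]:
    """Set each don't-care (None) entry to the majority value of its specified
    neighbours; ties and entries with no specified neighbour default to 0."""
    filled = []
    for index, value in enumerate(table):
        if value is not None:
            filled.append(value)
            continue
        # Neighbours differ from this entry in exactly one of the k LUT inputs.
        neighbours = [table[index ^ (1 << bit)] for bit in range(k)]
        specified = [n for n in neighbours if n is not None]
        filled.append(1 if 2 * sum(specified) > len(specified) else 0)
    return filled


def sensitive_pairs(table: List[int], k: int) -> int:
    """Count neighbouring entry pairs whose outputs differ (glitch-prone points)."""
    count = 0
    for index in range(len(table)):
        for bit in range(k):
            partner = index ^ (1 << bit)
            # Visit each pair once and count it if the two outputs differ.
            if index < partner and table[index] != table[partner]:
                count += 1
    return count


# Hypothetical 3-input LUT; None marks a don't-care entry.
lut = [0, None, 0, 1, None, 1, 1, 1]
smart = fill_dont_cares(lut, 3)
naive = [0 if v is None else v for v in lut]
print(smart)                                                  # [0, 1, 0, 1, 1, 1, 1, 1]
print(sensitive_pairs(naive, 3), sensitive_pairs(smart, 3))   # 6 4
```

Filling the don't cares this way leaves the specified entries, and therefore circuit correctness, untouched while reducing the number of places where a single early input can flip the output, which is why this kind of optimization costs no area or delay.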