Can you really simulate an FPGA device?
FPGAs – field-programmable gate arrays – are an incredibly versatile way of extracting multifunctionality from a single piece of silicon. The usefulness of these devices is encouraging a renaissance of their use in military-focused embedded systems as developers scramble to be at the front of the queue for new interoperability contracts in FACE [Future Airborne Capability Environment] and SOSA [Sensor Open System Architecture] systems. Higher-power FPGA versions, coupled with the continuing reduction in transistor size, make it ever more critical to meet the environmental demands set out in VITA 47, but the approach to thermal solutions has remained relatively stagnant for the last 10 years. The newer generation of size, weight, power, and cost (SWaP-C)-optimized systems simply will not allow thermal margin to be built in to absorb inaccuracy.
FPGA structures vary depending on the vendor, but fundamentally they follow the same pattern. Basic functional logic elements are connected through programmable interconnections between fixed wires. Figure 1 shows the functional units as gray boxes, I/O elements as white boxes, and wires and programmable interconnects as black lines. The strategic connection of these functional units can replicate larger-scale logic units, such as processors or memory, in an interconnected network on the chip.
[Figure 1 | FPGA structure (Santangelo, 2014)]
In an FPGA most of the delay in the chip comes from the interconnect. Connecting one functional unit to another functional unit in a different part of the chip often requires a connection through many transistors and switch matrices, each of which introduces extra delay (Zeidman, 2006). Figure 2 shows in more detail the level of switching required to determine connectivity between functional blocks.
[Figure 2 | FPGA interconnecting switch method (Zeidman, 2006)]
The sheer programmability of FPGAs implies that more transistors are needed to implement a given logic circuit in comparison with custom ASIC [application-specific integrated circuit] technologies. This leads to a higher power consumption per gate and increased power demand per device (Anderson & Najm, 2004).
FPGA shapes and logic
The configurability of an FPGA is delivered using large amounts of logic collated into fundamental building blocks called CLBs [configurable logic blocks], formed of smaller components: flip-flops and look-up tables. The allocation and control of signals between these blocks drives the functionality of the FPGA, and so the position and density of active logic are entirely user-dependent. This situation presents a real challenge to accurate thermal simulation of a device: using a typical junction-to-case resistance can give wildly inaccurate results, because the distribution of active logic changes with every design loaded onto the device.
Designing an FPGA architecture from scratch using only these CLBs is extremely labor-intensive due to the high level of functional detail needed for modern computing. When developing logic using CLBs only, the resultant logic is referred to as “soft blocks,” so named because of their high configurability.
“Hard blocks,” by contrast, are embedded functionality on an FPGA that can only be used for a predetermined purpose. Examples of these could be processors, memory blocks, or high-speed transceivers. These are beneficial in that they have optimized routing and increased logic density enabling reduced timing restrictions, while consequentially reducing the configurability of the chip (Weber & Chin, 2006) and significantly increasing the heat flux density in these dedicated areas (Sundararajan, et al., 2006).
When a circuit is implemented, the place-and-route tools place critical logic close together and spread other logic as far as allowed by the circuit constraints (Velusamy, et al., 2005). Logic distribution is typically dependent on, and local to, pinout placement because timing restrictions on certain I/O require minimized interconnection length between pin and active logic. This mirroring is not a perfect prediction, however: not all logic is subject to these strict timing restrictions, the die is typically much smaller than the solder-ball footprint, communication with other control logic may call for a different placement, and routing may be diverted to wherever physical logic is actually available.
Software such as Intel’s Quartus Prime or Xilinx’s Vivado Design Suite will handle the majority of this floorplanning and can also offer the user the opportunity to prioritize switching performance, thermal performance, or a balance between the two. Unfortunately from a thermal perspective, however, prioritizing thermal performance may significantly affect the latency of an FPGA where some high-density logic is critical to the functionality of the device, and so this option is rarely available.
Chips heat up
In any transistor-based switching device, some power will be lost as heat due to inefficiencies in the device or due to the nonzero resistances to current in a gate (Engineering Entropy, 2020). This situation is ubiquitous for all semiconductor architecture and creates a requirement for suitable chip- and system-level cooling.
Accurate thermal management of these devices is critical to maintaining the desired operating lifetime of electronic devices, which is exponentially shortened by increasing temperature (V. Lakshminarayanan & N. Sriraam, 2014). Overengineering a thermal solution, ironically, can have a negative impact on a product by increasing undesirable factors such as mass and cost.
Thermal power dissipation in FPGA CMOS transistor devices can primarily be divided into dynamic and leakage – also known as static – power dissipation. Dynamic losses arise from capacitive charging and discharging of the transistors plus short-circuit power, typically providing the majority of thermal dissipation in an FPGA device. In legacy FPGAs, dynamic power contributed up to 67% of power usage, with static power providing just 22%. In more recent 28 nm devices, static power has increased its dissipation contribution to closer to 40% of the total thermal loss (Intel FPGA, 2018). The lack of knowledge of how exactly leakage power is distributed across the chip leads to highly inaccurate power traces, and therefore unreliable thermal estimation (Amouri, et al., 2013).
The dynamic power varies greatly with design and is characterized in detail through vendor power-estimation tools (Intel’s Quartus PowerPlay or Xilinx’s XPower, for example). It is a function of the known logic quantity, switching frequency, and toggle rate.
$$P_{\mathrm{dynamic}} = \left[\tfrac{1}{2} C V^{2} + Q_{\mathrm{ShortCircuit}} V\right] f \cdot \mathrm{activity} \qquad (1)$$
where C is the capacitance of the transistor, V is the power rail voltage, Q_ShortCircuit is the short-circuit charge that flows during a change of state in the CMOS logic gate, f is the net frequency, and activity, or toggle rate, is the average number of signal transitions relative to the clock rate (%).
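As a rough illustration of equation (1), the short Python sketch below evaluates the dynamic power of a single switching node. The function name and all component values are hypothetical, chosen only to show the arithmetic, and the activity is assumed to be expressed as a fraction rather than a percentage.

```python
def dynamic_power(capacitance_f, voltage_v, q_short_circuit_c,
                  frequency_hz, activity):
    """Equation (1): P = [0.5*C*V^2 + Q_sc*V] * f * activity.

    activity is the toggle rate as a fraction (0.25 means 25%).
    """
    # Energy dissipated per signal transition, in joules
    energy_per_transition = (0.5 * capacitance_f * voltage_v ** 2
                             + q_short_circuit_c * voltage_v)
    return energy_per_transition * frequency_hz * activity


# Hypothetical node: 10 fF, 1.0 V rail, 1 fC short-circuit charge,
# 100 MHz net frequency, 25% toggle rate.
p_node = dynamic_power(10e-15, 1.0, 1e-15, 100e6, 0.25)
print(f"Dynamic power per node: {p_node:.3e} W")
```

Because the expression is linear in frequency and toggle rate, halving either halves the dynamic contribution, whereas the rail voltage enters the capacitive term quadratically.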
Leakage power is a result of the noninfinite resistance across an inactive gate threshold and is heavily dependent on the device temperature (Kushwaha, et al., 2018). The following equation describes the exponential relationship between leakage power, P_leak, and temperature, T.
$$P_{\mathrm{leak}} = P_0 \, e^{-k/T} \qquad (2)$$
where P_0 and k are process-dependent constants (Amouri, et al., 2013).
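The sketch below shows how equation (2) might be used in practice. It assumes that T is an absolute temperature in kelvin and that P_0 is calibrated from a single known leakage figure at a reference temperature; the function names, the value of k, and the 5 W reference figure are all hypothetical and not taken from any vendor data.

```python
import math


def calibrate_p0(p_ref_w, t_ref_k, k):
    """Solve equation (2) for P_0 from one known (power, temperature) point."""
    return p_ref_w * math.exp(k / t_ref_k)


def leakage_power(p0_w, k, temp_k):
    """Equation (2): P_leak = P_0 * exp(-k / T), with T in kelvin (assumed)."""
    return p0_w * math.exp(-k / temp_k)


# Hypothetical device: 5 W of leakage at 85 degC (358.15 K), k = 3000 K.
K_CONST = 3000.0
p0 = calibrate_p0(5.0, 358.15, K_CONST)

for temp_c in (85.0, 100.0, 115.0):
    p = leakage_power(p0, K_CONST, temp_c + 273.15)
    print(f"{temp_c:5.1f} degC -> {p:.2f} W leakage")
```

With constants of this kind, the leakage grows noticeably over a few tens of degrees, which is why a leakage figure quoted at one junction temperature cannot simply be reused at another.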
Unlike in an ASIC, logic that is not utilized in an FPGA remains on the device and so remains powered even though not in use, creating a large power demand for a device even with a low logic load. Static power generally does not vary significantly with logic utilization, but is more greatly dependent on the amount of logic on the die (Intel FPGA, 2018) (Tuan & Lai, 2003). The onus is therefore on the design engineer to select the smallest device for the given application.
While the design engineer should take steps to ensure that the FPGA has been sized correctly for the functionality desired, it is almost impossible to achieve full utilization in a device due to the limited supply of programmable routing resources. Generally speaking, a highly utilized FPGA architecture holds around 75% utilization, which is not an unreasonable estimate given the 62% utilization reported (Gayasen, et al., 2004). This report is slightly dated and it is expected that FPGA technology has developed since then. For an extreme example, Xilinx claims the UltraScale can accommodate utilization of up to 90% (Xilinx Inc., 2015), although this will be entirely dependent on the desired functionality.
Circuit gating is an additional option commonly used to reduce power (Lach, et al., 2004), whereby blocks of unused logic are “turned off” from the voltage rail until needed and so have no static power draw. This technique has been shown to be used by Xilinx (Xilinx Inc., 2015) and Intel (Intel Corp., 2020) on recent devices.
A significant contributing factor to both the dynamic and static power dissipation in an FPGA is the joule heating of the interconnects. Research completed by Shang et al. (Shang, et al., 2002) shows as much as 50% to 70% of the total power dissipated in a Xilinx Virtex-II was from the interconnection network, shown in Figure 3. While the allocation of this loss to the static or dynamic power contribution is not fully determined, it is expected that with the programmable nature of the interconnect switching the majority of this power is concentrated around active logic.
[Figure 3 | The power distribution in a “real” FPGA circuit (Shang, et al., 2002)]
This high dissipation factor is a result of significantly longer interconnect lengths in FPGAs than ASICs due to the larger area consumed by the logic (Anderson & Najm, 2004). Observing the joule heating and electrical resistivity equations we can identify this relationship:
$$P = I^{2} R \qquad (3)$$
$$R = \frac{\rho L}{A} \qquad (4)$$
where P is power consumed, I is electrical current, R is the wire resistance, ρ is the resistivity of the material, L is the length of the wire and A is the cross-sectional area.
Equations (3) and (4) can then be combined to give a linear relationship between the length of and the power dissipated in the interconnect.
$$P = \frac{I^{2} \rho L}{A} \qquad (5)$$
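The linear dependence on length in equation (5) can be checked numerically with the sketch below. The wire dimensions and current are hypothetical, and the resistivity is the textbook bulk value for copper (about 1.68e-8 ohm-metres); real on-chip interconnect at small geometries may have a noticeably higher effective resistivity.

```python
def interconnect_power(current_a, resistivity_ohm_m, length_m, area_m2):
    """Equation (5): P = I^2 * rho * L / A."""
    return current_a ** 2 * resistivity_ohm_m * length_m / area_m2


RHO_COPPER = 1.68e-8  # ohm*m, bulk copper (assumed for illustration)

# Hypothetical wire: 100 nm x 100 nm cross-section carrying 50 uA.
area = 100e-9 * 100e-9
short_wire = interconnect_power(50e-6, RHO_COPPER, 100e-6, area)
long_wire = interconnect_power(50e-6, RHO_COPPER, 200e-6, area)

print(f"100 um wire: {short_wire:.3e} W")
print(f"200 um wire: {long_wire:.3e} W")
```

Doubling the routed length doubles the joule heating for the same current, which is the effect that penalizes the longer interconnects of an FPGA relative to an ASIC.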
Finally, the most power-hungry input on an FPGA will usually be the core power rail, often denoted as Vcc. This is understandable because the core power rail drives the logic, the use of which is central to any FPGA design (Intel Corp., 2017).
How should I simulate it?
In good rugged system design, the resistance of the heat sink will be dependent on the cooling requirements of the system, including adjacent thermally critical devices, and be both cost- and mass-efficient. Using a predetermined thermal resistance will not allow for optimization of the thermal solution.
In high-ruggedization environments, such as those set out by VITA 47 ECC4, the rack temperature must be set at +85 °C, meaning the junction temperature will necessarily be higher than this.
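To put numbers on that statement, the sketch below uses the simplest possible steady-state model: a single lumped thermal resistance between the junction and the +85 °C boundary. This is the uniform, single-value simplification that the following sections caution against for detailed work, so it is offered only as a first-order sanity check. The function names, the 40 W load, the 0.5 °C/W resistance, and the 100 °C junction limit are hypothetical.

```python
def junction_temperature(boundary_c, power_w, r_total_c_per_w):
    """Lumped steady-state estimate: T_j = T_boundary + P * R_total."""
    return boundary_c + power_w * r_total_c_per_w


def max_total_resistance(boundary_c, power_w, tj_max_c):
    """Largest junction-to-boundary resistance that keeps T_j <= tj_max."""
    return (tj_max_c - boundary_c) / power_w


# Hypothetical 40 W device against an 85 degC boundary.
tj = junction_temperature(85.0, 40.0, 0.5)
r_max = max_total_resistance(85.0, 40.0, 100.0)

print(f"Estimated junction temperature: {tj:.1f} degC")
print(f"Resistance budget for a 100 degC limit: {r_max:.3f} degC/W")
```

With only a few tens of degrees between the boundary and a typical junction limit, the whole thermal path has to fit inside a very small resistance budget, which is why a predetermined heat-sink resistance leaves no room for optimization.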
The typical approach to junction temperature distribution
Considering the unequal and varied distribution of logic within each individual FPGA architecture, there is inherent inaccuracy in assuming the temperature – and therefore power – in the die can be considered with a single value, or applied uniformly across the surface of the die (Intel FPGA, 2018). This effect is likely to be mitigated with the onset of increased leakage power significance on small transistor dies and with the wide distribution of interconnect heating, but there will inevitably be some variance.
Both Intel and Xilinx provide Delphi and detailed IC models of their FPGAs; however, these are calibrated using a uniform heat flux on the die only. Intel reports that this method gives an average accuracy of only 10% for the resistance of the device (Altera Corp., 2012).
Given the variability of factors described above, an accurate thermal simulation can be truly achieved only with a user-guided approach bespoke to each FPGA architecture. Some research (Amouri, et al., 2013) (Velusamy, et al., 2005) (Huang, et al., 2009) has been completed into how to more accurately predict the temperature variation within the silicon. W. Huang et al. (Huang, et al., 2004) proposed a modeling methodology, HotSpot, which divides the FPGA die into discrete blocks. Each of these blocks is assigned a thermal resistance generated from the geometry of the block and material properties of the silicon die, and an assigned thermal power. (Figure 4.)
[Figure 4 | The discrete method HotSpot employs to derive local die temperatures (Huang, et al., 2013)]
This method has shown tremendous accuracy for temperature distribution, but it does not describe how to identify the power sources or their dependency on temperature.
To extract much more applicable thermal information for embedded design, Amouri et al. (Amouri, et al., 2013) describe a process which uses the HotSpot methodology for estimating temperature variation, but iteratively calculates the impact of leakage power distribution across the die using the following inputs available from FPGA development tools:
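As a hedged illustration of how a thermal resistance can follow from block geometry and material properties, the sketch below computes only the one-dimensional vertical conduction resistance of a silicon block, R = t / (k · A). This is a deliberate simplification and not the HotSpot formulation itself, which also accounts for lateral paths and other effects. The block size, die thickness, and the silicon conductivity of 150 W/(m·K), a commonly quoted near-room-temperature figure that falls as the die heats up, are assumptions.

```python
def vertical_block_resistance(thickness_m, conductivity_w_per_m_k, area_m2):
    """1-D conduction resistance through a block: R = t / (k * A), in K/W."""
    return thickness_m / (conductivity_w_per_m_k * area_m2)


K_SILICON = 150.0  # W/(m*K), assumed near-room-temperature value

# Hypothetical 1 mm x 1 mm block in a 0.5 mm thick die.
r_block = vertical_block_resistance(0.5e-3, K_SILICON, 1e-3 * 1e-3)

# The same block dissipating a hypothetical 2 W.
delta_t = 2.0 * r_block

print(f"Block resistance: {r_block:.2f} K/W")
print(f"Temperature rise across the die at 2 W: {delta_t:.2f} K")
```

Halving the block area doubles its resistance, so concentrating the same power in a smaller region, as a hard block does, raises the local temperature rise accordingly.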
- Die dimensions (taken from data sheet)
- A floorplan circuit description of the device
- A detailed power report
This method assumes initially that the junction has a uniform temperature and so leakage power is evenly distributed across the die. It then provides HotSpot with power-per-block information on the given design, from which HotSpot calculates the temperature distribution. That distribution is then discretized and fed into a leakage model based on equation (2), and the cycle repeats with the updated leakage figures.
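The loop described above can be sketched as a simple fixed-point iteration. The code below is not the implementation of Amouri et al. and does not call HotSpot: a small linear thermal-resistance matrix stands in for the thermal solver, and the leakage model is equation (2) rewritten relative to a reference temperature. Every name and number (three blocks, the resistance matrix, k = 3000 K, the power values) is hypothetical.

```python
import math


def leakage_power(p_ref_w, temp_k, t_ref_k, k):
    """Equation (2) referenced to a known leakage p_ref_w at t_ref_k."""
    return p_ref_w * math.exp(k * (1.0 / t_ref_k - 1.0 / temp_k))


def block_temperatures(powers_w, r_matrix, t_boundary_k):
    """Stand-in for the thermal solver: T_i = T_boundary + sum_j R_ij * P_j."""
    return [t_boundary_k + sum(r * p for r, p in zip(row, powers_w))
            for row in r_matrix]


def iterate_thermal_profile(dynamic_w, leak_ref_w, r_matrix, t_boundary_k,
                            t_ref_k, k, tol_k=0.01, max_iter=100):
    # Start from a uniform junction temperature, so that leakage takes
    # its reference value in every block.
    temps = [t_ref_k] * len(dynamic_w)
    for _ in range(max_iter):
        leak = [leakage_power(p, t, t_ref_k, k)
                for p, t in zip(leak_ref_w, temps)]
        total = [d + l for d, l in zip(dynamic_w, leak)]
        new_temps = block_temperatures(total, r_matrix, t_boundary_k)
        change = max(abs(a - b) for a, b in zip(new_temps, temps))
        temps = new_temps
        if change < tol_k:
            # leak is the distribution that produced temps (within tol_k)
            return temps, leak
    raise RuntimeError("no convergence: check inputs for thermal runaway")


# Hypothetical three-block die against an 85 degC (358.15 K) boundary.
R = [[2.0, 0.5, 0.2],
     [0.5, 2.0, 0.5],
     [0.2, 0.5, 2.0]]  # K/W, self and mutual heating terms
temps, leak = iterate_thermal_profile(
    dynamic_w=[6.0, 2.0, 1.0], leak_ref_w=[1.0, 1.0, 1.0],
    r_matrix=R, t_boundary_k=358.15, t_ref_k=358.15, k=3000.0)

for i, (t, l) in enumerate(zip(temps, leak)):
    print(f"block {i}: {t - 273.15:.1f} degC, leakage {l:.2f} W")
```

The point of the iteration is that the hottest block also ends up with the largest share of the leakage power, a feedback that a uniform heat-flux model cannot capture.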
This method is the most complete found in literature research and if implemented correctly will provide a designer with a highly accurate temperature distribution (average error of 1 °C) across a die (Amouri, et al., 2013). Using the discretized power data, this can be applied to the detailed IC package geometry within a CFD package giving genuine high confidence in thermal simulation results.
What’s a thermal engineer to do?
The 10% inaccuracy derived from uniform heat flux thermal models can be significantly improved upon by considering the physical floorplan of each device. Using available software tools and processes, thermal simulation can be tailored not only to a device level, but to an architecture level with a high degree of accuracy. Thermal engineers should be mindful that while test data has shown these iterative simulation studies can provide an impressive 1 °C of accuracy, there is no validation of power figures at the test stage. These results typically use an off-the-shelf heat sink which is applicable with the given development tools, while more complicated heat sink design should be calibrated against the power estimator results.
Until further validation can be provided and quantified for the true power consumption of an FPGA, a conservative solution should always be evaluated. For a Xilinx device, this may be as significant as simulating a device at 48 W for a given 40 W power consumption.
Product developers are stuck between a rock and a hard place. MIL-STD-810 is not going to relax its temperature stipulations, and so until more creative cooling solutions such as ultra-high conductivity chassis and/or VITA 48.8 (Air Flow Through) become commonplace, the +85 °C cold wall will remain the driving factor for outstanding product performance. Conversely, more stable semiconductor compounds look destined for the power-electronics and automotive markets, meaning embedded designers are stuck with a thermal runoff just as things become difficult.
Altera Corporation, 2012. Thermal Management for FPGAs. Altera Corporation.
Amouri, A. et al., 2013. Accurate Thermal-Profile Estimation and Validation for FPGA-Mapped Circuits, Karlsruhe: IEEE.
Anderson, J. H. & Najm, F. N., 2004. Power Estimation Techniques for FPGAs. IEEE.
Engineering Entropy, 2020. Engineering Entropy. https://secureservercdn.net/18.104.22.168/nm2.751.myftpupload.com/wp-content/uploads/2020/02/3.-What-actually-is-TDP-and-why-is-it-important-3.pdf?time=1589271825
Gayasen, A. et al., 2004. Reducing Leakage Energy in FPGAs Using Region-Constrained Placement, Monterey.
Huang, W. et al., 2004. HotSpot: A Compact Thermal Modeling Methodology for Early-Stage VLSI Design. IEEE.
Huang, W. et al., 2009. Differentiating the Roles of IR Measurement and Simulation for Power and Temperature-Aware Design, Boston: IEEE.
Intel Corporation, 2017. Understanding and Meeting FPGA Power Requirements. Intel Corp.
Intel Corporation, 2020. Intel Stratix 10 Power Management User Guide. Intel Corporation.
Intel FPGA, 2018. Power Analysis. www.youtube.com/watch?v=8y6M-rmz19I
Intel FPGA, 2018. Thermal Management in Intel Stratix 10 Devices.
Kushwaha, A., Verma, G. & Kakar, V. K., 2018. Thermal Analysis and Modelling of Power Consumption for FPGAs, Paris: International Conference of Advances in Computing and Communication Engineering.
Lach, J., Brandon, J. & Skadron, K., 2004. A General Post-Processing Approach to Leakage Current Reduction in SRAM-based FPGAs. IEEE.
Santangelo, L., 2014. Viv2XDL: a bridge between Vivado and XDL based software, Pisa: University of Pisa.
Shang, L., Kaviani, A. & Bathala, K., 2002. Dynamic Power Consumption in Virtex-II FPGA Family.
Sundararajan, P., Gayasen, A., Vijaykrishnan, N. & Tuan, T., 2006. Thermal Characterization and Optimization in Platform FPGAs. ICCAD.
Tuan, T. & Lai, B., 2003. Leakage Power Analysis of a 90 nm FPGA. IEEE.
V. Lakshminarayanan & N. Sriraam, 2014. The Effect of Temperature on the Reliability of Electronic Components, Bangalore: IEEE.
Velusamy, S. et al., 2005. Monitoring Temperature in FPGA based SoCs, San Jose: IEEE.
Weber, J. M. & Chin, M. J., 2006. Using FPGAs with Embedded Processors for Complete Hardware and Software Systems. American Institute of Physics.
Xilinx Inc., 2005. Static Power and the Importance of Realistic Junction Temperature Analysis. Xilinx Inc.
Xilinx Inc., 2015. Proven Power Reduction with Xilinx UltraScale FPGAs. Xilinx Inc.
Zeidman, B., 2006. All about FPGAs. https://www.eetimes.com/all-about-fpgas/
Entropy Electro-Mechanical Solutions