
Thermal testing and control by means of built-in temperature sensors

Introduction

This paper will discuss a new approach to the measurement of the thermal properties of PCBs and the packages mounted to them. It is intended to replace traditional solutions, such as using thermocouples and data acquisition equipment. The basic idea of this approach is that the IC chips themselves will be used to measure and acquire temperature data.

It will be demonstrated that only minimal additional circuitry is required in the ICs to achieve this goal. Besides thermal testing, this additional circuitry is suitable for the on-line thermal monitoring of electronic equipment. Because the temperature sensor is built into the chip, the temperature of the hottest regions is surveyed and the delay between overheating and sensing is short. Experimental results demonstrate the viability of these methods.

Conception of a built-in thermal testing system

In the framework of the EC project THERMINIC[1], our research group has worked for several years on modifying and extending chip designs in order to improve thermal handling, monitoring and measurability. By analogy with the Design for Testability (DfT) principle, we proposed in a previous paper that Design for Thermal Testability (DfTT) be applied as part of the general design methodology. Our approach is to build a small amount of extra circuitry (as little as possible) into the chips in order to facilitate both production testing and lifetime thermal monitoring.

Based on the previous research and experimental work, two different DfTT methods appear to be feasible:

Method 1: a Built-In Temperature Sensor accompanied by its readout circuitry, henceforth called BITS.

Method 2: the BITS circuit together with a controllable dissipator constituting a complete Built-In Thermal Tester (BITT).

The natural way to access both circuits is to use the other test circuitry that is already present in the chip. During our experimental work we used the boundary-scan (BS) architecture because it is a standardized and widely known solution.

The key problem in any thermal testing/monitoring approach is the temperature sensor that is built into the chip. A suitable solution for this sensor will be discussed.

A suitable temperature sensor

Numerous CMOS-compatible temperature sensors for thermal testing purposes have been developed by various groups in recent years.

A promising solution has also been developed at TU Budapest, as a CMOS temperature sensor cell. This circuit is ideal for thermal monitoring purposes and for performing the sensing function in built-in thermal test circuitry.

The sensor circuit consists of 31 small-size MOS transistors. The block diagram of this sensor is shown in Fig. 1. The left-hand side of the circuit is a current-output sensor. The output current I decreases as the temperature increases. This operation is based on the temperature dependence of both the threshold voltage and the carrier mobility. The right-hand side of the circuit is a current-to-frequency converter, providing a square wave as output signal, the frequency of which is directly related to the temperature.

The experimental realization of this frequency-output temperature sensor shows a strong temperature dependence of the frequency, which can be approximated as

$$f = f_{20\mathrm{Cels}} \cdot \exp\left(\gamma\,(T_{\mathrm{Cels}} - 20\,^{\circ}\mathrm{C})\right)$$

in the -50…+120°C temperature range, where $\gamma$ is the sensitivity and $f_{20\mathrm{Cels}}$ is the nominal frequency at T = 20°C.

The sensitivity is about $\gamma$ = -0.78%/°C. The output frequency of 0.5-1.5 MHz is in a convenient range. The complete circuit requires an area of only 0.018 mm² using the ECPD10 [2] 1 µm CMOS process. Using the AMS 0.8 µm process [3], the area consumption is 0.005 mm².
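The exponential approximation can be inverted to recover the temperature from a measured frequency. The following is a minimal sketch of that conversion, not the authors' software. The sensitivity value is taken from the text; the nominal frequency of 1 MHz and all function names are assumptions chosen for illustration, and a real sensor would need its own calibrated nominal frequency.

```python
import math

GAMMA_PER_DEGC = -0.0078  # sensitivity from the text: about -0.78 %/°C
T_REF_DEGC = 20.0         # reference temperature of the model


def frequency_from_temperature(t_degc, f_ref_hz):
    """Model frequency: f = f_ref * exp(gamma * (T - 20 °C))."""
    return f_ref_hz * math.exp(GAMMA_PER_DEGC * (t_degc - T_REF_DEGC))


def temperature_from_frequency(f_hz, f_ref_hz):
    """Inverse of the model: T = 20 °C + ln(f / f_ref) / gamma."""
    return T_REF_DEGC + math.log(f_hz / f_ref_hz) / GAMMA_PER_DEGC


# Assumed nominal frequency at 20 °C (hypothetical, inside the 0.5-1.5 MHz range).
F_REF_HZ = 1.0e6

f_hot = frequency_from_temperature(85.0, F_REF_HZ)
t_back = temperature_from_frequency(f_hot, F_REF_HZ)
```

The model is only claimed for the -50…+120°C range, so temperatures computed from frequencies outside the corresponding band should be treated with caution.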

The frequency-output solution has a distinct advantage: the output signal can be processed using purely digital circuitry, for example counters. This means that these sensors can be easily integrated into digital circuits.

The low sensitivity to the supply voltage is a remarkable feature: a ±0.25 V change in VDD results in only a ±0.28% change in the frequency. The latter corresponds to a ±0.35°C error. The long-term stability has been investigated in a five-month-long experiment. No drift in a definite direction was observed. The total power consumption of the sensor is about 200 µW.
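The temperature error quoted for the supply-voltage variation can be checked with a one-line estimate. Writing $\gamma$ for the sensitivity of about -0.78%/°C (the symbol is an assumption of this note) and linearizing the exponential frequency model for small changes gives:

```latex
\left|\Delta T\right| \approx \frac{\left|\Delta f / f\right|}{\left|\gamma\right|}
  = \frac{0.28\,\%}{0.78\,\%/^{\circ}\mathrm{C}}
  \approx 0.36\,^{\circ}\mathrm{C}
```

This agrees with the quoted error of roughly ±0.35°C to within rounding.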

Read-out through the boundary-scan path

If a temperature sensor like the one described earlier is inserted into a chip design, additional circuitry must be implemented in order to provide access to this sensor. Additional package pins will also be required.

IC manufacturers consider the excess area consumption, and especially the need for additional pins, such a serious disadvantage that it calls the practical use of the BITS solution into question. Fortunately, the built-in temperature sensors can be combined with other built-in test circuitry, effectively reducing the cost.

The Boundary Scan (BS) architecture is especially suitable for monitoring temperature sensors. In this way it is possible to realize the BITS principle while reducing both the excess circuitry and the need for excess pins.

The Boundary Scan architecture was developed as a new approach to PCB testing. This architecture has led to a world standard: IEEE Std 1149.1. If the standard BS circuitry is built into all the ICs of a PC board, nail-bed testers can be dispensed with. The wires of the PCB can be observed/controlled in an electronic way, through the BS path. Opens and shorts in the PCB wiring, faulty soldering etc., can be easily detected and localized.

The BS architecture can be adapted to implement additional tests: e.g. the testing of the chip core, launching built-in self-test runs, etc.

The BS circuitry is controlled by a 16-state state machine called the Test Access Port (TAP) controller. Four pins are allocated to the BS: TDI and TDO carry the instruction and data flow, organized as a serial scan path; TCK is the test clock and TMS is the test mode select signal.
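To make the role of TMS concrete, the 16-state controller can be written down as a transition table. The sketch below follows the state names and transitions of IEEE Std 1149.1; the Python names `TAP_NEXT` and `tap_walk` are illustrative choices, and the code models only the state sequencing, not the instruction or data registers.

```python
# Each entry maps a state to (next state if TMS=0, next state if TMS=1),
# evaluated on every rising edge of TCK.
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan": ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR": ("Shift-DR", "Exit1-DR"),
    "Shift-DR": ("Shift-DR", "Exit1-DR"),
    "Exit1-DR": ("Pause-DR", "Update-DR"),
    "Pause-DR": ("Pause-DR", "Exit2-DR"),
    "Exit2-DR": ("Shift-DR", "Update-DR"),
    "Update-DR": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan": ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR": ("Shift-IR", "Exit1-IR"),
    "Shift-IR": ("Shift-IR", "Exit1-IR"),
    "Exit1-IR": ("Pause-IR", "Update-IR"),
    "Pause-IR": ("Pause-IR", "Exit2-IR"),
    "Exit2-IR": ("Shift-IR", "Update-IR"),
    "Update-IR": ("Run-Test/Idle", "Select-DR-Scan"),
}


def tap_walk(state, tms_bits):
    """Return the state reached after clocking in the given TMS bits."""
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
    return state


# From reset, the TMS sequence 0,1,0,0 reaches Shift-DR (data scan),
# and 0,1,1,0,0 reaches Shift-IR (instruction scan).
state_dr = tap_walk("Test-Logic-Reset", [0, 1, 0, 0])
state_ir = tap_walk("Test-Logic-Reset", [0, 1, 1, 0, 0])
```

A useful property of this table is that five consecutive TCK cycles with TMS held high return the controller to Test-Logic-Reset from any state.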

The BS architecture is ideal for incorporating frequency-output temperature sensors. An internal counter of 12-14 bits is required in addition to the usual BS circuitry. Two additional BS instructions must be defined: one to enable or disable the sensor and one to scan out the temperature data. The block diagram of a realized extension of a BS circuit is shown in Fig. 2. The complete area overhead (shaded blocks) was less than 0.2 mm² for a 1 µm process.


Fig. 2. Excess blocks required by the BITS in the BS circuitry (colored rectangles)
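The relation between counter width, sensor frequency and temperature resolution can be illustrated with a short calculation. The sketch below assumes that the counter simply counts sensor periods during a fixed gate time; the text does not describe the gating scheme, so the gate time of 8 ms and all function names are hypothetical. The sensitivity of 0.78%/°C is taken from the sensor description.

```python
import math


def counter_bits_needed(f_max_hz, gate_s):
    """Bits needed so the counter does not overflow at the highest frequency."""
    max_count = math.ceil(f_max_hz * gate_s)
    return max(1, max_count.bit_length())


def count_to_frequency(count, gate_s):
    """Frequency estimate from the number of periods counted in one gate time."""
    return count / gate_s


def temperature_step_per_count(count, sensitivity_per_degc=0.0078):
    """Temperature change that alters the count by one (linearized model).

    One count is a relative frequency change of 1/count, and the sensor
    changes its frequency by about 0.78 % per degree Celsius.
    """
    return 1.0 / (count * sensitivity_per_degc)


GATE_S = 8e-3  # assumed gate time, hypothetical

bits = counter_bits_needed(1.5e6, GATE_S)        # upper end of the 0.5-1.5 MHz range
f_est_hz = count_to_frequency(8000, GATE_S)      # a count of 8000 in 8 ms
step_degc = temperature_step_per_count(8000)     # quantization step at that count
```

Under these assumptions a 14-bit counter is sufficient, and the quantization step at mid-range is of the order of a few hundredths of a degree. A longer gate time would give a finer step at the cost of a wider counter.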

Examples of the benefits of the proposed method

A very important use of the BITS is thermal monitoring, in which the chip temperatures are measured regularly, concurrently with system operation. In this way the thermal state of a PCB or of the entire system can be checked continuously. Dangerous overheating caused by the failure of a fan or by over-dissipation in the electronic circuits can be identified in time. Preventive actions can then be taken, such as reducing the clock frequency or switching on reserve fans.
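The decision step of such a monitoring loop can be sketched as follows. This is an illustration only: the two thresholds, the action names and the function `choose_action` are hypothetical and would have to be chosen for the actual equipment. The chip temperatures are assumed to have been read already, for example through the BS path.

```python
def choose_action(chip_temps_degc, warn_degc=85.0, critical_degc=100.0):
    """Pick a preventive action from the hottest chip temperature.

    chip_temps_degc maps a chip name to its latest temperature reading.
    The thresholds are illustrative assumptions, not values from the text.
    """
    if not chip_temps_degc:
        raise ValueError("no temperature readings available")
    hottest = max(chip_temps_degc.values())
    if hottest >= critical_degc:
        return "reduce_clock_frequency"
    if hottest >= warn_degc:
        return "switch_on_reserve_fan"
    return "no_action"


# Example readings (invented for illustration).
action_normal = choose_action({"A": 61.5, "B": 58.2})
action_warm = choose_action({"A": 88.0, "B": 58.2})
action_hot = choose_action({"A": 88.0, "B": 103.4})
```

In practice some hysteresis would normally be added so that the action does not toggle when a temperature hovers around a threshold.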

Fig. 3. Temperature profiles recorded via the BS path (temperature of two packages on the same board)

Temperature measurements through the BS path are presented in Fig. 3. The PC board was placed in still air. Using only the boundary-scan path, the chip temperatures were measured every second. A sequence of 940 s duration was plotted. A slow random variation of the temperature inside the chassis is evident, along with the superimposed noise. The amplitude of the latter is about 0.02°C. This means that the temperature resolution obtained in the experiment was as good as 0.02°C.

Besides on-line thermal testing, the built-in thermal test circuitry is suitable for off-line thermal tests at the system level or at the level of individual packages. These possibilities are discussed below.

The experimental qualification of the cooling of a PCB or an equipment chassis normally requires the attachment of a number of temperature sensors and the use of a multi-channel data acquisition unit. In contrast, if we use ICs with built-in temperature sensors, we need only dedicated software to measure the heat distribution continuously and to provide a thermal map of the PCBs. If most of the chips are equipped with BITS, a detailed temperature distribution can be acquired. Alternatively, by using only a few chips with BITS on a board, a rougher but still useful image can be obtained.

If, for example, the effectiveness of a cooling fan is in question, it is useful to measure the internal chip temperatures with the fan switched on and with the fan switched off. Such an experiment is illustrated in Fig. 4. Four packages with BITS were mounted on a PCB (Fig. 4a). The sensors were read out via the BS path. The panel was placed into a chassis containing a cooling fan (Fig. 4b).

Fig. 4. Measurement of fan efficiency using BITS

After powering the board we waited until thermal equilibrium had been reached, and then started the measurement. The fan was switched on during the time interval between 150 and 790 s. The resulting temperature measurements are plotted in Fig. 5, allowing the cooling effect of the fan to be evaluated.

Fig. 5. Effect of the fan activity on the internal chip temperatures (chip A=red, B=magenta, C=green, D=blue)
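One simple way to put a number on the cooling effect is to compare the mean chip temperature just before the fan is switched on with the mean temperature just before it is switched off again. The sketch below illustrates this on synthetic data; the temperatures of 60°C and 50°C, the averaging window of 100 s and the function names are invented for illustration and are not the measured values of Fig. 5. Only the switching times of 150 s and 790 s come from the text.

```python
from statistics import mean


def mean_temperature(samples, t_start_s, t_end_s):
    """Average the temperatures of samples with t_start_s <= time < t_end_s."""
    window = [temp for time_s, temp in samples if t_start_s <= time_s < t_end_s]
    if not window:
        raise ValueError("no samples in the requested window")
    return mean(window)


def fan_cooling_effect(samples, fan_on_s, fan_off_s, window_s):
    """Temperature drop: mean before fan-on minus mean before fan-off."""
    baseline = mean_temperature(samples, fan_on_s - window_s, fan_on_s)
    cooled = mean_temperature(samples, fan_off_s - window_s, fan_off_s)
    return baseline - cooled


# Synthetic record (NOT measured data): one (time, temperature) sample every
# 10 s, 60 °C with the fan off and 50 °C while the fan runs.
samples = [(t, 50.0 if 150 <= t < 790 else 60.0) for t in range(0, 940, 10)]

drop_degc = fan_cooling_effect(samples, fan_on_s=150, fan_off_s=790, window_s=100)
```

With real data the averaging window should be placed where the temperature has settled, since the chip temperature follows the fan with a thermal time lag.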

If the PCBs are built with packages equipped with BITS circuitry a number of experimental investigations can be readily undertaken, e.g. the optimization of fan placement, checking the effect of vertical vs. horizontal card position, etc.

The integrity of the chip-to-ambient heat-removal path is very important when reliability issues are taken into account. Small degradations in the heat conduction may result in higher operating temperature, which has a detrimental effect on reliability. An increase in the thermal resistance may indicate the start of a degradation process, e.g. die attach delamination, defects in the contact between the package and the cooling fins, etc.

Steady-state heat removal properties are usually characterized by the $R_{thja}$ thermal resistance:

$$R_{thja} = \frac{T_j - T_a}{P}$$

where $T_j$ is the junction temperature, $T_a$ is the ambient temperature and $P$ is the dissipated power. Similarly, the dynamic (transient) thermal properties of the package are characterized by the $Z_{thja}(t)$ thermal step-function response or transient thermal resistance:

$$Z_{thja}(t) = \frac{T_j(t) - T_a}{P_0}$$

where $T_j(t)$ is the junction temperature as a function of time and $P_0$ is the amplitude of a dissipation step-function.

These expressions are ambiguous in an environment where other power sources also contribute to the rise in temperature. This is true of a PCB, where the temperature rise in a package is influenced by heat dissipation from other packages. The equations can be reformulated in order to avoid this problem, e.g. for the static thermal resistance we may write:

$$R_{thja} = \frac{T_{on} - T_{off}}{P}$$

where $T_{on}$ is the chip temperature if the chip is powered and $T_{off}$ is the temperature if the chip is unpowered. This equation is identical to the former one as long as the non-linear effects of the heat transfer can be neglected.
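The reformulated static thermal resistance is simply the difference between the powered and the unpowered chip temperature, divided by the dissipated power. A minimal sketch of this calculation follows; the function name and the numerical values are hypothetical and serve only to show the arithmetic.

```python
def static_thermal_resistance(t_powered_degc, t_unpowered_degc, power_w):
    """Static thermal resistance in K/W from two steady-state readings.

    t_powered_degc:   chip temperature with the chip dissipating power_w
    t_unpowered_degc: chip temperature with the chip unpowered, all other
                      heat sources on the board left unchanged
    """
    if power_w <= 0:
        raise ValueError("dissipated power must be positive")
    return (t_powered_degc - t_unpowered_degc) / power_w


# Invented readings: 47 °C powered, 25 °C unpowered, 0.5 W dissipated.
rth_k_per_w = static_thermal_resistance(47.0, 25.0, 0.5)
```

Because only a temperature difference enters the formula, a constant offset in the sensor calibration cancels out, provided both readings come from the same sensor.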

The BITS circuit discussed earlier provides an easier way to measure the heat-removal properties of the package in situ. Both the steady-state and the transient thermal resistance can be measured if the chip is equipped with the BITS circuitry. The powered and unpowered chip temperatures can be measured by using the built-in CMOS sensor and the BS path.

A transient thermal resistance function is presented in Fig. 6. The chip was equipped with a BITS circuit accessible via the BS path. The chip was mounted in a 48-pin ceramic DIL package; the package was soldered into a PCB and was measured in natural convection. Powering was realized by changing the digital control of the core circuit. The response was measured via the BS path. The asymptotic value of the function gives the static thermal resistance, which is about 44 K/W in this case.

Fig. 6. Transient thermal resistance of a ceramic DIL package, measured through the BS path
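A transient thermal resistance curve of this kind can be derived from a sequence of temperature readings taken after a power step. The sketch below uses the temperature at the moment of the step as the reference and divides the rise by the step amplitude. The data are synthetic: a single-exponential response with an assumed time constant of 100 s and a 0.5 W step, chosen so that the asymptote is 44 K/W. A real package shows several time constants, and none of these numbers are taken from Fig. 6.

```python
import math


def transient_thermal_impedance(times_s, temps_degc, power_step_w):
    """Return (time, Zth) pairs in K/W for a power step applied at times_s[0].

    The first temperature sample is taken as the unpowered reference.
    """
    if power_step_w <= 0:
        raise ValueError("power step must be positive")
    t_reference = temps_degc[0]
    return [
        (time_s, (temp - t_reference) / power_step_w)
        for time_s, temp in zip(times_s, temps_degc)
    ]


# Synthetic step response (NOT measured data).
TAU_S = 100.0   # assumed thermal time constant
P0_W = 0.5      # assumed amplitude of the dissipation step
times = [10.0 * i for i in range(101)]  # 0 s ... 1000 s
temps = [25.0 + 22.0 * (1.0 - math.exp(-t / TAU_S)) for t in times]

zth = transient_thermal_impedance(times, temps, P0_W)
rth_estimate = zth[-1][1]  # last value approximates the static resistance
```

The last sample is a usable estimate of the static thermal resistance only if the record is long enough for the response to settle.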

Because the measurements can be made in situ, they reveal much about the actual conditions of heat removal. If the components used incorporate BITS and BITT circuitry, the thermal features of all packages can be thoroughly tested, both on the production line and in the field. This test does not require any special equipment and may be performed using software.

The built-in thermal tester circuitry is also suitable for analyzing complex thermal effects, e.g. the thermal couplings between neighboring packages on the PCB. If the BITT circuitry is controlled in such a way that the dissipation $P_S$ of a single selected package is switched on while the temperature rise $T_k$ of the other packages is measured, the coupling factors between the selected package and the other packages under consideration can be calculated.
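One possible way to express such coupling factors is as a mutual thermal resistance: the temperature rise of each package divided by the power dissipated in the selected package. The text does not define the coupling factor explicitly, so this definition, the function name and the example readings below are assumptions made for illustration.

```python
def thermal_couplings(temps_off_degc, temps_on_degc, source, source_power_w):
    """Self and mutual thermal resistances in K/W for one powered package.

    temps_off_degc / temps_on_degc map package names to steady-state
    temperatures with the source package's dissipator off and on.
    Returns (self_resistance, {other_package: mutual_resistance}).
    """
    if source_power_w <= 0:
        raise ValueError("source power must be positive")
    rises = {
        name: (temps_on_degc[name] - temps_off_degc[name]) / source_power_w
        for name in temps_off_degc
    }
    self_resistance = rises.pop(source)
    return self_resistance, rises


# Invented readings for three packages; package "A" dissipates 0.5 W.
temps_off = {"A": 30.0, "B": 30.0, "C": 30.0}
temps_on = {"A": 52.0, "B": 33.0, "C": 31.0}

r_self, r_mutual = thermal_couplings(temps_off, temps_on, "A", 0.5)
```

Repeating the measurement with each package in turn as the source would fill in a complete coupling matrix for the board.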

Conclusions

The paper describes a consistent set of hardware extensions proposed for digital VLSI circuits, providing good thermal testability at both the component and the PCB level. These extensions include the built-in temperature sensor (BITS), the built-in power switch and their interface to the boundary-scan test circuit. If all these elements are incorporated, the chip is in effect equipped with a built-in thermal tester (BITT).

The advantages provided by the BITT are:

  • on-line thermal monitoring at the PCB or system level
  • measurement of the temperature very close to the hottest region (namely, the surface of the silicon chip)
  • support for experimental investigation of cooling at chassis level (fan efficiency, dependencies on the PCB placement, etc.)
  • support for static and transient thermal resistance measurements both in production and in the field
  • measurement of the thermal coupling effects at the PCB level.

For a 20-25 mm² chip, the area required for the BITT circuit is less than 1%. This is more than compensated for by such benefits as better thermal management and the possibility of precise thermal measurements without the need for expensive dedicated equipment. This Design for Thermal Testability approach is highly advisable for ASICs and other ICs if the design is thermally stressed and reliability issues are of primary concern.