
Interaction of the system- and module-level thermal phenomena: a flip-chip/BGA example


Fig. 1: Levels in a computer environment.

Introduction

Increasing demands for higher performance in microprocessors have a direct effect on chip power and heat generation. Increased function and miniaturization of the packages result in thermal challenges that require a thorough understanding of the system's thermal performance under all possible field conditions. Regardless of the magnitude of the power dissipated by the chip (die), a cooler die means better performance (especially in CMOS technology), increased die reliability and improved package longevity due to a reduction in thermo-mechanical stresses in the package.

This paper presents an overview of the cooling requirements from the die to the system level. The interaction of the module level with the system level is considered in detail using the example of the IBM Tape Ball Grid Array (TBGA) package. Although the example is the TBGA, any other package, such as the PBGA (Plastic Ball Grid Array) or the CBGA (Ceramic Ball Grid Array), could be used to illustrate the concepts. Numerical as well as experimental data on the thermal performance of the TBGA package are used to illustrate the interaction of module-level and system-level cooling. The paper also intends to show how commonly used parameters and tests for comparing different packages can be misleading, and that ultimately the performance of the package and the chip in the system is the most relevant parameter on which a designer should focus.

Figure 1 shows the different levels in the computer system: the chip, the module (package), the board and the system. The cooling/heat transfer considerations at each level are unique and are described as follows. At the chip level, heat transfer is by conduction. The thermal resistance (henceforth referred to simply as the resistance) from the junction to the chip is of importance. Although the temperature rise from the junction to the chip is usually not large, it cannot be neglected in some very high powered chips. At the first (module/package) level, the mechanism is again mainly conduction of heat in solids. The important considerations at this level are the chip/module power dissipation and the module construction, that is, its geometry and material properties. Depending on the complexity of the package and the conditions at its boundary, the solution techniques for characterizing thermal performance range from analytical closed-form solutions, through solutions of a simple resistance network using the electrical/thermal analog, to numerical (finite difference/finite element) methods.

At the second level (card/board level), the heat transfer mechanism is mainly convection. The thermal considerations are component (package) geometry and location on the board, flow type (laminar or turbulent) and flow rate, flow distribution and package flow impedance. The governing equations (Navier-Stokes) are non-linear and do not admit simple solutions. Either CFD (computational fluid dynamics) or empirical correlations must be used.

The system level considerations are the ambient environment, namely the temperature, altitude and humidity; blower/fan/pump selection considerations such as capacity, physical size, acoustic noise and location in the machine; and human factors such as acoustic noise, casing temperature and grill locations. The heat transfer analysis is typically restricted to a simple energy balance at this level.
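The simple energy balance mentioned above can be sketched as follows. This is a minimal illustration and not part of the original study: the function name, the 200 W load, the 0.02 m^3/s flow rate and the default air properties (roughly sea level and room temperature) are all assumed values chosen only for the example.

```python
def air_temperature_rise(power_w, volumetric_flow_m3_s,
                         air_density_kg_m3=1.16, cp_j_per_kg_k=1005.0):
    """Bulk air temperature rise (K) from a steady-state energy balance.

    dT = Q / (rho * V_dot * cp), assuming all dissipated power heats the air.
    """
    if volumetric_flow_m3_s <= 0:
        raise ValueError("volumetric flow rate must be positive")
    mass_flow_kg_s = air_density_kg_m3 * volumetric_flow_m3_s
    return power_w / (mass_flow_kg_s * cp_j_per_kg_k)


# Hypothetical system: 200 W dissipated into 0.02 m^3/s of air.
rise = air_temperature_rise(200.0, 0.02)
print(f"Bulk air temperature rise: {rise:.1f} K")
```

Because air density drops with altitude, the same volumetric flow carries less mass and the computed rise grows, which is one reason altitude appears among the system level considerations.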

It can be seen that the considerations at different levels are quite different and varied, yet there is considerable interaction between the various levels. As an example, system level considerations determine the air-flow near a component or a board, thus affecting the heat transfer boundary condition at the board or module level. This in turn affects the chip thermal performance.

For an overview of thermal management in electronic packages, the reader is referred to Andros and Sammakia (Ref. 1) and Bar-Cohen and Kraus (Ref. 2). It has been customary to predict the junction temperature of electronic component packages with a simple equation. While the use of the equation is straightforward, estimating the values of its variables can range from textbook calculations, to computer model simulation, to experimental measurements, and requires expertise in heat transfer and fluid flow.

As mentioned earlier, the goal is to achieve a cooler chip junction. For CMOS technology, the die performance and reliability are directly dependent upon the die junction temperature, which must be kept below a certain limit. It is therefore extremely important to understand how to estimate the junction temperature of the die and to understand the factors at different levels that affect it.

Junction Temperature Prediction

The chip junction temperature in an actual computer environment can be predicted using:

Tj = Tambient + dTairheating + dTcase-air + dTj-case (1)

This equation states that the junction temperature rise above ambient temperature comprises three parts: the rise due to air heat-up (dTairheating); the rise from the local bulk air to the heat sink base, or to the module case when there is no heat sink (dTcase-air); and the rise from the heat sink base or module case to the junction of the chip (dTj-case). The air heat-up can be caused by heated upstream components and/or the power dissipated by the air-moving device. The temperature rise from the local air to the module case depends on the convection boundary layer, which is a comprehensive heat transfer/fluid flow topic in itself. The derivation of equation (1) is shown schematically in Figure 2. As explained earlier, predicting the various temperature rises is the domain of the heat transfer engineer. The prediction can be done through experimentation, numerical (CFD) modeling, or a combination of both.
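As a minimal sketch of how equation (1) is applied, the terms can simply be summed. The function name and all numeric values below are hypothetical and are not taken from the paper; in practice each rise would come from measurement or CFD modeling.

```python
def junction_temperature(t_ambient, dt_air_heating, dt_case_air, dt_j_case):
    """Equation (1): Tj = Tambient + dTairheating + dTcase-air + dTj-case.

    Ambient temperature in deg C; the three rises in K (or deg C differences).
    """
    return t_ambient + dt_air_heating + dt_case_air + dt_j_case


# Hypothetical values, for illustration only.
tj = junction_temperature(t_ambient=35.0, dt_air_heating=8.0,
                          dt_case_air=30.0, dt_j_case=1.5)
print(f"Estimated junction temperature: {tj:.1f} deg C")
```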


Fig. 2: Diagram of the temperatures related to equation (1).

The factors that affect each of the terms in equation (1) are now briefly listed:

Tambient: system level parameters (machine ambient, altitude)
dTairheating: system level parameters (air-flow rate, air density, humidity); board level parameters (component location, power of upstream components, board thermal properties)
dTcase-air: system level parameters (air-flow rate); board level parameters (air flow distribution, convection coefficient, board thermal properties); component level parameters (package construction, chip power)
dTj-case: system level parameters (air-flow rate); board level parameters (air flow distribution, convection coefficient, board thermal properties); component level parameters (package construction, chip power)

The factors affecting Tambient and dTairheating are largely self-explanatory. The last two terms need more explanation, and the factors should become clearer after reading the whole paper, especially the TBGA example. For the temperature rise from the case to the air, a larger air flow rate increases the convection coefficient and so decreases this temperature rise. The board construction and the package construction also determine what percentage of the chip power is dissipated directly through the case to the air, thus affecting the case-to-air temperature rise. The temperature rise from the case to the junction is clearly affected by the package construction and chip power. In addition, the board details affect the percentage of the heat flowing from the junction to the case. The total air flow rate affects the air flow distribution around the boards because the friction factors and flow impedances are a function of the air flow rate. The air flow distribution (for example, at the top of the module vs. the back side of the card) affects the percentage of heat flowing in a direct path from the junction to the case of the module, in turn affecting both the junction-to-case and the case-to-air temperature rises.

It is beneficial to define the three commonly used performance parameters dependent upon the temperature rises in equation (1):

Rint = (Tj - Tcase)/Chip Power = dTj-case/Chip Power (2)
Rext = (Tcase - Tair)/Chip Power = dTcase-air/Chip Power (3)
Rtotal = Rint + Rext (4)

It is clear that the aim of the thermal designer should be to design for a low Rtotal in the system conditions in which the package will be used. The module and system interactions and their impact on Rint and Rext will now be illustrated with a TBGA example.
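Equations (2) to (4) can be evaluated directly from measured or simulated temperatures. The sketch below assumes a hypothetical set of readings (junction, case and local air temperature at a given chip power); the function name and the numbers are illustrative only.

```python
def thermal_resistances(t_junction, t_case, t_air, chip_power_w):
    """Return (Rint, Rext, Rtotal) in deg C/W per equations (2) to (4)."""
    if chip_power_w <= 0:
        raise ValueError("chip power must be positive")
    r_int = (t_junction - t_case) / chip_power_w   # equation (2)
    r_ext = (t_case - t_air) / chip_power_w        # equation (3)
    return r_int, r_ext, r_int + r_ext             # equation (4)


# Hypothetical readings in deg C at 4 W chip power.
r_int, r_ext, r_total = thermal_resistances(76.0, 74.8, 40.0, 4.0)
print(f"Rint = {r_int:.2f}, Rext = {r_ext:.2f}, Rtotal = {r_total:.2f} deg C/W")
```

Note that Rtotal reduces to (Tj - Tair)/Chip Power, so it depends on everything that sets the case-to-air rise, not on the package alone.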

IBM Tape Ball Grid Array (TBGA) Package Study

The IBM TBGA offers desirable features such as thin and light construction, flexible custom designs, sizes and lead counts, CTE matching to card for excellent reliability, TCB or flip chip interconnect and excellent electrical and thermal performance capability. In this paper, the computational results for a 40 mm body size TBGA are used to illustrate the system and component interactions. Most of the underlying methodologies and conclusions reached are applicable to other organic packages such as the flip-chip BGA.

The schematic diagram of the TBGA package is shown in Fig. 3. The package size is 40 mm x 40 mm, with 671 solder ball I/Os and a 14.6 mm chip. The card size is 76.2 mm x 76.2 mm x 1.6 mm thick (3″ x 3″ x 0.063″) and has zero, one or two copper power planes, each 0.036 mm (1.4 mils) thick. The chip is attached to a Kapton (trademark of duPont Co.) tape using the C4 (controlled collapse chip connection) technology. The Kapton tape consists of a top copper signal layer, the Kapton dielectric layer and a bottom copper power plane layer. A copper stiffener with an inside window of 19 mm is attached to the tape all around the silicon chip. A copper cover plate is attached to the stiffener and the chip using another adhesive for protection and heat sinking. The direct heat path is from the chip through the chip attach adhesive into the cover plate and the air. The indirect path is through the C4 bumps, the tape and the solder balls, then out through the card to the air on the back. The direct attachment of the cover plate and the dual paths result in superior cooling capability. This, however, complicates the analysis of this package, since the card plays a significant role in the thermal management of the package. This effect is often estimated incorrectly when the 'package' thermal performance is determined for comparing different packages.
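The dual heat paths can be pictured with the electrical/thermal analog as two resistances in parallel. The sketch below is a deliberate simplification and not the model used in the study: it assumes both paths end at the same air temperature, and the resistance values are hypothetical placeholders chosen only to show how the card changes the split.

```python
def split_heat_flow(power_w, r_direct, r_indirect):
    """Split chip power between two parallel paths to the same ambient.

    r_direct:   chip -> adhesive -> cover plate -> air (deg C/W)
    r_indirect: chip -> C4 -> tape -> solder balls -> card -> air (deg C/W)
    Returns (q_direct, q_indirect, r_parallel).
    """
    r_sum = r_direct + r_indirect
    r_parallel = r_direct * r_indirect / r_sum
    q_direct = power_w * r_indirect / r_sum  # more heat takes the easier path
    q_indirect = power_w - q_direct
    return q_direct, q_indirect, r_parallel


# Hypothetical resistances: a well-conducting card vs. a poorly conducting one.
q_top, q_card, r_eq = split_heat_flow(4.0, r_direct=15.0, r_indirect=30.0)
q_top_poor, q_card_poor, r_eq_poor = split_heat_flow(4.0, 15.0, 60.0)
print(f"Good card: {q_card:.2f} W via card, equivalent R = {r_eq:.1f} deg C/W")
print(f"Poor card: {q_card_poor:.2f} W via card, equivalent R = {r_eq_poor:.1f} deg C/W")
```

Even in this crude picture, degrading the card path raises the equivalent resistance and pushes more heat through the cover plate, which is why a 'package' figure quoted without the card can mislead.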


Fig. 3: Schematic Cross Section of the Tape Ball Grid Array Package.

Mathematical Formulation

Due to the complexities in the package and the flow, a detailed, extensive numerical model was developed to predict the heat transfer characteristics of the TBGA package. Material properties and the thicknesses of the different blocks are given in Ref. 4.

The air is assumed to be non-participating in thermal radiation, and the surfaces of both the card and the cover plate are assumed to be gray and diffuse. The card surfaces and the TBGA cover plate (the module case) are allowed to exchange radiation with the ambient environment.
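To give a feel for the size of the radiation term, the sketch below uses the standard relation for a small gray, diffuse surface exchanging radiation with large surroundings. This is an assumed simplification, not the formulation of the numerical model; the emissivity and temperatures are hypothetical, and only the 40 mm x 40 mm cover plate area comes from the package description.

```python
STEFAN_BOLTZMANN = 5.670374419e-8  # W/(m^2 K^4)


def radiated_power(emissivity, area_m2, t_surface_c, t_ambient_c):
    """Net radiation (W) from a gray, diffuse surface to large surroundings."""
    ts_k = t_surface_c + 273.15
    ta_k = t_ambient_c + 273.15
    return emissivity * STEFAN_BOLTZMANN * area_m2 * (ts_k**4 - ta_k**4)


# Cover plate area 40 mm x 40 mm; emissivity 0.8 and temperatures are assumed.
q_rad = radiated_power(0.8, 0.040 * 0.040, t_surface_c=70.0, t_ambient_c=25.0)
print(f"Radiated from cover plate: {q_rad:.2f} W")
```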

Several researchers have proposed and used 2nd order schemes such as the k-ε model for computing turbulent flow and heat transfer around electronic packages. The widely used standard k-ε models are valid for fully turbulent flow, that is, for Reynolds numbers Re > 10^4 based on hydraulic diameter (Ref. 3). Such Re regimes are rarely encountered in air-cooled electronic packages. Moreover, the k-ε models involve the solution of two additional coupled equations and are numerically expensive (Ref. 3). The results shown here use a simplified approach for the turbulence model, which is shown to work quite well (Ref. 4). The C4 and the solder ball constriction resistances are accounted for in the model. The emissivities of the card surface and the TBGA cover plate were measured by comparing radiation from these surfaces with that from a gray, diffuse surface of known temperature and emissivity.
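A quick estimate shows why the fully turbulent threshold (Re of about 10,000 based on hydraulic diameter) is rarely reached in air-cooled equipment. The sketch below assumes a hypothetical rectangular channel as wide as the 76.2 mm card with a 20 mm gap, and a typical room-temperature kinematic viscosity for air; none of these channel details come from the paper.

```python
def hydraulic_diameter(width_m, height_m):
    """Dh = 4 * area / wetted perimeter for a rectangular duct."""
    area = width_m * height_m
    perimeter = 2.0 * (width_m + height_m)
    return 4.0 * area / perimeter


def reynolds_number(velocity_m_s, diameter_m, kinematic_viscosity_m2_s=1.6e-5):
    """Re = V * Dh / nu (default nu is an assumed value for air near room temperature)."""
    return velocity_m_s * diameter_m / kinematic_viscosity_m2_s


# Hypothetical channel: 76.2 mm wide, 20 mm gap, 2 m/s approach velocity.
dh = hydraulic_diameter(0.0762, 0.020)
re = reynolds_number(2.0, dh)
print(f"Dh = {dh * 1000:.1f} mm, Re = {re:.0f}")
```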

A detailed grid size study was done to reach a compromise between cost and accuracy. For the turbulence model runs, an 81 x 31 x 51 non-uniform grid was used to resolve the sub-layer.

TBGA Thermal Results

Experiments were performed to complete a verification of the model by comparing the numerically obtained temperatures with the experimentally measured values. The comparison was intended to verify that all of the simplifications, material properties selected and boundary conditions used in the model were adequate. The experiments were run at the IBM Advanced Thermal Engineering Laboratory in Endicott, NY. A detailed description of the experimental set up and procedure is given in Ref. 5.

Figure 4 shows the total thermal resistance as a function of chip power dissipation in the range of 1 to 10 W, in natural convection. The experimental results for two power planes are seen to be in excellent agreement with the numerical predictions. Numerical results are compared for a card with no, one and two power planes. The effect of power planes on chip junction temperatures is seen to be very significant. The presence of even a single power plane improves the thermal performance significantly. This clearly shows the effect of the board properties on the total thermal performance. The junction to air temperature rise per unit power (in other words, Rtotal) decreases with an increase in power due to more vigorous convection and higher convective velocities. Radiation heat transfer is a significant factor in enhancing the cooling from packages in the natural convection mode and must be included in an analysis (Ref. 4).


Fig. 4: Comparison of the experimental and CFD results for natural convection. Also shown is the effect of the power planes.

Table 1 shows the Rint and Rtotal for the same TBGA package for a power of 4 W in natural convection and at velocities of 1 and 2 m/s. Rint is a very small portion of Rtotal in all three flow cases, because the temperature drop from the junction to the case is much smaller than that from the junction to the air. Therefore, in this case it is most important for the thermal designer to concentrate on lowering the resistance from the case to the ambient through intelligent flow management. Thus, the system air-flow has a profound effect on Rtotal. Rint is also affected by the flow condition, though to a much smaller extent, as seen in Table 1. However, if the extent of flow is changed on only one side of the card, Rint may be affected to a much greater extent.

Table 1

Rint and Rtotal for a card with 2 power planes (experimental data), chip power = 4 W.

Flow Condition        Rint (°C/W)    Rtotal (°C/W)
Natural convection    0.31           12.8
1 m/s                 0.30           9.0
2 m/s                 0.33           7.5
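The Table 1 data can be broken down to show how small Rint is relative to Rtotal. The sketch below uses only the tabulated values and the 4 W chip power; the derived quantities (Rext from equation (4), the Rint share, and the junction rise above local air) are simple arithmetic on those values, and the variable names are illustrative.

```python
# (Rint, Rtotal) in deg C/W, copied from Table 1.
TABLE_1 = {
    "natural convection": (0.31, 12.8),
    "1 m/s": (0.30, 9.0),
    "2 m/s": (0.33, 7.5),
}
CHIP_POWER_W = 4.0


def breakdown(r_int, r_total, power_w):
    """Derive Rext, the Rint share of Rtotal, and the junction-to-air rise."""
    return {
        "r_ext": r_total - r_int,            # equation (4) rearranged
        "int_share": r_int / r_total,        # fraction of Rtotal inside the package
        "junction_rise": r_total * power_w,  # Tj - Tair in K
    }


rows = {flow: breakdown(ri, rt, CHIP_POWER_W) for flow, (ri, rt) in TABLE_1.items()}
for flow, row in rows.items():
    print(f"{flow}: Rext = {row['r_ext']:.2f} deg C/W, "
          f"Rint share = {row['int_share']:.1%}, "
          f"Tj - Tair = {row['junction_rise']:.1f} K")
```

In every flow condition Rint stays below 5% of Rtotal, so nearly all of the junction-to-air rise is set outside the package.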

Table 2

Rtotal (°C/W), capped vs. capless (numerical predictions), chip power = 4 W.

Flow Condition        Capped    No Cap (Bare Die)
Natural convection    12.6      17.1
2 m/s                 7.7       10.4

Since the junction to case resistance parameter can change with flow conditions, the thermal designer must evaluate a particular package based on the final application conditions and not merely on comparing the Rint of two packages. The total performance in the system must be evaluated.

CFD simulations of the TBGA package without the cap or cover plate (i.e., with a bare die) reveal some more interesting characteristics. Since there is no separate case temperature (the chip itself is the case), the Rint value is zero or very nearly zero. Thus the thermal designer, when presented with Rint data for the two packages, may be tempted to choose the uncapped module because it has a much lower Rint. The same may happen if the comparison uses Thetaj-c, since Thetaj-c measures the resistance from the junction to the case: it will be nearly zero for the uncapped module and on the order of 0.1 for the capped module. However, as the results in Table 2 show, the junction temperature for the capped module will be lower since its Rtotal is much lower. This is because, with the cap, the area of dissipation from the module top is 40 x 40 = 1600 mm^2 as opposed to 14.6 x 14.6 = 213 mm^2 for the bare die, and because in the capped case a superior adhesive with low thermal resistance is used for attaching the cap to the chip.
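The comparison can be made concrete with the Table 2 predictions and the dimensions quoted above. The sketch below only performs arithmetic on those source values at the stated 4 W chip power; the variable names are illustrative.

```python
# Rtotal in deg C/W (capped, bare die), copied from Table 2.
TABLE_2 = {
    "natural convection": (12.6, 17.1),
    "2 m/s": (7.7, 10.4),
}
CHIP_POWER_W = 4.0

cap_area_mm2 = 40.0 * 40.0   # cover plate footprint
die_area_mm2 = 14.6 * 14.6   # bare die footprint
area_ratio = cap_area_mm2 / die_area_mm2

# Extra junction temperature rise (K) incurred by removing the cap.
penalty = {
    flow: (r_bare - r_capped) * CHIP_POWER_W
    for flow, (r_capped, r_bare) in TABLE_2.items()
}

print(f"Top-side dissipation area ratio (capped / bare die): {area_ratio:.1f}")
for flow, extra in penalty.items():
    print(f"{flow}: bare die runs {extra:.1f} K hotter at {CHIP_POWER_W:.0f} W")
```

So the module with the near-zero Rint is the one that runs hotter in both flow conditions, which is the point of the example.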

In another illustrative example, consider two packages in an air-cooling test, i.e., each package mounted on a card in air cooling. From experimental data, an overmolded 54 mm BGA package under similar airflow conditions and card construction has an Rtotal of approximately 7°C/W, compared to the 9°C/W in Table 1 for the TBGA. Looking at these two comparable results from an air-cooling test, the designer may be tempted to choose the package with the lower Rtotal, i.e., the overmolded package. If, however, the system has double-sided components that are very closely spaced in arrays on both sides, the primary heat transfer path is expected to be from the top of the module. In that case, the TBGA is expected to perform better than the overmolded BGA because of its low resistance path from the chip to the top of the module. The designer would thus have selected a package with worse performance in the actual system. These examples clearly illustrate that the thermal designer must carefully evaluate the performance of the package in the system in which it will be used. Choices based on the package characteristics or standard air-cooling tests alone may not lead to optimal solutions.

Summary and Conclusions

The effects of the system, board and package parameters on the junction temperatures are explained qualitatively as well as through the examples of BGA packages. The paper shows that the thermal designer must consider the performance of a package in the actual system in which it will be used. Solutions based on the resistances or a combination thereof, or even the performance results from a standard air-cooling test alone, may not lead to optimal performance.

References

1. Andros, F.E. and Sammakia, B.G. (1989), “Thermal Management in Electronic Packaging”, in Principles of Electronics Packaging, Edited by Seraphim, D.P., Lasky, R. and Li, C.Y., McGraw Hill, New York.
2. Bar-Cohen, A. and Kraus, A.D. (1988), Advances in Thermal Modeling of Electronic Components and Systems, Vol. 1, Hemisphere Publishing Corp.
3. Gibson, M.M., Jones, W.P. and Whitelaw, J.H. (1995), “Turbulence Models for Computational Fluid Dynamics”, Thermofluids Section Report, TF-94-10, Imperial College of Science, Technology and Medicine, U.K.
4. Sathe, S.B. and Sammakia, B.G. (1996), “A Numerical Study of the Thermal Performance of a Tape Ball Grid Array (TBGA) Package”, ASME Heat Transfer Development, Vol. 329, pp. 83-93.
5. Sathe, S.B., Kosteva, S.J., Stutzman, R.J. and Sammakia, B.G. (1995), “PowerPC 60X Family of Products: Thermal Management Overview”, IBM Technical Report #TR01.C780.