Rahima K. Mohammed, Yi Xia, Ying-Feng Pang, Ridvan A. Sahan
Silicon validation platforms are purpose-built to validate processors, chipsets, and ASICs to ensure world-class quality and reliable products. These platforms are a crucial step in releasing silicon to market. However, growing complexity makes it increasingly challenging to design validation platforms for processors based on high-speed Quick Path Interconnect (QPI™) links, which serve applications with rapidly increasing data and memory bandwidth. Typical silicon debug validation platforms are open chassis and must allow access to all components for probing and for voltage and temperature margining, as shown in Figure 1. Because multiple architecture options may need to be validated for the same program, thermo-mechanical designs may need to support several board configurations. In these platform designs, cost is equal in priority with platform features and program schedule.
The major power-dissipating components of validation platforms that need a thermal solution are CPUs, chipsets, memory, ASICs, and voltage regulators; PCIe cards, graphics cards, and power supplies have their own cooling systems. While removing the high heat dissipation of a multi-core silicon architecture that supports increasing bus speeds becomes more challenging with every process generation, the keep-out volume available for a cooling solution is constrained by the form factor and by more compact component placement. Elevated chip temperatures cause various problems, such as increased leakage, accelerated failure mechanisms, and timing failures. Consequently, thermal-related effects are considered major roadblocks in the design of next-generation microprocessors.
Most industrial test standards do not apply directly to open systems. Tests therefore have to be conducted to obtain the necessary data, which may differ considerably from the vendor's data sheet values because of different boundary conditions. In this article, experimental methodologies for thermal tests of open silicon validation platforms are presented, as shown in Figure 2. Component level thermal testing data are correlated with computational fluid dynamics (CFD) simulations to provide guidance for system simulations.
The choice of second level thermal interface material (TIM) is made using detailed thermo-mechanical test data. Acoustic tests are performed to meet acoustic safety requirements, and fan control is implemented to minimize system noise while still meeting thermal requirements. These experimental techniques and results have effectively guided the thermal design decisions of silicon debug validation platforms with increased capabilities while meeting budget and schedule. The test methodologies presented in this article are applicable to validation and open platforms but not to OEM systems, because typical OEM systems are closed systems with unidirectional airflow.
Thermal Test Validation and Correlation
Commercially available CFD simulation tools allow users to analyze and predict the airflow and temperature within the electronic devices or systems, which reduces the design cycle and prototyping time. However, the accuracy of the simulation results is highly dependent on the reliable input of parameters such as the boundary conditions. Therefore, experimental test data are essential for validating prototype designs that are based on optimized numerical simulations. The majority of the heat sink designs used in open system validation platforms are of the active air-cooled type with a fan directly attached to the heat sink. Hence, a wind tunnel is not applicable to characterize these heat sinks. Instead, the active heat sinks are tested in an open air environment. The overall experimental setup is illustrated in Figure 3 (a). The thermal test vehicle (TTV) was used to simulate the actual package and the heat source. The integrated heat spreader (IHS) of the TTV and the heat sink base were embedded with type T thermocouples. Both thermocouples were connected to a data logger in which the thermocouple readings were collected. Thermal grease was applied between the TTV and the heat sink. One other thermocouple was placed above the fan to capture the local inlet temperature near the heat sink fan. Figure 3 (b) shows the thermal resistance network from the TTV case to ambient.
Each prototype was tested over a range of power dissipation levels. The performance characteristic of the active heat sink is defined by the thermal resistance from case (IHS) to local ambient (Psi_ca). Psi_ca is obtained from the slope of the line fitted to the package case temperature rise (Tc – Tinlet) as a function of power level. Psi_ca is used since there is a significant secondary heat flow path to ambient [3]. Figure 4 summarizes the Psi_ca values for the heat sink samples. The mean case to ambient thermal resistance for the tested active heat sinks is 0.179°C/W with 3σ of 0.033°C/W.
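The slope-based extraction of Psi_ca described above can be sketched in a few lines of Python. This is a minimal illustration using an ordinary least-squares fit; the function name `fit_psi_ca` and the three sets of readings are hypothetical and are not measurements from this study.

```python
def fit_psi_ca(powers_w, tc_c, tinlet_c):
    """Return the least-squares slope of (Tc - Tinlet) versus power, in degC/W."""
    rises = [tc - ti for tc, ti in zip(tc_c, tinlet_c)]
    n = len(powers_w)
    mean_p = sum(powers_w) / n
    mean_r = sum(rises) / n
    # Slope = covariance(power, rise) / variance(power)
    sxy = sum((p - mean_p) * (r - mean_r) for p, r in zip(powers_w, rises))
    sxx = sum((p - mean_p) ** 2 for p in powers_w)
    return sxy / sxx


# Hypothetical data logger readings at three TTV power levels
powers = [40.0, 80.0, 120.0]   # W
tinlet = [25.0, 25.2, 25.1]    # degC, thermocouple above the fan
tc = [32.2, 39.5, 46.6]        # degC, thermocouple in the IHS

psi_ca = fit_psi_ca(powers, tc, tinlet)
print(f"Psi_ca = {psi_ca:.3f} degC/W")
```

Fitting a slope across several power levels, rather than dividing a single temperature rise by a single power, reduces the influence of a constant offset in the thermocouple readings.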
The actual CPU case temperature can be calculated for a given ambient temperature and power dissipation based on the mean plus three sigma of Psi_ca. The validation approach is to correlate the simulation results with the experimental data. With the test data, the numerical thermal model of the heat sink can be validated accordingly. The numerical model built using a CFD tool included the details of the active heat sink (including heat pipes), the TTV (including the substrate, die, TIM1, and IHS), TIM2 between the active heat sink and the IHS, the socket, and the test board, as shown in Figure 3(c). The heat sink fan was modeled with its fan curve, with air flowing from top to bottom. All the boundaries around the model were open conditions with no duct or chassis.
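The case temperature estimate mentioned at the start of this paragraph can be illustrated as follows. The Psi_ca mean and 3σ values are the ones reported above; the 35°C local ambient and 130 W power level are assumed purely for illustration, and `worst_case_tc` is a hypothetical helper name.

```python
PSI_CA_MEAN = 0.179    # degC/W, mean from the heat sink tests
PSI_CA_3SIGMA = 0.033  # degC/W, three-sigma spread from the same tests


def worst_case_tc(t_ambient_c, power_w,
                  psi_mean=PSI_CA_MEAN, psi_3sigma=PSI_CA_3SIGMA):
    """Estimate case temperature using mean + 3 sigma of Psi_ca."""
    return t_ambient_c + (psi_mean + psi_3sigma) * power_w


# Assumed operating point (not from the article)
tc_estimate = worst_case_tc(t_ambient_c=35.0, power_w=130.0)
print(f"Estimated worst-case Tc = {tc_estimate:.1f} degC")
```

With these assumed inputs the estimate is roughly 62.6°C. Using the mean plus three sigma rather than the mean alone builds the sample-to-sample variation of the heat sinks into the estimate.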
System level thermal simulation is always more complex because of the number of components in the system and the mutual preheating among them. The airflow in closed systems is usually designed to flow in one direction. However, there is no unidirectional airflow pattern in an open chassis. The system fan is usually designed to cool the memory and the passive heat sinks for the ASICs and voltage regulators in the system. The CPU and chipset typically have active heat sinks because of their higher power dissipation. With the system fan air flowing in one direction while the active heat sink fan air impinges on the heat sink fins, there is no simple way of predicting the airflow pattern and the temperatures of the heat-dissipating components in the system. Thus, numerical modeling provides quick and predictable results for such a complex system. The component level active heat sink thermal model is correlated with the experimental results, and the correlated model is then used in open system level thermal simulations to study the mutual effects among the components around the CPU heat sink.
Test Validation of Thermal Interface Material
The thermal interface material (TIM) used in a silicon validation platform should have characteristics such as (i) high thermal conductivity, (ii) ease of rework and application, (iii) electrical inertness, (iv) ability to spread well under pressure, (v) ability to fill air gaps, and (vi) long-term reliability since validation customers need to swap the processors frequently during the validation and debug tests. In addition, a TIM with a lower adhesion force is preferred in order to avoid the processor/chipset being pulled out from the socket when removing the heat sink.
There are many TIM choices available in the market, ranging from gap pad fillers to phase change materials (PCM) to thermal greases. Thermal greases generally have high thermal conductivity and are easy to replace, and hence are recommended as the TIM on a CPU heat sink in the validation environment [1, 2]. Though data sheets provided by vendors are a good reference for evaluating TIM properties, no parameter in the data sheet or in the test data provided by vendors specifies the adhesion force of the thermal grease. Therefore, the adhesion force test discussed below helps to quantify the adhesive strength of the thermal grease under test. The TIM with the lowest adhesion was selected and evaluated for its thermal performance, then checked for its reliability through thermal performance testing and high temperature testing.
In the validation environment, the adhesive bonding strength of the TIM is critical to support the usage model of silicon insertion and extraction. Once the adhesion force exceeds the retention force of the package, the processor will be pulled out of the socket during heat sink removal. This may damage the package and the socket and eventually the validation board. The test setup shown in Figure 5 is required to understand how the TIM behaves when subjected to tensile forces during heat sink removal. The grease under test was applied on the TTV. A compressive force was first applied to the thermal grease by tightening down the retention screws on the active heat sink to ensure good contact between the processor and the heat sink. Then, the heat sink screws were loosened. The pulling load on the heat sink was gradually increased until the joint ruptured. A digital force gauge was hooked onto the heat sink to measure the peak force during the heat sink removal.
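The pass/fail logic behind the adhesion test can be sketched as a comparison between the peak force recorded by the gauge and the retention force of the package in the socket. All numbers below, including the retention force and the grease labels, are hypothetical placeholders rather than values from this study, and the helper names are illustrative only.

```python
def peak_force(samples_n):
    """Peak tensile force (N) seen by the force gauge during removal."""
    return max(samples_n)


def package_stays_seated(peak_adhesion_n, retention_n):
    """True if the grease releases before the package leaves the socket."""
    return peak_adhesion_n < retention_n


# Hypothetical force-gauge traces (N) for two greases during heat sink removal
traces = {
    "grease_X": [2.0, 9.5, 21.0, 4.0],
    "grease_Y": [1.5, 6.0, 11.5, 2.0],
}
RETENTION_FORCE_N = 15.0  # assumed socket retention force, for illustration

peaks = {name: peak_force(trace) for name, trace in traces.items()}
safe = {name: package_stays_seated(p, RETENTION_FORCE_N)
        for name, p in peaks.items()}
lowest_adhesion = min(peaks, key=peaks.get)

print(peaks, safe, lowest_adhesion)
```

In practice each grease would be pulled several times and the comparison made on the largest peak observed, since a single pull-out event is enough to damage the socket.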
Thermal performance of the TIM is a critical factor for the overall thermal management of the silicon. The test setup for the thermal performance studies followed the experimental setup described in the previous section on component level thermal test validation. The mounting screws for the heat sink were fully tightened at all four corners to ensure a uniform load on the thermal grease. The grease was cleaned off and re-applied each time a different grease was tested, and the same steps were repeated. To make the results as accurate as possible, the tests were repeated for each grease under test.
Reliability of the TIM is critical for providing stable and reliable thermal performance throughout the lifetime of the validation. The TIM used on CPU heat sinks needs to pass the high temperature test to avoid hardening and pump-out issues that could lead to overheating of the processor. In this test, the TTV was subjected to maximum CPU power dissipation and the TIM was subjected to 90°C for 24 hours to study the performance change of the TIM. Note that the adhesion force was measured at room temperature right after being subjected to 90°C for 24 hours. Figure 6 illustrates that the grease’s thermal performance and adhesion force decreased after being subjected to high temperature due to viscosity reduction. Grease F was chosen for the validation environment since it provided the best thermal performance with the least adhesive force as shown in Figure 6.
High cooling power and an open chassis system for the validation platform drive the selection of high-speed fans, which cause undesirable acoustic noise. The acoustic data provided by the vendor for an individual fan are collected 1 m away from the fan with a background noise of 15 dBA. This does not provide the system acoustic data for each validation platform running in a realistic lab environment. Therefore, acoustic experiments are required (i) to select fans, (ii) to implement fan control, (iii) to meet thermal, airflow, and safety requirements, and (iv) to enhance user experience. The industrial test system safety guideline [4] states that system noise should be less than 85 dBA at 1 m from a single system.
Experiments were performed at both component and system level to identify the dominant source of the noise. Noise measurements were recorded in all directions using a handheld digital sound meter; they are intended for design guidance and are not official tests performed in acoustic test chambers. Experiments were repeated to collect data from 0.0762 m to 1.016 m in the direction of highest noise, including the background noise. It was found that the system fans contributed most to the noise. Then, fan control experiments were performed to quantify the system noise reduction while still meeting all the thermal requirements. CFD simulations shown in Figure 7(a) indicate that fan speed can be reduced to ~80% to 40%, depending on the usage configuration. Fan control experiments show ~4 to 12 dBA noise reduction, as shown in Figure 7(b).
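For a rough sanity check of such acoustic measurements, two common rules of thumb can be coded up: free-field spreading from a point source (level falls by 20·log10 of the distance ratio) and the fan noise law (level changes by about 50·log10 of the speed ratio). Neither relation is taken from this article, and an open lab with reflections and background noise will deviate from both, so the sketch below is only an approximation under those stated assumptions. The 100 dBA reading is hypothetical; the 85 dBA limit at 1 m is the guideline value quoted above.

```python
import math

NOISE_LIMIT_DBA = 85.0  # safety guideline limit at 1 m, as quoted in the text


def level_at_distance(level_dba, r_measured_m, r_target_m):
    """Free-field point-source estimate of the level at another distance."""
    return level_dba - 20.0 * math.log10(r_target_m / r_measured_m)


def fan_speed_noise_delta(speed_fraction):
    """Rule-of-thumb change in fan noise (dB) at a fraction of full speed."""
    return 50.0 * math.log10(speed_fraction)


# Hypothetical near-field reading of 100 dBA at 0.0762 m, scaled to 1 m
level_1m = level_at_distance(100.0, 0.0762, 1.0)
meets_limit = level_1m < NOISE_LIMIT_DBA

# Estimated fan-only noise change at 80% and 40% of full speed
delta_80 = fan_speed_noise_delta(0.8)
delta_40 = fan_speed_noise_delta(0.4)

print(f"{level_1m:.1f} dBA at 1 m, within limit: {meets_limit}")
print(f"80% speed: {delta_80:.1f} dB, 40% speed: {delta_40:.1f} dB")
```

The fan law applies to the fan noise alone; a measured system level also includes other sources and the lab background, so the reduction observed at the system level can be smaller than this estimate.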
Experimental methodologies for thermal, TIM, and acoustic testing are presented to design open chassis validation platforms efficiently and reliably while meeting shorter design cycle times and the increasing complexity of the platforms. Component level thermal testing data are correlated with CFD simulations to provide guidance for system simulations. The choice of TIM2 was made using detailed thermo-mechanical test data. Next, the acoustic tests were performed to meet acoustic safety requirements and to implement fan control to minimize system noise. These tests provided guidance to validate the thermal designs and to drive design decisions before new silicon became available. This helps to increase the confidence level in the platforms delivered and to identify design issues and failure modes not captured by the analyses.
[1] Stern, M.B., Jhoty, G., Kearns, D., and Ong, B., “Measurement of Mechanical Coupling of Non-curing High Performance Thermal Interface Materials,” 22nd IEEE SEMI-THERM Symposium, 2006.
[2] Gwinn, J.P., and Webb, R.L., “Performance and Testing of Thermal Interface Materials,” Microelectronics Journal, 34 (3), March 2003, pp. 215-222.
[3] www.jedec.com, in particular JESD51-12, “Guidelines for Reporting and Using Electronic Package Thermal Information.”
[4] Junkkarinen, J., IEC/UL 61010B-1, “Safety Requirements for Electrical Equipment for Measurement, Control, and Laboratory Use – Part 1: General Requirements.”
Rahima K. Mohammed can be reached at email@example.com.