In today’s technological landscape, the demand for high-performance computing continues to soar, driven by advances in artificial intelligence, data analytics, and complex simulations. As processors become more powerful, server power draw has risen accordingly, and there is a constant push to increase rack density to grow the compute capability of data centers. Power levels are set to exceed 70 kW per rack [1] for AI and HPC workloads. Traditional air-cooling methods are often inadequate to manage the heat generated by such high-density server configurations, leading to concerns over energy consumption, operational cost, and equipment reliability. To reduce operating costs and improve the power usage effectiveness (PUE) of data centers, there is growing interest in 100% liquid-cooled server architectures.
While liquid cooling offers significant advantages for data center efficiency and performance, it also presents several design challenges that must be addressed. Architecting and designing a liquid cooling system is inherently more complex than traditional air cooling. It involves careful planning at the rack and server level to ensure efficient integration, optimal performance, and future scalability. This article discusses the critical design considerations and methodologies for architecting a robust direct-to-chip liquid cooling (DLC) design, highlighting its potential for modern data centers.
Why is cooling needed in servers?
High-compute applications require processing large datasets at incredible speeds. It’s critical to ensure that the data can be ingested, processed, and analyzed in real time for faster training and inference times. To achieve maximum performance in HPC servers, each component must operate at its highest potential. Even a small shortfall in the performance of a single component can create bottlenecks, slowing down the entire system.
Since these components are extremely sensitive to temperature, they are designed to throttle their performance at high temperatures to prevent permanent damage. This throttling is detrimental for applications and workloads that demand high reliability: the quality of service cannot be maintained, and the resulting downtime can have a significant revenue impact. Well-engineered thermal management solutions keep the temperature of every server component below its limit without excessive throttling, and they are pivotal in ensuring that data centers can deliver consistent, reliable service in an ever-demanding digital landscape.
What is liquid cooling?
Liquid cooling uses a liquid coolant to absorb and dissipate heat from data center components. Liquids can carry away heat more efficiently due to their higher heat capacity. When analyzing liquid cooling options for servers, there are essentially two main categories of liquid cooling – direct-to-chip liquid cooling (DLC) and immersion liquid cooling [2].
Direct-to-chip liquid cooling (DLC): This method delivers the liquid coolant directly to the critical components of a server through a cold plate placed on the chip. The electrical components are never in direct contact with the coolant [2].
Immersion liquid cooling: This approach uses a dielectric fluid that is in direct contact with the IT components. Servers are fully or partially immersed in this non-electrically-conductive liquid within the chassis to extract heat directly from the components [2].
Table 1 compares traditional air cooling and direct liquid cooling technologies for thermal management in servers.
Parameter | Traditional Air Cooling | Direct Liquid Cooling |
---|---|---|
Initial cost | Lower | Higher |
Operational cost | Higher | Lower |
Cooling efficiency | Moderate | Higher |
Serviceability | Easier, less specialized | Requires special considerations |
Sustainability | Moderate — higher carbon footprint | High — lower carbon footprint |
Rack density | Lower, risk of hotspots at high density | High, can support higher densities |
Noise level | Higher due to fans | Lower due to fewer/no fans |
Table 1: Comparison of traditional air cooling and direct liquid cooling
How does direct liquid cooling (DLC) work?
Direct-to-chip liquid cooling has emerged as one of the most cost-effective and efficient methods for cooling high-density servers [3]. DLC involves circulating a coolant through cold plates mounted directly on the server’s components to cool them. The coolant extracts the heat from the components and is then circulated away from the server to be cooled by a heat exchanger or expelled from the system. Figure 1 shows an overview of the cooling loops involved in a DLC architecture at the data center level.
Heat transfer in a DLC system follows this sequence:
- Heat dissipated by high power components is conducted into the cold plates directly attached to these components.
- Liquid coolant that is circulated through these cold plates absorbs the heat from the cold plates.
- The coolant distribution unit (CDU) transfers the heat from the coolant through a heat exchanger, using either air or liquid as the cooling medium.
- The cooled coolant is sent through the server cooling loop again, and the cycle continues.
DLC allows thermal engineers to take a targeted approach by circulating the coolant directly to the critical components, e.g. GPU and CPU in high compute servers, creating a more efficient, sustainable and scalable solution that is ideal for AI and HPC workloads.
What are the heat loads inside the server?
High-compute servers are critical pieces of hardware designed to support next-generation technologies like AI and machine learning. They realize the potential of these technologies by providing the computational resources needed to process and analyze vast amounts of data efficiently.
Before diving into the details of the thermal design considerations, let’s understand the components of a typical high compute server. As shown in Figure 2, a typical server will consist of: a) Compute node; b) Power node; and c) PCIe (Networking and storage) node.
Compute node: This node consists of the processing units of the server, including Graphics Processing Units (GPUs), Central Processing Units (CPUs), and memory. It is the major power-consuming node and typically contributes about 85% of the server heat load; in modern AI servers, this node can account for 6.5 kW of power.
Power node: This node consists of the power conversion and distribution circuitry. There are losses when power is stepped up or down and distributed across the components. This node contributes about 5% of the server heat load, which can amount to close to 0.4 kW in modern AI servers.
PCIe node: This node contains the networking devices, high-speed switches, and storage devices used for high-speed data transfers. It typically consists of PCIe form-factor devices and contributes about 10% of the server heat load, which can amount to 0.8 kW in modern AI servers.
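As a quick sanity check, the node split above can be reproduced with simple arithmetic. The short sketch below assumes an illustrative total server heat load of roughly 7.7 kW so that the stated percentages give the quoted node values; the actual figures will vary by platform.

```python
# Rough node-level heat-load breakdown for an illustrative ~7.7 kW AI server.
# The total power and the percentage split are assumptions used for illustration only.

server_power_kw = 7.7
node_fraction = {"Compute": 0.85, "Power": 0.05, "PCIe": 0.10}

for node, frac in node_fraction.items():
    print(f"{node:8s} node: {frac * server_power_kw:4.2f} kW ({frac:.0%})")
# Compute ~6.5 kW, Power ~0.4 kW, PCIe ~0.8 kW, in line with the figures above
```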
What are the key considerations when architecting a DLC server?
Some key considerations while architecting a 100% liquid cooled server include:
1) Understand the flow rate capability of the rack: In a closed-loop system, there is a limit to how much heat can be dissipated for a given flow rate and approach temperature, and it is driven by the CDU and heat exchanger capabilities at the rack level. This limit is typically expressed in LPM/kW. A common industry standard that many CDUs available today can meet is 1.2 LPM/kW at a 45 °C inlet temperature [4]. This means that for an 85 kW rack, the CDU and heat exchanger should be able to support 102 LPM of flow and cool the liquid to 45 °C. Use this to estimate the maximum flow rate that can be made available to each server in the rack for a given inlet coolant temperature.
2) Estimate the flow rate requirements for the server: The flow rate capability at the rack level can be scaled to the server level. For a rack capable of delivering 1.2 LPM/kW, a 10 kW server should be able to receive 12 LPM of flow. Use this as a starting point to estimate a reasonable flow rate that can be made available to each server. This flow rate guidance also keeps the coolant temperature rise (inlet to outlet) across the server to approximately 12 °C; the calculation sketch after Table 2 works through this heat balance.
3) Estimate the rack level pressure budget: The rack level pressure drop is the sum of the pressure drops from the servers and from the rack level coolant hoses and fittings. For high-density racks, a conservative pressure drop figure for the rack level hoses and fittings (separate from the server pressure drop) is ~34 kPa (~5 psi). The total rack level pressure drop budget will depend on the total flow needed to support the desired rack density and on the CDU pump capability.
For example, consider a rack that draws 80 kW. Based on the 1.2 LPM/kW flow rate guidance, this rack will need 96 LPM of flow. For a CDU with the PQ characteristic shown in Figure 3, the rack level pressure budget must be less than 18 psi (124 kPa) to achieve the desired flow rate. As the figure shows, “System 2” is the best matched system: “System 1” will result in insufficient flow and “System 3” will result in excess flow.
4) Pressure budget of the server: This involves defining a maximum allowable pressure budget for the server liquid loop. A rack contains multiple servers plumbed in parallel with each other, so the rack level pressure budget can be used to establish a server level pressure budget.
Table 2 highlights the rack/tray level thermal targets based on the example discussed above.
Rack power | 80 kW |
Coolant inlet temperature (Tinlet) | 45 °C |
Fluid | PG-25 |
Rack flow rate (at 1.2 LPM/kW) | 96 LPM |
Number of servers | 8 |
Flow rate per server | 12 LPM |
Rack pressure-drop budget | 18 psi (124.1 kPa) |
Rack equipment pressure drop | 5 psi (34.5 kPa) |
Server pressure-drop budget | 13 psi (89.6 kPa) |
Table 2: Rack/tray thermal targets
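The values in Table 2 follow from straightforward heat-balance and budgeting arithmetic. The sketch below reproduces them, assuming representative PG-25 properties (density ≈ 1030 kg/m³, specific heat ≈ 3850 J/kg·K); vendor coolant data should be used for an actual design.

```python
# Minimal sketch of the rack/server budgeting behind Table 2.
# The PG-25 properties below are representative assumptions, not vendor data.

rack_power_kw     = 80.0
flow_guideline    = 1.2      # LPM per kW of rack power (guidance at 45 °C inlet)
num_servers       = 8
rack_budget_psi   = 18.0     # from the CDU PQ curve at the required flow (Figure 3)
rack_plumbing_psi = 5.0      # rack-level hoses, manifold, and fittings

rack_flow_lpm   = rack_power_kw * flow_guideline        # 96 LPM
server_flow_lpm = rack_flow_lpm / num_servers           # 12 LPM
server_power_kw = rack_power_kw / num_servers           # 10 kW

# Coolant temperature rise across a server, from Q = m_dot * cp * dT
rho, cp = 1030.0, 3850.0                                # kg/m^3, J/(kg*K), assumed PG-25
m_dot   = rho * server_flow_lpm / 1000.0 / 60.0         # kg/s
delta_t = server_power_kw * 1000.0 / (m_dot * cp)       # roughly 12-13 °C

server_budget_psi = rack_budget_psi - rack_plumbing_psi  # 13 psi

print(f"Rack flow rate         : {rack_flow_lpm:.0f} LPM")
print(f"Flow per server        : {server_flow_lpm:.0f} LPM")
print(f"Coolant rise per server: {delta_t:.1f} °C")
print(f"Server dP budget       : {server_budget_psi:.0f} psi")
```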
5) Prioritize the critical processing units: The DLC architecture should be designed with a conscious effort to provide the maximum possible cooling to the processing units (GPUs and CPUs) in the compute node. These processing units should be the limiting components, meaning that the server’s flow rate requirement and total pressure drop should be driven by them.
6) Serviceability: The DLC architecture should allow easy access to field replaceable units (FRUs) on the server. The architecture should also allow easy assembly and disassembly of FRUs from the coolant loop. A common way to achieve this is by using universal quick disconnects (UQDs) on the cooling loops.
7) Integration with the rack: The server cooling loop must plug into the rack manifold using either a hand-mate or a blind-mate connection. In either case, it is necessary to use a UQD for ease of assembly and servicing. These QDs contribute a significant pressure drop because the entire server flow rate must pass through them, so careful selection of the QDs is critical to meet current requirements and ensure future scalability.
8) Maintenance: A well thought out bubble-shedding routine should be implemented to remove air trapped in the cooling loops. This routine should be run periodically and especially after servicing.
9) Scalability for future generations: A DLC architecture is a major design investment, and it should be able to support future generations of products without major design updates. From a cooling perspective, the selection of cooling loop components should be made such that they can support the flow rates for future power levels within the pressure budget. One key consideration is to make sure that the rack plumbing, rack manifold and the QD integration between the server and rack can support future power levels. This ensures that the rack architecture doesn’t need to be updated for each generation.
How to architect the DLC cooling loop inside the server?
One of the key decisions to make while designing the cooling loop inside the server is to decide whether the compute node, power node, and PCIe node should be in series or parallel with each other. Each configuration has its own merits and demerits, and a careful evaluation is needed to decide what’s best for the system.
Series flow: As shown in Figure 4, in a series flow configuration the coolant flows sequentially through the compute node, then the power node, and finally the PCIe node before exiting the server.
Merits of series flow:
- Allows maximum cooling of the compute node by routing the entire cold coolant flow rate through the compute node.
- If designed properly, it will require fewer fittings, making it cost-effective.
- Reduced leak concern due to fewer connection points.
Demerits of series flow:
- Very high system pressure drop as all the flow goes through each component sequentially.
- Cooling thermally sensitive downstream components can be challenging because the coolant leaving the compute node is warmer, as the short sketch below illustrates.
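To illustrate the downstream-temperature concern, the sketch below walks the coolant through the three nodes in series. The node powers, flow rate, and coolant properties are the same illustrative assumptions used earlier, not measured values.

```python
# Coolant temperature rise through a series loop: each node adds its heat to the
# same stream, so downstream nodes receive progressively warmer coolant.
# Node powers, flow rate, and PG-25 properties are illustrative assumptions.

rho, cp  = 1030.0, 3850.0                   # kg/m^3, J/(kg*K)
flow_lpm = 12.0
m_dot    = rho * flow_lpm / 1000.0 / 60.0   # kg/s

t_coolant = 45.0                            # server inlet temperature, °C
for node, power_kw in [("Compute", 6.5), ("Power", 0.4), ("PCIe", 0.8)]:
    t_in = t_coolant
    t_coolant += power_kw * 1000.0 / (m_dot * cp)
    print(f"{node:8s} node: inlet {t_in:5.1f} °C -> outlet {t_coolant:5.1f} °C")
# The PCIe node sees coolant roughly 9 °C warmer than the server inlet.
```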
Parallel flow: In a parallel flow configuration, the liquid loops to the individual nodes run in parallel with one another. Even within each node, there can be series or parallel arrangements, depending on the requirements and constraints of the system. For example, inside the compute node, the GPU and CPU can be in series with each other to maximize cooling, as shown in Figure 5, or they can be in parallel with each other, as shown in Figure 6, if the GPU or CPU/memory must be a field replaceable unit.
Merits of parallel flow:
- Allows for easy access and serviceability of the components.
- The overall server pressure drop is significantly reduced since the flow is distributed in parallel paths.
- Greater design flexibility allowing for optimizing flow rates to each component, based on the cooling requirements.
Demerits of parallel flow:
- Requires many flow paths/hoses, which can make the design complicated.
- Higher initial cost as it will require a system manifold, multiple QDs and other fittings for the cooling loops.
- Requires a thorough flow-balancing study; if not done properly, some components may receive inadequate cooling (a simple flow-split estimate is sketched after this list).
- Higher potential for leaks due to multiple connection points.
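As a simple illustration of the flow-balancing problem, the sketch below estimates how a fixed server flow splits among parallel branches when each branch follows a quadratic loss law. The loss coefficients are hypothetical placeholders, not measured branch impedances.

```python
# Simple flow-split estimate for parallel branches inside a server, assuming each
# branch follows a quadratic loss law dP = k * Q^2 (typical of turbulent flow).
# The loss coefficients below are hypothetical placeholders, not measured values.

import math

total_flow_lpm = 12.0                  # flow delivered to the server
branch_k = {                           # hypothetical coefficients, psi per LPM^2
    "Compute (GPU)": 0.5,
    "Compute (CPU)": 0.8,
    "PCIe node":     2.0,
    "Power node":    3.0,
}

# In parallel, every branch sees the same pressure drop, so Q_i ~ 1/sqrt(k_i).
inv_sqrt_k = {name: 1.0 / math.sqrt(k) for name, k in branch_k.items()}
scale = total_flow_lpm / sum(inv_sqrt_k.values())

for name, w in inv_sqrt_k.items():
    q = scale * w                      # branch flow rate, LPM
    dp = branch_k[name] * q ** 2       # identical across branches by construction
    print(f"{name:14s} {q:5.2f} LPM   dP = {dp:5.2f} psi")
```

Even this crude model shows that a high-resistance branch starves itself of flow, which is why the flow-balancing study and, if needed, orifices or valve trims are part of a parallel design.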
Design for serviceability: Maintaining high server uptime is crucial for ensuring uninterrupted computational capabilities. One key factor that significantly contributes to this goal is on-field serviceability. Modern server designs focus on using pluggable modules, especially for critical components and devices with relatively high failure rates. The liquid loop inside the server should allow for easy assembly and disassembly of these pluggable modules for quick on-field serviceability. The following factors are important when considering serviceability:
- Use of high quality QDs to connect and disconnect the module from the liquid loop without coolant leaks.
- The fittings on the cooling loops should be carefully engineered to account for the entire tolerance stack up to avoid coolant leaks.
- Ensure sufficient hose length so that the QD can be connected and disconnected without excessive kinking of the hoses, which can lead to cracking.
- A clear leak detection and containment strategy must be implemented to prevent catastrophic damage to the servers and reduce downtime.
- Depending on the hose diameter, choose an appropriate hose material that can provide required flexibility.
- Ensure that cables don’t restrict the access to pluggable modules.
- Use leak sensors at strategic locations to detect coolant leaks as soon as they happen.
- Provide clear labelling, including flow direction, for serviceable components.
Conclusion
The implementation of liquid cooling in data center architectures represents a significant advancement in thermal management of high-compute servers. A well thought out DLC architecture makes it possible to harness the full potential of the advanced computing resources. The architectural considerations and design philosophy highlighted in this article present a DLC system design approach that can meet the cooling requirements of the present generation while being capable of supporting future scalability without major design updates.
References
[1] “Liquid Cooling Enters the Mainstream in Data Centers.” 2024. Jll.com. www.jll.com/en-us/insights/liquid-cooling-enters-the-mainstream-in-data-centers.
[2] “An Introduction to Liquid Cooling in the Data Center.” 2023. Datacenterdynamics.com. March 28, 2023. www.datacenterdynamics.com/en/analysis/an-introduction-to-liquid-cooling-in-the-data-center/.
[3] Meadows, Dave. 2024. “Cooling the AI Blaze: Solutions for Surging Rack Densities in Data Centers.” Stulz-Usa.com. STULZ Air Technology Systems. June 12, 2024. www.blog.stulz-usa.com/cooling-the-ai-blaze-solutions-for-surging-rack-densities-in-datacenters.
[4] Chen, Cheng, Dennis Trieu, Tejas Shah, Allen Guo, Jaylen Cheng, Christopher Chapman, Sukhvinder Kang, et al. n.d. “OCP OAI SYSTEM LIQUID COOLING GUIDELINES.” www.opencompute.org/documents/oai-system-liquid-cooling-guidelines-in-ocptemplate-mar-3-2023-update-pdf.