Water cooling takes the heat off dense HPC optimisation

James Hayes

Monday 26 March 2018

Pumping pints of water in and out of live HPC servers to keep them cool might at first sound like a blatant disregard of data centre health and safety rules. Yet this method of heat removal could well be what allows dense server configurations to run optimally in the high-density data centres of the 2020s.

We know that high-performance/hyper-converged infrastructures bring proven benefits to environments that need top-scale compute power. However, such infrastructures also present operational challenges for data centres, especially regarding heat generation and removal. As more high-performance hardware is fitted closely together in dense computing configurations – and as workloads on processors and other electronic components increase – heat generation escalates.

Traditionally, data centres have addressed this by ramping up fan-based cooling and ventilation provision. But Lenovo engineers saw this as an unsustainable way of addressing the challenge of hyper-convergence. After all, more fans need more room, and they bump up a facility’s power consumption. This is particularly detrimental at a time when data centres are under pressure from regulators and customers to go greener.

Although fan-based cooling is still effective, it’s a solution devised for data centre requirements of the past, rather than the hyperscale infrastructures now being built. The use of water as a coolant is not a new idea, but Lenovo saw scope for innovation around the concept. It set to work to discover how the principle could be re-engineered for today’s customers looking to deploy platforms such as supercomputers based on HPC clusters.

Closer to heat, better heat transfer

Lenovo’s direct on-chip water-cooled solution, Water Cool Technology (WCT), gets close to the places where the heat emanates – right inside the server compute tray, right against the components that generate the heat – and carries it away before it gets out of the hardware and raises ambient temperatures through the rack.

WCT was developed for Lenovo NeXtScale System M5 HPC server environments, and is also available for Lenovo ThinkSystem SD650 servers. WCT circulates water at up to 45°C (113°F) in through the rear of the server and on through heat sinks attached to the CPU thermal case, dual in-line memory modules, I/O and other high-heat-producing components. (Fans are not dispensed with entirely; they still cool the servers’ power supplies.)

As it flows past the components and their heat is transferred, the water temperature rises by about 10°C (18°F), dependent on the workload at the time. The water used in Lenovo WCT also contains anti-corrosion and anti-bio agents, and operates in a closed-loop circulation system, with a heat exchanger (and pump) located outside the data centre.
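To put that temperature rise in context, the standard heat-transfer relation Q = m × c × ΔT gives a rough idea of how much water has to flow to carry a given heat load away. The Python sketch below is illustrative only: it assumes the coolant behaves like plain water (the additives mentioned above would shift the figures slightly), and the 400 W node heat load is a hypothetical value, not a Lenovo specification.

```python
# Rough estimate of coolant flow needed to remove a heat load,
# using Q = m_dot * c * delta_T (steady state, no losses).
# Assumption: coolant properties are those of plain water.

WATER_SPECIFIC_HEAT_J_PER_KG_K = 4186.0  # approximate value for liquid water


def required_flow_kg_per_s(heat_load_w: float, delta_t_k: float) -> float:
    """Mass flow (kg/s) needed to absorb heat_load_w with a rise of delta_t_k."""
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    return heat_load_w / (WATER_SPECIFIC_HEAT_J_PER_KG_K * delta_t_k)


def litres_per_minute(flow_kg_per_s: float, density_kg_per_l: float = 0.99) -> float:
    """Convert mass flow to volume flow; the default density is an approximation
    for warm water."""
    return flow_kg_per_s / density_kg_per_l * 60.0


if __name__ == "__main__":
    node_heat_w = 400.0  # hypothetical heat load for one compute node
    rise_k = 10.0        # temperature rise quoted in the article
    flow = required_flow_kg_per_s(node_heat_w, rise_k)
    print(f"{flow:.4f} kg/s, about {litres_per_minute(flow):.2f} L/min per node")
```

Under these assumptions the required flow per node comes out at well under a litre per minute, and halving the permitted temperature rise would double the flow needed.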

Supercomputing connection

WCT is another example of how Lenovo’s partnerships with supercomputing customers inspire innovative offshoots that later benefit enterprise customers. Working with the Leibniz Supercomputing Centre (LRZ) of the Bavarian Academy of Sciences and Humanities, Lenovo installed 3,072 Lenovo NeXtScale nx360 M5 compute nodes with WCT. Not only has water cooling proved a highly effective solution for one of Europe’s largest academic data centres, but its adoption by LRZ represents a significant show of confidence in the product.

Figures suggest that WCT transfers 85–90 per cent of the heat from the systems in which it is installed; ambient air conditioning extracts the rest. Water-based cooling systems naturally incur their own CAPEX, but customer reports to date suggest OPEX savings soon repay the investment.
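The split between water and air is simple arithmetic, and a short Python sketch shows what it leaves for the room air conditioning to handle. The 30 kW rack load used here is a hypothetical figure chosen for illustration; only the 85–90 per cent range comes from the figures quoted above.

```python
# Split a heat load between the water loop and ambient air conditioning.
# Assumption: the rack heat load below is hypothetical, for illustration only.


def split_heat_load(total_heat_w: float, water_fraction: float) -> tuple[float, float]:
    """Return (heat removed by water, heat left for air cooling) in watts."""
    if not 0.0 <= water_fraction <= 1.0:
        raise ValueError("water_fraction must be between 0 and 1")
    to_water = total_heat_w * water_fraction
    return to_water, total_heat_w - to_water


if __name__ == "__main__":
    rack_heat_w = 30_000.0  # hypothetical rack heat load
    for fraction in (0.85, 0.90):  # range quoted in the article
        to_water, to_air = split_heat_load(rack_heat_w, fraction)
        print(f"{fraction:.0%} to water: {to_water / 1000:.1f} kW water, "
              f"{to_air / 1000:.1f} kW air")
```

On these assumptions, the air-side cooling has to deal with only a few kilowatts per rack rather than the full load.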

“With the direct water-cooling design, we reduced energy costs by 35 per cent,” reports Dr Herbert Huber, LRZ’s Head of High Performance Systems.

Development work with Lenovo on water-based cooling and heat transfer techniques is ongoing at LRZ. Its next generation of supercomputer – SuperMUC-NG – will have an even higher heat-to-water transfer ratio. This system will also see further equipment cooling breakthroughs with the introduction of Heat Reuse technology. This reuses the expelled heated water from WCT to generate chilled water for the additional cooling equipment that operates in the facility, such as Rear Door Heat Exchangers and air conditioning units.

Bigger energy cost-cuts

The institute’s experience is echoed by another Lenovo partner from the academic supercomputing/HPC sector. The University of Birmingham anticipates even more significant cooling-energy reductions as a result of water-cooling adoption for the latest upgrade to its BlueBEAR HPC cluster platform. BlueBEAR3, which comprises some 100 Lenovo NeXtScale nx360 M5 compute nodes, is expected to cut cooling-energy costs by up to 83 per cent.

“Operational cooling costs are significantly lower,” says Simon Thompson, Research Computing Infrastructure Architect at the University of Birmingham.

The availability of Lenovo’s WCT for successive generations of Lenovo data centre hardware will make it a compelling consideration for data centres as they undertake incremental upgrades of their server estates to keep up with ever-escalating demands. Studies from Hyperion Research say that the combination of server density and liquid cooling has become increasingly popular as HPC data centres are compelled to fit more computing capacity into limited space.
