Thursday, 27 November 2014

Direct cooling options for HPC

IBM / Lenovo direct cool NeXtScale system (image: IBM)
Whilst out at SC14 this year, a number of vendors were presenting direct cool options for their systems, i.e. taking water directly into the chassis and across the CPU. This is something IBM have done with iDataPlex in the past, but it appears to be gaining traction with other vendors as well. IBM/Lenovo currently have a direct cool option for the NeXtScale product, and Supermicro were showing off their own direct cool product. The approaches differ, from copper pipes with drip-free connectors to rubber hoses hanging out of the front of a chassis; I know which of the two I'd be happier with, though... (to be fair, the latter might not have been a final production version).
Supermicro direct cool pipes (a little filler panel would have been nice!)

So what difference does direct cool make?

Well, the systems have no fans in them (other than in the chassis PSUs), which means they are almost silent. On the NeXtScale product the water is taken in, passed over the CPUs, down the memory and then on to the voltage regulators etc. Water is taken into the system at up to 45°C (warmer is better for heat recovery, thanks to the thermal properties of water!). This means there are potentially significant savings in terms of fan power and data centre cooling. It's also a lot quieter: I was at a testing lab for NeXtScale talking to someone running a 30kW rack, and the noise from the Ethernet switch in the adjacent rack was louder. This matters compared to traditional rear-door chiller systems, where the fans are all still present and the heat recovery is likely to be far less efficient.
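To put a rough number on the water side of this, here is a minimal back-of-envelope sketch in Python. The 30kW rack figure is the one from the lab visit above; the 10°C temperature rise across the loop is my own assumption for illustration, not a vendor figure.

```python
# Back-of-envelope: how much water flow does it take to carry a rack's heat away?
# Only the 30 kW rack figure comes from the post; the 10 C rise across the
# loop (45 C in, 55 C out) is an assumed value for illustration.

WATER_SPECIFIC_HEAT = 4186.0  # J/(kg*K), liquid water
WATER_DENSITY = 1.0           # kg per litre

def flow_litres_per_min(heat_load_kw, inlet_c, outlet_c):
    """Water flow needed to remove heat_load_kw with the given temperature rise."""
    delta_t = outlet_c - inlet_c
    kg_per_second = (heat_load_kw * 1000.0) / (WATER_SPECIFIC_HEAT * delta_t)
    return kg_per_second / WATER_DENSITY * 60.0

# 30 kW rack, water in at 45 C, assumed to leave at around 55 C
print(f"{flow_litres_per_min(30, 45, 55):.0f} litres/minute")  # ~43 L/min
```

The attraction of the warm inlet is that the return water then comes out hot enough to be worth recovering, rather than the heat simply being dumped into the room air as it is with fan-based cooling.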

We did pull a NeXtScale system out, and it genuinely was completely dry. I also asked about servicing: DIMMs are still user-changeable parts, while CPUs need an engineer visit. On the NeXtScale system you do need to remove all four heat sink slugs, as they are linked with copper pipes, but with an engineer on site that shouldn't be a problem.

And why might we look at it?

We don't currently have chilled water into our data centre, and we're not installing a new HPC solution, so why would we want to look at direct cool and bringing water into an existing data centre?

Well, part of our service offering is that we can accommodate and manage HPC kit for research groups: they buy the kit and we look after it, with our data centre team handling hardware faults and us handling software and integration into the scheduler. We do this by providing the core infrastructure for research kit, for example IB ports, management network, rack space, even cooling and power - these are all part of the standard service offering (over a certain number of nodes we do ask for a contribution). We also happily accommodate storage attached to the GPFS storage arrays.

We've found this approach to work well for us and in fact about 20% of our HPC cluster is directly funded by research groups.

For us to be able to resource this, we have to work within a certain set of hardware specs, mostly based around iDataPlex with Intel Sandy Bridge chips. That is a couple of generations out of date, but we chose not to move to Ivy Bridge, specifically because we schedule based on the walltime of a job and want runtimes to be repeatable across runs on a uniform platform. Having said that, it is an older technology, and with Haswell (just about) orderable, we do need to consider what becomes our next standard platform.

Cooling is probably our biggest issue in the data centre now. Within reason we have a fair amount of space, but with something like a fully loaded NeXtScale chassis (12 "blades") running at up to 7kW, our air-cooling capacity effectively limits us to 6U of compute per rack, and so space could become a problem. For us to continue to be sustainable, we need to think about other options for heat recovery.
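The density problem is easy to quantify. In the sketch below the chassis power and height come from the figures above, but the 8kW per-rack air-cooling budget is purely an assumed number for our room, not a measured one.

```python
# Why ~7 kW per 6U chassis caps us at roughly one chassis per air-cooled rack.
# The per-rack cooling budget is an assumption for illustration; the chassis
# power and height are the figures quoted in the post.

CHASSIS_POWER_KW = 7.0        # fully loaded NeXtScale chassis (12 "blades")
CHASSIS_HEIGHT_U = 6
RACK_HEIGHT_U = 42
RACK_COOLING_BUDGET_KW = 8.0  # assumed air-cooling limit per rack

chassis_by_space = RACK_HEIGHT_U // CHASSIS_HEIGHT_U
chassis_by_cooling = int(RACK_COOLING_BUDGET_KW // CHASSIS_POWER_KW)

print(f"Space allows {chassis_by_space} chassis per rack "
      f"(~{chassis_by_space * CHASSIS_POWER_KW:.0f} kW of heat)")
print(f"Cooling allows only {chassis_by_cooling} chassis "
      f"({chassis_by_cooling * CHASSIS_HEIGHT_U}U of compute per rack)")
```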

So, direct cooling for HPC is something I want to look at in early 2015, particularly the total cost of ownership. I expect that direct cool systems will cost more initially, on top of the cost of hauling in water loops etc., but it's something we need to understand compared to air cooling in the data centre.
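As a starting point for that comparison, this is roughly the shape of the TCO sum I have in mind. Every number below is a placeholder assumption (capex, plumbing cost, PUE figures, IT load, electricity price); the real exercise in 2015 would be replacing them with quotes and measured data.

```python
# Skeleton TCO comparison: air cooling vs direct (warm-water) cooling.
# ALL numbers are placeholder assumptions used to show the structure of the
# calculation - they are not quotes, specs or measurements.

def total_cost(capex, plumbing, it_load_kw, pue, price_per_kwh, years=5):
    """Hardware cost plus facility plumbing plus electricity over the lifetime."""
    hours = years * 365 * 24
    energy_cost = it_load_kw * pue * hours * price_per_kwh
    return capex + plumbing + energy_cost

# Hypothetical 100 kW of IT load at 0.10 per kWh over five years.
# The slightly lower IT load for direct cool reflects removing the node fans,
# and the lower PUE reflects reduced room cooling - both assumed, not measured.
air_cooled = total_cost(capex=500_000, plumbing=0, it_load_kw=100,
                        pue=1.6, price_per_kwh=0.10)
direct_cool = total_cost(capex=550_000, plumbing=50_000, it_load_kw=95,
                         pue=1.15, price_per_kwh=0.10)

print(f"Air cooled:  {air_cooled:,.0f}")
print(f"Direct cool: {direct_cool:,.0f}")
```

Whether the lower PUE and fan power actually offset the higher initial cost and the plumbing work is exactly the question the TCO exercise needs to answer.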
