Thursday, 27 November 2014

Direct cooling options for HPC

IBM/Lenovo direct cool NeXtScale system (image: IBM)
Whilst out at SC14 this year, I saw a number of vendors presenting direct cool options for systems, i.e. taking water directly into a chassis and across the CPU. This is something IBM have done with iDataPlex in the past, but it appears to be gaining traction with other vendors as well. IBM/Lenovo currently have a direct cool option on the NeXtScale product, and Supermicro were showing off their direct cool product too. The approaches differ, varying from copper pipes with drip-free connectors to rubber hoses hanging out of the front of a chassis. I know which of the two I'd be happier with though... (to be fair, the latter might not have been a final production version).
Supermicro direct cool pipes (a little filler panel would have been nice!)

So what difference does direct cool make?

Well, the systems have no fans in them (other than in the chassis PSUs), which means they are almost silent. On the NeXtScale product the water is taken in, over the CPU, down the memory and then on to the voltage regulators etc. Water is taken into the system at up to 45°C (warmer water is better for heat recovery, thanks to the thermal properties of water!). This means there are potentially significant savings in terms of fan power and data centre cooling. It's also a lot quieter; in fact I was at a testing lab for NeXtScale talking to someone running a 30kW rack, and the noise from the Ethernet switch in the adjacent rack was louder. This is quite important compared to traditional rear-door chiller systems, where the fans are all still present and the heat recovery is likely to be far less efficient.
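As a very rough back-of-the-envelope illustration (my own assumed figures, not anything from IBM or a site measurement), this is the sort of water flow it would take to carry 30kW away if the loop warms the water by 10°C on its way through:

# Illustrative only: water flow needed to remove a given heat load.
# The 30kW rack and the 10K temperature rise are assumed figures.
P_WATTS = 30_000.0      # rack heat load in watts
DELTA_T_K = 10.0        # assumed temperature rise across the rack
C_WATER = 4186.0        # specific heat of water, J/(kg*K)

flow_kg_per_s = P_WATTS / (C_WATER * DELTA_T_K)
flow_l_per_min = flow_kg_per_s * 60.0   # roughly 1 litre per kg of water
print(f"{flow_kg_per_s:.2f} kg/s, roughly {flow_l_per_min:.0f} L/min")
# => about 0.72 kg/s, or roughly 43 L/min: a very modest flow compared
#    with the volume of air fans have to shift to move the same heat.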

We did also pull a NeXtScale system out and it genuinely was completely dry. I asked about system service too: DIMMs are still user-changeable parts, while CPUs need an engineer visit. With the NeXtScale system you do need to remove all four heat sink slugs as they are linked with copper pipes, but with an engineer on site that shouldn't be a problem.

And why might we look at it?

We don't currently have chilled water into our data centre, and we're not installing a new HPC solution, so why would we want to look at direct cool and bringing water into an existing data centre?

Well, part of our service offering is that we can accommodate and manage HPC kit for research groups: they buy the kit and we look after and manage it, with our data centre team handling hardware faults etc. and us looking after the software and integration into the scheduler. We do this by providing the core infrastructure for research kit, for example IB ports, management network, rack space, even cooling and power - these are all part of the standard service offering (over a certain number of nodes we do ask for a contribution). We also happily accommodate storage attached to the GPFS storage arrays.

We've found this approach to work well for us and in fact about 20% of our HPC cluster is directly funded by research groups.

For us to be able to resource this, we have to work within a certain set of hardware specs, which is mostly based around iDataPlex using Intel Sandy Bridge chips. That's a couple of generations out of date now, but we chose not to move to Ivy Bridge, specifically because we schedule based on the walltime of a job and want it to be repeatable across runs. Having said that, it is an older technology and, with Haswell (just about) orderable, we do need to consider what becomes our next standard platform.

Cooling is probably our biggest issue in the data centre now. Within reason we have a fair amount of space, but with something like a fully loaded NeXtScale chassis running at up to 7kW (12 "blades"), cooling effectively limits us to 6U of compute per rack, and so space could become a problem. This means that for us to continue to be sustainable, we need to think about other options for heat recovery.
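To put some rough numbers on that (the 7kW air-cooling budget per rack here is my own illustrative assumption, not a measured figure for our room):

# Rough rack density sums; the per-rack air-cooling budget is an
# assumption for illustration only.
CHASSIS_POWER_KW = 7.0        # fully loaded 12-node NeXtScale chassis
CHASSIS_HEIGHT_U = 6
RACK_HEIGHT_U = 42
RACK_COOLING_KW = 7.0         # assumed air-cooling budget per rack

chassis_per_rack = int(RACK_COOLING_KW // CHASSIS_POWER_KW)
used_u = chassis_per_rack * CHASSIS_HEIGHT_U
print(f"{chassis_per_rack} chassis per rack, {used_u}U used of {RACK_HEIGHT_U}U")
# => 1 chassis per rack: 36U of every rack sits empty, so floor space
#    runs out long before the racks themselves are full.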

So, direct cooling for HPC is something I want to look at in early 2015, particularly the total cost of ownership. I expect that direct cool systems will cost more initially, on top of the cost of hauling in water loops etc., but it's something we need to understand compared to air cooling in the data centre.

Saturday, 22 November 2014

OpenStack and HPC

The OpenStack and HPC BOF at SC14
I've spent the last week out in New Orleans at SC14. There seemed to be quite a lot of buzz about HPC and OpenStack, as well as a number of vendors who didn't really get why OpenStack might be applicable to HPC.

From my perspective, there are two clear use cases for OpenStack with HPC. The first is shared capacity bursting, where a pool of resources is set up as an OpenStack environment and the job scheduler dynamically spins up VMs on it.

The second is the need to be able to have clean HPC systems, i.e. ensuring that data residue isn't left on systems (for example for some of our commercial research partners, or for medical data), and so an OpenStack VM which is spun up on demand and destroyed afterwards is of great interest.

At Birmingham we use Adaptive's Moab for our HPC scheduler, and Adaptive were demonstrating their OpenStack scheduler integration. To be clear, this isn't a replacement for the Nova VM scheduler but an add-on for the Moab HPC scheduler. Basically it uses Moab triggers to call the plugin; for example, we could set a trigger such that if there are 100 jobs queuing then the trigger will fire. As we're building an OpenStack installation for the CLIMB project, there may be times when that resource is quiet, and so we could use some of it to fulfil our HPC workload.
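Purely to illustrate the shape of the idea, and emphatically not Adaptive's actual plugin, a trigger could end up calling something along the lines of the Python sketch below. It assumes the openstacksdk library and a clouds.yaml entry called "climb", and the image, flavour and network names are all made up:

#!/usr/bin/env python
# Illustrative sketch only, not Adaptive's Moab/OpenStack plugin.
# Assumes openstacksdk is installed and clouds.yaml has a "climb" entry;
# the image, flavour and network names below are hypothetical.
import subprocess
import openstack

QUEUE_THRESHOLD = 100   # e.g. a Moab trigger could fire at this depth

def queued_jobs():
    # Count idle jobs via Moab's showq; the parsing here is deliberately
    # simplistic and would need hardening for real use.
    out = subprocess.run(["showq", "-i"], capture_output=True, text=True)
    return max(len(out.stdout.strip().splitlines()) - 1, 0)

def burst_one_node(conn):
    # Boot a VM that registers itself with the scheduler on start-up
    # (e.g. via cloud-init) and gets torn down when the queue drains.
    return conn.create_server(
        name="hpc-burst-node",
        image="hpc-compute-image",   # hypothetical image name
        flavor="m1.xlarge",          # hypothetical flavour
        network="hpc-burst-net",     # hypothetical network
        wait=True,
    )

if __name__ == "__main__":
    if queued_jobs() >= QUEUE_THRESHOLD:
        conn = openstack.connect(cloud="climb")
        server = burst_one_node(conn)
        print(f"Started burst node {server.name} ({server.id})")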

Obviously we'd want to be careful about the types of job which ran on the OpenStack environment (big MPI jobs would be bad ;-) ), but it looks to be of great interest! That's not to say it won't be difficult to build the VM image, and I have a lot of questions about that (like how to get the GPFS file-system inside a VM image), but it's something we'd like to look at.

I also attended the BOF on HPC and OpenStack and was surprised by how many people were there and how much interest there was. I expected a dozen or so, but the room was full with more standing at the back, so clearly OpenStack and HPC is of a lot of interest and vendors really need to sit up and listen!

I'm working on the CLIMB project at Birmingham, which also spans Cardiff, Swansea and Warwick, and I met up with some of the guys from the sysadmin team on the eMedLab project, which is similar. Neither of these is taking the approach of scheduling VMs from a traditional scheduler; instead they are there to help get predefined pipelines up and running and to aid with teaching and research. We're planning to catch up in a few weeks, either at the Machine Evaluation Workshop or at a specific project meeting.

What amazes me about CLIMB and eMedLab is how the proposals were independently developed by the PIs, yet the idea and technology are almost identical. As we are so closely aligned, it makes a lot of sense for us to try and work together on this; we also happen to have both selected the same integrator (OCF) for the OpenStack components, backed by SystemX hardware.

If you're doing or planning a similar project in the UK (not necessarily BioInformatics either!), then please get in touch and we'll see if we can get something working collaboratively.

GPFS User Forum @SC14

In case you missed it, I was talking about the CLIMB project at the IBM GPFS User Forum held across the road from the SC14 convention centre in New Orleans this week.

I talked a bit about what the project is and how we're using GPFS Elastic Storage with OpenStack.

Slides are available from the GPFS User Group web-site.

If you are using Elastic Storage with IBM GPFS technology, then I encourage you to talk at one of the user groups about what you are doing!

Some of the other talks covered multi-site GPFS installations, and NCAR talked about their monitoring tools, which have now been open sourced. Certainly worth a look!

Update: I've embedded the slideshow below:

Lack of updates!

I haven't posted for a while; I've been a bit busy at work with one thing and another. Over the last week I've been out of the country at SC14 in New Orleans, and before that I also visited Lenovo's new SystemX site in Raleigh.

I've a couple of half-written posts from the interim period; I'll try and get them finished off and posted as well.