Wednesday, 27 January 2016

Installing new technology to deliver HPC

January has seen a busy start to the year, and not because of any of the major investments we had approved at the end of 2015!
No, this has come from our deployment of new technologies for HPC systems: in the past week or two, we've taken delivery of a direct water cooled HPC rack (well, half a rack).
We're not afraid to try out new technology to help deliver our services, whether that is moving to the latest IBM Spectrum Scale release or building cloud services on OpenStack.
This has, however, probably been our longest new technology development, and it is entirely a hardware solution. The plan came following a visit to Lenovo's (then just opened) site in Raleigh, NC, where they demoed their direct water cooled HPC technology. Wind forward a few months and we were considering our options for new systems to replace our Sandy Bridge based iDataPlex. Our strategic framework with OCF and Lenovo meant the realistic option was NeXtScale, but a few calculations, an air cooled data centre with no ability to provide rear door heat exchangers, and an effective 7.5kW limit per rack meant we were seriously looking at the WCT (water cool technology) hardware.

The spec of the standard WCT system wasn't quite what we wanted, so we approached Lenovo about getting storage into the systems - we use full fat OS deployments and some of our workloads perform significantly better with local data storage. We eventually agreed on getting SSDs into the systems, and having gone through this process, we also wanted to add Mellanox 100Gb/s EDR InfiniBand. (Actually, we want some of the cool features on the ConnectX-4 ASIC.) This has proven a bit more difficult to actually get, but we have EDR switches and cables and will be adding water cooled EDR cards once they've finally been manufactured for us. Of course, in a system with no fans to cool other components, a lot of testing has gone into making sure the cards can be properly cooled.
Realistically, it's taken us 9 months from agreeing we were going to go WCT with this kit to getting our first tests on it. It's taken longer than we anticipated (we ordered it last summer!) and we've learnt a lot from the process. We always knew we'd have 4-6 months of facility-side work for planning permission and installation of the dry air coolers, but we also had delays getting the SSDs, the EDR kit and silly things like the right valves and hoses!
The infrastructure we've installed is designed to be scalable and to integrate with other warm water cooled systems, which have been increasingly common in the market over the past year - looking round the SC15 conference centre, there were a number of options for direct cooled systems, significantly more so than at SC14. So it's good to see that the infrastructure we designed back in July will integrate with hardware platforms from other vendors.
A little tinkering with xCAT and some updates to our deployment system and I've finally got most of the compute nodes running today with an Intel LINPACK test on the hardware. Monitoring the load for about 30 minutes, I managed to get our supply temperature up to ~25C with a return of 29C, so a 4C delta on 15 compute nodes running flat out. This is a lot less than we originally expected. Checking the kit, we were also seeing sustained turbo on all cores to 3.0GHz on the 2x 12-core 2.6GHz SKUs we have fitted per node.
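As a rough sanity check on what that 4C delta means in heat terms, the sum is just mass flow times the specific heat of water times the temperature rise. The flow rate in the sketch below is an assumed figure purely for illustration (not a measurement from our loop), but it gives a feel for the numbers:

```python
# Back-of-the-envelope heat load from a water loop temperature delta.
# The flow rate is an assumed value for illustration only, not a measurement
# from our installation.

def heat_load_kw(flow_lpm: float, delta_t_c: float) -> float:
    """Heat removed (kW) = mass flow (kg/s) x specific heat (kJ/kg.K) x delta T (K)."""
    specific_heat = 4.18              # kJ/(kg.K) for water, roughly constant over 20-45C
    mass_flow_kg_s = flow_lpm / 60.0  # litres/min -> kg/s (1 litre of water ~ 1 kg)
    return mass_flow_kg_s * specific_heat * delta_t_c

# e.g. an assumed 25 litres/min through the rack manifold with the measured 4C delta:
print(f"{heat_load_kw(25.0, 4.0):.1f} kW")   # roughly 7 kW carried away by the loop
```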
Really I want to run the water loops much warmer than that 25C supply; depending on the CPU SKU, this could be up to 45C, but I think we'll aim for 40C. It's an interesting quirk of the thermal properties of water that the warmer it is, the better it is at carrying the heat load away with it...
Installation and final commissioning of the rack took place last week when we worked with the Lenovo engineers, our Estates team and their contractor team to balance and air-bleed the water loops. No leaks so far!
Looking to the future, we're talking to a few technology companies on getting early access to some new hardware to support our projects over the next 18 months. We're also always open to looking at what options we have available from a whole range of vendors.

Saturday, 2 January 2016

Looking back at 2015 - research computing

2015 has been a busy year in research computing at Birmingham. The team has gained two new members, and one member has come back into the team full-time. These changes have been driven by the need to support new initiatives, for example supporting CCB and the needs of research data management.

Getting major project developments out the door took up much of my time in 2015: rolling out our research data store and getting over the teething troubles with it. We were delayed a little by some unfortunate building work which cut fibres between our data centres, and also by an incident with the fire suppression system, but we worked through it all and got the data store out of the door along with our sync'n'share service based on PowerFolder.

With the research data store going live, we also needed to focus on how people move large volumes of data to it, and we have been deploying a specialist fabric-based network using Brocade VDX technology. This is pretty neat as it provides full layer 1 balancing across multiple links, which we can build up as we need additional bandwidth. Getting the final connections to our campus core took a little longer than anticipated, but we got there and now have the network in two buildings for data-generating equipment, and we've already placed an order to deliver more locations in early 2016. As usual, things like getting access to wiring centres, or faulty building fibre, have given us a little trouble, but we're now progressing well with this. Some aspects of the firewall/router product we purchased have caused us a number of headaches and more work than we'd have liked, but we're working through this with the supplier, who seem engaged in getting us a fully working and reliable solution in early 2016.

I'll post a separate update on research clouds, but this has also taken a chunk of time and brought a number of opportunities, for example speaking at events, helping to develop strategies and being invited to join the RCUK working group on cloud to help identify and deliver cloud (public, private, hybrid) in support of research in the UK.

Late 2015 saw discussion return about new data centres to help support the increasing demands of research computing; we'll hopefully hear in early 2016 if we'll get funding to develop this further. We did hear that the University is planning to support life sciences with new storage and compute through a major investment, so we'll have a busy time designing and building this in 2016, as well as recruiting staff to support the investment!

Towards the end of the year, we took a major maintenance window on the HPC service. Although we weren't replacing anything, we will be adding new hardware in early 2016 and we needed to do some major upgrade work on the scheduler and resource manager in preparation for this, as well as replacing our core network switches to enable us to scale out more easily. We also generally take this opportunity to apply firmware updates to all research infrastructure, and of course we did the Spectrum Scale 4.2 upgrade on our GPFS clusters!

We've also been busy working out how we can support the CLIMB project better, and we've now secured funding for a member of staff to join the team to help with this, so watch out in early January as we'll be recruiting a cloud computing person to help deliver the private research cloud.

Looking back at 2015 - SC15, Austin

I was lucky enough to get out to the SC series of conferences again, this time SC15 in Austin, Texas. I was pretty late in getting my act together to book to go again and struggled to find a hotel that was within budget; eventually I managed to get one sorted, so I was all cleared to go.

I knew I was travelling on the same flight as someone I knew from NOC, so it was no surprise that I bumped into them at Manchester airport. We didn't expect to bump into a couple of the guys from OCF, though, who were travelling a different route. Connecting in Georgia was pretty tight, but we both just made the gate as it was boarding and got safely on to Austin!

Although it was pretty late UK time, we agreed to meet up for dinner in downtown Austin to help with the jet lag ... we found a couple of times during the week that UK folk are clearly drawn to the same places - with over 12,000 attendees at the SC conference, we managed to pick a restaurant and a table next to the people I know at Cardiff working on the CLIMB project with me! (We did the same later as we went for a post-dinner pint and bumped into the Oxford e-Research Centre guys as well as the UK Lenovo people ... entirely coincidentally!)

Sunday morning didn't have anything planned in for the conference, so I again met Tom from NOC for morning coffee - as usual earlier than we wanted, thanks to the joys of jet lag! That was followed by a wander through downtown Austin and the Capitol area. I quite like exploring US cities early on Sunday mornings as they are usually pretty quiet, so it's a fairly relaxing time to investigate. Whilst I found Austin to be a fairly chilled and welcoming city, it was interesting to check out its Confederate history and the statues around the Capitol.

Sunday was also the day of the Spectrum Scale user group, and as chair of the UK group, and with this being the first SC event organised with the user group, I met up early with the US principal and co-principal (the first time we'd all met) along with a couple of the IBM people helping to run the event. The user group itself was really great, and it was good to see so many people there - we'd asked IBM for a bigger room than the one they ran their group in at New Orleans, and they delivered; it was pretty full as well! There were some great talks from both IBM and the user community; particularly interesting was the talk from Travelport on the monitoring tools they are using to keep an eye on their GPFS deployments.

Monday morning was the workshop on Parallel Data Storage, which opened with an interesting talk on holographic storage. The guys from Akonia Holographics are developing an interesting-looking holographic WORM drive in a tape-like form factor, the theory being that it will eventually retro-fit into existing tape libraries. An interesting idea, but it's a few years from being a product, and how it will fit into the storage market by then ... well, that's anybody's guess; with the advent of software-defined solutions and cloud storage, maybe it will be less relevant. I also bumped into Dean Hildebrand from IBM there and we had a discussion about various projects we've been working on (Dean and Bill from IBM have been a great help with getting CLIMB up and running on Spectrum Scale!); he also showed me a few things from the research he's been doing, which was interesting.

Monday evening was the gala opening of the conference trade show hall which was as usual massive!  I think Austin convention centre was a little smaller than some of the other locations used by SC, and I'm not sure the two halls really worked well, with a lot of smaller and maybe quite specialised exhibitors in a small hall off the main hall.

On Tuesday I'd been invited by Lenovo to talk on their show floor stand about some of the projects I've been working on, so I had a 15-minute slot there to talk about CLIMB and the warm water cooling systems we've been trying to get installed (long story there ... for another blog post when we get it going!). I was a little worried no-one would be interested, but Lenovo had a great guy working the stand stage and getting people to sit down, though his act was a little difficult to follow ;-)

Much of my remaining time was taken up with planned and ad-hoc meetings with vendors and technology companies, a lot of which is under NDA. Some of the companies I talked to include partners we've been working with (e.g. Lenovo, Mellanox, Dell, Adaptive Computing) as well as others who have interesting technology (e.g. Bull, HPE, SpectraLogic, Boston/Supermicro).

There were a number of stands this year focussing on cooling technology, with a lot looking at warm water cooled technology. There are now a lot more vendors focussing on this as a way of cooling systems either via their own solutions or after-market accredited options (which is essential for maintaining warranty!).

I was maybe a little cheeky on the Seagate stand, knowing a little bit about GPFS, I went to find out about their new ClusterStor appliance and had some quite complex questions. They did find me someone on the stand who did actually seem to know his stuff so it all worked out in the end there!

A few other things that were "launched" at SC15 are worthy of note. DDN had their SFA14k storage array. Spectra had their Arctic Blue (technically launched a few weeks before), which is a cold-storage shingled-disk array with disk spin-down. Bull announced Sequana, their exascale HPC solution. Adaptive Computing also had Torque 6, which finally has proper NUMA support; they spent some time going through its features with me, and if it works as they explained then it's pretty neat.

As well as vendor meetings, there were a number of ad-hoc meetings arranged out there, including meeting up with a couple of the CLIMB guys to work out a mini plan for how we are going to progress some of the issues on the project.

In addition to the trade show, there's a full programme of talks running alongside, with invited keynote speakers as well as sometimes quite esoteric papers on specific research projects. Alan Alda was one of the keynote speakers; he spoke on the importance of being able to explain science to non-scientific audiences and had a great video to demonstrate some of the work he'd been involved with on this. One workshop I went to that did disappoint was the one billed as Ceph and HPC ... which wasn't really people talking about Ceph and HPC, more about how they were using Ceph alongside their HPC solutions (e.g. as a backup target).

I'd have liked to have gone to the session on building HPC applications, but it clashed with a meeting I had. As with many of these events, though, the slides get posted online, so I've managed to catch up on them - this is something we really need to deploy at Birmingham to help with our software management.

One thing I think the conference missed this year was the early evening keynote talks, a few of which I went to in New Orleans. But maybe they were just on topics I wasn't overly interested in. I do like to get to some talks on topics that aren't obviously my thing, as it helps with awareness of who we should be engaging with in our academic communities to see if we have services that can help empower their research.

The conference technical programme wraps up with a big social event on the Thursday evening, and this year it was held at the home of the Texas Longhorns. I'm always amazed at the size of the football stadiums, particularly bearing in mind this is a university football team!

Friday, 1 January 2016

Looking back at 2015 - IBM Spectrum Scale

Well, 2016 has begun, so I thought I'd take a look back at a year with Spectrum Scale. Actually, at the start of 2015, Spectrum Scale wasn't even a term we were using - GPFS or Elastic Storage were the names we all had in our heads. I've heard various rumours about whether Elastic Storage was ever meant to be used generally, whether it was for the ESS server only, or whether it was the code name for the rebranding we eventually came to know as part of the Spectrum portfolio of products. Still, back at the start of 2015 things were different: the System x and GSS Lenovo divestiture was only just completing in the UK, and protocol support for non-native GPFS access was just an idea on a roadmap (unless you bought SONAS).

Shuffling back a few months into 2014, IBM had just published a white paper on using object storage on GPFS; it wasn't supported back then, but there were guidelines on how to do it. In fact, it wasn't until the May 2015 release of 4.1.1 that non-native GPFS protocol access was with us. This actually delayed one of my projects by a couple of weeks - we were about to go into pilot with a research data store built on Spectrum Scale when IBM announced at the York user group in May that protocol support was imminent. In fact, I did a talk at the user group on our storage solution; it was interesting to hear about async-DR there, but I still think our fully HA active-active solution across two data centres is the right option for us and just works! The cluster export services and SMB protocol support are, though, a massive improvement on our previous deployment based on Sernet Samba. It did take until the 4.2 release to get all the bugs ironed out in the CES scripts, as we aren't pure AD or pure LDAP and so needed to use the custom authentication modes.

Protocol support isn't just SMB support of course, NFS and object are also provided. cNFS has been around for a while, but protocol support uses NFS Ganesha server. One reason IBM have moved to this is that it allows them some control over support for the NFS server stack - for example using the kernel stack may require your Enterprise Linux vendor to provide an updated kernel. They also work closely on the Ganesha project to get their fixes accepted upstream.
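For anyone wondering what standing up protocol access actually involves, the sketch below shows the general shape of it. It's illustrative only: the export name, path and authentication choice are made-up examples, and the exact options vary between releases, so check the mm-command documentation for your version rather than treating this as a recipe.

```python
# Illustrative sketch of enabling CES protocol services on a Spectrum Scale
# cluster. Export name and path are hypothetical; check the command syntax
# against the documentation for your release before running anything.
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Enable the protocol services on the CES nodes.
run(["mmces", "service", "enable", "SMB"])
run(["mmces", "service", "enable", "NFS"])

# We aren't pure AD or pure LDAP, so we use the user-defined authentication
# mode and manage the identity configuration ourselves.
run(["mmuserauth", "service", "create",
     "--data-access-method", "file", "--type", "userdefined"])

# Create an SMB export over an existing fileset path (hypothetical path/name).
# An NFS export would be added in a similar way with mmnfs.
run(["mmsmb", "export", "add", "research-data", "/gpfs/rds/projects"])
```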

2015 also saw the first steps in the move to six-monthly major release cycles, which saw 2015 close with 4.2 hitting my storage clusters. We're also seeing PTF releases roughly every six weeks, though IBM have kept up their excellent support approach of providing interim EFIX builds for customers until a fully tested PTF release is out. This is great as we get really timely fixes for the issues we are seeing (it also means IBM get to test the fixes out before they become generally available).

Within the IBM teams, there has been a move to agile development methodologies and scrum teams. This has meant a move away from traditional roadmap talks at the user group meetings, and more towards "this is what we're working on". There's also been a lot of work under the hood on getting the first and second line support structures working together for Spectrum Scale. We have slides from the SC15 meeting on these if you are interested.

I'm aware of a few cloud deployments in the UK sitting on top of Spectrum Scale storage, and it's been an interesting ride for us! We see some great performance at times - 15 seconds to provision a VM with any size of disk is pretty neat (it uses mmclone under the hood and copy-on-write; there's a rough sketch of the idea below) - but equally we see performance issues with the small block updates from the VM disk images. One feature that you may have missed, which arrived in a PTF release, is HAWC - highly available write cache. This is basically a write-coalescing mechanism that allows GPFS to buffer small writes via a (flash) layer, which potentially should help with the small-write issues for VM disk image storage. My testing stumbled a few times with this - the first "release" didn't quite work as the NSD device creation had overly strict checks (it should be possible to run HAWC on client systems). I also dead-locked my file-system testing it out, as one of my clients could resolve the NSD servers but not all of the other clients in the cluster... I've also had a little instability when shutting down a number of my clients, as all the HAWC devices become unavailable. 2016 will see further testing of this for my projects!
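Going back to the fast VM provisioning mentioned above, the trick is file clones: keep a golden image, turn it into a read-only clone parent once, and then each new VM disk is a writable copy-on-write clone of that parent. A minimal sketch of the idea, with entirely hypothetical paths, looks something like this (verify the mmclone syntax for your release before using anything like it in anger):

```python
# Minimal sketch of copy-on-write VM image provisioning with mmclone.
# All paths are hypothetical examples, not our actual layout.
import subprocess

GOLDEN = "/gpfs/cloud/images/centos7-golden.img"
PARENT = GOLDEN + ".cloneparent"

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def prepare_parent():
    # One-off: turn the golden image into a read-only clone parent.
    run(["mmclone", "snap", GOLDEN, PARENT])

def provision(vm_name: str) -> str:
    # Each VM disk is a writable copy-on-write clone of the parent, so it is
    # created in seconds regardless of the image size.
    target = f"/gpfs/cloud/instances/{vm_name}.img"
    run(["mmclone", "copy", PARENT, target])
    run(["mmclone", "show", target])   # report the clone status
    return target

if __name__ == "__main__":
    prepare_parent()
    provision("vm01")
```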

Cloud still has a number of unanswered questions, partly around security and access. A requirements document has gone to IBM on this, we'll see how this progresses in 2016!

4.1.1 brought in the Spectrum Scale installer. I must say I've not used this in anger as it doesn't really fit our workflows, but I was able to unpick what it was doing fairly easily to get my protocol servers installed properly. The initial docs on manual installation were sparse, but this has been improved following, er, user feedback ;-). As well as the installer and protocol support, there is also a new performance metrics tool which can collect data from the protocol support parts as well as core GPFS functionality.

4.2 arrived in mid November, just after SC15 in Austin, bringing a host of new features. Perhaps the biggest of these was the GUI. This allows monitoring and basic control of GPFS clusters, including the ILM and protocol features. IBM spent a lot of time demoing the GUI pre-release and listening to user feedback, so hopefully we'll see some of that feedback going into improving it. A GUI installer has been added as well, though I'm a little unsure who this is really targeted at. Compression for the file-system is now available as a policy-driven process, with decompression happening on file access (there's a rough example of a compression policy below). Basic QoS features for maintenance procedures are also now available, for example to limit the impact of a restripe. This isn't something I've been able to test yet as it requires the file-system and metadata to be fully upgraded to the 4.2 version, which you can't do if you have older versions mounting the file-system... I nearly have all my clusters up to 4.2 now though!
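To give a flavour of how the compression feature is driven, it hangs off the normal ILM policy engine: a MIGRATE rule with a COMPRESS clause, run through mmapplypolicy. The example below is a hedged sketch rather than anything we're running - the pool name, threshold and file system name are made up, and the policy syntax should be checked against the 4.2 documentation.

```python
# Hedged example of policy-driven compression in Spectrum Scale 4.2.
# Pool name, age threshold and file system device are made-up examples.
import subprocess
import tempfile

POLICY = """
/* Compress files in the 'data' pool not accessed for 30 days. */
RULE 'compress_cold' MIGRATE FROM POOL 'data' COMPRESS('yes')
  WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 30
"""

# Write the policy to a temporary file and apply it to the file system.
with tempfile.NamedTemporaryFile("w", suffix=".pol", delete=False) as f:
    f.write(POLICY)
    policy_file = f.name

subprocess.run(["mmapplypolicy", "gpfs_rds", "-P", policy_file, "-I", "yes"],
               check=True)
```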

4.2 also had updates to the object interface, and now provides unified file and object access, i.e. you can access a file POSIX-style as well as via object methods. Finally, some updates to the Hadoop interface have been added with an updated connector.

We've also seen the strengthening of the UK development team, led by Ross, who has been great at working with the user group, kicking off the meet the devs meetings here and getting members of his team along to our events. Meet the devs has been on a UK tour this year, getting to London, Manchester, Warwick and Edinburgh (and we're heading to Oxford in February 2016). The UK team is working on the installer and all things related to installing, as well as on some of the 2016 major projects (problem determination being one of them). I know the UK team has been talking to UK customers and members of the group and taking in feedback to help drive development. I also met the product exec, who was in the UK for a few days (and again in Austin), to talk about user group issues.

And finally, away from pure IBM work on Spectrum Scale, we've seen the release of Seagate's ClusterStor appliance running Spectrum Scale, DDN have the SFA14k storage, and ESS and GSS have been updated as well. I've also had chats with other vendors looking at appliance approaches, so it will be interesting to see what 2016 brings!

The IBM Cleversafe acquisition will be an interesting one to watch in 2016, and how IBM go about integrating it with Spectrum Scale ...

The user group has been busy this year and I can't believe how busy it's been since I took over from Jez as chair in June! We've seen the launch of a US chapter, with a meet the devs meeting in New York and a half-day session at SC15 in Austin. The US group is being taken forward by Kristy and Bob with help from Doug and Pavali at IBM - thanks guys! The Austin meeting was a great success and very well attended. We also closed out the year in the UK with a successful user group at Computing Insight UK in Coventry.

Plans for 2016 are in full flow, with the UK user group meeting booked for May and the first meet the devs session planned for February.

And really finally, thanks to the user community - we wouldn't be able to do meetings and events without your attendance and talks - and also to IBM: Akhtar and Doris, the marketing people, and all the developers, researchers and managers who get the developers along to the meetings.