Tuesday, 21 July 2015

SMB protocol support with Spectrum Scale (aka GPFS 4.1.1)

In a break from posting on OpenStack and GPFS, I've been working on one of my other GPFS related projects.

First, let's get the naming out of the way. I've been using GPFS for a few years, and to me it will forever be GPFS, but with version 4.1.1, which was released in June 2015, the product was renamed to IBM Spectrum Scale. Now we've got that out of the way, I can get on with posting about protocol support!

4.1.1 was the first release to come with official protocol support for SMB and Object (it also includes the new Ganesha NFS server). One of my projects has been to build a research data store, and naturally we looked at GPFS for this - it's scale-out storage after all, and has nice features like the policy engine, tiering and tape support, meaning we can automatically move files which have aged to cheaper storage tiers and down to tape, yet have them come back online automatically if a user requests them.

Before I go on to talk about IBM Spectrum Scale protocol support, a bit of history first!

Plan 1. Use SerNet samba precompiled packages.

Initially the client presentation layer was built on SerNet Samba, as this was the only pre-compiled SMB package set to include the GPFS tools in the build (AFAIK the Red Hat Enterprise Linux packages aren't built with the GPFS VFS module).

The plan was to use the pre-compiled binaries with CTDB to do IP address fail-over, and this all worked when we were building on CentOS 6.3.

However, as time moved on we looked at upgrading to CentOS 7.0, and this is where our CTDB woes started. The 7.0 releases come with CTDB 2.x, and the SerNet binaries had dependencies on 1.x. OK, SerNet also provide some CTDB packages based on 1.x (if you dig around on their site enough), however these didn't support systemd and seemed rather unstable for us.

At this point we looked at two options: recompile the src rpms, or roll back to CentOS 6.x. It quickly became clear the second wasn't really an option, as moving to the latest 6.x release also brought in CTDB 2.x based packages - essentially the same problem. Which brought us to...

Plan 2. Re-compile SerNet samba packages for CTDB 2.x

This was actually quite easy. I used to do a lot of rpm building in a previous role, so I know my way round a spec file, and it was pretty simple to tweak the spec to build against the CTDB 2.x packages and strip out the bits that conflicted with the OS CTDB packages.
An hour or so later, I had some working packages that we deployed and tested, which all seemed to work fine. A little cautiously we decided to proceed on this basis; though uneasy about the prospect of regularly having to fix the spec file and rebuild, we felt we didn't have much choice.

Along comes Spectrum Scale 4.1.1!

In May I was talking at the GPFS User Group in York, and Scott Fadden from IBM was talking about the GPFS roadmap (see slides), where he mentioned the release date of the long-promised protocol support. This got me thinking, and I decided to push the pilot phase of our data store back a few weeks to try out the upcoming 4.1.1 release, including protocol support.

We were moving from 4.1.0, and the 4.1.1 upgrade wasn't the smoothest GPFS upgrade I've ever done - I managed to deadlock the file-system. I'm putting that down to me doing something silly with quorum or quorum nodes at the time, but I'd strongly suggest you test this process carefully before doing it on a live GPFS system as a non-destructive upgrade. As an aside, I've had 4.1.1 deadlock since, while we were doing some DR/HA testing of our solution, but in circumstances I wouldn't expect to see in normal operation, and only after a number of quite convoluted DR tests. (We've tested many failure modes, like split-braining the cluster, cutting the fibres between data centres and pulling parts of the storage systems.)

But overall it was fine. As we weren't quite piloting the system yet, I was OK to shut down all the nodes, and it restarted fine.

Getting SMB protocol support working

The next step was to actually install the IBM SMB protocol support. The expectation currently is that this is done using the new installer. We use a separate config management tool, and being able to reinstall nodes in the system and get them working is essential to us, so I unpicked the Chef recipes (as that is how the installer is implemented) to work out that really, we just need to add the gpfs.smb package which is provided in the protocols release of Spectrum Scale.

I posted a few messages to the GPFS User Group list about getting things working and got some guidance back from some IBMers (thanks!).

SMB support is provided as part of Cluster Export Services (CES); the cluster needs to be running on EL7, has to be running CCR, and the file-system needs the LATEST features enabled.

CCR worried me at first, as in the previous release you couldn't use mmsdrrestore to add a reinstalled node back into a CCR-based cluster; however, Bob on the GPFS UG mailing list pointed out that this was fixed in 4.1.1 - thanks Bob!

The rest of the requirements were just a few GPFS commands, these were run on my NSD server cluster:
mmchconfig release=LATEST
mmcrfileset gpfs ces -t "ces shared root for GPFS protocols"
mmlinkfileset gpfs ces -J /gpfs/.ces-root
mmchfs gpfs -k nfs4

CES needs a space to store its config. The documentation suggests using a separate file-system, but it works fine with a file-set. We might revisit this at some point in the future and create a small file-system with local replicas on our protocol cluster.

Then a few more config commands on the protocol cluster to setup CES:
mmchconfig cesSharedRoot=/gpfs/.ces-root
mmchcluster --ccr-enable
mmchnode -N <NODECLASS> --ces-enable

CES will handle IP address allocation for the protocol cluster. We have 4 protocol servers and 4 floating IP addresses, with a DNS round-robin name pointing to the 4 servers to provide some level of client load balancing. Adding IP addresses is a simple process:
mmces address add --ces-ip <IP_ADDRESS>
mmces address add --ces-ip <IP_ADDRESS>

and mmces address list will show how the addresses are currently distributed.

Once CES is enabled, it is then necessary to enable the SMB service on the protocol nodes. Again just a single GPFS command is needed:
mmces service enable SMB

Once enabled, authentication needs configuring for the SMB services, and Spectrum Scale provides a number of options for this (pure AD with SFU or RFC2307 id mapping, LDAP + Kerberos), but none of these fits our requirement. Like many research institutions, we use AD for authentication but local LDAP settings for identity, and this combination isn't available as one of the pre-defined authentication schemes; however, user defined authentication is possible.

One thing to note here: if you are using the "pure" approach, starting and stopping GPFS will change the contents of nsswitch.conf and also krb5.conf. There's also currently an issue where nsswitch.conf gets edited on shutdown even in user defined authentication (IBM have a ticket open on this, and as it's a ksh script that does it, I've fixed it locally for now).

To use user defined authentication, krb5.conf and nsswitch.conf need to be configured appropriately for your AD and chosen identity source (I use nslcd, but sssd would also work).
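As an illustration, the relevant fragments might look something like the following - these are a minimal sketch assuming nslcd for identity, and EXAMPLE.ORG / dc1.example.org are placeholder names for your AD realm and domain controller:

```
# /etc/nsswitch.conf - local files first, then LDAP (via nslcd) for identity
passwd:     files ldap
group:      files ldap

# /etc/krb5.conf - point Kerberos at the AD realm for authentication
[libdefaults]
    default_realm = EXAMPLE.ORG
    dns_lookup_kdc = true

[realms]
    EXAMPLE.ORG = {
        kdc = dc1.example.org
        admin_server = dc1.example.org
    }
```

With user defined authentication these files are yours to manage, so they belong in your config management tool rather than being hand-edited on each node.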

We now need to configure CES to use user defined for file:
mmuserauth service create --type userdefined --data-access-method file

At this point there is an SMB cluster, but it's not joined to the domain and needs a little tweaking to get it working. The mmsmb command is provided to manipulate the Samba registry, but it restricts which properties you can set; however, the net command is shipped, so it's possible to use this to change the Samba registry directly. Some of the following might not be needed, as I played with various config options and mmuserauth settings before getting to a stable and working state.
net conf delparm global "idmap config * : backend"
net conf delparm global "idmap config * : range"
net conf delparm global "idmap config * : rangesize"
net conf delparm global "idmap config * : read only"
net conf delparm global "idmap:cache"

net conf setparm global "netbios name" my-netbios-server-name
net conf setparm global "realm" DOMAINSHORTNAME.FULL.DOMAIN
net conf setparm global "workgroup" DOMAINSHORTNAME
net conf setparm global "security" ADS

net ads join -U myadminaccount

One final thing to note is that I also required winbind running on the protocol servers for authentication to work. This is provided by the gpfs-winbind service, which is started along with gpfs-smb when CES starts up on a node. However, it's only started if you are using a pre-defined authentication type, and it's not possible to simply enable it from systemd, as it requires CES to be running first and the file-system to be mounted. It would be nice to have a flag in the CES config to enable the service for user defined mode, but there is a workaround: make gpfs-winbind a systemd dependency of gpfs-smb.
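A minimal sketch of that dependency (the unit names are as shipped in 4.1.1; the drop-in file path is my choice, any name under that directory works):

```
# /etc/systemd/system/gpfs-smb.service.d/winbind.conf
# Drop-in: whenever gpfs-smb starts, start gpfs-winbind first.
# Because gpfs-smb is itself only started by CES once the
# file-system is mounted, the ordering requirements are satisfied.
[Unit]
Requires=gpfs-winbind.service
After=gpfs-winbind.service
```

Run systemctl daemon-reload afterwards so systemd picks up the drop-in.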

The only other command worth mentioning is for creating a share, which is simply done with:
mmsmb export add shareName "/gpfs/path/to/directory"
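To sanity-check a new share from a Linux box, smbclient works fine - the server, share and account names below are placeholders matching the examples above:

```shell
# List the shares exported by one of the protocol servers
smbclient -L //my-netbios-server-name -U 'DOMAINSHORTNAME\myuser'

# Connect to the new share and list its contents
smbclient //my-netbios-server-name/shareName -U 'DOMAINSHORTNAME\myuser' -c 'ls'
```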

Offline (HSM) files just work!

We have a mix of Windows, Linux and OS X clients to contend with, but providing SMB access is a reasonable compromise for all our users, as everyone has a centrally provided AD account. Historically our experience of other tiered file-systems is that they mostly work with Windows clients when the archive bit is set, but OS X Finder tended to trigger file recalls whenever it accessed a folder for preview. With SMB3, however, there is also an offline file flag, which seems to be respected on my Mac: files weren't recalled until accessed (and in Windows they even show up with a little X icon to show they are offline).

I'm very impressed that this has all been thought through and the GPFS VFS module for samba does all these things!

One area I still need to look at is VSS and previous versions - in theory GPFS snapshots should appear as previous versions of files, but I just haven't had time to verify this yet.

CES compared to Samba/CTDB

With CES, IP address failover is handled by CES itself rather than by the CTDB process as in normal clustered Samba. There are various policies for CES to move IP addresses around, for example even-coverage (the default), or based on load on the protocol server. One slight downside is that CES node failure is handled in the same way as a GPFS expel, and so it can take a bit longer for an IP address to be moved over in the event a protocol server fails. In normal operation you can move the IP addresses off a node and disable CES if you plan to reboot it, so it's only really in an HA failure handover where it's slower to complete.
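For a planned reboot, that sequence is roughly the following (the node name is a placeholder, and it's worth checking the mmces syntax on your release):

```shell
# Suspend CES on the node: its floating addresses move to the
# remaining protocol nodes and it stops serving clients
mmces node suspend -N protocol-node-1

# ...reboot or maintain the node, then bring it back into the pool
mmces node resume -N protocol-node-1

# Confirm the floating addresses have been redistributed
mmces address list
```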

One thing that did stump me for a while was accessing shares from my Mac client: it repeatedly failed to connect. I eventually worked out that I was still using the legacy cifs:// paths, whereas since CES only supports SMB2 and SMB3, you in fact need to use smb://.

Overall impressions

Overall, I'm very happy with the SMB protocol support; with a few niggles and tweaks to the documentation, I think it will be an excellent addition to the GPFS product. I've had a few issues with it - for example, if the multi-cluster file-system goes inquorate then CES fails, which I'd expect, however it doesn't restart when the file-system remounts, and I think a predefined callback would be a good way to resolve this.

And I'm pretty happy with the support I've had from IBM: I've had a couple of con-calls with various people in the UK and USA about my experience, and provided them with feedback directly on a number of things I think need tweaking in the docs to get things moved on. So thanks, IBM GPFS team, for listening and taking an interest!

I'm very interested in picking up the Object support on our protocol nodes, now I just need to find some time in my schedule! I'm hoping that will be pretty easy to do, and I might even try out the installer to get the first node into the system to see what packages need adding.