Thursday, 21 August 2014

Thoughts on Storwize V3700 and GPFS

I've recently taken delivery of our first Storwize V3700 storage array. Prior to this we were using DS3500 series controllers and shelves (which I understand are essentially OEM'd NetApp products). The V3700 is developed by IBM, and apparently the Storwize software is developed at IBM's Hursley Labs in the UK.
V3700 with the 2.5" drive option (image: IBM Redbook)

It's a relatively low-cost storage array and has recently been upgraded to support 9 expansion shelves, giving up to 132 drives from a single controller head. The one for this project consists of 24x 1TB SAS drives and 84x 4TB NL-SAS disks.

The V3700 has lots of features given the price point: it's dual controller (or canister), and supports clustering, mirroring and auto-tiering (some via features on demand). We're planning to use GPFS (or Elastic Storage, as it's now being branded) though, so these features aren't actually of use to us, but I do have a lot of respect for IBM for hitting this price point whilst still allowing you to uplift to more advanced features should you want to.

The canisters act in active/active mode: each canister has a set of volumes (LUNs) that it serves as the preferred owner, and these fail over to the other canister if one canister fails. This means it's possible to distribute the IO over both canisters.

As standard the V3700 has four SAS ports on each canister: ports 1-3 are used for host connection and port 4 is used for the SAS loop between shelves. It also has a PCIe slot which can take either a SAS or FC host interface card, so you can use it on a SAN if you want. For our use-case we're only going to have 2 GPFS NSD servers attached, so it makes sense to just use the SAS ports (two SAS cards in each server, one port attached to each canister). Due to the vagaries of the config tools, we also ended up with the extra SAS cards. What is important to note is that the SAS ports on the V3700 are mini SAS-HD (SFF-8644), and the cables we were initially sent were mini SAS-HD at both ends, whereas we needed mini-SAS (SFF-8088) on the HBA end.

The GUI!

I must say, I'm not over-joyed by the web GUI, but it's significantly more responsive than the Storage Manager for the older kit. It's web-based and seems to have been made to look pretty. One of the things I don't like is that you can't easily create multiple mdisks (RAID sets) whilst specifying the size of each set. You can select the number of drives to add, but Storwize then decides how it will build the arrays underneath; for example, from 84 drives I wanted 4x spares and 8x RAID6 (8+2p) arrays, but it wanted to build several 12-disk arrays. Anyway, that's easily worked around, if a little tedious (yes, I could do it via the CLI, but I was playing around with the GUI), by creating the 10-disk arrays one at a time and manually marking the spare drives.
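
For reference, the CLI equivalent is roughly along these lines - the drive IDs and pool name here are just placeholders, and the exact flags are worth checking against the Storwize CLI reference for your code level:

    # Mark four drives as hot spares (drive IDs are examples - see lsdrive)
    chdrive -use spare 24
    chdrive -use spare 25
    chdrive -use spare 26
    chdrive -use spare 27

    # Build one 8+2p RAID6 array from an explicit list of ten drive IDs,
    # putting it straight into an existing pool (mdisk group)
    mkarray -level raid6 -drive 28:29:30:31:32:33:34:35:36:37 data_pool_01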

One other comment on the GUI: I found it quite hard at times to navigate around, and I'm not sure it's entirely intuitive, but once you get used to where things are it's actually OK to use, and, as I mentioned, significantly quicker than Storage Manager.

mdisks, volumes and pools

The normal way of using the V3700 is to create mdisks (RAID sets), put these into pools and then create volumes (LUNs) from the pools. If you are interested in tiering in the hardware then this is a neat feature, but with GPFS we'll use placement policies to drive this. So we essentially make an mdisk, assign it to a pool of its own, and then create a single volume from that pool - each pool contains exactly one mdisk and provides exactly one volume.
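
In CLI terms that one-mdisk, one-pool, one-volume arrangement looks something like this (names, extent size and volume size are purely illustrative - check lsmdiskgrp for the real free capacity before sizing the volume):

    # One pool per RAID set
    mkmdiskgrp -name data_pool_01 -ext 256

    # The RAID6 array is created directly into that pool (as above), then a
    # single volume is carved out of it, sized to use the pool's capacity
    mkvdisk -mdiskgrp data_pool_01 -iogrp 0 -size 29 -unit tb -name data_vol_01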

We're expecting to get about 250TB usable space, but right now I'm a bit unsure about the size and number of files - it's going to be running under OpenStack with Glance images. The general guideline for GPFS metadata is 5-10% of your storage; I'm going with 4%, which handily works out at ~10TB of metadata. We can make that from 2x RAID10 sets of ten 1TB SAS drives (~5TB usable each), which also leaves us 4 spare drives from the 24.

For the bulk data, I've provisioned 8x RAID6 (8+2p) sets with 10 drives per set, which uses 80 of the 84 NL-SAS drives, leaves 4 spare and gives roughly the 250TB usable (8 sets x 8 data drives x 4TB).

I've left the strip size at the default 256KB in all the RAID sets, but will probably go with a GPFS block size of 1MB, which should keep the GPFS blocks aligned with the RAID strips.
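
Creating the filesystem itself is then just a case of passing the block size to mmcrfs - a minimal sketch, assuming the NSDs have already been created from a stanza file like the one further down (filesystem and mount point names are made up):

    # 1MB filesystem block size, mounted at /gpfs01
    mmcrfs gpfs01 -F /root/nsd.stanza -B 1M -T /gpfs01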

When assigning the LUNs to the two GPFS servers, I've changed the preferred canister from automatic so that the LUNs are balanced over the two controllers. So, for example, metadata LUN0 will be preferred on canister 1 and metadata LUN1 will be preferred on canister 2.
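
The preferred node is easy enough to check from the CLI, and I believe more recent code levels also let you change it on the fly with movevdisk - worth verifying against the CLI reference for your firmware (the volume name below is made up):

    # Which canister does this volume prefer?
    lsvdisk md_vol_00 | grep preferred_node_id

    # Change the preferred node if it needs rebalancing (node IDs from lsnode)
    movevdisk -node 2 md_vol_00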

Similarly, having 8 data LUNs means I can balance 4 LUNs per canister, so hopefully IO should be spread relatively evenly over the two canisters, over the two NSD servers and over the two SAS cards in each NSD server.
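
The NSD server side of that balancing is just the order of the servers in the NSD stanza file - whichever server is listed first is the preferred server for that NSD, so alternating the order splits the NSDs between the two. A rough sketch (device paths and server names are invented for illustration):

    # nsd.stanza - metadata NSDs on the RAID10 SAS sets, data NSDs on the RAID6 sets
    %nsd: nsd=md_nsd_00   device=/dev/mapper/mpatha servers=nsd01,nsd02 usage=metadataOnly
    %nsd: nsd=md_nsd_01   device=/dev/mapper/mpathb servers=nsd02,nsd01 usage=metadataOnly
    %nsd: nsd=data_nsd_00 device=/dev/mapper/mpathc servers=nsd01,nsd02 usage=dataOnly
    %nsd: nsd=data_nsd_01 device=/dev/mapper/mpathd servers=nsd02,nsd01 usage=dataOnly
    # ...and so on for the rest, alternating the server order each time

    # Then create the NSDs from the stanza file
    mmcrnsd -F /root/nsd.stanza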

Each of the NSD servers is of course running multipathd, so we should have some degree of fault tolerance against various failures. The only failure I'm not sure about is if half a controller fails - traditionally we'd use top down/bottom up loops, but the cabling docs for the V3700 don't list this, and in fact the supplied cables are too short to implement top down/bottom up. In all honesty, I'm not sure this matters - we don't have enough shelves to stripe down the shelves in a way that could sustain a shelf loss without disruption, so we're probably as safe as we can be.
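
For completeness, a multipath.conf device section for Storwize kit (which presents itself with the same product ID, 2145, as SVC) typically looks something like the below. The recommended values vary by distro and code level, so treat this as a sketch and check IBM's interoperability docs rather than lifting it verbatim:

    # /etc/multipath.conf fragment for Storwize/SVC (product ID 2145)
    devices {
        device {
            vendor               "IBM"
            product              "2145"
            path_grouping_policy group_by_prio
            prio                 alua
            path_checker         tur
            failback             immediate
            no_path_retry        5
        }
    }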

I'd be interested in any thoughts people have on performance tuning the V3700 and on IO balancing when using it for GPFS.

A more detailed spec for the V3700 is, of course, in the IBM Redbook.
