Monday, 22 June 2015

STFC SCD Seminar

A few weeks ago I was invited to give a seminar at STFC's Daresbury Lab for the Scientific Computing Division. It was mostly on the CLIMB project, with a focus on how we're using GPFS.

Wednesday, 17 June 2015

Tweaking v3700 memory

I was doing some work on one of our v3700 arrays today, creating a bunch of new RAID sets, and got back the message:

"The command cannot be initiated because there is insufficient free memory that is available to the I/O group"

This confused me, as at that point I wasn't assigning the RAID sets to pools or volumes, just trying to create some new RAID sets.

Digging around, I found the IBM docs don't give a lot of clues on how to fix this, other than to "increase the amount of memory that is allocated to the I/O group".

Looking at the config on the array (and you'll need to delve in by ssh to do this), there are 5 pre-defined I/O groups:

>lsiogrp
id name            node_count vdisk_count host_count 
0  io_grp0         2          21          2          
1  io_grp1         0          0           0          
2  io_grp2         0          0           0          
3  io_grp3         0          0           0          
4  recovery_io_grp 0          0           0  

And by default we can see that 40MB of memory is allocated to RAID services:
>lsiogrp -delim : 0
id:0
name:io_grp0
node_count:2
vdisk_count:21
host_count:2
flash_copy_total_memory:20.0MB
flash_copy_free_memory:20.0MB
remote_copy_total_memory:20.0MB
remote_copy_free_memory:20.0MB
mirroring_total_memory:20.0MB
mirroring_free_memory:20.0MB
raid_total_memory:40.0MB
raid_free_memory:33.7MB
maintenance:no
compression_active:no
accessible_vdisk_count:21
compression_supported:no
max_enclosures:10
encryption_supported:no

I had to increase the raid_total_memory to 80MB before I could create the new RAID sets (something smaller would probably have done, but I was in a hurry!). You do this with:
>chiogrp -feature raid -size 80 io_grp0
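
You can confirm the change by re-running lsiogrp against the I/O group; raid_total_memory should now show 80.0MB:

>lsiogrp -delim : 0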

This got me thinking: this memory is carved out of the cache available on the system, and as I'm not using flash copy, remote copy or mirroring, or I/O groups 1/2/3, can I reclaim it? Well, the answer appears to be yes:
>chiogrp -feature remote -size 0 io_grp0
>chiogrp -feature flash -size 0 io_grp0
>chiogrp -feature mirror -size 0 io_grp0

(and repeat for the other unused I/O groups)
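
If you're doing this from an admin host over ssh rather than at the array's interactive prompt, a small loop keeps it tidy. This is only a sketch: the user and hostname (superuser@v3700) are placeholders, and it assumes key-based ssh access to the array CLI.

for grp in io_grp1 io_grp2 io_grp3; do
  for feature in flash remote mirror; do
    ssh superuser@v3700 "chiogrp -feature $feature -size 0 $grp"
  done
done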

Dealing with faults with storage ... migrating data without downtime!

One of my V3700 storage arrays has been having issues recently, and it now looks like one of the canisters in the controller needs to be replaced. The process might be disruptive to the service running on top of it, as the fault may be a software issue that requires rebooting both canisters in the controller to resolve.

This is the storage array running our GPFS for our OpenStack cloud.

But of course this is GPFS, and I have a spare storage array waiting to join the CLIMB storage here, so I'm planning to move all the data over to the new storage array before doing the maintenance on the controller.

Why do I have a spare controller, you ask? Well, it was bought to add to the file system, but we wanted to do some testing with block sizes on these controllers before doing that; in fact we'll probably end up rebuilding the GPFS file system at some point to reduce the block size to 1MB. For various time reasons I haven't done this yet, so I have a fully decked-out v3700 with no data on it.

Now when I originally set up the CLIMB file system here, I set metadata to be replicated across two RAID 10 LUNs on the controller.
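
For context, that replication is just a property of how the NSDs and the file system were defined: the two metadata LUNs sit in different failure groups and the file system keeps two copies of metadata. A rough sketch of that kind of setup (the NSD names, device paths and server names below are made up for illustration, and the mmcrfs options are trimmed to the relevant ones):

%nsd:
  nsd=md_a
  device=/dev/mapper/md_lun_a
  servers=nsd01,nsd02
  usage=metadataOnly
  failureGroup=1
%nsd:
  nsd=md_b
  device=/dev/mapper/md_lun_b
  servers=nsd01,nsd02
  usage=metadataOnly
  failureGroup=2

>mmcrnsd -F metadata.stanza
>mmcrfs climbgpfs -F metadata.stanza -m 2 -M 2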

On the new controller, I've instead set up a number of RAID 1 sets. Eventually this will happen on the original controller too, replacing the RAID 10s.

Now for the magic of software-defined storage... I've added the new LUNs as NSDs in the same failure group as one of the RAID 10 LUNs holding metadata.
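
A hedged sketch of what that looks like, again with made-up NSD names, device paths and server names; the only important detail is that failureGroup matches the failure group of the RAID 10 metadata LUN being replaced:

%nsd:
  nsd=new_md_01
  device=/dev/mapper/new_md_lun_01
  servers=nsd01,nsd02
  usage=metadataOnly
  failureGroup=1

>mmcrnsd -F new_metadata.stanza
>mmadddisk climbgpfs -F new_metadata.stanza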

I then simply issue a "mmdeldisk climbgpfs diskname", and hey presto, GPFS replicates all the metadata from the LUN on the one v3700 to the new LUNs on the new v3700.
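
mmdeldisk won't return until the data has been drained off that disk, so on a big file system it can take a while; from another session you can watch progress with the usual status commands (the disk shows as "being emptied" while the migration runs):

>mmlsdisk climbgpfs
>mmdf climbgpfs -m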

Once that is complete I plan to use mmrpldisk to replace the disks on the faulty v3700, and GPFS will magically move all the data to the replacement v3700.
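
Again just a sketch, assuming the stanza-file form of mmrpldisk; the NSD name on the faulty array and the stanza describing its replacement are placeholders:

>mmrpldisk climbgpfs old_md_02 -F replacement.stanza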

All with no disruption to service. Nice!