Wednesday, 17 June 2015

Dealing with storage faults ... migrating data without downtime!

One of my V3700 storage arrays has been having issues recently, and it now looks like one of the canisters in its controller needs to be replaced. The repair might disrupt the service running on top of it, because the fault may be a software issue that can only be resolved by rebooting both canisters in the controller.

This is the storage array behind the GPFS file system for our OpenStack cloud.

But of course this is GPFS, and I have a spare storage array waiting to join the CLIMB storage here, so I'm planning to move all the data over to the new storage array before doing the maintenance on the controller.

Why do I have a spare controller, you ask? Well, it was bought to add to the file system, but we wanted to do some testing with block sizes on these controllers first. In fact, we'll probably end up rebuilding the GPFS file system at some point to reduce the block size to 1MB. For lack of time I haven't done this yet, so I have a fully populated V3700 with no data on it.

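Now, when I originally set up the CLIMB file system here, I configured the metadata to be replicated across two RAID 10 LUNs on the controller, each in its own failure group, since GPFS always places the two copies of a replicated block in different failure groups.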

On the new controller I've instead set up a number of RAID 1 sets. Eventually the original controller will be reconfigured the same way, with RAID 1 sets replacing its RAID 10s.

Now for the magic of software-defined storage... I've added the new LUNs as NSDs in the same failure group as one of the RAID 10 LUNs holding metadata.
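
A minimal sketch of this step, assuming GPFS 4.1-style NSD stanza files. The device paths, NSD names, NSD server names, failure group number and the use of usage=metadataOnly are all hypothetical placeholders, not the actual CLIMB configuration, and the run/DRY_RUN wrapper is only an illustrative safety net, not part of GPFS.

```shell
#!/usr/bin/env bash
# Sketch: add new RAID 1 LUNs as metadata NSDs in an existing failure group.
# ASSUMPTIONS (hypothetical placeholders, not the real CLIMB values):
# device paths, NSD names, NSD server names, failure group number, and
# usage=metadataOnly (i.e. these LUNs are meant to hold no file data).
set -euo pipefail

FS=climbgpfs
FAILURE_GROUP=2            # hypothetical: failure group of the RAID 10 LUN being retired
SERVERS="nsd01,nsd02"      # hypothetical NSD server nodes
DRY_RUN=${DRY_RUN:-1}      # 1 = only print the commands; set DRY_RUN=0 to run them
STANZA=$(mktemp)
trap 'rm -f "$STANZA"' EXIT

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# One %nsd stanza per new RAID 1 LUN (hypothetical multipath device names).
for i in 1 2 3 4; do
    cat >> "$STANZA" <<EOF
%nsd:
  device=/dev/mapper/v3700b_md$i
  nsd=v3700b_md$i
  servers=$SERVERS
  usage=metadataOnly
  failureGroup=$FAILURE_GROUP
  pool=system
EOF
done

# Create the NSDs, then add them to the file system from the same stanza file.
run mmcrnsd -F "$STANZA"
run mmadddisk "$FS" -F "$STANZA"
```

With DRY_RUN=0 the commands run for real; mmlsdisk climbgpfs afterwards should list the new NSDs with the chosen failure group.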

I then simply run "mmdeldisk climbgpfs diskname" and, hey presto, GPFS moves the metadata held on that LUN on the old V3700 over to the new LUNs on the new V3700.
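
A rough sketch of the deletion step with a check on either side. v3700a_md2 is a hypothetical NSD name for the RAID 10 LUN being retired, and the run/DRY_RUN wrapper is an illustrative addition, not something GPFS provides. Running mmlsdisk before and after makes it easy to confirm the failure groups first and that the old disk has gone afterwards.

```shell
#!/usr/bin/env bash
# Sketch: retire one metadata NSD on the faulty V3700 with mmdeldisk.
# ASSUMPTION: v3700a_md2 is a hypothetical NSD name; use the real name
# reported by mmlsdisk.
set -euo pipefail

FS=climbgpfs
OLD_NSD=v3700a_md2         # hypothetical: the RAID 10 NSD being retired
DRY_RUN=${DRY_RUN:-1}      # 1 = only print the commands; set DRY_RUN=0 to run them
CMDS=()                    # record of every command issued, for checking afterwards

run() {
    CMDS+=("$*")
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

run mmlsdisk "$FS"               # check disk names and failure groups first
run mmdeldisk "$FS" "$OLD_NSD"   # moves the disk's contents to the remaining disks, then removes it
run mmlsdisk "$FS"               # the old NSD should no longer be listed
```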

Once that is complete, I plan to use mmrpldisk to replace the disks on the faulty V3700 one at a time, and GPFS will magically move all the data to the replacement V3700.
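
The replacement step might look something like the sketch below. mmrpldisk replaces a single disk per invocation, so it loops over old:new pairs. The NSD names are hypothetical, the new NSDs are assumed to have been created with mmcrnsd already but not yet added to any file system, and each stanza only names the new NSD, leaving any other attributes to mmrpldisk's defaults.

```shell
#!/usr/bin/env bash
# Sketch: replace each NSD on the faulty V3700 with one on the new V3700.
# ASSUMPTIONS: the old:new NSD names are hypothetical, and each new NSD has
# already been created with mmcrnsd but is not yet part of any file system.
set -euo pipefail

FS=climbgpfs
PAIRS=(v3700a_d1:v3700b_d1 v3700a_d2:v3700b_d2)   # hypothetical old:new pairs
DRY_RUN=${DRY_RUN:-1}      # 1 = only print the commands; set DRY_RUN=0 to run them
CMDS=()                    # record of every command issued, for checking afterwards

run() {
    CMDS+=("$*")
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

for pair in "${PAIRS[@]}"; do
    old=${pair%%:*}
    new=${pair##*:}
    stanza=$(mktemp)
    # mmrpldisk replaces one disk per call, described by a single %nsd stanza.
    printf '%%nsd:\n  nsd=%s\n' "$new" > "$stanza"
    # GPFS migrates the old disk's contents to the new NSD, then removes the old disk.
    run mmrpldisk "$FS" "$old" -F "$stanza"
    rm -f "$stanza"
done
```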

All with no disruption to service. Nice!
