Tuesday, 27 January 2015

Using the GPFS Cinder driver with OpenStack

I've blogged a couple of times about using GPFS with OpenStack; in this post I'm going to focus on setting up the GPFS Cinder driver. This was tested using Juno with RDO.

First I'd like to send out thanks to Dean Hildebrand (IBM Cloud Storage Software team), who I met at SC14 and who put me in touch with Bill Owen of the GPFS and OpenStack development team. Bill helped me work out what was going on and how to check it was working correctly.

I'll assume you have both Glance and Cinder installed. These should be putting their image stores onto a GPFS file-system, using the same fileset for both. For example, I have a fileset "openstack-bham-data" containing cinder and glance directories; the fileset is mounted at /climb/openstack-bham-data.

The basic magic is that the Cinder driver uses mmclone to create copy-on-write clones of the Glance images, which can be done almost instantly and is very space efficient. It will only work on raw images from the Glance store.

# ls -l /climb/openstack-bham-data/
total 0
drwxr-xr-x 2 cinder cinder 4096 Jan 27 20:14 cinder
drwxr-xr-x 2 glance glance 4096 Jan 27 19:08 glance
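
To get a feel for the mechanics, here's a rough sketch of the kind of mmclone operations involved - these are not the driver's exact steps and the filenames are purely illustrative:
# mmclone snap base-image.raw                   # turn the file into a read-only clone parent
# mmclone copy base-image.raw new-volume.raw    # create a copy-on-write child almost instantly
# mmclone show base-image.raw new-volume.raw    # confirm the parent/child relationship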

In the /etc/cinder/cinder.conf file, we need to set a few config parameters:
gpfs_mount_point_base = /climb/openstack-bham-data/cinder
volume_driver = cinder.volume.drivers.ibm.gpfs.GPFSDriver
gpfs_sparse_volumes = True
gpfs_images_dir = /climb/openstack-bham-data/glance
gpfs_images_share_mode = copy_on_write
gpfs_max_clone_depth = 8
gpfs_storage_pool = nlsas

(Docs on the parameters are online).
Of course my Glance instance is also configured (in /etc/glance/glance-api.conf) to use:
filesystem_store_datadir = /climb/openstack-bham-data/glance

A couple of things to note here: nlsas is one of my storage pools, and the gpfs_storage_pool parameter determines which pool Cinder volumes are placed in. We're also using copy_on_write, which means blocks are only copied as they change, giving better storage utilisation.
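
If you're not sure what your pools are called, mmlspool should list them - the device name gpfs01 below is just a placeholder for your own file-system:
# mmlspool gpfs01 all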

Now that we have cinder configured, restart the services:
# systemctl restart openstack-cinder-api.service
# systemctl restart openstack-cinder-scheduler.service
# systemctl restart openstack-cinder-volume.service

(At this point I should note that this is running on CentOS 7, so it's systemd based, while the GPFS init script is a traditional SysV init script. It would be nice for GPFS to be systemdified so that we could make swift, glance and cinder depend on GPFS being active.)
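
In the meantime, a rough way to approximate that dependency is a systemd drop-in for each of the OpenStack services - this is only a sketch, and assumes systemd generates a gpfs.service unit from the /etc/init.d/gpfs script on your system:
# mkdir -p /etc/systemd/system/openstack-cinder-volume.service.d
# cat > /etc/systemd/system/openstack-cinder-volume.service.d/gpfs.conf <<EOF
[Unit]
After=gpfs.service
Requires=gpfs.service
EOF
# systemctl daemon-reload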

We'll now do a basic cinder test to ensure cinder is working:
# cinder create --display-name demo-volume1 1
+---------------------+--------------------------------------+
|       Property      |                Value                 |
+---------------------+--------------------------------------+
|     attachments     |                  []                  |
|  availability_zone  |                 nova                 |
|       bootable      |                false                 |
|      created_at     |      2015-01-27T20:32:25.244948      |
| display_description |                 None                 |
|     display_name    |             demo-volume1             |
|      encrypted      |                False                 |
|          id         | f7f5c7a1-bf56-41a1-b9f9-a7c74cac748d |
|       metadata      |                  {}                  |
|         size        |                  1                   |
|     snapshot_id     |                 None                 |
|     source_volid    |                 None                 |
|        status       |               creating               |
|     volume_type     |                 None                 |
+---------------------+--------------------------------------+
# cinder list
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
|                  ID                  |   Status  | Display Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
| f7f5c7a1-bf56-41a1-b9f9-a7c74cac748d | available | demo-volume1 |  1   |     None    |  false   |             |
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
# ls -l /climb/openstack-bham-data/cinder
-rw-rw---- 1 root root 1073741824 Jan 27 20:32 volume-f7f5c7a1-bf56-41a1-b9f9-a7c74cac748d
# cinder delete f7f5c7a1-bf56-41a1-b9f9-a7c74cac748d
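
If the volume had sat in "creating" or dropped to an "error" state instead, the first thing I'd check is that the cinder-volume service is actually up and registered:
# cinder service-list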

OK, so basic Cinder is working; now let's try out the GPFS driver. Remember, it will only work with raw images.

We need to define the GPFS driver type in cinder:
# cinder type-create gpfs
# cinder type-list
+--------------------------------------+------+
|                  ID                  | Name |
+--------------------------------------+------+
| a7db8364-9051-4e40-99a5-c43842443ef7 | gpfs |
+--------------------------------------+------+

If we don't have a raw image in Glance, let's add one:
# glance image-create --name 'CentOS 7 x86_64' --disk-format raw --container-format bare --is-public true --copy-from http://cloud.centos.org/centos/7/devel/CentOS-7-x86_64-GenericCloud-20140916_01.raw

# glance image-list
+--------------------------------------+---------------------+-------------+------------------+------------+--------+
| ID                                   | Name                | Disk Format | Container Format | Size       | Status |
+--------------------------------------+---------------------+-------------+------------------+------------+--------+
| e3b37c2d-5ee1-4bac-a204-051edbc34c31 | CentOS 7 x86_64     | qcow2       | bare             | 8587706368 | active |
| bf756074-ab13-45bb-b899-c83586df4ea8 | CentOS 7 x86_64 raw | raw         | bare             | 8589934592 | active |
+--------------------------------------+---------------------+-------------+------------------+------------+--------+

Note that I have two images here: one is qcow2, the other raw. It's important that the image actually is raw - I found that the CentOS image from cloud.centos.org named raw was actually qcow2 and things didn't work properly for me, so I had to convert the image file before the mmclone would work.
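
If you hit the same thing, qemu-img can convert the file before you upload it to Glance - the output filename here is just illustrative:
# qemu-img info CentOS-7-x86_64-GenericCloud-20140916_01.raw
# qemu-img convert -f qcow2 -O raw CentOS-7-x86_64-GenericCloud-20140916_01.raw CentOS-7-x86_64-GenericCloud-20140916_01.converted.raw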

So to reiterate, the image must actually be raw: glance will happily accept --disk-format raw even if the file isn't, and it doesn't check. To be sure it is raw, let's check:
# qemu-img info /climb/openstack-bham-data/glance/bf756074-ab13-45bb-b899-c83586df4ea8 
image: /climb/openstack-bham-data/glance/bf756074-ab13-45bb-b899-c83586df4ea8
file format: raw
virtual size: 8.0G (8589934592 bytes)
disk size: 8.0G

OK, we're happy it is raw, so let's now create a Cinder volume from it:
# cinder create --volume-type gpfs --image-id bf756074-ab13-45bb-b899-c83586df4ea8 8
+---------------------+--------------------------------------+
|       Property      |                Value                 |
+---------------------+--------------------------------------+
|     attachments     |                  []                  |
|  availability_zone  |                 nova                 |
|       bootable      |                false                 |
|      created_at     |      2015-01-27T20:43:09.213366      |
| display_description |                 None                 |
|     display_name    |                 None                 |
|      encrypted      |                False                 |
|          id         | 91d8028a-c2cb-4e2c-a336-aa0636488b88 |
|       image_id      | bf756074-ab13-45bb-b899-c83586df4ea8 |
|       metadata      |                  {}                  |
|         size        |                  8                   |
|     snapshot_id     |                 None                 |
|     source_volid    |                 None                 |
|        status       |               creating               |
|     volume_type     |                 gpfs                 |
+---------------------+--------------------------------------+
The volume should be ready within a second or so - even with an 8GB image (if it takes a minute or two, you almost certainly have a problem with the mmclone). Let's take a look at the volumes we have:
# cinder list
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
|                  ID                  |   Status  | Display Name | Size | Volume Type | Bootable | Attached to |
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
| 91d8028a-c2cb-4e2c-a336-aa0636488b88 | available |     None     |  8   |     gpfs    |   true   |             |
+--------------------------------------+-----------+--------------+------+-------------+----------+-------------+
And let's also check that it is a cloned image:
# mmclone show /climb/openstack-bham-data/cinder/volume-91d8028a-c2cb-4e2c-a336-aa0636488b88 
Parent  Depth   Parent inode   File name
------  -----  --------------  ---------
    no      1          449281  /climb/openstack-bham-data/cinder/volume-91d8028a-c2cb-4e2c-a336-aa0636488b88

If it isn't a clone, you'll get something like:
# mmclone show /climb/openstack-bham-data/cinder/*
Parent  Depth   Parent inode   File name
------  -----  --------------  ---------
                               /climb/openstack-bham-data/cinder/volume-69a88475-d1f1-448d-9dfc-3861913f2716

Note there is no parent inode listed. We can also check that the glance image is now a parent mmclone file:
# mmclone show /climb/openstack-bham-data/glance/bf756074-ab13-45bb-b899-c83586df4ea8
Parent  Depth   Parent inode   File name
------  -----  --------------  ---------
   yes      0                  /climb/openstack-bham-data/glance/bf756074-ab13-45bb-b899-c83586df4ea8

Just to compare timings: using the mmclone method, my 8GB image took less than a second to be ready (as quickly as I could type cinder list), whereas the traditional copy method took a couple of minutes. I guess this will vary based on how busy the GPFS file system is, but mmclone is always going to be quicker than copying the whole image over.

Now for a couple of troubleshooting tips. First, if you get errors in the Cinder volume.log like:
VolumeBackendAPIException: Bad or unexpected response from the storage volume backend API: Could not find GPFS file system device: 'ascii' codec can't decode byte 0xe2 in position 9: ordinal not in range(128).

This means that you probably forgot to do the "cinder type-create gpfs" step to create the volume type.

Second, if you find that it isn't cloning the image, use "qemu-img info" to verify that the source image really is of type raw.

Those are the only two problems I ran into, but enabling debug and verbose in the cinder.conf file should help with diagnosing problems.
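
For reference, that just means setting these in the [DEFAULT] section of /etc/cinder/cinder.conf and restarting the cinder services:
debug = True
verbose = True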

Once again, thanks to Dean and Bill from the IBM GPFS team for helping me get this working properly.

Update - Nilesh at IBM pointed out that we might also want to set default_volume_type = gpfs so that new volumes default to GPFS; this is important if more than one volume type is defined.
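
That's just one more line in the [DEFAULT] section of /etc/cinder/cinder.conf, alongside the GPFS settings above, followed by a restart of the cinder services:
default_volume_type = gpfs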