Thursday, 10 March 2016

Connecting data intensive instruments ... enter the Brocade VDX fabric

I promise not to mention Spectrum Scale (much) in this post. We have a big chunk of Scale storage for research data, coupled to our compute resources for researchers, but we also have a lot of data intensive instruments whose data we need to get off the instruments and onto the Scale storage.

The number of data intensive instruments we are seeing is increasing rapidly, and we need to provide them with secure isolation whilst allowing them to get data out of their facilities and into our central storage.

Our Scale storage servers each have multiple 10GbE connections onto our research data network, but getting the data across the campus network has its issues. We've therefore implemented a research data network based on Brocade VDX fabric switches running a VCS fabric. Mostly we run VDX 6740 and 6740T switches at the edges of the research network, but we'll be adding some 6940-144 in the next few months (up to 144 10GbE ports in 2U!). One of the things we really don't want to do is push traffic from data intensive instruments through a firewall; we want to route it, and ideally just switch it where we can. Our VDX fabric design allows us to do this, keeping speeds up across the network. So much so that we are hoping researchers will move away from a local staging server in the lab and just stream data directly to our central storage.
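To put some rough numbers on why this matters, here's a quick back-of-the-envelope sketch of transfer times at different link speeds (the dataset size, speeds and efficiency figure are illustrative, not measurements from our network):

```python
# Rough back-of-the-envelope: how long does an instrument's dataset take to
# move at different link speeds? Figures below are illustrative, not measured.

def transfer_time_hours(dataset_gb, link_gbps, efficiency=0.7):
    """Estimate wall-clock hours to move dataset_gb over a link of link_gbps,
    assuming we only achieve `efficiency` of line rate end to end."""
    dataset_bits = dataset_gb * 8e9          # decimal GB -> bits
    effective_bps = link_gbps * 1e9 * efficiency
    return dataset_bits / effective_bps / 3600

if __name__ == "__main__":
    dataset_gb = 2000                        # e.g. a hypothetical 2 TB run off an instrument
    for gbps in (1, 10, 40):
        print(f"{dataset_gb} GB at {gbps:>2} GbE: "
              f"~{transfer_time_hours(dataset_gb, gbps):.1f} hours")
```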


We've moved to using the 6740T for edge switches as it presents copper ports. We also get the option to use the QSFP+ ports with Mellanox QSA adapters so we can plug in 10GbE optics; this is nice as it means we don't need to fork out for 40GbE optics to uplink the 6740T, but can just use 10GbE LR optics. For now we are running edge ports at 1GbE, but we can POD license up to 10GbE, though whether the building copper will sustain that is a whole different question!
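As a rough illustration of the copper question, something like the sketch below is how we'd think about it, using the usual 10GBASE-T guidance (Cat6a/Cat7 to 100 m, Cat6 to around 55 m, Cat5e not supported); the cable runs listed are made up:

```python
# A quick sanity check of which edge links could realistically be licensed up
# to 10GbE, based on typical 10GBASE-T cabling guidance. Cable data is made up.

SUPPORTED_10G_M = {"cat6a": 100, "cat7": 100, "cat6": 55}

def max_speed_gbps(category, length_m):
    """Return a rough expected max speed for a copper run."""
    limit = SUPPORTED_10G_M.get(category.lower())
    if limit is not None and length_m <= limit:
        return 10
    return 1  # fall back to 1GbE (assuming Cat5e or better, <= 100 m)

building_runs = [  # hypothetical patch records: (outlet, category, length in metres)
    ("lab-101", "cat6a", 62),
    ("lab-214", "cat6", 78),
    ("lab-309", "cat5e", 45),
]

for outlet, cat, length in building_runs:
    print(f"{outlet}: {cat} {length} m -> expect {max_speed_gbps(cat, length)} GbE")
```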

The VDX fabric switches act, to a certain extent, like a giant stack which we can span right across campus (up to 48 switches), and they will even run in metro mode. They also have pretty low port-to-port latency. One of the features we really like is the ISL capability: we can add pretty much arbitrary ISLs between switches without worrying about loops or having to configure the ports as trunks. OK, there are a few considerations, like spanning port groups. Not that this breaks anything, but an ISL spanning two port groups behaves like two ISLs in a traditional trunk, whereas within a port group ISL traffic is sprayed at layer 1, meaning we won't see one half of an ISL full whilst the other half sits underused. Adding new links is pretty much a case of plugging in optics and fibre.
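To illustrate why the layer 1 spraying matters, here's a toy comparison with classic per-flow hashing across two separate ISLs; the flow sizes and the hash are invented, the point is just the imbalance pattern:

```python
# Toy illustration of why layer 1 spraying within a port group is attractive
# versus classic per-flow hashing across two separate ISLs. Flow sizes and the
# hash are made up; the point is the imbalance pattern, not real throughput.

import random
import zlib

random.seed(1)
# Mostly small flows plus a few "elephant" transfers (sizes in GB, invented).
flows = [("flow-%d" % i, random.choice([1, 1, 2, 50])) for i in range(20)]

# Per-flow hashing: each flow sticks to one link, so an elephant flow can
# fill one link while the other sits idle.
hashed = [0, 0]
for name, size in flows:
    hashed[zlib.crc32(name.encode()) % 2] += size

# Frame-level spraying: traffic is balanced at (roughly) frame granularity,
# so both members of the trunk see about half of everything.
total = sum(size for _, size in flows)
sprayed = [total / 2, total / 2]

print("per-flow hash :", hashed)
print("frame spraying:", sprayed)
```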

I mentioned we don't need to worry about ISLs looping between switches. The VCS fabric blows away the traditional tree structure of a network: pretty much any switch can be connected to any other in the fabric. There are a few limitations, like not being able to build a ring of more than 6 switches, but the ability to have traffic flow pretty much anywhere between switches reduces our management and design headaches. If an ISL fails between two switches and there is another path over the fabric, the switch will just kick over to using that instead.
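As a small sketch of that resilience argument, a toy topology and a breadth-first search show that losing one ISL still leaves a path (the switch names and links are invented, not our actual design):

```python
# Sketch of the resilience argument: in a meshed fabric, losing one ISL still
# leaves a path between edge switches. The topology below is invented.

from collections import deque

def has_path(links, src, dst):
    """Breadth-first search over an undirected set of (a, b) links."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

isls = [("edge-A", "core-1"), ("edge-A", "core-2"),
        ("edge-B", "core-1"), ("edge-B", "core-2"),
        ("core-1", "core-2")]

print(has_path(isls, "edge-A", "edge-B"))                  # True
print(has_path([l for l in isls if l != ("edge-A", "core-1")],
               "edge-A", "edge-B"))                        # still True via core-2
```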

Being enterprise switches, we get all sorts of nice features like edge loop detection across the whole fabric, which means we don't need nasty protocols like spanning tree. As a fabric, management is pretty simple: one of the switches assumes a VIP as the master of the cluster for configuration.
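Because everything is configured through that single VIP, day-to-day checks can be scripted against one address rather than per switch. A minimal sketch using paramiko is below; the VIP, credentials and the show command are placeholders rather than our real setup:

```python
# Minimal sketch: talk to the fabric through its single management VIP rather
# than to each switch. The VIP, credentials and command are placeholders.

import paramiko

FABRIC_VIP = "192.0.2.10"   # hypothetical management VIP (documentation range)

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(FABRIC_VIP, username="admin", password="password")  # placeholders

# Pull fabric-wide state from the principal switch; substitute whichever NOS
# show command you actually want here.
stdin, stdout, stderr = client.exec_command("show fabric all")
print(stdout.read().decode())
client.close()
```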

They are also Layer 3 devices and can run VRRP-E etc. for HA routing. Add to this the ability to use short-path forwarding and we can pretty much prevent traffic tromboning: if a switch is a member of the VRRP-E group, it can short-cut the routing even though it doesn't host the "default gateway IP". Pretty neat!
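Here's a tiny illustration of the tromboning that short-path forwarding avoids; the switch names and topology are invented:

```python
# Toy illustration of traffic tromboning. Switch names and hops are invented;
# the point is the path difference when the edge switch can route locally.

def data_path(short_path_forwarding):
    """Return the hop-by-hop path from a host on edge-A to the storage leaf."""
    if short_path_forwarding:
        # edge-A is a VRRP-E group member, so it routes the traffic itself.
        return ["edge-A (routes locally)", "core-1", "storage-leaf"]
    # Otherwise traffic trombones via the switch holding the virtual gateway.
    return ["edge-A", "core-1", "edge-B (VRRP-E master)", "core-1", "storage-leaf"]

for spf in (False, True):
    hops = data_path(spf)
    print(f"short-path forwarding {'on ' if spf else 'off'}: "
          f"{' -> '.join(hops)}  ({len(hops) - 1} hops)")
```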

There are also nice features like variable port buffers, allowing us to reallocate port buffers where needed: for example, if a switch hosts a few big storage servers as well as edge devices, we can steal buffers from the edge device links and allocate them to the storage server ports.
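As a back-of-the-envelope sketch of that idea, with made-up buffer sizes and port counts rather than real ASIC figures:

```python
# Back-of-the-envelope for the variable port buffer idea: take buffer away
# from lightly used edge ports and give it to the storage-facing ports.
# All numbers here are invented for illustration, not real buffer sizes.

DEFAULT_BUFFER_KB = 512            # hypothetical per-port allocation

ports = {
    "storage-1": "storage",
    "storage-2": "storage",
    **{f"edge-{i}": "edge" for i in range(1, 41)},
}

def reallocate(ports, steal_kb=128):
    """Move steal_kb from every edge port into an even split across storage ports."""
    alloc = {p: DEFAULT_BUFFER_KB for p in ports}
    edge_ports = [p for p, role in ports.items() if role == "edge"]
    storage_ports = [p for p, role in ports.items() if role == "storage"]
    pool = 0
    for p in edge_ports:
        alloc[p] -= steal_kb
        pool += steal_kb
    for p in storage_ports:
        alloc[p] += pool // len(storage_ports)
    return alloc

alloc = reallocate(ports)
print("edge-1 buffer:   ", alloc["edge-1"], "KB")
print("storage-1 buffer:", alloc["storage-1"], "KB")
```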

Throw in some "basic" functionality like ports on demand and 40GbE breakout, and all in all it's a pretty good switch for our data intensive applications!

In related work, we are developing a research compute cloud and will be hooking the VDX fabric into this. Whether we use the VTAP functionality, I'm not sure, but the OpenStack ML2 plugin is probably going in, and we'd really like to be able to connect compute VMs hosted in our cloud for research applications directly into research facilities over a low latency, high speed network. Can we encourage researchers to move away from lab based compute resources? Well, only time will tell!

If you are interested in how we are building the network and the design, get in touch and I'm happy to have a discussion on this!