MultiSite Stretched Cluster with Server 2016 TP5

Microsoft has introduced several additions to failover clustering; the most noteworthy are Storage Spaces Direct and Storage Replica. Here, I am posting my POC testing of Storage Replica on a multisite stretched cluster with Server 2016 TP5. This is a converged compute-storage cluster designed with scalability in mind.

The Challenge

The present Hyper-V cluster is single-site converged. It’s a 3-node Hyper-V cluster on a highly available iSCSI LUN. A recent power outage at the datacenter due to a natural calamity brought the clusters down, which meant that a few of the critical services were affected across the globe.

The Solution

Stretch the clusters across the sites. Design a proposal that gives an improved RTO and RPO. Use a layered approach to allow for maximum scalability without risking performance.

There were several offerings for storage volume replication across sites, but I decided to test the option provided by Server 2016. Storage Replica in Server 2016, like other vendors' offerings, has both synchronous and asynchronous replication options, but it works slightly differently. It isn’t snapshot-based replication; the deltas replicate like a mirrored volume, either synchronously or asynchronously. Synchronous replication ensures consistency of data across the volumes before the IO transaction is acknowledged to the application. Asynchronous replication copies the data to the secondary volume after completing the IO transaction, which allows a faster response time for a solution that also works as DR across sites.
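To make the difference between the two modes concrete, here is a minimal toy model in Python. It is only a sketch of the acknowledgement ordering described above and is not how Storage Replica is implemented or configured; the class name `ReplicatedVolume` and its methods are hypothetical names made up for illustration.

```python
from collections import deque


class ReplicatedVolume:
    """Toy model of a primary volume with one replica (hypothetical, not the Storage Replica API)."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = []       # blocks written on the primary site
        self.replica = []       # blocks that have reached the secondary site
        self.pending = deque()  # acknowledged writes not yet shipped (async only)

    def write(self, block):
        """Write one block and return once the application would see the IO complete."""
        self.primary.append(block)
        if self.synchronous:
            # Synchronous: the replica is updated before the IO is acknowledged.
            self.replica.append(block)
        else:
            # Asynchronous: acknowledge now, ship the block to the replica later.
            self.pending.append(block)
        return "ack"

    def drain(self):
        """Ship all pending blocks to the replica (models the async background copy)."""
        while self.pending:
            self.replica.append(self.pending.popleft())

    def blocks_at_risk(self):
        """Blocks that would be lost if the primary site failed right now."""
        return len(self.primary) - len(self.replica)


sync_vol = ReplicatedVolume(synchronous=True)
sync_vol.write("block-1")

async_vol = ReplicatedVolume(synchronous=False)
async_vol.write("block-1")
async_vol.write("block-2")

print(sync_vol.blocks_at_risk())   # nothing outstanding after a synchronous write
print(async_vol.blocks_at_risk())  # acknowledged writes still waiting to replicate
```

In this model the asynchronous volume answers the application sooner, but whatever is still in `pending` at the moment of a site failure is the data that shapes the RPO.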

The L7 Sketch

Storage Layer

3x Dell PowerEdge R520 running a clustered Server 2012 R2 iSCSI Target Server on 3x 10G Ethernet. This also provides block storage for multiple services with several dependencies, so I avoided running another clustered role (SOFS) on it to avoid an additional dependency. Two nodes run on the primary site and one on the DR site.

4x Dell PowerEdge R320 connected to the iSCSI target LUNs via MPIO round robin over 3x 10G Ethernet links. An additional 2x 10G RDMA Ethernet links are used for SMB access for Hyper-V. These nodes run the Scale-Out File Server role, which provides SMB shares for the Hyper-V VMs. Three nodes run on the primary site and the fourth runs on the DR site.
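For readers new to MPIO, the round-robin policy simply rotates IO across the available paths and skips any path that has failed. The sketch below illustrates that idea only; it is not the Windows MPIO implementation, and the class name `RoundRobinPaths` and the path labels are hypothetical.

```python
class RoundRobinPaths:
    """Toy round-robin path selector (illustrative only, not the Windows MPIO DSM)."""

    def __init__(self, paths):
        if not paths:
            raise ValueError("at least one path is required")
        self.paths = list(paths)
        self.failed = set()
        self._next = 0  # index of the path to try next

    def mark_failed(self, path):
        """Take a path out of rotation, for example after a link failure."""
        self.failed.add(path)

    def pick(self):
        """Return the next healthy path, rotating through all paths in order."""
        for _ in range(len(self.paths)):
            path = self.paths[self._next]
            self._next = (self._next + 1) % len(self.paths)
            if path not in self.failed:
                return path
        raise RuntimeError("no healthy path left")


# Hypothetical labels standing in for the three 10G iSCSI links.
selector = RoundRobinPaths(["iscsi-1", "iscsi-2", "iscsi-3"])
print([selector.pick() for _ in range(4)])

selector.mark_failed("iscsi-2")
print([selector.pick() for _ in range(3)])
```

With all three links healthy, each one carries roughly a third of the IO; when one link drops, the remaining two share the load without the initiator losing access to the LUN.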

The volumes are formatted with ReFS, which has several improvements for Hyper-V, including faster checkpoint merging and better resilience. CSVFS sits on top of this and hosts the SOFS file shares. This allows several hosts to access the same volume, giving an active-active scenario. The quorum model is a file share witness on a third site (you can use an Azure cloud witness too, thanks to Microsoft for this 🙂).
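Why the witness sits on a third site is easiest to see with a little vote arithmetic. The sketch below assumes a simple static majority rule; Windows Server's dynamic quorum adjusts votes at runtime and is not modelled here. The function name `has_quorum` and the node counts in the example are hypothetical, chosen for illustration rather than taken from this cluster.

```python
def has_quorum(votes_online, total_votes):
    """Return True when strictly more than half of all votes are reachable."""
    if not 0 <= votes_online <= total_votes:
        raise ValueError("votes_online must be between 0 and total_votes")
    return 2 * votes_online > total_votes


# Hypothetical layout: two nodes per site plus one witness on a third site.
nodes_per_site = 2
witness_votes = 1
total = 2 * nodes_per_site + witness_votes

# One whole site goes dark; the surviving site can still reach the witness.
surviving = nodes_per_site + witness_votes
print(has_quorum(surviving, total))

# Same failure without any witness: the survivors hold exactly half the votes.
print(has_quorum(nodes_per_site, 2 * nodes_per_site))
```

Under this simple rule, the witness vote is what lets the surviving site hold a majority instead of being stuck at exactly half, and it has to live outside both sites so it is not lost together with either of them.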

Because the storage is asymmetric, Storage Replica is configured in the failover cluster, with the DR site as the replica secondary. All the CSV volumes and the log volume are configured with asynchronous replication.

Compute Layer

4x Dell PowerEdge R320 running a Hyper-V cluster on an SMB share provided by the SOFS storage layer. Two 10G RDMA NICs are used for SMB connectivity with Switch Embedded Teaming configured. This leverages all the benefits of SMB 3.0, including SMB Multichannel and SMB Direct.

Post POC testing

Successful failover of the services in the event of a host / site shutdown.

Achieved a good RTO and RPO with this solution.

Successfully utilized the benefits of SMB 3.0: better throughput, lower latency and lower CPU utilization.

Good IOPS by using storage tiers. Fast-tier volumes (mix of SSDs and HDDs, 2400 IOPS) are used for production VMs and slow-tier volumes (HDDs only, 325 IOPS) are used for VM backups.

The storage data corruption test with DiskSpd and StoragePerforometer passed successfully.

The Benefits

We now have a highly available multisite stretched cluster in place with increased throughput, better RTO/RPO and resiliency.

What’s next?

I used an additional layer, which added manageability complications and cost even though performance wasn’t affected. So my next POC is to simplify the model to a hyper-converged solution with either Server 2016 Storage Spaces Direct or a hardware-based approach (Nutanix, StarWind, VxRail etc.).