When designing a storage solution for your compute cluster, the main areas to consider are capacity and performance. Performance should be benchmarked at both the storage level and the network level. At the storage level, we deal primarily with what is known as IOPS (I/O operations per second), split into read IOPS and write IOPS.
Storage Level – RAID selection and IOPS
Total IOPS: Average number of I/O operations per second
Read IOPS: Average number of read I/O operations per second
Write IOPS: Average number of write I/O operations per second
The formula for calculating IOPS is given below, with the disk's average seek time and average latency expressed in milliseconds. Please note that this is the IOPS for the disk alone, assuming that the RAID controller cache and the software bus cache are turned OFF.
IOPS Estimated = 1 / ((seek time / 1000) + (latency / 1000))
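The formula above can be sketched in a few lines of Python. This is only an illustration: the function name and the seek time and latency figures are hypothetical, not measurements from a specific drive.

```python
def estimated_disk_iops(seek_time_ms: float, latency_ms: float) -> float:
    """Estimate single-disk IOPS from average seek time and latency (both in ms)."""
    # Convert milliseconds to seconds, as the formula does by dividing by 1000.
    service_time_s = (seek_time_ms / 1000) + (latency_ms / 1000)
    if service_time_s <= 0:
        raise ValueError("seek time plus latency must be positive")
    return 1 / service_time_s


# Hypothetical 15k disk: 3.5 ms average seek time, 2.0 ms average latency.
print(round(estimated_disk_iops(3.5, 2.0)))  # 182
```

The result falls inside the 15k range quoted in this post, which is a quick sanity check on whatever figures you take from a drive's data sheet.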
Below are the common averages per disk type:
|Disk Type||Average IOPS|
|SSD||6000 and above|
|15k RPM||175 – 300|
|10k RPM||125 – 250|
|7.2k RPM||75 – 125|
Deciding on the type of RAID is another question. Even though RAID 6 gives fault tolerance for up to 2 disks, it has a high write penalty. Below are the write penalties for various RAID levels:
|RAID Level||IO Penalty|
|RAID 0||1|
|RAID 1||2|
|RAID 5||4|
|RAID 6||6|
|RAID 10||2|
Now let's size a RAID 5 array based on an IOPS requirement:
Let's assume that the reads and writes are random rather than sequential, because data is often not written in sequential blocks but scattered in blocks all over the drive. For a SQL cluster or a VDI cluster that requires 2000 IOPS with a 50/50 read/write split, the calculation would be:
Required IOPS : 2000
Read Requirements = 50% of 2000 = 1000 IOPS
Write Requirements = (50% of 2000) * 4 = 4000 IOPS
Needed IOPS from the array : 1000+4000 = 5000
IOPS per 15k SAS disk : 175
Total 15k SAS disks needed : 5000 / 175 ≈ 28.6, rounded up to 29
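The same arithmetic can be written as a small Python helper so you can try other RAID levels or disk types. This is a minimal sketch: the function and parameter names are my own, and the disk count is rounded up because you cannot buy a fraction of a disk.

```python
import math


def disks_needed(required_iops, read_fraction, write_penalty, iops_per_disk):
    """Return (back-end IOPS the array must deliver, number of disks needed)."""
    read_iops = required_iops * read_fraction
    # Each front-end write costs `write_penalty` back-end operations.
    write_iops = required_iops * (1 - read_fraction) * write_penalty
    backend_iops = read_iops + write_iops
    return backend_iops, math.ceil(backend_iops / iops_per_disk)


# The example from this post: 2000 IOPS, 50% reads, RAID 5 (penalty 4), 175 IOPS per disk.
backend, disks = disks_needed(2000, 0.5, 4, 175)
print(backend, disks)  # 5000.0 29
```

Swapping the penalty from 4 to 6 in the call shows how quickly the disk count grows on RAID 6 for the same workload.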
This can be further reduced by opting for disk tiers. You can mix SSDs and HDDs to get higher IOPS and lower latency.
Main areas to consider while planning for better IOPS
- Type of storage : NVMe drives and SSDs have higher IOPS than traditional HDDs. If you have only HDDs in your array, then RPM is the next factor: 15k disks deliver more IOPS than 10k and 7.2k disks. So it's better to mix and match disk types based on your IOPS requirement.
- Type of RAID : RAID 10 gives the highest IOPS, followed by RAID 5. Note that RAID 6 has the highest IO penalty (of 6) even though it gives a 2 disk fault tolerance.
- Number of disks in the array : If you need to increase the IOPS, you can add more disks to the array.
- Type of access : Sequential reads/writes give higher IOPS and performance than random reads/writes, but purely sequential access is not common. On HDDs, defragmenting disks regularly can therefore improve performance.
- Average latency for IO should be as low as possible.
- Block size : A larger block size means higher throughput when the access is sequential, although the number of operations per second typically drops as the blocks get larger. For random access, this is not of much concern.
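Block size is also what ties an IOPS figure to throughput, since throughput is simply IOPS multiplied by block size. A minimal sketch of that conversion follows; the 5000 IOPS figure is only an example value, and "MB" here is taken as 1024 KB.

```python
def throughput_mb_per_s(iops: float, block_size_kb: float) -> float:
    """Convert an IOPS figure at a given block size (KB) to throughput in MB/s."""
    return iops * block_size_kb / 1024


# The same 5000 IOPS means very different throughput at different block sizes.
for block_kb in (16, 32, 64):
    print(block_kb, throughput_mb_per_s(5000, block_kb))
```

This is worth keeping in mind when comparing benchmark results: an IOPS number is only meaningful alongside the block size it was measured at.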
Tools to test your IOPS
You can use Microsoft's free tool, Diskspd. See the documentation here:
StarWind has another tool that tests for data corruption on LUNs by performing sequential/random reads and writes to check data integrity, which is especially useful when you do a storage migration or host failover.
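As a rough illustration of what a Diskspd run looks like, the Python sketch below only assembles a command line; it does not execute anything. The helper name, the default values and the target path are hypothetical, and the switches are the ones I know from the Diskspd documentation (block size, duration, threads, outstanding I/Os, write percentage, caching off, latency statistics, test file size, random access), so verify them against the version you install.

```python
def diskspd_command(target, block_kb=8, duration_s=60, threads=4,
                    outstanding_io=8, write_pct=50, random_io=True,
                    file_size_gb=10):
    """Build a Diskspd argument list for one access specification."""
    args = [
        "diskspd",
        f"-b{block_kb}K",      # block size
        f"-d{duration_s}",     # test duration in seconds
        f"-t{threads}",        # threads per target
        f"-o{outstanding_io}", # outstanding I/Os per thread
        f"-w{write_pct}",      # percentage of writes
        "-Sh",                 # disable software caching and hardware write caching
        "-L",                  # collect latency statistics
        f"-c{file_size_gb}G",  # create a test file of this size
    ]
    if random_io:
        args.append("-r")      # random access; without it the access is sequential
    args.append(target)
    return args


# Hypothetical test file on the LUN being measured.
print(" ".join(diskspd_command(r"T:\iotest.dat")))
```

Building the arguments as a list makes it easy to loop over block sizes and read/write mixes and to log exactly which command produced each result.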
At the network level, MPIO with round robin over 10G Ethernet was previously used (and, I should say, still is in most current deployments) to achieve greater throughput as well as resiliency.
But now, using RDMA NICs (supported with Windows Server 2012 R2 and above), we can achieve greater throughput using SMB Multichannel and SMB Direct. Each RDMA NIC type (RoCE, iWARP or InfiniBand) has implementation differences, so it’s better to read their documentation to make sure you plan the networking properly. Please note that if you are on the Windows Server 2012 R2 platform, you can use these NICs only for your storage SMB traffic, so you would need a different NIC team for your VM live migrations, management etc. Also, you can’t team your RDMA NICs, because you lose RDMA functionality in a team.
However, Server 2016 has made things far easier. You can use Switch Embedded Teaming (SET), which lets you use the same RDMA NIC team for all the traffic, thereby reducing the number of NICs (and, well, the networking as well) needed for your storage layer.
A sample benchmarking process:
- Testing network
Ensure that you have a high bandwidth link (10G) across the storage layer, including the iSCSI MPIO links and the SAN replication links. A slower link here is a problem, as the SAN will work at the performance of the slowest link.
I use iPerf to measure network performance.
- Testing iSCSI throughput
Here, we first need to identify the highest throughput of the iSCSI device. Before doing the below, turn off the controller cache (write-back cache), so that the collected results correspond to those of the RAID array.
I do a test against an iSCSI RAM disk, which is a LUN created directly in the server's memory. This will definitely perform better than the network, so we will be able to identify what the highest throughput of the iSCSI device will be.
Use IOMeter to do the test with different access specifications, e.g. 64 KB, 100% write, 100% random etc.
Do this test from all the compute resources using this iSCSI target and record the observations.
StarWind has an offering to create a RAM disk, so it’s better to check their documentation at http://www.starwind.com
Now, do the same IOMeter test for the disk LUN with different access specifications and from all the compute resources: 16 KB, 32 KB, 64 KB; 100% read, 100% write, 50/50 read/write; 100% sequential, 100% random. Note down all your observations.
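To make sure no combination is missed, the access specifications listed above can be enumerated up front. The sketch below is only a planning aid in Python; it does not drive IOMeter, and the dictionary keys are names I made up for illustration.

```python
from itertools import product

BLOCK_SIZES_KB = (16, 32, 64)
READ_PERCENTAGES = (100, 0, 50)          # 100% read, 100% write, 50/50
ACCESS_PATTERNS = ("sequential", "random")


def access_specifications():
    """Return every block size / read-write mix / access pattern combination."""
    return [
        {"block_kb": b, "read_pct": r, "write_pct": 100 - r, "pattern": p}
        for b, r, p in product(BLOCK_SIZES_KB, READ_PERCENTAGES, ACCESS_PATTERNS)
    ]


specs = access_specifications()
print(len(specs))  # 18
for spec in specs[:3]:
    print(spec)
```

Eighteen runs per compute resource adds up quickly, so it helps to record each result against the exact specification that produced it.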
A. Compare all your collected data and analyze whether it is OK or whether something needs to be looked into. Some areas to check would be jumbo frames, firewall, AV software etc.
B. Compare your collected data with older data and note how it varies. Do a trend analysis.
Now turn on the controller cache (Write back cache) and repeat all the above tests. Perform the above comparison. Do a trend analysis.
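The comparison against older data can be as simple as a percentage change per test. Here is a minimal Python sketch, assuming you keep results as a mapping from a test name to measured IOPS; the test names, the numbers and the 10% threshold are all made up for illustration.

```python
def percent_change(baseline: float, current: float) -> float:
    """Percentage change of the current result relative to the baseline."""
    return (current - baseline) / baseline * 100


def flag_regressions(baseline_results, current_results, threshold_pct=10.0):
    """Return tests whose IOPS dropped by at least threshold_pct versus the baseline."""
    flagged = {}
    for test_name, baseline_iops in baseline_results.items():
        if test_name not in current_results:
            continue  # test was not repeated in this run
        change = percent_change(baseline_iops, current_results[test_name])
        if change <= -threshold_pct:
            flagged[test_name] = round(change, 1)
    return flagged


# Made-up results from an older run and the current run.
baseline = {"64k-100w-random": 4200, "16k-100r-seq": 9000}
current = {"64k-100w-random": 3500, "16k-100r-seq": 9100}
print(flag_regressions(baseline, current))  # {'64k-100w-random': -16.7}
```

Keeping the cache-off and cache-on runs as separate baselines avoids comparing results that were never meant to match.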
Some areas to get most out of your storage performance:
- Plan to use RDMA NICs, which give the best of SMB 3.0. Install the DCB component on the server.
- If using MPIO, use identical high bandwidth links and set the policy to round robin.
- Plan for the best RAID strip size that suits your requirements.
- Check that firewalls have ports 3260 and 3261 open.
- Make proper NIC teaming and bonding decisions.
- Finally, do periodic performance benchmarking.
I hope this blog was really useful. Please reach out to me if you have any questions.