Wednesday 24 June 2020

Hyperconverged isn't a magic bullet (what the vendors won't tell you)

Hi all,

and thank you for reading my blog. In recent years there has been a massive shift from a traditional three-tier architecture (SAN, switching and compute) to a hyperconverged model (storage and compute in the same tin). There are many attractions to this, one of the main sells being that when you add compute you also add storage and IOPS, so the platform scales with you. It also saves on data centre space, with the nodes typically being 2U or 4U and no need for a SAN head and disk shelves.

This all seems too good to be true, and as a whole it's a really sensible model. However, there are some major pitfalls in this design, which mean some workloads do not perform well on it; not only that, they can perform much worse than on the SAN you have just replaced.

So what's the problem, you ask?

Well, let's take a traditional SAN from a vendor. This would have some spinning disks in either RAID 5 or RAID 6, and most of the large vendors would accelerate this with a bank of SSDs for frequently accessed data (something like EMC FAST Cache, for example). On a hyperconverged platform such as Nutanix, a node might have four JBOD drives with two SSD cache disks for frequently accessed data, and typically this is further cached in RAM. For that frequently accessed data, the hyperconverged tin would have lower read latency (assuming the VM is on the same piece of tin as the data) than a traditional SAN, which would have to traverse some switching, either Fibre Channel or TCP/IP with iSCSI.
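
To put some rough numbers on that latency difference, here's a back-of-envelope sketch in Python. The latency figures are illustrative assumptions I've picked to show the shape of the comparison, not vendor benchmarks.

# Back-of-envelope read latency comparison (all figures are
# illustrative assumptions, not measured vendor numbers).

HCI_LOCAL_SSD_US = 100   # local SSD read on the same node
HCI_RAM_CACHE_US = 5     # read served from the node's RAM cache
SAN_SSD_US = 100         # SSD read inside the SAN array
SAN_FABRIC_US = 150      # extra round trip across FC/iSCSI switching

def hci_hot_read_us(ram_hit_ratio):
    """Average hot-data read latency on a hyperconverged node,
    assuming the VM sits on the same node as its data."""
    return ram_hit_ratio * HCI_RAM_CACHE_US + (1 - ram_hit_ratio) * HCI_LOCAL_SSD_US

def san_hot_read_us():
    """Hot-data read latency on a traditional SAN: the SSD tier
    plus the fabric round trip."""
    return SAN_SSD_US + SAN_FABRIC_US

print(f"HCI hot read: ~{hci_hot_read_us(0.5):.0f} us")
print(f"SAN hot read: ~{san_hot_read_us():.0f} us")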

But what about cold data, and not just cold data: data that is also read sequentially? And here lies the problem. Cold data lives on the spinning disks, and a lot of the hyperconverged vendors use an algorithm which, when it sees a sequential read, will typically serve it from the spinning disks as well, since spinning disks handle sequential reads efficiently (SSDs work best for random reads).
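
As a toy illustration of what such a detector might look like (this is my own sketch, not any vendor's actual algorithm), the core idea can be as simple as checking whether recent requests land on contiguous block addresses:

def looks_sequential(offsets, block_size=4096, window=8):
    """Toy heuristic: treat a stream of read offsets as sequential
    if the last `window` requests are contiguous block addresses.
    Purely illustrative; real tiering engines are far more elaborate."""
    recent = offsets[-window:]
    if len(recent) < window:
        return False
    return all(b - a == block_size for a, b in zip(recent, recent[1:]))

# A contiguous scan gets flagged as sequential and could be steered
# to the spinning disks; a random pattern would stay on SSD.
scan = [i * 4096 for i in range(16)]
print(looks_sequential(scan))  # True
print(looks_sequential([0, 40960, 8192, 122880, 4096, 57344, 20480, 90112]))  # False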

Knowing this, think back to the architecture of those spinning disks we described at the start: a SAN with RAID 5 or RAID 6, which gets faster as you add spindles, versus the hyperconverged platform, which is JBOD (single disks). The fastest a hyperconverged platform can serve cold data or sequential reads is as fast as a single disk can read that data (100 to 140 IOPS), whereas a traditional SAN could be 10x, 20x or more faster, depending on the number of spindles in that RAID set.
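
The arithmetic is worth spelling out. The per-disk figure below is the mid-point of the 100 to 140 IOPS range above; reads on a striped RAID 5/6 set can be served from every spindle (the parity overhead hits writes, not reads), while a JBOD cold read is bounded by one disk.

PER_DISK_IOPS = 120  # mid-point of the 100-140 IOPS range for a spinning disk

def raid_read_iops(spindles):
    """Rough read ceiling for a RAID 5/6 set: reads are striped
    across every spindle in the set."""
    return spindles * PER_DISK_IOPS

print(f"Single JBOD disk:    ~{PER_DISK_IOPS} IOPS")
print(f"8-spindle RAID set:  ~{raid_read_iops(8)} IOPS")
print(f"24-spindle RAID set: ~{raid_read_iops(24)} IOPS")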

If that disk is also reading or writing data for another VM at the same time, then with the head moving back and forth across the platter, performance could be a lot worse than 100 to 140 IOPS (try running multiple VMs off the single disk in your desktop PC, for example).
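
A crude way to model that contention: interleaved workloads both split the disk's IOPS budget and drag the head between different platter regions, wasting time seeking. The seek penalty factor here is an assumption I've invented for illustration, not a measured value.

def effective_iops_per_vm(disk_iops=120, vms=1, seek_penalty=0.7):
    """Each additional VM splits the IOPS budget and, by forcing
    extra head seeks between its region and the others, wastes a
    chunk of what remains (seek_penalty is an assumed factor)."""
    if vms <= 1:
        return disk_iops
    return (disk_iops / vms) * seek_penalty

for n in (1, 2, 4):
    print(f"{n} VM(s) on one spindle: ~{effective_iops_per_vm(vms=n):.0f} IOPS each")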

So, while hyperconverged provides a great platform for a lot of companies, if your workload is, for example, replicating data between VMs, or running data warehousing reports on an infrequent basis (so the data has gone cold), or something similar, then hyperconverged might not be the right decision for you.

And just as a final thought: you could absolutely use hyperconverged as your compute platform, but complement it with a traditional SAN for those workloads that don't fit well on it.