
Wednesday 24 June 2020

Hyperconverged isn't a magic bullet (what the vendors won't tell you)

Hi all,

and thank you for reading my blog. In recent years there has been a massive shift from a traditional three-tier architecture (SAN, switching and compute) to a hyperconverged model (storage and compute in the same tin). There are many attractions to this, one of the main sells being that when you add compute you also add storage and IOPS, so the platform scales with you. It also saves on data centre space, with the nodes typically being 2U or 4U and no need for a SAN head and disk shelves.

This all seems too good to be true, and as a whole it's a really sensible model. However, there are some major pitfalls in this design, which means some workloads do not perform well on it; not only that, they will typically perform much worse than on the SAN you have just replaced.

So what's the problem you ask?

Well, let's take a traditional SAN from a vendor. This would have some spinning disks in either RAID 5 or RAID 6, and most of the large vendors would accelerate this with a bank of SSDs for frequently accessed data (something like EMC FAST Cache, for example). On a hyperconverged platform such as Nutanix, you have four JBOD drives with two SSD cache disks for frequently accessed data, and typically this is further cached in RAM. For that frequently accessed data, the hyperconverged tin will have lower read latency (assuming the VM is on the same piece of tin as the data) than a traditional SAN, which has to traverse some switching, either Fibre Channel or TCP/IP with iSCSI.
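
To make that latency point a bit more concrete, here's a rough sketch in Python of where a read might be served. The tier latencies and the fabric-hop cost are assumed ballpark figures purely for illustration, not measurements from any particular product.

# Rough model of read latency by storage tier. The figures are assumed
# ballpark numbers for illustration only, not vendor measurements.
TIER_LATENCY_MS = {
    "ram cache": 0.01,   # hot data cached in the node's RAM
    "local ssd": 0.2,    # hot data on the node's SSD cache tier
    "local hdd": 8.0,    # cold data on a spinning disk
}
FABRIC_HOP_MS = 0.3      # assumed cost of crossing the FC/iSCSI switching to a SAN

def hci_read_ms(tier):
    """Hyperconverged read served locally (VM and data on the same tin)."""
    return TIER_LATENCY_MS[tier]

def san_read_ms(tier):
    """SAN read: the same media latency plus the switch/fabric hop."""
    return TIER_LATENCY_MS[tier] + FABRIC_HOP_MS

for tier in TIER_LATENCY_MS:
    print(f"{tier:10s}  hyperconverged {hci_read_ms(tier):5.2f} ms   SAN {san_read_ms(tier):5.2f} ms")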

But what about cold data, and not just cold data, but data that is also read sequentially... and here lies the problem. Cold data sits squarely on the spinning disks, and a lot of the hyperconverged vendors use an algorithm which, when it sees a sequential read, will typically serve it from the spinning disks as well, as it is quicker than reading from SSD (SSDs work best for random reads).

Knowing this, think about the architecture of those spinning disks we described at the start: the SAN has RAID 5 or RAID 6, which gets faster the more spindles you add, while the hyperconverged platform is JBOD (single disks). The fastest a hyperconverged platform can serve cold data or sequential reads is as fast as a single disk can read it (100-140 IOPS), whereas a traditional SAN could be 10x, 20x or more faster depending on the number of spindles used to create that RAID set.

If that disk is also reading or writing data for another VM at the same time, then with the head moving back and forth across the platter, performance could be a lot worse than that 100-140 IOPS as well (try running multiple VMs on your desktop PC, for example).
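
To put some rough numbers on that (the spindle count, per-disk IOPS and the simple contention model below are my own assumptions, purely for illustration):

# Back-of-an-envelope comparison of cold/sequential read ceilings.
# Per-disk IOPS, spindle counts and the contention model are assumptions
# for illustration only.
DISK_IOPS = 120                      # a spinning disk, roughly 100-140 IOPS

def san_raid_iops(spindles):
    """A RAID 5/6 set spreads reads across all of its spindles."""
    return spindles * DISK_IOPS

def hci_jbod_iops(vms_sharing_disk=1):
    """A JBOD layout is limited to one disk, and other VMs hammering the
    same disk make the head seek and cut the effective figure further."""
    return DISK_IOPS / vms_sharing_disk

print("SAN, 12-spindle RAID set :", san_raid_iops(12), "IOPS")        # ~1,440
print("HCI, single JBOD disk    :", hci_jbod_iops(), "IOPS")          # ~120
print("HCI, 4 VMs on that disk  :", hci_jbod_iops(4), "IOPS each")    # ~30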

So, while hyperconverged provides a great platform for a lot of companies, if your workload is, for example, replicating data between VMs, running data warehousing reports on an infrequent basis, or something similar, then hyperconverged might not be the right decision for you.

Also, just as a final thought: you could absolutely use hyperconverged as your compute platform but complement it with a traditional SAN for those workloads that don't fit well on it.

Monday 13 July 2015

Broadcom vs Intel

Hi all,

had a really interesting experience recently. We are running a VMware environment with Broadcom 57800 NICs using iSCSI hardware offload, but we were seeing really high datastore latency in VMware even though the actual storage was reporting normal latency values.

At first we thought it was the switches, but after several calls with VMware and then the storage vendor we decided to have a play with the NICs in the servers. The first change was to go from the hardware iSCSI offload on the Broadcom 57800 NICs to the VMware software iSCSI initiator.

Just making this change we went (using iometer) from 10 MB/s to 21 MB/s and from 51 ms to 24 ms average latency on a 100% write workload, and from 265 MB/s to 402 MB/s and from 1.98 ms to 1.3 ms latency on a 100% read workload! This is on a 10Gb Cat 6 network, all layer 2, server -> switch -> SAN. This is something we were not expecting, as you would think that hardware would be faster than software.

The next change was to swap the NICs completely to Intel, still using the VMware software initiator. Again we saw a massive improvement: the Broadcom hardware offload delivered around 33,000 IOPS, Broadcom with the software initiator just short of 50,000 IOPS, and the Intel NICs with the VMware software initiator over 60,000 IOPS, peaking at nearly 70,000.
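
To put the three configurations side by side (the IOPS figures are the approximate results from the iometer runs above):

# Approximate iometer results from the three configurations described above.
results_iops = {
    "Broadcom 57800, hardware iSCSI offload": 33000,
    "Broadcom 57800, VMware software iSCSI":  50000,
    "Intel NIC, VMware software iSCSI":       60000,   # peaked at nearly 70,000
}

baseline = results_iops["Broadcom 57800, hardware iSCSI offload"]
for config, iops in results_iops.items():
    gain = (iops - baseline) / baseline * 100
    print(f"{config:40s} {iops:>7,} IOPS  ({gain:+.0f}% vs hardware offload)")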

This is a staggering result, with us improving the performance of our SAN by nearly 100% simply by swapping the host network adaptors from Broadcom to Intel. This will definitely be the last time I use Broadcom as a NIC, and it shows that even though Intel NICs are a little bit more expensive, they are certainly worth the extra cash.

Hope this helps

Andy

Tests were done using ESXi 5.5 Enterprise Plus, Dell R720 servers, Juniper EX4550 10GBase-T switches and a Tegile hybrid SAN.

Wednesday 24 June 2015

What is your ESXi storage doing?

Currently working on a cloud platform for an ISP, I found a great command-line tool on your ESXi hosts for seeing how your storage is performing. SSH into one of your servers, run esxtop and press 'd' for the disk adapter view.
This shows useful information such as how long a storage command has spent in the kernel (KAVG/cmd) versus the amount of time it has spent going out to the device (DAVG/cmd).
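
If you would rather capture the numbers than watch them live, esxtop also has a batch mode (for example esxtop -b -d 10 -n 30 > storage.csv) that writes CSV you can analyse afterwards. Below is a minimal sketch of pulling the latency counters out of such a capture; the counter names vary between builds, so the "MilliSec/Command" substring match is an assumption you may need to adjust.

# Minimal sketch: flag high per-command latencies in an esxtop batch capture
# (e.g. esxtop -b -d 10 -n 30 > storage.csv). The counter-name match below
# is an assumption and may need adjusting for your ESXi build.
import csv

with open("storage.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    latency_cols = [i for i, name in enumerate(header)
                    if "MilliSec/Command" in name]       # DAVG/KAVG/GAVG-style counters
    for row in reader:
        for i in latency_cols:
            if i < len(row) and row[i] and float(row[i]) > 20:   # flag anything over 20 ms
                print(header[i], "=", row[i], "ms")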


See link below for further information:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1008205

Thursday 20 February 2014

Installing vCenter 5.5 Client on Windows 2012 R2

Hi all,

I had to install the vCenter 5.5 Client on a Windows 2012 R2 server for a customer, and what a pain it turned out to be. If you just run the installer you get an error message saying "Internal Error 28173. -2146498298", which in plain English means that it doesn't have access to the Microsoft .NET Framework 3.5, as this isn't installed as standard on a Windows 2012 R2 box.

So the next step is to install .NET 3.5, which you do by adding a feature. .NET 3.5 is one of the first items on the list, but if you just tick it and click Next all the way through, the install will fail. You actually need the Windows installation DVD: on the last screen of the wizard there is an option at the bottom, "Specify an alternate source path". Click this and enter X:\sources\sxs (where X is the letter of your DVD drive). When you then click through the remaining sections, it will install correctly.
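
If you would rather script it than click through the wizard, the same feature can be enabled with DISM pointing at the sources\sxs folder on the media. A quick sketch (run from an elevated prompt; X: is again a placeholder for your DVD drive letter):

# Sketch: enable .NET Framework 3.5 from the Server 2012 R2 installation media
# using DISM rather than the Add Roles and Features wizard. Run elevated;
# X: is a placeholder for your DVD drive letter.
import subprocess

subprocess.run(
    ["dism", "/online", "/enable-feature", "/featurename:NetFx3",
     "/all", "/limitaccess", r"/source:X:\sources\sxs"],
    check=True,
)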

Talk about making something that should be a two-minute job completely over-complicated!

Tuesday 6 August 2013

Cisco VIC 1225 with VMware | Redundant Networking Problem

Hi all,

I have been installing some Cisco C220 servers with the VIC 1225 CNA (converged network adapter) for a local company. As part of the installation these servers were connected to a pair of Nexus 5Ks, with the CNA used for both Fibre Channel and 10Gbps Ethernet for VMware ESXi 5.1.

The connectivity was pretty standard for any installation of this type, with one 10 gig link going to one switch and the other going to the second switch, exactly as if this were a 1 gig implementation with Catalyst switches. So I installed VMware, and as part of the default installation VMware takes the first network card, uses that as its management interface and also uses the MAC address of that NIC for the management console. FINE! But that would come back to trouble me later on.

So I gave the management interface a static IP address and connected using the vSphere client, and all was going well until the point where I added the second NIC, and all connectivity to the host was lost! I connected to the CIMC and ran the network connectivity test, and everything failed. I removed one of the NICs and everything came back up again. After lots and lots of testing, with nothing working even though everything looked correct, we ended up logging a support call with Cisco.

It turns out that there is currently a fault on the VIC 1225 whereby, even though all the vNICs are in promiscuous mode, if one port sees the MAC address of another vNIC it doesn't forward the packet. I have been told that there is a fix on the way and it should be available in September.

So the solution, if you hit this problem, is to delete the default VMware management interface and create a brand new one. This is given a VMware-generated MAC address starting 00:50:56, which is therefore not a MAC address that appears on the VIC. Add your two NICs to this new management interface and everything works as you would expect.
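
A quick way to sanity-check which kind of MAC your new management interface ended up with is to look at the prefix: addresses VMware generates itself use its own 00:50:56 OUI, whereas the default vmk0 inherits the burned-in MAC of the physical adapter on the VIC. A trivial sketch (the second MAC below is just a made-up hardware-style address):

# Trivial check: is a vmkernel MAC one that VMware generated (00:50:56 OUI)
# or one inherited from a physical adapter, which would trip the VIC 1225 bug?
VMWARE_OUI = "00:50:56"

def is_vmware_assigned(mac):
    return mac.lower().startswith(VMWARE_OUI)

print(is_vmware_assigned("00:50:56:6a:12:34"))   # True  - safe with the VIC bug
print(is_vmware_assigned("a1:b2:c3:d4:e5:f6"))   # False - made-up burned-in style MAC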

Thanks for reading, and I hope this was of use to someone.

Monday 13 May 2013

Installing ESXi onto new Dell Servers

Hi all,

just a quick post: I've recently had to install ESXi onto some new Dell servers (the R720 and its 1U equivalent) for some clients. I normally download the latest release from VMware and burn it to CD before I go to site to save some time, but this caught me out with the new Dell servers.

What you will find is that when the ESXi disc starts to boot you get an error message stating that there are no network cards in the server (even though on this occasion I had 2x quad-port Broadcom NICs). What you need to do is go to Dell's website, enter the service tag for the server you have purchased, and go to the enterprise solutions part of the support and drivers section. There you can download a Dell-customised version of ESXi which includes all the Dell drivers you need to successfully install ESXi.

Hope this helps and thanks for reading.

Andy

Thursday 7 February 2013

Increase VMware converter performance

Hi All,

I was involved with a P2V migration last week, and while the number of servers to migrate was small, the amount of data held on them was around 250GB-500GB each. The conversion was to a vSphere 5.1 infrastructure using the VMware Standalone Converter running in a VM. When I started the migration off, we were experiencing terrible transfer rates in the region of 2MB/s, and it was estimating well over 8 hours for the P2V to complete per server.

After a little bit of googling I found the following thread on the VMware Communities forum, which says that from Converter 5.0 onwards the transfer of data is encrypted by default and that this can slow it down (http://communities.vmware.com/message/1866091).

I stopped the job, turned off the encryption by editing the converter-worker.xml file, and started the job again. The transfer rate, once it stabilised, went up to around 25MB/s and drastically reduced the time the P2V took!
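
For reference, here is a small sketch of that edit. As I remember it the setting is the useSsl element under the nfc section of converter-worker.xml (per the thread linked above), but verify against your own file, take a backup first, and restart the Converter Standalone Worker service afterwards.

# Sketch: turn off the encrypted (SSL) data transfer in converter-worker.xml.
# Assumes the flag is the <useSsl> element under <nfc>, per the linked thread;
# the default path below may differ on your install. Back the file up first
# and restart the Converter Standalone Worker service afterwards.
import xml.etree.ElementTree as ET

path = r"C:\ProgramData\VMware\VMware vCenter Converter Standalone\converter-worker.xml"

tree = ET.parse(path)
for elem in tree.iter():
    if elem.tag.split("}")[-1] == "useSsl":   # tolerate XML namespaces
        elem.text = "false"
tree.write(path, xml_declaration=True, encoding="utf-8")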

If you are experiencing slow P2V rates, this might be something worth trying.

Kind regards, and thanks for reading

Andy