Recently a client was having some IO performance problems, so they decided that the easiest solution would be to throw an SSD under the log files on the VM. Easy and cheap solution. The server in question has about 50 databases on it. To speed up moving the transaction log files, I decided to run multiple file copies at once. However, doing so pushed the CPU on the server to 100%, the network would randomly crap out and drop my RDP connection, and the file transfer was taking forever.
All the settings in vSphere, Windows, the array we were copying from, etc. looked fine. This VM happened to be running on vSphere 4.1 Update 1 and couldn't be upgraded because the vCenter server hadn't been upgraded yet. The guest had 8 vCPUs and 96 Gigs of RAM. The host had two physical sockets with 8 physical cores each (hyperthreading is turned on) and 128 Gigs of RAM (this VM was the only VM on the host). According to the vSphere 4.1 Best Practices, a VM will by default be run with all of its CPUs within a single NUMA node. Assuming the host's RAM is split evenly between the two sockets, each NUMA node holds roughly 64 Gigs, which is less than the VM's 96 Gigs. This means that this VM has its CPUs in a single NUMA node while its memory is spread across both NUMA nodes.
I wouldn't have expected this to be a problem, but apparently it was. Each file copy was driving a CPU core to 100%. With 10 file copies running at once, that's more CPU than the VM's 8 vCPUs could provide.
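To put a number on that, here's a quick back-of-the-envelope sketch in Python. It assumes, as observed above, that each copy pegs one full core; the function name is made up for illustration.

```python
def cpu_demand_percent(concurrent_copies: int, vcpus: int, cores_per_copy: float = 1.0) -> float:
    """Percent of the VM's vCPU capacity demanded, assuming each copy uses cores_per_copy cores."""
    return 100.0 * concurrent_copies * cores_per_copy / vcpus


# 10 copies, each pegging one core, on an 8 vCPU guest.
print(cpu_demand_percent(10, 8))  # 125.0
```

Anything over 100 means the copies are competing for CPU with each other and with everything else on the guest, including the network stack.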
We reduced the RAM available to the VM from 96 Gigs to 60 Gigs, which fits within a single NUMA node's share of the host's memory. Suddenly the guest was able to copy the files very quickly, the file copies took next to no CPU, and the network problems went away.
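Here's a similar sketch for the memory side. It's a rough check under assumptions that don't come from vSphere itself: one NUMA node per socket, host RAM split evenly between the nodes, and no allowance for hypervisor overhead. The helper names are hypothetical.

```python
def memory_per_node_gb(host_ram_gb: float, numa_nodes: int) -> float:
    """Host RAM per NUMA node, assuming an even split across nodes."""
    return host_ram_gb / numa_nodes


def fits_in_one_node(vm_vcpus: int, vm_ram_gb: float, cores_per_node: int,
                     host_ram_gb: float, numa_nodes: int) -> bool:
    """True if both the vCPUs and the RAM fit inside a single NUMA node."""
    return (vm_vcpus <= cores_per_node
            and vm_ram_gb <= memory_per_node_gb(host_ram_gb, numa_nodes))


def remote_memory_gb(vm_ram_gb: float, host_ram_gb: float, numa_nodes: int) -> float:
    """RAM that would have to come from another node if the VM is homed on one node."""
    return max(0.0, vm_ram_gb - memory_per_node_gb(host_ram_gb, numa_nodes))


# Host from this post: 2 sockets x 8 physical cores (hyperthreads ignored), 128 Gigs of RAM.
for vm_ram in (96, 60):
    print(vm_ram, fits_in_one_node(8, vm_ram, 8, 128, 2), remote_memory_gb(vm_ram, 128, 2))
# 96 False 32.0
# 60 True 0.0
```

With 96 Gigs, roughly 32 Gigs would have to live on the remote node; at 60 Gigs everything fits locally. That lines up with the fix, though it doesn't prove NUMA was the cause.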
Why vSphere and Windows were having these problems I've got no idea, but this did fix the problem.
Denny