Well today I had an interesting conundrum. Was doing some routine patching of an ESX cluster and suddenly alerts were going off about VM’s being disconnected.
It turns out we hit the default port limit of the vSwitch on the destination ESX host, which is 64 (or 56 usable).
To get services back online quickly, I simply migrated a few machines off the over-allocated host, then re-enabled the interfaces on the affected VM’s. A better monitoring system would’ve been helpful here, or if I were faster with powercli, perhaps finding the disabled interfaces through that… I ended up going through all the VM’s in that cluster to be sure I’d got them all.
The ultimate fix is to carefully juggle the VM’s around so you don’t hit the limit again, then increase the port limit on each vSwitch in the affected cluster….
Between this, a massive spanning tree issue taking down half the campus and an abandoned snapshot…. I think I’ve had enough disaster for one day.