Hi everyone,
We went through a lot of troubleshooting steps (logs, network, I/O latency, etc.) and decided to migrate the machines to a different server within the Proxmox cluster. This seems to have calmed down the CPU issue, but it's still a mystery why CPU usage went so high across the board. Most of our customers had degraded service yesterday, as we only migrated after hours. Of course, we have to wait and see what happens under load today.
The whole server was running high CPU, and even after we migrated all the VMs off, it was still sitting at 25% and bursting to 95%. However, this morning the original server is running at 2% CPU. We were leaning towards it being a hardware fault, but I guess the proof will come today under load.