Hyper-V clusters crashing… due to network disconnects

I faced a strange issue with our Hyper-V servers: a Windows Server 2012 cluster started crashing randomly. I had a similar experience before, but that one followed a pattern. Every Friday between 1:30 PM and 2 PM, servers would crash. After digging, we realized that the antivirus client (SEP) had a scheduled scan running at that time, and a few minutes into the scan the system would crash with a BSOD. That issue did not affect all nodes, though, so the cluster stayed mostly up.

This time, the majority of the servers were crashing one after another, so I started analyzing the logs and memory dumps.

BSOD memory dump - Probably caused by : VMMDHCPSvr.sys ( VMMDHCPSvr+80e6 )

 

 

The second dump looked like this:

 

Memory Dump - Probably caused by : ntkrnlmp.exe ( nt!WheaReportHwError+249 )

 

The event logs contained events saying that the network was getting partitioned. This gave us a clue, and we started looking at connectivity. Although the servers were connected to the same switch, they were frequently losing pings within the same VLAN. The issue was specific to communication within the subnet; external connectivity was mostly fine.
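If you want to confirm this kind of intermittent loss inside a subnet, one option is to ping the other cluster nodes repeatedly and record the loss rate per peer. The following is only a minimal sketch, not the tooling we used at the time. The function names are hypothetical, and `ping_once` assumes it runs on Windows, where `ping -n` sets the count and `-w` sets the timeout in milliseconds.

```python
import subprocess
from typing import Callable, Dict, Iterable


def ping_once(host: str, timeout_ms: int = 1000) -> bool:
    """Send one ICMP echo with the Windows ping command; True if it exited with 0."""
    # Assumption: Windows ping flags (-n count, -w timeout in ms).
    # Caveat: the exit code is only a rough signal, since Windows ping can
    # exit with 0 when it receives a "Destination host unreachable" reply.
    result = subprocess.run(
        ["ping", "-n", "1", "-w", str(timeout_ms), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def loss_rate(
    peers: Iterable[str],
    attempts: int = 20,
    probe: Callable[[str], bool] = ping_once,
) -> Dict[str, float]:
    """Return the fraction of failed probes per peer (0.0 = no loss, 1.0 = all lost)."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    rates: Dict[str, float] = {}
    for host in peers:
        # Count how many of the probes to this peer failed.
        failures = sum(1 for _ in range(attempts) if not probe(host))
        rates[host] = failures / attempts
    return rates
```

Calling `loss_rate` with the addresses of the other nodes in the same VLAN, and again with an address outside the subnet, would show whether loss is limited to intra-subnet traffic as it was in our case. The `probe` parameter exists so the logic can be exercised without touching the network.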

 

We got our network team involved, and after some analysis we identified one Windows Server 2008 R2 server that was flooding the network. We disconnected this server and the issue was resolved.
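I do not have the details of how the network team found the offending server. As an illustration only, spotting a flooding host usually comes down to totalling traffic per source over a capture window and looking for an outlier. The sketch below assumes you already have (source, byte count) records exported from a packet capture or switch counters. The record format, the function name, and the 50% threshold are all hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_talkers(
    records: Iterable[Tuple[str, int]],
    share_threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Return (source, share of total bytes) for sources at or above the threshold.

    Results are ordered from the largest share to the smallest.
    """
    totals: Counter = Counter()
    for source, size in records:
        # Accumulate bytes seen from each source address.
        totals[source] += size

    grand_total = sum(totals.values())
    if grand_total == 0:
        return []

    # most_common() yields sources sorted by total bytes, largest first.
    shares = [(src, count / grand_total) for src, count in totals.most_common()]
    return [(src, share) for src, share in shares if share >= share_threshold]
```

A single source responsible for most of the bytes on a VLAN is a strong hint, though not proof, that it is the one flooding the segment.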

Lesson learned: issues elsewhere in the ecosystem can also lead to failures. The root cause had nothing to do with the Hyper-V servers or the VMs, yet they were badly impacted.

 

 
