Hyper-V Clusters crashing… Due to network disconnects

I faced a strange issue with Hyper-V servers: my Windows Server 2012 cluster started crashing randomly. I had a similar experience earlier, but that one followed a pattern. Every Friday between 1:30 PM and 2 PM, servers would crash. After digging, we realized that the antivirus client (SEP) had a scheduled scan running, and a few minutes into the scan the system would crash with a BSOD. That issue did not affect all nodes, though, and the cluster mostly stayed up.

In this case, the majority of the servers were crashing in sequence. I started analyzing the logs and memory dumps.

BSOD - Probably caused by : VMMDHCPSvr.sys ( VMMDHCPSvr+80e6 )


The second dump looked like this:

 

Memory Dump - Probably caused by : ntkrnlmp.exe ( nt!WheaReportHwError+249 )


The event logs contained events saying that the network was getting partitioned. This gave us a clue, and we started looking at connectivity. Although the servers were connected to the same switch, they were frequently losing pings within the same VLAN. The issue was specific to communication within the subnet; external connectivity was mostly fine.
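To confirm a pattern like this, it helps to compare packet loss for peers inside the cluster subnet with loss for peers outside it. Below is a minimal Python sketch of that comparison. The subnet, the IP addresses and the probe results are all made-up sample values, and how the ping results are collected is left out.

```python
import ipaddress
from collections import defaultdict


def loss_by_scope(probes, local_subnet):
    """Return packet-loss percentage for peers inside and outside local_subnet.

    probes: iterable of (peer_ip, reply_received) pairs, for example the
    outcome of repeated pings. local_subnet: CIDR string of the cluster VLAN.
    """
    network = ipaddress.ip_network(local_subnet)
    sent = defaultdict(int)
    lost = defaultdict(int)
    for peer_ip, reply_received in probes:
        # Classify each probe by whether the peer sits in the local subnet.
        scope = "internal" if ipaddress.ip_address(peer_ip) in network else "external"
        sent[scope] += 1
        if not reply_received:
            lost[scope] += 1
    return {scope: 100.0 * lost[scope] / sent[scope] for scope in sent}


# Hypothetical sample: two peers on the cluster VLAN and one outside it.
sample = [
    ("10.0.0.11", True), ("10.0.0.11", False),
    ("10.0.0.12", False), ("10.0.0.12", False),
    ("192.168.5.1", True), ("192.168.5.1", True),
]
report = loss_by_scope(sample, "10.0.0.0/24")
print(report)
```

High internal loss together with low external loss points at something on the local segment, which is what we saw here.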

 

We got our network team involved, and after some analysis we identified one Windows Server 2008 R2 server that was flooding the network. We disconnected that server and the issue was resolved.

Lesson learned – issues elsewhere in the ecosystem can also lead to failures. The cause had nothing to do with the Hyper-V servers or the VMs, yet they were badly impacted.

 

 

System Center 2012 R2 – Update Rollup 2 released

http://support.microsoft.com/kb/2932881/en-us

For SCVMM, a bunch of bugs are getting fixed… 😀

Update Rollup 2 for System Center 2012 R2 Virtual Machine Manager resolves the following issues:

  • Files cannot be found on a network-attached storage device that uses NetApp storage and Server Message Block version 3 (SMBv3) protocol.
  • After an account’s password is changed, the Scale-Out File Server provider goes into a “not responding” state.
  • VMM wipes the System Access Control List (SACL) configurations on ports.
  • When an uplink profile’s name contains the “or” string, VMM console cannot show its details when it adds the uplink profile to a logical switch.
  • Dynamic disks cannot be used as pass-through disks.
  • When you create a standard virtual switch on a host without selecting the Allow management operating system to share this network adapter check box, the virtual switch is created. However, the virtual switch still binds to the host unexpectedly.
  • Network address translation (NAT) uses port number 49152 or a larger number, which Windows prohibits from being used by NAT.
  • When a virtual machine IP address type is static Out-of-Band and there is no IP address pool that is associated with the virtual machine network or the logical network, migration is complete with multiple errors.
  • If a highly available virtual machine is migrated from one node to another node by using Failover Cluster Manager, you receive an error message that indicates the absence of VHD files.
  • Some performance issues in VMM.
  • Connection with Operations Manager fails in a non-English environment.
  • After you upgrade VMM from System Center 2012 Service Pack 1 (SP1) to System Center 2012 R2, VLAN settings disappear and cannot be saved in the virtual machine.
  • Virtualization gateway could not be discovered by management packs.
  • The New-SCPhysicalComputerProfile Windows PowerShell cmdlet fails with a NullReferenceException exception.
  • Assume that you put a host into maintenance mode. When any highly available virtual machines cannot evacuate successfully, they are put into a saved state instead of into task failures.
  • Assume that you have a computer that is running VMware ESX Server to host virtual machines. Additionally, assume that cumulative progress for many applications, scripts or actions (that is reported by guest agent) becomes large. In this situation, all deployments time out, as the guest agent cannot communicate to the server successfully.
  • You cannot deploy a service template to VMware ESX 5.1 hosts. Additionally, you receive an error 22042 and a TimeoutWhileWaitingForVmToBootException (609) exception.
  • When you migrate a virtual machine together with Out-of-Band checkpoints, database corruption occurs.
  • Pass-through disks are not updated correctly in the database after they are refreshed from an Out-of-Band migration.
  • Assume that hosts establish a Common Information Model (CIM) session that can send policies to the host after the Hyper-V Network Virtualization initialization. Additionally, assume that a policy-sending activity is initiated before the CIM session creation is completed. In this situation, policies are stuck in the sending queue, and the host does not receive any Hyper-V Network Virtualization policies.
  • Communication is broken in Hyper-V Network Virtualization.
  • When you use a same user name for Run As Accounts in guest customization, a conflict occurs.
  • You cannot use a parameter together with .sql scripts for a Run As Account during a service installation.
  • You deploy a template that uses empty classification to a cloud. However, the template does not respect storage classifications that are set on the cloud.
  • When Windows Server fast file copy cannot deploy files successfully, the Background Intelligent Transfer Service (BITS) fallback task continues using the fast file copy credentials.

 

 

http://support.microsoft.com/kb/2932926/en-us

 

 

Why Hyper-V ?

This is one of the hot topics I have been seeing for the last few years. Though I am not an expert on VMware, I thought of writing down my viewpoint on why you should consider Hyper-V now. I say it again – this post is only about why you should consider Hyper-V, not about why you should not consider VMware.

VMware ESX has been in the virtualization world since 2001 and is considered the popular virtualization platform. As other products entered the virtualization space, by 2008 consumers had a few platforms to choose between. VMware ESX was in the spotlight until Windows Server 2012 was released. And now, with Windows Server 2012 R2, I feel that Microsoft will take a much larger market share within the next two years.

My viewpoint.

Going through the Gartner reports on server virtualization for the last few years, it's evident that Microsoft has invested a lot and proved that Hyper-V is among the best. I hope that in the upcoming Gartner report Microsoft will be closer to VMware, and maybe next year they will share the same position. :) In the meantime, VMware, despite being at the top, has been stagnant in these years. If you are a VMware fan, I know you strongly disagree.

 

 

Gartners Magic Quadrant


I did an evaluation of VMware a few months back, and comparing it with Hyper-V I don't see much difference as a whole. We wanted to see how quickly we could deploy a private cloud through VMware, since it took me two months to set up SCVMM and maybe a few weeks to set up a private cloud. I stayed neutral and passive through the whole deployment and configuration. For me, VMware is not so simple either, and it needs a good understanding to keep things running smoothly.

I built a private cloud with just Hyper-V and SCVMM 2012 R2. The first time, I struggled a lot, as I needed exact information on how to implement it as per my requirements. It took me approximately two months to deploy it completely and start using it. Once I had done it, I felt it was simple if you know the components inside VMM. Now I can confidently say that I can build a private cloud in two days. I also had a look at vCAC with ESX. That was not a magical configuration with a few clicks either; it had similar configuration steps to SCVMM.

With my SCVMM 2012 R2, I could provision a VM in less than 3 minutes. I am yet to see a better time from VMware on a similar payload. As for roles, I used only SCVMM to build the private cloud, and cloud admins can use the SCVMM thick console for managing and self-provisioning their VMs. To achieve a similar setup with VMware, I see multiple components involved. I am not saying that it's bad. My point is that SCVMM could do this without anything else.

If you get a chance to do an evaluation, the key thing to identify is your requirements. There are thousands of features and add-ons that you may hear about from the people around you. It's your judgement whether you really need those features or not. Some may be good to have and some may be must-haves. A few others you may not want at all! So my take is: evaluate what you need to have and see what is good to have, rather than allowing the people around you to showcase what they have ;).

In every VMware session I attended, I heard the same words again and again: business critical, uptime, performance, IOPS, memory ballooning…! After a few such sessions, I even started feeling that Microsoft was not the right candidate for a business-critical VM with guaranteed uptime and performance 😀. Then I realized that this was not logical.

The reason is simple.

IT is not simple.

IT admins need knowledge of the product. The more knowledge you have, the more confident you are in managing it.

Only an administrator who knows the product can really say whether it is good or bad. I don't know VMware, so I am not going to comment on VMware. 😀

This applies to any product, including VMware and Hyper-V. It's the subject knowledge that makes you feel a product is simple or complex.

I am not going to do a comparison of VMware and Hyper-V. For a detailed comparison among different virtualization platforms, refer to http://www.virtualizationmatrix.com

Why will Hyper-V gain more market share?

The growth of Hyper-V in the last few years has been rapid. The figure I have is from the TechNet blog.

This is specific to the virtualization market in Latin America, and the growth is evident.

 

We are yet to see similar reports for other regions. However, the virtualization market is expanding, and in such an expanding market Hyper-V is moving up, which is really impressive.

Conclusion

If you plan to evaluate a virtualization platform, spend some time on it. Understand what you want. Look at what these products can offer. And finally, learn the product while you deploy it. Manage with confidence!

 

Good luck !

 

Note – This is my personal viewpoint. You may agree or disagree, or even close the browser 😀 if you feel so.

 

Hyper-V Dashboard – V3 Released

As mentioned in my last post, Hyper-V Dashboard V3 is ready and available in the TechNet Gallery.

With the new version, this becomes a full Hyper-V dashboard, as the report now provides details on VMs, physical hosts and storage.

New features in V3:

  • Added a host report to capture the processor/memory utilization of Hyper-V hosts
  • Added color coding to identify servers that need attention
  • New look
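The idea behind the color coding can be sketched as a simple threshold check. The Python snippet below is an illustration only and is not taken from the dashboard script. The function name, the 70/90 percent thresholds and the host names are all assumptions.

```python
def health_color(cpu_percent, memory_percent, warn=70, critical=90):
    """Map host utilization to a traffic-light color for a report cell.

    The warn/critical thresholds are illustrative defaults, not values
    from the dashboard script.
    """
    # The worse of the two metrics decides the color.
    worst = max(cpu_percent, memory_percent)
    if worst >= critical:
        return "red"
    if worst >= warn:
        return "yellow"
    return "green"


# Hypothetical hosts with (CPU %, memory %) utilization.
hosts = {"HV01": (35, 62), "HV02": (48, 81), "HV03": (93, 70)}
for name, (cpu, mem) in hosts.items():
    print(name, health_color(cpu, mem))
```

Taking the worse of the two metrics keeps the report simple: one color per host, driven by whichever resource is closest to its limit.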

Here are a few screenshots from the report.

Hope you enjoyed the script !

Hyper-V VM Dashboard

Hyper-V Host Dashboard


Hyper-V Storage Dashboard


Hyper-V Dashboard – V3 … Coming soon !

This was a busy month for me due to various factors. Though I couldn't post much this month, I kept working on enhancing the Hyper-V Dashboard. V3 is getting ready and I am doing the finishing work.

The new features with V3 are:

  • A new section listing host utilization in terms of memory and processor. The existing script already has two sections, one for VMs and one for CSV storage. One more table will be added to list host utilization.
  • Added the CSV volume details alongside the VM details, so it's easy to see which CSV volume a specific VM resides on.
  • Removed snapshot details. Instead, only the first snapshot date is shown. Logically, if a snapshot exists, a first snapshot date must also exist.
  • Now trying to add some color coding to identify the overall health of a VM, host or CSV volume.

One of the plans I had previously was to include an option to calculate the white space in a VHDX. This would help in deciding whether it's worth shrinking the disk. I couldn't find a straightforward way, so I took a long route using multiple WMI queries to estimate how much space in the disk is actually used. However, I see many cases where the calculation may go wrong, so I am looking for a better option.

I really feel that Microsoft should provide a cmdlet that gives some information on the white space in a VHDX disk.

Keep watching this page for the release of Hyper-V Dashboard V3.
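The arithmetic behind such an estimate is roughly "size of the VHDX file on the host minus space actually used inside the guest". Here is a minimal Python sketch of that calculation only. It does not reproduce the WMI queries, and the function name and sample sizes are hypothetical. It also ignores VHDX metadata, unpartitioned space and differencing disks, which is part of why a real estimate can go wrong.

```python
GIB = 1024 ** 3


def estimate_white_space(vhdx_file_bytes, guest_volumes):
    """Estimate reclaimable bytes in a dynamically expanding VHDX.

    vhdx_file_bytes: current size of the VHDX file on the host.
    guest_volumes: list of (capacity_bytes, free_bytes) for each volume
    stored on that disk, as reported from inside the guest.
    """
    used = sum(capacity - free for capacity, free in guest_volumes)
    # The file cannot shrink below what the guest actually uses.
    return max(vhdx_file_bytes - used, 0)


# Hypothetical disk: 80 GiB file, one 100 GiB volume with 55 GiB free.
white = estimate_white_space(80 * GIB, [(100 * GIB, 55 * GIB)])
print(white / GIB)
```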

Cheers !

Shaba

 

HP BL460c Gen8/ Emulex LOM/ VM Network Disconnect

You may already be aware of the issue with HP BL460c Gen8 servers using FlexFabric.

http://www.hyper-v.nu/archives/mvaneijk/2013/11/vnics-and-vms-loose-connectivity-at-random-on-windows-server-2012-r2/

I see that many people are impacted by this issue. However, it's still not clear where the problem lies. The only details I have seen are on Marc van Eijk's blog.

As per my observation, the issue happens only for VMs that generate network load. At a random point, the VM gets disconnected from the network even though the vNIC still shows as connected. At this stage, you have the options below.

  • Disconnect the vNIC from the virtual switch and then reconnect it
  • Change the VLAN ID and then revert to the original VLAN ID
  • Reboot the VM
  • Fail over the VM to another node (least preferred)

Failover sometimes causes the VM to get stuck in the saving state. If that happens, the migration never completes and it can be hard to get out of the situation.

Get Stats Failed - Event 76


In Event Viewer, look for Event ID 76. The workaround is to start moving one more VM from the same server, which clears the deadlock for the problematic VM and lets it move successfully to the next node. This has worked for me every time.

 

Building Private Cloud – Part 6

Now we are almost ready with the private cloud. We have the Hyper-V cluster ready, with the fabric configured with VM networks for each subsidiary group. We have the VM templates, hardware profiles and guest OS profiles ready. We have a private cloud for each subsidiary group.

In this part, we will delegate access to the IT admins of each subsidiary using user roles.

Create a security group for each cloud.

From SCVMM ->  Settings -> User Roles

Create User Roles

Creating User Role - Name


On the Profile page, select the appropriate role profile. We have four predefined roles available.

Fabric Administrator (Delegated Administrator): Members of the Delegated Administrator user role can perform all administrative tasks within their assigned host groups, clouds, and library servers, except for adding XenServer and adding WSUS servers. Delegated Administrators cannot modify VMM settings, and cannot add or remove members of the Administrators user role.

 

Read-Only Administrator: Read-only administrators can view properties, status, and job status of objects within their assigned host groups, clouds, and library servers, but they cannot modify the objects. Also, the read-only administrator can view Run As accounts that administrators or delegated administrators have specified for that read-only administrator user role.

 

Tenant Administrator:  Members of the Tenant Administrator user role can manage self-service users and VM networks. Tenant administrators can create, deploy, and manage their own virtual machines and services by using the VMM console or a web portal. Tenant administrators can also specify which tasks the self-service users can perform on their virtual machines and services. Tenant administrators can place quotas on computing resources and virtual machines.

 

Application Administrator (Self-Service User): Members of the Self-Service User role can create, deploy, and manage their own virtual machines and services by using the VMM console or a Web portal.

 

For our requirement, I prefer the “Tenant Administrator” role.

Creating User Role - Setting the delegated Role


On the Members page, add the user or security group that should get access through this role. I am using a security group for this purpose; members of this group will get access through the role.

Creating User Role - Defining Security Group


On the Scope page, we define the scope this user role gets access to. The scope is defined through the cloud.

Creating User Role - Defining Cloud


On the next page, we can define the quota. On the cloud, we have defined a quota, which is the maximum the cloud can have. However, that doesn't mean a single user role should use all of the resources. We can have multiple user roles with different quotas, but each quota stays within the total quota allocated to the cloud.
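The relationship between the cloud quota and the user role quotas can be sketched as a simple check. This Python snippet is an illustration only and does not show how VMM enforces quotas. The resource names, role names and numbers are hypothetical.

```python
def quota_violations(cloud_capacity, role_quotas):
    """Return (role, resource) pairs where a role quota exceeds the cloud quota."""
    violations = []
    for role, quota in role_quotas.items():
        for resource, limit in quota.items():
            if limit > cloud_capacity[resource]:
                violations.append((role, resource))
    return violations


# Hypothetical cloud quota and two user roles scoped to that cloud.
cloud = {"vcpus": 64, "memory_gb": 256, "storage_gb": 4096}
roles = {
    "S1 IT Admins": {"vcpus": 32, "memory_gb": 128, "storage_gb": 2048},
    "S1 Developers": {"vcpus": 16, "memory_gb": 512, "storage_gb": 1024},
}
print(quota_violations(cloud, roles))
```

In this sample, only the memory quota of the second role is larger than what the cloud itself allows, so that is the one pair reported.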

 

Creating User Role - Allocating compute resources


On the next page, we allocate the VM networks that will be used for VM deployment. As we have dedicated VM networks for each group, select the appropriate network for S1 IT.

Creating User Role - Allocating VM Networks


On the Resources page, add the VM templates, hardware profiles and OS profiles that will be allocated to this group.

 

Creating User Role - Adding Profiles and templates


On the Permissions page, we can adjust the available permissions to some extent. However, this is not RBAC-based delegation.

 

Creating User Role - Delegating the tasks


On the next page, select the Run As accounts that will be used with the VM templates or OS profiles.

Creating User Role - RunAs Accounts


On the next screen, review the changes and proceed with the User Role creation.

 

 

 

Power Saved in last 30 days – Incorrect Reporting

I configured Power Optimization and Dynamic Optimization successfully a few months back. However, the report in the SCVMM 2012 R2 console always says that the power saved in the last 30 days is 0 hours. I am sure that the servers are powering down as per the schedule and coming back online at the end of it, so the report seems to be incorrect.
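As a sanity check on a figure like this, the expected number can be estimated from the times each host was powered off and back on. The Python sketch below only shows that arithmetic. It is not how SCVMM calculates the value, and the dates and the way the power-off and power-on times are collected are assumptions.

```python
from datetime import datetime, timedelta


def hours_powered_off(intervals, now, window_days=30):
    """Sum the hours hosts spent powered off within the last window_days.

    intervals: list of (power_off, power_on) datetimes, one per host shutdown.
    """
    window_start = now - timedelta(days=window_days)
    total = timedelta()
    for power_off, power_on in intervals:
        # Clip each interval to the reporting window.
        start = max(power_off, window_start)
        end = min(power_on, now)
        if end > start:
            total += end - start
    return total.total_seconds() / 3600


# Hypothetical power-off/power-on times for one host.
now = datetime(2014, 5, 1, 12, 0)
log = [
    (datetime(2014, 4, 29, 22, 0), datetime(2014, 4, 30, 6, 0)),  # inside the window
    (datetime(2014, 3, 1, 22, 0), datetime(2014, 3, 2, 6, 0)),    # outside the window
]
print(hours_powered_off(log, now))
```

If a calculation like this gives a non-zero number while the console shows 0 hours, the report is the likely culprit rather than the schedule.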

 

Power Saved in last 30 days - Overview

I opened a case with Microsoft support and have now got confirmation that it's a bug.

It should be fixed in the upcoming Update Rollup 3.