SDx

Dynamic Enforcement for Network Selection in vRA 7

December 22, 2016

I’m working on the design and deployment of a large enterprise cloud project where multi-tenancy is vital to the project’s success. In a private cloud project, a common shared infrastructure is the typical way to make the business case feasible. Physical segregation is now usually driven by security requirements rather than by who purchased the hardware. Multi-tenancy segregation moves up to the logical level, making the infrastructure a shared commodity.

From a network standpoint, logical segregation is supported by protocols such as VLAN, PVLAN, GRE and VXLAN. Every time a virtual machine is provisioned through vRealize Automation (vRA), it needs network access to communicate with other workloads.

When vSphere is the endpoint in vRA, the NIC(s) of the virtual machine are connected to one of the port groups available within the reservation where the machine is being provisioned. The first level of logical segregation in vRA is the business group, which can have one or more reservations (the second level). A user who is not a member of the business group linked to a reservation cannot provision any virtual machine on it.

Source: VMware.com

The reservation defines which network(s) users can connect their virtual machines to, but these networks are not populated dynamically in vRA. The easy approach is to create a custom property called “VirtualMachine.NetworkN.Name”, where N is the index of the virtual machine NIC (starting at 0). When you create this custom property, you can define a static list, or a dynamic one backed by any script action available in the vRealize Orchestrator (vRO) platform that vRA consumes.
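As a small illustration of the naming rule, the sketch below builds the property name for a given NIC index. It is plain JavaScript, and the helper name networkPropertyName is hypothetical, not part of vRA or vRO.

```javascript
// Hypothetical helper: build the custom property name for a NIC.
// The index is 0-based, as described in the paragraph above.
function networkPropertyName(nicIndex) {
  if (!Number.isInteger(nicIndex) || nicIndex < 0) {
    throw new RangeError("NIC index must be a non-negative integer");
  }
  return "VirtualMachine.Network" + nicIndex + ".Name";
}

// Property names for a machine with three NICs, one per line.
var names = [0, 1, 2].map(networkPropertyName);
console.log(names.join("\n"));
```

For the first NIC this yields VirtualMachine.Network0.Name, which is the property used in the rest of this post.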

The main challenge of a static list is maintaining the network list shown to the user; the list is also shared by all users. From a security standpoint this is not an issue, since the network must be enabled in the user’s reservation. If it is not, the user gets an error stating that the virtual machine cannot be provisioned because the reservation is not entitled to consume the network. Although this does not involve a security breach, exposing the whole network list is not ideal because network names can contain sensitive information. For this reason, building the network list on the fly, based on the user’s business group and the reservations linked to it, is the best approach I’ve found so far. In a multi-tenant environment, security requirements are always the most important piece of the design.

To achieve that, we are going to leverage a built-in vRO action called “getApplicableNetworks”, available from version 7.1. We will tweak this action a bit, since out of the box it shows all the networks the user is entitled to, regardless of which business group the virtual machine is requested for. To filter the networks by business group, we will create a custom property in the business group whose value is the business group name. This custom property is needed because the business group is not exposed during the request wizard, so vRO cannot get the value from the ASD properties. To close this gap, the vRO action will take a string input populated with the value of the business group custom property.

vRO Configuration

Let’s start with the configuration of our dynamically enforced network list:

  • Copy the built-in vRO action to a new location. This ensures that future releases of the vRA plug-in do not break anything. The copy is the action we will modify to add the filter that retrieves only the reservation(s) of the business group the requester is entitled to.

Copy the vRO script action “getApplicableNetworks”

In vRealize Orchestrator version 7.2, VMware changed the action code: the dependency on the action “getReservationsForUser” used in 7.1 was replaced by “getReservationsForUserAndComponent”. As a result, “getApplicableNetworks” no longer works in 7.2, because the script was not updated to pass the inputs that “getReservationsForUserAndComponent” requires.

If you are using vRO 7.1, you can skip the duplicate and downgrade steps below.

  • Duplicate the current “getReservationsForUserAndComponent” action

Duplicate the “getReservationsForUserAndComponent” action

  • Downgrade the duplicated action to version 1.0. The action name changes to “getReservationsForUser”.

Downgrade the copied action to version 1.0 (getReservationsForUser)

  • Edit the copied “getApplicableNetworks” action and add a string input called “subtenant”.

Add a string input

  • Go to the script tab and update the 4th line with the folder and name of the action you duplicated and downgraded above.

var reservations = System.getModule("com.joseluisgomez.vra.reservations").getReservationsForUser(user, tenant, host);

  • Between the 4th line (var reservations) and the 5th line (var applicableNetworks), add the following line to look up the business group whose name is passed in from vRA through the custom property value.

var subtenants = vCACCAFEEntitiesFinder.findSubtenants(host, subtenant);

Code updated (lines 4th and 5th)

  • The last step with the action is to add a conditional. For each gathered reservation, it checks whether the reservation’s business group (subtenant) matches the one passed in from vRA. Add the “if” line right after the “for each” line, and indent the existing code one more level so it stays aligned. An additional closing curly bracket is required to close the conditional. The new code is the “if” line and its closing bracket.

for each(var res in reservations) {
 // New: keep only reservations that belong to the requested business group
 if(res.getSubTenantId() == subtenants[0].id){
  var extensionData = res.getExtensionData();
  if(extensionData) {
   var networks = extensionData.get("reservationNetworks");
   if(networks) {
    for each(var network in networks.getValue()) {
     var path = network.getValue().get("networkPath");
     applicableNetworks.put(path.label, path.label);
    }
   }
  }
 } // New: closes the business group conditional
}
return applicableNetworks;

Conditional to add
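To make the effect of the conditional easier to follow outside vRO, here is a minimal sketch in plain JavaScript that can be run with Node.js. It is not the vRO action itself: the reservation and business group objects are stand-ins shaped only as far as the script uses them, the ids (bg-001, bg-002) are made up, the name field on the business group is an assumption, and the helper names (mockReservation, pickSubtenant, applicableNetworksFor) are hypothetical. It also shows one possible guard for the case where the business group lookup returns nothing, which the action above does not handle.

```javascript
// Stand-ins for the vRA plug-in objects, shaped only as far as the
// action uses them. Ids and the `name` field are assumptions.
function wrap(value) {
  return { getValue: function () { return value; } };
}

function mockReservation(subTenantId, networkLabels) {
  var networks = networkLabels.map(function (label) {
    return wrap(new Map([["networkPath", { label: label }]]));
  });
  var extensionData = new Map([["reservationNetworks", wrap(networks)]]);
  return {
    getSubTenantId: function () { return subTenantId; },
    getExtensionData: function () { return extensionData; }
  };
}

// Hypothetical guard: choose the business group by exact name and fail
// with a clear message when none matches, instead of reading index 0.
function pickSubtenant(candidates, subtenantName) {
  var matches = (candidates || []).filter(function (s) {
    return s.name === subtenantName;
  });
  if (matches.length === 0) {
    throw new Error("No business group named '" + subtenantName + "'");
  }
  return matches[0];
}

// Same filter as the modified action: keep only the networks of
// reservations that belong to the selected business group.
function applicableNetworksFor(reservations, subtenantId) {
  var applicable = {};
  reservations.forEach(function (res) {
    if (res.getSubTenantId() !== subtenantId) { return; }
    var extensionData = res.getExtensionData();
    if (!extensionData) { return; }
    var networks = extensionData.get("reservationNetworks");
    if (!networks) { return; }
    networks.getValue().forEach(function (network) {
      var path = network.getValue().get("networkPath");
      applicable[path.label] = path.label;
    });
  });
  return applicable;
}

var subtenants = [
  { id: "bg-001", name: "Tenant1-Global" },
  { id: "bg-002", name: "CORP_PROD_APP1_01" }
];
var reservations = [
  mockReservation("bg-001", ["Main", "5008-Photon-Hosts"]),
  mockReservation("bg-002", ["5005-NTXCE-Mgmt", "5004-NTXCE-VMs"])
];

var selected = pickSubtenant(subtenants, "Tenant1-Global");
console.log(Object.keys(applicableNetworksFor(reservations, selected.id)));
```

With this sample data the sketch lists only Main and 5008-Photon-Hosts for Tenant1-Global; the networks of the other business group are filtered out.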

We are done with vRO. Now it’s time to consume this action from vRA and our blueprint.

vRA Configuration

  • Create a custom property for the business group with your preferred name. In my case, I’ve used the following one:
    • Property name. NGDC.Software.VMware.vRA.Subtenant.Name
    • Property value. The business group name (e.g. Tenant_Environment_Service_Index)

Custom Property with Business Group’s name as value

  • Create a custom property definition called “VirtualMachine.Network0.Name” with the following settings:
    • Name: VirtualMachine.Network0.Name
    • Label: Select a network
    • Visibility: All tenants
    • Display order: 1
    • Data type: string
    • Required: yes
    • Display as: Dropdown
    • Values: External values
    • Script action: Click the change button and select the script action called getApplicableNetworksBySubtenant, the modified copy of getApplicableNetworks created in the previous steps.
    • Input parameters: Check the bind checkbox and type the business group custom property “NGDC.Software.VMware.vRA.Subtenant.Name” as the value. Note: Business group custom properties are not discovered automatically when you click the dropdown list. You must type the custom property name.

  • The last step is to add the custom property definition created above to the blueprint. The blueprint doesn’t require any network object, just the vSphere virtual machine object with a virtual NIC added to it. Virtual NIC 0 must have the following custom property:
    • Name: VirtualMachine.Network0.Name
    • Value: Empty
    • Encrypted: No
    • Overridable: Yes
    • Show in Request: Yes

Blueprint custom property

Result

The best way to test our custom property definition is to create two business groups with one reservation each, make the same user the group manager of both, and create two service catalogue entitlements, one for each business group.

The next screenshot shows the first business group (Tenant1-Global). As you can see, this business group has two networks: a static dvPortGroup (Main) and an NSX VXLAN dvPortGroup (5008-Photon-Hosts).

Tenant1-Global business group dynamic enforced network list

The following screenshot shows the networks for the second business group (CORP_PROD_APP1_01). As you can see, this business group also has two networks, but different ones: two NSX VXLAN dvPortGroups (5005-NTXCE-Mgmt and 5004-NTXCE-VMs).

CORP_PROD_APP1_01 business group dynamic enforced network list

Conclusion

I hope you have found this useful and that it helps you with new and current deployments. This is the best approach I have found so far for supporting multi-tenancy based on business groups. It also works if multi-tenancy is implemented using the tenant functionality in vRA.

If you liked it, don’t hesitate to share it with your contacts.

ESXi 6.0.x host doesn’t register Cisco ACI’s ARP responses with Mellanox 10/40 Gb NICs and nmlx4_en driver loaded

August 8, 2016

I’m currently working on a project designing and delivering a private cloud platform based on VMware vRealize and Cisco ACI as the SDN solution.

For almost two days we weren’t able to ping from the ESXi host (Mellanox) to its default gateway, which is provided by a subnet within the Cisco ACI Bridge Domain (BD). However, a physical Windows box (Broadcom) in the same EPG as the ESXi hosts was able to ping the same default gateway. This behavior was odd, since ping between members of the same EPG worked fine, both between ESXi hosts and with the physical Windows machine.

ACI

The first thought that comes to mind is that you’re missing some setting in ACI. Why? Because with SDN solutions the philosophy and logic change radically. Now you must know about multi-tenancy, bridge domains, endpoint groups, contracts and so on, so it’s really easy to miss something during the configuration.

Environment

  • ESXi host.
    • HP DL360 Gen9
    • Mellanox 10/40 Gb – MT27520 Family (affected with ARP bug)
      • NIC Driver info:
        • Driver: nmlx4_en
        • Firmware Version: 2.35.5100
        • Version: 3.1.0.0
  • Cisco ACI version 2.0(1n)
  • VMware ESXi 6.0.x
    • Update 1
    • Update 2
    • VMware and HPE OEM ISOs tested

Symptom

  • ESXi host doesn’t reach its default gateway (ACI BD IP).
  • Any traffic routed through the gateway doesn’t reach its destination.
  • ACI replies to the ARP requests from ESXi, but the host doesn’t register the replies.

tcpdump-uw on ESXi didn’t show the ACI responses. When we ran Wireshark on the physical machine, we could see ACI replying to the ARP requests from ESXi.


Resolution

After installing the latest version of the Mellanox driver available on the VMware website, the ESXi host began to see the ARP responses. These responses were registered, and communication from the ESXi hosts to the default gateway and other networks worked properly.

Troubleshooting Commands

The following commands were used to perform the troubleshooting from the ESXi host side.

# Display physical network adapter information (counters, ring and driver)
/usr/lib/vmware/vm-support/bin/nicinfo.sh

# Display ARP table
esxcli network ip neighbor list

# Display VMkernel network interfaces
esxcli network ip interface list

# Display the virtual switches
esxcli network vswitch standard list

# Verify port connection (replace <IP> and <port> with the target address and port)
nc -z <IP> <port>

# Capture traffic
tcpdump-uw -vv