ESXi 6.0.x host doesn’t register Cisco ACI’s ARP responses with Mellanox 10/40 Gb NICs and the nmlx4_en driver loaded

August 8, 2016

I’m currently working on a project designing and delivering a private cloud platform based on VMware vRealize, with Cisco ACI as the SDN solution.

For almost two days we were unable to ping from the ESXi hosts (Mellanox NICs) to their default gateway, provided by a subnet within the Cisco ACI Bridge Domain (BD). However, a physical Windows box (Broadcom NIC) in the same EPG as the ESXi hosts was able to ping that same default gateway. The behavior was odd, since pings between members of the same EPG worked fine, both between ESXi hosts and between the ESXi hosts and the physical Windows machine.


The first thought that comes to mind is that you’re missing some setting in ACI. Why? Because with SDN solutions the philosophy and logic change radically: now you must understand multi-tenancy, bridge domains, endpoint groups, contracts and so on, so it’s really easy to miss something during the configuration.

Environment

  • ESXi host:
    • HP DL360 Gen9
    • Mellanox 10/40 Gb – MT27520 Family (affected by the ARP issue)
      • NIC Driver info:
        • Driver: nmlx4_en
        • Firmware Version: 2.35.5100
        • Version: 3.1.0.0
  • Cisco ACI version 2.0(1n)
  • VMware ESXi 6.0.x
    • Update 1
    • Update 2
    • VMware and HPE OEM ISOs tested

Symptom

  • ESXi host doesn’t reach its default gateway (ACI BD IP).
  • Any traffic routed through the gateway doesn’t reach its destination.
  • ACI replies to the ARP requests from the ESXi host, but the host doesn’t register the replies.

tcpdump-uw on the ESXi host didn’t show the ACI responses. When we ran Wireshark on the physical Windows machine, we could see ACI replying to the ARP requests coming from the ESXi host.
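For reference, a capture filtered on ARP makes the missing replies obvious on the ESXi side. This is a minimal sketch; vmk0 is an assumption and may not be the VMkernel interface carrying management traffic in your environment.

# Capture only ARP frames on the management VMkernel interface (vmk0 assumed)
tcpdump-uw -i vmk0 -n -e arp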


Resolution

After installing the latest version of the Mellanox driver available on the VMware website, the ESXi host began to see the ARP responses. The responses were registered in the ARP table, and communication from the ESXi hosts to the default gateway and other networks worked properly.
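As a reference, the driver check and update can be done from the command line. This is a minimal sketch; vmnic0 and the offline bundle path and file name are placeholders, use the actual Mellanox bundle downloaded from the VMware website.

# Check the driver, driver version and firmware currently bound to the NIC (vmnic0 assumed)
esxcli network nic get -n vmnic0

# Update the driver from the downloaded offline bundle (path and file name are placeholders)
esxcli software vib update -d /tmp/<mellanox-nmlx4_en-offline-bundle>.zip

# Reboot the host so the new driver is loaded
reboot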

Troubleshooting Commands

The following commands were used to troubleshoot from the ESXi host side.

# Display physical network adapter information (counters, ring and driver)
/usr/lib/vmware/vm-support/bin/nicinfo.sh

# Display ARP table
esxcli network ip neighbor list

# Display VMkernel network interfaces
esxcli network ip interface list

# Display the virtual switches
esxcli network vswitch standard list

# Verify a TCP port is reachable (replace IP and Port with the target values)
nc -z IP Port

# Capture traffic
tcpdump-uw -vv

Nutanix .NEXT 2016 Conference Highlights

July 4, 2016

My take on the Nutanix .NEXT 2016 Conference comes a bit late, but I didn’t have the chance until now to look at what Nutanix announced at the conference.

Let’s analyze what I consider the most exciting announced features coming over the next months. You can see the full list of announcements at Nutanix .NEXT 2016 Announcements: Innovation is Just a Click Away.

A Single Platform for All Workloads


Source: Nutanix.com – Single Nutanix Fabric

Nutanix takes another step forward with its original message, #NoSAN. Many workloads still run on physical servers because their resource requirements could jeopardize the performance of other virtual machines within the hyper-converged infrastructure. Those physical workloads require a SAN array, but until now Nutanix didn’t support block storage functionality out of the box. Even deploying VSA software on top of Nutanix and exposing iSCSI targets was not feasible, since it could cause performance degradation for the virtual workloads running on the platform.

Nowadays, with flash storage prices coming down and emerging technologies like NVMe increasingly adopted by vendors, it starts to make sense to leverage the unused IOPS and available space of the hyper-converged infrastructure and expose them to physical workloads. For this, Nutanix has developed a feature called Acropolis Block Services (ABS). This capability is planned to be available in the 4.7 release.

Acropolis Block Services

Based on the iSCSI protocol, customers can use it similarly to Amazon Elastic Block Store (EBS). I believe customers will take a look at this feature when they need to replace their SAN arrays. In addition, the distributed storage architecture is a plus from a reliability and performance standpoint. I love how easy it is to scale a distributed storage solution and how quickly customers get more storage and performance, in minutes.
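Since ABS is exposed over standard iSCSI, a physical Linux server would consume it roughly like any other iSCSI target. This is a minimal sketch using open-iscsi; the portal IP is a placeholder, and the Nutanix-side configuration (volume groups, whitelisting the initiator) is not shown.

# Discover the iSCSI targets behind the cluster's data services IP (placeholder address)
iscsiadm -m discovery -t sendtargets -p 10.0.0.100:3260

# Log in to the discovered target; the LUN then appears as a regular block device
iscsiadm -m node --login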


Source: Nutanix.com – Acropolis Block Services

But this alone is not reason enough to replace a SAN array. Many SAN arrays are also NAS, providing file services like NFS and CIFS/SMB. What does Nutanix have to say about this? Nutanix already announced Acropolis File Services (AFS) in March 2016.


Source: Nutanix.com – Acropolis File Services

With both features, the new Acropolis Block Services and the recent Acropolis File Services, partners are now in a position to discuss with customers whether the replacement of their SAN array should be another array, or whether they can instead extend their current hyper-converged platform by deploying Nutanix storage nodes and using both features, ABS + AFS.

In my opinion, Nutanix still has one more step to take to close the storage circle. I miss the capability to provide object storage; it’s funny, because the Nutanix Distributed File System (NDFS) is based on object storage, yet they don’t expose it as a feature. Developers could use the Nutanix platform the way they use Amazon S3. It’s also true that I don’t see many customers consuming object storage on premises.

All Flash on All Platforms

As I mentioned above, the price of flash storage is coming down, and this is an opportunity to include the technology across all platforms (we’re running all-flash home labs, so why not customers?). The only all-flash appliance today is the NX-9000, but new all-flash configurations for all platforms will be available this month.

I wonder whether the all-flash option will also be available for the Nutanix Xpress platform.


Source: Nutanix.com – All-Flash Everywhere

Nutanix Self-Service

Many customers are looking to build their own private cloud using Cloud Management Platform (CMP) software, but for most of them it is enough as a foundation if they can provision virtual machines in an easy manner (IaaS). If a customer uses the CMP just for virtual machine provisioning, they are wasting their investment, as the licensing model is usually CPU-based and the entire platform must be licensed.

Nutanix Self-Service will be a great feature and will help customers reduce TCO, just as they are doing now with the adoption of the Acropolis Hypervisor (AHV).


Source: Nutanix.com – Nutanix Self Service

Operational Tools

Operational teams love Nutanix for its simplicity. In my opinion it’s the Veeam or Rubrik of hyper-convergence. Nutanix is pushing its “Invisible Infrastructure” approach hard, and I must say they’re doing a great job. The “One Click Everything” functionalities are brilliant, making life easy for operators.

I’m stunned by how powerful and friendly the analytics module is. It returns results quickly and in a readable format. At the same time, you can trigger operations from your search, which means you can remediate undesirable situations in a quick and easy manner. Nutanix makes extensive use of machine learning to predict and anticipate operations.

The following functionalities around management and operations were announced:

  • The already mentioned Self-Service.
  • Capacity planning through scenario-based modeling.
  • Network visualization.

Source: Nutanix.com – Nutanix Network Visualization

Acropolis Container Services

What differentiates Nutanix’s container offering from its competitors is the support for stateful applications. The Acropolis Distributed Storage Fabric provides persistent storage for containers through the Docker volume extension. Managing containers as virtual machines is not new; VMware already showed the same functionality almost a year ago.
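As a rough idea of how persistent volumes are consumed through the Docker volume plugin mechanism, here is a minimal sketch; the driver name is a placeholder, not the actual Nutanix plugin, and it assumes the plugin is already installed on the Docker host.

# Create a volume backed by a volume plugin (driver name is a placeholder)
docker volume create --driver example-dsf-driver dbdata

# Mount the volume in a container; the data persists after the container is removed
docker run --rm -v dbdata:/data alpine sh -c 'echo hello > /data/test.txt'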


Source: Nutanix.com – Acropolis Container Services

Conclusion of Nutanix .NEXT 2016 Conference

Exciting times are ahead for Nutanix’s customers, with all the new functionalities coming and more on the roadmap. Nutanix still has plenty of room for improvement, and if they keep going the way they are now, I’m sure they will be in the market for a long time and will provide solutions for customers who don’t want to move all their workloads to the public cloud.

Dell PowerEdge C6100: Upgrading to All-Flash Home Lab – Part 2

June 4, 2016

In the previous article, we walked through the reasons that made me acquire an old Dell PowerEdge C6100 with 3 nodes as my “new” home lab. In this article, you will see the upgrades I made to get an All-Flash Home Lab. These upgrades allow us to run solutions like VMware VSAN or Nutanix Community Edition.

1st Upgrade – USB as ESXi drive

When I bought the C6100, I added 3 x 300 GB SAS 3.5″ 10K RPM drives to my configuration because at the beginning I wasn’t thinking about building an All-Flash VSAN. If you are thinking of buying a C6100 home lab and making it all-flash, I recommend you do NOT buy the SAS drives. You can use that money for the USB memories.

As you know, ESXi can be installed on an SD card or USB memory stick. The minimum space required to install ESXi is really low, so you can just buy an 8 GB stick per node. I bought 3 x SanDisk SDCZ33-008G-B35 8GB for £4.14 each.


SanDisk SDCZ33-008G-B35 8GB for ESXi installation

I followed the recommendation of Vladan Seget (@vladan) in his article about using VMware Workstation as the tool to install ESXi on the USB drives. It worked like a charm.

2nd Upgrade – NVMe + Adapter

Samsung SM951 NVMe 128GB M.2 (Cache for All-Flash Home Lab)

Following the recommendations that William Lam (@lamw) got from his readers and posted on his blog virtuallyGhetto, I bought 2 x Samsung SM951 NVMe 128GB M.2 for the “Caching” tier. I bought just one at the beginning to see whether it worked in a C6100. After checking the performance and reliability, I decided to acquire a second one to build a VMware VSAN ROBO deployment (2 x VSAN nodes + 1 x Witness appliance running on Workstation). To install the VSAN Witness Appliance, I also followed an article by William Lam, “How to deploy and run the VSAN 6.1 Witness Virtual Appliance on VMware Fusion & Workstation?”
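Once the two nodes and the witness were up, a quick way to confirm the cluster state from any of the hosts is the standard esxcli VSAN namespace; a minimal sketch:

# Show VSAN cluster membership and the role of the local node
esxcli vsan cluster get

# List the VMkernel interfaces tagged for VSAN traffic
esxcli vsan network list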

Two models of the Samsung SM951 128GB M.2 are available: the MZVPV128HDGM (the one I got) and the MZHPV128HDGM. The first one is a bit cheaper, but the main difference between the two is that you can’t boot an OS from the first one. If you’re looking to boot from the M.2 drive, you must buy the MZHPV128HDGM model with AHCI support.


SM951 NVMe 128GB (MZVPV128HDGM) for “Caching” Tier

Lycom DT-129 Host Adapter for PCIe-NVMe M.2

The Dell PowerEdge C6100 doesn’t include an M.2 socket, so you need a PCIe adapter to install the NVMe drive. The C6100 has only one PCIe 2.0 slot and one mezzanine slot per node, which means you have limited options for installing additional components. Still, these two slots are enough for an NVMe drive (PCIe slot) plus 10 GbE or LSI® 2008 6Gb SAS (mezzanine slot). Currently I’m only using the PCIe 2.0 slot for the NVMe drive, so in the future I can expand with a 10 GbE mezzanine card. The adapter I bought is the Lycom DT-129 (2 of them); it supports PCIe 3.0 as well as 2.0.


Lycom DT-129 Host Adapter for PCIe-NVMe M.2

Note: you won’t get the maximum performance out of your NVMe drive since we’re using PCIe 2.0, but it will be enough for functional tests.

3rd Upgrade – SSD + 2.5″ to 3.5″ Converter

Samsung 850 EVO 500 GB 2.5″ SSD (Capacity All-Flash Home Lab)

To build an all-flash VSAN platform, I needed to replace the 300 GB SAS 3.5″ drives with SSDs for the “capacity” tier. I bought 2 x Samsung 850 EVO 500 GB. The drives are connected to the SATA ports available on the motherboard (ICH10), with a “Queue Length” of 31.
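To double-check how ESXi sees the SSDs and their queue depth, the standard storage namespace is enough; a minimal sketch (the grep pattern is just a convenience to trim the output):

# List the storage adapters (onboard ICH10 SATA, NVMe, SAS controller)
esxcli storage core adapter list

# Show the devices with their maximum queue depth
esxcli storage core device list | grep -iE 'Display Name|Queue Depth'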


Samsung 850 EVO 500 GB 2.5″ SSD for “Capacity” Tier

Tip: if you’re looking to build an all-flash home lab, don’t make the same “mistake” as me: you don’t need to add any drives during the configuration of your Dell PowerEdge C6100 bundle, just add the caddies (at least 3). I added the SAS drives because I didn’t know whether the SSD and USB drives would work fine. Now that you know they do, use that money to buy other components, like the USB memories for the ESXi installation.

ICY Dock EZConvert 2.5″ to 3.5″

The Dell PowerEdge C6100 is available in two models, 12-bay 3.5″ or 24-bay 2.5″. The model I bought is the 12-bay 3.5″, so I needed a 2.5″-to-3.5″ converter for the SSD drives. I did a lot of research to see which converters worked properly, since the disks are mounted in a caddy and inserted into a bay. If the SSD doesn’t sit exactly like a 3.5″ drive, you will face a lot of connectivity issues.

I bought two different converters to see which one was better: the “official” Dell converter (1 x Dell 9W8C4 Y004G) and 2 x ICY Dock EZConvert Lite MB882SP-1S-3B. To be honest, even the cheaper ICY Dock model is much better than the “official” Dell converter. The Dell converter costs less, but it is really flimsy, leaves the SSD hanging in the air, and the holes don’t line up properly. I highly recommend the ICY Dock EZConvert Lite MB882SP-1S-3B for any 3.5″ bay, regardless of your server vendor.


ICY Dock EZConvert Lite MB882SP-1S-3B (RECOMMENDED)

 


Dell 9W8C4 Y004G (NOT recommended)

Conclusion

With these upgrades, you can have a really powerful home lab for a modest investment. The VSAN ROBO installation with 2 nodes + Witness in Workstation works like a charm, even using the embedded dual-port NIC.

I can still do more upgrades and support more workloads with just another small investment. The future upgrades I’m considering are:

  • Buy the components above for the third node and move from a VSAN ROBO setup with 2 nodes + Witness in Workstation to an all-flash VSAN platform with 3 nodes.
  • If I need more resources like CPU and RAM, I can still add a second CPU to each node and 48 GB more RAM per node, reaching a total of 6 x Intel L5630 + 288 GB RAM (around £350 for this upgrade). As for storage, I can still add two more drives per node.
  • When the price of 10 GbE switches drops, I can add the 10 GbE mezzanine cards to increase network performance and avoid any bottleneck with VSAN or any other SDS solution.
  • Finally, if I still need more resources, a fourth node is not that expensive. For £250-300 you get a node with 2 x Intel L5630 / 96 GB RAM / 128 GB NVMe / 500 GB SSD.

Below you can see the Bill of Materials (BoM) for the upgrade, as well as how much the next upgrade will cost to enable the third node as part of the VSAN cluster.

BoM Dell PE C6100 Upgrade

Component Qty. Price Total
SanDisk SDCZ33-008G-B35 8GB 3 £4.14 £12.42
SM951 NVMe 128GB (MZVPV128HDGM) 2 £59.99 £119.98
Lycom DT-129 Host Adapter for PCIe-NVMe M.2 2 £21.99 £43.98
Samsung 850 EVO 500 GB 2.5″ SSD 2 £112.00 £224.00
ICY Dock EZConvert Lite MB882SP-1S-3B 2 £11.05 £22.10
Grand Total £422.48
Future Upgrade Investment for 3rd node
SM951 NVMe 128GB (MZVPV128HDGM) 1 £59.99 £59.99
Lycom DT-129 Host Adapter for PCIe-NVMe M.2 1 £21.99 £21.99
Samsung 850 EVO 500 GB 2.5″ SSD 1 £112.00 £112.00
Grand Total (Upgrade 1-node) £193.98

As of 04/06/2016, the total amount invested in this home lab is £1007.44.

For Sale!!!

The C6100 came with an MR-SAS-8704EM2 SAS/SATA controller in the PCIe slot. This controller is supported by VMware vSphere 6.0 U2, but NOT for VSAN. vSphere sees the controller as an Avago (LSI) MegaRAID SAS 1078 controller with a “Queue Depth” of 975. This SAS controller has 3 Gb/s of bandwidth and 1 port. If you’re interested in buying one of these controllers, I’m selling mine (3 of them) for £25 each (£350 new). You can reach me with a comment or through Twitter.
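If you want to check how your own controller shows up and whether its local disks are even considered by VSAN, these standard ESXi commands are a reasonable starting point; the HCL check itself still has to be done against the VMware Compatibility Guide.

# List storage adapters and the driver bound to each one
esxcli storage core adapter list

# Query the local disks and their eligibility for VSAN
vdq -q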


Avago (LSI) MegaRAID SAS 8704EM2 with QL = 975 for VSAN

Nutanix Xpress, the new HCI product line for SMB

May 25, 2016

Yesterday Nutanix announced a new product line for SMB called Nutanix Xpress. The leading hyper-converged infrastructure vendor made the smart decision to enter a huge market like SMB.

How does Nutanix define SMB?

The meaning of SMB/SME from a size/revenue point of view is completely different between the US and the rest of the globe. As an EU citizen who has had the chance to see what the market looks like across Europe, I believe Nutanix’s approach to covering the SMB spectrum across the globe is the right one.


Source: Nutanix

What Is Nutanix Xpress?

Nutanix Xpress is a new product line with dedicated teams. It’s NOT a platform within the current Enterprise product line, which means the product will follow a different release program and have a business unit focused on it.

The solution has the following “limitations”, meaning that if you require bigger scalability and enterprise features, your choice must be the Enterprise product line (NX Series):

  • Minimum 3 nodes per cluster.
  • Up to 4 nodes per cluster.
  • Up to 2 clusters.

This means you can have up to 8 nodes in total. If you expect to add more nodes in the future, be aware that no upgrade path to NX currently exists; Nutanix is working to provide this option in future product releases.

New Product Line!

Source: Nutanix

Software – What’s different?

As you can expect, to achieve a reduction in cost the vendor needs to cut down somewhere. The easy way is to reduce hardware resources, but that’s not enough to be cost-attractive. Many vendors manage to be competitive by releasing different editions of their software with different features in each one. The picture below shows the SMB vs. Enterprise product lines:


Source: Nutanix

Platform SX-1065 Specifications

The appliance can be configured with the required resources using the same partner tool as the Enterprise product line, and it will be shipped from the factory straight to the customer. Nutanix provides some example bundles:

Platform: 1065, Configure to Order

Source: Nutanix

Pricing

The starting price is $25,000 for the low-spec bundle. The price includes the software (Xpress Edition), 3 nodes (SX-1065), and 3 years of support (Xpress Support).

Pricing - Suggested Customer Price

Source: Nutanix

Xpress Support

The team providing support for the Nutanix Xpress product line will be the same as for Enterprise. SMBs will enjoy the same support: same quality, same engineers, and a lot of expertise behind it. The table below shows the differences between the support offerings:

Xpress Support - SMBs get access to world class Nutanix support

Source: Nutanix

Xpress Timeline

The first quotes can be made from June 1st, but it won’t be until early July that Nutanix starts to dispatch orders. In the meantime, you can take a look at the Xpress website to see whether it’s the SMB solution you were waiting for.


Source: Nutanix

Don’t miss the launch event on June 28th. It will be a live virtual event, and you can register now!


Source: Nutanix

Conclusion

I’m excited to see how Nutanix has listened to partners and customers to create a new solution for SMBs. I believe the cost vs. features balance is now more competitive than ever, and many customers will have the chance to take advantage of this hyper-converged technology.

The challenge I see for partners is how to educate customers to understand which features are really important for them. Across many customers we see solutions that were acquired but where only a small percentage of the features is actually used.

It’s time to see how other vendors like EMC/VMware move, as well as other players like SimpliVity.