Part 1 of the VMware Hybrid Cloud Extension troubleshooting series covers issues commonly seen during the initial setup and ways to troubleshoot them. The connectivity diagram on the index page is referred to frequently, so do have a quick look at it first.

The relevant traffic path is shown for each section.

Download OVA

The first step in setting up HCX is to download the on-prem connector OVA. After navigating to the System Updates page of the HCX Cloud Manager (“<hcxCloudMgr>/hybridity/ui/hcx-client/index.html#/administration/upgradeapp”), the download option is initially greyed out; give it a few seconds to become available. This step involves the flow below from the Cloud Manager to the hybridity-depot URL. On the VMware Cloud side this is taken care of automatically.

HCX OVA download connectivity.
Source: https://hcx.design/2019/12/13/hcx-network-port-diagrams
  • The OVA is downloaded from the CDN hybridity-depot.vmware.com, so the client machine from which the image is downloaded needs internet connectivity.
  • Some enterprise proxies can slow down the download, enforce size limits, or cause timeouts and even failures.
HCX Connector Download link

Deploy HCX Connector OVA

OVA deployment is straightforward and issues are generally not seen during this step. VM deployment failures, if any, are usually related to the environment and can be troubleshot like any other image deployment.

If too much time is spent completing the VM deployment wizard, the deployment throws an error immediately after the task is submitted. This is common to any VM deployment and not related to HCX, so keep the IP details handy before starting the wizard.

Activation

The first step after the on-prem HCX Connector is online is to activate it. Refer to the documentation for the respective environment to get the license key. The on-prem connector reaches out to the HCX service URL connect.hcx.vmware.com to complete the activation:

HCX Activation flow

Activation fails if the connector is unable to connect to the internet. Some of the errors seen:

“The provided HCX Activation Server URL is invalid”

“SSL Error. Untrusted SSL Connection”

“Failed to activate HCX instance”

Failure to connect to the URL ‘connect.hcx.vmware.com’ is the primary reason for these failures.

Possible Causes and Troubleshooting Steps

Possible cause: Time drift
  • Verify the NTP server under hcxConnectorIP:9443 > Time Settings, or
  • From an HCX Connector SSH session, type ‘date’ to verify the time.

Possible cause: Unable to resolve connect.hcx.vmware.com
  1. Verify the DNS IP under hcxConnectorIP:9443 > Administration > Network Settings > DNS Servers.
  2. From an HCX Connector SSH session, ping or nslookup the URL connect.hcx.vmware.com to verify name resolution.

Possible cause: No Internet connection
  1. Verify routing/firewall to confirm the HCX Connector IP subnet can get to the Internet.
  2. Configure a proxy under hcxConnectorIP:9443 > Administration > Network Settings > Proxy.

Possible cause: MITM Proxy
  • SSL inspection proxies could send their own certificates to the connector instead of the original service certificate. Refer to the next section (Proxy Setup) for more details on how to troubleshoot this.

HCX Activation – Possible causes and troubleshooting steps
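
The name-resolution and connectivity checks can also be scripted. The sketch below is not an HCX tool; it assumes Python 3 is available on a machine in the same subnet as the connector, and the function name `check_endpoint` is made up for illustration. It separates a DNS failure from a TCP failure on port 443, the same distinction the steps above make by hand. It tests a direct connection only, so a "tcp-failure" result is expected where internet access is possible only through a proxy.

```python
import socket

def check_endpoint(host, port=443, timeout=5.0,
                   resolve=socket.getaddrinfo,
                   connect=socket.create_connection):
    """Return 'dns-failure', 'tcp-failure' or 'ok' for host:port.

    resolve and connect are injectable so the logic can be exercised
    without touching the network.
    """
    try:
        resolve(host, port)
    except socket.gaierror:
        # Name did not resolve: check the DNS servers in Network Settings.
        return "dns-failure"
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError:
        # Name resolved but no TCP session: check routing, firewall or proxy.
        return "tcp-failure"
    conn.close()
    return "ok"

# Example (performs a real lookup and connection attempt):
# print(check_endpoint("connect.hcx.vmware.com"))
```

A "dns-failure" points back at the DNS settings, while a "tcp-failure" points at routing, firewall or the need for a proxy.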

Proxy Setup

The HCX Connector needs to reach the public service URLs and is generally configured with a proxy to achieve this (though a proxy is not mandatory).

HCX Proxy setup

> Add Exclusions as in the user guide

“By default, when you configure a proxy server, the system uses that server for all HTTPS connections, including the local vCenter Server, ESXi, NSX, and HCX-IX. For a successful deployment, define all related proxy exceptions when you configure a proxy server.”

Either exclude the internal networks as a whole or, at a minimum, exclude the management subnets covering all of the above components, e.g. 10.0.0.0/8, along with the associated FQDNs using a wildcard such as *.domain.com. Also exclude the HCX Cloud Manager IP if it is not supposed to go through the proxy.
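
Before entering an exclusion list, it can help to sanity-check that it covers every local component. The sketch below is only an approximation: how HCX itself matches exclusion entries is not described here, so CIDR containment plus shell-style wildcards is an assumption, and the function name `is_excluded` and the sample addresses are hypothetical.

```python
import fnmatch
import ipaddress

def is_excluded(target, exclusions):
    """Return True if target (an IP or FQDN) matches any exclusion entry.

    Assumed semantics: IP targets are tested against CIDR entries,
    hostname targets against wildcard entries such as *.domain.com.
    """
    try:
        ip = ipaddress.ip_address(target)
    except ValueError:
        ip = None  # target is a hostname, not an IP literal
    for entry in exclusions:
        if ip is not None:
            try:
                if ip in ipaddress.ip_network(entry, strict=False):
                    return True
            except ValueError:
                pass  # entry is a hostname pattern, not a network
        elif fnmatch.fnmatch(target.lower(), entry.lower()):
            return True
    return False

exclusions = ["10.0.0.0/8", "*.domain.com"]
for target in ["10.20.30.40", "vcenter.domain.com", "connect.hcx.vmware.com"]:
    print(target, "excluded" if is_excluded(target, exclusions) else "via proxy")
```

Any vCenter, ESXi, NSX or HCX-IX address that comes back as "via proxy" is a candidate for a missing exclusion.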

Important: vCenter/NSX registration failures, site pairing issues and even migration failures are some of the symptoms if this step is not done correctly.

> Check certificates

Even after configuring the proxy correctly and adding exclusions, connector connections may fail – especially activation and site pairing.

Symptoms: Errors like the ones below point to connectivity issues between the Connector and the service URLs. These are typically caused by the proxy.

“The HCX Manager has failed to reach https://connect.hcx.vmware.com beyond the grace period. Restore this connection to resume HCX services.”

“SSL Error: Untrusted SSL Connection”

“PKIX path building failed: java.security.cert.CertPathBuilderException: Unable to find certificate chain”

Potential Cause: This could be caused by Man-In-The-Middle (MITM) proxies. Refer to this ZScaler link to see how such a proxy handles SSL negotiation on behalf of the client. Because the proxy presents its own certificate, the connector may not trust it and the connection fails.

Run the command below from an HCX Connector SSH session. The HCX service URL generally returns output similar to the following:

openssl s_client -connect connect.hcx.vmware.com:443 -showcerts -servername connect.hcx.vmware.com
In this output you should see:
Certificate chain
0 s:/C=US/ST=California/L=Palo Alto/O=VMware, Inc/CN=connect.hcx.vmware.com
i:/C=US/O=Entrust, Inc./OU=See www.entrust.net/legal-terms/OU=(c) 2012 Entrust, Inc. - for authorized use only/CN=Entrust Certification Authority - L1K
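
If the command has to be run against several connectors, the issuer check can be automated by parsing the saved output. This is a minimal sketch, not an official check: the helper names are hypothetical, and the default expected issuer ("Entrust") is taken from the sample output above and may no longer match if the service certificate has since been reissued by a different CA.

```python
import re

def issuers(s_client_output):
    """Return the issuer (i:) values from openssl s_client -showcerts output."""
    return [m.group(1).strip()
            for m in re.finditer(r"^[ \t]*i:(.*)$", s_client_output, re.MULTILINE)]

def issuer_unexpected(s_client_output, expected_issuer="Entrust"):
    """True if the first certificate's issuer does not mention expected_issuer.

    Also True when no issuer line is found, since the chain could not be
    confirmed in that case.
    """
    found = issuers(s_client_output)
    return not found or expected_issuer.lower() not in found[0].lower()

# Hypothetical output as it might look behind an SSL inspection proxy.
proxied = """Certificate chain
 0 s:/CN=connect.hcx.vmware.com
   i:/O=Example Corp/CN=Example Proxy CA
"""
print(issuers(proxied))
print(issuer_unexpected(proxied))
```

An unexpected issuer is only a hint that an inspection proxy is in the path; the issuer string it reports is the CA whose certificate would need to be trusted by the connector.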

Resolution: If the proxy server sends its own certificate instead of the above, import the required proxy certificates using either of these methods:

  1. Import the certificate by URL from the Connector: hcxConnectorIP:9443 > Select Certificate > Trusted CA Certificate > URL, then paste the service URL connect.hcx.vmware.com or, for example, the HCX Cloud Manager URL. Refer here.
  2. In some cases the full certificate chain needs to be imported. Get the certificates from the security team and import them using the content or file option.

> Proxies, if configured, also play a role when downloading or updating the service mesh appliances. Due to security policies, some enterprise proxies can slow down the download, enforce size limits, or cause timeouts and even failures.

> Finally, use proxy credentials that do not expire.

Unable to access HCX Plugin

Symptoms: Cannot access the HCX Portal. Errors such as ‘Unable to process HCX Inventory details’ and ‘Error loading site pairings’ are shown while accessing the HCX plugin.

Potential Cause: The user account used to log in to the vCenter or user portal does not have sufficient permissions under HCX vSphere Role Mapping.

Resolution

  • Add the relevant ‘vCenter SSO domain\group’ under the vSphere Role Mapping page so that its member accounts have HCX access. By default, the ‘vsphere.local\administrators’ group is added.
  • If using a custom vCenter SSO domain name, change vsphere.local to the correct name as shown in the picture below.
  • If AD groups need to be added under vSphere Role Mapping, ensure the AD groups are part of a local vCenter SSO group. If not, add the AD groups to an SSO group before adding them in HCX.
  • If the AD group isn’t already part of the local Administrators group and adding it there is not desired: create a new vCenter SSO group, e.g. HCXadmins > add the AD groups to it so that HCX can recognize them > now add the same AD groups under the vSphere Role Mapping page as shown below.
HCX vSphere Role Mapping

Site Pairing

This step involves the connection from the connector to the HCX Cloud Manager IP or URL.

HCX Site pairing Connection

A failure in site pairing means there is an issue with the above connection.

  1. Routing and Firewall (port 443 outbound) between the Connector IP and Cloud Manager IP/URL should be checked.
  2. Proxy configuration and exclusions should be checked as described in the previous section.
[admin@hcx-manager ~]$ telnet <Cloud Manager IP/URL> 443
Trying <Cloud Manager IP/URL>...
Connected to <Cloud Manager IP/URL>.
Escape character is '^]'.

At times everything is configured correctly and the firewall can see the traffic passing through, but the connection still fails. It may then be time to go back to basics: check for asymmetric routing, duplicate IPs for the on-prem HCX appliances and so on (this happens more frequently than one thinks!).

Thanks to VMware HCX team, field teams, PMs, support, customers for all of the above inputs. Please add your feedback in the comments to include additional issues and fixes.

5 thoughts

  1. Thanks for the help. I had an issue where ‘Unable to process HCX Inventory details’ and ‘Error loading site pairings’ were shown while accessing the HCX plugin, and I realized that we have a custom SSO domain.


  2. Dude, you’ve saved the day again. I had moved the ESXi servers to a different subnet. HCX tried to push the OVA via the proxy, so adding an exclusion fixed the issue. Thanks!

