Part 1 of the VMware Hybrid Cloud Extension (HCX) troubleshooting series covers the issues commonly seen during initial setup and the ways to troubleshoot them. The connectivity diagram on the index page is referred to frequently, so have a quick look at it first.
- HCX Troubleshooting series – Index
- Part 1: Install & Setup (this post)
- Part 2: Service Mesh & Profiles
- Part 3: Network Extension (Coming Soon)
- Part 4: Migrations (Coming Soon)
- Part 5: Performance (Coming Soon)
- Part 6: Logs and miscellaneous (Coming Soon)
The relevant traffic path is shown for each section.
The first step in setting up HCX is to download the on-prem connector OVA. The download option is initially greyed out after navigating to HCX Cloud Manager > System Updates (https://<hcxCloudMgr>/hybridity/ui/hcx-client/index.html#/administration/upgradeapp), but it becomes available after a few seconds. This step involves the flow below from the Cloud Manager to the hybridity-depot URL. On the VMware cloud side this is taken care of automatically.
- The OVA is downloaded from the CDN hybridity-depot.vmware.com, so the client machine from which the image is downloaded needs internet connectivity
- Some enterprise proxies can slow down the download, enforce limits, or cause timeouts and even failures.
Deploy HCX Connector OVA
OVA deployment is straightforward, and issues are generally not seen during this step. VM deployment failures, if any, are usually related to the environment and can be troubleshot like any other image deployment.
If too much time is spent completing the VM deployment wizard, the deployment throws an error immediately after the task is submitted. This is common to any VM deployment and is not specific to HCX, so keep the IP details handy before starting the wizard.
The first step after the on-prem HCX Connector is online is to activate it. Refer to the documentation for the respective environment to get the license key. The on-prem connector reaches out to the HCX service URL connect.hcx.vmware.com to complete the activation.
Activation fails if the connector is unable to connect to the internet. Some of the errors seen:
“The provided HCX Activation Server URL is invalid”
“SSL Error. Untrusted SSL Connection”
“Failed to activate HCX instance”
Failure to connect to the ‘connect.hcx.vmware.com’ URL is the primary reason for these errors.
|Possible Causes|Troubleshooting Steps|
|---|---|
|Time drift|1. Verify the NTP server under hcxConnectorIP:9443 > Time Settings. 2. From an HCX Connector SSH session, type ‘date’ to verify the time.|
|Unable to resolve connect.hcx.vmware.com|1. Verify the DNS IP under hcxConnectorIP:9443 > Administration > Network Settings > DNS Servers. 2. From an HCX Connector SSH session, ping or nslookup connect.hcx.vmware.com to verify name resolution.|
|No Internet connection|1. Check routing and firewall rules to confirm that the HCX Connector IP subnet can reach the Internet. 2. Configure a proxy under hcxConnectorIP:9443 > Administration > Network Settings > Proxy.|
|MITM proxy|SSL inspection proxies can send their own certificates to the connector instead of the original service certificate. Refer to the next section (Proxy setup) for details on how to troubleshoot this.|
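The DNS and time checks from the table can also be scripted from a workstation that uses the same DNS servers as the connector. The following is only a rough sketch: the helper names `resolve` and `clock_drift_seconds` are made up for illustration, and it assumes the connector time is captured with `date -R` (GNU coreutils, not confirmed to be present on the appliance) and compared against a reference timestamp such as the `Date` header of any HTTPS response.

```python
import socket
from email.utils import parsedate_to_datetime


def resolve(hostname):
    """Return the sorted IP addresses the local resolver gives for hostname.

    An empty list means name resolution failed (hypothetical helper).
    """
    try:
        infos = socket.getaddrinfo(hostname, 443, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def clock_drift_seconds(connector_date, reference_date):
    """Seconds by which the connector clock is ahead (+) or behind (-).

    Both arguments are RFC 2822 style timestamps with an explicit zone,
    e.g. the output of `date -R` and an HTTP `Date` header value.
    """
    connector = parsedate_to_datetime(connector_date)
    reference = parsedate_to_datetime(reference_date)
    return (connector - reference).total_seconds()
```

For example, `resolve("connect.hcx.vmware.com")` returning an empty list points at the DNS row of the table, while a drift of more than a few seconds points at the NTP row.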
The HCX Connector needs to reach public URLs and is generally configured with a proxy to do so (a proxy is not mandatory).
> Add Exclusions as in the user guide
“By default, when you configure a proxy server, the system uses that server for all HTTPS connections, including the local vCenter Server, ESXi, NSX, and HCX-IX. For a successful deployment, define all related proxy exceptions when you configure a proxy server.”
Either exclude the internal networks as a whole or, at a minimum, exclude the management subnets covering all of the above components, e.g. 10.0.0.0/8, along with the associated FQDNs using a wildcard such as *.domain.com. Also exclude the HCX Cloud Manager IP if it is not supposed to go through the proxy.
Important: vCenter/NSX registration failures, site pairing issues and even migration failures are some of the symptoms if this step is not done correctly.
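Before saving the exclusion list, it can help to check on paper that every internal component is actually covered by it. The sketch below is only an illustration of that check: `is_excluded` is a hypothetical helper, and its CIDR and wildcard matching is an assumption about how entries such as 10.0.0.0/8 and *.domain.com are meant to behave, not HCX's own matching logic.

```python
import ipaddress
from fnmatch import fnmatch


def is_excluded(target, exclusions):
    """Return True if target (IP or FQDN) is covered by an exclusion entry.

    IP targets are tested against CIDR/IP entries; FQDN targets are tested
    against wildcard entries such as "*.domain.com".
    """
    try:
        target_ip = ipaddress.ip_address(target)
    except ValueError:
        target_ip = None  # target is a hostname, not an IP address

    for entry in exclusions:
        if target_ip is not None:
            try:
                if target_ip in ipaddress.ip_network(entry, strict=False):
                    return True
            except ValueError:
                pass  # entry is a hostname pattern, skip it for IP targets
        elif fnmatch(target.lower(), entry.lower()):
            return True
    return False
```

Running it over the vCenter, ESXi, NSX and HCX-IX addresses and names quickly shows which ones would still be sent to the proxy.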
> Check certificates
Even after the proxy is configured correctly and exclusions are added, connector connections may still fail, especially activation and site pairing. This can be caused by Man-In-The-Middle (MITM) proxies. Refer to this ZScaler link to see how such a proxy handles SSL negotiations on behalf of the client. Because the proxy presents its own certificate, the connector may not trust it and the connections fail.
Errors like the ones below are symptoms of connectivity issues between the Connector and the service URLs. They are typically caused by the proxy.
“The HCX Manager has failed to reach https://connect.hcx.vmware.com beyond the grace period. Restore this connection to resume HCX services.”
“SSL Error: Untrusted SSL Connection”
“PKIX path building failed: java.security.cert.CertPathBuilderException: Unable to find certificate chain”
From an HCX Connector SSH session:
openssl s_client -connect connect.hcx.vmware.com:443 -showcerts -servername connect.hcx.vmware.com
In this output you should see:
Certificate chain
 0 s:/C=US/ST=California/L=Palo Alto/O=VMware, Inc/CN=connect.hcx.vmware.com
   i:/C=US/O=Entrust, Inc./OU=See www.entrust.net/legal-terms/OU=(c) 2012 Entrust, Inc. - for authorized use only/CN=Entrust Certification Authority - L1K
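If the openssl output is saved to a file, the issuer check can be automated. The sketch below is only illustrative: `chain_issuers` and `looks_intercepted` are hypothetical helpers that read the `i:` lines of the chain section, and the expected issuer keyword has to be supplied by the reader (the sample output above shows Entrust, but the issuing CA can change over time).

```python
import re


def chain_issuers(s_client_output):
    """Return the issuer CN for each 'i:' line in `openssl s_client` output."""
    issuers = []
    for line in s_client_output.splitlines():
        stripped = line.strip()
        if not stripped.startswith("i:"):
            continue
        # Handles both "/CN=Name" and "CN = Name" styles when CN comes last.
        match = re.search(r"CN\s*=\s*([^/,]+?)\s*$", stripped)
        issuers.append(match.group(1) if match else stripped[2:])
    return issuers


def looks_intercepted(s_client_output, expected_issuer_keyword):
    """True if the leaf certificate's issuer does not contain the keyword."""
    issuers = chain_issuers(s_client_output)
    if not issuers:
        return False  # no chain found; nothing to judge
    return expected_issuer_keyword.lower() not in issuers[0].lower()
```

A `True` result for `looks_intercepted(output, "Entrust")` suggests the proxy is presenting its own certificate, which is the case covered next.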
If the proxy server sends its own certificate, import it using either of these methods:
- Import the certificate by URL from the Connector: hcxConnectorIP:9443 > Select Certificate > Trusted CA Certificate > URL, and paste the service URL, for example connect.hcx.vmware.com or the HCX Cloud Manager URL.
- In some cases the full certificate chain needs to be imported. Get the certificates from the security team and import them using the content or file option.
> Proxies, if configured, also play a role when downloading or updating the service mesh appliances. Due to security policies, some enterprise proxies can slow down the download, enforce limits, or cause timeouts and even failures.
> Finally, use proxy credentials that do not expire.
Unable to access HCX Plugin
Symptoms: Cannot access the HCX Portal. Errors such as ‘Unable to process HCX Inventory details‘ and ‘Error loading site pairings‘ are shown while accessing the HCX plugin.
Potential Cause: The user account used to log in to the vCenter or user portal does not have sufficient permissions under vSphere Role mapping.
- Add the relevant ‘vCenter SSO domain\group’ under the vSphere Role mapping page for the member accounts to have HCX access. By default, ‘vsphere.local\administrators’ group is added.
- If using a custom vCenter SSO domain name, change vsphere.local to the correct name as shown in the picture below.
- If AD groups need to be added under vSphere Role mapping, ensure the AD groups are part of a local vCenter SSO group. If not, add the AD groups to an SSO group before adding them in HCX.
- If the AD group is not already part of the local Administrators group and adding it there is not desired: create a new vCenter SSO group (e.g. HCXadmins), add the AD groups to it so that HCX can recognize them, and then add the same AD groups under the vSphere Role Mapping page as shown below.
Site pairing involves the connection from the Connector to the HCX Cloud Manager IP or URL.
A site pairing failure means there is an issue with this connection.
- Routing and Firewall (port 443 outbound) between the Connector IP and Cloud Manager IP/URL should be checked.
- Proxy configuration and exclusion should be checked as described in the previous section
[admin@hcx-manager ~]$ telnet <Cloud Manager IP/URL> 443
Trying <Cloud Manager IP/URL>...
Connected to <Cloud Manager IP/URL>.
Escape character is '^]'.
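Where telnet is not available, the same port test can be done with a few lines of Python from a host on the Connector's subnet. This is a minimal sketch: `tcp_reachable` is a hypothetical helper, and like telnet it opens a direct TCP connection, so it does not exercise the proxy path.

```python
import socket


def tcp_reachable(host, port=443, timeout=5.0):
    """Return (True, None) if a TCP connection opens, else (False, reason)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:
        # Covers DNS failures, refused connections and timeouts alike.
        return False, f"{type(exc).__name__}: {exc}"
```

The reason string helps tell the cases apart: a refusal usually means something answered and rejected the port, while a timeout is more typical of a firewall silently dropping the traffic or of a routing problem.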
At times everything is configured correctly and the firewall can see the traffic passing through, yet the connection still fails. It may then be time to go back to basics: check for asymmetric routing, duplicate IPs for the on-prem HCX appliances and so on (this happens more frequently than one thinks!).
Thanks to VMware HCX team, field teams, PMs, support, customers for all of the above inputs.