By Troy Couch, Enterprise Architect, Entisys360
Citrix Virtual Apps and Desktops (formerly Citrix XenApp/XenDesktop) uses SQL Server to host the required Site database. Citrix and Entisys360 recommend a highly available SQL Server deployment as a best practice. High availability should protect against a database outage, but the Delivery Controllers can still lose communication with the database, which affects user connections. When this occurs, Citrix falls back on a feature called Local Host Cache (LHC) to continue brokering connections during the outage. LHC was available in XenApp 6.5 and earlier, was absent from the first 7.x releases, and returned in 7.12, where it replaced the connection leasing feature delivered in XenDesktop 7.6.
The one flaw in relying on LHC as a fallback is that the LHC itself can become corrupted. The most common causes of LHC corruption are an orphaned SID, a bad icon file, or a reboot during the import process. If the LHC is corrupt when the connection to the database fails, the LHC will not operate as expected, and the result is an outage.
Some blogs and support articles state that this was fixed in newer releases (7.14+), but the latest round of issues that led to this write-up occurred with a long-time client running a Virtual Apps and Desktops 7.18 environment. (Note: Citrix will be using a YYMM version format for all future releases, similar to Windows 10.)
Citrix has a couple of articles about the corrupt LHC issue, namely CTX228758 and CTX230775, but a non-Citrix blog provides the best explanation: https://citrixguyblog.com/2017/08/22/localhostcache-error-505-the-citrix-config-sync-service-failed-an-import/. There are three significant points to take from that article:
- First, it explains how to enable import logging.
- Second, it points out that the folder should be deleted completely when performing an LHC reset per the Citrix articles.
- Third, and most critical, any environment configured with LHC *must* have alerts in place to detect when an import fails on any Delivery Controller. One of the best sources for the relevant Windows events is the Monitoring section of https://blog.citrix24.com/xendesktop-7-15-local-host-cache-explained/. The following are the events that should be monitored with whatever Windows Event ID monitoring solution you use. (Note: the 505 error is the most critical.)
Local Host Cache Event IDs to Watch
| Type | Event ID | Source | Logged on | Description |
|---|---|---|---|---|
| Error | 505 | Citrix High Availability Service | Broker Server | An import to the local DB failed; see below for more information |
| Information | 3500 | Citrix Broker Service | Broker Server | The Citrix Broker Service has detected that the issue with communication with the database has been resolved and will resume normal brokering activity using configuration in the main site database. |
| Information | 3501 | Citrix Broker Service | Broker Server | The Citrix Broker Service has detected an issue with communication with the database. To preserve functionality, responsibility for brokering requests will be handed over to the Citrix High Availability Service using locally cached site configuration. |
| Information | 3502 | Citrix High Availability Service | Broker Server | The Citrix High Availability Service has become active and will broker user request for sessions until the issue discovered with the normal brokering activity is resolved. |
| Information | 3503 | Citrix High Availability Service | Broker Server | The issue discovered with the normal brokering activity has been resolved, and the Citrix High Availability Service has now stopped participating in brokering user requests for sessions |
Monitoring these events should help identify:
- If an LHC import failed (event 505)
- If the LHC became active (events 3501 and 3502)
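To make that interpretation concrete, here is a minimal Python sketch of how a monitoring hook might turn a chronological stream of these event IDs into the two answers above. It assumes the event IDs have already been collected from the Delivery Controller's event log by whatever tool you use; the names `LhcStatus` and `summarize` are hypothetical and not part of any Citrix tooling.

```python
from dataclasses import dataclass

# Event IDs from the table above, grouped by what they signal.
IMPORT_FAILED = {505}
LHC_ENGAGED = {3501, 3502}   # database unreachable; High Availability Service brokering
LHC_RELEASED = {3500, 3503}  # database reachable again; normal brokering resumed


@dataclass
class LhcStatus:
    # Set once a 505 is seen. The table lists no "import succeeded" event,
    # so this sketch never clears the flag on its own.
    import_failed: bool = False
    # True while the most recent relevant event says the LHC is brokering.
    lhc_active: bool = False


def summarize(event_ids):
    """Walk event IDs in chronological order and track the LHC state."""
    status = LhcStatus()
    for event_id in event_ids:
        if event_id in IMPORT_FAILED:
            status.import_failed = True
        elif event_id in LHC_ENGAGED:
            status.lhc_active = True
        elif event_id in LHC_RELEASED:
            status.lhc_active = False
        # Any other event ID is ignored.
    return status
```

For example, `summarize([3501, 3502, 505])` reports both an import failure and an active LHC, which is the combination that deserves the loudest alert.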
So, in a nutshell:
- Monitor for the 505 error on all Delivery Controllers as part of your basic deployment.
- If a 505 error occurs, run the PowerShell script from CTX230775 on the Delivery Controller experiencing the issue to look for orphaned SIDs, and clean them up.
- If the 505 error continues, enable logging on that Delivery Controller per the citrixguyblog.com article, then look for the icon generating the error and fix it.
- If the 505 error still continues, perform the LHC reset at Step 3 of CTX230775, but also delete any garbage found in C:\Windows\ServiceProfiles\NetworkService\AppData\Local\Temp\ on that Delivery Controller during the process.
- The import should work after that. Continue to monitor for 505, because this will all happen again.
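The escalation order above can be written down as a small lookup, which may be handy if you want an alerting tool to attach the next suggested action to each repeated 505 alert. This is only a sketch: the function name `next_step` and the idea of counting prior remediation attempts are assumptions for illustration, and the step text simply restates the list above.

```python
# Remediation steps in the order given above.
REMEDIATION_STEPS = [
    "Run the PowerShell script from CTX230775 to find and clean up orphaned SIDs.",
    "Enable import logging per the citrixguyblog.com article, then find and fix the bad icon.",
    "Perform the LHC reset (Step 3 of CTX230775) and clear leftovers from the "
    "NetworkService Temp folder.",
]


def next_step(attempts_so_far):
    """Return the next remediation to try for a persistent 505 error.

    attempts_so_far is the number of steps already tried without success.
    Returns None once every step in the list has been attempted.
    """
    if attempts_so_far < 0:
        raise ValueError("attempts_so_far must be non-negative")
    if attempts_so_far < len(REMEDIATION_STEPS):
        return REMEDIATION_STEPS[attempts_so_far]
    return None
```

Calling `next_step(0)` on the first 505 alert returns the orphaned SID check; a `None` result means the documented steps are exhausted and the problem needs a closer look.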
We would like to thank Lucas Doran (long-time client and part of the Entisys360 family) for his contribution to this write-up.