Nerd on a wire: Packet-Panning for Operational Gold at 20 Gb/s

Gathering operational intelligence in today’s 10, 40, and soon 100 Gigabit Ethernet networks is a far different task than it was back in the 100 megabit or even gigabit days. In this post I want to discuss the different triggers that make up the Citrix Alpha bundle, why they are important, and just what they are collecting. I like to think of ExtraHop as a pan that someone might use to find gold. When the data current is running at tens of gigabits, triggers can be your best friend, helping you sift through the digital gravel and sand to find the gold nuggets of information that reside within everyone’s wire data watershed.

What are triggers:
While ExtraHop collects a myriad of metrics (over 2000 per device), there are times when you may want to sift through the packets to pull specific nuggets of information out of the data stream. For example, ExtraHop collects metrics on ICA latency and organizes them for you on a per-server basis, allowing you to drill into the VDA instance and observe the latency associated with the users on that box. That is great, but using triggers we can get into even finer detail. Continuing the example from the previous article, we can also write a trigger that parses out the 24-bit network ID and collates the average latency for an entire subnet. In the trigger text you see below, we are setting up key-value pairs that will be written to the datastore. Note that we store the client IP in the “var IP” variable. Then you see that we have applied the “mask” method to the client IP to get its 24-bit network ID. The mask length can be changed to match how you have subnetted your network.
#########################################################################################################################################################
var appname = 'XenApp';
var IP = Flow.client.ipaddr;

//Begin grabbing ICA tick info
if (event == "ICA_TICK") {
    //Grab ICA Channel Info by UserName and write to CTXOPS
    for (var i = 0; i < ICA.channels.length; i++) {
        log(ICA.channels[i].description);
        Application(appname).metricAddCount("ica_channel_user_cnt_" + ICA.channels[i].description, ICA.channels[i].serverBytes, 1);
        Application(appname).metricAddDetailCount("ica_channel_user_cnt_detail", "User: " + ICA.user + " Server: " + ICA.host + " Channel: " + ICA.channels[i].description, ICA.channels[i].serverBytes);
    }

    //Record ICA network latency overall, then keyed by 24 bit subnet, user and client IP
    Application(appname).metricAddDataset("ica_lat_by_subnet", ICA.networkLatency);
    Application(appname).metricAddDetailSampleset("ica_lat_by_subnet_detail", IP.mask(24), ICA.networkLatency);
    Application(appname).metricAddDetailSampleset("ica_lat_by_user_detail", ICA.user, ICA.networkLatency);
    Application(appname).metricAddDetailSampleset("ica_lat_by_clientIP_detail", Flow.client.ipaddr, ICA.networkLatency);
}

############################################################################################################################################################
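
If you want to see what the subnet roll-up amounts to outside of the appliance, here is a minimal sketch in plain JavaScript. It is not ExtraHop trigger code: maskIPv4, averageBySubnet and the sample latencies are hypothetical names and made-up numbers, and it only handles IPv4 dotted-quad text.

```javascript
// Plain JavaScript sketch of the /24 roll-up idea; not ExtraHop trigger API code.
// maskIPv4 and averageBySubnet are hypothetical helper names.

// Return the network ID of a dotted-quad IPv4 address for a given prefix length.
function maskIPv4(ip, prefixLength) {
  const octets = ip.split(".").map(Number);
  // Pack the four octets into one unsigned 32-bit integer.
  const packed =
    ((octets[0] << 24) | (octets[1] << 16) | (octets[2] << 8) | octets[3]) >>> 0;
  // A prefix of 0 is special-cased because a 32-bit shift by 32 does nothing in JavaScript.
  const netmask =
    prefixLength === 0 ? 0 : (0xffffffff << (32 - prefixLength)) >>> 0;
  const network = (packed & netmask) >>> 0;
  return [
    network >>> 24,
    (network >>> 16) & 255,
    (network >>> 8) & 255,
    network & 255,
  ].join(".");
}

// Group latency samples by network ID and average each group.
function averageBySubnet(samples, prefixLength) {
  const totals = new Map();
  for (const { clientIp, latencyMs } of samples) {
    const subnet = maskIPv4(clientIp, prefixLength);
    const entry = totals.get(subnet) || { sum: 0, count: 0 };
    entry.sum += latencyMs;
    entry.count += 1;
    totals.set(subnet, entry);
  }
  const averages = {};
  for (const [subnet, { sum, count }] of totals) {
    averages[subnet] = sum / count;
  }
  return averages;
}

// Made-up samples: two clients in 10.1.2.0/24 and one in 10.1.3.0/24.
const latencySamples = [
  { clientIp: "10.1.2.57", latencyMs: 40 },
  { clientIp: "10.1.2.200", latencyMs: 60 },
  { clientIp: "10.1.3.9", latencyMs: 120 },
];
console.log(averageBySubnet(latencySamples, 24));
```

With these samples the two 10.1.2.x clients average out to 50 ms and the lone 10.1.3.x client stays at 120 ms. Passing 16 instead of 24 groups everything under 10.1.0.0, which is the same knob as changing the mask length in the trigger.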

I have now used ExtraHop triggers to take my wire data mining to a whole new level. Below I want to talk about all of the triggers that I have included in the Alpha Bundle and explain what they are doing.

Citrix Infrastructure Metrics: (Assigned to all VDAs and ICA Listeners)
This is a more holistic trigger that fires on the ICA_OPEN and ICA_TICK events. In this trigger I am panning for ICA launch metrics via the ICA_OPEN event. I am also gathering ICA channel footprint information from the ICA_TICK event. The ICA_TICK event also carries the latency metrics that I use to report user latency.

CTX Request Ticket Cancelled: (Assigned to DDC, XML Brokers or VDAs)
This trigger came about as I was troubleshooting 1030 errors with Wireshark. As some of you know, the 1030 error (also called the “protocol driver error”) can be somewhat maddening in larger environments. Basically you get a call saying users are getting kicked out when they try to launch Citrix. What I noticed in the XML is that there was a heading called “RequestTokenCancelled” that showed up every time I got a 1030 error. So whenever I see that text in the XML between the DDC/XML Broker and the clients, I log the Citrix server IP address so that Citrix teams can quickly narrow down problem servers. This is also an example of how ExtraHop fits the “panning for gold” narrative. We can tap into things like the XML communications between hosts and pull out very useful information. This is what separates us from competitors who provide very narrow and rigid, non-editable metrics.
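
The matching logic is simple enough to sketch in plain JavaScript. This is only an illustration of the idea, not the trigger from the bundle: tallyCancelledTickets, the field names and the XML bodies are all hypothetical placeholders (the bodies are not real Citrix XML), and the only thing taken from the text above is the “RequestTokenCancelled” marker.

```javascript
// Plain JavaScript sketch; not ExtraHop trigger API code.
// The marker string is the heading quoted in the post.
const CANCEL_MARKER = "RequestTokenCancelled";

// Count, per Citrix server IP, how many XML bodies contain the marker.
// tallyCancelledTickets and the { serverIp, body } shape are hypothetical.
function tallyCancelledTickets(exchanges) {
  const countsByServer = {};
  for (const { serverIp, body } of exchanges) {
    if (typeof body === "string" && body.includes(CANCEL_MARKER)) {
      countsByServer[serverIp] = (countsByServer[serverIp] || 0) + 1;
    }
  }
  return countsByServer;
}

// Placeholder bodies, invented for the example.
const exampleExchanges = [
  { serverIp: "10.0.0.11", body: "<placeholder><RequestTokenCancelled/></placeholder>" },
  { serverIp: "10.0.0.11", body: "<placeholder><RequestTokenCancelled/></placeholder>" },
  { serverIp: "10.0.0.12", body: "<placeholder>ok</placeholder>" },
];
console.log(tallyCancelledTickets(exampleExchanges));
```

Servers that never show the marker simply do not appear in the result, so the output doubles as the short list of problem servers.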

DDC Registrations: ( For XenDesktop 5x-7x)
This was set up specifically to monitor the number of servers that have “phoned home” to their DDC. You can assign it to your Citrix server device group or to the DDCs. I found this when I noticed that every five minutes the VDAs would connect to the DDC over a URI that had iRegistrar as part of the uri-stem. I simply increment counters and map them to a specific DDC. This can be used to tell you whether your DDCs are load balancing or whether you have had a rash of systems that did not come up from the night before.
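
Here is a rough sketch of that counting idea in plain JavaScript, with one addition: given a list of VDAs you expect to see, it also reports the ones that never registered. Everything here is an assumption for illustration (summarizeRegistrations, the record fields, the host names and the placeholder URIs); only the iRegistrar match comes from the description above.

```javascript
// Plain JavaScript sketch; not ExtraHop trigger API code.
// summarizeRegistrations and the { vda, ddc, uri } shape are hypothetical.
function summarizeRegistrations(requests, expectedVdas) {
  const perDdc = {};
  const seen = new Set();
  for (const { vda, ddc, uri } of requests) {
    // Only requests whose URI mentions iRegistrar count as a registration.
    if (!uri.toLowerCase().includes("iregistrar")) continue;
    perDdc[ddc] = (perDdc[ddc] || 0) + 1;
    seen.add(vda);
  }
  // Anything expected but never seen registering did not phone home.
  const missing = expectedVdas.filter((vda) => !seen.has(vda));
  return { perDdc, missing };
}

// Made-up traffic: three registrations and one unrelated request.
const registrationTraffic = [
  { vda: "vda01", ddc: "ddc01", uri: "/placeholder/iRegistrar" },
  { vda: "vda02", ddc: "ddc02", uri: "/placeholder/iRegistrar" },
  { vda: "vda03", ddc: "ddc01", uri: "/placeholder/iRegistrar" },
  { vda: "vda04", ddc: "ddc02", uri: "/placeholder/other" },
];
console.log(
  summarizeRegistrations(registrationTraffic, ["vda01", "vda02", "vda03", "vda04"])
);
```

The per-DDC counts answer the load-balancing question, and the missing list is the rash of systems that did not come up.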

ICA DB Errors: (Assign to VDAs)
This is not quite what it sounds like. While I have limited enterprise applications in the demo data, the goal here is to log the DB errors that come from your ICA servers and VDI desktops, so that service desk staff as well as engineers can troubleshoot the actual applications. For example, if a user called saying that a database application was having issues, the person supporting them could look at the ICA DB Errors and see whether that user or their VDI instance saw any database errors. The idea is that the call may actually get escalated to someone other than the Citrix team for a change. Depending on your environment, we would customize this trigger to accommodate your web/HTTP/CIFS based applications while keeping their metrics tied to just the Citrix environment.

ICA User Debugging (CTXOPS):
This trigger was written around the same idea as ICA DB Errors. We gather some user metrics by firing on the ICA_OPEN, ICA_TICK and ICA_CLOSE events. This gives us data on user sessions that can, if needed, be included in the escalation information.

Profile Server Performance – XenApp/XenDesktop: (Assign to VDAs)
This nearly always has to be customized to accommodate the CIFS path for the users’ profiles. If you are running one of the premium profile services, we can retrofit a trigger for AppSense, RES, etc. Basically we want to look for the ISO that is on the user’s desktop or the 20GB of music they have in their roaming profile.
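
As a sketch of what “look for the ISO on the desktop” means in practice, the plain JavaScript below filters file transfers under a profile share and lists the big ones, largest first. It is not the bundled trigger: findProfileBloat, the record fields, the share path, the users and the sizes are all made up for illustration, and the size threshold is whatever you consider too big.

```javascript
// Plain JavaScript sketch; not ExtraHop trigger API code.
// findProfileBloat and the { user, path, bytes } shape are hypothetical.
function findProfileBloat(transfers, profileRoot, minBytes) {
  const root = profileRoot.toLowerCase();
  return transfers
    // Keep only files under the profile share that meet the size threshold.
    .filter((t) => t.path.toLowerCase().startsWith(root) && t.bytes >= minBytes)
    // Largest offenders first.
    .sort((a, b) => b.bytes - a.bytes)
    .map((t) => ({ user: t.user, path: t.path, bytes: t.bytes }));
}

const GIB = 1024 ** 3;

// Made-up transfers; the share path is a placeholder for your own CIFS path.
const profileTransfers = [
  { user: "alice", path: "\\\\fs01\\profiles\\alice\\Desktop\\installer.iso", bytes: 4 * GIB },
  { user: "bob", path: "\\\\fs01\\profiles\\bob\\Music\\collection.zip", bytes: 20 * GIB },
  { user: "carol", path: "\\\\fs01\\profiles\\carol\\ntuser.dat", bytes: 2 * 1024 * 1024 },
  { user: "dave", path: "\\\\fs01\\apps\\setup.iso", bytes: 6 * GIB },
];
console.log(findProfileBloat(profileTransfers, "\\\\fs01\\profiles", 1 * GIB));
```

Only the two oversized files under the profile share come back, with the 20 GiB one first; the large file outside the profile path is ignored.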

PVS Trigger: (Assign to VDAs or PVS farm)
These are layer 4 turn time triggers set up specifically to look for PVS traffic. I have not seen a problem PVS environment yet, so I am not sure what to provide here. You could use this to confirm that everybody booted up properly or, if you are having issues, to check the performance and turn times of the PVS traffic.

Slow ICA load time debugging: (Assign to VDAs)
In this trigger I am earmarking what counts as a slow launch versus a normal launch. This trigger produces the metrics on the grid that shows Fast and Slow launches. I am also looking for CIFS errors that could cause issues with the Citrix load times.
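
The fast/slow split boils down to a threshold check. Here is a minimal sketch in plain JavaScript, assuming each launch record carries a load time and a count of CIFS errors seen during the launch. classifyLaunches, the field names, the sample launches and the 30 second threshold are all hypothetical; the real cutoff is whatever you decide is slow for your environment.

```javascript
// Plain JavaScript sketch; not ExtraHop trigger API code.
// classifyLaunches and the { user, loadTimeMs, cifsErrors } shape are hypothetical.
function classifyLaunches(launches, slowThresholdMs) {
  const summary = { fast: 0, slow: 0, slowWithCifsErrors: [] };
  for (const launch of launches) {
    if (launch.loadTimeMs > slowThresholdMs) {
      summary.slow += 1;
      // Slow launches that also saw CIFS errors are the first ones to look at.
      if (launch.cifsErrors > 0) summary.slowWithCifsErrors.push(launch.user);
    } else {
      summary.fast += 1;
    }
  }
  return summary;
}

// Made-up launches and an assumed 30 second threshold.
const exampleLaunches = [
  { user: "alice", loadTimeMs: 12000, cifsErrors: 0 },
  { user: "bob", loadTimeMs: 45000, cifsErrors: 3 },
  { user: "carol", loadTimeMs: 38000, cifsErrors: 0 },
];
console.log(classifyLaunches(exampleLaunches, 30000));
```

With these numbers you get one fast launch, two slow ones, and bob flagged as the slow launch that also hit CIFS errors.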

XenApp DNS Server Performance: (Assign to VDAs)
Few people have a full appreciation for how badly slow DNS will wreck everything. This trigger keeps track of DNS performance for your Citrix farm. It can be used to troubleshoot slow logons, slow applications and the all-around general weirdness that ensues when DNS is less than pristine.

XenApp DNS Timeouts: (Assign to VDAs)
Self-explanatory. It is very important that you not have DNS timeouts for legit records such as SRV records or application server records.
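
For anyone who wants the concept spelled out, a timeout is just a request that never gets an answer in time. The plain JavaScript sketch below pairs requests with responses by transaction ID and returns the ones left hanging. It is a conceptual illustration only, not how the trigger or the appliance detects timeouts: findDnsTimeouts, the event shape, the host names and the 5 second timeout are all assumptions.

```javascript
// Plain JavaScript sketch; not ExtraHop trigger API code.
// findDnsTimeouts and the event fields are hypothetical.
function findDnsTimeouts(events, timeoutMs, nowMs) {
  const pending = new Map(); // txId -> request still waiting for an answer
  for (const ev of events) {
    if (ev.kind === "request") {
      pending.set(ev.txId, ev);
    } else if (ev.kind === "response") {
      const req = pending.get(ev.txId);
      // Only a response that arrives within the timeout clears the request.
      if (req && ev.timeMs - req.timeMs <= timeoutMs) pending.delete(ev.txId);
    }
  }
  // Whatever is still pending and older than the timeout counts as timed out.
  return [...pending.values()]
    .filter((req) => nowMs - req.timeMs > timeoutMs)
    .map((req) => ({ name: req.name, recordType: req.recordType }));
}

// Made-up events: one answered SRV lookup, one unanswered, one answered too late.
const dnsEvents = [
  { kind: "request", txId: 1, timeMs: 0, name: "_ldap._tcp.example.test", recordType: "SRV" },
  { kind: "response", txId: 1, timeMs: 20 },
  { kind: "request", txId: 2, timeMs: 100, name: "app01.example.test", recordType: "A" },
  { kind: "request", txId: 3, timeMs: 200, name: "sql01.example.test", recordType: "A" },
  { kind: "response", txId: 3, timeMs: 5300 },
];
console.log(findDnsTimeouts(dnsEvents, 5000, 10000));
```

Here the SRV lookup is fine, while app01 (never answered) and sql01 (answered after the timeout) are both reported.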

XenApp Zero Windows:
Here we are monitoring every time the XenApp/XenDesktop server closes its TCP window (usually due to I/O-related issues) as well as when its peers close their windows. This can be handy when you get the proverbial “Citrix is slow” call and you notice that the back-end database or web server is closing its TCP window.

XML Broker Performance: (Assign to XML Brokers)
Again, while looking in Wireshark I noticed that whenever apps were enumerated, a uri-stem containing wpnbr.dll would show up. So we key off the performance of this URI and provide metrics on it.
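
Keying off one URI and summarizing its timing can be sketched in a few lines of plain JavaScript. This is an illustration rather than the bundled trigger: summarizeUriTiming, the record fields, the placeholder URIs and the timings are invented, and the DLL name is passed in as a parameter so you can match whatever stem you see in your own capture.

```javascript
// Plain JavaScript sketch; not ExtraHop trigger API code.
// summarizeUriTiming and the { uri, processingTimeMs } shape are hypothetical.
function summarizeUriTiming(transactions, stem) {
  const needle = stem.toLowerCase();
  const times = transactions
    .filter((t) => t.uri.toLowerCase().includes(needle))
    .map((t) => t.processingTimeMs)
    .sort((a, b) => a - b);
  if (times.length === 0) return null;
  const sum = times.reduce((acc, t) => acc + t, 0);
  // Nearest-rank 95th percentile on the sorted timings.
  const p95 = times[Math.ceil(0.95 * times.length) - 1];
  return {
    count: times.length,
    meanMs: sum / times.length,
    p95Ms: p95,
    maxMs: times[times.length - 1],
  };
}

// Made-up transactions; the paths are placeholders.
const brokerTransactions = [
  { uri: "/placeholder/wpnbr.dll", processingTimeMs: 10 },
  { uri: "/placeholder/wpnbr.dll", processingTimeMs: 20 },
  { uri: "/placeholder/wpnbr.dll", processingTimeMs: 30 },
  { uri: "/placeholder/wpnbr.dll", processingTimeMs: 400 },
  { uri: "/placeholder/other", processingTimeMs: 999 },
];
console.log(summarizeUriTiming(brokerTransactions, "wpnbr.dll"));
```

The mean alone (115 ms here) hides the one 400 ms enumeration, which is why the sketch also reports the maximum and a high percentile.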

I felt the need to write this up to explain how we are gathering the data, what these triggers do and why they are important. I also want to assure readers that, at worst, they may have to edit these triggers, much like 80% of you have already had to do with PowerShell scripts that you have downloaded. We also open the interface up to you and have a support forum where you can get help when you want to write your own triggers. Unlike some of the more closed architectures that leave you little or no ability to customize the dashboard, we leave you your own canvas where you can paint your own operational masterpiece.

I will be recording a video that includes a walkthrough. There are a few other triggers in the bundle that are not mentioned here. They are not quite ready for the public, but send me an email if you would like to try them and we will work through it.

Thank you for reading

John M. Smith, CTP
