Custom User Groups based on Domains

In an hosting environment, you may have multiple customers logging into your Citrix environment. If you have domain trusts set up, the users may be authenticating and launching applications based on their domain credentials. In edgesight you can run reports based on user groups configured based on each individual domain, but first it has to be configured correctly.

The first thing that needs to happen is you have to get the list of domains from the Edgesight database.

SELECT     domain_name
FROM         vw_es_usergroup_ica_users
GROUP BY domain_name
ORDER BY domain_name

You are then presented with results like the following:

Getdomains

The information you need from this query is the text that is presented in the results. Next you will need to go into your edgesight website. GO to Configure –> Company Configuration –> User Groups. On this window you will need to click New User Group. Enter a name for the domain group you are about to create and make sure that “Take me to the Add Members Page” is checked.

CreateGroup

On the next page make sure you select “Queries” and click “Next” A new page will open and ask you to select a query. Here you will want to select “New Query”. This is where you need to remember your results from the first query. Once you click “New Query”, you will have to type in a name for the query, and in the query text window enter the following:

SELECT     sessid
FROM         vw_es_usergroup_ica_users
WHERE     (domain_name = 'Customer')

createquery
Save the query and test it out. You should see all the users in the domain you listed. In the bottom window make sure you click “Add Query” so that the group uses the query you just created. Congratulations! You can now use this user group to run reports.

Preparing for life without Edgesight with ExtraHop

So, the rumors have been swirling and I think we have all come to the quiet realization that Edgesight is going to be coming to an end. At least the Edgesight we know and Love/Hate.  While we all await the next version of HDX Edgesight we can almost be certain that the data model and all of the custom queries we have written over the last three years will not be the same.

For those of us who have continued with this labor of love trying squeeze every possible metric we could out of Edgesight we are likely going to have to come to grips with the fact that the next generation of Edgesight will not have the same level of metrics we have today.

Let’s be honest, Edgesight has been a nice concept but there has been extensive problematic issues with the agent both from a CPU standpoint (firebird service taking up 90% CPU) and keeping the versions consistent. The real-time monitoring requires elevated permissions of the person looking into the server forcing you to grant your service desk higher permissions than many engineers are comfortable with. I am, for the most part, a “tools”-hater. In the last 15 years I have watched millions of dollars spent on any number of tools, all of which told me that they would be the last tool I would need and all of them in my opinion, for the most part, underwhelming. I would say that Edgesight has been tolerable to me and it has done a great job of collecting metrics but, like most tools I have worked with, it is Agent based, also it cannot log in real-time. The console was so unusable that I literally have not logged into it for the last four years. (In case you were wondering why I don’t answer emails with questions about the console).

For me, depending on an agent to tell you there is an issue is a lot like telling someone to “yell for help if you start drowning”. If a person is under water, it’s a little tough for them to yell for help. With agents, if there is an issue with the computer, whatever that is (CPU, Disk I/O, Memory) will likely impact the agent as well. The next best thing, which is what I believe Desktop Director is using, is to interrogate a system via WMI. Thanks to folks like Brandon Shell, Mark Schill and the people at Citrix who set up the Powershell SDK. This has given rise to some very useful scripting that has given us the real-time logs that we have desperately wanted. That works great for looking at a specific XenApp server but in the Citrix world where we are constantly “proving the negative” it does not provide the holistic view that Edgesight’s downstream server metrics provided.

Proving the negative:

As some of you are painfully aware, Citrix is not just a Terminal Services delivery solution. In our world, XenApp is a Web Client, a Database Client, Printing Client and a CIFS/SMB client. The performance of any of these protocols will result in a ticket resting in your queue regardless of the downstream server performance. Edgesight did a great job of providing this metric letting you know if you had a 40 second network delay getting to a DFS share or a 5000ms delay waiting for a server to respond. It wasn’t real-time but it was better than anything I had used until then.

While I loved the data that Edgesight provided, the agent was problematic to work with, I had to wait until the next day to actually look at the data, unless you ran your own queries and did your own BI integration you had, yet another, console to go to and you needed to provide higher credentials for the service desk to use the real-time console.

Hey! Wouldn’t it be great if there were a solution that would give me the metrics I need to get a holistic view of my environment? Even better, if it were agentless I wouldn’t have to worry about which .NET framework version I had; changes in my OS, the next Security patch that takes away kernel level access and just all around agent bloat from the other two dozen agents I already have on my XenApp sever. Not to mention the fact that the decoupling of GUIDs and Images thanks to PVS has caused some agents to really struggle to function in this new world of provisioned server images.

It’s early in my implementation but I think I have found one….Extrahop.

Extrahop is the brain-child of ADC pioneer Jesse Rothstein who was one of the original developers of the modern Application Delivery Controller. The way Extrahop works is that it sits on the wire and grabs pertinent data and makes it available to your engineer and, if you want, your Operations staff. Unlike wireshark, a great tool for troubleshooting; it does not force you, figuratively, to drink water from a fire hose. They have formed relationships with several vendors, gained insight into their packets and are able to discriminate between which packets are useful to you and which packets are not. I am now able to see, in real-time, without worrying about an agent, ICA Launch times and the Authentication time when a user launches an application. I can also see client latency, Virtual Channel Bytes In and Bytes Out for Printer, Audio, Mouse, Clipboard, etc.

(The Client-Name, Login time and overall Load time as well as the Latency of my Citrix Session)

In addition to the Citrix monitoring, it helps us with “proving the negative” by providing detailed data about Database, HTTP and CIFS connections. This means that you can see, in real-time, performance metrics of the application servers that XenAPP is connecting to. If there is a specific URI that is taking 300 seconds to process, you will see it when it happens without waiting the next day for the data or having to go to edgesightunderthehood.com to see if John, David or Alain have written a custom query.

If there is a conf file that has an improper DNS entry, it will show up as a DNS Query failure. If your SQL Server is getting hammered and is sending RTOs, you will see it in real-time/near-time and be able to save yourself hours of troubleshooting.

(Below, you see the different metrics you can interrogate a XenApp server for.)


Extrahop Viewpoints:
Another advantage of Extrahop is that you can actually look at metrics from the point of view of the downstream application servers as well. This means that if you publish an IE Application and it connects to a web server that integrates with a downstream database server you can actually go to that web server you have published in your application and look at the performance of that web server and the database server. If you have been a Citrix Engineer for more than three years, you should already be used to doing the other team’s troubleshooting for them but this will make it even faster. You basically get a true, holistic view of your entire environment, even outside of XenApp, where you can find bottlenecks, flapping interfaces and tables that need indexing. If your clients are on an internal network, depending on your topology you can actually look at THEIR performance on their workstations and tell if the switch in the MDF is saturated.

Things I have noted so far looking at Extrahop Data:

  • SRV Record Lookup failures
  • Poorly written Database Queries
  • Exessive Retransmissions
  • Long login times (thus long load times)
  • Slow CIFS/SMB Traffic
  • Inappropriate User Behavior

GEOCODING Packets:
Another feature I like is the geocoding of packets, this is very useful to use if you want to bind a geomap to your XenApp servers to see if there is any malware making connections to China or Russia, etc. (I have an ESUTH post on monitoring Malware with Edgesight.) Again, this gives me a real-time look at all of my TCP Connections through my firewall or I can bind it on a per-XenApp, Web Server or even PC node. The specific image below is of my ASA 5505 and took less than 15 seconds to set up (not kidding).

On the wire (Extrahop) vs. On the System (Agent):
I know most of us are “systems” guys and not so much Network guys. Because there is no agent on the system and it works on the wire, you have to approach it a little differently and you can see how you can live without an agent. Just about everything that happens in IT has to come across the wire and you already have incumbent tools to monitor CPU, Memory, Disk and Windows Events. The wire is the last “blind spot” that I have not had a great deal of visibility into from a tools perspective until I started using Extrahop. Yes there was wireshark but for archival purposes and looking at specific streams are not quite as easy. Yes, you can filter and you can “flow TCP Stream” with wireshark but it is going to give you very raw data. I even edited a TCPDUMP based powershell script to write the data to SQL Server thinking I could archive the data that way. I had 20GB of data inside of 30 minutes, with Extrahop you can actually trigger wire captures based on specific metrics and events that it sees in the flow and all of the sifting and stirring is done by Extrahop just leaving you to collect the gold nuggets.

Because it is agentless you don’t have questions like “Will Extrahop support the next edition of XenAPP?” “Will Extrahop Support Windows Server2012” “What version of the .Net Framework do I need to run Extrahop” “I am on Server Version X but my agents are on version Y”

The only question you have to answer to determine if your next generation of hardware/software will be compatible with Extrahop is “Will you have an IP Address?” If your product is going to have an IP Address, you can use Extrahop with it. Now, you have to use RFC Compliant protocols and Extrahop has to continue to develop relationships with vendors for visibility but in terms of deploying and maintaining it, you have a much simpler endeavor than other vendors. The simplicity of monitoring on the wire is going to put an end to some of the more memorable headaches I have had in my career revolving around agent compatibility.

Splunk/Syslog Integration:
So, I recently told my work colleagues that the next monitoring vendor that shows up saying I have to add yet another console I am going to say “no thanks”. While the Extrahop console is actually quite good and gives you the ability to logically collate metrics, applications and devices the way you like, it also has extensive Splunk integration. If there are specific metrics that you want sent to an external monitor, you can send them to your syslog server and integrate them into the existing syslog strategy be it Envision, KIWI Syslog Server or any other SIEM product. They have a javascript based trigger solution that allows you to tap into custom flows and cherry pick those metrics that are relevant to you. Currently, there is a very nice and extensive Splunk APP for Extrahop.

I am currently logging (in real-time) the following with Extrahop:

  • DNS Failures (Few people realize how poor DNS can wreck nth-tiered environments)
  • ICA OPEN Events (to get logon times and authentication times)
  • HTTP User Agent Data
  • HTTP Performance Data

So if this works by monitoring the wire, isn’t it the Network team’s tool?
The truth is it’s everybody’s tool, the only thing you need the network team to do is span ports for you (then log in and check out their own important metrics). You can have the DBA log in and check the performance of their queries, the Network Engineers can log in and check jitter, TCP retransmissions, RTOs and throughput, the Citrix guy can log in and check Client Latency, STA Ticket delivery times, ICA Channel throughput, Logon/Launch Times, the Security team can look for TCP Connections to China, Russia and catch people RDPing home to their home networks and the Web Team can go check which user-Agents are the most popular to determine if they need to spend more time accommodating tablets. Everybody has something they need on the wire; I sometimes fear that we tend to select our tools based on what technical pundits tell us too. In our world, from a vendor standpoint, we tend to like to put things in boxes (which is a great irony given everyone’s “think outside the box” buzz statement). We depend on thought leaders to put products in boxes and tell us which ones are leaders, visionaries, etc. I don’t blame them for providing product evaluations that way, we have demanded that. For me, Extrahop is a great APM tool but it is also a great Network Monitoring tool and has value to every branch of my IT Department. This is not a product whose value can be judged by finding its bubble in a Gartner scatter plot.

Conclusion:
I have not even scratched the surface of what this product can do. The triggers engine basically gives you the ability to write nearly any rule you want to log/report any metric you want. Yes, there are likely things you can get with an agent that you cannot get without an agent but in the last few years these agents have become a lot like a ball and chain. You basically install the appliance or import the VM, span the ports and watch the metrics come in. I have had to change my way of thinking of metrics gather from system specific to siphoning data off the wire but once you wrap your head around how it is getting the data you really get a grasp of how much more flexibility you have with this product than with other agent based solutions. The Splunk integration was the icing on the cake.

I hope to record a few videos showing how I am doing specific tasks, but please check out the links below as they have several very good live demos.

To download a trial version: (you have to register first)
http://www.extrahop.com/discovery/

Numerous webinars:
http://www.extrahop.com/resources/

Youtube Channel:
http://www.youtube.com/user/ExtraHopNetworks?feature=watch

Thanks for reading and happy holidays!

John


Anyone have an Edgesight Database they can donate so I can make some videos?

One of the barriers to conducting Edgesight Training is the fact that there is not a sample database that can be used to conduct a training session.

I would like to emulate what I have done with some of the Netscaler videos on the http://xen-trifuge.com site but I don’t have a database to test with.

If anyone has a database they can back up and make available for XenAPP and even XenDesktop/Endpoints  if possible. 

What you get?

I would like to record about six hours of Custom SQL Queries and maybe even some custom report writing.

Thanks

John

EdgeSight: Timezone offsets

Intro

If you have implemented any of the ad hoc SQL queries available on this site, you may have noticed that most time queries are offset by –4 or –5 hours. This is due to the fact that the EdgeSight database uses GMT to record time and John and I are located in the U.S. Eastern Time Zone.

In this post we will take a look at some tables in the EdgeSight database that you can utilize to make your queries more local and portable.

seamonsterThere Be Monsters Here!

Most of my experience with EdgeSight has been with the database views that summarize and organize the vast amounts of data that EdgeSight collects. On occasion I’ve gone where few dare to tread to look directly at the tables for the data I need.

EdgeSight’s views are dizzying enough, but the table structure of the EdgeSight database is intimidating to the SQL neophyte. Despite this, I was inspired to look deeper after David did his post on session counts. His query uses the ‘timezone’ table to determine the time offset for his query and this got me curious. How can I utilize this to make my queries easier to maintain and more portable?

Timezone table

Lets take a look at the timezone table

SELECT *
FROM timezone

image

The above picture is only a portion of the table. It consists of 74 rows. Yeah makes  total sense right? Naturally, I had to do some more checking. If you check the company table, we get a clue.

SELECT *
FROM company

image

As you can see in the above picture, each company in the EdgeSight database has an associated Time Zone and Language. In this case, we have a timezone id (tzid) of 13 and a culture_name of en-US. If we cross reference the tzid with the timezone table we get:

image

Looking at the result above, we can see that this is for the U.S. Eastern time zone and includes daylight savings time as well. You can configure this in the EdgeSight console by clicking on the Configure tab. Look under the Server Configuration section and click on Companies to see where to add/edit company information.

image

So for the example above, I have the language set to English and the time zone set to U.S. Eastern Time which has a GMT offset of –5 hours.

How does this help me?

Let’s take a look at a query I’ve posted on this site before:

DECLARE @today datetime
DECLARE @app varchar(20)
SET @today = convert(varchar,getdate(),111)
SET @app = 'notepad.exe'
SELECT DISTINCT CONVERT(VARCHAR(10),DATEADD(hh,-4,apptbl.time_stamp), 111) AS 'Date', serv.machine_name AS 'Server', serv.[user] AS 'Username', serv.client_name, serv.client_address, serv.client_version, icatbl.client_directory, apptbl.app_description, apptbl.exe_name, apptbl.exe_version
FROM vw_es_archive_application_usage apptbl, vw_ctrx_archive_server_start_perf serv, vw_es_usergroup_ica_users icatbl
WHERE apptbl.exe_name like '%'+@app+'%'
and apptbl.account_name <> 'UNKNOWN'
and serv.client_address not like '192%'
and icatbl.client_directory not like '\%'
and convert(varchar(10),dateadd(hh,-4,apptbl.time_stamp), 111) >= @today-30
and apptbl.sessid = serv.sessid and icatbl.sessid = serv.sessid
and CONVERT(VARCHAR(10),DATEADD(hh,-4,apptbl.time_stamp), 111) = CONVERT(VARCHAR(10),DATEADD(hh,-4,serv.time_stamp), 111)
ORDER BY CONVERT(VARCHAR(10),DATEADD(hh,-4,apptbl.time_stamp), 111), 'username'

As you can see above, all the timedate fields are offset by –4 hours. To keep from having to change the offset to –5 or –4 depending on what time of year it was (standard vs. daylight savings time), I developed a simple select query that determines the current offset by checking the timezone table.

DECLARE @tzbias INT
SELECT @tzbias = case when use_daylight = '0' then standard_bias else daylight_bias end from timezone where tzid = 13

In layman’s terms, look at the timezone table where the timezone id (tzid) is equal to 13. If the field ‘use_daylight’ is equal to zero, use the ‘standard_bias’ otherwise use the ‘daylight_bias’.

I’m setting whichever bias this query returns equal to the variable @tzbias. I then use the @tzbias variable in my timedate fields in my queries. If we rewrite the above query with the tzbias variable, we get the following:

DECLARE @tzbias INT
SELECT @tzbias = case when use_daylight = '0' then standard_bias else daylight_bias end from timezone where tzid = 13
DECLARE @today datetime
DECLARE @app varchar(20)
SET @today = convert(varchar,getdate(),111)
SET @app = 'notepad.exe'
SELECT DISTINCT CONVERT(VARCHAR(10),DATEADD(mi,@tzbias,apptbl.time_stamp), 111) AS 'Date', serv.machine_name AS 'Server', serv.[user] AS 'Username', serv.client_name, serv.client_address, serv.client_version, icatbl.client_directory, apptbl.app_description, apptbl.exe_name, apptbl.exe_version
FROM vw_es_archive_application_usage apptbl, vw_ctrx_archive_server_start_perf serv, vw_es_usergroup_ica_users icatbl
WHERE apptbl.exe_name like '%'+@app+'%'
and apptbl.account_name <> 'UNKNOWN'
and serv.client_address not like '192%'
and icatbl.client_directory not like '\%'
and convert(varchar(10),dateadd(mi,@tzbias,apptbl.time_stamp), 111) >= @today-30
and apptbl.sessid = serv.sessid and icatbl.sessid = serv.sessid
and CONVERT(VARCHAR(10),DATEADD(mi,@tzbias,apptbl.time_stamp), 111) = CONVERT(VARCHAR(10),DATEADD(mi,@tzbias,serv.time_stamp), 111)
ORDER BY CONVERT(VARCHAR(10),DATEADD(mi,@tzbias,apptbl.time_stamp), 111), 'username'

Since the timezone bias is in minutes, I had to change the DATEADD functions to use mi for minutes. Now I can use my queries year around without worrying about daylight savings time changes.

I hope this provides you some options when doing ad hoc queries against the EdgeSight database. As always, I welcome all comments and questions.

Thanks,
Alain

Are You There EdgeSight? It’s me Worker

Intro

If you rely on EdgeSight to provide accurate and timely information about your farm you have to assume that all your EdgeSight Worker Agents are functioning as expected.  Or do you?  In this post, we will review the information that the EdgeSight console provides you as well as creating a dashboard that can give you detailed information on your EdgeSight Worker agents.

EdgeSight Console: Configuration Tab

The first place you can check the health of your EdgeSight server and its agents in under the Configuration Tab

image

Along the left-hand side of this screen you will see way to configure your workers, alerts, and other server settings. We’re going to spotlight some items under Server Configuration and Server Status.

image

Server Configuration: Status
Your first overview of server health comes when you click on Status under Server Configuration.

image

The first line lists the workers that were and were not updated in the current 24 hour period as well as newly added workers.  Right away you see (in this case) that 48 workers updated and 32 did not.  That’s a large number of EdgeSight agents that have not uploaded their data into your database and therefore any reports you are running will not include these systems.  The question becomes which systems did not update and why?

ES_ZQUEUE…Gesundheit!

The service on the EdgeSight server that processes payloads from the worker agents is the es_zqueue (seen under Server Script Host Status).  This process is not reporting any issues and there are no pending payloads to process (we’ll look at this more later).

Server Status: Messages
Message Status lists all the system messages generated by EdgeSight.  This includes Agent errors, payload errors, and new agents alerts to name a few.

image

Here you will see which servers had a payload issue (Data Upload), but not a reason why systems have not updated the database.

Server Status: Server Script Host
Clicking on this in your EdgeSight Console will show you the following screen:

image

The es_zqueue manages the modules that keep the EdgeSight database updated, cleaned, and running smoothly. The core_zpd_loader 1 and 2 manage the data payloads from devices with the EdgeSight agent including errors. Clicking on the triangle will show the following menu.

image

Clicking on View Log will allow us to investigate why a payload might have failed or created an error.

4/13/2012 5:18:03 AM: PayloadLoader: Starting payload load for C:\Program Files (x86)\Citrix\System Monitoring\Server\EdgeSight\Data\WebLoad\Inst_33.zpd
4/13/2012 5:18:11 AM: PayloadLoader: Payload load completed with errors for C:\Program Files (x86)\Citrix\System Monitoring\Server\EdgeSight\Data\WebLoad\Inst_33.zpd. Error: -2146233088: Citrix.EdgeSight.Loader. System.Exception

As we can see if this example, the payload completed with an error and we can try searching Citrix to see if there is a resolution related to this error, but we do not see which server failed to upload any data.

I’ve walked through the diagnostic information that is available in the EdgeSight console to show that we still do not have a clear sign of which servers have updated the EdgeSight database recently. To address this issue, I did some digging around in the EdgeSight database and created a query that links the instance, machine, and OS_version tables.

The Query

DECLARE @tzbias INT
SELECT @tzbias = case when use_daylight = '0' then standard_bias else daylight_bias end from timezone where tzid = 13
SELECT    i.instid, m.name as 'System', ip_address AS 'IP', product_version AS 'ES Version',
CASE dept_set_type    WHEN 1 THEN 'XenApp' WHEN 2 THEN 'Endpoint' END AS 'ES Agent',
CONVERT(VARCHAR,DATEADD(mi,@tzbias,last_sync),100) AS 'Last Sync',
CONVERT(VARCHAR,DATEADD(mi,@tzbias,last_config_start),100) AS 'Last Config Check',
CONVERT(DECIMAL(19, 2),(last_db_size/1048576.0)) AS 'Last FBDB Size (MB)',
CASE o.short_name    WHEN 'Windows Server 2008' THEN 'W2K8'
WHEN 'Windows Server 2008 R2' THEN 'W2K8R2'
WHEN 'Windows Server 2003' THEN 'W2K3'
WHEN 'Windows XP'           THEN 'XP'
ELSE 'Other' END AS 'OS',
CASE o.ptype        WHEN 'Standard x64 Edition' THEN 'Std x64'
WHEN 'Professional'         THEN 'Pro'
WHEN 'Enterprise Edition'   THEN 'Ent'
WHEN 'Standard Edition'     THEN 'Std'
WHEN 'Enterprise x64 Edition' THEN 'Ent x64'
ELSE 'Other' END AS 'Edition',
sp_level,
CONVERT(VARCHAR,DATEADD(mi,@tzbias,tstamp),111) AS 'Date Added',
cps_farm_name,i.cps_product_name, i.cps_product_version, i.cps_product_service_pack
FROM instance i,machine m, os_version o
WHERE m.machid=i.machid and i.osid = o.osid
ORDER BY dateadd(mi,@tzbias,last_sync) DESC

The Report

image

Click on the image to see a larger version..

With this dashboard (I created it based on the query above in SQL Reporting Services) you can quickly see which servers have updated (Last Config Ck) and which have not. Armed with this information you can review your EdgeSight Agent worker schedules or check the agent on the system in question to make sure it is communicating with the EdgeSight server.

As always I welcome all questions and comments.

Thanks,
Alain