Go with the Flow! Extrahop’s FLOW_TICK feature

I was test-driving the new 3.10 firmware of Extrahop and noticed a feature that I had not seen before (it may have been there in 3.9 and I just missed it). There is a new trigger event called FLOW_TICK that basically monitors connectivity between two devices at layer 4, allowing you to see the response times between two devices regardless of the L7 protocol. This can be very valuable if you just want to see whether there is a network-related issue in the communication between two nodes. Say you have an HL7 interface or a SQL Server that an application connects to. You can now capture flows between those two devices, or even look at the round trip time of tiered applications from the client, to the web farm, to the back-end database. When you integrate it with Splunk, you get an excellent table or chart of the conversation between the nodes.

The Trigger:
The first step is to set up a trigger and select the “FLOW_TICK” event.

Then click on the Editor and enter the following text (you can copy/paste it and it should appear as in the graphic below):

log("RTT " + Flow.roundTripTime)
RemoteSyslog.info(
    " eh_event=FLOW_TICK" +
    " ClientIP=" + Flow.client.ipaddr +
    " ServerIP=" + Flow.server.ipaddr +
    " ServerPort=" + Flow.server.port +
    " ServerName=" + Flow.server.device.dnsNames[0] +
    " RTT=" + Flow.roundTripTime
)
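
If the syslog volume gets noisy, you can also gate the log call inside the trigger itself. A minimal sketch of a variation, using only the FLOW_TICK properties shown above (the port is just an example):

// Hypothetical variation: only report flows to SQL Server (port 1433)
if (Flow.server.port == 1433) {
    RemoteSyslog.info(
        " eh_event=FLOW_TICK" +
        " ClientIP=" + Flow.client.ipaddr +
        " ServerIP=" + Flow.server.ipaddr +
        " ServerPort=" + Flow.server.port +
        " RTT=" + Flow.roundTripTime
    );
}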

Integration with Splunk:
So if you have your integration with Splunk set up, you can start consulting your Splunk interface to see the performance of your layer 4 conversations using the following search:
sourcetype="Syslog" FLOW_TICK | stats count(_time) as TotalSessions avg(RTT) by ClientIP ServerIP ServerPort

This should give you a table that looks like the one below (note you have the client/server, the port, and the total number of sessions as well as the round trip time).

If you want to narrow your search down, you can simply put a filter into the first part of your Splunk query. For example, if I wanted to look at just SQL traffic I would type the following:
sourcetype="Syslog" FLOW_TICK 1433
| stats count(_time) as TotalSessions avg(RTT) by ClientIP ServerIP ServerPort

By adding 1433 (or whatever port you want to filter on) you can restrict the results to just that port. You can also filter on an IP address, as in the sketch below.
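
A hedged example of an IP filter (the client address here is made up; substitute one of your own):

sourcetype="Syslog" FLOW_TICK ClientIP="10.10.1.25"
| stats count(_time) as TotalSessions avg(RTT) by ServerIP ServerPort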

INFOSEC Advantage:
Perhaps an even better function of the FLOW_TICK event is the ability to monitor egress points within your network. One of my soapbox issues in INFOSEC is the fact that practitioners beat their chests about what incoming packets they block, but until recently, the few that got in could take whatever the hell they wanted and leave unmolested. Even a mall security guard knows that nothing is actually stolen until it leaves the building. If a system is infected with malware, you have the ability, when you integrate Extrahop with Splunk and the Google Maps add-on, to see outgoing connections over odd ports. If you see a client on your server segment (not your workstation segment) making 6,000 connections to a server in China over port 8016, that is probably something you should look into.
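
Before getting to the maps, here is a hedged sketch of how you might surface that pattern using the fields from the trigger above (the port exclusions and session threshold are arbitrary examples, not a recommendation):

sourcetype="Syslog" FLOW_TICK ServerPort!=80 ServerPort!=443
| stats count(_time) as TotalSessions by ClientIP ServerIP ServerPort
| where TotalSessions > 1000
| sort - TotalSessions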

When you integrate with the Splunk Google Maps add-on you can use the following search:
sourcetype="Syslog" FLOW_TICK | rex field=_raw "ServerIP=(?<IP>.[^:]+)\sServerPort" | rex field=_raw "ServerIP=(?<NetID>\b\d{1,3}\.\d{1,3}\.\d{1,3})" | geoip IP | stats avg(RTT) by ClientIP IP ServerPort IP_city IP_region_name IP_country_name

This will yield the following table. (Note that you can see a number of connections leaving the network for China and New Zealand; the Chinese connections I made on purpose for this lab, and the New Zealand connections are NTP connections embedded into XenServer.)

If you suspected you were infected with malware and wanted to see which subnets were infected, you would use the following Splunk query:
sourcetype=”Syslog” FLOW_TICK
%MalwareDestinationAddress%
| rex field=_raw "ServerIP=(?<IP>.[^:]+)\sServerPort" | rex field=_raw "ClientIP=(?<NetID>\b\d{1,3}\.\d{1,3}\.\d{1,3})" | geoip IP | stats count(_time) by NetID

Geospatial representation:
Even better, if you want to do some big-time geospatial analysis with Extrahop and Splunk, you can use the Google Maps application by entering the following query into Splunk:
sourcetype="Syslog" FLOW_TICK | rex field=_raw "ServerIP=(?<IP>.[^:]+)\sServerPort" | rex field=_raw "ClientIP=(?<NetID>\b\d{1,3}\.\d{1,3}\.\d{1,3})" | geoip IP | stats avg(RTT) by ClientIP NetID IP ServerPort IP_city IP_region_name IP_country_name | geoip IP

Conclusion:
I apologize for the regex on the ServerIP field; for some reason I wasn't getting consistent results with my data, and you should be able to geocode the ServerIP field without any issues. As you can see, FLOW_TICK gives you the ability to monitor the layer 4 communications between any two hosts, and when you integrate it with Splunk you get some outstanding reporting. You could actually look at the average round trip time to a specific SQL Server or web server by subnet. This could quickly allow you to determine whether you have an issue in the MDF or a problem on the actual server. From an INFOSEC standpoint, this is fantastic; your INFOSEC team would love to get this kind of data on a daily basis. Previously, I used a custom Edgesight query to deliver a report that I would look over every morning to see if anything looked inconsistent. If you see an IP making a 3389 connection to an IP on FIOS or COMCAST, then you know someone is RDPing home. More importantly, the idea that an INFOSEC team is going to be able to be responsible for everyone's security is absurd. We, as sysadmins and shared services folks, need to take responsibility for our own security. Periodically validating egress is a great way to find out quickly if malware is running amok on your network.

Thanks for reading

John M. Smith

Where wire data meets machine data

So what exactly IS wire data? We have all heard a lot about machine data, but most folks do not know what wire data is or how it can both augment your existing Operational Intelligence endeavor and provide better metrics than traditional APM solutions. Extrahop makes the claim that they are an agentless solution. They are not unique in the claim, but I believe they are pretty unique in the technology. It comes down to a case of trolling versus polling. Example: a scripted SNMP process is "polling" a server to see if there are any retransmissions. Conversely, Extrahop is trolling for data as a passive network monitor and sees the retransmissions as they occur on the wire. Polling is great as long as the condition you are worried about is happening at the time you poll. It is the difference between asking "Are you having retransmissions?" (SNMP polling) and observing "I see you are having a problem with retransmissions" (wire data). Both are agentless, but there is a profound difference in the value each solution delivers.

Where an agent-driven solution will provide insight into CPU, disk and memory, wire data will give you the performance metrics of the actual layer 7 applications. It will tell you what your ICA latency is as measured on the wire, which SQL statements are running slow and which ones are not, and which DNS records are failing. The key thing to understand is that Extrahop works as a surveillance tool; it is not running on a specific server asking WMI fields for their current values. This is profoundly different from what we have seen in traditional tools over the last 10-12 years.

When Machine data meets Wire Data:
I am now over 9 months into my Extrahop deployment, and we have recently started a POC with Splunk. The first task I performed was to integrate Extrahop wire data into the Splunk big data back end. All I can say is that it has been like yin and yang. I am extremely pleased with how the two products integrate, and the fusion of wire data with machine data will give my organization a level of visibility it has never had before. This is, in my opinion, the last piece of the Operational Intelligence puzzle.

In this post I want to talk about three areas where we have been able to see profound improvement in our environment and some of the ways we have leveraged Splunk and Extrahop to accomplish this.

How does data get from Extrahop to Splunk?
Extrahop has a technology called triggers. Basically, there is a mammoth amount of data flowing through your Extrahop appliances (up to 20Gb per second), and triggers let you tap into that data as it flows by and send it to Splunk via syslog. This lets me tap into CIFS, SQL, ICA, MQSeries, HTTP, MySQL, NFS and Oracle flows, as well as other layer 7 protocols, and send information from those flows (such as statement, client IP, server and process time for SQL) to Splunk, where I can take advantage of its parsing and big data business intelligence. This takes data right off the wire and puts it directly into Splunk, just like any Unix-based system or Cisco device that is set to use syslog. What I like about Extrahop is the ability to discriminate between what you send to Splunk and what you don't.

Extrahop/Splunk Integration: SQL Server Queries

Grabbing SQL Queries and reporting on their performance:
One of the most profound capabilities we have noted since we started integrating Splunk and Extrahop is the ability to create a flow and then cherry-pick metrics from it. Below you will see a pair of Extrahop triggers (the drivers of the Splunk integration). The first trigger builds the flow by taking the DB.statement and DB.procedure fields (pre-parsed on the wire) and storing them on a flow that you can then tap into when you send your syslog message in the next trigger.

The stmt variable (var stmt) refers to the flow we just created above. We read this flow back, pull from it key metrics such as statement and procedure, and couple them with DB.tprocess to tie in the process time of specific SQL statements.

At the bottom you see the RemoteSyslog.info command that sends the data to Splunk (or KIWI with SQL, or what we call “skunk”).

Note below that I am NOT logging the database name, but that is a trigger option in Extrahop if you have more than one database that uses similar table names. Also note the condition if (DB.tprocess >= 0): I am basically grabbing every database process. This measurement is in milliseconds, so if you only wanted to check database queries that took longer than one second it would read if (DB.tprocess >= 1000).
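
Since the trigger text itself lives in the screenshots, here is a minimal sketch of the pair in Extrahop's trigger language. It assumes the DB_REQUEST and DB_RESPONSE events and the Flow.store API; treat it as an outline to check against your firmware's trigger reference rather than a drop-in:

// Trigger 1 (assigned to DB_REQUEST): stash the statement and
// procedure on the flow so the response trigger can read them back.
Flow.store.stmt = DB.statement;
Flow.store.proc = DB.procedure;

// Trigger 2 (assigned to DB_RESPONSE): pull the stored fields back,
// couple them with the process time, and ship the lot to Splunk.
var stmt = Flow.store.stmt;
var proc = Flow.store.proc;
if (DB.tprocess >= 0) {   // every query; use >= 1000 for one second plus
    RemoteSyslog.info(
        " eh_event=DB_RESPONSE" +
        " ClientIP=" + Flow.client.ipaddr +
        " ServerIP=" + Flow.server.ipaddr +
        " Statement=" + stmt +
        " Procedure=" + proc +
        " tprocess=" + DB.tprocess
    );
}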


For myself, I assign both of these triggers to my Citrix XenApp servers, and they report on the database transactions that occur in my Citrix environment. Obviously, you can apply these triggers to your web services and individual clients as well as the database servers themselves. In my case, I already had a device group for the XenApp servers.

This translates into the metrics you see below where Splunk automatically parses the data for me (YES!) and I am ready to start drilling into it to find problem queries, tables and databases.

Below you see how easy it is (well, I recommend the O’Reilly Regular Expressions book) to parse your wire data and report on the performance of specific queries. As you can see below, this allows you to see the performance of specific queries and get an understanding of how specific tables (and their corresponding indexes) are performing. The information you see in the graphic below can be delivered to Splunk in real time, and you can get this kind of insight without running SQL Profiler. If users are logging into the application with Windows credentials, you will have the user ID as well.

Also, you don’t have to type the regex every time: you can save the query below as a macro and never type it again. You can also make that rex field a static column. I am NOT a regex guru; I managed to get every field parsed with a book and Google.
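
For instance, a Splunk search macro is just a stanza in macros.conf (the macro name here is hypothetical):

[insert_table]
definition = rex field=_raw "Statement=insert INTO\s(?<Table>.[^\s]+)"

After that, typing `insert_table` (in backticks) in any search expands to the full rex.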

For me, this now allows reporting on average process time by:

  • Database Server
  • User ID
  • Database Table (if you know a little regex)
  • Database
  • Client Subnet
  • Client IP Address
  • Individual Stored Procedure

Basically, once your data is indexed by Splunk's big data back end, you have "baseball stats": it is crazy what you can report on (example: who hit the most home runs from the left side of the plate in an outdoor stadium during the month of July). You can get every bit as granular as that in your reporting, and even more so.

Extrahop/Splunk Integration: DNS Errors

Few issues are as maddening and infuriating as DNS resolution issues. A Windows client can pine for resolution for as long as five seconds. This can create some serious hourglass time for your end users and impact the performance of tiered applications. An errant mistake in a .conf file mapping to an incorrect host can be an absolute needle in a haystack. With the Splunk integration (Extrahop's own console does a great job of this as well), you can capture DNS lookup failures as they happen in real time, taking them off the wire and into your Splunk big data platform. Below you see the raw data as it happens. (I literally went to a DOS prompt and started typing NSLOOKUP on random made-up names. The great irony being that in this age of domain squatting, a third of them actually came back!) As my mentor and brother James "Jim" Smith once told me, "If you have issues that are sometimes there and sometimes not there, it's probably DNS," or "If DNS is not absolutely pristine, funny things happen," or my all-time favorite quote from my brother Jim: "Put that GOD DAMN PTR record in or I will kick your phucking ass!" Needless to say, my brother Jim is rather fond of Extrahop's DNS failure record keeping.

Below you see a very simple trigger that logs the client IP, the DNS server IP and the DNS query that was attempted; the condition is set so that it fires only in the event of an error.
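
The screenshot carries the actual trigger text; a rough equivalent, assuming the DNS_RESPONSE event and the DNS.qname and DNS.error properties (verify the names against your trigger API reference):

// Assigned to DNS_RESPONSE; logs only failed lookups
if (DNS.error != null) {
    RemoteSyslog.info(
        " eh_event=DNS_ERROR" +
        " ClientIP=" + Flow.client.ipaddr +
        " ServerIP=" + Flow.server.ipaddr +
        " Query=" + DNS.qname +
        " Error=" + DNS.error
    );
}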

Below is the resultant raw data.


As with the SQL data we had above, we get more parsing goodness because we are integrating the data into Splunk. Note the server cycling through the domain DNS suffix, thus doubling the number of failures.

So within the same vein as "baseball stats," you can report on DNS lookup failures by DNS query (as you see above, the records that most often fail to resolve), but you also have the ability to report on the following (a sample search follows the list):

  • DNS Failures by Client IP and by DNS Query (Which server has the misconfigured conf file)
  • DNS Failures by DNS Server
  • DNS Failures by Subnet (bad DHCP setting?)

Proper DNS pruning and maintenance takes time. Until Extrahop, I cannot think of how I would have monitored DNS failures outside of Wireshark (a great tool, but without much big data or business intelligence behind it). The ability to keep track of DNS failures will go a very long way toward keeping the DNS records tight. This translates into faster logon times (especially if SRV lookups are failing) and better overall client-server and nth-tier application performance.

Extrahop/Splunk Integration: Citrix Launch Times

One of the more common complaints Citrix admins hear about is slow launch times. There are a number of variables that Extrahop can help you measure, but for this section we will simply cover how to keep track of your launch times.

Below you see a basic trigger that will keep track of the load time and login time. I track both of these metrics because, if the login time is 80-90% of the overall load time, you likely need to take a look at group policies or possibly loopback processing. This can give you an idea of where to start. If you have a low loginTime metric but a high loadTime metric, it could be something network/DNS related. You create this trigger and assign it to all XenApp servers.
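
The trigger itself is in the screenshot; a bare-bones sketch, assuming the ICA_OPEN event and the ICA.program, ICA.loadTime and ICA.loginTime properties (the property names are from memory, so double-check them in the trigger reference):

// Assigned to ICA_OPEN; reports launch metrics for each session
RemoteSyslog.info(
    " eh_event=ICA_OPEN" +
    " ClientIP=" + Flow.client.ipaddr +
    " ServerIP=" + Flow.server.ipaddr +
    " Application=" + ICA.program +
    " loadTime=" + ICA.loadTime +
    " loginTime=" + ICA.loginTime
);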


The raw data: below you see the raw data. I am not getting a username yet (there is a trick to that which I will cover later), but you can see the client name, client IP and server IP, and I would have my server name if my DNS were in order (luckily my brother Jim isn't here).

As with the previous two examples, you now can start to generate metrics on application launch performance.


And once again with the baseball stats theme, you can get the following metrics once your Extrahop data is integrated into Splunk (a sample search follows the list):

  • Average Launch time by UserName
  • Average Launch time by Client Name
  • Average Launch time by Client IP
  • Average Launch time by Customer Subnet (using some regex)
  • Average Launch time by Application (as you see above)
  • Average Launch time by XenApp Server (pinpoint a problem XenApp server)
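
For example, average launch time by application might look like this (the field names assume the trigger sketch above):

sourcetype="Syslog" ICA_OPEN
| stats avg(loadTime) as AvgLoad avg(loginTime) as AvgLogin by Application
| sort - AvgLoad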

Conclusion:
While I did not show the Extrahop console in this post (it is quite good), I wanted to show how you can integrate wire data into your Splunk platform and make it available alongside your machine data. While you are not going to see CPU, disk IOPS or memory utilization on the wire, you will see some extremely telling and valuable data. I believe that all systems and system-related issues manifest themselves on the wire at some point. An overloaded SQL Server will start giving you slower ProcessTime metrics; a flapping switch in an MDF at a remote site might start showing slower launch times in Citrix; and a misconfigured .conf file may cause lookup failures for the tiered applications that you run. These are all metrics that may not manifest themselves in agent-driven tools, but you can note them on the wire. Think of Extrahop as your "pit boss," engaging in performance surveillance of your overall systems.

I have found the integration between Splunk and Extrahop gives me a level of visibility that I have never had in my career. This is the perfect merger of two fantastic data sources.

In the future I hope to cover integration for HTTP and CIFS, as well as discuss the security benefits of sending wire data to Splunk.

Thanks for reading.

John M. Smith

Useful Regex statements

Getting the Subnet ID (24 bit Mask)
This is the rex statement that will let you query by 24-bit subnet ID. This will let you check Citrix latency and load/launch times by subnet within a customer's network.

rex field=client_ip “(?<net_id>\d+\.\d+\.\d+)” | stats avg(Load) count(load_time) by net_id

Getting performance on SQL INSERT statements:

The regex below will let you get the actual table that an INSERT command is updating. This could be useful to see if SQL write actions are not performing as expected. This rex will parse out the table name so that you can check the performance of specific tables.

rex field=_raw "Statement=insert INTO\s(?<Table>.[^\s]+)"

Getting the Table Name within a SELECT statement:
The REX statement below allows you to get the table that a select statement is running against.  Mapping the performance by Table name may give you an indication that you need to re-index.
| rex field=_raw "[fF][rR][oO][mM]\s(?<Table>.[^\s]+)"
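
A hedged usage example: the DB_RESPONSE marker and tprocess field assume a trigger formatted like the SQL example earlier on this site, so adjust them to whatever your trigger actually logs:

sourcetype="Syslog" DB_RESPONSE
| rex field=_raw "[fF][rR][oO][mM]\s(?<Table>.[^\s]+)"
| stats avg(tprocess) as AvgProcessTime by Table
| sort - AvgProcessTime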

From Buzzword to Buziness: A conversation about Operational Intelligence with 3 Pioneers at Citrix Synergy

While a family commitment is keeping me from moderating my Geek Speak session, you get a “Moderator Upgrade” in the form of Splunk’s Brandon Shell.

As we have discussed, there are changes to Citrix’s Edgesight Product as well as some great innovations by smaller APM players that are well worth looking into.

This Geek Speak features Jesse Rothstein, CEO of recent “Best of Interop” awardee Extrahop; Jason Conger, an architect at Splunk; and Dana Gutride, an architect for Citrix’s next generation of monitoring.

In this session we will question the dominant paradigm surrounding monitoring of the user experience and engage in a discussion about what it means to leverage Operational Intelligence.

 

Custom User Groups based on Domains

In a hosting environment, you may have multiple customers logging into your Citrix environment. If you have domain trusts set up, users may be authenticating and launching applications based on their domain credentials. In Edgesight you can run reports on user groups configured for each individual domain, but first it has to be configured correctly.

The first thing that needs to happen is to get the list of domains from the Edgesight database.

SELECT     domain_name
FROM         vw_es_usergroup_ica_users
GROUP BY domain_name
ORDER BY domain_name

You are then presented with results like the following:

(screenshot: the list of domain names returned by the query)

The information you need from this query is the text presented in the results. Next you will need to go into your Edgesight website. Go to Configure -> Company Configuration -> User Groups. On this window you will need to click New User Group. Enter a name for the domain group you are about to create and make sure that “Take me to the Add Members Page” is checked.

(screenshot: the New User Group page)

On the next page make sure you select “Queries” and click “Next”. A new page will open and ask you to select a query. Here you will want to select “New Query”. This is where you need to remember your results from the first query. Once you click “New Query”, you will have to type in a name for the query, and in the query text window enter the following, replacing 'Customer' with one of the domain names returned by the first query:

SELECT     sessid
FROM         vw_es_usergroup_ica_users
WHERE     (domain_name = 'Customer')

(screenshot: the new query window)
Save the query and test it out. You should see all the users in the domain you listed. In the bottom window make sure you click “Add Query” so that the group uses the query you just created. Congratulations! You can now use this user group to run reports.

Anyone have an Edgesight Database they can donate so I can make some videos?

One of the barriers to conducting Edgesight training is that there is no sample database that can be used to conduct a training session.

I would like to emulate what I have done with some of the Netscaler videos on the http://xen-trifuge.com site, but I don’t have a database to test with.

If anyone has a database for XenApp, or even XenDesktop/endpoints if possible, that they can back up and make available, I would greatly appreciate it.

What do you get?

I would like to record about six hours of Custom SQL Queries and maybe even some custom report writing.

Thanks

John

EdgeSight: Timezone offsets

Intro

If you have implemented any of the ad hoc SQL queries available on this site, you may have noticed that most time queries are offset by -4 or -5 hours. This is because the EdgeSight database records time in GMT, and John and I are located in the U.S. Eastern time zone.

In this post we will take a look at some tables in the EdgeSight database that you can utilize to make your queries more local and portable.

There Be Monsters Here!

Most of my experience with EdgeSight has been with the database views that summarize and organize the vast amounts of data that EdgeSight collects. On occasion I’ve gone where few dare to tread and looked directly at the tables for the data I need.

EdgeSight’s views are dizzying enough, but the table structure of the EdgeSight database is intimidating to the SQL neophyte. Despite this, I was inspired to look deeper after David did his post on session counts. His query uses the ‘timezone’ table to determine the time offset, and this got me curious: how can I utilize this to make my queries easier to maintain and more portable?

Timezone table

Let’s take a look at the timezone table:

SELECT *
FROM timezone

(screenshot: a portion of the timezone table)

The above picture is only a portion of the table, which consists of 74 rows. Yeah, makes total sense, right? Naturally, I had to do some more checking. If you check the company table, we get a clue.

SELECT *
FROM company

(screenshot: the company table)

As you can see in the above picture, each company in the EdgeSight database has an associated Time Zone and Language. In this case, we have a timezone id (tzid) of 13 and a culture_name of en-US. If we cross reference the tzid with the timezone table we get:

(screenshot: the matching row in the timezone table)

Looking at the result above, we can see that this is for the U.S. Eastern time zone and includes daylight savings time as well. You can configure this in the EdgeSight console by clicking on the Configure tab. Look under the Server Configuration section and click on Companies to see where to add/edit company information.

(screenshot: the Companies configuration page)

So for the example above, I have the language set to English and the time zone set to U.S. Eastern Time, which has a GMT offset of -5 hours.

How does this help me?

Let’s take a look at a query I’ve posted on this site before:

DECLARE @today datetime
DECLARE @app varchar(20)
SET @today = convert(varchar,getdate(),111)
SET @app = 'notepad.exe'
SELECT DISTINCT CONVERT(VARCHAR(10),DATEADD(hh,-4,apptbl.time_stamp), 111) AS 'Date', serv.machine_name AS 'Server', serv.[user] AS 'Username', serv.client_name, serv.client_address, serv.client_version, icatbl.client_directory, apptbl.app_description, apptbl.exe_name, apptbl.exe_version
FROM vw_es_archive_application_usage apptbl, vw_ctrx_archive_server_start_perf serv, vw_es_usergroup_ica_users icatbl
WHERE apptbl.exe_name like '%'+@app+'%'
and apptbl.account_name <> 'UNKNOWN'
and serv.client_address not like '192%'
and icatbl.client_directory not like '\%'
and convert(varchar(10),dateadd(hh,-4,apptbl.time_stamp), 111) >= @today-30
and apptbl.sessid = serv.sessid and icatbl.sessid = serv.sessid
and CONVERT(VARCHAR(10),DATEADD(hh,-4,apptbl.time_stamp), 111) = CONVERT(VARCHAR(10),DATEADD(hh,-4,serv.time_stamp), 111)
ORDER BY CONVERT(VARCHAR(10),DATEADD(hh,-4,apptbl.time_stamp), 111), 'username'

As you can see above, all the datetime fields are offset by -4 hours. To keep from having to change the offset to -5 or -4 depending on the time of year (standard vs. daylight savings time), I developed a simple SELECT query that determines the current offset by checking the timezone table.

DECLARE @tzbias INT
SELECT @tzbias = case when use_daylight = '0' then standard_bias else daylight_bias end from timezone where tzid = 13

In layman’s terms: look at the timezone table where the timezone id (tzid) is equal to 13; if the field ‘use_daylight’ is equal to zero, use the ‘standard_bias’, otherwise use the ‘daylight_bias’. (Since the bias is applied with DATEADD(mi, ...) below, for U.S. Eastern that should work out to -300 minutes standard and -240 minutes daylight.)

I’m setting whichever bias this query returns to the variable @tzbias. I then use the @tzbias variable in the datetime fields in my queries. If we rewrite the above query with the @tzbias variable, we get the following:

DECLARE @tzbias INT
SELECT @tzbias = case when use_daylight = '0' then standard_bias else daylight_bias end from timezone where tzid = 13
DECLARE @today datetime
DECLARE @app varchar(20)
SET @today = convert(varchar,getdate(),111)
SET @app = 'notepad.exe'
SELECT DISTINCT CONVERT(VARCHAR(10),DATEADD(mi,@tzbias,apptbl.time_stamp), 111) AS 'Date', serv.machine_name AS 'Server', serv.[user] AS 'Username', serv.client_name, serv.client_address, serv.client_version, icatbl.client_directory, apptbl.app_description, apptbl.exe_name, apptbl.exe_version
FROM vw_es_archive_application_usage apptbl, vw_ctrx_archive_server_start_perf serv, vw_es_usergroup_ica_users icatbl
WHERE apptbl.exe_name like '%'+@app+'%'
and apptbl.account_name <> 'UNKNOWN'
and serv.client_address not like '192%'
and icatbl.client_directory not like '\%'
and convert(varchar(10),dateadd(mi,@tzbias,apptbl.time_stamp), 111) >= @today-30
and apptbl.sessid = serv.sessid and icatbl.sessid = serv.sessid
and CONVERT(VARCHAR(10),DATEADD(mi,@tzbias,apptbl.time_stamp), 111) = CONVERT(VARCHAR(10),DATEADD(mi,@tzbias,serv.time_stamp), 111)
ORDER BY CONVERT(VARCHAR(10),DATEADD(mi,@tzbias,apptbl.time_stamp), 111), 'username'

Since the timezone bias is in minutes, I had to change the DATEADD functions to use mi (minutes). Now I can use my queries year-round without worrying about daylight savings time changes.

I hope this provides you some options when doing ad hoc queries against the EdgeSight database. As always, I welcome all comments and questions.

Thanks,
Alain

Headless Edgesight: Using Stored procedures to have key metrics emailed to you.

If you have been coming to the site for the last few years, you have probably noticed that we kind of specialize in a “Headless” Edgesight deployment. Dave, Alain and I all tend to steer away from the Edgesight console and focus more on running ad hoc queries or specialized reports. In this entry, I want to focus on leveraging stored procedures to provide you with the data you need via an email every morning, so that you can validate a few things in your environment and perhaps put yourself in a position to “right the ship” before the onrush of users piles onto the system at 8am.

What we will be covering today is coupling a stored procedure with an HTML-formatted SQL mail. To do this we require a few prerequisites:

1.) You have Database Mail configured (if you do not, there are better blogs than mine to walk you through that).

2.) You have enough of a basic understanding of SQL Server to schedule a stored procedure to run (or the wherewithal to follow the directions I provide).

While there are a number of canned reports in Edgesight, navigating through them or getting the exact metric you are looking for can be painstaking. Using the stored procedure method, you can schedule them to run and deliver key metrics, made up of whichever custom queries you choose, in a format that works both in desktop Outlook and on your smartphone.

First Stored Procedure:

Title: SP_CTX_REBOOTS

Purpose: Most of us have a reboot regimen that we put our XenApp servers through. This query/stored proc will email you the status of your nightly reboots. It will be up to you and your operations staff to know which servers need to get rebooted, but you can at least find out that the IMA service did not start on 10 of your 25 XenApp servers before you get your morning rush of users.

USE [EDGESIGHT]
GO
/****** Object:  StoredProcedure [dbo].[SP_CTX_REBOOTS]    Script Date: 04/05/2012 12:12:52 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- =============================================
-- Author:       <John Smith>
-- Create date:  <04/02/2012>
-- Description:  <Email Reboot Records>
-- =============================================
CREATE PROCEDURE [dbo].[SP_CTX_REBOOTS]
-- Add the parameters for the stored procedure here
AS
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
BEGIN
DECLARE @tableHTML NVARCHAR(MAX);
DECLARE @Result NVARCHAR(MAX);
DECLARE @today datetime;
SET @today = convert(varchar, getdate(), 111);

SET @tableHTML =
    N'<style type="text/css">h2, body {font-family: Arial, verdana;} table{font-size:10px; border-collapse:collapse;} td{background-color:#F1F1F1; border:1px solid black; padding:3px;}th{background-color:#99CCFF;}</style>' +
    N'<table border=0 width=90% cellspacing=0 cellpadding=3>' +
    N'<th>Time</th>' +
    N'<th>Machine</th>' +
    N'<th>Status</th>' +
    CAST((SELECT DISTINCT
                 td = dateadd(hh, -5, time_stamp), '               ',
                 td = Machine_name, '               ',
                 td = Service_State_name, '               '
          FROM vw_ctrx_archive_service_state
          WHERE dateadd(hh, -5, time_stamp) > getdate() - 1
          ORDER BY Machine_name
          FOR XML PATH('tr'), TYPE
         ) AS NVARCHAR(MAX)) +
    N'</table>';

EXEC msdb.dbo.sp_send_dbmail
    @profile_name = 'XenStats',
    @recipients = 'john.smith@xentrifuge.com',
    @body = @tableHTML,
    @body_format = 'HTML',
    @subject = 'Server Reboots',
    @importance = 'High';
END;

Second Stored Procedure:

Title: SP_CTX_Alerts

Purpose: This is actually a very valuable alert; it will provide you with the top errors/popups users receive when they are on your system. This is made up of both Windows and application-level alerts. I have noted a number of application errors (custom-written errors) that show up here because they popped up during the user's session. This can serve as a very valuable QA tool for both systems admins and developers to run down the errors that they see.

USE [EDGESIGHT]
GO
/****** Object:  StoredProcedure [dbo].[SP_CTX_ALERTS]    Script Date: 04/05/2012 12:12:52 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- =============================================
-- Author:       <John Smith>
-- Create date:  <04/02/2012>
-- Description:  <Email top Errors and popups>
-- =============================================
CREATE PROCEDURE [dbo].[SP_CTX_ALERTS]
-- Add the parameters for the stored procedure here
AS
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
BEGIN
DECLARE @tableHTML NVARCHAR(MAX);
DECLARE @Result NVARCHAR(MAX);
DECLARE @today datetime;
SET @today = convert(varchar, getdate(), 111);

SET @tableHTML =
    N'<style type="text/css">h2, body {font-family: Arial, verdana;} table{font-size:10px; border-collapse:collapse;} td{background-color:#F1F1F1; border:1px solid black; padding:3px;}th{background-color:#99CCFF;}</style>' +
    N'<table border=0 width=90% cellspacing=0 cellpadding=3>' +
    N'<th>EXE-Name(If Any)</th>' +
    N'<th>Alert Text</th>' +
    N'<th>Context</th>' +
    N'<th>Total</th>' +
    CAST((SELECT td = CASE WHEN exe_name IS NULL THEN 'None' ELSE exe_name END, '               ',
                 td = replace(Alert_text, '°', ''), '               ',
                 td = Context_text, '               ',
                 td = count(alert_text), '               '
          FROM vw_es_archive_alert
          WHERE dateadd(hh, -5, time_stamp) > getdate() - 1
          GROUP BY exe_name, alert_text, context_text
          HAVING count(alert_text) > 20
          ORDER BY count(alert_text) DESC
          FOR XML PATH('tr'), TYPE
         ) AS NVARCHAR(MAX)) +
    N'</table>';

EXEC msdb.dbo.sp_send_dbmail
    @profile_name = 'XenStats',
    @recipients = 'john.smith@xentrifuge.com',
    @body = @tableHTML,
    @body_format = 'HTML',
    @subject = 'Alerts and Popups',
    @importance = 'High';
END;

Scheduling the Stored Procedure:

Step 1: Add a new SQL Server Agent Job.

 

Step 2: Enter the name of the stored procedure.


Step 3: Go to “Steps” and configure the step to execute the stored procedure (set the database name to the Edgesight database).


Step 4: Schedule the job to run so that the report is waiting for you when you get in that morning.
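
If you prefer a script to the GUI, steps 1 through 4 can also be done with the msdb job procedures. A minimal sketch, with a hypothetical job name and schedule (adjust to your environment):

USE msdb;
GO
-- Create the job, point it at the Edgesight database, and schedule it daily
EXEC sp_add_job @job_name = N'CTX_Reboot_Report';
EXEC sp_add_jobstep @job_name = N'CTX_Reboot_Report',
    @step_name = N'Run SP_CTX_REBOOTS',
    @subsystem = N'TSQL',
    @database_name = N'EDGESIGHT',
    @command = N'EXEC dbo.SP_CTX_REBOOTS;';
EXEC sp_add_jobschedule @job_name = N'CTX_Reboot_Report',
    @name = N'Daily 6am',
    @freq_type = 4,              -- daily
    @freq_interval = 1,          -- every 1 day
    @active_start_time = 060000; -- 06:00:00
EXEC sp_add_jobserver @job_name = N'CTX_Reboot_Report';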

What does it look like? (You may have seen this at Briforum ’11.)

Basically, it is a very simple table that auto-sizes for your smartphone.

Time                 Machine   Status

2012-04-05T02:03:15 SERVER19 Machine Boot
2012-04-05T02:05:06 SERVER19 Service Started
2012-04-05T02:03:48 SERVER20 Machine Boot
2012-04-05T02:05:40 SERVER20 Service Started
2012-04-05T02:15:33 SERVER21 Machine Boot
2012-04-05T02:17:27 SERVER21 Service Started
2012-04-05T02:04:28 SERVER22 Machine Boot
2012-04-05T02:06:19 SERVER22 Service Started
2012-04-05T02:05:55 SERVER23 Machine Boot
2012-04-05T02:07:47 SERVER23 Service Started
2012-04-05T02:06:44 SERVER24 Machine Boot
2012-04-05T02:08:36 SERVER24 Service Started
2012-04-05T02:07:38 SERVER25 Machine Boot
2012-04-05T02:09:33 SERVER25 Service Started
2012-04-05T02:10:08 SERVER27 Machine Boot
2012-04-05T02:12:07 SERVER27 Service Started
2012-04-05T02:10:38 SERVER28 Machine Boot
2012-04-05T02:12:31 SERVER28 Service Started
2012-04-05T02:11:46 SERVER29 Machine Boot
2012-04-05T02:13:35 SERVER29 Service Started
2012-04-05T02:12:55 SERVER30 Machine Boot
2012-04-05T02:14:45 SERVER30 Service Started
2012-04-05T02:02:28 SERVER35 Machine Boot
2012-04-05T02:04:18 SERVER35 Service Started
2012-04-05T02:04:59 SERVER35 Service Started
2012-04-05T02:10:50 SERVER35 Service Started

Conclusion:
So, the two queries included in the stored procedures are very useful, but the point of this post is not to provide a useful query so much as a useful method. If you have some rudimentary SQL skills, you should be able to reverse engineer this to include any custom Edgesight query that you see on this site or that you have made up yourself. Sometimes it is nice to have a simple, formatted report automatically sent to you, a customer or your boss, so that they can get the key metrics they need on a consistent basis. I have found Edgesight to actually be more useful in a headless configuration (ad hoc queries and these stored procs) than through the console.

Thanks for reading! If you are at Synergy and you see me (big ugly guy with thick glasses), say “heya”.

John
