LPS: Custom Field Extraction

I may have mentioned this before but I use it so often, I think it deserves it's own post. I am often faced (as in very often) with a single field of data that has lots of disparate bits of data embedded within it. A perfect example is the cs-uri-query field contained in IIS logs, especially where Exchange is concerned. Here is an example:

ae=Folder&t=IPF.Appointment&id=LgAAAACZfhLxcD%2bnRZZG0OyEvOwcAQC3gm%2fVeZsgSIJcyGKb0MppAEwcvx5xAAAC&yr=2015&mn=3&dy=15&Initial+Budget>>Conn:1,HangingConn:0,AD:60000/60000/0%,CAS:90000/85862/7%,AB:60000/60000/0%,RPC:90000/87538/5%,FC:$null/0,Policy:DefaultThrottlingPolicy_67df4d2e-9f19-4270-9b8e-dec43773104b,Norm&v=14.3.174.1&mbx=MBX1.CONTOSO.COM&sessionId=891d047b451141b0a6406ca854de41b9&prfltncy=134&prfrpccnt=15&prfrpcltncy=78&prfldpcnt=1&prfldpltncy=15&prfavlcnt=0&prfavlltncy=0&End+Budget>>Conn:1,HangingConn:0,AD:60000/60000/0%,CAS:90000/85738/8%,AB:60000/60000/0%,RPC:90000/87463/5%,FC:$null/0,Policy:DefaultThrottlingPolicy_67df4d2e-9f19-4270-9b8e-dec43773104b,Norm[Resources:(Mdb)ABC-DAG09-DB32(Health:-1%,HistLoad:0),]

^That is a lot of info for a single field, not to mention not very review-friendly, and for the sake of my theoretical needs I really just wanted the MBX being used, the session ID and which throttling policy is being applied. I have no real need for this data right now, I'm just randomly grabbing these for the example hence the term "theoretical need".

It is easy to extract each bit of information we want into it's own field. I can for example use EXTRACT_PREFIX() and EXTRACT_SUFFIX() to do this. We could probably use SUBSTR() but that would potentially be much messier and we'd be dealing with string lengths which is going to break the moment the string we are dealing with changes.

With EXTRACT, we can use strings and grab exactly what we want. There is a shortcut in LPS which auto-creates a near ready to go combo of the EXTRACT functions, that shortcut is SHIFT+F12 and inserts the following into the LPS query window (yes, this shortcut exists because I grew extremely tired of typing it out by hand!):

EXTRACT_PREFIX(EXTRACT_SUFFIX(FieldName, 0, 'StartChar'), 0, 'EndChar') as FieldAlias

We only need to change the tokens to fit our data. Here are those tokens and their meaning:

FieldName - The name of the field we want to extract data from (in this case cs-uri-query)

StartChar - The starting character(s) we want to use as a marker to tell LPS where our data begins.

EndChar - The ending character(s) we want to use as a marker to tell LPS where our data ends.

FieldAlias - What we want our custom column to be named.

Since I want three items, I need three of these lines, one for each bit of data I want to extract (E.g.: Mailbox Server, SessionID and Throttling Policy). I'll go ahead and fill these in and you can compare to the example querystring above to see why I chose what I chose:

SELECT TOP 1000

EXTRACT_PREFIX(EXTRACT_SUFFIX(cs-uri-query, 0, '&mbx='), 0, '&') as MBXServer,

EXTRACT_PREFIX(EXTRACT_SUFFIX(cs-uri-query, 0, '&sessionId='), 0, '&') as [Session ID],

EXTRACT_PREFIX(EXTRACT_SUFFIX(cs-uri-query, 0, 'Policy:'), 0, ',') as Policy

FROM '[LOGFILEPATH]'

WHERE cs-uri-stem LIKE '%ev.owa%'

AND cs-username IS NOT NULL

*The WHERE statement above exists to make sure only records that contain the querystring data I expect are returned. If I did not do this, we'd get back some empty fields because the data we want to extract doesn't exist in every request in the IIS log. Here is an example result of the above:

  Mailbox Session ID Policy
  MBX2.CONTOSO.COM 8667ec4f36ao4660a3d8382de9dfbafe DefaultThrottlingPolicy_45df4d2e-9f19-4270-9b8e-deu22273104b
  MBX1.CONTOSO.COM 8667ec4f36ao4660a3d8382de9dfbafe DefaultThrottlingPolicy_54df4d2e-9f19-4270-9b8e-deu22273104b
  MBX3.CONTOSO.COM 86e44db93ec245e0b64f3feb2a68feeb DefaultThrottlingPolicy_54df4d2e-9f19-4270-9b8e-deu22273104b
  MBX1.CONTOSO.COM 86e44db93ec245e0b64f3feb2a68feeb DefaultThrottlingPolicy_54df4d2e-9f19-4270-9b8e-deu22273104b

Neat! Now this is something I can work with.

Being an Exchange Escalation Enginner I used Exchange data from my lab but this can be used with any type of data in LPS that has text, not just IIS logs. We could have just as easily used this on a CSV log or even text. Consider the following line of text and also consider it is in some text file somewhere:

foo=bar+bar=001,sometext||hello

We could easily use what we have learned to turn that into useful columns of data. In this example I'm using the TEXTLINELOG type to search my example text file:

SELECT TOP 1000

EXTRACT_PREFIX(EXTRACT_SUFFIX(Text, 0, 'foo='), 0, '+') as Foo,

EXTRACT_PREFIX(EXTRACT_SUFFIX(Text, 0, 'bar='), 0, ',') as Bar,

EXTRACT_PREFIX(EXTRACT_SUFFIX(Text, 0, '||'), 0, '+') as SomeText

FROM '[LOGFILEPATH]'

Result:

  Foo Bar SomeText
  bar 001 hello

When data mining all types of data with LPS, I find myself using this technique much of the time. It's just too handy to be able to extract data and view it as I need to see it which is the entire point.... Get to the data and represent it in a meaningful way. Just remember to use SHIFT+F12 to insert the template.