Log Parser Studio - Write your first query in less than 30 seconds - Part I

Article
06/05/2013

Though LPS contains over 170 preconfigured queries out-of-the-box there almost always comes a time when you have a question you like to get an answer for that an existing query doesn't quite cover. There are couple of things we can do, we can modify an existing query that is close to what we want or we can write our own. I often hear the following when speaking about LPS with others:

"I love using LPS but I don't know the first thing about writing SQL queries, SQL queries are complicated and difficult to write. "

It's true that SQL queries can get complicated but Log Parser query syntax is only a subset of the SQL you are used to hearing about so the good news is that you don't need to be a SQL DBA or anything near that to begin writing simple queries that will help you get answers to your questions. So, how can you write your first query in less than 30 seconds? Just click New Query in LPS, select a log type and click Run! OK, so LPS wrote it for you but the default new query is there both as a starting point and the simplest example of a query that pulls the first ten records found in the logs:

SELECT TOP 10 * FROM '[LOGFILEPATH]'

Let's dissect the query above to better understand what is going on. Once you have this basic understanding you'll soon see that it really isn't that hard to get started. I will be explaining these in layman 'Let's get some work done" terms where LPS is concerned only. This isn't a formal SQL course. :)

SELECT

We can always think of the SELECT statement as "Which columns from the log do I want to included in the results". SELECT will always appear as the very first statement in every query followed by the list of columns you wish to included. Let's assume a theoretical log file with the following columns: date, time, user, status, duration and show some examples. Actually let's create our own log file and use that in this tutorial. Begin by pasting the following text into notepad then save as "TestLog.CSV":

Date,Time,User,Status,Duration
6/4/2013,12:00:04,User 1,OK,123
6/4/2013,12:00:05,User 3,OK,516
6/4/2013,12:05:31,User 3,ERROR,731
6/4/2013,12:10:50,User 1,OK,826
6/4/2013,12:10:52,User 1,PENDING,154
6/4/2013,12:15:12,User 2,ERROR,213
6/4/2013,1:23:32,User 2,OK,22
6/4/2013,1:24:00,User 1,ERROR,546
6/4/2013,1:39:21,User 2,OK,987
6/4/2013,1:40:01,User 3,PENDING,265

Once saved as CSV, open the Log File Manager in LPS, choose "Add File", browse to and select Testlog.CSV. Create a new query in LPS and change the log file type to CSVLOG. Now that you are setup and ready, here are the examples that you can run against that log along with explanations as to how they work:

SELECT date, time, user FROM '[LOGFILEPATH]'

Display every record but only include the date, time and user columns in the result.

SELECT user, status FROM '[LOGFILEPATH]'

Display every record but only include the user and status columns.

SELECT * FROM '[LOGFILEPATH]'

Display every record including all columns (the asterisk is a wildcard for column names). You probably never want to use this exact query as it will return every single record which would be the same as just opening the file.

SELECT TOP 5 user, status FROM '[LOGFILEPATH]'

Display only the first 5 records and only include the user and status columns. Notice that field names are always separated by commas with no comma between the last field and the FROM statement. As an FYI, it is a common mistake to place that extra comma after the last field and the query will throw an error similar to "Expecting FROM statement" or similar when this happens. Another good thing to know is that queries are evaluated for validity from left-to-right, top-to-bottom so many times the seemingly cryptic error you may encounter when attempting to run a query containing a mistake will give a clue as to where the issue is.

TOP

TOP is always used to restrict the number of records returned and always follows the SELECT statement. It doesn't need much explanation but I'm including it because when writing queries the first thing you'll likely do is write a query that returns too many records and the whole idea of queries is to isolate what you need from the noise. TOP is a great way to protect your queries from information overload by restricting the total records returned but is purely optional where SELECT is not optional.

FROM

FROM means just what it says. "From what log file or resource will I be querying? " AKA the path to the log file(s). In LPS we typically use the placeholder token '[LOGFILEPATH]' instead of real filenames. This is swapped out with the real file names we chose in file manager when the query runs. As you can see we really only need SELECT, some field names and FROM to write the simplest queries from scratch; it really is that simple, but wait there's more. Just pulling records from a log as-is isn't much different than opening the file and looking at it, this is where the WHERE statement comes in.

WHERE

WHERE is our workhorse for extracting only what we care about and is used to search our logs for specific data via equality matching. Continuing with our theoretical log file above what if we wanted to only return records that contains user1? The following returns all records where the user field is User1:

SELECT * FROM '[LOGFILEPATH]'
WHERE user LIKE 'User1'

Notice the single quotes which is the default for string searches. Secondly notice the LIKE statement which gets a lot of use along with the WHERE statement. LIKE can also use wildcards which is the preferred in many cases. The wildcard is LPS is the % sign. Here are some additional examples using WHERE and LIKE:

SELECT * FROM '[LOGFILEPATH]'
WHERE user LIKE 'User1 % '

Find all records and display all columns for every record where the user field begins with User1. Notice the % wildcard is at the end of the string.

SELECT * FROM '[LOGFILEPATH]'
WHERE user LIKE ' % User1 % '

Find all records and display all columns for every record where the User1 appears anywhere in the user field. Wildcard is at beginning and end of string.

SELECT user, time, status FROM '[LOGFILEPATH]'
WHERE user LIKE '%User1%'
AND status LIKE '%Error%'

Find all records where the User1 appears anywhere in the user field AND where the status field contains the word error; display only the user, time and status fields. I threw in the AND statement to show that you can expand your criteria and narrow results easily. What if we wanted to look for errors for User1 OR User2?

SELECT user, time, status FROM '[LOGFILEPATH]'
WHERE (user LIKE '%User1%' OR user LIKE ' %User2% ')
AND status LIKE ' %Error% '

If your remember your math classes from high school you might remember the term "Operator Precedence" and it is being employed above because SQL queries observe a similar logic. In other words what is contained inside the parentheses is evaluated first, otherwise our query might return unexpected results when attempting to evaluate the OR between users and the AND for finding errors. When using AND/OR together you want to think about this so that your queries do what you expect them to do.

ORDER BY

The ORDER BY clause is also somewhat explanatory. It allows you to sort the records based on the fields you wish in either ascending or descending order. Here are a few examples with ORDER BY and expanding upon our progress:

SELECT user, time, status FROM '[LOGFILEPATH]'
WHERE (user LIKE '%User1%' OR user LIKE '%User2%')
AND status LIKE '%Error%'
ORDER BY User ASC

The above finds all records where User1 or User2 are in the user field and the status field contains the word error then sorts the results by user name in ascending order. ASC is not actually required and its omission automatically defaults in ascending order. Our two choices are ASC (ascending) and DESC (descending). If you want descending order you do need to include DESC since the default is ASC. I typically always include ASC because it makes the query more self-explanatory for others. You can also sort based on multiple fields using commas:

SELECT user, time, status FROM '[LOGFILEPATH]'
WHERE (user LIKE '%User1%' OR user LIKE '%User2%')
AND status LIKE '%Error%'
ORDER BY time, user ASC

AS is can be used to create aliases for field names. Field/column names in logs can sometimes be less than desirable for the story we want to tell. Maybe it's an IIS log and we would prefer "cs(user-agent)" display as "Client Device" instead because it makes more sense for the query/report we are going to generate. AS is extremely simple to use, simply place "AS Alias" directly after the field you wish to rename. Below we are using our same example log to rename the fields to our liking:

SELECT user AS [User Name] , time AS TimeLogged, status AS Result
FROM '[LOGFILEPATH]'
WHERE (user LIKE '%User1%' OR user LIKE '%User2%')
AND Result LIKE '%Error%'
ORDER BY time, user ASC

I exploited a few features above, let me explain. The brackets allow us to use spaces in our alias so that the user column will display as "User Name". Once the aliases are set they can also be used in the query itself so long as they are referenced after the alias is created. Notice how I used "Result" with the AND operator instead of the real field name of "status". I use aliases all the time as they are wonderful for converting what may be cryptic looking field names into something stakeholders better understand. Queries are about telling a story and aliases are a great tool for presenting results and reports that make sense to those you are delivering them to.

Field Names

When learning to write queries the first hurdle you may encounter is knowing the names of the actual fields. You have a 3GB log file and opening it up just to find field names isn't going to be very productive. For IIS logs it's easy in LPS, just click F3 and a window will popup that lists all default IIS fields. You can select the ones you care about and hit enter and they will be copied to the clipboard (with the commas) and you can just paste them into your query. :)

For other logs you can actually write a query to give you the field names that you can then use as a reference in your new query. Here's how I do it and there are actually some built-in sample queries that already perform this useful trick:

SELECT TOP 0 FROM '[LOGFILEPATH]'

You may be wondering why we want to run a query that returns zero records; well, we don't want records we want field names and the query above will do just that provided you have chosen the correct log file type. Just run the query and you will see the grid populate with column names only and no records. Now, right-click anywhere inside the empty grid and choose copy which will copy the field names to your clipboard so you can paste them and reference as you write your query. One thing I often do is save my "find field name" query and include the fieldnames as a comment in the actual query for future use:

SELECT TOP 0 FROM '[LOGFILEPATH]'

/* Field Names: date, time, user, status, duration */

If you aren't code savy, /* and */ are opening and closing tokens for comments. Anything between these are considered comments and are not evaluated as part of the query. Keep this in mind because you can document your queries with useful information which is exactly what we are doing here.

SUMMARY

Hopefully this quick crash course in creating queries will get you up and running with creating your own queries. Though the built in queries are extremely useful, the real power of LPS is in modifying and writing your own because every situation is different. Take some time to study the above an see if you can write a few simple queries for your own logs in your environment. it doesn't take long to get a basic grasp on writing basic queries. Stay tuned for part II which will dig a little deeper into query creation and different methods of getting at the answers you need.

Log Parser Studio - Write your first query in less than 30 seconds - Part I

Additional resources