Splitting logs with PowerShell


I did some work to aggregate some logs from a group of servers for the whole month of February. This took a while, but I ended up with a nice CSV file that I was ready to load into Excel to create some Pivot Tables. See more on Pivot Tables at: Using PowerShell and Excel PivotTables to understand the files on your disk.

However, when I tried to load the CSV file into Excel, I got one of the messages I hate the most: “File not loaded completely”. That means that the file I was loading had more than one million rows, which means it cannot be loaded into a single spreadsheet. Bummer… Looking at the partially loaded file in Excel I figure I had about 80% of everything in the one million rows that did load.

Now I had to split the log file into two files, but I wanted to do it in a way that made sense for my analysis. The first column in the CSV file was actually the date (although the data was not perfectly sorted by date). So it occurred to me that it was simple enough to write a PowerShell script to do the job, instead of trying to reprocess all that data again in two batches. LogSplitter

 

In the end, since it was all February data and the date was in the mm/dd/yyyy format, I could just split the line by “/” and get the second item. There’s a PowerShell function for that. I also needed to convert that item to an integer, since a string comparison would not work (using the string type, “22” is less than “3”). I also had to add an encoding option to my out-file cmdlet. This preserved the log’ s original format, avoided doubling size of the resulting file and kept Excel happy.

Here is what I used to split the log into two files (one with data up to 02/14/15 and the other with the rest of the month):

Type .\server.csv | 
? { [int] $_.Split("/")[1]) -lt 15 } | 
Out-File .\server1.csv -Encoding utf8
Type .\server.csv | 
? { [int] $_.Split("/")[1]) -ge 15 } | 
Out-File .\server2.csv -Encoding utf8

That worked well, but I lost the first line of the log with the column headers. It would be simple enough to edit the files with Notepad (which is surprisingly capable of handling very large log files), but at this point I was trying to find a way to do the whole thing using just PowerShell. The solution was to introduce a line counter variable to add to the filter:

$l=0; type .\server.csv | 
? { ($l++ -eq 0) -or ( ([int] $_.Split("/")[1]) -lt 15 ) } | 
Out-File .\server1.csv -Encoding utf8
$l=0; type .\server.csv | 
? { ($l++ -eq 0) -or ( ([int] $_.Split("/")[1]) -ge 15 ) } | 
Out-File .\server2.csv -Encoding utf8

PowerShell was actually quick to process the large CSV file and the resulting files worked fine with Excel. In case you’re wondering, you could easily adapt the filter to use full dates. You would split by the comma separator (instead of “/”) and you would use the datetime type instead of int. I imagine that the more complex data type would probably take a little longer, but I did not measure it. The filter would look like this:

$l=0; type .\server.csv | 
? { ($l++ -eq 0) -or ([datetime] $_.Split(",")[0] -gt [datetime] "02/15/2016")  } | 
Out-File .\server1.csv -Encoding utf8

Now let me get back to my Pivot Tables…

Comments (3)

  1. Sylvain says:

    ITs really incroyable…https://fr.wikipedia.org/wiki/Log

    What I say I hate this code .

  2. Wes says:

    I switched to using MS Power BI tools to analyze giant CSV files. Recently I took 8 GB of text in 4 x CSV files, transformed and loaded it all – 40 million rows. This was a Checkpoint Firewall log export. Once in Power BI I could plot connections per second over time.

  3. JoseBarreto says:

    @Wes

    Yes, that’s a great point,
    PowerBI is an excellent tool and does not have the 1 million row limitation.

Skip to main content