When my team goes out on tour or as part of a product launch, we meet a lot of people who have fantastic ideas on how to use Technology to get things done and make their lives easier. One of these people is Yaroslav Pentsarskyy – a student at British Columbia Institute of Technology. He came out to the SQL 2005 Launch tour and saw Damir Bersinic and Barnaby Jeans in action with SQL 2005. He contacted the team via this blog after completing a challenging assignment and thinking outside the box. After reading his initial contact email – I was intrigued with his approach to solving a problem that is faced by security professionals all over – “How do you import and analyze millions and millions of log entries in a fast and efficient manner?”
After corresponding via email – I’ve asked him to write up a blog post series on the assignment as well as the details on how he accomplished the task. This is the first post in the series. I’ll let Yaroslav take it from here.
Being a student at the British Columbia Institute of Technology (BCIT) is a special thing. It's an environment where real world tasks are out there waiting for you to apply your very own approach to them. That approach must not only be effective, robust, and innovative; it should be also acceptable in a real world environment. It's not just another school project - it is something that tomorrow may be used in any scale organization and it better be working well. With the growing interest in security across many organizations I found my course project on Network Intrusion Detection particularly interesting. Nowadays we work with many information systems under different platforms, running different operating systems, and working with a range of applications. In case of a system getting compromised the question is - will I be able to extract the data I need, when I need it, and manipulate it across all of those systems to achieve the result I expect? As there are so many opinions out there present at many organizations, that the concept of security system analysis is becoming much broader in every way. In such a heterogeneous environment it's really easy to loose track of your system, which should never happen in the security world. The story that I wanted to share with you today will involve one tool that can gather, process, and manage security data coming from across many platforms - that tool is the least expected to be heard in this context - it is SQL Server 2005.
The scenario for my project was the following: The Company, having a suspicion of a break into their internal network, has hired an analyst to gather information of how was the network compromised, what was the weak part of the network, and how to protect the network from similar attacks in the future. I was assigned as an analyst, having Intrusion Detection System (IDS) logs as the only artifacts available. The task was clear until the most important piece of evidence came into play - over 3 million records of intrusion detection alerts. The alerts covered parts of company's network that were monitored over 4 days using Snort (a popular IDS running under Linux). Alerts ranged from suspected break-ins to hundred of thousands of false positives. There were a total of more than 1500 types of alerts, indicating different levels of lethality. The challenge was to find out which of those attacks have successfully contributed to overall security compromise resulting in breaking into the system. That information was supposed to be helpful in determining what actions ought to be taken to prevent breaks in a future. After looking at the data and having no other data available about the network except those intrusion detection alerts, I would never expect to be clear on anything that happened on the network during those 4 days; the only part that gave a little hope was the 3 weeks that were given for the project to be completed.
Being at the Launch of SQL Server 2005, along with Visual Studio 2005, I remembered those case studies where corporate IT managed huge numbers of critical data with SQL Server. Having my complimentary copy of SQL Server 2005 Standard I decided to see how SQL can help me with tons of my "critical" data. There was no strict requirement for the tool to use for data analysis, and logs were just huge collection of flat file records so I started with loading all the records to a SQL Server database. The format of the actual text file was a bit challenging as it took quite some time to get all the parts settled into their respective columns and have everything organized. In a couple of days after having everything loaded and basic analysis done, I decided to find out what other approaches were taken by other students. Well, I wasn't surprised that many of us googled hundreds of tools that some of them were claimed to be the "industry de facto" in security log analysis. Among them was SnortSnarf (http://www.snort.org/dl/contrib/data_analysis/snortsnarf/) a PERL program that used original Snort logs to create custom reports based on the requirements specified by security analyst. Another tool - SawMill (www.sawmill.net) - was designed to analyze many types of logs and build custom reports based on queries. The list went on and on, but the two tools listed above were the most popular ones. To my disappointment, nobody went the SQL Server route because apparently it involved too much Transact-SQL programming. Well it was too late for me, as all the data has been nicely loaded into my database and was ready to be analyzed. So I decided to check out with other students on the performance indicators of their approaches. Once everyone chose their analysis process and implemented some basic data manipulations, I was astonished to find out that my friendly guy, SQL Server, has processed the data couple of thousand percent faster than other tools. Comparing to SnortSnarf, that used PERL, and took more then one day (non stop) just to load data, and SawMill taking gigabytes of RAM and hours of time to display a simple report with 2000 records, SQL Server took less than 10 minutes to load flat log files into the database. I was amazed and thought that this can't be - yet I wasn't dreaming … "nice work, and those are specialized systems" I thought. Well that was just the beginning of it. Having all the data loaded in my SQL database I could run any sort of analysis on the log, group it in the most custom and flexible way, and manipulate every single parameter possible. I quickly spotted some trends that led me to an actual break in - the relevant data was extracted and clearly presented in my report. I could provide actual proof based on my assumptions - just by having quick access to the log components relevant to me and being able to view every angle of the database. That took me couple of hours in about 4 days. Almost any aggregation procedure took no more than 10 seconds to output results and involved basic Transact SQL knowledge. I must say, it was quite challenging for many students to extract meaningful correlation using their tools; the main problem was trying to verify their suspicions within reasonable time limit as well as inability to manipulate certain parameters to be certain of an assumption.
Now I thought it would be nice to share a sample of my solution. So in the second blog post, I will present certain steps that start from loading the flat file into the database and doing some custom aggregation to detect serious security alerts.
Feel free to post you comments and share your opinions either here or contact me directly at email@example.com