Business need for Security Incident Management

It's been a while since I was last here at my blog. Believe me, breaks work in amazing ways. This article is primarily for an information security audience, but it won't hurt non-security folks either, as it should make sense to anybody. Many organisations even now don't have an information security program and, obviously, do not have a security incident management program, which is usually a subset of a security program. To stay focused on the current topic, I won't get into defining what an incident is, what a security incident is, and what the difference between them is.

Background

As an example, I will discuss a scenario. One organisation had an incident that they thought was a security incident: they suddenly started seeing pop-ups and authentication prompts, with different messages, on client machines. Seeing this weird behaviour, the team working on it thought it was a malware attack. That team and their executive management were in a state of extreme panic. I was brought in during the later stages of our engagement. After understanding the scenario and the problems they were encountering, my first question to them was: what is the business impact of this incident? They took some time to answer, probably because they had not thought it necessary to consider the business impact of that incident or event. The answer, although it came a little late, was the possibility of unauthorized disclosure.

Investigations

Based on that response, I ran "netstat -ano" on one of the problem machines, a client machine. There were no more than a few established connections. A few of them were SMB/CIFS connections going to their known file servers; the remaining connections were going to known local machines, which were verified individually. The TCPView tool from Sysinternals also provided details of the processes that had initiated those connections, and all of those were known processes as well. Our engineers had previously checked the auto-start extensibility points (ASEPs) using the Autoruns tool from Sysinternals, and nothing suspicious was seen there. The same was true of Process Explorer when it was run on the same machine earlier: no suspicious processes. Then I took network captures using Netmon 3.4 to check the SMB/CIFS traffic and verify whether files were being created on destination machines or on the local machine. Nothing of that sort was seen in the traces either.
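The connection check described above can be sketched in a few lines of Python. This is only a minimal illustration, not the procedure we actually ran: the sample output, the IP addresses, and the KNOWN_HOSTS allow-list are all made up, and the function names are hypothetical. The sample deliberately includes one unknown remote address to show what a hit would look like; in the incident discussed here, every connection went to a known machine.

```python
# Hypothetical allow-list of known file servers and local machines.
KNOWN_HOSTS = {"10.0.0.5", "10.0.0.6"}

# Made-up sample in the column layout that "netstat -ano" prints on Windows.
SAMPLE = """\
  Proto  Local Address          Foreign Address        State           PID
  TCP    10.0.0.20:49712        10.0.0.5:445           ESTABLISHED     4
  TCP    10.0.0.20:49720        10.0.0.6:135           ESTABLISHED     912
  TCP    10.0.0.20:49733        203.0.113.9:443        ESTABLISHED     3160
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING       912
"""


def established_connections(netstat_text):
    """Return one dict per ESTABLISHED TCP row in the netstat output."""
    rows = []
    for line in netstat_text.splitlines():
        parts = line.split()
        # TCP rows have five columns: proto, local, foreign, state, PID.
        if len(parts) == 5 and parts[0] == "TCP" and parts[3] == "ESTABLISHED":
            # Split on the last colon so bracketed IPv6 addresses survive.
            remote_ip, _, remote_port = parts[2].rpartition(":")
            rows.append(
                {
                    "remote_ip": remote_ip,
                    "remote_port": int(remote_port),
                    "pid": int(parts[4]),
                }
            )
    return rows


def unknown_connections(netstat_text, known_hosts):
    """Keep only the connections whose remote address is not on the allow-list."""
    return [
        conn
        for conn in established_connections(netstat_text)
        if conn["remote_ip"] not in known_hosts
    ]


if __name__ == "__main__":
    for conn in unknown_connections(SAMPLE, KNOWN_HOSTS):
        print(conn)
```

The PID column is what ties each connection back to a process, which is the same link TCPView shows interactively.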

So from the observations so far, there was no evidence of malicious activity from that machine, either on the network or locally. They informed us that certain files were being created after some time in a particular folder and wanted an answer for that occurrence. I suggested that they enable auditing on that folder, so that we would be able to find out which application's process was creating those files. While our AD team helped them set up auditing on the problem folder, we ran our well-known tool that collects all volatile and non-volatile information, along with USN journal information, which lets us track activity on files. Analysis of that data also pointed to one thing: "No Evidence of Malicious activity".

The answer to the creation of files in that particular folder then came from the auditing I had our AD team implement. It pointed to a misbehaving application that was creating those files and causing the other weird activity occurring on the machines. This was hard to believe for some, who thought it was all malicious and a security incident, but in reality it was not. When we look back at where it started and where it ended, there are many lessons to be learned.

Lessons learned point to a serious business need for a Security Incident Management Program

The first impression of this episode: CHAOS and PANIC. It put the people working on it from that organisation's side, and their executive team, into a state of extreme panic and stress for a whole day. This could have been avoided completely if they had had a Security Incident Management Program in place. How, you ask? There are open standards (ISC2), SANS, and many other institutions that talk about security incident management. At Microsoft, we also have an offering on this called POP SIM (Security Incident Management). However, I will talk only about the standard approach here; if our customers are interested in POP SIM, we can have separate discussions with them.

In general, this program allows an organisation to create a framework and policies for dealing with a security incident. The following diagram explains the usual flow. An incident comes in, usually reported by an IDS, a SIEM, firewall logs, or suspicious activity seen by users, admins, etc. It is triaged to check whether it is truly a security incident; if it is a false positive, it is handled like a usual event (a non-security incident).

Once the initial determination shows that it is not a false positive and is indeed a security incident, it is categorized into predefined incident categories, based on the business impact of the incident and according to the previously determined security incident management policies. Then, as per the policies, its occurrence is communicated to the right teams and the right people for further handling.
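The triage and categorization steps above can be sketched as a small routing function. This is a hedged illustration only: the severity letters, the impact-to-severity mapping, the notification lists, and all names in the code are assumptions made for the example, not part of any standard or product. A real program would take these from its own policies.

```python
from dataclasses import dataclass

# Hypothetical policy tables; a real organisation defines its own.
IMPACT_TO_SEVERITY = {"high": "A", "moderate": "B", "low": "C"}
NOTIFY_BY_SEVERITY = {
    "A": ["security team", "executive management"],
    "B": ["security team", "IT management"],
    "C": ["security team"],
}


@dataclass
class Report:
    source: str                           # e.g. "IDS", "SIEM", "user"
    evidence_of_malicious_activity: bool  # outcome of the triage checks
    business_impact: str                  # "high", "moderate" or "low"


def triage(report):
    """Return True if the report should be treated as a security incident."""
    return report.evidence_of_malicious_activity


def route(report):
    """Triage first, then categorize by business impact and pick who to notify."""
    if not triage(report):
        # False positive: handled as a usual, non-security event.
        return {"type": "non-security event", "severity": None, "notify": ["IT team"]}
    severity = IMPACT_TO_SEVERITY[report.business_impact]
    return {
        "type": "security incident",
        "severity": severity,
        "notify": NOTIFY_BY_SEVERITY[severity],
    }


if __name__ == "__main__":
    print(route(Report("user", False, "low")))
    print(route(Report("IDS", True, "high")))
```

The point of the sketch is the ordering: business impact is only consulted after triage has confirmed that the report is a security incident at all.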

Note: I have drawn the diagram here based on inputs from various books on this subject, mainly ISC2 and related books. It is purely for informational and educational purposes.

Incident response is a mix of steps that happen in series and steps that happen in parallel. At this stage, two things happen in parallel:

1. Containment 

2. Investigations with initial RCA to reduce the impact.

The objective (apart from reducing the impact) is to stop the incident from spreading any further and, in the case of a directed attack, to contain the attacker's access without tipping them off. This can go horribly wrong if it is not handled delicately and with caution. This approach is taken if we want to track the attackers and prosecute them.

Then, in the next stage of detailed analysis, more data is collected using various tools, and a deeper root cause analysis is done to find out, in the case of a directed attack, who did it, how they did it, and how they got in.

But if it is just malware, the questions are different. How did it get in? What AV are we using, is it updated to the latest version, and is it able to detect the malware? If not, have we contacted the AV vendor for signatures yet? What is the nature of the malware, and how does it spread? Containment is very important for limiting the spread.

This is a circular process: as new evidence comes in, many steps are revisited.

After the RCA has found the root cause of the incident, and the impact has been reduced and contained, the recovery process starts, to bring the affected systems and networks back into production. This is also tricky. Whether to rebuild a compromised system, or whether a repair, clean-up, or removal by AV is good enough, is a risk-based business decision that an organisation has to take based on its own scenario and on how much time, money, and effort it is willing to put in. Recovery itself is a huge topic and would need a separate, dedicated discussion.

Then the incident needs to be closed. This becomes more important when multiple incidents are happening, as it would otherwise be difficult to track their progress. So, based on the criteria established in the policies, an incident shall be closed once it meets those criteria.

Finally, the incident is reported to the engaged and concerned people, with a post-mortem of what happened, how, when, and where. From here, the lessons learnt are used to amend the established security incident policies, to ensure mistakes are not repeated and the organisation matures its process further based on this input.

Replaying the Example Incident

Let's assume there is another organisation that does have a security incident management program in place. When this organisation runs into the same issue, it would perform triage with an experienced, skilled triage team. In this step itself, they would have filtered this incident out as a false positive and handled it as a non-security incident or event.

If they were paranoid and, to be on the safer side, decided to treat it as a security incident, then in the next stage of categorizing the incident based on business impact, they would have categorized it as Severity C. This, however, requires experienced and skilled team members; if they don't have their own, they can engage Microsoft or any other trusted third party to help them.

So this organisation, following its security incident management program, would have managed the incident effectively, without the panic, chaos, and resulting extreme stress for the people engaged on the incident.

Business and Money

You can clearly imagine that if you don't have this program and 10 incidents happen, you will manage each of them with the same chaos, panic, and extreme stress. You can imagine the end result too: it costs a lot of money to engage resources on each incident. If you spend a minimum of USD 10,000 on one incident (a very low figure in reality), it is easy math to work out how much you will spend on 10.

But if you do have a Security Incident Management program, then you may spend USD 10,000 on a high-impact incident and probably USD 500 on an incident that initially appeared to be a security incident but, after triage, was marked as a false positive and sent to the IT team, instead of the security team, to be handled as a non-security incident.
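To make the comparison concrete, here is the arithmetic as a short Python sketch. The per-incident figures (USD 10,000 and USD 500) are the ones used above. The split of the 10 incidents into 2 genuine high-impact incidents and 8 false positives is purely an assumption for illustration; your own mix will differ.

```python
COST_FULL_RESPONSE = 10_000   # USD per incident handled as a full security incident
COST_FALSE_POSITIVE = 500     # USD per incident filtered out at triage

TOTAL_INCIDENTS = 10
# Assumed split, for illustration only.
HIGH_IMPACT = 2
FALSE_POSITIVES = TOTAL_INCIDENTS - HIGH_IMPACT

# Without a program, every incident gets the full, panicked response.
cost_without_program = TOTAL_INCIDENTS * COST_FULL_RESPONSE

# With a program, triage diverts the false positives to the IT team.
cost_with_program = (
    HIGH_IMPACT * COST_FULL_RESPONSE + FALSE_POSITIVES * COST_FALSE_POSITIVE
)

savings = cost_without_program - cost_with_program

if __name__ == "__main__":
    print(f"Without program: USD {cost_without_program:,}")
    print(f"With program:    USD {cost_with_program:,}")
    print(f"Savings:         USD {savings:,}")
```

Under this assumed split, the spend drops from USD 100,000 to USD 24,000. The exact numbers matter less than the pattern: the more false positives triage can filter out, the larger the saving.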

It would also save a great deal of time for the people involved, which can be utilized in better ways. And, obviously, it means lower stress levels and happier lives.