Anomaly Detection: Models Ensemble

In the final article of their three-part series, SoftServe’s Data Science Group (DSG) wraps up their look at information security risk identification by detecting deviations from the typical pattern of network activity.

By Tetiana Gladkykh, Taras Hnot, and Volodymyr Solsky

Continuing the previous research on Machine Learning: Achieving Ultimate Intelligence, SoftServe’s Data Science Group (DSG) describes how information security risks can be identified by detecting deviations from the typical pattern of network activity with each of the suggested models: Dynamic Threshold Model, Association Rules Based Model, and Time Series Clustering Model.

In the previous article “Three Models for Anomaly Detection: Pros and Cons”, the models were considered separately. However, to maximise the effectiveness of the models in detecting information security violations, we decided to unite them into one ensemble (Fig. 12). As seen from the diagram, the first two models (Dynamic Thresholds and Association Rules) use the same data set: measurements across N categories received in real time. This approach may be regarded as a bagging model in which the role of arbiter is performed by an Anomaly Confidence Level unit, which establishes the level of certainty that abnormal network activity is taking place in the given time period. Meanwhile, the Time Series Clustering model works with the pool of historical data, which presupposes its inclusion in the ensemble according to the boosting method.

Fig. 12: Ensemble of models
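
As a rough illustration of the online part of the ensemble, the sketch below combines the outputs of the two real-time models into a single confidence value. The weighted average, the [0, 1] score scale, the names `ModelVote` and `anomaly_confidence`, and the 0.6 cut-off are all assumptions made for this example; the article does not specify how the Anomaly Confidence Level unit is actually computed.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ModelVote:
    """Output of one online model for a single time period (hypothetical structure)."""
    name: str
    score: float         # anomaly score, assumed to be normalised to [0, 1]
    weight: float = 1.0  # assumed trust in this model; not taken from the article


def anomaly_confidence(votes: Sequence[ModelVote]) -> float:
    """Return the weighted average of the per-model anomaly scores."""
    total_weight = sum(v.weight for v in votes)
    if total_weight <= 0:
        raise ValueError("at least one vote with a positive weight is required")
    return sum(v.score * v.weight for v in votes) / total_weight


# Example: both online models evaluate the same real-time measurements.
votes = [
    ModelVote("dynamic_threshold", 0.9),
    ModelVote("association_rules", 0.5),
]
confidence = anomaly_confidence(votes)
is_anomaly = confidence >= 0.6  # illustrative cut-off, an assumption
print(f"confidence={confidence:.2f}, anomaly={is_anomaly}")
```

A weighted average is only one possible arbiter; a maximum over the scores or a rule-based combination would fit the same place in the diagram.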

Because the models differ in usage mode (online versus offline), the model-based time series segmentation is best applied during non-business hours, to clarify situations in which an anomaly was not detected online. Based on the results of this offline verification and any follow-up checks, the models are adjusted to the new conditions, which changes the range of normal behaviour patterns and, as a consequence, allows the observed process to be described more accurately.
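
The feedback loop between offline verification and the online models could look roughly like the sketch below. It is a simplified assumption, not the authors' procedure: the online model is reduced to a single sensitivity multiplier `k` (a wider band of normal behaviour for larger `k`), and the function `adjust_sensitivity`, its step size, and its bounds are hypothetical.

```python
from typing import Sequence


def adjust_sensitivity(
    k: float,
    online_flags: Sequence[bool],
    offline_flags: Sequence[bool],
    step: float = 0.1,
    k_min: float = 1.0,
    k_max: float = 5.0,
) -> float:
    """Nudge the online sensitivity multiplier after an offline review.

    online_flags[i]  -- True if the online models flagged period i
    offline_flags[i] -- True if the offline analysis judged period i anomalous
    """
    if len(online_flags) != len(offline_flags):
        raise ValueError("both flag sequences must cover the same periods")

    pairs = list(zip(online_flags, offline_flags))
    missed = sum(1 for on, off in pairs if off and not on)        # anomalies not caught online
    false_alarms = sum(1 for on, off in pairs if on and not off)  # online alerts not confirmed offline

    if missed > false_alarms:
        k -= step  # narrow the band of normal behaviour: more sensitive
    elif false_alarms > missed:
        k += step  # widen the band of normal behaviour: less sensitive
    return min(k_max, max(k_min, k))


# Nightly review: two anomalies were found offline that the online models missed.
new_k = adjust_sensitivity(3.0, [False, False, True], [True, True, True])
print(f"adjusted multiplier: {new_k:.1f}")
```

In practice the adjustment would more likely mean re-estimating thresholds and rules on the updated history; the single multiplier is used here only to keep the example short.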

Conclusions

Within the framework of the conducted research, SoftServe’s DSG came up with a comprehensive solution for detecting network activity of a user or group of users that differs from a well-known pattern, which in turn may indicate an attempted information security breach. The offered solution is an ensemble of three models that facilitates analysis of three anomaly types:

  1. Significant deviation of the observed values from the expected – Dynamic Threshold Model. Simplicity of implementation and ease of adaptation are the main advantages of the model; its downside is that the analysis results for each individual metric are isolated from observations in other categories, which makes it difficult to search for event patterns.
  2. Unusual set of the observed values of the measured parameters – Association Rules Based Model. The main advantage of this model is its ability to describe the observed process as a set of related events; insensitivity to weak changes in the process dynamics is its main drawback.
  3. Unusual dynamics in the observed process – Time Series Clustering Model. Even though the found patterns in this model reflect the internal dynamics of the observed process, it doesn’t allow detection of event patterns and its application "on the fly" is extremely difficult.

This is why we suggest using an ensemble of the models: combined, they neutralise one another’s disadvantages and facilitate decision-making about adapting to changed conditions. As a result, the solution identifies typical manifestations of abnormal network activity and detects unusual and new kinds of network anomalies, while its self-adjusting capability lets it adapt to legitimate changes in the network processes.