Let telemetry be your guide, a proposal for security tests…

Users today are offered a choice among many security products, any number of which are sufficient and none perfect.  Alongside these products come myriad product test results and certifications, all there to help you make a better, more informed decision about which product to use.  And as product developers, we'll point to the tests and reviews that best represent our product. (Like this recent report on the just-released Microsoft Security Essentials Beta, and the most current AV-Comparatives test showing Windows Live OneCare (OneCare) reaching the vaunted status of Advanced+.)

But are the tests doing what they ought to do?

I would like to take this opportunity to present a case for advancing the methodology of testing security products.

For as long as this industry has existed, product testing has been conducted by throwing huge numbers of malware samples at a product and seeing how many it detects.  "Improvement" in testing was measured by increasing the number of samples.  "Comprehensiveness" meant millions instead of thousands, and coverage of the many malware types instead of just large quantities.  Only recently has consideration of false positives (FPs) begun to influence the interpretation of test results.

(An example: it is this consideration of false positives that allowed OneCare to win the latest AV-Comparatives test.  Two other comparable products were in contention, one scoring a higher detection rate than OneCare and one the same.  But because both were also among the highest in FPs (over 15 each), both fell to Advanced. OneCare had only 0-2 false positives, the lowest of all tested products, and was the only one in that lowest category.)

Because false positives cause unnecessary upheaval that may leave machines nonfunctioning, and because a high detection rate often correlates directly with a propensity to FP, we would like to recognize AV-Comparatives, and all the other testers and certifications that do not blindly judge detection capability without considering false positives.  And hats off to Virus Bulletin for having maintained a no-FP requirement for its VB100 award longer than anyone.

So, the recognition that false positives are an important consideration in interpreting test results is becoming standard.  What can we do next to make tests more meaningful for the real user?

As I mentioned before, the standard way of testing is to throw lots and lots of malware at the products and present a detection percentage.  This is then presented as a measure of the quality of the product.

But does that really represent quality for the average user?  The tests do not simulate the likely scenario on our machines at home or at the office, so how is the result meaningful?  If a product misses 1% more than another, are those 10,000 samples out of a million meaningful to you?  Maybe they are 10,000 distinct samples of a single server-side polymorphic trojan from one site that your browser already warns you not to visit.  Or they might consist mostly of a set of targeted attacks: important to the targeted entity and the products it uses, but to you or me?

How do we fix this?

One of the best advances in the security industry in recent years is our ability to capture telemetry about the malware cases we encounter.  The data associated with malware infections enables us to produce the semiannual Security Intelligence Report, and selective use of prevalence reports enables us to decide each month how the MSRT can best protect the ecosystem.  Others in the industry likewise use their telemetry to produce reports, and free tools to clean up the most prevalent malware affecting the ecosystem.

What we need to do is incorporate this data into the tests.  To accomplish this, the Microsoft Malware Protection Center (that’s us), through its arrangements that give other security vendors access to the malware we collect, has started to also provide normalized prevalence data to other security vendors, security industry testers, and the WildList Organization.
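To make the idea concrete, here is a minimal sketch of what "incorporating prevalence data into the tests" could look like. The sample names, counts, and detection outcomes below are entirely made up for illustration; the point is the difference between counting every sample equally and weighting each sample by how often it is actually seen in the field.

```python
# Hypothetical test corpus: (sample_id, prevalence_reports, detected_by_product).
# Counts and outcomes are invented for illustration only.
samples = [
    ("poly_trojan_variant", 3,     True),   # one of thousands of near-identical files
    ("koobface_worm",       17052, True),   # highly prevalent in the wild
    ("targeted_attack",     2,     False),  # rare, narrowly targeted
    ("vundo_trojan",        9800,  False),  # prevalent and missed
]

def flat_detection_rate(samples):
    """Classic test metric: fraction of distinct samples detected."""
    return sum(1 for _, _, hit in samples if hit) / len(samples)

def weighted_detection_rate(samples):
    """Prevalence-weighted metric: fraction of observed infections covered."""
    total = sum(count for _, count, _ in samples)
    covered = sum(count for _, count, hit in samples if hit)
    return covered / total

print(f"flat:     {flat_detection_rate(samples):.1%}")
print(f"weighted: {weighted_detection_rate(samples):.1%}")
```

With these made-up numbers, the two metrics diverge noticeably: the product detects half of the distinct samples, but a larger share of the infections users would actually encounter, because the samples it catches are the prevalent ones.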

Tony Lee manages our collection of malware and its distribution to our partner security vendors who care to participate in the Microsoft Virus Information Alliance (VIA).  He will contribute the next section of this blog…

Malware manages to evolve in its ability to distribute, mutate and update itself at an increasingly fast pace – we’re often talking about hours and days here. Malware also targets various sizes and groups of the population. These infection characteristics pose challenges to AV product testing, both in the demographic and chronological sense. In order to meaningfully reflect a product's ability to protect its users, the testing methodology employed needs to have an up-to-date and accurate view of the threat landscape.

Through the telemetry collected by our various antimalware products, we can observe, in near real time, what is statistically significant about the state of threat activity in the wild. For example, by observing a threat's first-seen and last-seen dates, and its occurrences during various periods of time, we can assess its age, severity, and activity trend at both the file and threat levels.
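The kind of assessment described above can be sketched in a few lines. This is not our actual pipeline; it is a toy example with invented report dates, showing how first-seen and last-seen dates plus per-period occurrence counts yield an age and a crude activity trend for one threat.

```python
# Toy telemetry for a single threat: one date per sighting report.
# Dates are invented for illustration.
from datetime import date

reports = [date(2009, 6, d) for d in (1, 2, 3, 20, 22, 25, 27, 28, 29, 30)]

first_seen, last_seen = min(reports), max(reports)
age_days = (last_seen - first_seen).days

# Split the observation window in half and compare report volume
# in each half to get a rough rising/falling activity trend.
midpoint = first_seen + (last_seen - first_seen) / 2
early = sum(1 for r in reports if r <= midpoint)
late = len(reports) - early
trend = "rising" if late > early else "falling or flat"

print(f"age: {age_days} days, trend: {trend}")
```

A real system would of course bucket occurrences into finer periods (daily or hourly) and track this per file and per threat family, but the inputs are the same: timestamps and counts.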

Recently, I established an experimental program to share this prevalence data with our security partners, and we have received very positive feedback and suggestions. At the core of this program is an automated process that monitors noticeable new threat activity as it takes place in the field, then aggregates, analyzes, and publishes this data to security partners over an encrypted channel on a daily basis. Recipients of this information can assimilate the data over time and construct a view similar to the example below:

SHA1: 18375FD78CDE1E1B7291FBC37831CB36013895FD
MD5: 9FFCA5614A1032B0709ECAB67DF10F49
Total Reports: 17,052
File Size: 96,047
We also share weekly information in a Top 100 list; the top 20 from the report generated on July 10th are shown here:

* ITW Index is an abstract representation of one element against another; it does not represent actual count.



Threat Name:

* Worm:Win32/Koobface.gen!D [generic]
* VirTool:WinNT/Koobface.gen!B [generic]
* Worm:Win32/Koobface.gen!D [generic] [non_writable_container]
* TrojanProxy:Win32/Koobface.gen!C [generic] [non_writable_container]
* Trojan:Win32/Liften.A [non_writable_container]
* TrojanDownloader:Win32/Small.gen!B [generic] [non_writable_container]
* Trojan:Win32/Matcash.gen!M [generic]
* Backdoor:Win32/Delf.B [non_writable_container]
* Trojan:Win32/Tibs.gen!lds [generic]
* Trojan:Win32/Vundo.gen!AN [generic]
* …
* PWS:Win32/Daurso.gen!A [generic] [non_writable_container]
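The post does not disclose how the ITW Index is computed, only that it is a relative measure rather than an actual count. One plausible sketch, assuming a simple scale-to-the-leader normalization and using entirely made-up report counts, is:

```python
# Hypothetical raw prevalence counts (invented numbers, not real data).
raw_reports = {
    "Worm:Win32/Koobface.gen!D": 17052,
    "Trojan:Win32/Vundo.gen!AN": 4200,
    "Backdoor:Win32/Delf.B": 900,
}

# Scale every count against the most prevalent threat, so the published
# index preserves relative ranking without revealing actual totals.
top = max(raw_reports.values())
itw_index = {name: round(100 * count / top) for name, count in raw_reports.items()}

for name, idx in sorted(itw_index.items(), key=lambda kv: -kv[1]):
    print(f"{idx:>3}  {name}")
```

Any monotonic normalization (log-scaled, rank-based, etc.) would serve the same purpose; the essential property is that recipients can compare threats without inferring absolute infection counts.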