Update on Telemetry Usage in Tests, Part 1

Almost a year ago, I wrote a blog post promoting the use of telemetry when anti-malware testers compile the sets of malware they use in their tests.  I thought it might be time to give people an update.

Basically, changing testers’ habits is like the proverbial turning of a battleship.  Testers use tried and true methodology.  And it’s important for the consumers of the test results to have a consistent methodology, so they can compare past results with present ones and build a pattern of progress.  So any change, even one meant to improve the tests, has to be made delicately, with broad support and sound reasoning.

I believe I gave sound reason in the presentation last year.  Fundamentally, let’s make tests more meaningful for the average consumer.  And our initiative would be through telemetry because that’s what Microsoft would be best able to provide.

The broad support from industry came in the form of an IEEE effort that resulted in the release of a standard for the sharing of malware metadata.  At the same time, a large percentage of the industry banded together to form the Anti-Malware Testing Standards Organization (AMTSO).  AMTSO also works towards bettering tests.  Microsoft was a member at inception.  But it became clear that Microsoft’s objective of improving tests would be better served if we concentrated on a single message (telemetry in testing) rather than spreading our voice across the many efforts that AMTSO and its members would undertake.  (We respect the efforts and messages undertaken by AMTSO and will try to join its membership again at a reasonable time in the future.)

So, we were down to helping the testers.

Also around this time, Virus Bulletin (VB) was introducing a new component of their bimonthly testing called RAP tests (Reactive and Proactive).  For this test, VB would collect four weekly sets of malware samples.  Tony Lee and I worked with John Hawes (VB’s Technical Consultant & Test Team Director) to show him the kind of data Microsoft could provide.  We also offered ways to interpret the data so that more contributors could be engaged with similar data, and Microsoft would not bias the test by being the sole or dominant provider of such information.  John wrote:

We plan to do some prioritization of our own, aligning our sample selection processes with the prevalence data we gather from a range of sources – the aim being to include the most significant items... We continue to seek more and better prevalence data to add to our incoming feeds. - http://www.virusbtn.com/vb100/vb200902-RAP-tests

AV-Comparatives.org is an Austrian non-profit testing organization.  Starting with their February 2010 tests, they wrote:

You will notice that this time the test-set is smaller… This is because we are now trying to include in the test-set mainly prevalent real-world malware…  To build the test-set we consulted metadata and telemetry data collected and shared within AV industry…  Malware we see on user PC’s are automatically considered as important. - http://www.av-comparatives.org/images/stories/test/ondret/avc_report25.pdf

And for the most recent test released this month:

We tried to include in the test-set only prevalent real-world malware that has not been seen before the 10th February 2010 by consulting telemetry / cloud data collected and shared within the AV industry.  Consulting that data was quite interesting for us, as it showed that, while some vendors had seen some malware already many months or even years ago, the same malware hashes appeared in some other vendors clouds only recently. - http://www.av-comparatives.org/images/stories/test/ondret/avc_report26.pdf

Microsoft is one of the companies helping AV-Comparatives.  I presume the other participants in the IEEE metadata telemetry effort are also involved, as a single telemetry source is not sufficiently meaningful.  There is a saying in the field of telemetry gathering: “You can only see what you’re looking for.”  So it’s important that telemetry be gathered from multiple sources.  Unfortunately, only a small number of companies are as yet providing actionable telemetry to testers.  As a result, current telemetry primarily serves a negative use rather than a positive one.  (That is not negative-as-in-bad versus positive-as-in-good.  “Negative” means the telemetry is used to eliminate samples, not so much to assert that samples are important.  This is why AV-Comparatives states that the malware they see on user PCs is what they deem to be important.)

There is also a side effect to the use of negative telemetry.  As noted before, since you can only see the things you’re looking for, “no telemetry” does not mean the malware is less common than malware with low telemetry.  In fact, the detection score of a product that contributes telemetry is more likely to be diminished, because the samples that product knows about, but for which it reports low telemetry, are the ones removed from the test set.  Or, as in the second AV-Comparatives quote above, samples known to exist prior to the cutoff date are removed, while samples not known stay in the test set.  So, during this early period of the adoption of telemetry, we are able to remove truly meaningless samples.  But the products of the vendors that contribute telemetry suffer a minor detection score deficit.
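To make the side effect concrete, here is a minimal sketch in Python.  Everything in it is hypothetical: the product names "A" and "B", the six samples, and the helper functions are invented for illustration and do not represent any tester's actual method or data.  Product A contributes telemetry and product B does not, and each detects four of the six samples before any filtering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    name: str
    detected_by: frozenset   # products that detect this sample
    known_old_by: frozenset  # contributors whose telemetry shows it before the cutoff


def negative_filter(samples, contributors):
    """Drop every sample that a contributor's telemetry marks as already known."""
    return [s for s in samples if not (s.known_old_by & contributors)]


def detection_rate(samples, product):
    """Fraction of the given samples that the product detects."""
    if not samples:
        return 0.0
    return sum(product in s.detected_by for s in samples) / len(samples)


# Invented data: only product "A" contributes telemetry.
contributors = frozenset({"A"})
samples = [
    Sample("s1", frozenset({"A", "B"}), frozenset({"A"})),
    Sample("s2", frozenset({"A"}), frozenset({"A"})),
    Sample("s3", frozenset({"B"}), frozenset()),
    Sample("s4", frozenset({"A", "B"}), frozenset()),
    Sample("s5", frozenset({"B"}), frozenset()),
    Sample("s6", frozenset({"A"}), frozenset()),
]

filtered = negative_filter(samples, contributors)

for product in ("A", "B"):
    before = detection_rate(samples, product)
    after = detection_rate(filtered, product)
    print(f"{product}: {before:.2f} before filtering, {after:.2f} after")
```

In this made-up case, the two samples removed are both ones that A detects, so A drops from 4 of 6 to 2 of 4 while B rises from 4 of 6 to 3 of 4.  The numbers are arbitrary, but the direction of the bias is the one described above.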

The end result, though, is that telemetry produces more meaningful test sets and thus more meaningful tests.  And that benefits users.  So, we ask all vendors to join in contributing malware and metadata telemetry to testers.

This concludes part 1.  In part 2, I will talk about full product testing and false positives.

-- Jimmy Kuo

PS. We’re proud to note, even with the potential handicap, Microsoft Security Essentials was able to achieve the highest rating of Advanced+, with “very few” false positives in that latest AV-Comparatives test.
