This post is authored by Hai Ning, Principal Program Manager at Microsoft.
As we shared in an earlier blog entry, according to a World Health Organization report, 820,000 women and men aged between 15 and 24 living in developing countries were newly infected with HIV in 2011, and over 60% of them were women. The Cortana Intelligence team at Microsoft saw an opportunity to get involved. Specifically, we decided to host a competition to build models in Azure Machine Learning that would assign young women aged 15 to 30, each from one of nine underdeveloped regions, to a risk segment based on various data points collected from them. This would allow healthcare providers to offer them appropriate education and training programs to reduce their reproductive health risks, including the risk of HIV infection.
This dataset was graciously donated by the Bill & Melinda Gates Foundation, and it contains roughly 9,000 samples collected through a 2015 survey conducted at clinics in 9 underdeveloped regions around the world. The data challenge is essentially a multi-class classification problem.
In a span of three months, we received 2,392 entries from 493 contest participants. The top 10 entries on the public leaderboard were separated by only 0.38% in accuracy. One participant, Rui Quintino, even created an informative Power BI dashboard (pictured below) to showcase the data and the participation.
The competition officially ended on October 1st and, after validating all contest entries and results, we are very happy to announce the Top 3 competition prize winners, along with their published ML experiments! Here they are:
Grand Prize: Ion Kleopas – Predictive Experiment
Second Place: Nailong
Third Place: David Eduardo Millan Calero – Predictive Experiment
We would like to extend our hearty congratulations to Ion, Nailong and David, and hope you will check out their published winning entries. We would also like to share our deep appreciation for all the other participants who took on this challenge and made the contest a success.
When you look at the winning entries, a common thread is that they all use the XGBoost R package. Ion and David trained their XGBoost models in their local environments and brought the serialized models into Azure ML for scoring, while Nailong used the XGBoost package that’s built into the latest R runtime environment, Microsoft R Open v.3.2.2, in the Azure ML Execute R Script module for both training and scoring. This is a great testament to both the popularity and power of XGBoost and to the extensibility of Azure ML Studio. In the end, Ion and Nailong had identical private scores, with Ion having submitted his winning entry a few weeks earlier, and David was only 0.07% behind. Quite the photo finish for a very close race!
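For readers curious about the "train locally, score elsewhere" pattern described above, here is a minimal Python sketch of the idea. It is not the winners' code and it does not use XGBoost or Azure ML: a toy lookup-table classifier stands in for the real model, JSON stands in for the serialized model file, and the feature values and risk segment names are made up for illustration.

```python
import json
from collections import Counter, defaultdict


def train(rows, labels):
    """Toy stand-in for model training: learn the most common
    label for each value of the first feature."""
    counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        counts[str(row[0])][label] += 1
    return {
        "table": {key: c.most_common(1)[0][0] for key, c in counts.items()},
        # Fallback for feature values never seen during training.
        "default": Counter(labels).most_common(1)[0][0],
    }


def predict(model, rows):
    """Score rows with a (possibly deserialized) model."""
    return [model["table"].get(str(row[0]), model["default"]) for row in rows]


def accuracy(predicted, actual):
    """Fraction of rows whose predicted class matches the true class."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# "Local" step: train, then serialize the model to a string.
# The data below is hypothetical, not the contest dataset.
train_rows = [("a", 1), ("a", 2), ("a", 3), ("b", 1), ("b", 3), ("c", 2)]
train_labels = ["low", "low", "low", "medium", "medium", "high"]
serialized = json.dumps(train(train_rows, train_labels))

# "Hosted" step: restore the model and use it only for scoring.
restored = json.loads(serialized)
test_rows = [("a", 5), ("b", 0), ("c", 1), ("d", 4)]
test_labels = ["low", "medium", "high", "high"]
predictions = predict(restored, test_rows)
score = accuracy(predictions, test_labels)
print(predictions, score)
```

The point of the split is that the scoring side needs only the serialized artifact and a `predict` routine, not the training data or the training environment.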
We also received positive feedback about the Competition platform and Azure ML Studio during the contest. One contestant wrote to us, saying, “Azure ML Studio is a platform I am very fond of, since it offers extremely useful machine learning functionality in a rather easy and straight-forward way, plus integration with other Azure services is remarkably easy.” Another wrote, “Microsoft is changing the game in making machine learning so easy and fast!”
We hope everyone enjoyed the contest. We are working with the Gates Foundation to put these winning models to work in the real world, so we can make a difference in the lives of women in these HIV-affected regions. It’s our privilege to work with the ML community to find solutions such as this with potentially outsized impact on the lives of people around the world – thank you!
Hai, on behalf of the Cortana Intelligence team.