Announcing Winners of the Azure for Research Awards for ML

As part of the Microsoft Azure for Research project, we periodically run RFPs and offer awards that grant access to Azure Machine Learning to researchers, instructors and students. The goal of these awards is to facilitate scholarly and scientific research by enabling researchers and instructors to take advantage of the power and scale of cloud computing and to leverage ML for their classroom and advanced analytics needs via shared and collaborative workspaces.

The winners of our November 2014 RFP are below. Congratulations to you all! It is quite amazing to see the diversity of winning ideas and the many different places from which they originated.

If you are interested in applying for an Azure ML award, you can do so by the February 15th deadline for our next RFP, here.

Research Award Winners

Name and Affiliation

Proposal Title, Domain & Abstract

Benjamin Rubinstein, Senior Lecturer, Computing & Information Systems, University of Melbourne, Australia

Big Data Preparation

This project aims to build generic services using Azure ML: Big Data Integration and Adaptive Labelling. Azure ML bridges the gap between basic Big Data (typically PaaS or IaaS with map-reduce only) and applications that require deeper data insight. While Azure ML’s service-oriented focus and Studio broaden ML’s accessibility, this project aims to address important components of the ML pipeline with services that are of independent, broad-based value and that could be surfaced to Azure users: integration being critical for data cleaning & source combination, and adaptive sampling guiding general labeling tasks that pervade model training and evaluation.

Jarrel Seah, Student Researcher, Medicine, Nursing & Health Sciences, Monash University, Australia

Automated Detection of Cervical Vertebral Fractures on Computed Tomography Scans in Trauma

This project is a cross-disciplinary effort between medical and computer science faculties that aims to create an automated computer system to aid in the interpretation of cervical spine CT scans in order to rapidly diagnose cervical injuries and to serve as a second reader system. Historical CT data, together with their gold standard diagnoses taken from medical coded data, will be extracted and analyzed. Data will be taken from the radiology unit at the Alfred Hospital, where the major Victorian trauma unit is based.

Hajji Hicham, Associate Professor, School of Geomatic Sciences & Surveying Engineering, Morocco

Azure-Based Approach for Storing & Processing Water Smart Meter Data

For decades, a recurrent problem encountered in water data management (in areas such as utilities and hydrology modelling) has been the handling of data-related complexities. But recently, due to the arrival of sensors and smart metering technologies, we have witnessed the emergence of a new class of complexity commonly expressed by the four V’s. Those inherent properties of water datasets call current water data management solutions into question and require new solutions that guarantee efficiency and near-real-time processing. We wish to test the Azure infrastructure and ML library in this specific area and explore Azure ML Studio with graduate students.

Joao Magalhaes, Assistant Professor, Department of Computer Science, Universidade Nova de Lisboa, Portugal

Learning PubMed Cross-Media Relations

When making clinical decisions, physicians often browse medical information search systems for similar medical cases. Systems such as PubMed are a common tool for finding information in the biomedical literature based on simple keyword searches (including author or dates). The long-term vision of our research is motivated by the question “what if healthcare professionals could relate the data of one patient to the wealth of the entire PubMed bio-medical literature?” To pursue this vision, this project advocates the use of ML to power a new breed of medical information search systems built on cross-media relations extracted from bio-medical literature.

Alison Fairbrass, Engineering Doctoral Student, Centre for Biodiversity & Environment Research, University College London, UK

A Cloud-Based Species Recognition Toolset for Acoustic Biodiversity Monitoring at Scale

This project will demonstrate the use of the Azure ML platform for cloud-based bioacoustics classification. Classification algorithms will be implemented in three diverse, distinct and novel projects, all at different stages of development and with different intended end-users: 1. Identification of neo-tropical bat echo-location calls, 2. Urban biodiversity soundscape monitoring, and 3. Classification of British orthoptera species. The project will test the existing functionality of Azure ML and the implementation of desirable functionality, including hierarchical and adaptive classification. The project will provide a platform for expertise sharing between researchers from the fields of biodiversity monitoring and computer science.

David Clifton, Faculty Member, Department of Engineering Science, University of Oxford, UK

Machine Learning for Intelligent Healthcare Technologies

Healthcare delivery now results in very large datasets being accumulated, including the electronic health records now active in many hospitals and new data sources that feed into them, such as genomic data from next-generation sequencing. The resulting exponential growth in data far outpaces the capability of clinical experts to cope, resulting in a so-called “data deluge” in which the data are largely unexploited. This project proposes to use “big data” machine learning to exploit the contents of these complex, heterogeneous datasets by performing robust, automated Bayesian non-parametric inference at very large scale in collaboration with clinical experts.

Alvin Rajkomar, Assistant Clinical Professor, Medicine, University of California, San Francisco, USA

Big Data in Healthcare: Creation of a Re-Admissions API & Collaborative Filtering Algorithm to Generate Clinically Actionable Data

Clinicians are commonly faced with important questions like “Will my patient be readmitted?” or “Is this patient’s medication list accurate and complete?” Health systems can improve the care of their patients by applying data science techniques to electronic health record (EHR) data to help answer those types of questions. This project proposes two proof-of-concept analyses to demonstrate the benefit of using EHR data, cloud services and ML for clinical systems: the first is the development of a hospital re-admissions API, and the second is the use of collaborative filtering on patient medication lists.

Ismini Lourentzou, Graduate Student, Computer Science, University of Illinois at Urbana-Champaign, USA

Multivariate Time Series Analysis for Trend Forecasting

This project focuses on forecasting future trends by combining a broad spectrum of heterogeneous sources and time series analysis. While we will mostly be focusing on predicting social and political issues, the framework of the task provides the opportunity for trend forecasting for a wide spectrum of topics, such as trends in media, sports or even events related to post-climate disasters. Ideally, the system would be a comprehensive approach to all such cases.

Jimeng Sun, Associate Professor, School of Computational Science & Engineering, Georgia Tech, USA

Cloud-Based Predictive Modeling for Healthcare Research

Healthcare analytics research involves building predictive models for the early detection of diseases such as heart failure, mortality prediction and personalized treatment recommendation. These tasks often involve multiple patient sets, features and algorithms on different prediction targets. A huge number of predictive models have to be computed and compared. In this work, we plan to develop a cloud-based healthcare predictive modeling platform to efficiently compute such models in parallel with the right amount of computational resources. The goal is to develop a system that can expedite and simplify the process of building predictive models on health data using Azure.

Rasiah Loganantharaj, Associate Professor, The Center for Advanced Computer Studies, University of Louisiana at Lafayette, USA

Annotating Uncharacterized Genes Using Phylogenetic Profiles

The objective of this project is to create a user-friendly application in the cloud that facilitates the annotation of uncharacterized genes or proteins by providing co-evolutionary information and functional annotation for the query sequences along with appropriate justification. A pair of genes is considered co-evolved if the genes have similar phylogenetic profiles, and such genes seem to have similar functions. We plan to use Azure with ML Studio for creating and storing phylogenetic profiles of genes and proteins of diverse genomes. The outcome of this project will provide significant benefits for scientists who work with genes or genomes that are not well understood functionally.

Shafiqul Islam, Professor, Civil & Environmental Engineering, Tufts University, USA

CDI Tools for Data Driven Tornado Forecasting

Tornadoes remain one of the deadliest natural disasters in the USA. Currently, tornado warnings are issued based on short-term, observed weather information, providing the public less than a few minutes of advance warning. We will create an operational, data-driven platform based on a synthesis of atmospheric model output, probabilistic modeling and ML which will predict the occurrence of tornadoes with several hours’ lead time.

Instruction Award Winners

Mohamed Nadif, Professor, Mathematics & Computer Science, University Paris Descartes, France

Azure-Based Machine Learning Training

Our research team at University Paris Descartes has long experience with ML techniques and now wants to teach how these techniques can be used on very large datasets in both static and streaming contexts. To do so, we need to rely on a powerful cloud infrastructure that gives access to large storage facilities and processing capabilities. Students include those enrolled in masters programs on “Machine Learning” and “Business of informatics” as well as PhD students in ML.

Erel Amit, Academic Coordinator of IT Department, College of Management,


Teaching Business Data Mining

Azure ML will be used as a tool to teach 5 different Data Mining courses at our Business School every year.

Martine DeCock, Associate Professor, Institute of Technology, University of Washington Tacoma, USA

Machine Learning Projects on Azure

As a faculty member at the Center for Data Science, University of Washington Tacoma, I guide the graduate students of the Master of Science in Computer Science and Systems (MSCSS) program in their ML coursework. Students can carry out different kinds of projects that provide various levels of in-depth experience with applied ML, and the goal here is to use the Microsoft Azure ML Instruction Award across all these types of projects.

ML Blog Team