Spotlight on Microsoft Research: Accessing Usable Environmental Data at the Speed of Thought with FetchClimate

Posted by Kerry Godes
Senior Manager, Worldwide Marketing and Operations

Sources of data exist all around us, especially when it comes to environmental research. With more sensors and devices than ever capturing vast amounts of data, new technologies are needed to analyze this information and turn it into insight.

Microsoft Research (MSR) has been working on methods to compile, analyze, and apply big data for a number of years. Today we’re showcasing FetchClimate, another noteworthy example of how a combination of big data analysis and open science can help scientists and non-scientists alike better understand the world around them.

We recently chatted with Drew Purves, head of the Computational Ecology and Environmental Science Group at Microsoft Research Cambridge, about the FetchClimate tool for environmental data that he and his team have brought to life. Read on for Purves’ perspective on the importance of openness in his team’s work, and on how FetchClimate makes it very fast and easy to “get useful environmental information in a very open way.”

Give us a quick overview of FetchClimate and what it allows users to do.

The point of FetchClimate is to let users get a summarized set of useful information about different elements of the environment, very quickly and very easily. There are huge amounts of environmental data available in the public domain, but it’s all really difficult to digest, with much of it stored away in huge files that require special programs, and so on. We (at Microsoft Research Cambridge) carry out our own environmental research every day, and the idea with FetchClimate was originally to help accelerate our own efforts, and then to make it easier for others to do the same: ultimately, to be able to get data almost at the speed of thought. FetchClimate sits on top of that whole set of big data, so that when you ask for a specific bit of information, it selects the right data and makes it available to be downloaded instantly, and also to be visualized. It empowers anyone, anywhere, to get a broad range of environmental information immediately.
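
To make that idea concrete, here is a minimal sketch of the kind of request Purves is describing: you state the variable, the region, the years and the resolution you want, and the service does the choosing, regridding and averaging for you. The names and types below are illustrative assumptions, not the actual FetchClimate API.

```python
# A minimal sketch of a FetchClimate-style request. All names and types here
# are illustrative assumptions, not the real FetchClimate API.
from dataclasses import dataclass
import numpy as np


@dataclass
class ClimateRequest:
    variable: str                     # e.g. "air_temperature" or "precipitation"
    lat_range: tuple[float, float]    # bounding box, in degrees
    lon_range: tuple[float, float]
    years: range
    resolution_deg: float = 0.5       # output grid spacing


def fetch(request: ClimateRequest) -> np.ndarray:
    """Return a 2-D grid of mean values for the requested box and years.

    A real service would choose the best underlying dataset, regrid it and
    average it server-side; here we only return a correctly shaped dummy
    grid to show the shape of the interaction.
    """
    lats = np.arange(*request.lat_range, request.resolution_deg)
    lons = np.arange(*request.lon_range, request.resolution_deg)
    return np.full((lats.size, lons.size), np.nan)  # placeholder values


# Example: mean annual air temperature over the UK, 1981-2010.
grid = fetch(ClimateRequest("air_temperature", (49.9, 60.9), (-8.2, 1.8),
                            range(1981, 2011)))
print(grid.shape)
```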

What open standards and protocols does FetchClimate use?

A big theme running through all of our projects and initiatives is openness. Scientists need to collaborate openly, and therefore if Microsoft, or any other company for that matter, wants to play an important role in science, it needs to embrace that open collaboration. Without it, humanity is not going to solve the crucial problems that we face.

The current version of FetchClimate is built using HTML5 for the front end, to allow access on essentially any device and any platform. In terms of open standards and protocols, we’ve really tried to adopt the open formats that scientists are already using, like NetCDF (network Common Data Form), to make it easy for them to work with FetchClimate.
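
As a small illustration of what working with that format looks like, here is a sketch of reading a NetCDF file with the open source netCDF4 package for Python; the file name and the variable name "tas" are assumptions made for the example, not part of FetchClimate.

```python
# Reading a NetCDF (network Common Data Form) file with the open source
# netCDF4 package. The file name and variable name are illustrative only.
from netCDF4 import Dataset

with Dataset("example_climate.nc") as ds:   # any local NetCDF file
    print(list(ds.variables))               # list the variables it contains
    tas = ds.variables["tas"][:]            # read one variable into memory
    print(tas.shape)
```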

Other enhancements have made it possible for people to make their own copy of FetchClimate, which can read any of the data in our original copy but which they can also supplement with their own private data, accessible via the same API and/or UI. This means that anyone can use the FetchClimate software to make a public-facing geographical information system tailored to a particular purpose. For example, if you’re with the Brazilian government, you may want lots of information specific to Brazil accessible to the public via FetchClimate, but in a particular color scheme or badged a certain way. Meanwhile, the underlying software would be the same as our core, and could still pick up and show any of the information in the big ‘reference’ copy.

Looking at the idea of openness more broadly, people talk about open data, and that’s certainly big in FetchClimate. But it’s not just about making data open in the sense that it can be downloaded from somewhere. Rather, it allows people to get the useful information they want without ever needing to find, download and transform the data themselves. So now people can get useful environmental information in a very open way. Ultimately, though, what we are interested in is open modeling: enabling a much wider group of scientists to collaboratively create a model that can predict something important (say, food production or fire) from environmental information that is already available (like temperature and rainfall). Already, you can push a model like this into FetchClimate. Then, instead of asking for climate information, you ask it to produce much richer and more sophisticated predictions based on one of those models. FetchClimate will work out how to get the climate information needed to run the model, then run it for you, anywhere in the world, for any dates (including into the future!), at any resolution. But you can also share the whole history and provenance of the model itself, so it can be inspected, modified and extended, then pushed back into the same system, and on you go. That’s what we mean by open modeling.
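
As a toy illustration of that open-modeling idea, a model is essentially a function that turns readily available climate inputs into a richer prediction, which a service could then evaluate anywhere the inputs exist. The fire-risk formula below is invented purely for this example; it is not one of FetchClimate’s models.

```python
# A toy "open model": derive a prediction (a fire-risk index) from climate
# inputs (temperature, rainfall). The formula is invented for illustration.
import numpy as np


def fire_risk(mean_temperature_c: np.ndarray, annual_rainfall_mm: np.ndarray) -> np.ndarray:
    """Return a 0-1 risk index: hotter and drier means higher risk."""
    heat = np.clip((mean_temperature_c - 5.0) / 30.0, 0.0, 1.0)
    dryness = np.clip(1.0 - annual_rainfall_mm / 2000.0, 0.0, 1.0)
    return heat * dryness


# In the workflow described above, the service would first fetch the two
# climate grids for the requested place and dates, then evaluate the model:
temperature_grid = np.array([[18.0, 26.0], [31.0, 12.0]])
rainfall_grid = np.array([[1200.0, 450.0], [300.0, 1800.0]])
print(fire_risk(temperature_grid, rainfall_grid))
```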

Do you feel that Microsoft is becoming more “open” as a company?

When I first started engaging with Microsoft Research, I was amazed at how incredibly open the company really was, especially on a human level. Microsoft Research has a very porous environment that allows us to work with whomever we want. Microsoft realizes that a creative atmosphere like that, one that lets us bring in outside expertise, creates a real hopper for ideas. I’ve been amazed at what I’ve learned about computer science through Microsoft, and the company is present at every type of conference because it is so open. I see more movement in that direction in the future as well.

Is FetchClimate currently being employed by any organizations?

First and foremost, ourselves! We do all kinds of novel real-world analysis with FetchClimate, but we also have one-to-one collaborations in place with a number of academic colleagues. We are also receiving a lot of interest in FetchClimate from major environmental organizations. For example, the International Union for Conservation of Nature (IUCN) has asked us (Microsoft) to partner with them on their Red List of Threatened Species; as a result, Microsoft has become the first and only corporate partner for the IUCN’s Red List. My colleague Lucas Joppa from Microsoft Research is leading that partnership.

What other Big Data initiatives are you and your team working on that bring Microsoft and open source together?

FetchClimate is actually just one of several ambitious tools we are developing in collaboration with our Research Connections colleagues at Microsoft Research’s Redmond campus. For example, our ‘Distribution Modeller’ prototype makes it enormously easier to build the types of models that can then be pushed into FetchClimate. It also lets users share the whole analysis with someone else in just a few clicks, meaning that anyone can inspect, repeat, modify and extend that model.

We also have prototype tools for a number of uses, including: spinning up complex, relational geo-databases; piping all kinds of data sets between the tools in a way that preserves metadata; telling comparative narratives that will appeal to policy makers and the public; and helping decision makers to compare multiple model predictions to make better decisions. And here in Cambridge we’re experimenting with some early prototypes for new programming languages to define ecological models, new sensor devices, all kinds of things.

On the technical side, there is quite a lot being done to enable the building of these tools, and we’re trying to keep things as open as possible. For example, we just reworked our data visualization library (‘Dynamic Data Display’) in JavaScript so that others can use it in tools built in HTML5. We’ve packaged up FetchClimate so that it can be used easily through R, an important open source platform in the environmental science community. We’re also looking to open source some additional key components, like our parameter estimation engine, ‘Filzbach,’ which allows models to be trained against data and makes a lot of use of F#, another very open project. Similarly, we’ve been able to package Filzbach up for R as well as for MATLAB.
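
For readers unfamiliar with parameter estimation engines like Filzbach, the sketch below shows the general idea of training a model against data: choose the parameter values that best explain the observations. It uses SciPy’s optimizer and invented toy data purely for illustration; it is not Filzbach’s interface.

```python
# The general idea behind "training a model against data": a minimal
# maximum-likelihood fit with SciPy. Toy data and model, for illustration only.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)

# Toy data: a linear response of (say) growth to temperature, plus noise.
temperature = np.linspace(0.0, 30.0, 50)
growth = 2.0 + 0.3 * temperature + rng.normal(0.0, 1.0, temperature.size)


def negative_log_likelihood(params):
    intercept, slope, sigma = params
    predicted = intercept + slope * temperature
    return -np.sum(norm.logpdf(growth, loc=predicted, scale=abs(sigma) + 1e-9))


fit = minimize(negative_log_likelihood, x0=[0.0, 0.0, 1.0])
print(fit.x)  # estimated intercept, slope and noise level
```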