Spotlight on Microsoft Research: Big Data and Open Science

Posted by Tony Hey
Vice President of Microsoft Research Connections

There is a sea change happening in science: It’s increasingly being driven by data and computation. The practice of science is now enhanced by collecting and analyzing massive quantities of data rather than by conducting small, focused experiments. The data are coming from instruments such as satellites, high-throughput biometric screening systems, networks of sensors and telescopes, as well as massive computer simulations. In this decade we will collect more scientific data than we’ve collected so far in the whole of human history. Soon it will be impossible to do any kind of science without computational tools – and the more advanced and powerful, the better for the scientist and the science.

Compounding the challenge of increasing data quantities is a corresponding need to collaborate across numerous data sources and data consumers. This is driving a trend toward open science data, open access to text and publications, open standards, and open collaboration around computational tools that best serve the science community. There’s a unique role right now for the computer science and IT industries to help scientists unleash the value of their data by allowing more contributors to derive insights, and combine and refine data regardless of its scale and complexity. Microsoft Research aims to play a part in this transformation of the scientific discovery process by offering combinations of breakthrough research, software assets, algorithms, and open collaboration to accelerate the process of reaching insight. This post will be the first of a series of profiles that highlight Microsoft Research collaborations in the spirit of open science and innovation.

Today we’re showcasing a noteworthy example at the intersection of big data analysis and open science. ChronoZoom, an open source community project released earlier this year, has the ambitious goal of presenting the history of everything and is proving to be a vital tool in the evolving field of Big History, which attempts to unify the past – from the beginning of time, some 13.7 billion years ago – with the present. Big History offers a broad understanding of how the past has unfolded, and lets us explore the unifying characteristics that can bridge the intellectual chasm between the humanities and the sciences. The project has been a truly collaborative effort by the University of California, Berkeley, Moscow State University, the Outercurve Foundation, and Microsoft Research Connections. ChronoZoom utilizes Windows Azure, HTML5, JavaScript and a rich user interface to bring the elements of Big History together. It is now available in 2.0 beta for public use, feedback and, ultimately, widespread collaboration. You can learn more about ChronoZoom through this video overview, or experience it firsthand at www.ChronoZoomProject.org.

We recently asked Roland Saekow, the ChronoZoom Community Project Lead at University of California, Berkeley, to reflect on his experiences in developing what is regarded as a master timeline of the cosmos, Earth, life, and human experience:

Q: How did you come up with the idea for ChronoZoom?
A: In my senior year at Berkeley, I took Walter Alvarez's Big History course. One of the hardest things to convey in a Big History course is the immense amount of time that makes up our past: All 13.7 billion years of it.

Professor Alvarez designed many different handouts to convey this vastness. In particular, there was a handout with four columns drawn from bottom to top that inspired me. The first column stretched the entire page, was labeled Cosmos, and represented 13.7 billion years. The next column was only about a third of a page long, labeled Earth and represented 4.5 billion years. Next to it was a column about the same length, labeled 4 billion years for the history of life. And next to that was a single thin line with an asterisk that was labeled "5 thousand years" for Written Human History. The asterisk explained simply that this line could be drawn as thin as possible, and even then it would still be a gross exaggeration of the span of written human history.

As an interdisciplinary studies major focusing on Science, Technology and Society, I knew that today's computing power could help solve this problem by showing the vastness of time in an interactive and visual way. So, inspired by this handout, I began discussing with Professor Alvarez how we could turn this idea into reality.
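
The proportions of the handout described above can be checked with a little arithmetic. The following is a minimal sketch, not part of ChronoZoom: it scales the four time spans quoted in the answer linearly onto a single page. The page height (a US Letter sheet) and the function name `column_heights_mm` are assumptions made for illustration.

```python
# Time spans quoted in the description of the handout, in years.
SPANS_YEARS = {
    "Cosmos": 13.7e9,
    "Earth": 4.5e9,
    "Life": 4.0e9,
    "Written human history": 5.0e3,
}

# Assumption: the handout is a US Letter page, 11 inches (279.4 mm) tall.
PAGE_HEIGHT_MM = 279.4


def column_heights_mm(spans, page_height_mm):
    """Scale each span linearly so that the longest one fills the page."""
    longest = max(spans.values())
    return {name: page_height_mm * years / longest for name, years in spans.items()}


heights = column_heights_mm(SPANS_YEARS, PAGE_HEIGHT_MM)
for name, mm in heights.items():
    print(f"{name}: {mm:.6f} mm")
```

Under these assumptions the Earth column comes out at roughly a third of the page, matching the description, while written human history would be about 0.0001 mm tall. No printed line can be that thin, which is the point of the asterisk on the handout.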
 
Q: Why was it important that ChronoZoom be an open source project?
A: Early on, Professor Alvarez and I agreed that ChronoZoom should be a freely available educational and research tool, so that it could contribute to and grow the field of Big History. As an open source project, anyone can understand how ChronoZoom is built, and work with us to make it even better. We're so excited to have this opportunity to contribute to the field of Big History, while at the same time advancing the state of the art in information visualization, Big Data and HTML5 with the open source community.
 
Q: Why did you select Microsoft as a partner to build it out?
A: In 2009, Microsoft Live Labs was working on Seadragon, a technology with an incredible zoom capability. Since Professor Alvarez had designed handouts with a series of columns where each one was an exploded view of the one preceding it, I wondered if Seadragon technology could be used to nest the columns inside one another, instead of placing them next to each other as exploded views. Using a freely available tool from Microsoft called Deep Zoom Composer, I was able to begin experimenting and working with Professor Alvarez on building the first version of ChronoZoom, a raster-based graphics version.

Q: What was your experience working with Microsoft? What surprised you most?
A: We've had the great pleasure of working with Microsoft Researchers in developing ideas, brainstorming solutions, and building the project together. Microsoft's vast experience in planning, organizing and executing a project of this scale made it possible for us to bring ChronoZoom to life. The Microsoft team's passion, openness and dedication to working with us as academic partners surprised us the most. They helped us to speak the technical language necessary for building a large-scale computing project, while we shared ideas and expertise from our domains.

Q: What did the Azure platform bring to the project?
A: Azure's vast and scalable distributed network made it possible to deploy ChronoZoom with little worry of running into capacity limits. The first version of ChronoZoom was especially resource intensive, involving thousands of tiny images since that system was based on a raster graphics approach. The current version of ChronoZoom is based on vector graphics and leverages a centralized database. ChronoZoom operates directly from the cloud, with an online authoring tool that is able to add new content directly from the web. Thanks to Azure, ChronoZoom is able to work completely independently of any client application. It is built and viewed in the cloud, anywhere, anytime.
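
To see why a raster approach can involve thousands of tiny images, consider a generic tiled image pyramid, in which an image is stored at successively halved resolutions and each level is cut into fixed-size tiles. The sketch below only illustrates that general idea. The post does not give the image dimensions or tile settings of the first ChronoZoom version, so the 256-pixel tile size, the example dimensions, and the function name `pyramid_tile_count` are all assumptions.

```python
import math


def pyramid_tile_count(width_px, height_px, tile_px=256):
    """Count the tiles across all levels of a halving image pyramid.

    Each level halves the width and height (rounding up) until the
    image is 1x1 pixel; every level is cut into tile_px-sized tiles.
    """
    total = 0
    w, h = width_px, height_px
    while True:
        total += math.ceil(w / tile_px) * math.ceil(h / tile_px)
        if w == 1 and h == 1:
            break
        w = max(1, math.ceil(w / 2))
        h = max(1, math.ceil(h / 2))
    return total


# Hypothetical example: a single 32768 x 32768 pixel image.
print(pyramid_tile_count(32768, 32768))
```

With these assumed numbers, one large image already yields more than twenty thousand tiles. A vector representation avoids this by storing shapes and text rather than pre-rendered pixels.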
 
Q: What kind of feedback have you received about ChronoZoom so far?
A: We've heard from specialists, historians, students and teachers from all over the world. They really like how ChronoZoom serves as a map of time, showing the past on a linear scale without distortion. Because adding new content is currently not a publicly available feature in the beta, we’ve received many requests to add specific timelines and events, and to support other kinds of data, such as charts, phylogenetic trees and more. We're thankful for all the interest and suggestions, and have begun working with select partners to incorporate new content. When ChronoZoom comes out of beta, it will be possible for people to add their own data and timelines.
 
Q: What are some of the most interesting things you’ve seen people do with ChronoZoom?
A: ChronoZoom was designed and developed at the university level. We currently have plans to include different sets of content, one that is appropriate for university use, and another for high school use. It's really incredible watching young children and young adults use ChronoZoom, show it to their friends, and grasp – early on – their context in the bigger picture. Increasingly, Big History courses are being used as the foundation and introduction to a wide variety of subjects for younger audiences. Together, Big History and ChronoZoom can serve as a framework for their future courses.
 
Q: What would you like to see ChronoZoom become?
A: We hope ChronoZoom becomes a rich resource for students and teachers exploring the field of Big History, while also providing tools for researchers to uncover new insights from our past. By building a database of chronological events with all sorts of media, the hope is that we can better understand our past and find new trends and patterns to help shape our future. One of the things that makes Big History so exciting is that it brings together the humanities and sciences in a common conversation. ChronoZoom can help to bridge this gap by providing a visual map of our past.

Q: What has made now the right time for ChronoZoom to be launched into the global community?
A: As a field, Big History is about 20 years old and is now really taking off. Courses are now starting to be taught all over the world. This year, the International Big History Association will hold its first conference. There is also the Big History Project, an effort to teach Big History at the 9th grade level. A new field needs new tools, and we hope ChronoZoom will be among them.