Spotlight on Microsoft Research: Improving Scientific Data Sharing and Management

Posted by Kerry Godes
Senior Manager – Worldwide Marketing and Operations

The absent-minded professor is an exaggerated movie cliché. But like most clichés, it does have a kernel of truth within it – namely that people who are absolutely brilliant in one area are often less disciplined in others. In certain scientific communities, an area of inattention has been data management, sharing and archiving.

The University of California Curation Center (UC3) at the California Digital Library (CDL) is aiming to address that problem with DataUp, an add-in (a kind of extension) for Microsoft Excel that is available via https://dataup.cdlib.org starting today. DataUp was created with support from Microsoft Research Connections and with funding from the Gordon and Betty Moore Foundation.

We spoke with Carly Strasser, project manager for DataUp, and John Kunze, associate director of UC3.

Please give our readers some background on DataUp.
Strasser: This project came about because researchers are being challenged in new ways with digital data and are generally not educated as to how best to manage and document their data. The folks at CDL thought, “What can we do as a library to help these researchers who don’t have the skill set to take care of their data?” Microsoft Excel is a big part of most researchers’ workflows, so we created a tool that would integrate well into those workflows and make it seamless to manage data.

What benefits will scientists and researchers realize from DataUp?
Strasser: The biggest problem we noticed is that researchers don’t know where to start when it comes to data management. At the same time, new funding requirements are forcing them to think about data management far in advance. DataUp has the ability to help them manage data better. It gives them the ability to put their data in a repository and it helps them create metadata so they don’t have to figure out how to do it on their own. It does all of this in Microsoft Excel, so it streamlines the process of managing and sharing data.

How specifically does DataUp enable data sharing?
Strasser: Sharing data requires a lot of steps before the actual sharing. You need to document it well. You need to make sure it’s archived for the long term. It needs to have a persistent identifier that works within a URL. DataUp helps you describe your data well enough so that someone else is able to use it.

What applications might DataUp have beyond the scientific community?
Strasser: The target audience for the tool is Earth, environmental, atmospheric and oceanographic researchers. These are scientists who tend to work individually on research projects and don’t typically have access to good data management tools. That said, we think DataUp will be useful across any kind of research community, including the social sciences, economics, and biomedicine. That’s why it was so important to make DataUp open source, so these groups can tailor it to their needs.  

Why else was it important that the DataUp add-in be released as open source?
Strasser: The obvious reason from the CDL standpoint is that we don’t have a lot of funds, so we don’t have a lot of developers working on DataUp. We’re excited about allowing the community to take it on, work on it and improve it over time. There are a lot of groups interested in customizing it for their needs. Open source makes that possible and we think adoption will be high as a result. We’re working with the Outercurve Foundation to help socialize DataUp in the open source community.

How did you wind up working with Microsoft Research Connections? What did they contribute to the project?
Kunze: We started to talk about DataUp with Microsoft Research back in 2009 at the International Digital Curation Conference. We already had good relations with Microsoft, so that turned into a proposal to get the project funded. Microsoft provided funding for the development of the project, the Gordon and Betty Moore Foundation supplied salary support, and CDL provided project management.

Were you surprised at Microsoft’s willingness to contribute to an open source project?
Kunze: We were, but they were talking about open source from the beginning, and the company has been very consistent throughout the project.

What has been the reaction in the scientific community to DataUp?
Strasser: We’ve talked about it at several scientific meetings, at library meetings and on UC campuses. We’ll be talking about it further at Microsoft’s eScience Workshop in Chicago on October 9 and at the AGU (American Geophysical Union) Conference in San Francisco in December. There’s a lot of excitement about the potential for this tool. Professors are excited about using it as a teaching tool in graduate classes for scientists. Groups of citizen scientists are interested in adopting it to ensure data quality. I’ve also had conversations with data repositories about using it to streamline the process of getting data into their particular repositories.

What kinds of collaborations would you like to see Microsoft undertake with open source solutions to benefit the scientific and academic communities moving forward?
Kunze: There’s no shortage of work to do and no shortage of extensions to create for this particular tool. One thing we had always dreamed of was at some point getting the extension incorporated into Microsoft Excel so you wouldn’t have to download it as an add-in.
Strasser: There’s an endless supply of cool things we could do with DataUp.

For more information on DataUp, please visit https://dataup.cdlib.org