Digital Archiving of Astronomical Data to Support Publication and Long-Term Preservation

Researchers at Johns Hopkins and Microsoft are collaborating to undertake Digital Archiving of Astronomical Data to Support Publication and Long-term Preservation. Activities will involve developing a repository-based infrastructure that supports the end-to-end process of capturing, curating, preserving, and providing long-term access to astronomical data. This system will build upon existing repository and electronic publishing systems (e.g., Fedora, PubMedCentral, DPubS) and on Virtual Observatory web services and data format standards (e.g., FITS).

One of the most fundamental aspects of scientific scholarly communication is the ability to cite and examine data in a persistent manner. Without this ability, the very essence of the scientific method, with its requirement of validating results, becomes compromised. Large-scale astronomy projects such as the Sloan Digital Sky Survey have gathered data at unprecedented rates, raising new challenges and opportunities. This explosion in data-driven science has led to fundamental changes in practice and modes of inquiry, prompting the National Science Foundation (NSF) to advance the evaluation and development of cyberinfrastructure to support large-scale, digital science projects. Both the Library of Congress' National Digital Information Infrastructure and Preservation Program (NDIIPP) and the NSF Blue-Ribbon Panel on Cyberinfrastructure report (National Science Foundation 2003) stress that digital archiving of datasets is essential to ensuring long-term access. Most importantly, this year’s Institute of Museum and Library Services’ (IMLS) National Leadership Grant guidelines for demonstration projects invite efforts to “develop pilot projects or programs in data curation.” This proposal directly addresses this important and urgent priority. Without immediate action, we may find ourselves in a “digital dark age,” losing important scholarly resources from the scientific domain.

The National Virtual Observatory (NVO) project is playing a leadership role in building services for the astronomy community to access and analyze astronomical data. For good reason, the NVO is often cited as one of the quintessential cyberinfrastructure projects. With projects such as NVO, the astronomy community has moved to the forefront of data-intensive digital science, providing a path for other disciplines to consider. However, the scope of the NVO has thus far deliberately excluded long-term data curation, focusing instead on data location and on data access standards and protocols. Based on extensive, ongoing dialogue, the NVO project team, led by researchers at Johns Hopkins University (JHU), has concluded that academic research libraries represent the ideal home for long-term preservation and curation of large-scale datasets to support persistent access and scholarly communication, given their expertise and the long-term, sustainable support they receive from universities.

