Data Management

The environmental data sets used in drinet are managed by the Purdue TeraGrid data management system.

As part of NSF's TeraGrid initiative, we have developed and deployed a flexible, multidisciplinary data management framework at Purdue University. This framework can be used to manage data from different sources and provide multiple access points for users from different communities with different levels of IT expertise.

The framework's architecture has five layers: data capture, iRODS (Integrated Rule-Oriented Data System), application, Web Services, and presentation. The base component is iRODS, client-server middleware developed at the Data Intensive Cyber Environments Center (DICE Center) at the University of North Carolina at Chapel Hill, which provides a uniform interface to heterogeneous storage resources. iRODS also lets users discover data by logical attributes rather than by physical file names and paths.
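To make the idea of attribute-based discovery concrete, here is a minimal sketch in plain Python. It does not use the iRODS API; it only mimics the concept with an in-memory catalog. The zone name, paths, attribute names, and the helper find_by_attributes are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    """One catalog entry: a logical name mapped to a physical replica."""
    logical_path: str    # name users see, independent of storage location
    physical_path: str   # where the bytes live (hypothetical values below)
    metadata: dict = field(default_factory=dict)  # logical attributes


def find_by_attributes(catalog, **criteria):
    """Return logical paths of objects whose metadata matches every criterion."""
    return [
        obj.logical_path
        for obj in catalog
        if all(obj.metadata.get(key) == value for key, value in criteria.items())
    ]


# Hypothetical entries; none of these names come from the real collections.
CATALOG = [
    DataObject("/exampleZone/radar/scan_0001", "/vault3/a91f.dat",
               {"source": "NEXRAD", "product": "reflectivity"}),
    DataObject("/exampleZone/radar/scan_0002", "/vault7/c02e.dat",
               {"source": "NEXRAD", "product": "velocity"}),
    DataObject("/exampleZone/climate/run_12", "/tape1/77b0.dat",
               {"source": "climate-model", "product": "temperature"}),
]

if __name__ == "__main__":
    # The caller never needs to know the physical paths.
    print(find_by_attributes(CATALOG, source="NEXRAD"))
```

The query names only attributes such as source or product; the physical location of each file stays hidden behind the catalog, which is the property the iRODS layer provides at scale.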

All resources are connected to the 40 Gbps high-speed TeraGrid backbone network via a 10 Gbps optical lambda. Our framework has been successfully applied to the management of several data collections from different application domains with different data formats, including LARS remote sensing image data, PTO satellite real-time data, NWS streaming NEXRAD radar data, and scientific datasets from climate modeling.

Users can access the data through several interfaces: a GridSphere-based data portal; a set of iRODS client tools, including command-line utilities and web and desktop interfaces; a set of web service interfaces; and application-specific tools, including clients enabled by OPeNDAP/THREDDS.
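As an illustration of the OPeNDAP-style access mentioned above, the sketch below builds a DAP2 request URL that asks a server for a subset of one variable. The host name, dataset path, and variable name are placeholders, not real endpoints of this system, and the helper names hyperslab and dap2_url are hypothetical.

```python
def hyperslab(start, stop, stride=1):
    """Format one DAP2 index range as [start:stride:stop] (stop is inclusive)."""
    return f"[{start}:{stride}:{stop}]"


def dap2_url(dataset_url, variable, slices, response="ascii"):
    """Build a DAP2 request URL for a subset of a single variable.

    slices holds one (start, stop) or (start, stop, stride) tuple per dimension.
    response is the DAP2 suffix, for example "ascii", "dods", "dds", or "das".
    """
    constraint = variable + "".join(hyperslab(*s) for s in slices)
    return f"{dataset_url}.{response}?{constraint}"


if __name__ == "__main__":
    # Placeholder dataset URL; a real THREDDS server would supply its own path.
    url = dap2_url(
        "http://example.org/thredds/dodsC/demo/precip.nc",
        "precip",
        [(0, 0), (10, 20), (30, 40, 2)],
    )
    print(url)
```

Some HTTP clients require the square brackets to be percent-encoded before the request is sent; the sketch leaves them as-is for readability.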