National Data Service Consortium Workshop, April 4-6, 2016
Carol Song and Rajesh Kalyanam attended the NDS Consortium Workshop (http://www.nationaldataservice.org/get_involved/events/NDS5/), hosted by RENCI on April 4-6, 2016, where Carol presented the GABBS project in a panel on DIBBs project efforts. The NDS Consortium (http://www.nationaldataservice.org/) seeks to improve the ways in which data is found, shared and re-used, in order to foster better interoperability and collaboration across disciplines. Its goal is to create the data services and tools necessary to enable this interoperability while bringing together existing initiatives in this space, such as the RDA (Research Data Alliance), the DataNET platform and the NSF DIBBs projects.
The first day of the workshop was a tutorial session demonstrating the capabilities of NDS Labs, a sandbox environment created by the NDS developers. This environment enables developers to discover, re-use and build service stacks consisting of interoperating containerized tools, services and applications. Built around Docker containers and Kubernetes, NDS Labs defines a JSON-based service specification language that connects containers along dependency chains and exposes configuration options and shareable data volumes. It also provides a client API and a web interface for adding new service specifications and launching the associated container stacks for development. The developers demonstrated examples of such integration involving iRODS data storage and Dataverse, a data management platform. NDS Labs is designed both to be installable on a user's personal machine and to support production-scale deployment. The NDS Consortium encourages the development of pilot projects that enable interoperation between distinct efforts and result in a system deployable in NDS Labs.
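To make the idea of connecting containers along dependency chains concrete, the following is a minimal Python sketch of how a launch order could be derived from JSON service specifications. The field names ("key", "image", "depends", "config", "volumes") and the image names are assumptions made for illustration; they are not taken from the actual NDS Labs specification format.

```python
import json
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical service specs. The schema and image names are illustrative
# assumptions, not the real NDS Labs service specification language.
SPECS_JSON = """
[
  {"key": "dataverse", "image": "example/dataverse",
   "depends": ["postgres", "solr"],
   "config": {"ADMIN_EMAIL": "admin@example.org"},
   "volumes": ["/data"]},
  {"key": "postgres", "image": "example/postgres",
   "depends": [], "config": {}, "volumes": ["/var/lib/pg"]},
  {"key": "solr", "image": "example/solr",
   "depends": [], "config": {}, "volumes": []}
]
"""


def launch_order(specs):
    """Return service keys ordered so each service follows its dependencies."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {spec["key"]: set(spec["depends"]) for spec in specs}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    specs = json.loads(SPECS_JSON)
    for key in launch_order(specs):
        print("would launch:", key)
```

In this sketch the two backing services are always scheduled before the service that depends on them; their order relative to each other is not fixed.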
The remaining two days of the workshop included several panel talks spanning the data life cycle, from data generation through data curation and publishing to sustainability. The goal was to determine the role that NDS can play in designing a common architecture that encapsulates all of these requirements and drives the creation of tools and services that work towards the NDS vision of a "federated, interoperable and integrated" national-scale data service.
From the GABBS project perspective, it is not yet clear how we would contribute to or use NDS Labs, since the HUBzero infrastructure already provides the necessary development environment for tools (Rappture, the workspace environment). Common data access protocols have not yet been defined for NDS, so it is also unclear how we would use or re-purpose our existing infrastructure to enable interoperability with NDS services. As a starting point, it might be instructive to integrate a sample hub tool such as Geobuilder with a data transformation service such as NCSA's BrownDog. This would broaden the range of file formats on which Geobuilder can be invoked, because formats not directly usable in Geobuilder could first be transformed using BrownDog. This integration experiment could be carried out in NDS Labs, but it would require containerizing the Geobuilder tool as well as our current data management infrastructure (iRODS). It is also not clear what role a containerized version of the HUBzero CMS would play in the NDS space; however, individual components of HUBzero, such as the Rappture tool development kit, seem immediately usable if containerized.
In addition to learning about NDS, we were made aware of several projects with goals similar to ours: Dataverse, Discovery Environment, SciServer and SciDrive. Discovery Environment, a creation of the iPlant Collaborative, uses iRODS heavily and has driven several recent improvements to iRODS, which is also highly relevant to us. A researcher from the University of Colorado Boulder's LASP group (http://lasp.colorado.edu/home/) described their work using OPeNDAP to subset and serve large data files and their interest in integrating it with iRODS storage. He also mentioned their work on a common data format that serves as an intermediate step when transforming data from a source format to a target format. This might be instructive for our own ideas about a similar common data format that enables interoperability of geospatial data.
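The appeal of a common intermediate format is that each supported format needs only one reader and one writer, rather than a dedicated converter for every source/target pair. The Python sketch below illustrates the pattern only: CSV and JSON stand in for real geospatial formats, and the intermediate representation (a list of dicts, one per record) and all function names are assumptions for illustration, not the LASP group's design.

```python
import csv
import io
import json

# Intermediate representation assumed for this sketch:
# a list of dicts, one dict per record.


def read_csv(text):
    """Parse CSV text into the intermediate representation."""
    return list(csv.DictReader(io.StringIO(text)))


def read_json(text):
    """Parse a JSON array of objects into the intermediate representation."""
    return json.loads(text)


def write_csv(records):
    """Serialize the intermediate representation as CSV text."""
    fieldnames = list(records[0]) if records else []
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def write_json(records):
    """Serialize the intermediate representation as JSON text."""
    return json.dumps(records)


# Adding a new format means adding one entry to each registry.
READERS = {"csv": read_csv, "json": read_json}
WRITERS = {"csv": write_csv, "json": write_json}


def convert(text, source, target):
    """Convert text from the source format to the target format via the intermediate form."""
    records = READERS[source](text)
    return WRITERS[target](records)
```

For example, convert("name,lat\nA,1.5\n", "csv", "json") passes through the list-of-dicts form before being written out as JSON. With N formats this approach needs N readers and N writers instead of up to N*(N-1) pairwise converters.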