The Virtual Data Collaboratory (VDC) is a federated data cyberinfrastructure that is designed to drive data-intensive, interdisciplinary, and collaborative research, and to enable data-driven science and engineering discoveries. VDC accomplishes this by providing seamless access to data and tools to researchers, educators, and entrepreneurs across a broad range of disciplines and scientific domains as well as institutional and geographic boundaries. In addition to enabling researchers to advance research frontiers across multiple disciplines, VDC also focuses on (1) training the next generation of scientists with deep disciplinary expertise and a high degree of competence in leveraging data, cyberinfrastructure, and tools to address research problems and (2) helping data scientists and engineers develop and apply advanced federated data management and analysis tools for high-impact scientific applications. To meet this mission, VDC extends beyond its collaborating institutions and leverages NSF investments to provide cyberinfrastructure typically not available to community colleges, state-associated colleges and universities, and regional liberal arts colleges and universities, and to stimulate intense user engagement and adoption by scientists across domains and institutions.
VDC represents state-of-the-art data-intensive computing, storage, and networking solutions, integrated with an innovative data services layer. VDC is federated and coordinated over a high-speed network across three geographically distributed Rutgers University campuses in New Jersey and multiple campuses in Pennsylvania and New York, with the potential to incorporate academic/research institutions across the Mid-Atlantic and the nation. VDC builds on and integrates existing national/international and regional data repositories, including NSF-funded repositories, and leverages local/regional/national ACI investments. Central to the VDC vision are three infrastructural innovations: a regional science DMZ network that provides services to enable efficient and transparent access to data and computing capabilities; an expandable and scalable architecture for data-centric infrastructure federation; and a data services layer to support research workflows that utilize cutting-edge semantic web technologies, support interdisciplinary research, expand access, and increase the impact of data science worldwide.
The main goal of the Virtual Data Collaboratory is to leverage our partnership with the NJBDA in order to impact analytics and data science courses across the state, fostering learning communities through easy-to-use online modules and classes centered on research-based data science and analytics. This will include analytics and data science programs at universities such as Rutgers, Penn State, Drexel, and CUNY, and across academic levels from high school workshops to post-graduate seminars.
Part of this mission includes providing resources for educators and students, bringing Big Data skills into the classroom. The resources provided on this site via the VDC are ready and easy to use, both within the classroom and beyond.
The “Dive Into Big Data” high school-level workshop introduces students to the fundamentals of data science: they complete an interactive experiment with live oceanographic data (courtesy of the Ocean Observatories Initiative, or OOI) and visualize the results of simple data transformations. The materials for this workshop have been made available so that educators can host their own Dive Into Big Data workshop.
These materials include a description of the program, a workshop agenda for those planning on touring the Advanced Cyberinfrastructure facilities, a presentation giving an overview of the objectives of the program, and hands-on step-by-step instructions for how to complete the workshop using live data from the OOI.
After the activity is completed, students take a short quiz and survey to assess how well the workshop met its objectives. Questions regarding the RDI² facilities tour may be omitted or modified if the workshop was hosted at an external location.
RDI²’s VDC project team has hosted several events for undergraduates, graduate students, and beyond, including distinguished seminar series and roundtables.
Going forward, these events will be posted on our YouTube channel for instructors to utilize; one such event is the Data Science Career Panel, featuring industry speakers from within the field of Data Science. The speakers answer questions from prospective computer and data science students regarding what they can expect once entering the field professionally. This roundtable event is a useful resource for undergraduate and high school students interested in pursuing a career in data science.
As part of the NSF-funded Virtual Data Collaboratory project, RDI² is developing educational modules to help researchers solve their data issues and increase the impact of their research. One such module was the Introduction to Data Management seminar, held in May 2018, which invited career researchers to join RDI² for a discussion of best practices for managing research data. The data created as part of research is important and should be well-organized, well-preserved, accessible, understandable, and usable by the scholarly community.
The discussion included developments in data sharing, data collaboration, reproducible research, and more. Insights and feedback shared during the seminars are further incorporated into the educational modules, the materials for which have been made available as resources for educators.
The Virtual Data Collaboratory is supported by its member institutions and the United States National Science Foundation through NSF award number 1640834. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.