CIS Seminar - Constellation: A Science Graph Network for Scalable Data and Knowledge Discovery in Extreme-Scale Scientific Collaborations

Computer Science Alumni Lecture

Sudharshan S. Vazhkudai, PhD
R&D Manager, Technology Integration Group
Oak Ridge National Laboratory (ORNL)

3:00 PM, Wednesday, October 21, 2015
235 Weir Hall

Abstract:

“Just as a stargazer looks up at the night sky with billions of stars to see the patterns he desires, a scientist should be able to derive associations and insights from the millions of data products, processes, publications and other resources.”

Extreme-scale simulations on leadership-class systems, e.g., Titan, are producing tens of millions of data products that need to be discovered, correlated and analyzed by a distributed community to glean insights. The ensuing analyses also produce numerous derived data products and results that need to be captured to facilitate future lines of inquiry. The metadata surrounding the collaborative data production and the associated processes, leading up to the publication and curation of artifacts, needs to be captured to facilitate the scalable discovery of data and knowledge pathways. There is a wealth of information (metadata) distributed across the resource fabric of the collaboration, which, if harnessed efficiently, can help answer numerous data disposition questions. Further, resource information alone cannot capture the complex interrelationships between the various resources. This is a key deficit in the current state of the art in collaborative software systems, and it leads to isolation gaps. Thus, there is a need to federate the metadata into a sophisticated construct that lends itself to the scalable discovery of resources.

In this talk, I will describe the Constellation system that addresses the aforementioned issues via two fundamentally novel concepts. The first concept is the creation of a transformative, non-hierarchical, “science graph network” structure that can bridge the isolation gaps within a collaboration. The science graph network serves as a scalable way to both federate and correlate information (metadata) from the resource fabric of the collaboration (e.g., users, data, jobs, publications, knowledge bases, hardware). Viewing a scientific collaboration through the prism of a graph connectivity network allows us to naturally build complex associations between resources, and discover new data pathways by both exploiting graph properties and performing graph data analytics and mining.
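As a rough illustration of the graph view described above, the following minimal Python sketch links a handful of resources (users, jobs, data products, a publication) with typed edges and then searches for a pathway between two of them. It is not the Constellation implementation: the resource names, the relationship labels, and the helper functions `build_graph` and `find_pathway` are all hypothetical, chosen only to show how associations across resource types could be traversed.

```python
from collections import deque

# Hypothetical resources and relationships, invented for illustration only.
EDGES = [
    ("user:alice", "submitted", "job:1001"),
    ("job:1001", "produced", "data:run42.h5"),
    ("job:1002", "consumed", "data:run42.h5"),
    ("user:bob", "submitted", "job:1002"),
    ("job:1002", "produced", "data:analysis.csv"),
    ("pub:paper-7", "cites", "data:analysis.csv"),
]


def build_graph(edges):
    """Build an adjacency list; each edge is also stored in reverse
    (label suffixed with '^-1') so pathways can be followed both ways."""
    graph = {}
    for src, rel, dst in edges:
        graph.setdefault(src, []).append((rel, dst))
        graph.setdefault(dst, []).append((rel + "^-1", src))
    return graph


def find_pathway(graph, start, goal):
    """Breadth-first search for a shortest chain of relationships
    from start to goal. Returns a list of (source, relation, target)
    hops, or None if the two resources are not connected."""
    if start not in graph or goal not in graph:
        return None
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for rel, nxt in graph[node]:
            if nxt not in parents:
                parents[nxt] = (node, rel)
                queue.append(nxt)
    if goal not in parents:
        return None
    # Walk back from the goal to the start to recover the hops in order.
    path = []
    node = goal
    while parents[node] is not None:
        prev, rel = parents[node]
        path.append((prev, rel, node))
        node = prev
    path.reverse()
    return path


if __name__ == "__main__":
    graph = build_graph(EDGES)
    for src, rel, dst in find_pathway(graph, "pub:paper-7", "user:alice"):
        print(f"{src} --{rel}--> {dst}")
```

In this toy graph, the publication is connected to the user who ran the original simulation only through a chain of intermediate jobs and data products, which is the kind of association that a flat, per-resource catalog would not expose.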

To build the science graph, however, we need rich metadata and information about the resources in the collaboration. This leads us to the second concept, which is a foray into the construction of rich knowledge indexes atop information extracted from the resource fabric. A key challenge, therefore, is how to build sophisticated knowledge structures and indexes in a non-intrusive fashion, with as little user or domain input as possible. Together, these concepts will bring about a transformative impact on scalable knowledge discovery in scientific collaborations. The new knowledge pathways that can be discovered via the science graph network approach would normally be impossible or extremely inefficient to find with the current state of the art.

Bio: Dr. Sudharshan S. Vazhkudai is an R&D Manager at the Oak Ridge National Laboratory (ORNL), a U.S. Department of Energy Lab. He leads the Technology Integration Group, which is responsible for building the systems software, storage and data management solutions for the Oak Ridge Leadership Computing Facility (OLCF), home to the 27-petaflops Titan supercomputer. From 2003 to 2012, he was a Research Scientist in the Computer Science Research group at ORNL, working in areas such as distributed storage, HPC I/O, multicores, data management and non-volatile memory. During this time, he was also the lead architect of the distributed computing and scientific data management solution for the Spallation Neutron Source at ORNL, a billion-dollar neutron scattering facility. He holds a Joint Faculty appointment at the University of Tennessee, Knoxville. Dr. Vazhkudai has served as the PI on several grants from the DOE, NSF and NIH. He received a Ph.D. in Computer Science from the University of Mississippi in 2003. His doctoral work was on wide-area distributed data management and was conducted at the Argonne National Laboratory in Chicago, where he was a Wallace Givens Fellow.