Amazon Research Grant Approved for the EC2 Hadoop Cluster for INSITE

Feb. 12, 2014


INSITE is pleased to report that it has received an Amazon research grant.

Grant Summary

Non-communicable diseases (NCDs), such as diabetes, are caused by social, environmental, and genetic factors. Rapidly scaling datasets from big data sources such as genome sequencing, social media, and medical sensors make it possible to explore factors associated with NCDs in unprecedented ways. These data hold the promise of a new understanding of complex disease that can be applied towards prevention and directed therapy. Yet the toolkit and big data platform needed to analyze these large-scale datasets are underdeveloped, which stymies the use of new data and its translation into biomedical knowledge. To unlock these valuable resources, a new ecosystem of interoperable, scalable, and discoverable biomedical data resources is required. To this end, we propose to develop big data tools using graph database technology for biomedical research through an interdisciplinary collaboration called BRIDGE (Biomedical Research Innovation through Dynamic Graph Engineering).

We aim to use and expand on rapidly developing graph database technology from open source projects to: (1) build tools to convert biomedical data into interoperable graph databases, (2) develop methods to interconnect disparate graph databases, (3) develop robust strategies for preserving privacy when translating biomedical data to graph database networks, and (4) develop methods for visualizing network data for exploratory and predictive analyses. These tools can be applied broadly in any biomedical or disease area to uncover unseen patterns from disparate big data sources. Further, the proposed research addresses unsolved problems in graph database theory, especially in privacy preservation and graph database network interoperability, that apply broadly to big data analytics in any field of study. This work has the capacity to transform how biomedical data are stored, analyzed, and visualized, moving towards an integrated biomedical knowledge environment.

Our proposal focuses on four core areas of data science research to develop a robust data toolkit: (1) graph database storage and fast retrieval using the Hadoop architecture, (2) data linking via graph databases, (3) privacy preservation for network data in graph databases, and (4) large-scale map-based data visualization of networks. This big data toolkit will initially be developed in partnership with clinicians and biomedical researchers using data from both traditional sources (e.g. electronic health records and lab tests) and non-traditional sources (e.g. sensors and social media), allowing for connectivity across various data modalities. Moreover, the characteristics of different data types will be explored within the same unified framework, from very small to very large datasets and from sparse to dense datasets.

This work therefore offers advances in: (1) how large-scale clinical datasets are stored and mined, (2) data sharing, by using informatics methods to remove barriers that arise from patient privacy requirements, and (3) visualization of large-scale, multi-dimensional datasets for discovery and clinical interpretation.

The BRIDGE collaboration proposes to develop innovative tools for big data analytics by extending open source big data technologies and providing new health-related data functionality. These big data tools will be fully integrated with one another and made available as open source software. The tools produced through this proposal will also promote the development of an ecosystem of interoperable, reusable biomedical data resources. This will allow researchers to move quickly from data production to management and analysis of relevant datasets, and to find novel correlations that drive scientific inquiry in unprecedented ways. Lastly, training will be provided to undergraduate and graduate (Master’s and Doctoral) students in the data science techniques described for this project, using biomedical datasets.

Image Courtesy Pixabay