Extracting knowledge from data.
Working group members: Sudipto Banerjee, Keith Chen, Alexander Hoffmann, Ted Parson, Todd Presner
Data science is a multidisciplinary field that focuses on methods to capture, maintain, process, analyze, and communicate results from big data. It requires skills in statistics, mathematics, computer science, data visualization, and communication. Researchers in a variety of fields today encounter new and ongoing data analytic challenges emanating from diverse subjects, including climate change, environmental degradation, emerging epidemics, biomedicine and biology, health disparities, food insecurity, homelessness, racial discrimination and violence, and social and behavioral research. To address these challenges, statistical modeling, machine learning algorithms, and emerging technologies in computational data science are rapidly becoming mainstays of research and development in each of these subject areas. For example, data-intensive research in the basic sciences and in medicine has witnessed an explosion of interest, largely attributable to rapid developments in high-performance computing, data analytic methods, and statistical programming environments. Technological advances have produced massive databases on a variety of outcomes that are accessible to researchers, administrators, and policy-makers. This “data deluge” poses new challenges in training the next generation of data scientists, who will be tasked with analyzing these massive databases. The increasing availability of electronic health records (EHR), administrative records, geospatial data, genomic data, social media, and web-generated data, together with the linking of these datasets, provides a unique and timely opportunity to advance understanding of biological, behavioral, environmental, and social phenomena.
Research and training in data science will pervade a variety of areas within the basic and natural sciences, social and behavioral sciences, economics, biomedical health sciences, mathematics, engineering, and statistics and biostatistics. Broadly, research and educational activities in data science on the UCLA campus can be classified into theoretical and methodological research in data science and its substantive applications. This working group believes that IDRE is uniquely situated, both in the bridges it has already built on campus and in its potential to build new connections and form new synergies in data science across UCLA. The group will work to advance IDRE’s role in these connections and to advocate for investment in infrastructure and initiatives for data science support, including improvements to the Hoffman2 high-performance computing cluster and to cloud infrastructure, data science short courses and workshops, and support for PIs who are preparing computationally intensive grant proposals.