The CERC ECD Grant gave me the opportunity to attend the Summer Institute in Statistics for Big Data (SISBID), held at the University of Washington from July 19 to 24, 2017. I attended two SISBID modules that explained and demonstrated how modern statistical techniques are applied to the analysis of biological Big Data.
The first module covered supervised learning techniques for high-dimensional data, including regression, classification, and survival analysis. The course included practical examples and exercises, and attendees discussed the common limitations of this increasingly popular field in statistics. The techniques and exercises were demonstrated in the R programming language.
The second module focused on unsupervised learning techniques for biological Big Data. Unsupervised techniques aim to discover patterns when observations have no associated outcomes, so learning is based solely on the structure of the data. The techniques discussed in this module covered dimension reduction, clustering analysis and classification, and network analysis with graphical models. One of the course sessions also explored, through examples and exercises, how these unsupervised concepts can be combined to solve real-world problems. Although most of the course focused on applying supervised and unsupervised learning to high-dimensional data sets, such as those in genetics and biomedical imaging, the techniques discussed can be applied to new fields in statistical analysis.