Computing and Mathematical Sciences Colloquium
Principal component analysis (PCA) of genotype matrices has become a standard tool for studying population structure in genetics, and it has also been used to control for such structure in genome-wide association studies. I will discuss how to generalize the PCA framework to the study of more complex genotype-phenotype interactions via a probabilistic model that subsumes probabilistic PCA and canonical correlation analysis (CCA) in a common framework. The model, which we term factored association analysis (FAA), also addresses the overfitting that arises when CCA is applied naively. Using FAA, I will present evidence for population structure in gene expression and show how the model can be used to analyze multiple, diverse genomic datasets, in particular those from cancer genome projects. This is joint work with Nicolas Bray, Brielin Brown, and Shannon McCurdy.