Caltech

Computing and Mathematical Sciences Colloquium

Monday, January 23, 2017
4:00pm to 5:00pm
Annenberg 105
Structured Factor Models to Find Interpretable Signal in Genomic Data
Professor Barbara Engelhardt, Computer Science Department, Center for Statistics and Machine Learning, Princeton University

Latent factor models have recently received much attention in "big data" applications because they let users quickly explore the underlying data in a controlled and interpretable way. In genomics, latent factor models are commonly used to identify population substructure, identify gene clusters, and control noise in large data sets. In this talk, I will present a general framework for Bayesian structured latent factor models. I will illustrate the power of these models for a broad class of problems in genomics via application to the Genotype-Tissue Expression (GTEx) data set. In particular, by using a Bayesian biclustering version of this model, the estimated latent structure may be used to identify gene co-expression networks that co-vary uniquely in one tissue type (and other conditions). We validate network edges using tissue-specific expression quantitative trait loci.

For more information, please contact Carmen Nemer-Sirois by phone at (626) 395-4561 or by email at [email protected].