Abstract to "Model-based Gene Set Enrichment Analysis" by Dr. Julien Gagneur
The interpretation of data-driven experiments in genomics often involves a search for biological categories that are enriched for the responder genes identified by the experiments. However, knowledge bases such as the Gene Ontology (GO) contain hundreds or thousands of categories, many of which overlap heavily. Thus, enrichment analysis performed on one category at a time frequently returns large numbers of correlated categories, leaving the choice of the most relevant ones to the user's interpretation.
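As a minimal illustration of the one-category-at-a-time approach, the sketch below scores each category with a one-sided hypergeometric test. The choice of test, the gene names, and the three toy categories are assumptions made for illustration only; they are not taken from the talk. Category B contains all of A plus one extra gene, so both come out as enriched even though a single category would explain the responders.

```python
from math import comb

def hypergeom_pvalue(population, category, responders):
    """One-sided hypergeometric p-value: P(overlap >= observed overlap)."""
    N = len(population)                 # all genes measured
    K = len(category & population)      # genes annotated to the category
    n = len(responders & population)    # responder genes
    k = len(category & responders)      # responders inside the category
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)

# Hypothetical toy data: 20 genes, 5 responders, 3 categories.
population = {f"g{i}" for i in range(20)}
responders = {f"g{i}" for i in range(5)}
categories = {
    "A": {f"g{i}" for i in range(5)},       # exactly the responders
    "B": {f"g{i}" for i in range(6)},       # A plus one extra gene
    "C": {f"g{i}" for i in range(10, 15)},  # unrelated to the responders
}

pvalues = {
    name: hypergeom_pvalue(population, genes, responders)
    for name, genes in categories.items()
}

for name in sorted(pvalues):
    print(name, pvalues[name])
```

Both A and B fall far below a 0.05 threshold here, which is the redundancy the abstract describes: the test has no way of saying that B is significant only because it contains A.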
I will present Model-based Gene Set Analysis (Bauer et al., NAR, 2010), in which we tackle the problem by posing the question differently. Instead of searching for all significantly enriched groups, we search for a minimal set of groups that can explain the data. We model the experimental observations as arising from a set of "active" groups. Our model penalizes the number of active groups, thus naturally providing parsimonious solutions. Application to a gene expression data set in yeast demonstrates that the method provides high-level, summarized views of core biological processes and correctly eliminates confounding associations.
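The idea of searching for a small set of active groups can be sketched on toy data as follows. This is not the published algorithm: the scoring function, the noise rates `alpha` (false positive) and `beta` (false negative), the per-group prior `p`, and the exhaustive enumeration over subsets are all illustrative assumptions, and the gene names and categories are hypothetical. The prior term makes every additional active group cost something, which is what favors parsimonious solutions.

```python
from itertools import combinations
from math import log

def log_score(active, categories, population, responders,
              alpha=0.1, beta=0.25, p=0.1):
    """Log prior on the active set plus log likelihood of the observations."""
    # Genes annotated to any active group are expected to respond.
    explained = set().union(*(categories[c] for c in active))
    # Prior: each group is active with probability p (assumed value),
    # so every extra active group lowers the score.
    score = len(active) * log(p) + (len(categories) - len(active)) * log(1 - p)
    for gene in population:
        observed = gene in responders
        if gene in explained:
            score += log(1 - beta) if observed else log(beta)
        else:
            score += log(alpha) if observed else log(1 - alpha)
    return score

def best_active_set(categories, population, responders):
    """Exhaustively score every subset of groups (feasible only for toy sizes)."""
    names = sorted(categories)
    candidates = (set(c) for r in range(len(names) + 1)
                  for c in combinations(names, r))
    return max(candidates,
               key=lambda a: log_score(a, categories, population, responders))

# Hypothetical toy data: 20 genes, 5 responders, 3 categories.
population = {f"g{i}" for i in range(20)}
responders = {f"g{i}" for i in range(5)}
categories = {
    "A": {f"g{i}" for i in range(5)},       # exactly the responders
    "B": {f"g{i}" for i in range(6)},       # A plus one extra gene
    "C": {f"g{i}" for i in range(10, 15)},  # unrelated to the responders
}

best = best_active_set(categories, population, responders)
print(best)
```

On this toy input the highest-scoring set contains only A: adding the overlapping group B explains no further responders and pays the prior penalty, so it is dropped. Exhaustive enumeration grows as 2^n in the number of groups, so a realistic knowledge base would need an approximate inference scheme instead.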