Lecture Series Algorithms in Bioinformatics WS 2013/2014

The lecture series "Algorithms in Bioinformatics Winter Semester 2013/2014" is organised by Professor Holger Fröhlich. The lectures take place in the B-IT lecture hall at 17.00 hours on the dates listed below:

An Algorithm for Flowgram-String Alignment and Applications to Amplicon Methylation Analysis by Deep Bisulfite Sequencing

5 December 2013

Prof. Dr. Sven Rahmann, UAMR Professor for Computational Biology, Genome Informatics, Institute for Human Genetics, University Hospital Essen, University of Duisburg-Essen, Germany

A read from 454 or Ion Torrent sequencers is natively represented as a flowgram, which is a sequence of pairs of a nucleotide and its (fractional) intensity. Recent work has focused on improving the accuracy of base calling (conversion of flowgrams to DNA sequences) in order to facilitate read mapping and downstream analysis of sequence variants. However, base calling always incurs a loss of information by discarding fractional intensity information. We argue that base calling can be avoided entirely by directly aligning the flowgrams to DNA sequences. We introduce an algorithm for flowgram-string alignment that is based on dynamic programming but covers more cases than standard local or global sequence alignment. We also propose a scoring scheme that separately takes into account sequence variations (from substitutions, insertions, deletions) and sequencing errors (flow intensities contradicting the homopolymer length). This makes it possible to resolve fractional intensities, ambiguous homopolymer lengths, and editing events at alignment time by choosing the most likely read sequence given both the nucleotide intensities and the reference sequence. We demonstrate an application of this new approach to DNA methylation analysis from deeply sequenced amplicons that underwent bisulfite treatment.
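To make the idea of aligning a flowgram directly to a DNA string more concrete, the following is a minimal sketch and not the algorithm presented in the talk. It assumes a deliberately simple cost model: each flow explains a run of h >= 0 copies of its nucleotide in the reference at cost |intensity - h|, and a reference character not explained by any flow costs a fixed, hypothetical `gap_cost`. The function name `align_flowgram` and the cost model are illustrative assumptions; the scoring scheme described in the abstract, which separates sequence variation from sequencing error, is not reproduced here.

```python
def align_flowgram(flowgram, reference, gap_cost=1.0):
    """Return the minimal cost of aligning a flowgram to a reference string.

    flowgram  -- list of (nucleotide, intensity) pairs
    reference -- DNA string
    gap_cost  -- assumed cost of a reference character that no flow explains
    """
    n, m = len(flowgram), len(reference)
    inf = float("inf")
    # cost[i][j]: best cost of aligning the first i flows to reference[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for j in range(1, m + 1):
        cost[0][j] = j * gap_cost

    for i in range(1, n + 1):
        nucleotide, intensity = flowgram[i - 1]
        for j in range(m + 1):
            best = inf
            # Case 1: flow i explains a homopolymer run of length h ending at j.
            h = 0
            while True:
                best = min(best, cost[i - 1][j - h] + abs(intensity - h))
                # Stop when the run cannot be extended further to the left.
                if h == j or reference[j - h - 1] != nucleotide:
                    break
                h += 1
            # Case 2: reference[j-1] is not explained by any flow.
            if j > 0:
                best = min(best, cost[i][j - 1] + gap_cost)
            cost[i][j] = best
    return cost[n][m]


# A fractional intensity of 1.9 for C fits a reference with "CC" better
# than one with a single "C".
flows = [("T", 1.1), ("A", 0.1), ("C", 1.9), ("G", 0.9)]
print(align_flowgram(flows, "TCCG") < align_flowgram(flows, "TCG"))
```

In this sketch the homopolymer length is never fixed in advance: the dynamic program chooses, for every flow, the run length that best reconciles the measured intensity with the reference, which is the intuition behind avoiding a separate base-calling step.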

Understanding the biological system in terms of pathways and gene regulatory networks

12 December 2013

Prof. Frank Emmert-Streib, Center of Cancer Research and Cell Biology, Queen's University, Belfast

Breathtaking technological progress fueled by the Human Genome Project now enables a quantitative, data-driven approach to studying not only basic biological processes but also biomedical and clinical questions. For this reason, computational statistics approaches are needed for analyzing, integrating and interpreting high-throughput genomics data. In this talk I will discuss approaches to interrogate gene expression data on the pathway level and beyond. Specifically, I will present results comparing gene set/pathway methods as well as inference methods for gene regulatory networks for different cancer data sets. This is joint work with Galina Glazko, Ricardo de Matos Simoes and Matthias Dehmer.

Approximation Algorithms for Combinatorial Optimization Problems on Power Law Networks and Some Applications

19 December 2013

Dr. Mikael Gast, B-IT Research School, Germany

One of the central tasks in the analysis of large-scale networks is the identification of a set of key nodes from which the whole network can be reached or affected, where this set is ideally small. Areas of application range from the World Wide Web and the Internet to various social and biological networks. In this talk we present new results on the approximability of some fundamental optimization problems on such networks. Some of these results are the first of their kind (and sometimes optimal) on those networks. We also give a short survey of the relatively new concept of hyperbolic networks and its applicability to the analysis of real-world large-scale networks. (Joint work with Mathias Hauptmann and Marek Karpinski.)
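One classic formalisation of "a small set of key nodes that reaches the whole network" is the minimum dominating set problem. The abstract does not say which problems or algorithms the talk covers, so the following is only an illustrative sketch of the standard greedy heuristic, which has a logarithmic approximation guarantee on general graphs. The function name `greedy_dominating_set` and the adjacency-dictionary input format are assumptions made for this example.

```python
def greedy_dominating_set(adjacency):
    """Greedy heuristic for a dominating set.

    adjacency -- dict mapping each node to the set of its neighbours
                 (undirected graph, every node present as a key)
    Returns a list of chosen nodes such that every node is either chosen
    or adjacent to a chosen node.
    """
    undominated = set(adjacency)
    chosen = []
    while undominated:
        # Pick the node whose closed neighbourhood covers the most
        # nodes that are still undominated.
        best = max(
            adjacency,
            key=lambda v: len(({v} | adjacency[v]) & undominated),
        )
        chosen.append(best)
        undominated -= {best} | adjacency[best]
    return chosen


# A star graph: the hub alone dominates every node.
star = {
    "hub": {"a", "b", "c"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub"},
}
print(greedy_dominating_set(star))
```

Networks with a power-law degree distribution contain a few very high-degree hubs, which is one reason why degree-driven greedy choices of this kind are a natural starting point on such graphs.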

Network-Based Biomarker Discovery: Development of Prognostic Biomarkers for Personalized Medicine by Integrating Data and Prior Knowledge

11 February 2014

Yupeng Cun, B-IT Research School, Germany
Important note: This lecture will begin at 3.00 pm in the B-IT lecture hall.

Advances in genome science and technology offer a deeper understanding of biology while at the same time improving the practice of medicine. Expression profiling of some diseases, such as cancer, allows for the identification of marker genes, which may help to diagnose a disease or predict future disease outcomes. Marker genes (biomarkers) are selected by scoring how well their expression levels can discriminate between different classes of disease or between groups of patients with different clinical outcomes (e.g. therapy response, survival time, etc.). A current challenge is to identify new markers that are directly related to the underlying disease mechanism.
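The scoring step described above can be illustrated with a minimal sketch. It assumes a two-class setting and uses the absolute Welch t-statistic as the discrimination score; this is one common classical choice and is not necessarily among the methods compared in the talk. The function names `t_score` and `rank_genes` and the toy expression values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def t_score(group_a, group_b):
    """Welch t-statistic between two groups of expression values.

    Assumes each group has at least two values and that the two
    sample variances are not both zero.
    """
    standard_error = sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


def rank_genes(expression, labels):
    """Rank genes by how well they separate class 1 from class 0.

    expression -- dict mapping gene name to a list of expression values,
                  one per sample
    labels     -- list of 0/1 class labels, one per sample
    """
    scores = {}
    for gene, values in expression.items():
        class_one = [v for v, label in zip(values, labels) if label == 1]
        class_zero = [v for v, label in zip(values, labels) if label == 0]
        scores[gene] = abs(t_score(class_one, class_zero))
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: g1 differs strongly between the classes, g2 does not.
labels = [1, 1, 1, 0, 0, 0]
expression = {
    "g2": [2.0, 3.0, 2.5, 2.4, 2.9, 2.1],
    "g1": [5.0, 5.2, 4.9, 1.0, 1.1, 0.9],
}
print(rank_genes(expression, labels))
```

A univariate score of this kind treats every gene independently, which is the limitation that network-based selection methods aim to address by taking interactions between genes into account.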

In this talk, we first compare a collection of published gene selection methods, some of which include network information. Our results show that incorporating prior knowledge in the form of network information into gene selection methods does not, in general, significantly improve classification accuracy, but greatly enhances the interpretability of gene signatures compared to classical algorithms. In a next step, we developed a new method, called stSVM, which integrates network information as well as gene and microRNA expression profiles into one classifier. This new approach shows not only superior prediction performance but also improved stability and interpretability of the selected features. An open-source software package, called netClass, was developed to implement the proposed feature selection algorithm.

Statistical Learning in HIV Research

20 February 2014

Dr. Nico Pfeifer, Max-Planck Institute of Computer Science, Germany

Even after more than 25 years of HIV/AIDS research there is still no approved treatment to cure an HIV infection, which is why a lot of research has been devoted to antiretroviral drugs that can lower viral load. Nevertheless, recent breakthroughs in HIV vaccine research have revitalized hope that a universal vaccine against HIV can be found.

In HIV vaccine research, we were mainly concerned with cytotoxic T lymphocyte (CTL) responses against HIV. We showed how to model viral escape with Bayesian networks accounting for founder effects, co-variation and pressure by the immune system. Additionally, we showed how to perform a genome-wide association study given human genetic variation as well as viral genetic variation. The first part of the talk will be about these developments.

For HIV drug resistance, I was mainly concerned with HIV viral tropism prediction to determine whether the viral population of a patient is susceptible to a certain entry inhibitor treatment. This depends on whether these viruses can use only the coreceptor that can be blocked by the treatment or whether they could also use another coreceptor. We recently showed at ECCB how to build a predictor from NGS data to improve predictions not only for NGS samples but also for samples from Sanger sequencing. Furthermore, we introduced a new visualization method to make non-linear prediction methods more interpretable. The second part of the talk will be about these developments.