Theory and Applications in Bioinformatics

Bioinformatics Education Online

Course provider:
The University of Manchester
Course contact:
Magnus Rattray (magnus@cs.man.ac.uk)
Summary:
After completing this module the students should have knowledge and understanding of:
  1. probability and statistics sufficient to correctly interpret the output of biological sequence analysis applications;
  2. key algorithms underlying the principal biological sequence analysis applications and their computational performance;
  3. the role of probabilistic models in biological sequence analysis.

After completing this module the students should be able to:

  1. design simple sequence analysis applications based on principles of probability theory (e.g. simple Hidden Markov Models or Markov Chain models);
  2. carry out testing and benchmarking experiments to compare methods in a statistically principled way;
  3. generalise the knowledge and understanding developed in (1) and (2) to provide a critical assessment of applications appearing in the bioinformatics literature.
Syllabus:

Part 1 : Introduction to Probability and Statistics using Matlab

Probability Distributions: density and mass functions
Examples: Bernoulli (biased die DNA model), Binomial, Multinomial, Gaussian (central limit theorem example)
Bayes' Theorem
Entropy
Model parameter estimation
Running Example: Estimating the bias in a 4-sided die
Maximum Likelihood Solution
Prior probability
Maximum a posteriori (MAP) Solution
Score Matrices (BLOSUM etc.)
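As an illustration of the running example, the sketch below contrasts the maximum likelihood and MAP estimates for a 4-sided die whose faces are the DNA bases. It is written in Python rather than the Matlab used in the module and is not taken from the course materials. The function names, the symmetric Dirichlet prior and its parameter alpha are all assumptions made for this example.

```python
from collections import Counter

BASES = "ACGT"  # the four faces of the die


def ml_estimate(seq):
    """Maximum likelihood: the observed frequency of each face."""
    counts = Counter(seq)
    total = sum(counts[b] for b in BASES)
    return {b: counts[b] / total for b in BASES}


def map_estimate(seq, alpha=2.0):
    """MAP estimate under a symmetric Dirichlet(alpha) prior (assumed here).

    For alpha > 1 the posterior mode is
    (n_b + alpha - 1) / (N + K * (alpha - 1)), with K = 4 faces.
    """
    counts = Counter(seq)
    total = sum(counts[b] for b in BASES)
    k = len(BASES)
    return {b: (counts[b] + alpha - 1) / (total + k * (alpha - 1)) for b in BASES}


rolls = "AACA"  # a deliberately short sample
ml = ml_estimate(rolls)
map_ = map_estimate(rolls)
print(ml)
print(map_)
```

With only four rolls, the maximum likelihood estimate assigns probability zero to the unseen faces G and T, while the prior keeps the MAP estimate for those faces above zero.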

Part 2: Traditional Sequence Alignment and Search

2.1 Pairwise alignment (dynamic programming). This is the first algorithm covered, and it is used to emphasise the concepts of time and space complexity. It is demonstrated in Bioinformatics 1, but this module takes a more formal approach.
2.2 Similarity search heuristics (BLAST, FASTA)
2.3 Significance of Scores (Traditional vs. Bayesian)
2.4 Multiple alignment heuristics (CLUSTALW etc.)
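To make the time and space complexity point in 2.1 concrete, here is a minimal Python sketch of global pairwise alignment scoring by dynamic programming (the Needleman-Wunsch recurrence with a linear gap penalty). It is not taken from the course materials. The function name and the match, mismatch and gap scores are illustrative assumptions.

```python
def global_alignment_score(x, y, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of sequences x and y.

    Scoring values are illustrative assumptions. The table has
    (len(x) + 1) * (len(y) + 1) cells and each cell is filled in
    constant time, so time and space are both O(len(x) * len(y)).
    """
    n, m = len(x), len(y)
    # F[i][j] holds the best score for aligning x[:i] with y[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap  # x[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        F[0][j] = j * gap  # y[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(
                F[i - 1][j - 1] + s,  # align x[i-1] with y[j-1]
                F[i - 1][j] + gap,    # x[i-1] against a gap
                F[i][j - 1] + gap,    # y[j-1] against a gap
            )
    return F[n][m]


print(global_alignment_score("ACGT", "AGT"))
```

If only the score is needed, each row depends only on the row before it, so two rows suffice and space falls to O(len(y)). Recovering the alignment itself by traceback needs the full table (or a divide-and-conquer scheme).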

Part 3 : Introduction to Probabilistic Sequence Models

3.1 Markov chain model
3.2 Hidden Markov model (HMM) intro
3.3 Pair-HMMs (probabilistic interpretation of alignment)
3.4 HMM algorithms

Part 4 : Markov Chains and HMM Applications

This section covers selected applications in detail.
4.1 Profile HMMs (Pfam, SAM-T98, Motif-HMMs)
4.2 DNA Applications (Gene finding)

Part 5 : Phylogenetic Methods

5.1 Evolutionary distance
5.2 Distance based methods
5.3 Parsimony
5.4 Maximum likelihood
5.5 Bayesian inference
Assessment:
Students will be assessed throughout the course, using the following means:
  1. Four exercises
  2. Project
  3. Project discussion
  4. On-line exam
Please note that this module has an on-line exam. The exam will take place after the main 18-week teaching period, to give students time to revise. The exact date will be set after consultation with students taking the course.
Further details:
Applicants should have successfully completed Introduction to Bioinformatics and Introduction to software development in Java, or be familiar with the material covered in Introduction to Bioinformatics and have some programming experience. Applicants should also ensure before the start of the course that they are familiar with a prescribed set of mathematical pre-requisites. The pre-requisites are set out here. You may find it easier to read a printed copy of these pages.
Technical requirements:
You will be supplied with a copy of the student version of MATLAB. The system requirements are given here. We are only able to support installation on Microsoft Windows.
References:
  1. A. Cornish-Bowden, Basic Mathematics for Biochemists, 2nd Edition (OUP, 1999)
  2. D. W. Mount, Bioinformatics: Sequence and Genome Analysis
  3. R. Durbin, S. Eddy, A. Krogh and G. Mitchison, Biological Sequence Analysis (Cambridge University Press)
  4. P. Baldi and S. Brunak, Bioinformatics: The Machine Learning Approach
  5. Roderic D.M. Page and Edward C. Holmes, Molecular Evolution: A Phylogenetic Approach (Blackwell Science, 1998)

Updated 21 August 2009 by Heather Vincent