| Course provider: |
The University of Manchester |
| Course contact: |
Magnus Rattray (magnus@cs.man.ac.uk)
|
| Summary: |
After completing this module the students should have
knowledge and understanding of:
- probability and statistics sufficient to correctly interpret the output of biological sequence analysis applications;
- key algorithms underlying the principal biological sequence analysis applications, and their computational performance;
- the role of probabilistic models in biological sequence analysis.
After completing this module the students should be
able to:
- design simple sequence analysis applications based on principles of probability theory (e.g. simple Hidden Markov Models or Markov chain models);
- carry out testing and benchmarking experiments to compare methods in a statistically principled way;
- generalise the knowledge and understanding developed in (a) and (b), and provide a critical assessment of applications appearing in the bioinformatics literature.
|
| Syllabus: |
Part 1: Introduction to Probability and Statistics using MATLAB
|
|
Probability Distributions: density and mass functions
Examples: Bernoulli (biased die DNA model), Binomial, Multinomial, Gaussian (central limit theorem example).
|
|
|
Bayes' Theorem
|
|
|
Entropy
|
|
|
Model parameter estimation
Running Example: Estimating the bias in a 4-sided
die
Maximum Likelihood Solution
Prior probability
Maximum a posteriori (MAP) Solution
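As a rough illustration of the running example, the sketch below estimates the bias of a 4-sided die whose faces are the DNA letters A, C, G and T. It is written in Python rather than the MATLAB used in the module. The function names, the example sequence and the symmetric Dirichlet prior with `alpha = 2` are assumptions made for illustration, not taken from the course material.

```python
from collections import Counter

ALPHABET = "ACGT"  # the four "faces" of the die


def ml_estimate(sequence):
    """Maximum likelihood estimate: the relative frequency of each symbol."""
    counts = Counter(sequence)
    total = sum(counts[s] for s in ALPHABET)
    return {s: counts[s] / total for s in ALPHABET}


def map_estimate(sequence, alpha=2.0):
    """MAP estimate under a symmetric Dirichlet(alpha) prior (assumed, alpha > 1).

    The posterior mode is (n_i + alpha - 1) / (N + K * (alpha - 1)),
    where n_i is the count of symbol i, N the total count and K = 4.
    """
    if alpha <= 1.0:
        raise ValueError("this mode formula assumes alpha > 1")
    counts = Counter(sequence)
    k = len(ALPHABET)
    total = sum(counts[s] for s in ALPHABET)
    denom = total + k * (alpha - 1.0)
    return {s: (counts[s] + alpha - 1.0) / denom for s in ALPHABET}


if __name__ == "__main__":
    seq = "AAGAAAAC"  # hypothetical observed rolls
    print("ML :", ml_estimate(seq))
    print("MAP:", map_estimate(seq))
```

With a short sequence the maximum likelihood estimate assigns probability zero to any symbol that was never observed, whereas the prior keeps the MAP estimate strictly positive. This is one common motivation for using a prior, under the assumptions stated above.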
|
|
|
Score Matrices (BLOSUM etc.)
|
Part 2: Traditional Sequence Alignment and Search
|
2.1
|
Pairwise alignment (dynamic programming). This is a
first algorithm, used to introduce the concepts of time
and space complexity. It is demonstrated in
Bioinformatics 1, but this module will take a more
formal approach.
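To make the time and space complexity point concrete, here is a minimal sketch of global pairwise alignment scoring by dynamic programming (the Needleman-Wunsch recurrence). It is written in Python rather than MATLAB. The scoring scheme (match +1, mismatch -1, linear gap penalty -1) and the function name are assumptions for illustration and are not necessarily the formulation used in the module.

```python
def global_alignment_score(x, y, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of x and y.

    Time is O(len(x) * len(y)); space is O(len(y)) because only the
    previous row of the dynamic programming table is kept.
    """
    # Row 0: aligning the empty prefix of x against prefixes of y.
    prev = [j * gap for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        # Column 0: aligning a prefix of x against the empty prefix of y.
        curr = [i * gap] + [0] * len(y)
        for j in range(1, len(y) + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            curr[j] = max(
                prev[j - 1] + sub,  # align x[i-1] with y[j-1]
                prev[j] + gap,      # x[i-1] aligned to a gap
                curr[j - 1] + gap,  # y[j-1] aligned to a gap
            )
        prev = curr
    return prev[len(y)]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

Keeping only one row is enough to compute the score. Recovering the alignment itself needs either the full table, which costs O(len(x) * len(y)) space, or a divide-and-conquer scheme such as Hirschberg's algorithm.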
|
|
2.2
|
Similarity search heuristics (BLAST, FASTA)
|
|
2.3
|
Significance of Scores (Traditional vs. Bayesian)
|
|
2.4
|
Multiple alignment heuristics (CLUSTALW etc.)
|
Part 3: Introduction to Probabilistic Sequence Models
|
3.1
|
Markov chain model
|
|
3.2
|
Hidden Markov model (HMM) intro
|
|
3.3
|
Pair-HMMs (probabilistic interpretation of alignment)
|
|
3.4
|
HMM algorithms
|
Part 4: Markov Chains and HMM Applications
This section will focus on some applications in detail.
|
4.1
|
Profile HMMs (Pfam, SAM-T98, Motif-HMMs)
|
|
4.2
|
DNA Applications (Gene finding)
|
Part 5: Phylogenetic Methods
|
5.1
|
Evolutionary distance
|
|
5.2
|
Distance-based methods
|
|
5.3
|
Parsimony
|
|
5.4
|
Maximum likelihood
|
|
5.5
|
Bayesian inference
|
|
|
Assessment:
|
Students will be assessed throughout the course, using the following means:
- Four exercises
- Project
- Project discussion
- On-line exam
Please note that this module has an on-line exam. The exam will take place AFTER the main 18-week teaching period, to give students time to revise. The exact date will be determined after consultation with students taking the course.
|
| Further details: |
Applicants should have successfully completed Introduction to Bioinformatics and Introduction to Software Development in Java, or be familiar with the material covered in Introduction to Bioinformatics and have some programming experience. Applicants should also ensure before the start of the course that they are familiar with a prescribed set of mathematical pre-requisites. The pre-requisites are set out here. You may find it easier to read a printed copy of these pages. |
| Technical requirements: |
You will be supplied with a copy of the student version of MATLAB. The system requirements are given here. We are only able to support installation on Microsoft Windows.
|
| References: |
|
1
|
Basic Mathematics for Biochemists
A. Cornish-Bowden
OUP, 2nd Edition, 1999
|
| 2
|
Bioinformatics, sequence and genome analysis
D. W. Mount
|
| 3
|
Biological sequence analysis
R. Durbin, S. Eddy, A. Krogh and G. Mitchison
Cambridge University Press
|
| 4
|
Bioinformatics, the machine learning approach
P. Baldi and S. Brunak
|
| 5
|
Molecular Evolution: A Phylogenetic Approach
Roderic D.M. Page and Edward C. Holmes (Blackwell Science, 1998)
|
|