Identification of differential transcription in Drosophila embryos from tiling microarray data
Tiling data provide a map of transcribed regions across an entire genome without the biases inherent in "traditional" microarray data. However, the data are noisy and do not, on their own, give insight into the structure or function of what is transcribed. As more genomes are sequenced and tiling arrays become available for them, the automated annotation of transcribed regions becomes both possible and important. A number of tools (e.g. ARTADE) address this objective and other approaches have been suggested, but there is still a need to assess what is available and to investigate whether novel approaches, or novel combinations of existing approaches, can improve such automated annotation. In particular, the annotation of non-coding genes from tiling data is seen as a potentially valuable area of research, as these genes are currently under-recognised by conventional gene prediction methods.
Objectives
- Review the available tools for predicting gene structure from tiling data.
- Benchmark these tools against a set of available insect tiling data.
- Review available methods for analysing the data that may not yet have been incorporated into automated tools.
- Propose one or more approaches that may improve performance, either in isolation or in combination with existing methods. Suggested novel approaches include:
  - combining Pol II ChIP-chip and expression data to identify gene boundaries and intron/exon structures;
  - using covariance across tissue-type and/or developmental-series data sets to identify gene boundaries;
  - windowing, thresholding and SVM approaches to pattern recognition.
- Implement one or more of the suggested novel approaches.
- Benchmark the novel approach(es).
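To make the windowing and thresholding idea from the objectives concrete, the following is a minimal sketch, not a description of any existing tool. It assumes probes are given as sorted genomic positions with one normalised intensity each; the function name `call_transcribed_regions` and the parameters `window`, `threshold` and `max_gap` are hypothetical choices made for illustration. Each probe is smoothed with a sliding-window median, and consecutive probes whose smoothed signal reaches the threshold are merged into candidate transcribed regions.

```python
from statistics import median


def call_transcribed_regions(positions, intensities, window=5,
                             threshold=1.0, max_gap=100):
    """Return (start, end) intervals of probe positions called as transcribed.

    All parameter names and defaults are illustrative assumptions:
    window    -- number of probes in the sliding median window
    threshold -- smoothed intensity at or above which a probe is "on"
    max_gap   -- largest distance (bp) between "on" probes in one region
    """
    if len(positions) != len(intensities):
        raise ValueError("positions and intensities must have equal length")

    n = len(intensities)
    half = window // 2

    # Sliding-window median: robust to single noisy probes.
    # The window is truncated at the ends of the probe series.
    smoothed = [
        median(intensities[max(0, i - half):min(n, i + half + 1)])
        for i in range(n)
    ]

    regions = []
    start = prev = None
    for pos, value in zip(positions, smoothed):
        if value >= threshold:
            if start is None:
                start = pos                    # open a new region
            elif pos - prev > max_gap:
                regions.append((start, prev))  # gap too large: close and reopen
                start = pos
            prev = pos
        elif start is not None:
            regions.append((start, prev))      # signal dropped: close region
            start = prev = None
    if start is not None:
        regions.append((start, prev))          # close a region open at the end
    return regions


# Toy example: the low probe at position 140 is smoothed over.
positions = [0, 35, 70, 105, 140, 175, 210, 245, 280, 315]
intensities = [0.1, 0.2, 2.0, 2.2, 0.1, 2.1, 2.3, 0.2, 0.1, 0.0]
print(call_transcribed_regions(positions, intensities, window=3))
# → [(70, 210)]
```

In practice the threshold would need to be estimated from the data (for example from negative-control probes), and the intervals returned here could serve as a simple baseline when benchmarking the more elaborate approaches listed above.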
This work built on skills developed in our distance learning course in microarray data analysis.