SETN 2012 - LNAI 7297

Towards Better Prioritization of Epigenetically Modified DNA Regions

Ernesto Iacucci¹,², Dusan Popovic¹,², Georgios A. Pavlopoulos¹,², Léon-Charles Tranchevent¹,², Marijke Bauters³,², Bart De Moor¹,², and Yves Moreau¹,²

¹ESAT-SCD, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, Box 2446, 3001, Leuven, Belgium

²IBBT-K.U.Leuven Future Health Department, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, Box 2446, 3001, Leuven, Belgium

³Department of Human Genetics, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, Box 2446, 3001, Leuven, Belgium

Abstract. Epigenetic modifications of the genome can cause profound changes in the phenotype of an organism. Experimental methods allow us to detect regions of the DNA that have been epigenetically modified; these regions are said to be enriched in a queried state versus a control. Detecting the enriched regions is not simple: making sense of the data involves multiple analytical steps and often results in false calls. In this study, we analyze the utility of additional features of the data, such as the transcription start site (TSS) and the histone coverage, for detecting enrichment. We train a decision tree ensemble on three features (the ChIP-chip enrichment score, the TSS, and the histone coverage) and review how well they identify regions that are truly enriched, as validated by q-PCR. We find that the enrichment score derived directly from ChIP-chip experiment data is less informative than the histone coverage.

Keywords: ChIP-chip, data integration, protein-DNA, machine learning, decision trees
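
The abstract's idea of training a decision tree ensemble on three features and asking which feature is most informative can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a bagged ensemble of one-level trees (decision stumps) written with the Python standard library only, the feature names are hypothetical placeholders, and the toy data is synthetic, with the label tied to the third column by construction. It therefore shows only the mechanics of counting which feature the ensemble splits on, and does not reproduce the paper's result.

```python
import random
from collections import Counter

# Hypothetical feature names, used only as labels for the three columns.
FEATURES = ("enrichment_score", "tss", "histone_coverage")


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def majority(labels):
    """Majority label of a list of 0/1 labels (ties go to 1)."""
    return int(sum(labels) * 2 >= len(labels))


def fit_stump(rows, labels):
    """Fit a one-level tree: (feature index, threshold, left label, right label)."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0  # candidate threshold between two observed values
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # every row identical: no split possible
        m = majority(labels)
        return (0, float("inf"), m, m)
    return best[1:]


def fit_ensemble(rows, labels, n_trees=25, seed=0):
    """Bagging: fit each stump on a bootstrap resample of the training data."""
    rng = random.Random(seed)
    n = len(rows)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return stumps


def predict(stumps, row):
    """Majority vote over all stumps (ties go to 1)."""
    votes = sum(left if row[f] <= t else right for f, t, left, right in stumps)
    return int(votes * 2 >= len(stumps))


def split_counts(stumps):
    """Crude importance measure: how many stumps split on each feature."""
    counts = Counter(FEATURES[f] for f, _, _, _ in stumps)
    return {name: counts.get(name, 0) for name in FEATURES}


def make_toy_data(n=60, seed=1):
    """Synthetic rows; the label depends only on the third column by construction."""
    rng = random.Random(seed)
    rows = [(rng.random(), rng.random(), rng.random()) for _ in range(n)]
    labels = [int(r[2] > 0.5) for r in rows]
    return rows, labels


if __name__ == "__main__":
    rows, labels = make_toy_data()
    stumps = fit_ensemble(rows, labels)
    print(split_counts(stumps))
```

In practice the labels would come from an independent validation such as q-PCR rather than from a rule on one column, and deeper trees with a proper importance measure (for example, a random forest from an established library) would be a more realistic choice than split counting over stumps.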

LNAI 7297, p. 270 ff.
