genetics department | lude franke | prioritizer

Whole Genome Association Studies
Progress has been made on a new version of Prioritizer: Prioritizer WGA. This new software tool and method not only provides basic (statistical) functionality for preprocessing, quality control and single-marker association analysis of raw genotype files from Illumina and Affymetrix WGA chips, but also includes a comprehensive genome viewer for jointly exploring the called genotypes and raw data, linkage disequilibrium patterns, and the genes underlying strong hits. Additionally, it includes new functionality that helps improve the reliability of detecting true disease SNPs by utilizing our functional human gene network.
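Single-marker association analysis usually means a separate test per SNP that compares allele frequencies between cases and controls. This page does not say which statistic Prioritizer WGA uses, so the following is only a minimal sketch of one common choice, a 1-degree-of-freedom allelic chi-square test. It assumes genotypes are coded as 0, 1 or 2 copies of the counted allele, with None for a missing call. The function names are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def allele_counts(genotypes: Sequence[Optional[int]]) -> Tuple[int, int]:
    """Return (counted, other) allele totals, skipping missing (None) calls."""
    called = [g for g in genotypes if g is not None]
    counted = sum(called)
    return counted, 2 * len(called) - counted


def allelic_chi2(cases, controls) -> Tuple[float, float]:
    """1-df Pearson chi-square on the 2x2 table of allele counts (cases vs controls).

    Returns (statistic, p_value); no continuity correction is applied."""
    a, b = allele_counts(cases)
    c, d = allele_counts(controls)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # Monomorphic marker or an empty group: the test is undefined.
        return 0.0, 1.0
    stat = n * (a * d - b * c) ** 2 / denom
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))
```

In a real pipeline this test would run once per marker, and only on markers and samples that passed quality control.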


Introduction
Although the majority of common diseases are complex, resulting from many different genes with weak effects, it can be assumed that often only a limited number of molecular pathways contribute to disease etiology. Linkage studies have led to the identification of a considerable number of susceptibility loci, but lag behind in pinpointing the genes that contribute to disease, because these regions usually span tens of megabases (Mb). To aid in the identification of causative genes, we propose a prioritization method for positional candidate genes that assumes that the majority of causative genes for a disease are functionally closely related.

Methods
We used a Bayesian approach to generate a gene network based on data from Gene Ontology (GO), KEGG, BIND, HPRD and Reactome; a dataset of approximately 70,000 predicted protein-protein interactions (Lehner and Fraser, 2004); approximately 3,000 human protein-protein interactions from a yeast two-hybrid screen (Stelzl et al., 2005); and co-expression data derived from approximately 10,000 human microarray experiments stored in the Gene Expression Omnibus and the Stanford Microarray Database.

We used the gene network to analyze 96 heritable disorders for which at least three contributing disease genes have been identified. We constructed an artificial susceptibility locus around each disease gene, containing 50, 100, 150 or 200 genes, and used a graph-theoretic measure to relate the positional candidate genes in different loci to each other.
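Building the artificial loci can be sketched as follows. The exact construction is not described on this page, so this assumes all genes are held in one position-ordered list (ignoring chromosome boundaries) and that each locus is a window of consecutive genes roughly centred on the disease gene, shifted inwards when it would run past either end. Both function names are hypothetical.

```python
"""Sketch: building artificial susceptibility loci around known disease genes.

Assumptions (illustrative, not from the original study): all genes are held
in one position-ordered list, chromosome boundaries are ignored, and each
locus is a window of `size` consecutive genes roughly centred on the disease
gene, shifted inwards when it would run past either end of the list.
"""


def artificial_locus(ordered_genes, disease_gene, size):
    """Return `size` consecutive genes that include disease_gene."""
    if size > len(ordered_genes):
        raise ValueError("locus larger than the gene list")
    i = ordered_genes.index(disease_gene)
    start = min(max(0, i - size // 2), len(ordered_genes) - size)
    return ordered_genes[start:start + size]


def loci_for_disorder(ordered_genes, disease_genes, sizes=(50, 100, 150, 200)):
    """One set of artificial loci (one locus per disease gene) for each size."""
    return {size: [artificial_locus(ordered_genes, g, size) for g in disease_genes]
            for size in sizes}
```

Centring the window keeps the disease gene away from the locus edges, so its position within the locus does not by itself reveal which gene is the true one.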

Finally, we determined an empirical p-value for each gene, which we used to rank the positional candidate genes within each locus, separately for each disorder.
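One standard way to write such an empirical p-value is given below. Here S_g^obs is the score of gene g with the real loci, S_g^(k) is its score after the k-th random reshuffling of the loci across the genome, and K is the number of reshufflings (10,000 according to the overview figure legend). The +1 in numerator and denominator is a common convention that keeps p_g above zero; whether the original analysis used it is an assumption. Within each locus, candidates are then ranked by increasing p_g.

```latex
p_g = \frac{1 + \#\{\, k \in \{1, \dots, K\} : S_g^{(k)} \ge S_g^{\mathrm{obs}} \,\}}{1 + K},
\qquad K = 10\,000
```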

Outline
Overview | Basic principle of the positional candidate gene prioritization method using gene networks. Depicted in this figure are three different gene-gene interaction data sources that are integrated in a Bayesian way. After integration of the data sources, the actual gene network is constructed. As an example, all genes are assigned an initial score of 0, and three different susceptibility loci, each containing a disease gene (P, Q or R) and two non-disease genes, are analyzed. For each locus, its three positional candidate genes increase the scores of genes functionally nearby within the gene network, using a kernel function that models the relationship between gene-gene distance and score effect. Once all loci have been processed, shuffling the three susceptibility loci 10,000 times across the genome allows for the determination of an empirical p-value per gene, and the eventual ranking of the positional candidate genes per locus. Genes P, Q and R should then end up as the top-ranked genes, as they have the most significant p-values.
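The scoring and shuffling steps in this legend can be sketched as a toy program. The legend does not define the kernel, the distance measure or how a locus's own genes are treated, so these choices are assumptions: network distance is the unweighted shortest-path length, the kernel is a Gaussian exp(-d^2 / (2 sigma^2)), genes in a locus do not raise each other's scores, and a shuffle moves every locus to a random window of the same size in one position-ordered gene list. All function names are hypothetical. n_shuffles defaults to 1,000 rather than 10,000 only to keep the toy fast.

```python
import math
import random
from collections import deque


def bfs_distances(adjacency, source):
    """Shortest-path lengths from source in an unweighted, undirected network."""
    dist, queue = {source: 0}, deque([source])
    while queue:
        gene = queue.popleft()
        for nb in adjacency.get(gene, ()):
            if nb not in dist:
                dist[nb] = dist[gene] + 1
                queue.append(nb)
    return dist


def kernel(d, sigma=1.0):
    """Hypothetical Gaussian kernel: score effect shrinks with network distance d."""
    return math.exp(-d * d / (2.0 * sigma * sigma))


def score_genes(loci, adjacency):
    """Every gene starts at 0; each locus raises the scores of genes near its
    positional candidates, but not of genes inside that same locus."""
    scores = {}
    for locus in loci:
        members = set(locus)
        for gene in locus:
            for target, d in bfs_distances(adjacency, gene).items():
                if target not in members:
                    scores[target] = scores.get(target, 0.0) + kernel(d)
    return scores


def empirical_pvalues(loci, ordered_genes, adjacency, n_shuffles=1000, seed=0):
    """Compare each candidate's observed score with scores obtained after
    placing every locus at a random position (same size) in the gene order."""
    rng = random.Random(seed)
    observed = score_genes(loci, adjacency)
    exceed = {g: 0 for locus in loci for g in locus}
    for _ in range(n_shuffles):
        shuffled = []
        for locus in loci:
            start = rng.randrange(len(ordered_genes) - len(locus) + 1)
            shuffled.append(ordered_genes[start:start + len(locus)])
        null = score_genes(shuffled, adjacency)
        for g in exceed:
            if null.get(g, 0.0) >= observed.get(g, 0.0):
                exceed[g] += 1
    return {g: (1 + k) / (1 + n_shuffles) for g, k in exceed.items()}
```

In a toy network where P, Q and R form a triangle and each sits in a three-gene locus with two unconnected genes, the disease gene gets the smallest p-value in its locus. The unconnected genes get p = 1, because their observed score of 0 is matched by every shuffle.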

Prioritizer Screenshot
Screenshot | Prioritizer showing the results of the analysis of Turcot syndrome, in which colorectal polyposis or cancer occurs together with malignant tumors of the central nervous system, and for which three disease genes (PMS2, APC and MLH1) have been implicated.

Discussion
For 43% of the disease genes, the gene network analysis performed well: when each artificial linkage region contained 100 genes, the true disease gene was ranked within the top 10 of its region. Compared to a previous method (Turner et al., 2003), which correctly returned only 12% of the disease genes, our method is somewhat less specific but performs much better at identifying the correct disease genes.
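The evaluation criterion above can be expressed as a top-k recall over artificial loci. As a reference point, a random ordering of a 100-gene region puts the true gene in the top 10 with probability 10/100 = 10%, so 43% corresponds to roughly a 4.3-fold enrichment over chance. The sketch assumes each locus is summarized as a ranked gene list plus its true disease gene; the function name top_k_recall is hypothetical.

```python
def top_k_recall(rankings, k=10):
    """rankings: list of (ranked_genes, true_disease_gene), one per artificial locus.

    Returns the fraction of loci whose true disease gene is among the first k."""
    if not rankings:
        raise ValueError("no loci to evaluate")
    hits = sum(1 for ranked, true_gene in rankings if true_gene in ranked[:k])
    return hits / len(rankings)
```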

We have shown that, by assuming that the disease genes for a specific disorder are usually functionally related, we can substantially enrich for true disease genes when analyzing susceptibility loci. This method could therefore be valuable for analyzing common disease loci for which the contributing disease genes have not yet been identified.

The resulting program (Prioritizer) allows for the analysis and visualization of user-defined susceptibility loci, and will be available soon on this website.

References