Whole Genome Association
Progress has been made on a new version of Prioritizer: Prioritizer WGA. This new software tool combines basic statistical functionality for preprocessing, quality control and single-marker association analysis of raw genotype files from Illumina and Affymetrix WGA chips with a comprehensive genome viewer for the joint exploration of called genotypes and raw data, linkage disequilibrium patterns, and the genes underlying strong hits. It also includes new functionality that uses our functional human gene network to help improve the reliability of detecting real disease SNPs.
Introduction
Although the majority of common diseases are complex, resulting from
many different genes with weak effects, it can be assumed there are
often only a limited number of molecular pathways that contribute to disease
etiology. Linkage studies have led to the identification of a considerable
number of susceptibility loci, but lag behind in pinpointing genes
contributing to disease because these regions usually span tens of megabases.
To aid in the identification of causative genes we propose a prioritization
method for positional candidate genes, by assuming that the majority of causative
genes are functionally closely related.
Methods
We used a Bayesian approach to generate a gene network based on
data from Gene Ontology (GO), KEGG, BIND, HPRD and Reactome, a dataset of
approximately 70,000 predicted protein-protein interactions (Lehner
and Fraser, 2004), a set of 3,000 predicted human protein-protein interactions
(Stelzl et al., 2005), and co-expression data derived from approximately
10,000 human microarray experiments stored in the Gene Expression
Omnibus and the Stanford Microarray Database.
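One common way to integrate several evidence sources in a Bayesian manner is to multiply per-source likelihood ratios into posterior odds, treating the sources as independent (a naive Bayes assumption). The sketch below illustrates that idea only; it is not necessarily the scheme used to build this network, and the evidence names, likelihood ratios and prior odds are made-up values.

```python
from math import prod

def posterior_odds(prior_odds, likelihood_ratios):
    """Combine evidence sources into posterior odds for one gene pair.

    Assumes the sources are conditionally independent (naive Bayes).
    """
    return prior_odds * prod(likelihood_ratios)

def odds_to_probability(odds):
    """Convert odds into a probability between 0 and 1."""
    return odds / (1.0 + odds)

# Hypothetical likelihood ratios for a single gene pair (illustrative only).
evidence = {"shared_go_term": 8.0, "coexpression": 3.0, "predicted_ppi": 5.0}
prior = 1 / 600  # hypothetical prior odds that two random genes interact

odds = posterior_odds(prior, evidence.values())
probability = odds_to_probability(odds)
print(f"posterior odds {odds:.3f}, probability {probability:.3f}")
```

Gene pairs whose posterior odds exceed a chosen cut-off could then become edges of the network, with the odds serving as the edge weight.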
We used the gene network to analyze 96 heritable disorders for which
at least three contributing disease genes have been identified. We
constructed artificial susceptibility loci of 50, 100, 150 or 200 genes
around each disease gene and used a graph-theoretic measure
to relate positional candidate genes in different loci to each other.
Finally, we determined an empirical p-value for each gene and used it
to rank the positional candidate genes within each locus, per disorder.
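As an illustration of relating candidates in different loci through the network, the sketch below uses shortest-path length (number of hops) as the graph-theoretic measure. That choice is an assumption made for the example; the toy network, the locus contents and the helper names are hypothetical.

```python
from collections import deque

def shortest_path_lengths(network, source):
    """Breadth-first search: hop count from `source` to every reachable gene."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        gene = queue.popleft()
        for neighbour in network.get(gene, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[gene] + 1
                queue.append(neighbour)
    return dist

def closest_in_other_locus(network, gene, other_locus):
    """Return (distance, gene) for the nearest reachable gene in another locus."""
    dist = shortest_path_lengths(network, gene)
    reachable = [(dist[g], g) for g in other_locus if g in dist]
    return min(reachable) if reachable else None

# Hypothetical undirected toy network stored as adjacency sets.
network = {
    "P": {"Q", "A"},
    "Q": {"P", "R"},
    "R": {"Q"},
    "A": {"P"},
    "B": set(),  # isolated gene, unreachable from all others
}
locus_1 = ["P", "A"]
locus_2 = ["R", "B"]

closest = {g: closest_in_other_locus(network, g, locus_2) for g in locus_1}
print(closest)
```

Candidates that sit close to genes in the other loci are the ones a network-based method would favour; isolated genes such as B receive no support.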
Overview | Basic principle of
the positional candidate gene prioritization method using gene networks.
Depicted in this figure are three different gene-gene interaction data
sources that are integrated in a Bayesian way. After integration of
the data sources the actual gene network is constructed. In this example,
all genes are assigned an initial score of 0, and three different susceptibility
loci, each containing a disease gene (P, Q or R) and two non-disease
genes, are analyzed. Per locus, the three positional candidate genes
increase the scores of genes that are functionally nearby within the gene network, using
a kernel function that models the relationship between gene-gene
distance and score effect. Once all loci have been processed, shuffling
the three susceptibility loci 10,000 times across the genome allows
an empirical p-value to be determined per gene, and the positional
candidate genes to be ranked per locus. Genes P, Q
and R should then end up as the top-ranked genes, as they have the
most significant p-values.
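The scoring and permutation steps in this caption can be sketched on a toy genome. Everything specific here is an assumption made for illustration: the Gaussian form of the kernel, the stand-in distance function, the 30-gene genome, independently placed random loci, the (hits + 1) / (permutations + 1) estimator, and the use of 1,000 rather than 10,000 shuffles to keep the example fast.

```python
import math
import random

GENOME = list(range(30))   # hypothetical genome: 30 genes in chromosomal order
DISEASE = {2, 12, 22}      # hypothetical, functionally related disease genes

def network_distance(a, b):
    """Toy stand-in for a network distance: disease genes are close, others far."""
    return 1 if (a in DISEASE and b in DISEASE) else 4

def kernel(distance, width=1.0):
    """Gaussian kernel (assumed form): nearby genes contribute more score."""
    return math.exp(-(distance / width) ** 2)

def score(gene, other_loci):
    """Kernel-weighted proximity of `gene` to all genes in the other loci."""
    return math.fsum(kernel(network_distance(gene, g))
                     for locus in other_loci for g in locus if g != gene)

def random_locus(size, rng):
    """Place a locus of `size` consecutive genes at a random genome position."""
    start = rng.randrange(len(GENOME) - size + 1)
    return GENOME[start:start + size]

def empirical_p(gene, other_loci, n_perm, rng):
    """Fraction of shuffles whose score is at least as high as the observed one."""
    observed = score(gene, other_loci)
    hits = 0
    for _ in range(n_perm):
        shuffled = [random_locus(len(locus), rng) for locus in other_loci]
        if score(gene, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

loci = [[1, 2, 3], [11, 12, 13], [21, 22, 23]]
rng = random.Random(0)     # fixed seed so the run is repeatable
p_values = {}
for i, locus in enumerate(loci):
    others = loci[:i] + loci[i + 1:]
    for gene in locus:
        p_values[gene] = empirical_p(gene, others, 1000, rng)

# Rank the candidates within each locus by ascending empirical p-value.
ranking = [sorted(locus, key=p_values.get) for locus in loci]
print(ranking)
```

In this toy setting the three related genes come out on top of their loci, because random placements of the other loci rarely land on their network neighbours.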

Screenshot | Prioritizer showing the results of the analysis of Turcot syndrome, a disorder characterized by malignant tumors of the central nervous system together with colorectal polyposis, in which three disease genes (PMS2, APC and MLH1) have been implicated.
Discussion
For 43% of the disease genes, the analysis of the loci using
the gene network performed well: the true disease gene was ranked within
the top 10 of its artificial linkage region when each region
contained 100 genes. Compared to a previous method (Turner
et al., 2003), which correctly returned only 12% of the disease genes,
our method is somewhat less specific but performs much better at identifying
the correct disease genes.
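The top-10 criterion used above can be written as a small evaluation function. The sketch below is illustrative only; the function name and the toy rankings are hypothetical. As a point of reference, a purely random ranking would be expected to place the true gene in the top 10 of a 100-gene locus in 10/100 = 10% of cases.

```python
def fraction_in_top_k(ranked_loci, disease_genes, k=10):
    """Share of loci whose true disease gene is among the first k ranked candidates.

    ranked_loci[i] is the ranked candidate list for locus i and
    disease_genes[i] is the true disease gene of that locus.
    """
    hits = sum(1 for ranked, gene in zip(ranked_loci, disease_genes)
               if gene in ranked[:k])
    return hits / len(disease_genes)

# Hypothetical rankings for four small loci (best candidate first).
ranked_loci = [
    ["g7", "g1", "g3"],
    ["g4", "g9", "g2"],
    ["g5", "g6", "g8"],
    ["g0", "g11", "g12"],
]
disease_genes = ["g1", "g2", "g5", "g12"]

top2 = fraction_in_top_k(ranked_loci, disease_genes, k=2)
print(top2)
```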
We have shown that by assuming that disease genes in a specific disorder
are usually functionally related, we are capable of substantially enriching
for true disease genes when analyzing susceptibility loci. This method
therefore could be valuable for analyzing common disease loci in which
the contributing disease genes have not yet been identified.
The resulting program (Prioritizer) allows for the analysis and visualization
of user-defined susceptibility loci, and will be available soon on this
website.
References
Franke, L., H. van Bakel, L. Fokkens, E.D. de Jong, M. Egmont-Petersen, and C. Wijmenga. 2006. Reconstruction of a functional human gene network, with an application for prioritizing positional candidate genes. Am J Hum Genet 78: 1011-1025.
Lehner, B. and A.G. Fraser. 2004. A first-draft human protein-interaction map. Genome Biol 5: R63.
Turner, F.S., D.R. Clutterbuck, and C.A. Semple. 2003. POCUS: mining genomic sequence annotation to predict disease genes. Genome Biol 4:R75.
Dudbridge, F. and B.P. Koeleman. 2004. Efficient computation of significance levels for multiple associations in large studies of correlated data, including genomewide association studies. Am J Hum Genet 75: 424-435.
Stelzl, U., U. Worm, M. Lalowski, C. Haenig, F.H. Brembeck, H. Goehler, M. Stroedicke, M. Zenkner, A. Schoenherr, S. Koeppen, J. Timm, S. Mintzlaff, C. Abraham, N. Bock, S. Kietzmann, A. Goedde, E. Toksoz, A. Droege, S. Krobitsch, B. Korn, W. Birchmeier, H. Lehrach, and E.E. Wanker. 2005. A human protein-protein interaction network: a resource for annotating the proteome. Cell 122: 957-968.