What QGene's permutation analysis does

The problem of setting significance thresholds for multiple QTL analyses involving linked, nonindependent markers can be addressed by permutation analysis, a resampling procedure whose application to this problem was suggested by Churchill and Doerge, Genetics 138: 963-971 (1994). The procedure shows the range of test statistics expected under the null hypothesis -- that there is no association between marker genotype and phenotype -- by repeatedly shuffling one of the two, which destroys all but chance associations between them. The statistic computed from the unshuffled data is then compared against this range.

In the figure, the white ticks along the base of the plot indicate the positions of the markers on chromosome 5A. The white contour is the scan for the unshuffled data, while the red contours are those produced at each shuffling iteration. The horizontal threshold lines are drawn after the shuffles are complete. Note that the "Max" line grazes the highest peak in any of the contours produced by shuffling. The green "99%ile" line lies higher than 99% of the maxima (leaving only 10 of 1000 maxima above it).

For analysis of a single chromosome such as this, QGene 3.0 will compute the interval map up to 10,000 times, storing at each iteration the maximum LOD value along the chromosome, and finally extract the percentiles from the array of maxima. In this case genome-wise thresholds are computed by applying a Bonferroni correction for the number of chromosomes in the genome. For analysis of all chromosomes in the map, the maximum LOD across the whole genome is stored at each shuffle, and the genome-wise percentiles are extracted directly from this array of maxima. Data completion is always used for interval permutations.
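The shuffle, scan, store-the-maximum loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not QGene's code: the names `marker_lods` and `permutation_thresholds` are hypothetical, and a single-marker regression LOD, computed as -(n/2) log10(1 - r^2), stands in for the interval-mapping LOD, which is more involved.

```python
import math
import random


def marker_lods(genotypes, phenotypes):
    """Stand-in scan (an assumption, not QGene's interval map):
    single-marker regression LOD = -(n/2) * log10(1 - r^2) per marker."""
    n = len(phenotypes)
    mean_y = sum(phenotypes) / n
    syy = sum((y - mean_y) ** 2 for y in phenotypes)
    lods = []
    for marker in genotypes:
        mean_x = sum(marker) / n
        sxx = sum((x - mean_x) ** 2 for x in marker)
        sxy = sum((x - mean_x) * (y - mean_y)
                  for x, y in zip(marker, phenotypes))
        r2 = 0.0 if sxx == 0 or syy == 0 else sxy * sxy / (sxx * syy)
        r2 = min(r2, 1.0 - 1e-12)  # avoid log10(0) on a perfect fit
        lods.append(-(n / 2) * math.log10(1.0 - r2))
    return lods


def permutation_thresholds(scan, genotypes, phenotypes,
                           n_shuffles=1000, percentiles=(95, 99), seed=0):
    """Shuffle phenotypes, rescan, keep the maximum of each scan, then
    read integer percentiles off the sorted array of maxima."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    maxima = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # destroy all but chance associations
        maxima.append(max(scan(genotypes, shuffled)))
    maxima.sort()
    thresholds = {}
    for p in percentiles:
        # index of the smallest maximum that is >= p% of all maxima
        k = (p * n_shuffles + 99) // 100 - 1
        thresholds[f"{p}%ile"] = maxima[k]
    thresholds["Max"] = maxima[-1]
    return thresholds
```

With 1000 shuffles, the 99%ile value picked this way leaves exactly 10 maxima above it. Passing the scan in as a function keeps the resampling loop independent of the statistic being tested.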

Sample output from interval permutation run

Permutation analysis for aSSRs.data
Trait KBY95_2, chromosome 5A

Shuffles   1000
N          94

%iles of the max LOD tests, one for each iteration:

           95%ile   99%ile   Max
           2.09     2.91     4.51
Expwise    3.70     4.21     4.51

This table tells us that, for the analysis of chromosome 5A influences on trait KBY95_2, a LOD score of 2.09 was greater than 95% of the maximum LODs obtained over all iterations; 2.91 was greater than 99% of the maximum LODs; and 4.51 was the highest LOD obtained in any iteration. The Expwise or experimentwise thresholds were obtained on the basis of a 21-chromosome map; for example, the alpha = 0.05 corresponding to the 95%ile became alpha = 0.05/21, or about 0.0024. Thus the tabular LOD of 3.70 will have been the third highest maximum LOD in the 1000 maxima (since about 0.0024 x 1000, or roughly 2, values were higher), and the 99%ile LOD of 4.21 the second highest.
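The Bonferroni arithmetic behind the Expwise row can be checked with a short Python sketch. The function name `bonferroni_counts` is hypothetical. How QGene turns a fractional count into a rank among the maxima is not stated here, so the sketch stops at the expected number of maxima above each threshold.

```python
def bonferroni_counts(alpha, n_chromosomes, n_shuffles):
    """Return the Bonferroni-adjusted alpha and the number of stored
    maxima expected to lie above the corresponding threshold."""
    adjusted_alpha = alpha / n_chromosomes
    expected_above = adjusted_alpha * n_shuffles
    return adjusted_alpha, expected_above


# Values from the sample run: 21-chromosome map, 1000 shuffles.
for alpha in (0.05, 0.01):
    adj, above = bonferroni_counts(alpha, n_chromosomes=21, n_shuffles=1000)
    print(f"alpha={alpha}: adjusted alpha={adj:.5f}, "
          f"maxima expected above={above:.2f}")
```

For alpha = 0.05 the adjusted alpha is about 0.0024, which corresponds to roughly 2.4 of the 1000 maxima. For alpha = 0.01 the expected count falls below one.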

With a population of 120 and a chromosome 120 cM long, permutations execute at around 80 (on an old 7100/66 machine) to around 700 (on a G3) chromosome maps per second.

Permutation analysis in simple regression

This analysis is somewhat outmoded by interval-based methods but illustrates the empirical test of a simple-regression statistic. For the current trait, QGene randomly shuffles the marker data 100 or more times as specified, redoing a simple-regression scan across all markers with every shuffle and recording the maximum value of the F statistic. Finally it determines the 95th, 99th, and 100th percentiles (%iles) of the maximum F over the full set of shuffles, shows them in the plot, and records them in a file. The method is closely analogous to that used for interval permutation tests.
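A minimal Python sketch of the simple-regression version follows. The names `f_statistic` and `max_f_under_shuffles` are hypothetical. The sketch assumes that shuffling the marker data means reassigning whole individuals, so that every marker is permuted in the same order and linkage between markers is preserved; the text above does not say how QGene does this.

```python
import random


def f_statistic(marker, trait):
    """F for the regression of trait on one marker (1 and n-2 df)."""
    n = len(trait)
    mean_x = sum(marker) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in marker)
    syy = sum((y - mean_y) ** 2 for y in trait)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(marker, trait))
    if sxx == 0 or syy == 0:
        return 0.0  # no variation, so nothing to test
    ss_regression = sxy * sxy / sxx
    ss_residual = syy - ss_regression
    if ss_residual <= 0:
        return float("inf")  # perfect fit
    return ss_regression / (ss_residual / (n - 2))


def max_f_under_shuffles(markers, trait, n_shuffles=100, seed=0):
    """Return the sorted maximum F from each shuffled scan."""
    rng = random.Random(seed)
    order = list(range(len(trait)))
    maxima = []
    for _ in range(n_shuffles):
        rng.shuffle(order)  # assumed: one order applied to every marker
        maxima.append(max(f_statistic([m[i] for i in order], trait)
                          for m in markers))
    return sorted(maxima)
```

The 95th, 99th, and 100th percentiles can then be read from the sorted list in the same way as for the interval permutation test.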


© 2000 J. C. Nelson. All rights reserved.