Genome-wide prediction of traits with different genetic architecture through efficient variable selection

Valentin Wimmer, Christina Lehermeier, Theresa Albrecht, Hans-Jürgen Auinger, Yu Wang, Chris-Carolin Schön
Genetics 2013, 195 (2): 573-87
In genome-based prediction there is considerable uncertainty about the statistical model and method required to maximize prediction accuracy. For traits influenced by a small number of quantitative trait loci (QTL), predictions are expected to benefit from methods performing variable selection [e.g., BayesB or the least absolute shrinkage and selection operator (LASSO)] compared to methods distributing effects across the genome [ridge regression best linear unbiased prediction (RR-BLUP)]. We investigate the assumptions underlying successful variable selection by combining computer simulations with large-scale experimental data sets from rice (Oryza sativa L.), wheat (Triticum aestivum L.), and Arabidopsis thaliana (L.). We demonstrate that variable selection can be successful when the number of phenotyped individuals is much larger than the number of causal mutations contributing to the trait. We show that the sample size required for efficient variable selection increases dramatically with decreasing trait heritabilities and increasing extent of linkage disequilibrium (LD). We contrast and discuss contradictory results from simulation and experimental studies with respect to superiority of variable selection methods over RR-BLUP. Our results demonstrate that due to long-range LD, medium heritabilities, and small sample sizes, superiority of variable selection methods cannot be expected in plant breeding populations even for traits like FRIGIDA gene expression in Arabidopsis and flowering time in rice, assumed to be influenced by a few major QTL. We extend our conclusions to the analysis of whole-genome sequence data and infer upper bounds for the number of causal mutations which can be identified by LASSO. Our results have major impact on the choice of statistical method needed to make credible inferences about genetic architecture and prediction accuracy of complex traits.

Full Text Links

Find Full Text Links for this Article


You are not logged in. Sign Up or Log In to join the discussion.

Related Papers

Remove bar
Read by QxMD icon Read

Search Tips

Use Boolean operators: AND/OR

diabetic AND foot
diabetes OR diabetic

Exclude a word using the 'minus' sign

Virchow -triad

Use Parentheses

water AND (cup OR glass)

Add an asterisk (*) at end of a word to include word stems

Neuro* will search for Neurology, Neuroscientist, Neurological, and so on

Use quotes to search for an exact phrase

"primary prevention of cancer"
(heart or cardiac or cardio*) AND arrest -"American Heart Association"