Studying the Performance of the Intelligent Singular Value Decomposition (SVD) Algorithm in Imputation of Missing Genotypes

Document Type : Genetics & breeding

Authors

Bu-Ali Sina

Abstract

Introduction By implementing genomic selection, highly accurate estimates of breeding values can be obtained for newborn individuals in the absence of phenotypic records. In genomic selection, selection decisions are based on genomic breeding values predicted from high-density SNP panels. Dramatic advances in sequencing technologies are providing high-dimensional molecular marker information at low cost. Next-generation sequencing protocols such as genotyping-by-sequencing (GBS) have been suggested as an efficient and cost-effective genotyping method for genomic selection in cattle. GBS is capable of providing acceptable marker density for genomic selection or genome-wide association studies at roughly one third of the cost of currently available genotyping technologies. However, polymorphic loci scored by GBS can contain a large proportion of missing data across samples because random fragments of the genome are sequenced at low depth, leaving some loci with zero coverage in some individuals. Most analyses require a complete dataset; therefore, marker imputation is a necessary step before GBS data can be used for most purposes, such as genomic selection. Because the order of markers is unknown in GBS data, an imputation method that does not require prior information about marker order is needed. Nonparametric models from machine learning have been proposed as an alternative for such situations; these models do not follow a particular parametric design. Several different machine-learning approaches are currently used for genotype imputation, and it is important to assess the performance of diverse methodologies and to identify the methods that provide the greatest predictive accuracy in a given population. Singular value decomposition (SVD) imputation is capable of imputing missing markers in GBS data.
The aim of this study was to assess the performance of the intelligent SVD algorithm for imputation of missing genotypes.
Materials and Methods A genome consisting of one chromosome of 1 Morgan in length was simulated using the hypred package. In different scenarios, 500, 1000, 1500, 2000, 2500 and 3000 SNPs with an equal initial allele frequency of 0.5 were placed on the chromosome for 1000 individuals. Genotypes with alleles A1 and A2 were coded as 2 for A1A1, 1 for A1A2 or A2A1, and 0 for A2A2. Then, to mimic genotyping-by-sequencing (GBS) data, the genotype information of 5%, 10%, 25%, 50%, 75% and 90% of SNPs was masked and then imputed with the SVD algorithm. Imputation accuracy (r) was assessed as the proportion of genotypes imputed correctly (number of correctly imputed genotypes / total number of masked genotypes). The effects of the number of genotyped individuals (1000 and 2000), the number of genotyped SNPs (500, 1000, 1500, 2000, 2500 and 3000) and the level of minor allele frequency (MAF) (0.01, 0.05, 0.1, 0.2, 0.3 and 0.4) on imputation accuracy were also studied.
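The masking, imputation and scoring steps described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the study simulated genomes with the R package hypred, whereas the hypothetical `simulate_toy_genotypes` below simply pairs a few founder haplotypes without recombination so that the genotype matrix is low rank. The iterative rank-truncated SVD in `svd_impute`, the rank of 4, the toy matrix size and the function names are likewise assumptions chosen for illustration.

```python
import numpy as np


def simulate_toy_genotypes(n_individuals, n_snps, n_founders, rng):
    """Toy stand-in for a simulated population (assumption, not hypred)."""
    # Founder haplotypes with allele frequency of about 0.5 (1 = A1, 0 = A2).
    founders = rng.integers(0, 2, size=(n_founders, n_snps))
    # Each individual carries two founder haplotypes, with no recombination.
    hap1 = founders[rng.integers(0, n_founders, n_individuals)]
    hap2 = founders[rng.integers(0, n_founders, n_individuals)]
    return hap1 + hap2  # 2 = A1A1, 1 = heterozygote, 0 = A2A2


def mask_genotypes(genotypes, missing_rate, rng):
    """Set a random fraction of genotypes to NaN; return the copy and the mask."""
    mask = rng.random(genotypes.shape) < missing_rate
    masked = genotypes.astype(float)
    masked[mask] = np.nan
    return masked, mask


def svd_impute(masked, rank=4, n_iter=50, tol=1e-4):
    """Iterative rank-truncated SVD imputation (illustrative sketch)."""
    missing = np.isnan(masked)
    # Initialise missing entries with the per-SNP mean of observed genotypes.
    observed_count = (~missing).sum(axis=0)
    observed_sum = np.where(missing, 0.0, masked).sum(axis=0)
    col_means = np.where(observed_count > 0,
                         observed_sum / np.maximum(observed_count, 1), 1.0)
    filled = np.where(missing, col_means, masked)
    for _ in range(n_iter):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
        change = np.abs(low_rank[missing] - filled[missing]).mean() if missing.any() else 0.0
        # Only the missing entries are updated; observed genotypes stay fixed.
        filled[missing] = low_rank[missing]
        if change < tol:
            break
    # Round to the nearest valid genotype code (0, 1 or 2).
    return np.clip(np.rint(filled), 0, 2).astype(int)


def imputation_accuracy(true_genotypes, imputed, mask):
    """Correctly imputed genotypes divided by total masked genotypes."""
    return float((true_genotypes[mask] == imputed[mask]).mean())


rng = np.random.default_rng(0)
genotypes = simulate_toy_genotypes(200, 300, n_founders=4, rng=rng)
for rate in (0.05, 0.10, 0.25, 0.50, 0.75, 0.90):
    masked, mask = mask_genotypes(genotypes, rate, rng)
    imputed = svd_impute(masked, rank=4)
    print(f"missing={rate:.2f}  accuracy={imputation_accuracy(genotypes, imputed, mask):.3f}")
```

Because the toy population has far simpler structure than a simulated genome with recombination, the accuracies printed here are not comparable with the values reported in the study; the sketch only shows how masking and the accuracy ratio fit together.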
Results and discussion The imputation accuracy of SVD was considerable. With up to 50% of markers masked, SVD imputed missing genotypes with an accuracy of 80%. In the scenarios with 75% and 90% missing genotypes, imputation accuracy decreased to 70% and 48%, respectively. As the population size increased from 1000 to 2000 individuals, the imputation performance of SVD improved, especially in the scenarios with 75% and 90% masked genotypes. Imputation accuracy (r) also increased with the number of markers: increasing the number of markers from 500 to 3000 SNPs raised imputation accuracy by almost 10%. An inverse relationship was observed between MAF and r: increasing MAF from 0.01 to 0.40 decreased imputation accuracy by 8%. In other words, markers with lower MAF were imputed with higher accuracy.
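One way to see why markers with lower MAF are easier to impute is to compute how often the single most common genotype occurs at a marker. The sketch below assumes Hardy-Weinberg proportions, which is an assumption made here for illustration and not part of the reported analysis; the function name is hypothetical. It does not reproduce the 8% difference reported above; it only shows that a naive guess of the most common genotype is right more often when MAF is low.

```python
def most_common_genotype_frequency(maf):
    """Frequency of the most common genotype under Hardy-Weinberg proportions."""
    p = 1.0 - maf  # major allele frequency
    # Genotype frequencies: major homozygote, heterozygote, minor homozygote.
    return max(p * p, 2.0 * p * maf, maf * maf)


for maf in (0.01, 0.05, 0.1, 0.2, 0.3, 0.4):
    baseline = most_common_genotype_frequency(maf)
    print(f"MAF={maf:.2f}  most common genotype frequency={baseline:.4f}")
```

At a MAF of 0.01 almost every individual is homozygous for the major allele, so little information is needed to impute correctly, whereas at a MAF of 0.4 the three genotypes are much more evenly represented.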
Conclusion SVD performed well in genotype imputation for GBS platforms: missing data can be imputed with reasonable accuracy even when the level of missing data is high (up to 50%), and even greater accuracies may result if the number of individuals in the population is high and the MAF of the genotyped SNPs is low. Therefore, SVD can be recommended for genotype imputation in genome-assisted evaluation.
