Performance of the intelligent singular value decomposition (SVD) algorithm in imputing missing genotypes under different scenarios of marker number, population size, and minor allele frequency

Article type: Research article - Animal and poultry genetics and breeding

Authors

Bu-Ali Sina University, Hamedan

Abstract

The aim of this study was to evaluate the performance of the intelligent singular value decomposition (SVD) algorithm in imputing missing genotypes. To this end, a genome consisting of one chromosome one Morgan long was simulated for 1000 individuals; across scenarios, 500, 1000, 1500, 2000, 2500, and 3000 biallelic single nucleotide polymorphism (SNP) markers with an equal initial allele frequency of 0.5 were distributed along it. Then, to create a data file in the format of genotyping-by-sequencing (GBS) data, the genotypes of 5%, 10%, 25%, 50%, 75%, and 90% of the individuals' SNPs were removed from the genotype matrix and imputed again with SVD. The percentage of correctly imputed genotypes (the number of correctly imputed genotypes divided by the total number of missing genotypes) was used as the measure of imputation accuracy (r) in each scenario. The accuracy of SVD imputation was considerable: with up to 50% missing genotypes, SVD imputed the missing genotypes with an accuracy of about 80%. In the scenarios with 75% and 90% missing genotypes, imputation accuracy fell to 70% and 48%, respectively. For the same number of markers and the same percentage of missing genotypes, increasing the population size from 1000 to 2000 individuals improved the ability of SVD to impute genotypes. At a fixed percentage of missing genotypes, imputation accuracy increased with the number of markers, such that going from 500 to 3000 markers added about 10% to imputation accuracy. An inverse relationship was observed between minor allele frequency (MAF) and r: as MAF increased from 0.01 to 0.40, imputation accuracy decreased by 8 percent. Overall, the results showed that the SVD algorithm can impute missing genotypes with high accuracy, especially when the percentage of missing genotypes is low, the population is large, and the minor allele frequency is low.

Keywords


Title [English]

Studying the Performance of the Intelligent Singular Value Decomposition (SVD) Algorithm in Imputation of Missing Genotypes

Authors [English]

  • Farhad Ghafouri-Kesbi
  • Ali Goudarz Talleh Jerdi
Bu-Ali Sina
Abstract [English]

Introduction Genomic selection makes it possible to obtain highly accurate estimates of breeding values for newborn individuals in the absence of phenotypic records. In genomic selection, selection decisions are based on genomic breeding values predicted from high-density SNP panels. Dramatic advances in sequencing technologies are providing high-dimensional molecular marker information at low cost. Next-generation sequencing protocols such as genotyping-by-sequencing (GBS) have been suggested as an efficient and cost-effective genotyping method for genomic selection in cattle. GBS is capable of providing acceptable marker density for genomic selection or genome-wide association studies at roughly one third of the cost of currently available genotyping technologies. However, polymorphic loci scored by GBS can contain a large proportion of missing data across samples because random fragments of the genome are sequenced at low depth, leaving some loci with zero coverage in some individuals. Most analyses require a complete dataset; therefore, marker imputation is a necessary step before GBS data can be used for most purposes, such as genomic selection. Because the order of markers is unknown in GBS data, imputing them requires a method that does not need prior information about marker order. Nonparametric models from the machine-learning repertoire have been proposed as an alternative for such situations. These models do not follow a particular parametric design. Several machine-learning approaches are currently used for genotype imputation, and it is important to assess the performance of diverse methodologies and identify the methods that provide the greatest predictive accuracy in a given population. Singular value decomposition (SVD) imputation is capable of imputing missing markers in GBS data.
The aim of this study was to assess the performance of the intelligent SVD algorithm for imputation of missing genotypes.
Materials and Methods A genome consisting of one chromosome one Morgan long was simulated with the hypred package for 1000 individuals; in the different scenarios, 500, 1000, 1500, 2000, 2500, and 3000 SNPs with an equal initial allele frequency of 0.5 were placed on it. Genotypes at each locus with alleles A1 and A2 were coded as 2 for A1A1, 1 for A1A2 or A2A1, and 0 for A2A2. Then, to mimic genotyping-by-sequencing (GBS) data, the genotypes of 5%, 10%, 25%, 50%, 75%, and 90% of SNPs were masked and then imputed with the SVD algorithm. Imputation accuracy (r) was measured as the percentage of genotypes imputed correctly (number of genotypes correctly imputed / total number of masked genotypes). The effects of the number of genotyped individuals (1000 and 2000), the number of genotyped SNPs (500, 1000, 1500, 2000, 2500, and 3000), and the level of minor allele frequency (MAF) (0.01, 0.05, 0.1, 0.2, 0.3, and 0.4) on imputation accuracy were also studied.
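The mask, impute, and score procedure described above can be sketched in Python with NumPy. This is only an illustration under stated assumptions. The abstract does not give the SVD rank, the iteration scheme, or the rounding rule, so the iterative truncated-SVD fill used here is one common variant, not necessarily the authors' implementation. The toy data generator (`toy_genotypes`, individuals built from a small pool of founder haplotypes) is a hypothetical stand-in for the hypred simulation, and all function names are invented for this example.

```python
import numpy as np

def toy_genotypes(n_ind, n_snp, n_founders, rng):
    """Toy 0/1/2 genotype matrix: each individual carries two founder haplotypes.
    Assumption: stands in for the hypred simulation, which is not reproduced here."""
    founders = rng.integers(0, 2, size=(n_founders, n_snp))  # 1 = allele A1
    pairs = rng.integers(0, n_founders, size=(n_ind, 2))     # two haplotypes each
    return (founders[pairs[:, 0]] + founders[pairs[:, 1]]).astype(float)

def mask_genotypes(geno, missing_rate, rng):
    """Return a copy with a random fraction of entries set to NaN, plus the mask."""
    mask = rng.random(geno.shape) < missing_rate
    masked = geno.copy()
    masked[mask] = np.nan
    return masked, mask

def svd_impute(masked, rank, n_iter=100, tol=1e-6):
    """Iteratively fill NaNs with a rank-`rank` truncated-SVD reconstruction.
    Assumption: rank, iteration count, and tolerance are illustrative choices."""
    missing = np.isnan(masked)
    if not missing.any():
        return masked.copy()
    col_means = np.nanmean(masked, axis=0)
    filled = np.where(missing, col_means, masked)   # start from per-SNP means
    for _ in range(n_iter):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
        change = np.max(np.abs(low_rank[missing] - filled[missing]))
        filled[missing] = low_rank[missing]         # observed entries stay fixed
        if change < tol:
            break
    # Map imputed values back to the valid genotype codes 0, 1, 2.
    filled[missing] = np.clip(np.rint(filled[missing]), 0, 2)
    return filled

def imputation_accuracy(true_geno, imputed, mask):
    """Share of masked genotypes imputed exactly right (r in the abstract)."""
    return float(np.mean(true_geno[mask] == imputed[mask]))

rng = np.random.default_rng(1)
geno = toy_genotypes(n_ind=200, n_snp=100, n_founders=5, rng=rng)
masked, mask = mask_genotypes(geno, missing_rate=0.25, rng=rng)
imputed = svd_impute(masked, rank=5)
accuracy = imputation_accuracy(geno, imputed, mask)
print(f"imputation accuracy: {accuracy:.2f}")
```

Because the toy matrix is exactly low rank, it is much easier to impute than data from a realistic simulation, so the accuracy printed here should not be compared with the values reported in this study.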
Results and discussion The accuracy of SVD imputation was considerable: with up to 50% of markers masked, SVD imputed the missing genotypes with an accuracy of about 80%. In the scenarios with 75% and 90% missing genotypes, imputation accuracy decreased to 70% and 48%, respectively. Increasing the population size from 1000 to 2000 individuals improved the imputation performance of SVD, especially in the scenarios with 75% and 90% masked genotypes. Imputation accuracy (r) also increased with the number of markers: going from 500 to 3000 SNPs raised accuracy by almost 10%. An inverse relationship was observed between MAF and r: as MAF increased from 0.01 to 0.40, imputation accuracy decreased by 8%. In other words, markers with lower MAF were imputed with higher accuracy.
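One way to build intuition for the inverse relationship between MAF and r is to look at a naive baseline. This is not an analysis from the study. The assumption is that genotypes follow Hardy-Weinberg proportions and that every missing genotype is filled with the most common genotype class at that SNP. The function names below are hypothetical.

```python
def genotype_frequencies(maf):
    """Hardy-Weinberg frequencies: (major homozygote, heterozygote, minor homozygote)."""
    p = 1.0 - maf
    return (p * p, 2.0 * p * maf, maf * maf)

def majority_guess_accuracy(maf):
    """Expected share of correct calls when always guessing the most common genotype."""
    return max(genotype_frequencies(maf))

for maf in (0.01, 0.05, 0.1, 0.2, 0.3, 0.4):
    print(f"MAF={maf:.2f}  baseline accuracy={majority_guess_accuracy(maf):.3f}")
```

At low MAF almost every genotype is the major homozygote, so even this trivial rule is right most of the time. As MAF rises the three classes become more balanced and each call is harder. This shows only the direction of the effect under the stated assumption, not the size of the 8% decrease reported for SVD.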
Conclusion SVD performed well for genotype imputation on GBS platforms: missing data can be imputed with reasonable accuracy even when the level of missing data is high (up to 50%), and even greater accuracy may be obtained if the number of individuals in the population is large and the MAF of the genotyped SNPs is low. Therefore, SVD can be recommended for genotype imputation in genome-assisted evaluation.

Keywords [English]

  • Genotype imputation
  • SVDI algorithm
  • Chromosome
  • SNP