Genomic-Enabled Prediction Using Bayesian Artificial Neural Networks and Parametric Methods: A Comparative Study

Document Type : Genetics & breeding

Authors

1 Department of Animal Sciences, Faculty of Agriculture, Ferdowsi University of Mashhad, Mashhad, Iran

2 Department of Animal Sciences, Faculty of Agriculture, Ferdowsi University of Mashhad, Mashhad, Iran

Abstract

Introduction:
In genomic selection, the genetic values of individuals are predicted using genetic markers that are distributed across the whole genome and are in linkage disequilibrium with quantitative trait loci (QTL). Several methods have been introduced to predict genomic breeding values, and they rest on different assumptions. Non-parametric methods, including artificial neural networks, make fewer assumptions than parametric methods and can capture nonlinear relationships in genomic prediction. In theory, these approaches are therefore more robust to changes in genetic architecture and may provide better predictions.
Materials and Methods:
In the current study, the predictive ability of Bayesian neural networks with different architectures (1 to 5 neurons in the hidden layer) and of parametric methods (GBLUP, Bayes RR, Bayes A, Bayes B, Bayes C, and Bayes L) was compared in four simulated genetic architectures and for four real traits in mice (six-week weight, growth slope, body mass index, and body length). The comparison criteria were the correlation coefficient between predicted and expected values, the mean squared error of prediction, and computation time. All simulated genetic architectures were additive, and the gene effects followed a normal distribution. The number of QTL was 50 in the first and third genetic architectures and 500 in the second and fourth. The heritability was 0.3 in the first and second genetic architectures and 0.7 in the third and fourth. The real data consisted of 1,296 mice genotyped with 9,265 SNP markers.
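The simulation design and the two accuracy criteria described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' simulation pipeline: the function names, the allele frequency of 0.5, the choice of QTL among the markers, and the small population and marker counts are all hypothetical and chosen to keep the example fast.

```python
import random
import statistics


def simulate_additive_trait(n_animals, n_markers, n_qtl, h2, seed=1):
    """Simulate genotypes, true breeding values and phenotypes (additive trait)."""
    rng = random.Random(seed)
    # Genotypes coded 0/1/2; drawing from (0, 1, 1, 2) gives Hardy-Weinberg
    # proportions for an allele frequency of 0.5 (an assumption).
    genotypes = [[rng.choice((0, 1, 1, 2)) for _ in range(n_markers)]
                 for _ in range(n_animals)]
    # Simplification: QTL are sampled from the markers themselves,
    # and their effects are drawn from a normal distribution.
    qtl = rng.sample(range(n_markers), n_qtl)
    effects = {j: rng.gauss(0.0, 1.0) for j in qtl}
    tbv = [sum(effects[j] * row[j] for j in qtl) for row in genotypes]
    # Residual SD chosen so that var(g) / (var(g) + var(e)) is about h2.
    var_g = statistics.pvariance(tbv)
    sd_e = (var_g * (1.0 - h2) / h2) ** 0.5
    phenotypes = [g + rng.gauss(0.0, sd_e) for g in tbv]
    return genotypes, tbv, phenotypes


def pearson(x, y):
    """Correlation coefficient between two equally long sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mse(pred, obs):
    """Mean squared error of prediction."""
    return statistics.fmean([(p - o) ** 2 for p, o in zip(pred, obs)])


# The four architectures from the text: (number of QTL, heritability).
ARCHITECTURES = {1: (50, 0.3), 2: (500, 0.3), 3: (50, 0.7), 4: (500, 0.7)}

if __name__ == "__main__":
    for label, (n_qtl, h2) in ARCHITECTURES.items():
        _, tbv, y = simulate_additive_trait(200, 600, n_qtl, h2, seed=label)
        # The phenotype stands in for a "prediction" here, only to exercise
        # the two criteria; a real comparison would use model predictions.
        print(label, round(pearson(y, tbv), 3), round(mse(y, tbv), 3))
```

In a real comparison, the values passed to `pearson` and `mse` would be the predictions of each method on animals held out from training.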
Results and Discussion:
The highest prediction accuracies of the Bayesian neural networks were 0.640 (4 neurons in the hidden layer), 0.664 (4 neurons), 0.800 (1 neuron), and 0.810 (1 neuron) for simulated genetic architectures one to four, respectively. The highest prediction accuracies of the parametric methods for the same architectures were 0.711 (Bayes B), 0.685 (Bayes A), 0.903 (Bayes B), and 0.836 (Bayes B). These results show that parametric methods were superior to Bayesian neural networks in terms of prediction accuracy in genetic architectures with additive effects. In additive genetic architectures, the allelic effects of genetic variants are independent. Parametric models also assume that these effects are independent, so in additive genetic architectures parametric methods can be expected to provide better predictions than non-parametric methods.
The maximum predictive abilities of the Bayesian neural networks for six-week weight, growth slope, body mass index, and body length were 0.474 (1 neuron in the hidden layer), 0.349 (4 neurons), 0.154 (1 neuron), and 0.214 (4 neurons), respectively. The predictive abilities of the parametric methods for these traits were similar to one another and averaged 0.477, 0.336, 0.170, and 0.221, respectively. The predictive abilities of Bayesian neural networks and parametric methods were therefore similar on real data: the difference between the best predictive ability of the Bayesian neural networks and that of the parametric methods was less than 1% for six-week weight, growth slope, and body length. The difference was slightly larger for body mass index, at 1.8%.
In the simulated genetic architectures, the mean squared error of prediction of the Bayesian neural networks was slightly lower than that of the parametric methods. This indicates a slight superiority of Bayesian neural networks over parametric methods in terms of mean squared error of prediction as an indicator of overall fit. The mean squared error of prediction is an appropriate criterion for evaluating the prediction performance of different methods because it reflects both accuracy and bias. Considering Table 3 and Table 5, it can be concluded that the predictions of the Bayesian neural networks are less accurate but less biased than those of the parametric methods. This could be due to the stronger penalty applied in parametric methods compared with Bayesian neural networks, which can increase the mean squared error of prediction. In real data, the mean squared errors of prediction of the Bayesian neural networks and the parametric methods were similar.
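One common way to read the statement that the mean squared error of prediction reflects both accuracy and bias is the identity MSE = (mean error)^2 + variance of the errors. The sketch below checks this identity on made-up numbers. The function name and the toy data are hypothetical, and the authors may define bias differently (for example, through the regression slope), so this is an illustration rather than their calculation.

```python
import statistics


def mse_decomposition(pred, obs):
    """Return (MSE, squared mean error, error variance) for paired values."""
    errors = [p - o for p, o in zip(pred, obs)]
    bias = statistics.fmean(errors)                # mean error
    error_var = statistics.pvariance(errors)       # spread of errors around the bias
    mse = statistics.fmean([e * e for e in errors])
    return mse, bias ** 2, error_var


# Toy values, invented for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
shifted = [2.0, 3.0, 4.0, 5.0]   # perfectly correlated with observed, but biased
noisy = [1.5, 1.5, 3.5, 3.5]     # lower correlation, no mean bias

for name, pred in (("shifted", shifted), ("noisy", noisy)):
    mse, bias_sq, err_var = mse_decomposition(pred, observed)
    # The two components always add up to the MSE.
    assert abs(mse - (bias_sq + err_var)) < 1e-12
    print(name, mse, bias_sq, err_var)
```

The shifted predictor has the higher correlation with the observed values but the larger mean squared error, which is the kind of pattern the paragraph above describes: a method can rank lower on correlation and still have a smaller mean squared error when its predictions are less biased.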
The computation time of the Bayesian neural networks increased with the number of neurons in the hidden layer. The computation time of the parametric methods was the same, with the exception of GBLUP, which took longer. The neural networks with 1 or 2 neurons in the hidden layer took less computation time than GBLUP. Genomic prediction using Bayesian neural networks with a larger number of neurons is computationally challenging, and their computational cost must be reduced before they are applied in genomic selection.
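A rough way to see why cost grows with the number of hidden neurons is to count the weights that have to be estimated. The sketch below assumes a fully connected network with one hidden layer, one bias per neuron, a single output, and all 9,265 SNP markers entering directly as inputs. The abstract does not state the input layer used, so these are assumptions, and the function name is hypothetical.

```python
def n_parameters(n_inputs, n_hidden):
    """Number of weights and biases in a one-hidden-layer network with one output."""
    hidden_layer = n_hidden * (n_inputs + 1)   # input weights plus one bias per neuron
    output_layer = n_hidden + 1                # hidden-to-output weights plus output bias
    return hidden_layer + output_layer


N_MARKERS = 9265  # number of SNP markers in the mouse data

for neurons in range(1, 6):
    print(neurons, n_parameters(N_MARKERS, neurons))
```

Under these assumptions the parameter count grows linearly with the number of neurons, from 9,268 with one neuron to 46,336 with five, which is in line with the increase in computation time reported above.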
Conclusion:
Although parametric methods had better predictive accuracy and predictive ability because of the additive genetic architecture of the studied traits, it can be concluded that Bayesian neural networks are powerful tools in genomic-enabled prediction that can predict genomic breeding values with acceptable accuracy.
The genomic prediction ability of neural networks depends on the target trait, the animal species, and the network architecture. Before Bayesian neural networks are used in genomic prediction, their results should be compared with those of parametric methods. The computation time of Bayesian neural networks with a larger number of neurons in the hidden layer also needs to be reduced before they are used in practical genomic selection.

Keywords

