Investigating the effect of the multilayer perceptron on the accuracy of selecting silkworm (Bombyx mori) microRNA genes

Seyeddokht, Atefe; Rahmaninia, Javad

doi:10.22067/ijasr.2020.38276.0

Article type: Research article - Animal and Poultry Genetics and Breeding

Authors

¹ Animal Science Research Department, Khorasan Razavi Agricultural and Natural Resources Research and Education Center, Agricultural Research, Education and Extension Organization (AREEO), Mashhad, Iran.

² Animal Science Research Institute of Iran, Agricultural Research, Education and Extension Organization (AREEO), Karaj, Iran

Abstract

MicroRNAs are a large family of short non-protein-coding RNA (ncRNA) molecules with important roles in regulating developmental processes in plants and animals. Few studies have addressed the microRNAs of the silkworm, an economically very important species, and those have focused on identification, expression analysis, and function prediction. In general, microRNA sequences are highly conserved across species and are produced from a primary stem-loop structure in the nucleus, which is one of the most important features of microRNAs. MicroRNAs are among the most important regulatory factors acting at the post-transcriptional level of gene expression, and they take part in regulating a large number of physiological processes such as growth and development, metabolism, and disease occurrence. Although thousands of microRNAs have been identified in different species, a great many remain unknown. Discovering new microRNA genes is therefore an important step toward understanding the microRNAs that mediate post-transcriptional regulatory mechanisms. Biological methods for identifying microRNA genes may be limited in detecting rare microRNAs and are mostly restricted to the specific tissues and developmental stages of the organism under study. These limitations have led to the development of advanced computational methods for identifying potential new microRNAs. Using computational methods will increase the accuracy of identifying silkworm microRNAs. In this study, several computational models were used to identify microRNA sequences. The performance of these methods was evaluated using suitable data and the extraction of effective biological features. Compared with the other models used in this study, the multilayer perceptron model, with the highest values of accuracy, F-measure, and Matthews correlation coefficient, was introduced as a suitable method for predicting microRNA sequences in the silkworm.

Keywords

20.1001.1.20083106.1400.13.4.11.1

Article title [English]

A survey on effect of multilayer perceptron on the accuracy of selection of silkworm (Bombyx mori) microRNA genes

Authors [English]

Atefe Seyeddokht ¹
Javad Rahmaninia ²

¹ Animal Science Research Department, Khorasan Razavi Agricultural and Natural Resources Research and Education Center, AREEO, Mashhad, Iran

² Animal Science Research Institute of Iran, Agricultural Research, Education and Extension Organization (AREEO), Karaj, Iran

Abstract [English]

Introduction MicroRNAs (miRNAs) constitute a large family of small non-protein-coding RNA (ncRNA) molecules and play important roles in regulating developmental processes in both plants and animals. In general, miRNA sequences are highly conserved across animals and are produced from a primary stem-loop structure in the nucleus, which is an important feature of miRNAs. MiRNAs are among the most important regulatory factors acting at the post-transcriptional level of gene expression, and they contribute to the modulation of many physiological processes such as development, metabolism, and disease occurrence. To date, only a few studies of the miRNAs of the economically important silkworm, Bombyx mori, have been carried out, focusing on detection, expression analysis, and function prediction. Machine learning approaches are crucial for successful prediction because they can solve the underlying classification problem.
Materials and Methods Although hundreds of miRNAs have been detected in different animals, many remain unknown. Finding novel miRNA genes is therefore an essential step toward understanding miRNA-mediated post-transcriptional regulation. Biological methods for recognizing miRNA genes appear to be limited in their capacity to identify rare miRNAs and are further restricted to the tissues surveyed and the developmental stage of the animal under study. These restrictions have led to the development of new computational methods for detecting potential miRNAs. Experimentally verified miRNA sequences were extracted from miRBase release 22.0 for inclusion in the positive dataset. The secondary structures reported in miRBase were predicted by a collection of RNA folding software packages. For uniformity, all miRNA secondary structures in this study were therefore analyzed with the RNAfold package. A major step in machine learning approaches is the selection of a suitable negative dataset, which is important for obtaining a well-trained classifier. If the sequences are too artificial, e.g. completely random sequences, there is a risk that the classifiers will not be well trained to differentiate between different categories of real biological sequences. Conversely, if the negative dataset is too similar to the positive dataset, the classifiers will be unable to differentiate adequately between the two datasets. We investigated several different types of negative sequences and finally selected those that were best distinguished from the positive dataset. The positive training dataset for classifier development was composed of known silkworm pre-miRNAs, while the negative training dataset was composed of other ncRNA sequences.
Our feature set was composed of various features; selecting the most discriminative subset of features increases the performance, efficiency, and comprehensibility of a classifier by reducing its complexity.
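As an illustration of how structural features of this kind might be derived, the following minimal Python sketch counts base-pair types and unpaired bases from a sequence and its dot-bracket structure, the notation in which RNAfold reports predicted structures. It is not the authors' pipeline: the function names, the feature names, the example hairpin, and the example free-energy value are all hypothetical, and the minimum free energy is assumed to be supplied from the folding output rather than computed here.

```python
# Canonical pair types: Watson-Crick (A-U, G-C) and wobble (G-U).
PAIR_NAMES = {
    frozenset("AU"): "AU",
    frozenset("GC"): "GC",
    frozenset("GU"): "GU",
}


def paired_positions(structure):
    """Return (i, j) index pairs for matching brackets in dot-bracket notation."""
    stack, pairs = [], []
    for index, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(index)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.append((stack.pop(), index))
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return pairs


def structural_features(sequence, structure, mfe=None):
    """Count pair types and unpaired bases; pass the MFE through unchanged."""
    sequence = sequence.upper().replace("T", "U")
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    features = {
        "AU": 0, "GC": 0, "GU": 0, "other_pairs": 0,
        "unpaired_A": 0, "unpaired_C": 0, "unpaired_G": 0, "unpaired_U": 0,
    }
    for i, j in paired_positions(structure):
        key = frozenset((sequence[i], sequence[j]))
        features[PAIR_NAMES.get(key, "other_pairs")] += 1
    for base, symbol in zip(sequence, structure):
        if symbol == "." and base in "ACGU":
            features["unpaired_" + base] += 1
    # Assumption: the MFE (kcal/mol) comes from the folding program's output.
    features["mfe"] = mfe
    return features


if __name__ == "__main__":
    # Hypothetical toy hairpin, not a real silkworm pre-miRNA.
    print(structural_features("GCGUGAAAUGC", "((((...))))", mfe=-3.2))
```

Each candidate sequence would yield one such feature vector, and a feature-selection step could then keep only the most discriminative entries.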
Results and Discussion Secondary structural patterns of pre-miRNAs, such as intramolecular base pairing, are important and useful features for miRNA classification. The discriminative power of the secondary structural conformations (in dot-bracket notation) of the two classes was analyzed. Secondary structural features of miRNAs, such as minimum free energy, Watson-Crick base pairing (A-U, G-C), wobble base pairing (G-U), and unpaired bases (A, G, C, U), were analyzed by different algorithms. We successfully solved the classification problem by developing an effective classification system based on machine learning techniques. Our approach includes introducing more representative datasets, extracting new effective biological features, and comprehensively evaluating the classification performance of these methods via cross-validation. The performance of the different algorithms was measured by the total numbers of true negatives (TN), true positives (TP), false positives (FP), and false negatives (FN), and by accuracy (Q). To evaluate the efficiency of the methods developed in this study, parameters such as F-measure, Matthews correlation coefficient (MCC), accuracy (Q), and ROC area were calculated. The performance of the models was measured on data from miRBase release 22 using ten-fold cross-validation. The multilayer perceptron model could distinguish pre-miRNAs from other non-coding sequences, which can be important for detecting true pre-miRNAs in genomic sequences. Consequently, a new miRNA prediction model could help in understanding the characteristics of miRNAs associated with miRNA biogenesis.
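The evaluation measures named above follow directly from the four confusion-matrix counts. The sketch below shows the standard formulas for accuracy, F-measure, and MCC, together with one simple way to form ten class-balanced folds. It assumes a binary labeling (1 for pre-miRNA, 0 for other ncRNA); the function names, the counts, and the fold-assignment scheme are illustrative assumptions and are not taken from the study.

```python
import math
import random
from collections import defaultdict


def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, F-measure and MCC from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    # F-measure (F1): harmonic mean of precision and recall.
    f_denominator = 2 * tp + fp + fn
    f_measure = 2 * tp / f_denominator if f_denominator else 0.0
    # Matthews correlation coefficient; set to 0 when a margin is empty.
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0
    return {"accuracy": accuracy, "f_measure": f_measure, "mcc": mcc}


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so each fold gets a similar share.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


if __name__ == "__main__":
    # Hypothetical labels: 50 pre-miRNAs (1) and 50 other ncRNAs (0).
    labels = [1] * 50 + [0] * 50
    print([len(fold) for fold in stratified_folds(labels, k=10)])
    # Hypothetical counts, not results from the study.
    print(classification_metrics(tp=90, tn=85, fp=15, fn=10))
```

In a ten-fold run, each fold serves once as the test set while the other nine are used for training, and the counts are accumulated over all folds before the measures are computed. MCC is often preferred when the two classes differ in size, because it uses all four counts.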
Conclusion Research on miRNAs represents important progress in the study of ncRNAs and may provide further insight into RNA regulatory networks. Experimental research on silkworm microRNAs has shown that microRNAs can have significant effects on the mechanisms underlying silkworm growth processes. The research done so far also provides a basis for improving our understanding of RNA regulatory networks and of the molecular mechanisms involved in gene expression patterns during the different stages of silkworm life. Because computational research on silkworm microRNAs is still insufficient, further research on the microRNAs of this species represents an important advance in the study of non-coding RNAs and can provide further information on their activity. Machine learning algorithms will help researchers uncover miRNAs that have so far gone undetected.

Keywords [English]

Computational Methods
MicroRNA
Regulatory Factors
Silkworm
