Document Type : Genetics & breeding
Authors
1
Animal Science Research Department, Khorasan Razavi Agricultural and Natural Resources Research and Education Center, AREEO, Mashhad, Iran
2
Animal Science Research Institute of Iran, Agricultural Research, Education and Extension Organization (AREEO), Karaj, Iran
Abstract
Introduction MicroRNAs (miRNAs) constitute a large family of small non-protein-coding RNA (ncRNA) molecules and play important roles in regulating developmental processes in both plants and animals. miRNA sequences are generally highly conserved across animals and are processed from a primary stem-loop structure in the nucleus, which is an important feature of miRNAs. miRNAs are among the most important post-transcriptional regulators of gene expression and contribute to the modulation of many physiological processes, such as development, metabolism and disease occurrence. To date, only a few studies of miRNAs in the economically important silkworm, Bombyx mori, have been carried out, focusing on detection, expression analysis and prediction of function. Machine learning approaches are crucial for successful prediction because these methods can solve classification problems.
Materials and Methods Although hundreds of miRNAs have been detected in different animals, many remain unknown. Finding novel miRNA genes is therefore an essential step towards understanding miRNA-mediated post-transcriptional regulation. Experimental methods for recognizing miRNA genes may be inadequate for identifying rare miRNAs, and are further limited to the tissues surveyed and the developmental stage of the animal under study. These restrictions have led to the development of computational methods for detecting potential miRNAs. Experimentally verified miRNA sequences were extracted from miRBase release 22.0 for inclusion in the positive dataset. The secondary structures reported in miRBase were predicted with a collection of RNA folding software packages; for uniformity, all miRNA secondary structures in this study were therefore analyzed using the RNAfold package. A major step in machine learning approaches is the selection of a suitable negative dataset, which is important for obtaining a well-trained classifier. If the sequences are too artificial, e.g. completely random sequences, there is a risk that the classifiers will not be well trained to differentiate between different categories of real biological sequences. Conversely, if the negative dataset is too similar to the positive dataset, the classifiers will be unable to differentiate adequately between the two datasets. We investigated several types of negative sequences and selected those that were best distinguished from the positive dataset. The positive training dataset for classifier development was composed of known silkworm pre-miRNAs, while the negative training dataset was composed of other ncRNA sequences.
Our feature set comprised a variety of features. Selecting the most discriminative subset of features increases the performance, efficiency and comprehensibility of a classifier by reducing its complexity.
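As an illustration of how structural features of this kind can be derived, the following minimal sketch counts base-pair types and unpaired bases from a sequence and its dot-bracket structure. It is not the authors' implementation: the function name, the feature names and the toy hairpin are assumptions made for the example, and the minimum free energy (which would come from a folding program such as RNAfold) is not computed here.

```python
# Hypothetical helper, written for illustration only.
PAIR_NAMES = {
    frozenset("AU"): "pair_AU",  # Watson-Crick
    frozenset("GC"): "pair_GC",  # Watson-Crick
    frozenset("GU"): "pair_GU",  # wobble
}


def structure_features(seq: str, dot_bracket: str) -> dict:
    """Count base-pair types and unpaired bases for one folded sequence."""
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have the same length")
    seq = seq.upper().replace("T", "U")

    features = {name: 0 for name in PAIR_NAMES.values()}
    features["pair_other"] = 0
    features.update({f"unpaired_{base}": 0 for base in "ACGU"})

    open_positions = []  # indices of '(' still waiting for their ')'
    for i, symbol in enumerate(dot_bracket):
        if symbol == "(":
            open_positions.append(i)
        elif symbol == ")":
            if not open_positions:
                raise ValueError("unbalanced structure: unexpected ')'")
            j = open_positions.pop()
            key = PAIR_NAMES.get(frozenset((seq[j], seq[i])), "pair_other")
            features[key] += 1
        elif symbol == ".":
            if seq[i] in "ACGU":  # ambiguous bases such as N are ignored
                features[f"unpaired_{seq[i]}"] += 1
        else:
            raise ValueError(f"unexpected structure symbol: {symbol!r}")
    if open_positions:
        raise ValueError("unbalanced structure: unclosed '('")
    return features


# Toy hairpin (not a real silkworm pre-miRNA): a 3-bp stem with a 3-nt loop.
example = structure_features("GAGAAACUU", "(((...)))")
```

For the toy hairpin, the stem contains one G-U, one A-U and one G-C pair and the loop contains three unpaired A bases. A feature vector like this, one per sequence, is the kind of input a feature-selection step and a classifier would then operate on.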
Results and Discussion Secondary structural patterns of the pre-miRNAs used in this study, such as intramolecular base pairing, are useful features for miRNA classification. The discriminative power of two different classes of miRNA secondary structural conformation (dot-bracket notation) was analyzed. Secondary structural features of miRNAs, such as minimum free energy, Watson-Crick base pairing (A-U, G-C), wobble base pairing (G-U) and unpaired bases (A, G, C, U), were analyzed with different algorithms. We successfully addressed the classification problem by developing an effective classification system based on machine learning techniques. Our approach includes introducing more representative datasets, extracting new, effective biological features, and comprehensively evaluating classification performance via cross-validation. The performance of the different algorithms was measured by the total numbers of true negatives (TN), true positives (TP), false positives (FP) and false negatives (FN). To evaluate the efficiency of the methods developed in this study, parameters such as the F-measure, Matthews correlation coefficient (MCC), accuracy (Q) and ROC area were calculated. The models were evaluated on data from miRBase release 22 using ten-fold cross-validation. The Multilayer Perceptron model could distinguish pre-miRNAs from other non-coding sequences, which can be important for detecting true pre-miRNAs in genomic sequences. Consequently, a new miRNA prediction model could help in understanding the characteristics of miRNAs associated with miRNA biogenesis.
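The evaluation measures named above follow directly from the four confusion-matrix counts. The sketch below uses the standard definitions of accuracy, F-measure and MCC; the function name is hypothetical and the counts in the example are invented for illustration, not results from this study. ROC area is omitted because it requires ranked prediction scores rather than counts.

```python
import math


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy (Q), F-measure and MCC from confusion-matrix counts."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one prediction is required")

    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    # MCC is conventionally reported as 0 when any marginal total is zero.
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "mcc": mcc,
    }


# Invented counts, for illustration only.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

In a ten-fold cross-validation, the counts from the ten held-out folds would typically be summed before computing these measures, or the measures averaged across folds. MCC is often preferred alongside accuracy when the positive and negative datasets differ in size, since it accounts for all four counts.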
Conclusion Research on miRNAs represents important progress in the study of ncRNAs and may provide further insight into RNA regulatory networks. Experimental research on silkworm miRNAs has shown that they can have significant effects on the mechanisms underlying silkworm growth. Together with the research done so far, this study provides a basis for improving our understanding of RNA regulatory networks and the molecular mechanisms involved in gene expression patterns during the different stages of silkworm life. Because computational research on silkworm miRNAs is still insufficient, further work on the miRNAs of this species would be an important advance in the study of non-coding RNAs and can provide further information on their activity. Machine learning algorithms will help researchers discover miRNAs that have not yet been identified.
Keywords