Meta-analysis of RNA-Seq and microarray expression data to identify effective genes in sheep muscle growth and development

Document Type : Research Articles


1 Ferdowsi University of Mashhad

2 Department of Animal Science, Faculty of Agriculture, Ferdowsi University of Mashhad, Iran

3 No. 31, Saadi 19, Saadi Street, Valiasr Township, Ferdows, South Khorasan


Introduction Among the world's sheep breeds, the Texel is known as a meaty, heavily muscled breed. Skeletal muscle growth is a stepwise, exponential process that proceeds through differentiation, development and maturation. It is regulated by gene networks and cell signaling pathways, and several genes and factors are involved in muscle fiber formation, growth and hypertrophy (Badday Betti et al. 2022). Gene expression can be studied with several methods, and expression information is used in breeding programs as a tool to improve phenotypic selection. Public databases are a large source of expression data, and bioinformatics methods can integrate heterogeneous data from different studies and platforms. In this study, the microarray and RNA-Seq data available in public databases for muscle tissue of Texel sheep were integrated, and the transcriptomic profile of the muscle was compared between two ages, embryonic and adult.

Materials and Methods Microarray data for longissimus dorsi muscle tissue from day-70 (d-70) embryos, with three replicates, were obtained from the GEO database (accession number GSE23563). RNA-Seq data for muscle tissue of adult individuals (six samples, with two replicates) were selected from the ArrayExpress database. The limma, Biobase and GEOquery packages were used in the R environment to calculate expression values for the embryonic microarray data, and the Tuxedo suite, HTSeq and DESeq2 were used in the Linux and R environments to calculate expression values for the RNA-Seq data (Kamali et al. 2022; Sahraei et al. 2019). The two sets of expression values were then integrated, and batch effects were removed to eliminate non-biological variation. Next, differentially expressed genes were identified with the limma package. To identify relationships among the differentially expressed genes, a gene network was drawn with Cytoscape version 3.7.1 and the STRING 1.5.1 program. Because the network was large, it was then clustered with the MCODE 1.6.1 and CytoCluster 2.1.0 programs (ClusterONE algorithm), and significant clusters (P-value < 0.05) were identified (Saedi et al. 2022). To better understand the ontology and function of the differentially expressed genes, their Gene Ontology was investigated with Cytoscape version 3.7.1 and the ClueGO 2.5.9 and CluePedia 1.5.9 programs, and significant Gene Ontology terms (P-value < 0.05) related to functional groups were identified. Finally, the selected genes (Adj P-Value < 0.05) were identified and reported for the two age groups.
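The abstract does not state which adjustment produced the "Adj P-Value" threshold. As a minimal sketch, assuming the Benjamini-Hochberg procedure (the default adjustment in limma's topTable), the selection step could look like the following. The function names, gene labels and p-values are hypothetical and serve only as illustration; they are not results from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_significant(results, alpha=0.05):
    """Keep genes whose adjusted p-value is below alpha.

    `results` maps a gene label to its raw p-value (hypothetical input).
    """
    genes = list(results)
    adjusted = benjamini_hochberg([results[g] for g in genes])
    return {g: a for g, a in zip(genes, adjusted) if a < alpha}


# Hypothetical raw p-values, not taken from the study.
example = {
    "GENE_A": 0.001,
    "GENE_B": 0.008,
    "GENE_C": 0.039,
    "GENE_D": 0.041,
    "GENE_E": 0.60,
}

if __name__ == "__main__":
    for gene, adj_p in select_significant(example).items():
        print(gene, round(adj_p, 5))
```

In this toy input, GENE_C and GENE_D have raw p-values below 0.05 but fail the adjusted threshold, which is why the adjusted value rather than the raw one is used for the final gene selection.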

Results and Discussion After quality control, correction and normalization of the microarray data, the GPL10778 platform annotation file with 1042520 probe IDs was used to calculate expression values. After the relevant analyses, 9289 probe IDs related to the data of this study were identified, which finally corresponded to 7918 gene symbols. After quality control, trimming and normalization of the RNA-Seq data, read counts were calculated by HTSeq for a total of 27056 Ensembl genes. After removing IDs that had zero reads in all six samples, 10855 IDs remained. These 10855 Ensembl IDs were then merged with the annotation file to obtain gene symbols, and finally 9417 genes common to the six adult samples were identified. Differential expression analysis showed that the expression of 62 genes (37 increased and 25 decreased) differed significantly in muscle tissue between the adult and embryonic ages. By building a gene network among the differentially expressed genes, 15 selected genes were identified: MYH1, ACTN3, CASQ1, TMOD4, FBP2, SLC2A4, MX1, COX4I1, SOD2, MFN2, UQCRB, UCP3, PRKAB2, PHKG2 and PPP1R3C. These genes have established roles in cell proliferation, protein synthesis, myofibril formation and lipid metabolism. Enrichment analysis of the differentially expressed genes revealed terms such as vasculogenesis, positive regulation of ossification, positive regulation of muscle tissue development, regulation of muscle contraction, contractile fiber part, calcium signaling, calcineurin-NFAT signaling cascade and regulation of receptor signaling pathway via JAK-STAT, as well as the molecular function of regulating cation channel activity and the cellular component contractile fiber.
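The RNA-Seq filtering and annotation steps described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the identifiers, counts and function names are hypothetical, features without a gene symbol are dropped, and counts of features that share a symbol are summed, a choice the abstract does not specify.

```python
def drop_all_zero(counts):
    """Keep only features with at least one non-zero count across samples."""
    return {fid: row for fid, row in counts.items() if any(c > 0 for c in row)}


def map_to_symbols(counts, annotation):
    """Replace feature IDs with gene symbols.

    Features missing from the annotation are dropped. Counts of features
    that map to the same symbol are summed (an assumption made here).
    """
    by_symbol = {}
    for fid, row in counts.items():
        symbol = annotation.get(fid)
        if not symbol:
            continue
        if symbol in by_symbol:
            by_symbol[symbol] = [a + b for a, b in zip(by_symbol[symbol], row)]
        else:
            by_symbol[symbol] = list(row)
    return by_symbol


# Hypothetical count table: feature ID -> read counts in six samples.
raw_counts = {
    "ID_0001": [12, 0, 7, 3, 9, 4],
    "ID_0002": [0, 0, 0, 0, 0, 0],   # zero in all samples, removed
    "ID_0003": [5, 6, 0, 0, 1, 2],
    "ID_0004": [1, 1, 1, 1, 1, 1],   # shares a symbol with ID_0003
    "ID_0005": [8, 8, 8, 8, 8, 8],   # no symbol in the annotation
}

# Hypothetical annotation: feature ID -> gene symbol.
annotation = {
    "ID_0001": "SYMBOL_A",
    "ID_0002": "SYMBOL_B",
    "ID_0003": "SYMBOL_C",
    "ID_0004": "SYMBOL_C",
}

if __name__ == "__main__":
    expressed = drop_all_zero(raw_counts)
    for symbol, row in map_to_symbols(expressed, annotation).items():
        print(symbol, row)
```

Dropping unannotated features and collapsing duplicates is one reason the number of gene symbols can be smaller than the number of retained IDs, as with the reduction from 10855 IDs to 9417 genes reported above.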

Conclusion In addition to confirming the accuracy of the method used to integrate two heterogeneous data types, this study provides a general view of the transcriptomic differences in Texel sheep muscle tissue at two important ages, and can serve as a useful resource for biological investigations of genes related to muscle growth and development in sheep.




Articles in Press, Accepted Manuscript
Available Online from 05 February 2023
  • Receive Date: 04 December 2022
  • Revise Date: 29 January 2023
  • Accept Date: 05 February 2023
  • First Publish Date: 05 February 2023