Prediction of Novel Pseudogenes in Ovine Reference Genome

Document Type : Genetics & breeding


University of Tehran


Introduction Pseudogenes are copies of ancestral genes that arose through gene duplication or reverse transcription (retrotransposition) and have since accumulated disabling changes. They have been reported in all types of organisms, from bacteria to mammals. Pseudogenes can increase the genetic diversity of many genes through gene conversion and recombination. Three classes of pseudogenes are known: duplicated pseudogenes; processed (retrotransposed) pseudogenes; and unitary (disabled) pseudogenes. Pseudogenes were long considered nonfunctional genomic sequences. However, recent studies have reported that many of them may have some form of biological activity. It has been reported that pseudogenes represent a conspicuous part of the human transcriptome and proteome, as thousands of them are transcribed and hundreds are also translated. It has also been demonstrated that pseudogenes exert important coding-dependent and coding-independent functions within complex regulatory networks. The possibility that these sequences are functional has therefore increased interest in their accurate annotation. To the best of our knowledge, there is no published report on high-throughput pseudogene identification in sheep. Therefore, to improve the annotation of the sheep genome, we present the first genome-wide identification of pseudogenes derived from protein-coding genes, using a homology-based computational approach.
Materials and Methods The pseudogene content of the sheep genome was estimated using the PseudoPipe computational annotation pipeline. PseudoPipe predicts pseudogenes in a genome using a homology-based method (BLAST followed by a clustering algorithm). The repeat-masked sheep reference genome (Ovis_aries.Oar_v3.1), the genome annotation GTF file (version 77), and the sequences of all protein-coding genes were downloaded from the Ensembl database. To identify pseudogenes, the sheep genome was searched in a comprehensive and consistent manner. The key steps of the pipeline were using BLAST to rapidly compare potential “parent” proteins against the intergenic regions of the genome, and then processing the resulting “raw hits” by eliminating redundant hits, clustering neighboring hits, and associating and aligning each cluster with a unique parent. Pseudogenes were then classified based on a combination of criteria, including homology, intron/exon structure, and the presence of stop codons and frameshifts. Finally, the results were inspected manually and false positives were removed. In addition, the gene ontology (GO) of the parent genes from which the pseudogenes were derived was investigated using the DAVID software. Furthermore, the characteristics of the newly identified candidate pseudogenes were compared with those of known pseudogenes in human, mouse, and cattle.
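The hit-processing and classification steps described above can be illustrated with a minimal Python sketch. It is not PseudoPipe's code: the `Hit` record, the `max_gap` threshold of 50 bp, and the one-rule classifier are hypothetical simplifications chosen for illustration, and the real pipeline combines more criteria (homology, stop codons, frameshifts) before assigning a class.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    """One protein-to-genome match (hypothetical, simplified record)."""
    chrom: str   # chromosome name
    start: int   # genomic start of the match
    end: int     # genomic end of the match
    parent: str  # ID of the query ("parent") protein


def cluster_hits(hits: List[Hit], max_gap: int = 50) -> List[Hit]:
    """Merge overlapping or neighboring hits that share chromosome and parent.

    max_gap is an assumed illustrative threshold, not a PseudoPipe parameter.
    """
    clusters: List[Hit] = []
    for h in sorted(hits, key=lambda x: (x.chrom, x.parent, x.start)):
        last = clusters[-1] if clusters else None
        if (last is not None and last.chrom == h.chrom
                and last.parent == h.parent
                and h.start - last.end <= max_gap):
            # Overlapping or close enough: extend the current cluster.
            last.end = max(last.end, h.end)
        else:
            # Start a new cluster (copy so the input hits are not modified).
            clusters.append(Hit(h.chrom, h.start, h.end, h.parent))
    return clusters


def classify(retains_intron_structure: bool) -> str:
    """Toy rule: a copy that keeps the parent's intron/exon structure is
    treated as duplicated; an intronless copy is treated as processed."""
    return "duplicated" if retains_intron_structure else "processed"


if __name__ == "__main__":
    raw = [
        Hit("1", 100, 200, "P1"),
        Hit("1", 220, 300, "P1"),    # 20 bp from the previous hit: merged
        Hit("1", 1000, 1100, "P1"),  # far away: separate cluster
        Hit("2", 100, 200, "P1"),    # other chromosome: separate cluster
    ]
    for c in cluster_hits(raw):
        print(c.chrom, c.start, c.end, c.parent)
    print(classify(retains_intron_structure=False))
```

Sorting by chromosome, parent, and start position lets a single pass merge neighbors, which is why the sketch needs no nested comparison of all hit pairs.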
Results and Discussion Identifying pseudogenes is important for improving genome annotation and for understanding disease-related molecular mechanisms. Pseudogene identification is an ongoing effort pursued by several research groups. Its complexity can be addressed by in silico analysis using a homology-based, whole-genome identification approach. Here, using a computational method, we identified 4,098 high-confidence pseudogenes in the sheep genome, including 1,102 duplicated and 2,996 processed pseudogenes. The GO analysis showed that the parent genes of the identified pseudogenes are significantly enriched in various biological processes, such as mRNA splicing, ribosome structure, rRNA binding, mitochondrial electron transport, and translation. Interestingly, a growing body of evidence suggests that the parent genes of pseudogenes are associated with ribosomal, rRNA-related, and translational processes. Detailed comparison with other species showed that our results are consistent with previous studies. For example, the distribution of pseudogenes across the sheep chromosomes was consistent with that in the human and mouse genomes. Moreover, it has been reported that duplicated pseudogenes are commonly found on the same chromosome as their parent genes.
Our results showed that about 28% of the identified duplicated pseudogenes were on the same chromosome as their parent genes. The results of this study will help to improve the annotation of the sheep genome. The agreement of these results with previous studies supports the accuracy of the method used in this research.
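The same-chromosome proportion reported above for duplicated pseudogenes is a simple ratio, sketched below in Python. The function name and the toy input are hypothetical and are not the study's data; in practice each pair would come from the pipeline's pseudogene-to-parent assignments. For scale, about 28% of 1,102 duplicated pseudogenes corresponds to roughly 310 loci.

```python
from typing import Iterable, Tuple


def same_chrom_fraction(pairs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (pseudogene_chrom, parent_chrom) pairs on one chromosome.

    Returns 0.0 for an empty input to avoid dividing by zero.
    """
    pairs = list(pairs)
    if not pairs:
        return 0.0
    same = sum(1 for pg_chrom, parent_chrom in pairs if pg_chrom == parent_chrom)
    return same / len(pairs)


if __name__ == "__main__":
    # Toy example only: two of the four pairs share a chromosome.
    toy = [("1", "1"), ("1", "2"), ("3", "3"), ("X", "5")]
    print(f"{same_chrom_fraction(toy):.0%}")
```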
Conclusion This study has generated, for the first time, a catalog of 4,098 putative sheep pseudogenes. Our findings provide evidence of the pseudogene content of the sheep genome, which is a starting point for understanding the regulatory mechanisms of these pseudogenes. Identifying these novel pseudogenes improves the annotation of the sheep genome, and similar methods can be used to improve the genome annotation of other organisms.

