Introduction Single nucleotide polymorphisms (SNPs) are single-base variations at the same position among individual genomic DNA sequences, caused by transitions (C/T or G/A) or transversions (C/G, C/A, T/A, or T/G). SNPs have been applied as important molecular markers in genetics and breeding studies. About 40% of the SNPs located in genes cause an amino acid change. The rapid advance of next-generation sequencing provides a high-throughput means of SNP discovery. Transcriptome studies can fill the gap between genotype and phenotype and help explain the mechanisms that lead from sequence to function. RNA sequencing (RNA-Seq) is a next-generation sequencing technology for studying the whole transcriptome and gene expression. It enables simultaneous study of transcript sequences and highly accurate quantification of gene expression (digital expression). Hence, these data are well suited to high-throughput study of the expression levels of all transcribed genes and of their SNPs. Recently, RNA-Seq has also been used as an efficient and cost-effective method to systematically identify SNPs in transcribed regions in different species. A transcriptome-based sequencing approach offers a cheaper alternative for identifying large numbers of polymorphisms and possibly for discovering causative variants.
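The transition/transversion distinction above reduces to a simple rule: a transition exchanges a purine for a purine (A/G) or a pyrimidine for a pyrimidine (C/T), while a transversion exchanges one class for the other. The Python sketch below encodes that rule; the function name `classify_snp` is hypothetical and not part of any tool used in this study. Enumerating ordered reference>alternative pairs also shows why there are 12 directional SNP types, 4 transitions and 8 transversions, and counting SNPs by class in this way is what yields a Ts/Tv ratio.

```python
from itertools import permutations

# Purines (A, G) and pyrimidines (C, T); a transition keeps the base class.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snp(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


# Enumerate all 12 directional substitution types (ref>alt).
types = {f"{r}>{a}": classify_snp(r, a) for r, a in permutations("ACGT", 2)}
n_ts = sum(1 for v in types.values() if v == "transition")
n_tv = len(types) - n_ts
print(len(types), n_ts, n_tv)  # 12 4 8
```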
Materials and Methods In this study, RNA-Seq data were used for SNP discovery in American Holstein (Bos taurus) and Pakistani Cholistani (Bos indicus) cows. The RNA-Seq data, 21,078,477 and 20,940,063 paired-end reads of 75 bp, resulted from pooled whole-blood samples of 40 Holstein cows at the University of Wisconsin Dairy Cattle Center, USA, and 45 Cholistani cows at Gujait Peer Farm, Bahawalpur, Punjab, Pakistan, respectively. They were obtained from the NCBI SRA database for Holstein cows (http://www.ncbi.nlm.nih.gov/sra/SRX317197) and Cholistani cows (http://www.ncbi.nlm.nih.gov/sra/SRS454433). mRNA sequencing was run on an Illumina Genome Analyzer IIx (Illumina Inc., San Diego, CA). Data were converted from SRA format to FASTQ format with the fastq-dump command of the Ubuntu Linux version of SRA Toolkit 2.5.4-1. Read quality was checked with FastQC (v0.11.3), and adapter sequences and low-quality reads were trimmed with Trimmomatic 0.33. Adapters were specified according to the sequencing instrument using the default file (TruSeq2-PE.fa), and the minimum read length was set to 50 bp. Trimmed reads were aligned to the UMD3.1 reference genome (release 81), guided by its annotation data, with TopHat2, which uses Bowtie2 as the aligner. The transcriptome of each of the two cow populations was assembled by aligning and mapping the RNA-Seq reads to the bovine reference genome with TopHat2. SNPs were called with SAMtools.
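As an illustration of the minimum-length criterion only, the following Python sketch keeps a read pair when both mates are at least 50 bp long after trimming. It is a simplified stand-in, not Trimmomatic's implementation: adapter clipping with TruSeq2-PE.fa and quality trimming are not reproduced, and whereas Trimmomatic's paired-end mode writes a surviving mate whose partner was dropped to a separate unpaired file, this sketch simply discards such pairs. The helper names `read_fastq` and `filter_pairs` and the example reads are hypothetical.

```python
from typing import Iterator, List, Tuple

# One FASTQ record: (header, sequence, quality string).
Record = Tuple[str, str, str]


def read_fastq(lines: List[str]) -> Iterator[Record]:
    """Parse 4-line FASTQ records from an in-memory list of lines."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = (s.rstrip("\n") for s in lines[i:i + 4])
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch at line {i + 1}")
        yield header, seq, qual


def filter_pairs(
    r1: List[Record], r2: List[Record], min_len: int = 50
) -> List[Tuple[Record, Record]]:
    """Keep a pair only if both mates are at least min_len bases long."""
    return [
        (a, b)
        for a, b in zip(r1, r2)
        if len(a[1]) >= min_len and len(b[1]) >= min_len
    ]


# Hypothetical trimmed mates: pair p2 lost 35 bases on its reverse read.
fwd = ["@p1/1", "A" * 75, "+", "I" * 75, "@p2/1", "C" * 75, "+", "I" * 75]
rev = ["@p1/2", "G" * 75, "+", "I" * 75, "@p2/2", "T" * 40, "+", "I" * 40]
kept = filter_pairs(list(read_fastq(fwd)), list(read_fastq(rev)))
print(len(kept))  # 1
```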
Results and Discussion After quality trimming, the numbers of removed and low-quality reads were almost equal in the two breeds and relatively low. The total length of the assembled transcriptome, for example 52,798,651 bases in Holstein, indicates that around 2% of the whole genome (around 2.6 Gbp) is expressed as mRNA. In Cholistani cows, the read mapping rates for forward and reverse reads were 81.3% and 79.9%, respectively, and the multiple-alignment rate was about 9.4%. Overall read mapping was 80.6% and concordant pair alignment was 70.1%. In Holstein cows, the read mapping rates for forward and reverse reads were 66.3% and 55.4%, respectively, and the multiple-alignment rate was about 7.2%. Overall read mapping was 60.8% and concordant pair alignment was 51.3%. In total, 50,183 and 137,954 SNPs were discovered on the assembled transcriptomes of the Holstein and Cholistani samples, respectively, and 15,308 SNPs were common to both breeds. No direct relationship was found between the number of discovered SNPs and chromosome length. Twelve SNP types were identified, comprising 4 transitions and 8 transversions. Transitions were the most common SNP class, accounting for 70.6% of SNPs in Cholistani and 69.6% in Holstein cows. The transition-to-transversion ratio (Ts/Tv) was 2.4 in Cholistani and 2.3 in Holstein cows. The number of SNPs discovered in Cholistani cows was about 2.7 times that in Holstein cows. This difference is attributed to the use of the same reference genome, derived from a Hereford (Bos taurus) animal, for aligning the reads of both breeds, so that the Bos indicus reads differ more from the reference.
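Several figures in this paragraph are related by simple arithmetic, and the Python sketch below recomputes them as a consistency check. It assumes a bovine genome size of about 2.6 Gbp, derives Ts/Tv from the transition fraction p as p / (1 - p), and assumes that TopHat2's overall mapping rate is the mean of the forward and reverse rates, which holds only when both mates have equal numbers of input reads. The constant and function names are hypothetical.

```python
# Figures reported in the Results; GENOME_BP is an approximate assumption.
GENOME_BP = 2.6e9  # approximate bovine genome size (~2.6 Gbp)
HOLSTEIN_TRANSCRIPTOME_BP = 52_798_651
SNPS = {"Holstein": 50_183, "Cholistani": 137_954}
TS_FRACTION = {"Holstein": 0.696, "Cholistani": 0.706}


def ts_tv_ratio(ts_fraction: float) -> float:
    """Ts/Tv ratio implied by the fraction of SNPs that are transitions."""
    return ts_fraction / (1.0 - ts_fraction)


def overall_mapping(left_rate: float, right_rate: float) -> float:
    """Overall mapping rate, assuming equal numbers of left and right reads."""
    return (left_rate + right_rate) / 2.0


print(f"expressed fraction: {HOLSTEIN_TRANSCRIPTOME_BP / GENOME_BP:.1%}")
for breed, p in TS_FRACTION.items():
    print(f"{breed} Ts/Tv: {ts_tv_ratio(p):.1f}")
ratio = SNPS["Cholistani"] / SNPS["Holstein"]
print(f"Cholistani/Holstein SNP ratio: {ratio:.2f}")
print(f"Cholistani overall mapping: {overall_mapping(81.3, 79.9):.1f}%")
```

With these inputs the sketch gives an expressed fraction of about 2.0%, Ts/Tv values of 2.3 (Holstein) and 2.4 (Cholistani), a Cholistani-to-Holstein SNP ratio of about 2.75, and a Cholistani overall mapping rate of 80.6%, in line with the reported values.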
Conclusion The difference in expression between the two alleles at a single-nucleotide position causes phenotypic diversity and probably explains a large part of the variance between these two bovine subspecies, especially in susceptibility to diseases and parasites and in tolerance of biotic and abiotic environmental stresses under different environmental conditions. In contrast, differential gene expression analysis, or even allele-specific expression analysis at the gene level, may not be able to explain this phenotypic diversity.
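The study does not describe a test for expression differences between alleles, but one common way to quantify them at a heterozygous SNP is to compare reference and alternative read counts against the 50:50 ratio expected under equal expression. The Python sketch below uses an exact two-sided binomial test, computed in log space so it stays numerically stable at high read depth; the choice of test, the function names, and the example read counts are assumptions for illustration, not the authors' method.

```python
from math import exp, lgamma, log
from typing import Tuple


def binom_pmf(i: int, n: int, p: float) -> float:
    """Binomial probability P(X = i), computed in log space (0 < p < 1)."""
    log_coef = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
    return exp(log_coef + i * log(p) + (n - i) * log(1.0 - p))


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided p-value: total probability of outcomes no likelier than k."""
    probs = [binom_pmf(i, n, p) for i in range(n + 1)]
    # A small relative tolerance treats numerically equal probabilities as ties.
    cutoff = probs[k] * (1 + 1e-7)
    return min(1.0, sum(q for q in probs if q <= cutoff))


def allelic_imbalance(ref_reads: int, alt_reads: int) -> Tuple[float, float]:
    """Reference-allele fraction and p-value against equal allelic expression."""
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no reads cover this SNP")
    return ref_reads / n, binom_two_sided_p(ref_reads, n)


# Hypothetical read counts at one heterozygous SNP: 18 reference, 2 alternative.
fraction, p_value = allelic_imbalance(18, 2)
print(f"ref fraction = {fraction:.2f}, p = {p_value:.2g}")
```

A balanced site such as 10 reference and 10 alternative reads gives p close to 1, whereas the skewed 18:2 example gives p of about 0.0004, flagging a possible allelic expression difference at that position.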