Bioinformatic identification of the genomic diversity of Copy Number Variations in laying and broiler hens
Copy number variations (CNVs) comprise deletions, insertions, and duplications. They are an important source of genetic variation in organisms and thus influence gene expression and phenotypic variation. A CNV is a structural variant of intermediate size, larger than 50 bp, that involves unbalanced rearrangements increasing or decreasing the amount of DNA (Pirooznia et al. 2015; Alkan et al. 2011); smaller segments are known as insertions or deletions (indels). Because of their size, these structural variants encompass more polymorphic nucleotides than SNPs, and their detection and effects on phenotype have recently attracted the attention of many researchers. CNVs have been reported to alter gene dosage, gene regulation, and transcript structure, and thus to contribute to phenotypic variability (Pirooznia et al. 2015; Alkan et al. 2011). In chickens, the pea-comb phenotype is caused by a CNV mapping to intron 1 of the SRY (sex determining region Y)-box 5 (SOX5) gene (Wright et al. 2009), and late feathering is due to a partial duplication of the PRLR and SPEF2 genes (Elferink et al. 2008). In pigs, the dominant white colour has been associated with a duplication of a 450-kb fragment including the KIT gene and with a splice mutation causing the skipping of exon 17 (Giuffra et al. 1999). In sheep, a duplication involving the ASIP gene regulates coat pigmentation (Norris et al. 2008). In horses, a 4.6-kb duplication in intron 6 of the STX17 gene causes the coat to turn grey with age. In goats, an 11.7-kb deletion of an intergenic region results in the absence of horns (Clop et al. 2012). The chicken is the most intensively farmed animal on earth and a major food source, with billions of birds used in meat and egg production each year. A large share of chicken CNVs involves protein-coding or regulatory sequences.
A comprehensive study of chicken CNVs can provide valuable information on genetic diversity and assist future analyses of associations between CNVs and economically important traits in chickens. The unique chicken genome, with its macro- and microchromosomes, together with the biology of the bird, makes the chicken an ideal organism for studies in development and evolution, as well as for applications in agriculture and medicine (Burt 2005). In the last several years, there has been increasing interest in the study of CNVs in the chicken. This study compares CNVs between broiler and layer chickens using whole-genome sequencing in order to find evidence of domestication in the genome.
We used 90 female birds from commercial broiler (n = 40) and layer (n = 50) chickens. The broilers (BRs) were represented by 20 DNA samples from each of two independently established lines (BRA and BRB), previously collected as part of the AVIANDIV project. The layer group (LRs) included data from 25 birds each from purebred white (WL) and brown (BL) egg-laying populations, sequenced within the framework of the SYNBREED project (http://www.synbreed.tum.de/index.php?id=2). The 101-bp paired-end reads were mapped against the current reference genome assembly, galGal6, using the Burrows-Wheeler Aligner (BWA version 0.6.2-r126, default parameters). Duplicate reads were marked during post-processing with the Picard tool set (version 2.9.2, http://picard.sourceforge.net), and the Genome Analysis Toolkit (GATK) version 3.3.0 was used for local realignment around indels to correct alignment errors. Read depth was then calculated for each sample with the GATK DepthOfCoverage tool (McKenna et al. 2010), excluding reads with mapping quality below 20. Because comparing the genomes of individuals in different groups across the entire genome is time-consuming and computationally demanding, each individual's genome was divided into non-overlapping 1,000-bp windows and the average read depth per window was calculated. The depth values were then normalized against the BL sample with the highest average depth: a correction factor was computed per population and applied to the depth-of-coverage value of each window. For each contrast, an analysis of variance (ANOVA) was performed as described by Carneiro et al. (2014), and the Benjamini-Hochberg FDR method was used to correct for multiple testing (Benjamini and Hochberg 1995). For the broiler vs. layer contrast, 935,247 windows were scanned, of which 70,372 were significant at an FDR threshold of 0.001.
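The window-based scan described above can be sketched as follows. This is a minimal, standard-library Python illustration under stated assumptions, not the authors' pipeline: it assumes per-base depths have already been extracted from the DepthOfCoverage output (parsing is not shown), interprets the per-population correction factor as the reference sample's mean depth divided by the population's mean depth, and computes the ANOVA p-value from the F distribution through a Numerical Recipes-style continued fraction for the regularized incomplete beta function rather than through a statistics library. All function names are hypothetical.

```python
import math
from statistics import mean

WINDOW_BP = 1000  # non-overlapping window size used in the text


def window_depths(per_base_depth, window=WINDOW_BP):
    """Mean read depth per non-overlapping window (a trailing partial window is dropped)."""
    n = len(per_base_depth) // window
    return [mean(per_base_depth[i * window:(i + 1) * window]) for i in range(n)]


def normalize_population(samples, reference_mean):
    """Scale one population's window depths (list of samples, each a list of windows)."""
    # Assumption: one factor per population = reference mean depth / population mean depth.
    factor = reference_mean / mean(mean(s) for s in samples)
    return [[d * factor for d in s] for s in samples]


def _betacf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd terms of the continued fraction for step m.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log1p(-x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def f_sf(f, df1, df2):
    """Upper-tail probability P(F > f) for an F(df1, df2) distribution."""
    if f <= 0.0:
        return 1.0
    return _betai(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def one_way_anova(groups):
    """One-way ANOVA over lists of values; returns (F, p)."""
    values = [x for g in groups for x in g]
    k, n = len(groups), len(values)
    grand = mean(values)
    means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df1, df2 = k - 1, n - k
    if ss_within == 0.0:  # no within-group variance: F is infinite or undefined
        return (math.inf, 0.0) if ss_between > 0 else (math.nan, 1.0)
    f = (ss_between / df1) / (ss_within / df2)
    return f, f_sf(f, df1, df2)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def scan_contrast(group_a, group_b, fdr=0.001):
    """Indices of windows whose normalized depth differs between two groups at the given FDR."""
    pvals = [one_way_anova([[s[w] for s in group_a], [s[w] for s in group_b]])[1]
             for w in range(len(group_a[0]))]
    return [w for w, q in enumerate(benjamini_hochberg(pvals)) if q < fdr]
```

With only two groups (broilers vs. layers), this one-way ANOVA is equivalent to a pooled-variance two-sample t-test (F = t²). For genome-scale data, a vectorized implementation would be far faster than these pure-Python loops.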
Mapping the sequencing data to the galGal6 assembly yielded an average mapping rate of 98.61% and an average depth of 11.51×. A Manhattan plot was drawn for the genomic regions that differed significantly between the two groups (FDR = 0.001). The points above the significance threshold line were identified and examined within a 25-kb confidence interval to identify candidate genes. Thirty-nine regions were identified, about half of which did not contain any genes. Of the remaining regions, 11 contained 16 genes encoding long non-coding RNAs (lncRNAs). Although lncRNAs are under lower selective pressure than protein-coding genes (Batista and Chang 2013), they play a critical role in organizing the three-dimensional genome architecture and regulating gene activity in cis or in trans through multiple mechanisms (Zhang et al. 2019; Batista and Chang 2013). Six other regions contained 12 protein-coding genes. Most of the identified genes have been linked in some way to immune-system disease or cancer. The genes showing copy number variation in the studied samples include DEDs and TNFAIP8, which are involved in programmed cell death (apoptosis), and NPAL3 and RCAN, which are involved in the immune system. RCAN is also involved in Down syndrome. The PFDN gene, located on chromosome 25, is also involved in Alzheimer's and Parkinson's disease.
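The step from significant windows to candidate genes can be sketched as below. This is a hedged illustration, not the procedure actually used: it assumes that the 25-kb confidence interval means a 25-kb flank on each side of a significant 1,000-bp window, that overlapping flanked intervals are merged into a single region, and that gene coordinates come from a galGal6 annotation loaded into hypothetical `Gene` records with 0-based, half-open coordinates. The function names are likewise hypothetical.

```python
from dataclasses import dataclass

WINDOW_BP = 1000   # window size used in the depth scan
FLANK_BP = 25_000  # assumption: 25 kb added on each side of a significant window


@dataclass(frozen=True)
class Gene:
    """Hypothetical annotation record with 0-based, half-open coordinates."""
    chrom: str
    start: int
    end: int
    name: str


def candidate_regions(hits, window_bp=WINDOW_BP, flank=FLANK_BP):
    """Merge flanked significant windows into regions, per chromosome.

    hits: iterable of (chrom, window_start) for windows above the threshold line.
    Returns a sorted list of (chrom, start, end) tuples.
    """
    by_chrom = {}
    for chrom, start in hits:
        interval = (max(0, start - flank), start + window_bp + flank)
        by_chrom.setdefault(chrom, []).append(interval)
    regions = []
    for chrom in sorted(by_chrom):
        for lo, hi in sorted(by_chrom[chrom]):
            last = regions[-1] if regions else None
            if last and last[0] == chrom and lo <= last[2]:
                # Overlapping or touching intervals are merged into one region.
                regions[-1] = (chrom, last[1], max(hi, last[2]))
            else:
                regions.append((chrom, lo, hi))
    return regions


def genes_in_regions(regions, genes):
    """Map each region to the names of annotated genes that overlap it."""
    return {
        region: [g.name for g in genes
                 if g.chrom == region[0] and g.start < region[2] and g.end > region[1]]
        for region in regions
    }
```

Under these assumptions, a region whose gene list is empty corresponds to one of the gene-free regions described above.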