Funding: supported by the China Agriculture Research System of MOF and MARA (CARS-35), the National Natural Science Foundation of China (32072696, 31790414 and 31601916), and the Fundamental Research Funds for the Central Universities (2662019PY011).
Abstract: Genotype imputation has become an indispensable part of genomic data analysis. In recent years, imputation based on a multi-breed reference population has received more attention, but relevant studies in pigs are scarce. In this study, we used the Illumina PorcineSNP50 BeadChip to investigate how imputation accuracy varies with several influencing factors and compared the imputation performance of four commonly used imputation software programs. The results indicated that imputation accuracy increased as the validation population marker density, reference population sample size, or minor allele frequency (MAF) increased. However, imputation accuracy decreased to some extent when the pig reference population was a mixed group of multiple breeds or lines. Considering both imputation accuracy and running time, Beagle 4.1 and FImpute are excellent choices among the four software packages tested. This work visually presents the impacts of these influencing factors on imputation and provides a reference for formulating reasonable imputation strategies in actual pig breeding.
Funding: the Natural Science Foundation of Shandong Province, China (ZR2020MC168).
Abstract: The number of vertebrae is an important economic trait associated with body size and meat productivity in animals. However, the genetic basis of vertebrae number in donkeys remains poorly understood. The aim of this study was to identify candidate genes affecting the number of thoracic vertebrae (TVn) and the number of lumbar vertebrae (LVn) in the Dezhou donkey. A genome-wide association study was conducted using whole-genome sequence data imputed from low-coverage genome sequencing. For TVn, we identified 38 genome-wide significant and 64 suggestive SNPs, which relate to 7 genes (NLGN1, DCC, SLC26A7, TOX, WNT7A, LOC123286078, and LOC123280142). For LVn, we identified 9 genome-wide significant and 38 suggestive SNPs, which relate to 8 genes (GABBR2, FBXO4, LOC123277146, LOC123277359, BMP7, B3GAT1, EML2, and LRP5). These genes are involved in the Wnt and TGF-β signaling pathways, may play an important role in embryonic development or bone formation, and could be good candidate genes for TVn and LVn.
Funding: National Natural Science Foundation of China [grant number 32160782].
Abstract: Background. As pre-cut and pre-packaged chilled meat becomes increasingly popular, integrating the carcass-cutting process into the pig industry chain has become a trend. Identifying quantitative trait loci (QTLs) of pork cuts would facilitate the selection of pigs with a higher overall value. However, previous studies focused solely on evaluating the phenotypic and genetic parameters of pork cuts, neglecting the investigation of QTLs influencing these traits. This study involved 17 pork cuts and 12 morphology traits from 2,012 pigs across four populations genotyped using CC1 PorcineSNP50 BeadChips. Our aim was to identify QTLs and evaluate the accuracy of genomic estimated breeding values (GEBVs) for pork cuts. Results. We identified 14 QTLs and 112 QTLs for 17 pork cuts by GWAS using haplotypes and imputed genotypes, respectively. Specifically, we found that HMGA1, VRTN and BMP2 were associated with body length and weight. Subsequent analysis revealed that HMGA1 primarily affects the size of fore leg bones, VRTN primarily affects the number of vertebrae, and BMP2 primarily affects the length of vertebrae and the size of hind leg bones. The prediction accuracy was defined as the correlation between the adjusted phenotype and GEBVs in the validation population, divided by the square root of the trait's heritability. The prediction accuracy of GEBVs for pork cuts varied from 0.342 to 0.693. Notably, ribs, boneless picnic shoulder, tenderloin, hind leg bones, and scapula bones exhibited prediction accuracies exceeding 0.600. Employing better models, increasing marker density through genotype imputation, and pre-selecting markers significantly improved the prediction accuracy of GEBVs. Conclusions. We performed the first study to dissect the genetic mechanism of pork cuts and identified a large number of significant QTLs and potential candidate genes. These findings carry significant implications for the breeding of pork cuts through marker-assisted and genomic selection. Additionally, we have constructed the first reference populations for genomic selection of pork cuts in pigs.
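The prediction accuracy defined in this abstract (correlation between adjusted phenotypes and GEBVs in the validation population, divided by the square root of heritability) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `prediction_accuracy`, the use of a Pearson correlation, and the toy numbers are all assumptions.

```python
import numpy as np


def prediction_accuracy(adjusted_phenotype, gebv, heritability):
    """Return cor(adjusted phenotype, GEBV) / sqrt(h2) for a validation set.

    Hypothetical helper: the abstract gives only the definition, so the
    Pearson correlation and the input layout are assumptions.
    """
    y = np.asarray(adjusted_phenotype, dtype=float)
    g = np.asarray(gebv, dtype=float)
    if y.shape != g.shape:
        raise ValueError("phenotype and GEBV vectors must have the same shape")
    if not 0.0 < heritability <= 1.0:
        raise ValueError("heritability must be in (0, 1]")
    r = np.corrcoef(y, g)[0, 1]  # Pearson correlation in the validation set
    return float(r / np.sqrt(heritability))


# Toy validation data (made up for illustration only).
toy_phenotype = [1.0, 2.0, 3.0, 4.0]
toy_gebv = [2.0, 4.0, 6.0, 8.0]
toy_accuracy = prediction_accuracy(toy_phenotype, toy_gebv, heritability=1.0)
```

Dividing by the square root of heritability rescales the phenotype-based correlation toward a correlation with the true breeding value, which is why the result depends on how well heritability itself is estimated.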
Funding: supported by the National Natural Science Foundation of China (32022078) and the Local Innovative and Research Teams Project of Guangdong Province, China (2019BT02N630), with support from the National Supercomputer Center in Guangzhou, China.
Abstract: Single-step genomic best linear unbiased prediction (ssGBLUP) is now intensively investigated and widely used in livestock breeding because it combines information from both genotyped and ungenotyped individuals in a single model. With the increasing accessibility of whole-genome sequence (WGS) data at the population level, more attention is being paid to the use of WGS data in ssGBLUP. The predictive ability of ssGBLUP using WGS data might be improved by incorporating biological knowledge from public databases. Thus, we extended ssGBLUP, incorporated genomic annotation information into the model, and evaluated the extended models using a yellow-feathered chicken population as an example. The chicken population consisted of 1,338 birds with 23 traits, of which 895 birds had imputed WGS data comprising 5,127,612 single nucleotide polymorphisms (SNPs). Considering different combinations of annotation information and models, the original ssGBLUP, haplotype-based ssGHBLUP, and four extended ssGBLUP models incorporating genomic annotation were evaluated. Based on the genomic annotation (GRCg6a) of chickens, 3,155,524 and 94,837 SNPs were mapped to genic and exonic regions, respectively. Extended ssGBLUP using genic/exonic SNPs outperformed the other models with respect to predictive ability in 15 out of 23 traits, with advantages ranging from 2.5% to 6.1% compared with the original ssGBLUP. In addition, to further enhance the performance of genomic prediction with imputed WGS data, we investigated the effect of reference population genotyping strategies on ssGBLUP in the chicken population. Comparing two strategies for selecting individuals to genotype in the reference population, even selection by family (SBF) performed slightly better than random selection in most situations.
Overall, we extended genomic prediction models that can comprehensively utilize WGS data and genomic annotation information in the framework of ssGBLUP, and validated the idea that properly handling the genomic annotation information and WGS data increases the predictive ability of ssGBLUP. Moreover, when using WGS data, a genotyping strategy that maximizes the expected genetic relationship between the reference and candidate populations could further improve the predictive ability of ssGBLUP. The results from this study shed light on the comprehensive use of genomic annotation information in WGS-based single-step genomic prediction.
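For orientation, the way ssGBLUP combines genotyped and ungenotyped individuals is usually written through the inverse of a joint relationship matrix. The abstract does not give this formula, so the expression below is the standard textbook form with assumed notation, and the extended annotation-based models in the study may build their genomic matrix differently.

```latex
\[
\mathbf{H}^{-1} = \mathbf{A}^{-1} +
\begin{bmatrix}
\mathbf{0} & \mathbf{0} \\
\mathbf{0} & \mathbf{G}^{-1} - \mathbf{A}_{22}^{-1}
\end{bmatrix}
\]
```

Here $\mathbf{A}$ is the pedigree-based relationship matrix for all individuals, $\mathbf{A}_{22}$ is its block for the genotyped individuals, and $\mathbf{G}$ is the genomic relationship matrix built from the SNP data (in this study, presumably from the imputed WGS or annotation-selected SNPs).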
Abstract: Common variants explain little of the variance of most common diseases, prompting large-scale sequencing studies to understand the contribution of rare variants to these diseases. Imputation of rare variants from genome-wide genotyping arrays offers a cost-efficient strategy to achieve the sample sizes required for adequate statistical power. To estimate the performance of imputation of rare variants, we imputed 153 individuals, each of whom was genotyped on 3 different genotyping arrays (317k, 610k and 1 million single nucleotide polymorphisms (SNPs)), to two different reference panels, HapMap2 and the 1000 Genomes pilot March 2010 release (1KGpilot), using IMPUTE version 2. We found that more than 94% and 84% of all SNPs yielded acceptable accuracy (info > 0.4) in HapMap2-based and 1KGpilot-based imputation, respectively. For rare variants (minor allele frequency (MAF) < 5%), the proportion of well-imputed SNPs increased as the MAF increased from 0.3% to 5% across all 3 genome-wide association study (GWAS) datasets. The proportion of well-imputed SNPs was 69%, 60% and 49% for SNPs with a MAF from 0.3% to 5% for 1M, 610k and 317k, respectively. None of the very rare variants (MAF < 0.3%) were well imputed. We conclude that the imputation accuracy of rare variants increases with higher density of genome-wide genotyping arrays when the size of the reference panel is small. Variants with lower MAF are more difficult to impute. These findings have important implications for the design and replication of large-scale sequencing studies.
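The "proportion of well-imputed SNPs" reported above can be illustrated with a small sketch that counts SNPs whose info score exceeds 0.4 within a MAF bin. The function name, the half-open bin boundaries, and the toy values are assumptions for illustration; only the info > 0.4 threshold and the 0.3% and 5% MAF cut-offs come from the abstract.

```python
import numpy as np


def well_imputed_fraction(maf, info, maf_low, maf_high, info_threshold=0.4):
    """Fraction of SNPs with info > threshold among SNPs with maf_low <= MAF < maf_high.

    Hypothetical helper: whether the study treated bin edges as inclusive
    is not stated, so the half-open interval is an assumption.
    """
    maf = np.asarray(maf, dtype=float)
    info = np.asarray(info, dtype=float)
    in_bin = (maf >= maf_low) & (maf < maf_high)
    if not in_bin.any():
        return float("nan")  # no SNPs fall in this MAF bin
    return float(np.mean(info[in_bin] > info_threshold))


# Toy per-SNP values (made up): MAF and the imputation info score.
toy_maf = [0.001, 0.002, 0.01, 0.03, 0.04, 0.20]
toy_info = [0.10, 0.30, 0.50, 0.35, 0.90, 0.95]

rare = well_imputed_fraction(toy_maf, toy_info, 0.003, 0.05)      # MAF 0.3% to 5%
very_rare = well_imputed_fraction(toy_maf, toy_info, 0.0, 0.003)  # MAF below 0.3%
```

Running the same function once per array density (317k, 610k, 1M) would give the kind of per-array comparison the abstract summarizes.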
Funding: supported by the National Natural Science Foundation of China (31640046 and 31760656) and the National Key Research and Development Program of China (2020YFA0509500).
Abstract: Sequencing-based genome-wide association studies (GWAS) have facilitated the identification of causal associations between genetic variants and traits in diverse species. However, it is cost-prohibitive for the majority of research groups to sequence a large number of samples. Here, we carried out genotype imputation to increase the density of single nucleotide polymorphisms in a large-scale swine F2 population using a reference panel of 117 individuals, followed by a series of GWAS analyses. The imputation accuracies reached 0.89 and 0.86 for allelic concordance and correlation, respectively. A quantitative trait nucleotide (QTN) affecting the chest vertebrae was detected directly, while the investigation of another QTN affecting residual glucose failed because similar haplotypes carrying the wild-type and mutant alleles were present in the reference panel used in this study. High imputation accuracy was confirmed by Sanger sequencing for the most significant loci. Two candidate genes, CPNE5 and MYH3, affecting meat-related traits were proposed. Collectively, we illustrated four scenarios that researchers may encounter in imputation-based GWAS, and our results provide an extensive reference for future genotype imputation-based GWAS analyses.
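The two accuracy measures named in this abstract, allelic concordance and correlation, can be sketched for genotypes coded as 0/1/2 copies of one allele. The abstract does not define either measure, so this is one common convention rather than the authors' method: concordance is taken as the share of alleles that match (one minus half the absolute genotype difference), and correlation as the Pearson correlation over all masked genotypes. All names and values are hypothetical.

```python
import numpy as np


def allelic_concordance(true_gt, imputed_gt):
    """Mean share of correctly imputed alleles for 0/1/2-coded genotypes.

    Assumed definition: each genotype carries two alleles, so a genotype
    off by one counts as one of two alleles wrong.
    """
    t = np.asarray(true_gt, dtype=float)
    i = np.asarray(imputed_gt, dtype=float)
    return float(np.mean(1.0 - np.abs(t - i) / 2.0))


def genotype_correlation(true_gt, imputed_gt):
    """Pearson correlation between true and imputed genotypes, pooled over all entries."""
    t = np.asarray(true_gt, dtype=float).ravel()
    i = np.asarray(imputed_gt, dtype=float).ravel()
    return float(np.corrcoef(t, i)[0, 1])


# Toy masked genotypes (made up): one heterozygote imputed instead of a homozygote.
toy_true = [0, 1, 2, 2]
toy_imputed = [0, 1, 1, 2]

toy_concordance = allelic_concordance(toy_true, toy_imputed)
toy_correlation = genotype_correlation(toy_true, toy_imputed)
```

Concordance is inflated for low-MAF SNPs because guessing the major allele is usually right, which is one reason studies often report correlation alongside it.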