January 18, 2025

The Role of Principal Component Analysis and Robust Principal Component Analysis in Genome-Wide Association

Identification of novel putative alleles related to important agronomic traits of wheat using robust strategies in GWAS

A Case Study in Wheat Spike Traits

Abstract

Genome-wide association studies (GWAS) have revolutionized the field of genetics by enabling the identification of genetic variants associated with complex traits. In this study, we investigate the application of principal component analysis (PCA) and robust principal component analysis (rPCA) in GWAS to elucidate the genetic basis of spike traits in Iranian wheat cultivars and landraces. We explore the impact of outliers on the analysis, evaluate the effectiveness of different PCA methods, and identify marker-trait associations (MTAs) for various spike traits under different environmental conditions. Additionally, we annotate the identified markers with relevant genes and pathways to gain insights into the biological processes involved.

Introduction

Genetic improvement of crop plants such as wheat is crucial for meeting the increasing global demand for food. Genome-wide association studies have emerged as powerful tools for dissecting the genetic architecture of complex traits. By analyzing genetic variation across the entire genome, GWAS identifies marker-trait associations that reveal which genomic regions influence important traits. Including a population structure analysis, such as principal component analysis (PCA), accounts for relatedness and subgroup differences among the studied genotypes, which could otherwise produce spurious associations. However, classical PCA is sensitive to outliers: a few aberrant samples can distort the principal components and bias the results. Robust PCA methods limit the influence of such samples and can provide a more accurate estimate of population structure. Understanding the genetic basis of spike traits in wheat is essential for improving crop productivity and supporting breeding programs.

Methods

The study used a panel of 294 Iranian wheat genotypes comprising cultivars and landraces. Spike traits were phenotyped under well-watered and rain-fed environments, and the panel was genotyped with single nucleotide polymorphism (SNP) markers, which provide dense genomic coverage. Population structure was explored with classical PCA, and the principal components were also estimated with several robust PCA methods (Hubert, Grid, Locantore, and Proj). Outliers were identified using both the classical and the robust approaches. GWAS was conducted using both traditional single-trait approaches and PCA-based approaches, with marker-trait associations (MTAs) identified for each trait. Gene annotation and pathway analysis were performed to assess the biological significance of the identified markers.
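
To make the contrast between classical and robust PCA concrete, here is a minimal Python sketch on synthetic genotypes. It is not the study's pipeline. It implements spherical PCA, the idea behind the Locantore method, in which samples are centered at the spatial median and projected onto the unit sphere before the directions are extracted. The helper names, the median/MAD outlier rule, the cutoff of 3.5, and the planted outliers are all assumptions made for illustration.

```python
import numpy as np

def classical_pca(X, k=2):
    """Top-k principal component scores of X (samples x markers) via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def spatial_median(X, n_iter=200, tol=1e-8):
    """Weiszfeld iteration for the spatial (L1) median of the rows of X."""
    m = np.median(X, axis=0)
    for _ in range(n_iter):
        d = np.maximum(np.linalg.norm(X - m, axis=1), 1e-12)  # avoid /0
        w = 1.0 / d
        m_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(m_new - m) < tol:
            return m_new
        m = m_new
    return m

def spherical_pca(X, k=2):
    """Spherical PCA: directions come from rows scaled to unit length,
    so distant samples cannot dominate; scores use the unscaled rows."""
    Xc = X - spatial_median(X)
    norms = np.maximum(np.linalg.norm(Xc, axis=1, keepdims=True), 1e-12)
    _, _, Vt = np.linalg.svd(Xc / norms, full_matrices=False)
    return Xc @ Vt[:k].T

def flag_outliers(scores, cutoff=3.5):
    """Flag samples whose robust z-score (median/MAD) exceeds cutoff on any PC."""
    med = np.median(scores, axis=0)
    mad = 1.4826 * np.median(np.abs(scores - med), axis=0)
    return (np.abs(scores - med) / mad > cutoff).any(axis=1)

# Synthetic data (illustrative only): 60 samples x 500 markers coded 0/1/2.
rng = np.random.default_rng(0)
G = rng.binomial(2, 0.3, size=(60, 500)).astype(float)
G[:5] = 2.0  # plant five aberrant samples (hypothetical outliers)

classical_scores = classical_pca(G)
robust_scores = spherical_pca(G)
classical_flags = flag_outliers(classical_scores)
robust_flags = flag_outliers(robust_scores)
print("flagged (classical PCA):", np.flatnonzero(classical_flags))
print("flagged (spherical PCA):", np.flatnonzero(robust_flags))
```

In this toy setting the planted samples are extreme enough that both approaches should flag them. The practical difference is that the spherical directions are estimated without letting those samples dominate the decomposition.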

Results

The analysis of linkage disequilibrium (LD) revealed different patterns of LD decay among the wheat sub-genomes, with higher LD observed in the D genome than in the B genome. PCA showed a clear differentiation between Iranian wheat cultivars and landraces, indicating distinct genetic backgrounds. Interestingly, some cultivars were found to have originated from landraces, suggesting that breeding programs have incorporated landrace genetic diversity into modern cultivars.
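
As an illustration of what "LD decay" measures, the sketch below computes pairwise r² between markers and averages it within distance bins. It assumes unphased genotypes coded 0/1/2 and uses the squared correlation of genotype dosages as the LD measure. The simulated haplotypes, the marker index standing in for physical position, and the bin edges are hypothetical choices, not values from the study.

```python
import numpy as np

def pairwise_r2(G):
    """Squared Pearson correlation between marker columns of G (samples x markers)."""
    return np.corrcoef(G, rowvar=False) ** 2

def ld_by_distance(G, pos, bin_edges):
    """Mean r^2 of marker pairs, grouped by the distance between the two markers."""
    r2 = pairwise_r2(G)
    i, j = np.triu_indices(G.shape[1], k=1)  # each unordered pair once
    pair_r2 = r2[i, j]
    which = np.digitize(np.abs(pos[i] - pos[j]), bin_edges)
    means = []
    for b in range(1, len(bin_edges)):
        in_bin = which == b
        means.append(pair_r2[in_bin].mean() if in_bin.any() else np.nan)
    return np.array(means)

# Simulate 400 haplotypes along one chromosome: each marker copies its
# neighbour's allele and flips it with probability 0.1, so LD fades with distance.
rng = np.random.default_rng(1)
n_hap, n_mark = 400, 50
hap = np.empty((n_hap, n_mark), dtype=int)
hap[:, 0] = rng.integers(0, 2, n_hap)
for j in range(1, n_mark):
    flip = rng.random(n_hap) < 0.1
    hap[:, j] = np.where(flip, 1 - hap[:, j - 1], hap[:, j - 1])

G = hap[:200] + hap[200:]        # pair haplotypes into 200 diploid genotypes
pos = np.arange(n_mark)          # marker index as a stand-in for position
edges = np.array([1, 5, 20, 50])  # bins: [1, 5), [5, 20), [20, 50)
decay = ld_by_distance(G, pos, edges)
print("mean r^2 per distance bin:", np.round(decay, 3))
```

Comparing such curves between sub-genomes is one way to express a statement like "higher LD in the D genome": the curve for that sub-genome stays elevated over longer distances.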

The GWAS revealed significant MTAs for various spike traits under both well-watered and rain-fed conditions. The number and identity of associated SNPs varied with the trait and the environmental condition. The traditional single-trait GWAS, which used principal components as population structure covariates, identified one set of MTAs, while the PCA-based GWAS, which used the first two principal components (PCs) of the phenotypic data as composite traits, provided additional insight into pleiotropic marker associations. Incorporating robust PCA methods, such as Hubert and Grid, led to the identification of novel markers associated with yield and spike weight traits. Q-Q plots showed good agreement between observed and expected p-values, indicating that type I error was controlled in the GWAS.
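
The single-trait scan with PC covariates can be sketched as a per-marker linear regression. The example below is a simplified illustration on synthetic data, not the model used in the study: it assumes ordinary least squares with the top two genotype PCs as covariates, approximates p-values with the normal distribution instead of the t distribution, and summarizes Q-Q plot calibration with the genomic inflation factor. The function names, the planted causal marker, and the effect size are hypothetical.

```python
import math
import numpy as np

def gwas_scan(y, G, covars):
    """Regress y on [intercept, covars, marker] for each marker column of G.
    Returns z-scores and two-sided p-values (normal approximation)."""
    n = G.shape[0]
    C = np.column_stack([np.ones(n), covars])
    # Residualise phenotype and markers on the covariates; the marker
    # coefficient is unchanged by this step (Frisch-Waugh-Lovell theorem).
    Q, _ = np.linalg.qr(C)
    y_r = y - Q @ (Q.T @ y)
    G_r = G - Q @ (Q.T @ G)
    sxx = (G_r ** 2).sum(axis=0)
    beta = (G_r.T @ y_r) / sxx
    resid_ss = (y_r ** 2).sum() - beta ** 2 * sxx
    df = n - C.shape[1] - 1
    z = beta / np.sqrt(resid_ss / df / sxx)
    p = np.array([math.erfc(abs(v) / math.sqrt(2)) for v in z])
    return z, p

def genomic_inflation(z):
    """Median observed chi-square over its expected median (about 0.4549, 1 df).
    Values near 1 correspond to a Q-Q plot that follows the diagonal."""
    return float(np.median(z ** 2) / 0.4549)

# Synthetic panel: 200 samples x 300 markers, one planted causal marker.
rng = np.random.default_rng(2)
G = rng.binomial(2, 0.3, size=(200, 300)).astype(float)
y = 1.0 * G[:, 10] + rng.normal(size=200)  # hypothetical effect at marker 10

# Top two genotype PCs as population-structure covariates.
Gc = G - G.mean(axis=0)
_, _, Vt = np.linalg.svd(Gc, full_matrices=False)
pcs = Gc @ Vt[:2].T

z, p = gwas_scan(y, G, pcs)
top = int(np.argmin(p))
lam = genomic_inflation(z)
print("most significant marker:", top)
print("genomic inflation factor:", round(lam, 2))
```

For the PCA-based variant described above, the same scan could be run with `y` replaced by a principal component score computed from the standardized trait matrix, so that one test captures markers affecting several traits at once.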

The gene annotation and pathway analysis of the identified markers revealed their involvement in crucial biological processes and molecular functions. The identified genes were associated with processes such as protein processing, defense response, and DNA-templated regulation of transcription. The enriched pathways included biosynthesis of flavonoids, carotenoids, and secondary metabolites, metabolic pathways, ubiquitin-mediated proteolysis, and plant-pathogen interaction. These findings provide insight into the genetic mechanisms underlying wheat spike traits and their potential relevance to stress responses and plant development.

Conclusion

This study highlights the significance of PCA and rPCA in GWAS for understanding the genetic basis of spike traits in wheat. The incorporation of robust PCA methods mitigates the effects of outliers and improves the accuracy of population structure estimation. The identified MTAs and their association with important traits provide valuable information for wheat breeding programs aimed at improving crop productivity and stress tolerance. The gene annotation and pathway analysis offer insights into the biological processes involved in spike development and stress responses. The findings from this study contribute to our understanding of the genetic architecture of complex traits in wheat and demonstrate the potential of PCA-based approaches in GWAS. Future research can build upon these findings to further unravel the genetic basis of wheat spike traits and enhance breeding strategies for sustainable agriculture.

For more information, see the source article referenced below.

Abdi, Hossein, et al. “Identification of Novel Putative Alleles Related to Important Agronomic Traits of Wheat Using Robust Strategies in GWAS.” Scientific Reports, 19 June 2023, www.nature.com/articles/s41598-023-36134-z.
