Session 5: Rare variants

Practical session

This workshop will explore rare variant analysis using the STAAR pipeline.

Practical Overview:

Rare variant association analysis of WGS data
WGS data preparation and annotation
PC / Sparse Genetic Relatedness Matrix (GRM) generation
Functionally-informed rare variant association analysis
Conditional analysis
Annotating rare variant analysis results

Practical Goals:

To prepare 1000G WGS data in a Genomic Data Structure (GDS) format
To functionally annotate 1000G WGS data into an annotated GDS (aGDS) format
To generate ancestral principal components and sparse GRM (using FastSparseGRM)
To perform a functionally-informed rare variant analysis using linear mixed model
To perform conditional analysis for a significant rare variant set (mask) of interest
To annotate a significant rare variant set (mask) of interest

Data Used

1000G High Coverage WGS: https://pubmed.ncbi.nlm.nih.gov/36055201/
1000G Phase 3: https://www.nature.com/articles/nature15393
1000G Original: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3498066/

Programs Used

GDS: https://academic.oup.com/bioinformatics/article/33/15/2251/3072873
aGDS: https://academic.oup.com/nar/article/51/D1/D1300/6814464
FastSparseGRM: https://github.com/rounakdey/FastSparseGRM
STAARpipeline: https://www.nature.com/articles/s41592-022-01640-x

Workshop Material

Due to the time and resource limit of the workshop, we are not going to do a live demo of the preprocessing steps. Please refer to the README.md file for the precoessing steps and the 1000G_scripts_part1 folder for the preprocessing steps.

In this workshop, we will be focusing on the R scripts in 1000G_scripts_part2

Let’s get started

Please go ahead and open the 05_RareVariants.ipynb Google colab notebook.