Session 5: Rare variants

Practical session

This workshop will explore rare variant analysis using the STAAR pipeline.

Practical Overview:

  • Rare variant association analysis of WGS data
  • WGS data preparation and annotation
  • Principal component (PC) and sparse Genetic Relatedness Matrix (GRM) generation
  • Functionally-informed rare variant association analysis
  • Conditional analysis
  • Annotating rare variant analysis results

Practical Goals:

  • To prepare 1000G WGS data in a Genomic Data Structure (GDS) format
  • To functionally annotate 1000G WGS data into an annotated GDS (aGDS) format
  • To generate ancestry principal components (PCs) and a sparse GRM (using FastSparseGRM)
  • To perform a functionally-informed rare variant analysis using a linear mixed model
  • To perform conditional analysis for a significant rare variant set (mask) of interest
  • To annotate a significant rare variant set (mask) of interest
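
To make the idea of a rare variant set (mask) concrete, here is a minimal Python sketch. It is a toy illustration under stated assumptions, not the STAARpipeline method and not one of the workshop's R scripts. The genotype matrix, the MAF cutoff and the function names are all hypothetical. The sketch selects polymorphic variants whose minor allele frequency falls below a cutoff, then collapses them into a per-sample burden score, which is the simplest form of the aggregation used in rare variant tests.

```python
# Toy illustration only: not STAAR and not the workshop's R scripts.
# Genotypes are alternate-allele counts (0, 1, 2); rows = samples, columns = variants.
genotypes = [
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 2, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

# Hypothetical cutoff chosen so this tiny example has both rare and common
# variants; real analyses typically use a much smaller value such as 0.01.
MAF_CUTOFF = 0.2


def minor_allele_freq(column):
    """Minor allele frequency of one variant from diploid allele counts."""
    alt_freq = sum(column) / (2 * len(column))
    return min(alt_freq, 1 - alt_freq)


def rare_variant_mask(genos, cutoff):
    """Indices of polymorphic variants with MAF below the cutoff."""
    n_variants = len(genos[0])
    mask = []
    for j in range(n_variants):
        maf = minor_allele_freq([row[j] for row in genos])
        if 0 < maf < cutoff:  # skip monomorphic and common variants
            mask.append(j)
    return mask


def burden_scores(genos, mask):
    """Per-sample alternate-allele count across the masked variants.

    Assumes the alternate allele is the minor allele at every masked variant.
    """
    return [sum(row[j] for j in mask) for row in genos]


mask = rare_variant_mask(genotypes, MAF_CUTOFF)
print("Variants in mask:", mask)
print("Burden per sample:", burden_scores(genotypes, mask))
```

In an actual analysis the burden score (or a weighted version informed by functional annotations) would be tested against the phenotype in a model that accounts for relatedness and ancestry; this sketch stops at building the mask.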

Data Used

  • 1000G High Coverage WGS: https://pubmed.ncbi.nlm.nih.gov/36055201/
  • 1000G Phase 3: https://www.nature.com/articles/nature15393
  • 1000G Original: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3498066/

Programs Used

  • GDS: https://academic.oup.com/bioinformatics/article/33/15/2251/3072873
  • aGDS: https://academic.oup.com/nar/article/51/D1/D1300/6814464
  • FastSparseGRM: https://github.com/rounakdey/FastSparseGRM
  • STAARpipeline: https://www.nature.com/articles/s41592-022-01640-x

Workshop Material

Because of the workshop's time and resource limits, we will not run a live demo of the preprocessing steps. Please refer to the README.md file for a description of these steps and to the 1000G_scripts_part1 folder for the corresponding scripts.

In this workshop, we will focus on the R scripts in the 1000G_scripts_part2 folder.

Let’s get started

Please open the 05_RareVariants.ipynb Google Colab notebook.