DCEG Statistical Genetics Workshop

Next Generation Statistical Methods for Genome Wide Association Studies: A Hands-On Course

Background

Genome-wide association studies (GWAS) have revolutionized our understanding of the genetic basis of complex traits and diseases. In the early years of GWAS, data analysis primarily relied on relatively simplistic methods, such as running millions of univariate linear or logistic regressions, one for each genetic variant. However, as the sample sizes for some GWAS have become extremely large and various types of other genomic data have become widely available, analysis of such data has also become much more complex and statistically sophisticated. The availability of summary statistics from GWAS of hundreds of traits can increase power and provide etiological insights, but these data present analytical and interpretive challenges. In addition, as researchers study more diverse populations, new methods for analyses accounting for diverse genetic ancestries and leveraging recently admixed populations are required. There are tremendous opportunities to address new types of scientific questions using existing or anticipated genome-wide genotyping data combined with emerging genomic annotations and epigenomic, transcriptomic, proteomic, metabolomic and other ‘omics’ data. To realize this promise it is critical that GWAS pipelines follow FAIR principles to optimize re-usability and reproducibility. This includes building pipelines that are modular, versionable and executable, with all the necessary metadata for others to run them in different computational environments, as well as sharing data (individual and summary results) following community standards for metadata and format and consistent with participants’ consent and privacy. Additionally, ensuring data security and implementing appropriate access control measures are essential.

Course description

This course will provide researchers and analysts with a review of cutting-edge statistical methods and hands-on tutorials for analyzing large-scale genome-wide genetic data. The tutorials will be provided in a reproducible compute environment with R and commonly used GWAS tools. Topics include methods for complex association testing (for common and rare variants); inference on genetic architecture using mixed model techniques; development of polygenic risk scores; methods for understanding causal mechanisms using Mendelian randomization; integrative genomic analyses; analyses in ancestry-diverse and admixed populations; analyses of genetic mosaicism and clonal hematopoiesis; and functional follow-up of genetic association studies.

Course format

The course will consist of nine sessions held from September to December of 2023. Sessions will be held on Wednesdays from 9:30 to 12:30 and will include a lecture (1.25 hours, including Q&A) and a 1.5-hour practical tutorial. (See schedule below for specific dates.) Participants are expected to complete background reading before each session (estimated out-of-class time: < 2 hours) and hands-on exercises after each session (estimated out-of-class time: < 2 hours). The course will be hybrid, with both in-person and online participants, and all lectures will be recorded and archived for future use. Practical tutorials will be held in person at the Shady Grove NCI campus.

Intended audience

Researchers and analysts with a strong quantitative background who are involved or anticipate being involved in analysis of large-scale genome-wide genotyping data. Participants should have basic knowledge of epidemiologic study designs, genetic concepts and terminology, and statistical methodologies (e.g., hypothesis testing, parameter estimation, regression models, Bayesian probability), as well as familiarity with R and command-line interfaces.

By the end of the course, participants will have gained a deep understanding of advanced statistical methods and computational tools for analyzing GWAS data, and will be able to apply these methods to their own research. They will also be familiar with best practices for data management and sharing in GWAS, and will be able to produce reproducible and FAIR-compliant pipelines.

Schedule

Please check here for the latest schedule.

Discussions

We invite everyone to utilize the GitHub discussion forum to pose questions, connect with peers, and discuss course-related topics.

Latest Posts

Statistical Genetics Workshop Announcement

DCEG Statistical Genetics Workshop schedule for fall 2023

Location: Rm 1106-A/B at the CRL Building, 9615 Medical Center Drive, Rockville, MD 20892, and online

Time: 9:30-12:30 ET