A Nested Machine- and Statistical-Learning Approach to Discover Causal Variants

TreeMap performs fine-mapping of causal variants for eQTL and GWAS studies. It searches for cis-acting co-regulatory variants in gene bodies and in 4 Mbp gene-flanking regions.
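The search region described above (gene body plus flanking sequence) can be illustrated with a small Python sketch. It assumes the 4 Mbp flank is applied on each side of the gene body and that coordinates are 1-based and inclusive. Both are assumptions, so check the TreeMap documentation for the exact window definition. `Gene`, `Variant`, `cis_window`, and `select_cis_variants` are hypothetical names, not part of TreeMap.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    chrom: str
    start: int  # gene body start (assumed 1-based, inclusive)
    end: int    # gene body end (assumed inclusive)


@dataclass
class Variant:
    vid: str
    chrom: str
    pos: int


def cis_window(gene, flank_bp=4_000_000):
    """Gene body extended by flank_bp on each side (assumption), clipped at 1."""
    return max(1, gene.start - flank_bp), gene.end + flank_bp


def select_cis_variants(gene, variants, flank_bp=4_000_000):
    """Keep variants on the gene's chromosome that fall inside the cis window."""
    lo, hi = cis_window(gene, flank_bp)
    return [v for v in variants if v.chrom == gene.chrom and lo <= v.pos <= hi]


# Hypothetical example: a 50 kb gene on chr1.
gene = Gene("chr1", 10_000_000, 10_050_000)
variants = [
    Variant("rs1", "chr1", 5_500_000),   # upstream of the window
    Variant("rs2", "chr1", 10_020_000),  # inside the gene body
    Variant("rs3", "chr1", 14_100_000),  # downstream of the window
    Variant("rs4", "chr2", 10_020_000),  # different chromosome
    Variant("rs5", "chr1", 6_000_000),   # exactly on the window boundary
]
selected = [v.vid for v in select_cis_variants(gene, variants)]
```

With these inputs the window is 6,000,000 to 14,050,000 on chr1, so only `rs2` and `rs5` are kept.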

  • Account for multi-site effects that are commonly observed in quantitative traits or complex phenotypes.
  • Account for linkage disequilibrium between causal variants and tagging variants.
  • Computationally efficient enough for genome-wide scanning of long-range eQTL.
  • Cluster-aware for parallel processing.
  • An R implementation of the TreeMap algorithm is available at https://github.com/liliulab/treemap.
The manuscript is currently under review at Bioinformatics.
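Linkage disequilibrium between a causal variant and its tagging variants is commonly quantified as r². The Python sketch below assumes unphased genotype dosages (0/1/2) and uses the squared Pearson correlation of dosages as an approximation to r². The 0.8 tagging threshold and the functions `ld_r2` and `tagging_variants` are hypothetical illustrations, not TreeMap's API.

```python
import numpy as np


def ld_r2(genotypes):
    """Pairwise LD (r^2) between variants.

    genotypes: (n_samples, n_variants) array of 0/1/2 allele dosages.
    Squared Pearson correlation of dosages approximates r^2 when phase is
    unknown. Monomorphic columns (zero variance) must be removed beforehand.
    """
    r = np.corrcoef(genotypes, rowvar=False)
    return r ** 2


def tagging_variants(r2, causal_idx, threshold=0.8):
    """Indices of other variants with r^2 >= threshold to variant causal_idx."""
    tags = np.flatnonzero(r2[causal_idx] >= threshold)
    return [int(i) for i in tags if i != causal_idx]


# Toy genotypes: 5 samples x 3 variants.
# Variant 1 is variant 0 with the alleles coded the other way (2 - dosage),
# so r = -1 but r^2 = 1; variant 2 is uncorrelated with variant 0.
G = np.array([[0, 2, 2],
              [1, 1, 0],
              [2, 0, 1],
              [1, 1, 2],
              [0, 2, 0]])
r2 = ld_r2(G)
tags = tagging_variants(r2, causal_idx=0)  # tags == [1]
```

Squaring the correlation makes the measure independent of which allele is coded as the reference, which is why variant 1 is flagged as a perfect tag of variant 0.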

ABSTRACT: Expression quantitative trait loci (eQTL) harbor genetic variants that modulate gene transcription. Fine mapping of regulatory variants at these loci is a daunting task due to the juxtaposition of causal and linked variants at a locus as well as the likelihood of interactions among multiple variants. This problem is exacerbated in genes with multiple cis-acting eQTL, where the superimposed effects of adjacent loci further distort the association signals. We developed a novel algorithm, TreeMap, that identifies putative causal variants in cis-eQTL while accounting for multi-site effects and genetic linkage at a locus. Guided by the hierarchical structure of linkage disequilibrium, TreeMap performs an organized search for individual and multiple causal variants. Via extensive simulations, we show that TreeMap detects co-regulating variants more accurately than current methods. Furthermore, its high computational efficiency enables genome-wide analysis of long-range eQTL. We applied TreeMap to GTEx data from brain hippocampus and transverse colon samples to search for eQTL in gene bodies and in 4 Mbp gene-flanking regions, discovering numerous distal eQTL. We also found concordant distal eQTL present in both brain and colon samples, implying long-range regulation of gene expression.
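The idea of an LD-guided hierarchical search can be illustrated with a toy Python sketch. This is not TreeMap's algorithm (the actual implementation is the R package linked above). It substitutes a simple, commonly used scheme: cluster variants by average linkage on the LD distance 1 − r², cut the tree at a height of 0.5, keep the variant in each cluster most strongly associated with expression, and then estimate the retained variants' effects jointly by least squares. The cut height, the representative rule, the simulated data, and all function names are assumptions. The sketch requires numpy and scipy.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def ld_tree_clusters(genotypes, cut=0.5):
    """Cluster variants by average linkage on the LD distance 1 - r^2."""
    r2 = np.corrcoef(genotypes, rowvar=False) ** 2
    dist = 1.0 - r2
    np.fill_diagonal(dist, 0.0)  # squareform expects a zero diagonal
    tree = linkage(squareform(dist, checks=False), method="average")
    return fcluster(tree, t=cut, criterion="distance")


def pick_representatives(genotypes, expr, labels):
    """Within each LD cluster, keep the variant with the largest |corr| to expression."""
    reps = []
    for lab in np.unique(labels):
        members = np.flatnonzero(labels == lab)
        scores = [abs(np.corrcoef(genotypes[:, j], expr)[0, 1]) for j in members]
        reps.append(int(members[int(np.argmax(scores))]))
    return reps


def joint_effects(genotypes, expr, reps):
    """Fit expr ~ intercept + selected variants jointly (multi-site model)."""
    X = np.column_stack([np.ones(len(expr)), genotypes[:, reps]])
    coef, *_ = np.linalg.lstsq(X, expr, rcond=None)
    return coef[1:]


# Simulated data (hypothetical): two independent LD blocks of three variants.
rng = np.random.default_rng(0)
n, p = 500, 0.3


def ld_block(n_variants):
    """One latent genotype plus noisy copies (10% of samples re-drawn)."""
    base = rng.binomial(2, p, n)
    cols = [base]
    for _ in range(n_variants - 1):
        redraw = rng.random(n) < 0.1
        cols.append(np.where(redraw, rng.binomial(2, p, n), base))
    return np.column_stack(cols)


G = np.hstack([ld_block(3), ld_block(3)])  # variants 0-2 and 3-5
y = 0.8 * G[:, 0] + 0.6 * G[:, 4] + rng.normal(0, 0.5, n)  # two co-regulating variants

labels = ld_tree_clusters(G)
reps = pick_representatives(G, y, labels)
betas = joint_effects(G, y, reps)
```

Cutting the LD tree first means that only one variant per block of tightly linked variants enters the joint model. This avoids collinear predictors while still letting two independent loci act together. TreeMap's actual tree construction, search order, and selection criteria may differ from this simplification.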