Abstract
Deciphering the impact of genetic variants on gene regulation is fundamental to understanding human disease. Although gene regulation often involves long-range interactions, it is unknown to what extent non-coding genetic variants influence distal molecular phenotypes. Here, we integrate chromatin profiling for three histone marks in lymphoblastoid cell lines (LCLs) from 75 sequenced individuals with LCL-specific Hi-C and ChIA-PET-based chromatin contact maps to uncover one of the largest collections of local and distal histone quantitative trait loci (hQTLs). Distal QTLs are enriched within topologically associated domains and exhibit largely concordant variation of chromatin state coordinated by proximal and distal non-coding genetic variants. Histone QTLs are enriched for common variants associated with autoimmune diseases and enable identification of putative target genes of disease-associated variants from genome-wide association studies. These analyses provide insights into how genetic variation can affect human disease phenotypes by coordinated changes in chromatin at interacting regulatory elements.
Visualization
DATA SOURCE
- Download URL for browser tracks: http://chromovar3d.stanford.edu/browserTracks/
   
Legend for QTL tracks: The QTL tracks contain all associations passing the p-value threshold corresponding to a 10% FDR. Each association is represented in the "interaction" format, that is, as a link between the QTL SNP (1 base) and the peak. The scores (colors of the links) represent the association effect sizes.
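To illustrate what a "p-value threshold corresponding to a 10% FDR" can mean in practice, here is a minimal sketch using the Benjamini-Hochberg procedure. This is an assumption made for illustration: the legend does not state which FDR procedure was used, and the function name and the toy p-values are hypothetical.

```python
def bh_pvalue_threshold(pvalues, fdr=0.10):
    """Return the largest p-value passing Benjamini-Hochberg at the given FDR.

    Returns None if no p-value passes. Benjamini-Hochberg is an assumed,
    illustrative choice; it is not confirmed to be the procedure behind the tracks.
    """
    ranked = sorted(pvalues)
    m = len(ranked)
    threshold = None
    for rank, p in enumerate(ranked, start=1):
        # The k-th smallest p-value passes if p <= fdr * k / m;
        # the cutoff is the largest p-value that passes.
        if p <= fdr * rank / m:
            threshold = p
    return threshold


# Toy p-values (hypothetical, not taken from the portal).
pvals = [0.001, 0.008, 0.039, 0.041, 0.30, 0.60, 0.74, 0.90, 0.95, 0.99]
cutoff = bh_pvalue_threshold(pvals, fdr=0.10)
significant = [p for p in pvals if cutoff is not None and p <= cutoff]
print(cutoff, significant)
```

Every association with a p-value at or below the cutoff would then be kept, which is how a single p-value threshold can stand in for an FDR level.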
Data overview
Our data are organized in the following directories:
- Data processing: Personal genomes, Histone marks, DNaseI-seq, RNAseq, HiC, ChIA-PET
- QTL analysis: QTLs
- Other analyses: Motif analysis/TF binding, GWAS
- Raw data: GEO
- Code: Code
Quick start
The most common use of this portal is to work with the results of the QTL analysis. For that, you only need to download the QTLs directory at http://chromovar3d.stanford.edu/QTLs/
http://chromovar3d.stanford.edu/QTLs/ has the following contents (see READMEs in each of these for more detailed descriptions of the files):
Genomic regions
- QTLs/peak_info - Files containing information about the peaks used in this study. Specifically, these files assign peak IDs to the peaks, and these peak IDs are referenced in the localQTL and distalQTL analyses. These peaks and genes represent the rows in the data matrices below. The peak IDs follow the order of peaks (or genes) in the data matrices from the subdirectory uncorrectedSignal. There are 2 subdirectories:
- QTLs/peak_info/total_peaks - All genomic regions (peaks or genes) for which we have computed data matrices for QTL calling. These data matrices are in QTLs/results/uncorrectedSignal/. Each peak is associated with an ID that is referenced in our analyses below (for instance, the row names of the PEER-corrected data matrices in QTLs/results/correctedSignal/ are the peak IDs).
- QTLs/peak_info/tested_peaks - Genomic regions (peaks or genes) that were used in the QTL calling. They are fewer than the total regions because we only tested regions that passed a filter on peak variability (see Methods). We provide raw signal data matrices for the complete set in total_peaks, so you can start from these and run your own analysis if you prefer not to impose the variability filter (our filter on variability might not be the best filter).
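If you start from the complete set of regions and want to apply your own variability filter, a minimal sketch could look like the following. It uses the coefficient of variation with an arbitrary cutoff, which is an assumption chosen for illustration and not the filter described in Methods; the function name, the `min_cv` parameter, and the toy values are hypothetical.

```python
import statistics


def filter_by_variability(matrix, min_cv=0.2):
    """Keep regions whose coefficient of variation (stdev / mean) is >= min_cv.

    matrix maps a region ID to its per-individual signal values.
    This is an illustrative filter, not the one used for tested_peaks.
    """
    kept = {}
    for region_id, values in matrix.items():
        mean = statistics.fmean(values)
        if mean <= 0:
            # The coefficient of variation is undefined or meaningless here.
            continue
        cv = statistics.stdev(values) / mean
        if cv >= min_cv:
            kept[region_id] = values
    return kept


# Toy signal values for four individuals (hypothetical).
toy = {
    "peak_1": [10.0, 10.5, 9.8, 10.1],  # nearly constant, filtered out
    "peak_2": [2.0, 8.0, 5.0, 11.0],    # variable, kept
    "peak_3": [0.0, 0.0, 0.0, 0.0],     # no signal, filtered out
}
variable = filter_by_variability(toy, min_cv=0.2)
print(sorted(variable))
```

Any other measure of variability (for example variance after normalization) can be substituted in the same place.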
Data matrices
- QTLs/uncorrectedSignal - the initial data matrices, obtained by computing the mean signal inside histone mark peaks or DNaseI (dhs) peaks, or from gene expression estimates.
- QTLs/correctedSignal - the same matrices as in uncorrectedSignal, but after removing confounding factors using PEER. For the histone marks, the data matrices in uncorrectedSignal and correctedSignal have the same dimensions. For dhs, the data matrix in uncorrectedSignal contains the signal in all 700k merged dhs peaks we called, whereas the data matrix in correctedSignal only contains signal from the top 200k peaks (ranked by signal intensity). For RNA, the data matrix in uncorrectedSignal contains signal in all protein-coding and lncRNA genes used in this study, whereas the data matrix in correctedSignal only contains signal for those genes that did not have 0 entries in the uncorrected matrix.
- To get the coordinates of the genomic regions in the correctedSignal matrices, use the first column, which is the peak/gene ID, as referenced in peak_info/total_peaks/
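The ID lookup described above can be sketched as follows. The file layouts here are assumptions for illustration only: a tab-separated peak_info-style file with columns chrom, start, end, ID, and a tab-separated matrix with a header row and the ID in the first column. Check the READMEs for the actual layouts; the function names and toy data are hypothetical.

```python
import csv
import io


def load_region_coordinates(handle):
    """Map region ID -> (chrom, start, end).

    Assumed (hypothetical) column order: chrom, start, end, ID.
    """
    coords = {}
    for chrom, start, end, region_id in csv.reader(handle, delimiter="\t"):
        coords[region_id] = (chrom, int(start), int(end))
    return coords


def attach_coordinates(matrix_handle, coords):
    """Return (sample names, rows), each row being (ID, coordinates, values).

    Assumes a header line and the region ID in the first column.
    Coordinates are None for IDs missing from the lookup.
    """
    reader = csv.reader(matrix_handle, delimiter="\t")
    samples = next(reader)[1:]
    rows = []
    for row in reader:
        region_id = row[0]
        values = [float(v) for v in row[1:]]
        rows.append((region_id, coords.get(region_id), values))
    return samples, rows


# Toy stand-ins for a peak_info file and a correctedSignal matrix (hypothetical).
peak_info = io.StringIO("chr1\t1000\t1500\tpeak_1\nchr2\t200\t900\tpeak_2\n")
corrected = io.StringIO("id\tind_A\tind_B\npeak_2\t0.31\t-0.12\npeak_1\t-0.05\t0.44\n")

coords = load_region_coordinates(peak_info)
samples, rows = attach_coordinates(corrected, coords)
for region_id, location, values in rows:
    print(region_id, location, values)
```

Looking IDs up in a dictionary, rather than relying on row order, keeps the join correct for matrices such as the dhs and RNA ones, which contain only a subset of the total regions.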
QTLs
- QTLs/localQTL - Bedpe files with the results from the local QTL analysis
- QTLs/distalQTL - Bedpe files with the results from the distal QTL analysis
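As a starting point for reading these files, here is a minimal sketch of a parser for the first eight standard BEDPE columns (chrom1, start1, end1, chrom2, start2, end2, name, score). Which anchor holds the SNP and which holds the peak, and what the score column means in these particular files, are not assumed here; see the READMEs in each directory. The class name, function names, and toy records are hypothetical.

```python
import csv
import io
from typing import NamedTuple


class Association(NamedTuple):
    chrom1: str
    start1: int
    end1: int
    chrom2: str
    start2: int
    end2: int
    name: str
    score: float


def read_bedpe(handle):
    """Parse the first eight standard BEDPE columns; extra columns are ignored."""
    records = []
    for fields in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment/track header lines.
        if not fields or fields[0].startswith(("#", "track", "browser")):
            continue
        records.append(Association(
            fields[0], int(fields[1]), int(fields[2]),
            fields[3], int(fields[4]), int(fields[5]),
            fields[6], float(fields[7]),
        ))
    return records


def anchor_distance(assoc):
    """Distance in bp between anchor midpoints, or None across chromosomes."""
    if assoc.chrom1 != assoc.chrom2:
        return None
    mid1 = (assoc.start1 + assoc.end1) // 2
    mid2 = (assoc.start2 + assoc.end2) // 2
    return abs(mid2 - mid1)


# Toy records (hypothetical, not taken from the portal).
toy = io.StringIO(
    "chr1\t1000\t1001\tchr1\t51000\t52000\tassoc_1\t0.8\n"
    "chr1\t2000\t2001\tchr1\t2500\t3500\tassoc_2\t-0.3\n"
)
associations = read_bedpe(toy)
distances = [anchor_distance(a) for a in associations]
print(distances)
```

The distance between the two anchors is one simple way to check how far apart the paired regions are in a given file.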