Last updated: 2021-06-06

Checks: 7 0

Knit directory: rss-net/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20190823) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version fc71df0. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    examples/ibd2015_nkcell/ibd2015_Primary_Natural_Killer_cells_from_peripheral_blood_NK_out_66.mat
    Ignored:    examples/ibd2015_nkcell/randinit_results/
    Ignored:    examples/ibd2015_nkcell/results/
    Ignored:    examples/wtccc_bcell/results/

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (rmd/ibd2015_nkcell.Rmd) and HTML (docs/ibd2015_nkcell.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd fc71df0 xiangzhu 2021-06-06 wflow_publish(“rmd/ibd2015_nkcell.Rmd”)
html 771dcad xiangzhu 2021-06-06 Build site.
Rmd a229ad9 xiangzhu 2021-06-06 wflow_publish(“rmd/ibd2015_nkcell.Rmd”)
html a3be5c3 xiangzhu 2021-06-06 Build site.
Rmd afe8479 xiangzhu 2021-06-06 wflow_publish(“rmd/ibd2015_nkcell.Rmd”)
html efd45b7 xiangzhu 2021-06-06 Build site.
html ce99060 xiangzhu 2020-12-02 Build site.
html a7ed39f xiangzhu 2020-11-30 Build site.
html 93ce892 xiangzhu 2020-06-29 Build site.
html 76bc401 xiangzhu 2020-03-15 Build site.
Rmd a6069aa xiangzhu 2020-03-15 wflow_publish(“rmd/ibd2015_nkcell.Rmd”)
html 4f31597 xiangzhu 2020-03-09 Build site.
Rmd 820a508 xiangzhu 2020-03-09 wflow_publish(“rmd/ibd2015_nkcell.Rmd”)
html a424171 xiangzhu 2020-03-08 Build site.
html 5c5e8e1 xiangzhu 2020-03-08 Build site.
html 3af1c6c xiangzhu 2020-03-06 Build site.
Rmd 9dbe6b7 xiangzhu 2020-03-06 wflow_publish(“rmd/ibd2015_nkcell.Rmd”)
html 8d17f94 xiangzhu 2020-03-06 Build site.
Rmd 2d37013 xiangzhu 2020-03-06 wflow_publish(“rmd/ibd2015_nkcell.Rmd”)
html 7e923a9 xiangzhu 2020-03-06 Build site.
Rmd 6bf894e xiangzhu 2020-03-06 wflow_publish(“rmd/ibd2015_nkcell.Rmd”)
html 36d2ce3 xiangzhu 2020-03-06 Build site.
Rmd bcc6e4a xiangzhu 2020-03-06 wflow_publish(“rmd/ibd2015_nkcell.Rmd”)
html ada5d78 xiangzhu 2020-03-05 Build site.
Rmd 1753ab5 xiangzhu 2020-03-05 wflow_publish(“rmd/ibd2015_nkcell.Rmd”)
html 4310e9f xiangzhu 2020-03-05 Build site.
Rmd 793e39e xiangzhu 2020-03-05 wflow_publish(“rmd/ibd2015_nkcell.Rmd”)
html 269dbf1 xiangzhu 2020-03-05 Build site.
Rmd dccb92c xiangzhu 2020-03-05 wflow_publish(“rmd/ibd2015_nkcell.Rmd”)
html 34ece5f xiangzhu 2020-03-05 Build site.
Rmd de5a209 xiangzhu 2020-03-05 wflow_publish(“rmd/ibd2015_nkcell.Rmd”)
html 19bd6c5 xiangzhu 2020-03-05 Build site.
Rmd b464c14 xiangzhu 2020-03-05 wflow_publish(“rmd/ibd2015_nkcell.Rmd”)
html 2a47740 xiangzhu 2020-03-04 Build site.
Rmd ea4499b xiangzhu 2020-03-04 wflow_publish(“rmd/ibd2015_nkcell.Rmd”)
html e454742 xiangzhu 2020-03-04 Build site.
Rmd 08f4c53 xiangzhu 2020-03-04 wflow_publish(“rmd/ibd2015_nkcell.Rmd”)
html 72d8e78 xiangzhu 2020-03-04 Build site.
Rmd 6b97ae5 xiangzhu 2020-03-04 wflow_publish(“rmd/ibd2015_nkcell.Rmd”)
html 0e9e3a2 xiangzhu 2020-03-03 Build site.
Rmd eb9c073 xiangzhu 2020-03-03 wflow_publish(“rmd/ibd2015_nkcell.Rmd”)
Rmd 6002e97 Xiang Zhu 2020-03-03 update ibd2015 nkcell example
html 6002e97 Xiang Zhu 2020-03-03 update ibd2015 nkcell example
html 5974d28 xiangzhu 2020-03-03 Build site.
Rmd 4a13f45 xiangzhu 2020-03-03 wflow_publish(“rmd/ibd2015_nkcell.Rmd”)
html bf742e0 xiangzhu 2020-03-03 Build site.
Rmd a16d453 xiangzhu 2020-03-03 wflow_publish(“rmd/ibd2015_nkcell.Rmd”)
html 785a423 xiangzhu 2020-03-01 Build site.
Rmd bda7382 xiangzhu 2020-03-01 wflow_publish(“rmd/ibd2015_nkcell.Rmd”)
html 830b581 xiangzhu 2020-03-01 Build site.
Rmd a5b592e xiangzhu 2020-03-01 wflow_publish(“rmd/ibd2015_nkcell.Rmd”)
html c3ca100 xiangzhu 2020-03-01 Build site.
Rmd 2e5247b xiangzhu 2020-03-01 wflow_publish(“rmd/ibd2015_nkcell.Rmd”)

Overview

Here we describe an end-to-end RSS-NET analysis of inflammatory bowel disease (IBD) GWAS summary statistics (Liu et al, 2015) and a gene regulatory network inferred for natural killer (NK) cells (ENCODE Project Consortium, 2012). This example illustrates the actual data analyses performed in Zhu et al (2021).

To reproduce results of this example, please use scripts in the directory ibd2015_nkcell/, and follow the step-by-step guide below. Before running any script in ibd2015_nkcell/, please install RSS-NET.

Since a real genome-wide analysis is conducted here, this example is more complicated than the previous simulation example. It is advisable to go through the previous simulation example before diving into the present real-world example.

With the software installed and the input data downloaded, one should be able to run this example by simply typing the following line in shell:

$ sbatch ibd2015_nkcell.sbatch

If a different directory is used to store the software, input data and/or output data, please modify file paths in the given scripts accordingly.

Step-by-step illustration

Download data files

All data files required to run this example are freely available at Zenodo DOI. Please contact me (xiangzhu[at]psu.edu) if you have any trouble accessing these files. After a complete download, you should see the following files.

$ tree . 
.
├── ibd2015_gene_grch37.mat
├── ibd2015_nkcell_data.md5
├── ibd2015_nkcell_full_results.zip
├── ibd2015_nkcell_summary_results.zip
├── ibd2015_NK_snp2gene_cis.mat
├── ibd2015_null_seed_459_squarem_step2.mat
├── ibd2015_Primary_Natural_Killer_cells_from_peripheral_blood_snp2net.mat
├── ibd2015_snp2gene.mat
├── ibd2015_sumstat.mat
└── Primary_Natural_Killer_cells_from_peripheral_blood_gene2gene.mat

0 directories, 10 files

To help readers confirm if they use the same files as we do, we report 128-bit MD5 hashes of all files in ibd2015_nkcell_data.md5.

To help readers confirm if they can reproduce results of this example, we also provide the full results (ibd2015_nkcell_full_results.zip) and summarized results (ibd2015_nkcell_summary_results.zip) in the same Zenodo deposit.

For simplicity and generality, we introduce the following short-hand notations.

gwas = 'ibd2015';
net  = 'Primary_Natural_Killer_cells_from_peripheral_blood'; 
cis  = 'NK';

1. ${gwas}_sumstat.mat: processed GWAS summary statistics and LD matrix estimates

This file contains processed IBD GWAS summary statistics and LD matrix estimates for 1.1 million common SNPs. This file is large (43G) because of the LD matrix .

$ md5sum ibd2015_sumstat.mat
ad1763079ee7e46b21722f74e037a230  ibd2015_sumstat.mat

$ du -sh ibd2015_sumstat.mat                                             
43G ibd2015_sumstat.mat

Let’s look at the contents of ibd2015_sumstat.mat.

>> sumstat = matfile('ibd2015_sumstat.mat');
>> sumstat
  matlab.io.MatFile
  Properties:
      Properties.Source: 'ibd2015_sumstat.mat'
    Properties.Writable: false                                                          
                     BR: [22x1 cell]                                                    
                      R: [22x1 cell]                                                    
                  SiRiS: [22x1 cell]                                                    
                betahat: [22x1 cell]                                                    
                    chr: [22x1 cell]                                                    
                    pos: [22x1 cell]                                                    
                     se: [22x1 cell]

GWAS summary statistics and LD estimates are stored as cell arrays. RSS-NET only uses the following variables:

  • betahat{j,1}, single-SNP effect size estimates of all SNPs on chromosome j;
  • se{j,1}, standard errors of betahat{j, 1};
  • chr{j,1} and pos{j, 1}, physical positions of these SNPs (GRCh37 build);
  • SiRiS{j,1}, a sparse matrix, defined as repmat((1./se),1,p) .* R .* repmat((1./se)',p,1), where R is the estimated LD matrix of these p SNPs.

2. ${gwas}_snp2gene.mat: physical distance between SNPs and genes

This file contains the physical distance between each GWAS SNP and each protein-coding gene, within 1 Mb. This file corresponds to \({\bf G}_j\) in the RSS-NET model.

$ md5sum ibd2015_snp2gene.mat 
7832838e2675e4cf3b85f471fed95554  ibd2015_snp2gene.mat

$ du -sh ibd2015_snp2gene.mat 
224M    ibd2015_snp2gene.mat

In this example, there are 18334 genes and 1081481 SNPs.

>> snp2gene = matfile('ibd2015_snp2gene.mat');
>> snp2gene                                   
  matlab.io.MatFile
  Properties:
      Properties.Source: 'ibd2015_snp2gene.mat'
    Properties.Writable: false                                                                
                    chr: [1081481x1  int32]                                                   
                  colid: [14126805x1 int32]                                                   
                numgene: [1x1        int32]                                                   
                 numsnp: [1x1        int32]                                                   
                    pos: [1081481x1  int32]                                                   
                  rowid: [14126805x1 int32]                                                   
                    val: [14126805x1 double]

>> [snp2gene.numgene snp2gene.numsnp]                                                              
     18334   1081481

The SNP-to-gene distance information is captured by a three-column matrix [colid rowid val]. For example, the distance between gene 1 and SNP 6 is 978947 bps.

>> colid=snp2gene.colid; rowid=snp2gene.rowid; val=snp2gene.val;
>> [colid(6) rowid(6) val(6)]
        1        6   978947

3. ${net}_gene2gene.mat: gene regulatory network

This file contains information of gene-to-gene connections in a given regulatory network.

$ md5sum Primary_Natural_Killer_cells_from_peripheral_blood_gene2gene.mat 
35ac724b86f7777d87116cc48166caa2  Primary_Natural_Killer_cells_from_peripheral_blood_gene2gene.mat

$ du -sh Primary_Natural_Killer_cells_from_peripheral_blood_gene2gene.mat
1.7M    Primary_Natural_Killer_cells_from_peripheral_blood_gene2gene.mat
>> gene2gene = matfile('Primary_Natural_Killer_cells_from_peripheral_blood_gene2gene.mat');
>> gene2gene
  matlab.io.MatFile
  Properties:
      Properties.Source: 'Primary_Natural_Killer_cells_from_peripheral_blood_gene2gene.mat'
    Properties.Writable: false                                                                                                            
                  colid: [110733x1 int32]                                                                                                 
                numgene: [1x1      int32]                                                                                                 
                  rowid: [110733x1 int32]                                                                                                 
                    val: [110733x1 double]

For implementation convenience, this file contains the trivial case where each gene is mapped to itself with val=1.

>> colid=gene2gene.colid; rowid=gene2gene.rowid; val=gene2gene.val;             
>> [gene2gene.numgene sum(colid==rowid) unique(val(colid==rowid))]
   18334   18334       1

For a given network, transcription factors (TFs) are stored in rowid and target genes (TGs) are stored in colid. In this example there are 3105 TGs and 376 TFs. Among these TFs and TGs, there are 92399 edges. The edge weights range from 0.61 to 1. These TF-to-TG connections and edge weights correspond to \(\{{\bf T}_g,v_{gt}\}\) in the RSS-NET model.

>> [length(unique(colid(colid ~= rowid))) length(unique(rowid(colid ~= rowid)))]
        3105         376
>> [length(colid(colid ~= rowid)) length(rowid(colid ~= rowid))]
       92399       92399

>> val_tftg = val(colid ~= rowid);
>> [min(val_tftg) quantile(val_tftg, 0.25) median(val_tftg) quantile(val_tftg, 0.75) max(val_tftg)]
    0.6138    0.6324    0.6568    0.6949    1.0000       

4. ${gwas}_${net}_snp2net.mat: SNP-to-network proximity annotation

This file contains binary annotations whether a SNP is “near” the given network, that is, within 100 kb of any network element (TF, TG or associated regulatory elements).

$ md5sum ibd2015_Primary_Natural_Killer_cells_from_peripheral_blood_snp2net.mat
d96cd9b32759f954cc37680dc6aeafd8  ibd2015_Primary_Natural_Killer_cells_from_peripheral_blood_snp2net.mat

$ du -sh ibd2015_Primary_Natural_Killer_cells_from_peripheral_blood_snp2net.mat
21M ibd2015_Primary_Natural_Killer_cells_from_peripheral_blood_snp2net.mat

In this example, there are 1081481 GWAS SNPs and 382443 of them are near the NK cell network (i.e. val=1).

>> snp2net = matfile('ibd2015_Primary_Natural_Killer_cells_from_peripheral_blood_snp2net.mat');
>> snp2net
  matlab.io.MatFile
  Properties:
      Properties.Source: 'ibd2015_Primary_Natural_Killer_cells_from_peripheral_blood_snp2net.mat'
    Properties.Writable: false
                    chr: [1081481x1 int32]
                    pos: [1081481x1 int32]
                  snpid: [1081481x1 int32]
                    val: [1081481x1 double]
                 window: [1x1       double]
                 
>> [length(snp2net.val) sum(snp2net.val) snp2net.window]
     1081481      382443      100000
>> unique(snp2net.val)'
     0     1     

5. ${gwas}_${cis}_snp2gene_cis.mat: SNP-to-gene cis regulation

This file contains the SNP-to-gene cis regulation scores derived from context-matching cis eQTL studies. This file corresponds to \((c_{jg}-1)\) in the RSS-NET model.

$ md5sum ibd2015_NK_snp2gene_cis.mat
dedad8e25773fad69576dbce0f7d9f93  ibd2015_NK_snp2gene_cis.mat

$ du -sh ibd2015_NK_snp2gene_cis.mat
165M    ibd2015_NK_snp2gene_cis.mat
>> snp2gene_cis = matfile('ibd2015_NK_snp2gene_cis.mat');
>> snp2gene_cis
  matlab.io.MatFile
  Properties:
      Properties.Source: 'ibd2015_NK_snp2gene_cis.mat'
    Properties.Writable: false
                  colid: [10790012x1 int32]
                numgene: [1x1        int32]
                 numsnp: [1x1        int32]
                  rowid: [10790012x1 int32]
                    val: [10790012x1 double]

In this example, there are 10790012 SNP-gene pairs with cis regulation scores available, consisting of 829280 SNPs and 18230 genes. The cis regulation scores (val) range from 0 to 0.76. The cis regulation scores used in this example are derived from recently published cis eQTLs in NK cells (Schmiedel et al, 2018).

>> [snp2gene_cis.numsnp length(unique(snp2gene_cis.rowid)) snp2gene_cis.numgene length(unique(snp2gene_cis.colid))]
   1081481    829280     18334     18230
   
>> val=snp2gene_cis.val;
>> [min(val) quantile(val, 0.25) median(val) quantile(val, 0.75) max(val)]
         0    0.0226    0.0636    0.1172    0.7622   

Run RSS-NET analysis

To facilitate running RSS-NET on real data, we provide a generic script analysis_template.m. For the present example, the RSS-NET analysis is implemented by ibd2015_nkcell.m and ibd2015_nkcell.sbatch.

1. Specify analysis-specific variables

We need to specify a few analysis-specific variables that are required by analysis_template.m, a template script that fits RSS-NET to the given data. For this example, we use ibd2015_nkcell.m for the specification. In brief, the following variables are specified.

  • Data names: gwas_name, net_name, cis_name;
  • GWAS sample size and number of genes: nsam, ngene;
  • Hyper-parameter grid: eta_set, rho_set, theta0_set, theta_set.

In general, if you want to use RSS-NET to analyze a different GWAS and/or network, simply modify ibd2015_nkcell.m and there is no need to change analysis_template.m.

2. Submit job arrays

For a given GWAS and a given regulatory network, all RSS-NET analysis tasks are almost identical and they only differs in hyper-parameter values. To exploit this feature, we run one RSS-NET analysis as a job array with multiple tasks that run in parallel.

To this end, we write a simple sbatch script ibd2015_nkcell.sbatch, and submit it to a cluster with Slurm available.

$ sbatch ibd2015_nkcell.sbatch

After the submission, 125 tasks are created. As shown below, multiple tasks should run in different nodes simultaneously.

$ squeue -u xiangzhu
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
      62554249_107    owners ibd2015_ xiangzhu  R       0:32      1 sh02-17n12
      62554249_108    owners ibd2015_ xiangzhu  R       0:32      1 sh02-17n12
      62554249_109    owners ibd2015_ xiangzhu  R       0:32      1 sh01-28n08
      62554249_110    owners ibd2015_ xiangzhu  R       0:32      1 sh01-17n18
      62554249_111    owners ibd2015_ xiangzhu  R       0:32      1 sh01-26n33
      62554249_112    owners ibd2015_ xiangzhu  R       0:32      1 sh01-27n30

For each task of this job array, we request 1 node with 8 CPUs and 32 Gb total memory and set the maximum job wall-clock time as 12.5 hours (see ibd2015_nkcell.sbatch for details). The actual memory utilized per task is 26.58 GB (efficiency: 85.06% of 31.25 GB). Across all 125 tasks, the actual running time per task ranges from 5 minutes to 8.9 hours, with median being 2.1 hours.

We request 8 CPUs for each task because RSS-NET takes advantage of parfor in the MATLAB Parallel Computing Toolbox. If this toolbox is not available in your environment, you can still run the same RSS-NET codes on this example (in a serial manner), with longer computation time per task.

Each task of the job array saves results in a Version 7 MAT-file, ${gwas}_${net}_${cis}_out_${id}.mat. Each MAT-file contains variational estimates for a given set of hyper-parameter values. For example, the following MAT-file ibd2015_Primary_Natural_Killer_cells_from_peripheral_blood_NK_out_66.mat stores RSS-NET results based on the 66-th set of hyper-parameter values from the grid.

>> res = matfile('ibd2015_Primary_Natural_Killer_cells_from_peripheral_blood_NK_out_66.mat');
>> res
  matlab.io.MatFile
  Properties:
      Properties.Source: 'ibd2015_Primary_Natural_Killer_cells_from_peripheral_blood_NK_out_66.mat'
    Properties.Writable: false                                                                                                                            
                  alpha: [1081481x1 double]                                                                                                               
                   logw: [1x1       double]                                                                                                               
                     mu: [1081481x1 double]                                                                                                               
               run_time: [1x1       double]                                                                                                               
                      s: [1081481x1 double]                                                                                                               
                   sigb: [1x1       double]                                                                                                               
                   sige: [1x1       double]                                                                                                               
                  theta: [1x1       double]                                                                                                               
                 theta0: [1x1       double]

Here [alpha mu s] correspond to the optimal variational parameters \(\{\alpha_j^\star,\mu_j^\star,(\tau_j^\star)^2\}\) for the given hyper-parameters, logw corresponds to the variational lower bound \(F^\star\) and [theta0 theta sigb sige] corresponds to \(\{\theta_0,\theta,\sigma_0,\sigma\}\). Please see Supplementary Information of Zhu et al (2021) for definitions.

Summarize RSS-NET results

The RSS-NET result files ${gwas}_${net}_${cis}_out_${id}.mat shown above can be further summarized into both network-level and gene-level statistics as reported in Zhu et al (2021). To facilitate summarizing RSS-NET results, we provide a generic script summary_template.m. For the present example, the RSS-NET summary is implemented by summarize_ibd2015_nkcell.m. If you need to summarize a different RSS-NET analysis, simply modify summarize_ibd2015_nkcell.m and there is no need to change summary_template.m.

For this example, simply run the following line in a Matlab console.

>> run summarize_ibd2015_nkcell.m;

Running summarize_ibd2015_nkcell.m yields two (much smaller) MAT-files: ${gwas}_${net}_${cis}_results_model.mat that stores network-level enrichment results, and ${gwas}_${net}_${cis}_results_gene.mat that stores locus-level association results.

1. ${gwas}_${net}_${cis}_results_model.mat: network-level enrichments

To assess whether a regulatory network is enriched for genetic associations with a trait, we evaluate a Bayes factor (BF) comparing the baseline model (\(M_0:\theta=0~\text{and}~\sigma^2=0\)) in RSS-NET with an enrichment model.

>> model_res = matfile('ibd2015_Primary_Natural_Killer_cells_from_peripheral_blood_NK_results_modet');
>> model_res
  matlab.io.MatFile
  Properties:
      Properties.Source: 'ibd2015_Primary_Natural_Killer_cells_from_peripheral_blood_NK_results_model.mat'
    Properties.Writable: false                                                                                                                                                    
               log10_bf: [1x1   double]                                                                                                                                           
            log10_bf_ns: [1x1   double]                                                                                                                                           
            log10_bf_nt: [1x1   double]                                                                                                                                           
            log10_bf_ts: [1x1   double]                                                                                                                                           
                   logw: [125x1 double]                                                                                                                                           
                   sigb: [125x1 double]                                                                                                                                           
                   sige: [125x1 double]                                                                                                                                           
                  theta: [125x1 double]                                                                                                                                           
                 theta0: [125x1 double]                                                                                                                                           
                   time: [125x1 double]

Here log10_bf* are log 10 BFs comparing the following 4 enrichment models against \(M_0\).

  • log10_bf: \(M_1:\theta>0~\text{or}~\sigma^2>0\);
  • log10_bf_ns: \(M_{11}:\theta>0~\text{and}~\sigma^2=0\);
  • log10_bf_nt: \(M_{12}:\theta=0~\text{and}~\sigma^2>0\);
  • log10_bf_ts: \(M_{13}:\theta>0~\text{and}~\sigma^2>0\).

Because \(M_1\) is more flexible than other models, we mainly use log10_bf as recommended by Zhu et al (2021).

By running this example, we reproduce the enrichment BFs of IBD GWAS and NK cell network reported in Zhu et al (2021). The NK cell network shows strong enrichment of IBD genetic associations, which seems consistent with the role of NK cell in autoimmune diseases like IBD.

>> load ibd2015_Primary_Natural_Killer_cells_from_peripheral_blood_NK_results_model.mat
>> [log10_bf log10_bf_ns log10_bf_nt log10_bf_ts]
   35.7048   29.4216   15.1461   35.8986

Let’s perform a more rigorous check of reproducibility. For the same hyper-parameter values, we compare the resulting variational lower bounds from my previous run and from the current run. Differences are numerical negligible.

>> res_file = 'ibd2015_Primary_Natural_Killer_cells_from_peripheral_blood_NK_results_model.mat';               
>> old_path = '~/Dropbox/rss/Data/peca_human/job_camp/gwas33_net71/results/model_results/peca_encode/ibd2015/';
>> new_path = strcat(src_path,'rss-net/examples/ibd2015_nkcell/results/');                                             
>> old_res = matfile(strcat(old_path,res_file));                                                               
>> new_res = matfile(strcat(new_path,res_file));

>> [min(old_res.logw - new_res.logw) median(old_res.logw - new_res.logw) max(old_res.logw - new_res.logw)]
   1.0e-10 *
   -0.0568         0    0.1592

2. ${gwas}_${net}_${cis}_results_gene.mat: locus-level associations

To summarize association between a locus and a trait, we compute \(P_1^{\sf net}\), the posterior probability that at least one SNP \(j\) in the locus is associated with the trait (\(\beta_j\neq 0\)): \[ P_1^{\sf net}=1-\Pr(\beta_j=0,~\forall j\in\text{locus}~|~\text{data},M_1). \] As in Zhu et al (2021), a locus is defined as the transcribed region of a gene plus 100 kb upstream and downstream. The locus definition is provided in ibd2015_gene_grch37.mat.

>> gene_res=matfile('ibd2015_Primary_Natural_Killer_cells_from_peripheral_blood_NK_results_gene.mat');
>> gene_res
  matlab.io.MatFile
  Properties:
      Properties.Source: 'ibd2015_Primary_Natural_Killer_cells_from_peripheral_blood_NK_results_gene.mat'
    Properties.Writable: false                                                                                                                                                   
                P1_gene: [18334x1 double]                                                                                                                                        
               gene_chr: [18334x1 double]                                                                                                                                        
             gene_start: [18334x1 double]                                                                                                                                        
              gene_stop: [18334x1 double]

Here P1_gene corresponds to \(P_1^{\sf net}\) and [gene_chr gene_start gene_stop] denote physical position of genes based on GRCh37.

Let’s perform a reproducibility check for gene-level results. Again, my previous analysis and the current analysis yield numerically identical answers.

>> res_file = 'ibd2015_Primary_Natural_Killer_cells_from_peripheral_blood_NK_results_gene.mat'; 
>> old_path = '~/Dropbox/rss/Data/peca_human/job_camp/gwas33_net71/results/gene_results/peca_encode/ibd2015/'; 
>> new_path = strcat(src_path,'rss-net/examples/ibd2015_nkcell/results/');
>> old_res = matfile(strcat(old_path,res_file));                                                              
>> new_res = matfile(strcat(new_path,res_file));

>> [min(old_res.P1_gene - new_res.P1_gene) median(old_res.P1_gene - new_res.P1_gene) max(old_res.P1_gene - new_res.P1_gene)]                  
   1.0e-15 *
   -0.2220    0.0278    0.5551

More examples

The RSS-NET analyses of 18 complex traits and 38 gene regulatory networks reported in Zhu et al (2021) are essentially 684 modified reruns of the example above (with different input GWAS and/or network data, and different hyper-parameter grids). Our full analysis results are publicly available at https://suwonglab.github.io/rss-net/results.html.

Appendix

Careful readers may notice an optional input data file ibd2015_null_seed_459_squarem_step2.mat specified in ibd2015_nkcell.m and used in analysis_template.m. This file provides the RSS-E baseline model fitting results of IBD GWAS data. If this file is available, analysis_template.m uses the optimal RSS-E baseline model results to initialize RSS-NET. If not, RSS-NET uses a random initialization.

For this example, here are the enrichment BFs based on using the optimal RSS-E baseline model results to initialize RSS-NET.

>> [log10_bf log10_bf_ns log10_bf_nt log10_bf_ts]
   35.7048   29.4216   15.1461   35.8986

Here are the enrichment BFs based on random initialization, which are consistent with, but smaller than previous results.

>> [log10_bf log10_bf_ns log10_bf_nt log10_bf_ts]
   32.1495   29.5295   11.0758   32.3432

Using optimal RSS-E baseline model results to initialize RSS-NET is not required, but highly recommended in practice, because this often yields a better fit as shown above. Please see this tutorial for more details of fitting RSS-E baseline model on GWAS data.


devtools::session_info()
─ Session info ───────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 4.1.0 (2021-05-18)
 os       Ubuntu 20.04.2 LTS          
 system   x86_64, linux-gnu           
 ui       X11                         
 language (EN)                        
 collate  en_US.UTF-8                 
 ctype    en_US.UTF-8                 
 tz       America/New_York            
 date     2021-06-06                  

─ Packages ───────────────────────────────────────────────────────────────────
 package     * version date       lib source        
 bslib         0.2.5.1 2021-05-18 [3] CRAN (R 4.1.0)
 cachem        1.0.5   2021-05-15 [3] CRAN (R 4.0.5)
 callr         3.7.0   2021-04-20 [3] CRAN (R 4.0.5)
 cli           2.5.0   2021-04-26 [3] CRAN (R 4.0.5)
 crayon        1.4.1   2021-02-08 [3] CRAN (R 4.0.3)
 desc          1.3.0   2021-03-05 [3] CRAN (R 4.0.4)
 devtools      2.4.1   2021-05-05 [3] CRAN (R 4.0.5)
 digest        0.6.27  2020-10-24 [3] CRAN (R 4.0.3)
 ellipsis      0.3.2   2021-04-29 [3] CRAN (R 4.0.5)
 evaluate      0.14    2019-05-28 [3] CRAN (R 4.0.0)
 fansi         0.5.0   2021-05-25 [3] CRAN (R 4.1.0)
 fastmap       1.1.0   2021-01-25 [3] CRAN (R 4.0.3)
 fs            1.5.0   2020-07-31 [3] CRAN (R 4.0.2)
 git2r         0.28.0  2021-01-10 [1] CRAN (R 4.1.0)
 glue          1.4.2   2020-08-27 [3] CRAN (R 4.0.2)
 htmltools     0.5.1.1 2021-01-22 [3] CRAN (R 4.0.3)
 httpuv        1.6.1   2021-05-07 [3] CRAN (R 4.0.5)
 jquerylib     0.1.4   2021-04-26 [3] CRAN (R 4.0.5)
 jsonlite      1.7.2   2020-12-09 [3] CRAN (R 4.0.3)
 knitr         1.33    2021-04-24 [3] CRAN (R 4.0.5)
 later         1.2.0   2021-04-23 [3] CRAN (R 4.0.5)
 lifecycle     1.0.0   2021-02-15 [3] CRAN (R 4.0.4)
 magrittr      2.0.1   2020-11-17 [3] CRAN (R 4.0.3)
 memoise       2.0.0   2021-01-26 [3] CRAN (R 4.0.3)
 pillar        1.6.1   2021-05-16 [3] CRAN (R 4.0.5)
 pkgbuild      1.2.0   2020-12-15 [3] CRAN (R 4.0.3)
 pkgconfig     2.0.3   2019-09-22 [3] CRAN (R 4.0.0)
 pkgload       1.2.1   2021-04-06 [3] CRAN (R 4.0.5)
 prettyunits   1.1.1   2020-01-24 [3] CRAN (R 4.0.0)
 processx      3.5.2   2021-04-30 [3] CRAN (R 4.0.5)
 promises      1.2.0.1 2021-02-11 [3] CRAN (R 4.0.3)
 ps            1.6.0   2021-02-28 [3] CRAN (R 4.0.4)
 purrr         0.3.4   2020-04-17 [3] CRAN (R 4.0.0)
 R6            2.5.0   2020-10-28 [3] CRAN (R 4.0.3)
 Rcpp          1.0.6   2021-01-15 [3] CRAN (R 4.0.5)
 remotes       2.4.0   2021-06-02 [3] CRAN (R 4.1.0)
 rlang         0.4.11  2021-04-30 [3] CRAN (R 4.0.5)
 rmarkdown     2.8     2021-05-07 [3] CRAN (R 4.0.5)
 rprojroot     2.0.2   2020-11-15 [3] CRAN (R 4.0.3)
 sass          0.4.0   2021-05-12 [3] CRAN (R 4.0.5)
 sessioninfo   1.1.1   2018-11-05 [3] CRAN (R 4.0.0)
 stringi       1.6.2   2021-05-17 [3] CRAN (R 4.0.5)
 stringr       1.4.0   2019-02-10 [3] CRAN (R 4.0.0)
 testthat      3.0.2   2021-02-14 [3] CRAN (R 4.0.4)
 tibble        3.1.2   2021-05-16 [3] CRAN (R 4.0.5)
 usethis       2.0.1   2021-02-10 [3] CRAN (R 4.0.3)
 utf8          1.2.1   2021-03-12 [3] CRAN (R 4.0.4)
 vctrs         0.3.8   2021-04-29 [3] CRAN (R 4.0.5)
 whisker       0.4     2019-08-28 [3] CRAN (R 4.0.0)
 withr         2.4.2   2021-04-18 [3] CRAN (R 4.0.5)
 workflowr   * 1.6.2   2020-04-30 [1] CRAN (R 4.1.0)
 xfun          0.23    2021-05-15 [3] CRAN (R 4.0.5)
 yaml          2.2.1   2020-02-01 [3] CRAN (R 4.0.0)

[1] /home/maimaizhu/R/x86_64-pc-linux-gnu-library/4.1
[2] /usr/local/lib/R/site-library
[3] /usr/lib/R/site-library
[4] /usr/lib/R/library