OTTERS: Omnibus Transcriptome Test using Expression Reference Summary data

A powerful TWAS framework leveraging summary-level reference data.

We provide an integrated tool that can

perform LD-clumping
prepare the required inputs (LD reference and eQTL summary statistics) in the required formats for P+T, lassosum, SDPR, and PRS-CS
perform TWAS Stage I: run P+T, lassosum, SDPR, and PRS-CS to train eQTL weights.
perform TWAS Stage II: gene-based association test using GWAS summary statistics.

In this tool, we

integrate TABIX and PLINK 1.9 tools to extract input data per target gene more efficiently
integrate PRS-CS software, lassosum R package, and SDPR software to train eQTL weights
enable parallel computation to train GReX imputation models / perform gene-based association test simultaneously for multiple genes.

Latest Updates

January 14, 2025: Added `--lassosum_ld_blocks` argument

Purpose: Select LD blocks for lassosum analysis
Available options (from Berisa & Pickrell (2015)):
- EUR.hg19, ASN.hg19, AFR.hg19
- EUR.hg38, ASN.hg38, AFR.hg38
Default: EUR.hg38

March 29, 2025: Added support for sex chromosomes (`--chrom=X` and `--chrom=Y`)

Note: In OTTERS outputs:
- Chromosome X → labeled as 23
- Chromosome Y → labeled as 24
Example resources:
- ChromX Analysis Script
  - data to run this example analysis

March 31, 2025: Added VCF/VCF.gz support for LD reference panels

New --geno_type options: vcf
Default remains: plink (binary PLINK format)
Note:
- Require bgzip and tabixed vcf files, e.g. Exp_geno.vcf.gz and Exp_geno.vcf.gz.tbi
- For each gene, OTTERS uses the Tabix tool to extract the relevant region from the VCF, and then calls PLINK to convert the extracted vcf into binary PLINK format. See the functions extract_vcf_region_with_tabix and call_PLINK_extract for details.
Example resources:
- Analysis with VCF.gz genotype data
  - geno_dir=Exp_geno.vcf.gz
  - --geno_type=vcf
  - data to run this example analysis

Getting Started

Download and install following required tools, modules, and packages:

Download OTTERS using:

git clone https://github.com/daiqile96/OTTERS.git

BGZIP and TABIX

Here is my code to download and install BGZIP and TABIX:

wget https://sourceforge.net/projects/samtools/files/tabix/tabix-0.2.6.tar.bz2
tar jxvf tabix-0.2.6.tar.bz2
cd tabix-0.2.6
make

# copy the binary bgzip and tabix to my path: ~/projects/bin
cp bgzip tabix ~/projects/bin

# test if BGZIP and TABIX are successfully installed
export PATH=~/projects/bin:$PATH 
bgzip
tabix

PLINK 1.9

Here is my code to download and install PLINK 1.9:

# Download the latest binary of PLINK 1.9
wget https://s3.amazonaws.com/plink1-assets/plink_linux_x86_64_20220402.zip

# Unzip the archive
unzip plink_linux_x86_64_20220402.zip -d ~/projects/bin

# Remove archive
rm plink2_linux_x86_64_latest.zip

# Test if PLINK 1.9 is successfully installed
export PATH=~/projects/bin:$PATH 
plink

To permanently store the path, please add export PATH=~/projects/bin:$PATH to your ~/.bash_profile (or ~/.bashrc).

Then reload file .bash_profile from the command line:
```
source ~/.bash_profile
```
Python modules/libraries:
If you're new to python, here is an example to set up the Python Environment to use OTTERS.
To apply SDPR and lassosum as imputation models, please install:
- SDPR to perform SDPR
  
  Here is my code to download SDPR
```
cd ~/projects/bin
git clone https://github.com/eldronzhou/SDPR.git

# make sure that dynamic libraries gsl/lib/libgsl.so and MKL/lib/libmkl_rt.so are not changed
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/projects/bin/SDPR/MKL/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/projects/bin/SDPR/gsl/lib
```
  Please make sure that dynamic libraries are not changed every time when use SDPR.
- R packages:
  - lassosum to perform lassosum (lassosum requires several R packages, please follow the lassosum installation guidance to install all of them).
  - fdrtool to perform pseudo-validation implemented in lassosum.

Example:

# activate the environment
conda activate otters

# set number of threads to be used
N_THREADS=1

# set up my OTTERS directory and SDPR directory
OTTERS_DIR=/home/qdai8/projects/bin/OTTERS
SDPR_DIR=/home/qdai8/projects/bin/SDPR

# make sure the dynamic libraries of SDPR are not changed (For SDPR)
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${SDPR_DIR}/MKL/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${SDPR_DIR}/gsl/lib

# prevent automatically using  all available cores on a compute node (For SDPR and PRS-CS)
export MKL_NUM_THREADS=$N_THREADS
export NUMEXPR_NUM_THREADS=$N_THREADS
export OMP_NUM_THREADS=$N_THREADS

# Load R to perform lassosum
module load R

# Start to run OTTERS
cd ${OTTERS_DIR}/Example

# Input for OTTERS STAGE I 
# Annotation File 
exp_anno=exp_anno.txt
# Genotype data from LD reference panel
geno_dir=Exp_geno
# eQTL summary statistics 
sst_file=Exp_eQTLSumStats.txt
# Input for OTTERS STAGE II
# GWAS summary statistics 
gwas_sst_file=Exp_GWASSumStats.txt

# Set chromosome number (The example is for chromosome 4)
chr=4
# Set LD-clumping threshold in STAGE I
clump_r2=0.99
# Set output directory for STAGE I
out_dir=Results

# STAGE I
# train eQTL weights using P+T, lassosum, SDPR and PRS-CS. 
# It may take several minutes to complete.
python3 ${OTTERS_DIR}/training.py \
--OTTERS_dir=${OTTERS_DIR} \
--SDPR_dir=${SDPR_DIR} \
--anno_dir=${exp_anno} \
--geno_dir=${geno_dir} \
--sst_file=${sst_file} \
--out_dir=${out_dir} \
--chrom=${chr} \
--r2=${clump_r2} \
--models=PT,lassosum,SDPR,PRScs \
--lassosum_ld_blocks=EUR.hg38 \
--thread=$N_THREADS

# Set output directory for STAGE II
twas_dir=TWAS

# STAGE II
# gene-based association test using eQTL-weight trained from P+T, lassosum, SDPR and PRS-CS.
python3 ${OTTERS_DIR}/testing.py \
--OTTERS_dir=${OTTERS_DIR} \
--weight_dir=${OTTERS_DIR}/Example/Results \
--models=P0.001,P0.05,lassosum,SDPR,PRScs \
--anno_dir=${exp_anno} \
--geno_dir=${geno_dir} \
--out_dir=${twas_dir} \
--gwas_file=${gwas_sst_file} \
--chrom=${chr} \
--thread=$N_THREADS

# get imputed genetically regulated gene expression
impute_dir=GReX
# samples to perform imputation
samples=Exp_samples.txt
# imputation
python3 ${OTTERS_DIR}/imputing.py \
--OTTERS_dir=${OTTERS_DIR} \
--weight_dir=${OTTERS_DIR}/Example/Results \
--models=P0.001,P0.05,lassosum,SDPR,PRScs \
--anno_dir=${exp_anno} \
--geno_dir=${geno_dir} \
--out_dir=${impute_dir} \
--chrom=${chr} \
--samples=${samples} \
--thread=$N_THREADS

Required Inputs

Annotation File should contain following columns with the same name in same order (text file):

CHROM GeneStart GeneEnd TargetID

1 100 200 ENSG0000

Example: Example/exp_anno.txt
Genotype data from LD reference panel in PLINK binary files :

Example: Example/Exp_geno.bed; Example/Exp_geno.bim; Example/Exp_geno.fam
eQTL summary statistics should contain following columns with the same name in same order (text file):

Please sort by chromosome and then by SNP position in ascending order
< 64C5 td align="center">T

CHROM POS A1 A2 Zscore TargetID N

1 100 C 3 ENSG0000 200

To convert a two-sided pvalue and beta (coefficients) to Zscore, in python, we can do:
```
import numpy as np
from scipy.stats import norm
Z = np.sign(beta) * abs(norm.ppf(pvalue/2.0))
```
Example: Example/Exp_eQTLSumStats.txt
GWAS summary statistics should contain following columns with the same name in same order (text file):

Not required to be sorted

CHROM POS A1 A2 Zscore

1 100 C T 2

Example: Example/Exp_GWASSumStats.txt

Notes:

OTTERS is based on PLINK 1.9, PRS-CS software, lassosum R package, and SDPR software. We really appreciate the authors of these softwares.

Name		Name	Last commit message	Last commit date
Latest commit History 193 Commits
Example		Example
PRSmodels		PRSmodels
.DS_Store		.DS_Store
.gitignore		.gitignore
F1.png		F1.png
README.html		README.html
README.md		README.md
acat.R		acat.R
example.sh		example.sh
imputing.py		imputing.py
ottersutils.py		ottersutils.py
prepare.py		prepare.py
testing.py		testing.py
training.py		training.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

OTTERS: Omnibus Transcriptome Test using Expression Reference Summary data

Latest Updates

January 14, 2025: Added `--lassosum_ld_blocks` argument

March 29, 2025: Added support for sex chromosomes (`--chrom=X` and `--chrom=Y`)

March 31, 2025: Added VCF/VCF.gz support for LD reference panels

Getting Started

Example:

Required Inputs

Notes:

About

Uh oh!

Releases 1

Packages

Uh oh!

Languages

daiqile96/OTTERS

Folders and files

Latest commit

History

Repository files navigation

OTTERS: Omnibus Transcriptome Test using Expression Reference Summary data

Latest Updates

January 14, 2025: Added --lassosum_ld_blocks argument

March 29, 2025: Added support for sex chromosomes (--chrom=X and --chrom=Y)

March 31, 2025: Added VCF/VCF.gz support for LD reference panels

Getting Started

Example:

Required Inputs

Notes:

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Languages

January 14, 2025: Added `--lassosum_ld_blocks` argument

March 29, 2025: Added support for sex chromosomes (`--chrom=X` and `--chrom=Y`)

Packages