gOSh

A CLI for the nf-gOS pipeline.

Quickstart

  1. gosh run pipeline to run the nf-gOS pipeline (make sure to provide a samplesheet)
  • If you want to run only the JaBbA pipeline, use gosh run pipeline --preset jabba
  • If you want to run only the HRD classifier pipeline, use gosh run pipeline --preset hrd
  2. gosh run skilift to convert the pipeline outputs into formats the gOS frontend can ingest
  3. (optional) gosh run outputs -p [pipeline results directory] -s [samplesheet path] to generate a CSV containing your outputs. This is done automatically when you run gosh run skilift with a pipeline directory.
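
For example, a typical end-to-end run might look like the following (a minimal sketch: it assumes a samplesheet.csv in the current directory, the default results/ output directory, and placeholder paths for the skilift output):

    # run the pipeline (samplesheet.csv in the current directory is detected automatically)
    gosh run pipeline
    # convert the results for the gOS frontend (paths are placeholders)
    gosh run skilift -p results/ -s samplesheet.csv -o /path/to/skilift/output --cohort-type paired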

Detailed Usage

This tutorial provides a comprehensive guide to using the gosh-cli, a command-line interface designed to interact with the nf-gOS backend. The gosh-cli facilitates running the nf-gOS cancer genomics pipeline and subsequently converting the raw pipeline outputs into formats compatible with the gOS frontend (JSON and Arrow files).

The tutorial is divided into four main sections:

  1. Samplesheet Preparation: Creating the input file that describes your samples and their data.
  2. Pipeline Execution: Running the nf-gOS pipeline using gosh-cli.
  3. Skilifting Outputs: Converting the pipeline results into gOS-compatible formats.
  4. gOS Setup: Configuring the gOS frontend to visualize your results.

1. Samplesheet Preparation

The first step is to prepare a samplesheet.csv file. This comma-separated value file details the input data and metadata for each sample you want to process.

Key Requirements:

  • Column Names: The column headers in your samplesheet.csv must exactly match the names expected by the nf-gOS pipeline.
  • Required Columns: Some columns are mandatory (patient, sample, status), while others are optional depending on the input data you have and the pipeline steps you intend to run.
  • Parsimonious Execution: The pipeline intelligently uses the information provided in the samplesheet to run only the necessary steps to generate the desired outputs. If an output file (e.g., a BAM file) is already provided in the samplesheet, the step that generates it (e.g., alignment) will be skipped.

Samplesheet Columns:

The following table lists the possible column names and their descriptions. Populate your samplesheet.csv with the relevant columns for your data.

Column Name Description
patient (Required) Patient ID. This should differentiate each patient. Note: Each patient can have multiple sample IDs.
sample (Required) Sample ID for each Patient. Should differentiate between tumor and normal (e.g., sample1_t vs. sample1_n). Sample IDs must be unique across the entire sheet.
lane If starting with FASTQ files, and if there are multiple lanes for each sample, specify the lane name here. Otherwise, leave blank.
sex Patient's sex, if known. Use XY for Male, XX for Female, or NA if unknown/not applicable.
status (Required) Indicates sample type: 1 for tumor, 0 for normal.
fastq_1 Full path to FASTQ file read 1. Must end in .fastq.gz or .fq.gz.
fastq_2 Full path to FASTQ file read 2 (for paired-end reads). Must end in .fastq.gz or .fq.gz.
bam Full path to BAM file. Must end in .bam. The corresponding index file (.bai) must exist in the same directory with the same base name (e.g., sample.bam and sample.bam.bai).
msi Full path to Msisensor-pro report file.
msi_germline Full path to Msisensor-pro germline file.
hets Full path to sites.txt file containing heterozygous sites (often generated by AMBER).
amber_dir Full path to the output directory generated by AMBER.
frag_cov Full path to the fragCounter/GC corrected coverage file.
dryclean_cov Full path to the Dryclean corrected coverage file.
cobalt_dir Full path to the output directory generated by Cobalt.
purity Purity value for the sample (often estimated by PURPLE).
ploidy Ploidy value for the sample (often estimated by PURPLE).
seg Full path to the CBS segmentation file for the tumor sample.
nseg Full path to the CBS segmentation file for the normal sample (if applicable).
vcf Full path to the GRIDSS structural variant VCF file.
vcf_tbi Full path to the index file (.tbi) for the GRIDSS VCF.
jabba_rds Full path to the JaBbA results RDS file (typically jabba.simple.rds).
jabba_gg Full path to the JaBbA gGraph RDS file (typically jabba.gg.rds).
ni_balanced_gg Full path to the non-integer balanced gGraph (non_integer.balanced.gg.rds) file.
lp_phased_gg Full path to the LP phased gGraph (lp_phased.balanced.gg.rds) file.
events Full path to the genomic events file (often derived from JaBbA).
fusions Full path to the gene fusions file (often derived from JaBbA/gGraph).
snv_somatic_vcf Full path to the somatic SNV/indel VCF file (e.g., from SAGE).
snv_somatic_tbi Full path to the index file (.tbi) for the somatic SNV VCF.
snv_germline_vcf Full path to the germline SNV/indel VCF file (e.g., from SAGE).
snv_germline_tbi Full path to the index file (.tbi) for the germline SNV VCF.
variant_somatic_ann Full path to the annotated somatic SNV/indel VCF file (e.g., from SnpEff).
variant_somatic_bcf Full path to the somatic SNV/indel BCF file (binary VCF, optional).
variant_germline_ann Full path to the annotated germline SNV/indel VCF file (e.g., from SnpEff).
variant_germline_bcf Full path to the germline SNV/indel BCF file (binary VCF, optional).
snv_multiplicity Full path to the SNV multiplicity file.
oncokb_maf Full path to the OncoKB-annotated MAF file.
oncokb_fusions Full path to the OncoKB-annotated fusions file.
oncokb_cna Full path to the OncoKB-annotated CNA (Copy Number Alteration) file.
sbs_signatures Full path to the SBS signatures file from SigProfilerAssignment.
indel_signatures Full path to the indel signatures file from SigProfilerAssignment.
signatures_matrix Full path to the signatures matrix file from SigProfilerAssignment.
hrdetect Full path to the HRDetect results RDS file.
onenesstwoness Full path to the onenesstwoness output file.

Handling Sample Types:

  • Paired Tumor-Normal: Use the same patient ID for both samples. Use different sample IDs (e.g., Patient1_T, Patient1_N). Set status to 1 for the tumor and 0 for the normal.
  • Tumor-Only: Each sample should have a unique patient and sample ID. Set status to 1 for all samples.
  • Multiple Samples per Patient: You can include multiple tumor samples (e.g., from different time points or sites) for the same patient. Use the same patient ID and unique sample IDs, setting status to 1 for all.
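
As an illustration (a minimal sketch; patient/sample names and file paths are hypothetical), a paired tumor-normal samplesheet starting from paired-end FASTQs might look like:

    patient,sample,status,sex,fastq_1,fastq_2
    Patient1,Patient1_T,1,XX,/data/Patient1_T_R1.fastq.gz,/data/Patient1_T_R2.fastq.gz
    Patient1,Patient1_N,0,XX,/data/Patient1_N_R1.fastq.gz,/data/Patient1_N_R2.fastq.gz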

Resuming from Previous Outputs:

If you have results from a previous nf-gOS run and want to generate a samplesheet to resume or run downstream steps, use the gosh run samplesheet command. This is particularly useful if the original run was interrupted or if you want to add new analyses.

gosh run samplesheet -p /path/to/previous/pipeline/results -s /path/to/original/samplesheet.csv -o new_samplesheet.csv
  • -p: Path to the results directory of the previous pipeline run.
  • -s: Path to the samplesheet.csv used for the previous run.
  • -o: Path where the new, generated samplesheet will be saved.

You can use the -c (include columns) and -C (exclude columns) flags with gosh run samplesheet to customize which output file paths are included in the new samplesheet. This allows fine-grained control over which steps the pipeline will attempt to run when using this new samplesheet. Run gosh run samplesheet --help for details on available column flags.

Pipeline Input/Output Dependencies:

Understanding which tools generate which outputs is crucial for preparing the samplesheet correctly. The pipeline skips steps for which outputs are already provided. Here's a map of key tools and their primary inputs/outputs (as represented by samplesheet columns):

Tool: inputs → outputs (as samplesheet columns)
aligner: fastq_1, fastq_2 → bam
bamqc: bam → wgs_metrics, alignment_metrics, insert_size_metrics (internal/reported)
msisensorpro: bam → msi, msi_germline
gridss: bam → vcf, vcf_tbi
amber: bam → hets, amber_dir
fragcounter: bam → frag_cov
dryclean: frag_cov → dryclean_cov
cbs: dryclean_cov → seg, nseg
sage: bam → snv_somatic_vcf, snv_germline_vcf, snv_somatic_tbi, snv_germline_tbi
cobalt: bam → cobalt_dir
purple: cobalt_dir, amber_dir (and bam internally) → purity, ploidy
jabba: vcf, hets, dryclean_cov, ploidy, seg, nseg → jabba_rds, jabba_gg
non_integer_balance: jabba_gg → ni_balanced_gg
lp_phased_balance: ni_balanced_gg → lp_phased_gg
events: ni_balanced_gg → events
fusions: ni_balanced_gg → fusions
snpeff: snv_somatic_vcf, snv_germline_vcf → variant_somatic_ann, variant_germline_ann, variant_somatic_bcf, variant_germline_bcf
snv_multiplicity: jabba_gg, variant_somatic_ann → snv_multiplicity
oncokb: variant_somatic_ann, snv_multiplicity, jabba_gg, fusions → oncokb_maf, oncokb_fusions, oncokb_cna
signatures: snv_somatic_vcf → sbs_signatures, indel_signatures, signatures_matrix
hrdetect: hets, vcf, jabba_gg, snv_somatic_vcf → hrdetect
onenesstwoness: events, hrdetect → onenesstwoness

2. Pipeline Execution

With your samplesheet.csv ready, you can now execute the nf-gOS pipeline using gosh-cli.

Setup:

  1. Run Directory: Choose a directory where you want the pipeline to run. This directory will contain intermediate files and logs.
  2. Place Samplesheet: Copy or move your prepared samplesheet.csv into this run directory.
  3. Disk Space: The pipeline generates a work/ subdirectory containing intermediate files for every task. This can become very large. Ensure your run directory is located on a filesystem with ample space.
    • NYU HPC: Use your scratch directory: /gpfs/scratch/$(whoami)/ (replace $(whoami) with your actual username or run echo "/gpfs/scratch/$(whoami)/" to get the path).
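
For example, on the NYU HPC the setup might look like this (a sketch; the run directory name is a placeholder):

    # create and enter a run directory on scratch
    mkdir -p /gpfs/scratch/$(whoami)/my_nf_gos_run
    cd /gpfs/scratch/$(whoami)/my_nf_gos_run
    # place the prepared samplesheet here
    cp /path/to/samplesheet.csv ./samplesheet.csv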

Running the Pipeline:

Navigate to your chosen run directory in the terminal and execute the main command:

gosh run pipeline

Key Features & Options:

  • Automatic Samplesheet Detection: If your samplesheet is named samplesheet.csv and located in the current directory, gosh run pipeline will detect it automatically. Otherwise, specify the path using the -s/--samplesheet flag.

  • Parameter Wizard: By default, the command launches a Command Line Interface (CLI) wizard. This interactive tool guides you through setting essential parameters for the run (like reference genome paths, output directory, etc.) and saves them into a params.json file.

  • Command-Line Flags: You can bypass the wizard and provide all parameters directly via command-line flags. Use gosh run pipeline --help to see the full list of available flags and their corresponding params.json keys.

  • Using an Existing params.json: If you already have a configured params.json file, you can pass it directly:
    gosh run pipeline --params-file /path/to/your/params.json

  • OncoKB API Key: Running the OncoKB annotation step requires an API key.

    • Provide it via the flag: --oncokb-api-key YOUR_API_KEY
    • Or, set it as an environment variable (recommended):
      export ONCOKB_API_KEY=YOUR_API_KEY
      Then run gosh run pipeline; the pipeline will automatically detect the environment variable.

  • Execution Profiles (--profile): gosh-cli attempts to detect the execution environment (e.g., HPC scheduler) and suggests appropriate Nextflow profiles.

    • Automatic Detection: nyu for NYU Langone HPC, nygc for NY Genome Center HPC.
    • Manual Specification: Use the --profile flag for other environments or specific needs. Profiles are comma-separated.
    • Local Execution (e.g., NYU Condo): Use --profile local,singularity.
    • IMPORTANT: The singularity profile is required to instruct Nextflow to use the containerized software versions defined by the pipeline, ensuring reproducibility. Include it unless you have a specific reason not to (e.g., using locally installed dependencies with the local profile alone).
  • Pipeline Source Code (--pipeline-dir): The tool automatically finds the nf-gOS pipeline code in shared lab directories for nyu and nygc profiles. To use a different version (e.g., your own clone), specify its path:
    gosh run pipeline --pipeline-dir /path/to/your/nf-gOS/clone

  • Presets (--preset): Apply predefined parameter sets and potentially skip irrelevant pipeline steps for specific analysis types (e.g., heme). This simplifies configuration for common scenarios.

The following presets are supported:

  • --preset jabba: For running the JaBbA pipeline.
  • --preset hrd: For running the HRD classification pipeline.
  • --preset heme: For hematologic malignancies.
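
Putting these options together, a hypothetical invocation might look like the following (a sketch using only the flags described above; paths are placeholders):

    gosh run pipeline \
      -s samplesheet.csv \
      --profile local,singularity \
      --preset jabba \
      --pipeline-dir /path/to/your/nf-gOS/clone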

Resuming a Pipeline Run:

Nextflow (the workflow engine used by nf-gOS) has built-in resume capabilities. gosh-cli provides convenient ways to manage reruns or specific restarts.

  • Method 1: Rerun Specific Processes/Samples (--processes, --samples)
    • If a run failed or you need to regenerate output for specific steps or samples, use these flags.

    • Provide process names (all caps, comma-separated) via --processes. Find process names in gosh run pipeline --help or pipeline logs.

    • Provide sample IDs (all caps, comma-separated) via --samples.

    • Example: Rerun the JABBA process for samples SAMPLE_T1 and SAMPLE_T2:
      gosh run pipeline --processes JABBA --samples SAMPLE_T1,SAMPLE_T2

    • IMPORTANT: This command will first identify the relevant intermediate directories within the work/ directory and ask for confirmation before deleting them. This ensures the specified processes rerun from scratch for the selected samples. The rest of the pipeline will use cached results where possible.

  • Method 2: Using gosh run samplesheet
    • As described in the Samplesheet Preparation section, generate a new samplesheet using gosh run samplesheet -p ... -s ....
    • Modify this new samplesheet: To rerun a specific process, remove the corresponding output file path column(s) from the header row. To rerun for specific samples, remove the rows corresponding to other samples.
    • Run gosh run pipeline -s new_samplesheet.csv with the modified samplesheet.
    • Advantage: This method bypasses Nextflow's internal caching mechanism based on input/parameter hashes. It's useful if the pipeline code, containers, or parameters have changed since the last run, which would normally break Nextflow's default -resume functionality due to altered "hash integrity".
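
A sketch of Method 2 end to end (file names are placeholders):

    # regenerate a samplesheet from the previous run's outputs
    gosh run samplesheet -p /path/to/previous/results -s /path/to/original/samplesheet.csv -o new_samplesheet.csv
    # edit new_samplesheet.csv: drop the output column(s) for steps you want rerun,
    # and drop the rows for samples you do not want reprocessed
    # then launch the pipeline with the edited sheet
    gosh run pipeline -s new_samplesheet.csv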

Pipeline Outputs:

  • Final pipeline outputs are placed in the directory specified during the wizard setup or via the -o/--outdir flag (default is results/ within the run directory).

  • Outputs are organized by patient ID and then by the process that generated them:

    <output_directory>/  
    ├── <patient_id_1>/  
    │   ├── <process_name_1>/  
    │   │   └── <output_file(s)>  
    │   ├── <process_name_2>/  
    │   │   └── <output_file(s)>  
    │   └── ...  
    ├── <patient_id_2>/  
    │   ├── <process_name_1>/  
    │   │   └── <output_file(s)>  
    │   └── ...  
    └── ...
    
  • These files are copied/moved from the work/ directory. If you accidentally delete files from the results/ directory, you can often regenerate them by rerunning the pipeline (using resume functionality).

Debugging Pipeline Execution

When a pipeline process fails, Nextflow provides detailed logs and isolated environments to help diagnose the issue.

  1. Check Nextflow Log: The primary log file is .nextflow.log located in the directory where you launched the pipeline. Search this file for lines containing ERROR. These lines often indicate which process failed and provide the path to its specific work/ directory (e.g., work/ab/c123def...).
  2. Identify Work Directory:
    • The error message in .nextflow.log usually contains the work directory path.

    • Alternatively, use the gosh debug log command:

      # Get info for all failed processes  
      gosh debug log --status FAILED
      
      # Get only the workdir column for failed processes, filtering out non-existent dirs  
      gosh debug log --status FAILED --work-dir-exists -c workdir

      This command provides a table with information about pipeline processes, including their status and work directory paths.

  3. Navigate to Work Directory: Go to the specific work directory identified (e.g., cd work/ab/c123def...). This directory is a self-contained environment for that specific task instance.
  4. Inspect Work Directory Contents: Inside the work directory, you'll find:
    • Input Files: Symlinks to the input files used by this specific task. Inspect these to ensure they are correct.
    • .command.sh: The actual shell script that Nextflow attempted to execute for this task. You can examine this script to understand the commands being run.
    • .command.log: The standard output and standard error captured during the execution attempt. This often contains the specific error message from the underlying tool.
    • .command.run: A helper script used by Nextflow to execute .command.sh, often handling container execution.
  5. Debugging Steps (Order of Increasing Intervention):
    • Examine Logs: Carefully read .command.log to find the specific error message.
    • Inspect Script: Review .command.sh to understand the command logic and parameters passed to the tool.
    • Modify and Rerun (Experimentation):
      • You can carefully edit .command.sh for quick tests (e.g., adding echo statements, modifying parameters).

      • To re-execute the task within its containerized environment, run the .command.run script:
        ./.command.run

        Do not run .command.sh directly, as this will execute it outside the container, likely failing or producing different results.

    • Enter the Container: Modify .command.run to drop you into an interactive shell inside the container (e.g., change the final execution line to bash). Then run ./.command.run. This allows you to manually execute commands within the exact environment the task used.
    • Modify Source Code (Advanced): If the error lies within an R or Python script called by .command.sh (these scripts originate from the pipeline source code), the best practice is:
      1. Clone the nf-gOS pipeline repository locally (git clone ...).
      2. Make your modifications to the script(s) within your local clone.
      3. When rerunning the pipeline (or the specific task via ./.command.run if you modify it to point to your local scripts), ensure you are using your modified pipeline source code. For a full pipeline rerun, use the --pipeline-dir /path/to/your/nf-gOS/clone flag with gosh run pipeline.
      4. Note: Modifying pipeline source code will change file hashes and break Nextflow's default resume capability for subsequent runs. It's often better to fix the code in your clone and then restart the analysis using Method 2 (Using gosh run samplesheet) described in the "Resuming a Pipeline Run" section above, ensuring --pipeline-dir points to your fixed clone.
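
A typical debugging session might therefore look like this (a sketch; the work directory hash is hypothetical):

    # find failed tasks and their work directories
    grep ERROR .nextflow.log
    gosh debug log --status FAILED --work-dir-exists -c workdir
    # inspect one failed task
    cd work/ab/c123def456
    cat .command.log      # stdout/stderr from the underlying tool
    less .command.sh      # the command Nextflow attempted to run
    # rerun the task inside its container after any edits to .command.sh
    ./.command.run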

3. Skilifting Outputs

After the nf-gOS pipeline successfully completes, the raw outputs in the results/ directory need to be converted ("skilifted") into formats suitable for the gOS frontend (JSON and Arrow files). The gosh run skilift command handles this.

Command:

gosh run skilift -p /path/to/pipeline/results -s /path/to/samplesheet.csv -o /path/to/skilift/output/dir --cohort-type [cohort type]

Arguments & Options:

  • -p/--pipeline-results: (Required) Path to the results directory generated by gosh run pipeline.
  • -s/--samplesheet: (Required) Path to the samplesheet.csv file used for the pipeline run.
  • -o/--output-dir: (Required) Path where the skilifted JSON and Arrow files will be saved.
    • Accessibility: This directory must be accessible via a web server for the gOS frontend to load the data.
    • NYU/NYGC HPC: Place this directory within your public_html directory (e.g., /gpfs/home/$(whoami)/public_html/my_gos_cohort).
  • --cohort-type: (Required) Specifies the type of cohort, which determines the appropriate set of "lifters" (conversion scripts) to use. Common options:
    • paired: For paired tumor-normal samples (default if omitted).
    • tumor_only: For cohorts with only tumor samples.
    • heme: For hematologic malignancy cohorts (may use specific lifters or parameters).
  • -l/--skilift-repo: Path to your local clone of the Skilift R package repository. If not provided, gosh-cli will try to find it in common locations (like ~/git/) or clone it from the remote repository (mskilab-org/Skilift).
  • -c/--cores: Number of CPU cores to use for parallel processing during skilifting. Highly recommended for large cohorts to speed up the process.
    # Example using 8 cores  
    gosh run skilift ... -o /path/to/output --cohort-type paired -c 8

What it Does:

  1. Generates outputs.csv: It first creates an outputs.csv file within the specified output directory (-o). This file merges metadata from your input samplesheet.csv with the paths to the output files found in the pipeline results directory (-p).
    • Format: Unlike the input samplesheet.csv (tall format, one row per sample), outputs.csv is in a wide format, with one row per patient. Columns represent different data types (BAMs, VCFs, segments, etc.) for that patient (potentially including both tumor and normal sample data if paired).
  2. Creates Skilift Cohort Object: It uses the outputs.csv to instantiate a Skilift::Cohort object in R. This object organizes all the file paths and metadata.
  3. Runs Skilifters: It iterates through the appropriate lifters (determined by --cohort-type) for the Skilift::Cohort object using the Skilift::lift_all function. Each lifter reads specific raw output files (e.g., JaBbA RDS, segment files, VCFs) and converts them into JSON and/or Apache Arrow files within the output directory (-o). Note: Depending on the gOS version, you might also need a datafiles.json listing these outputs; check the gOS Setup section.
  4. Saves Cohort Object: The Skilift::Cohort R object itself is also saved as an RDS file (e.g., skilift_cohort.rds) in the output directory for potential debugging or reuse.

Debugging Skilift

Errors during the gosh run skilift process are typically printed directly to the console.

  1. Check Console Output: Look for error messages printed during the gosh run skilift execution. Skilift functions often use tryCatchLog wrappers, which should report the specific error message and the line number in the R code where the error occurred.

  2. Inspect Inputs:

    • outputs.csv: Examine the outputs.csv file generated in the output directory (-o). This file shows the exact input file paths that Skilift used to create the cohort object. Verify that these paths are correct and the files exist.
    • cohort.rds: Load the skilift_cohort.rds file (also in the output directory) into an R session (cohort = readRDS("skilift_cohort.rds")). Inspect this Skilift::Cohort object to see the organized input data as Skilift understood it.
  3. Modify Skilift Code (Recommended Practice):

    • If the error stems from the Skilift R code itself, clone the Skilift repository: git clone https://github.com/mskilab-org/skilift.git
    • Make necessary changes to the R scripts within your local clone.
    • Rerun the gosh run skilift command, explicitly pointing to your modified repository using the -l/--skilift-repo flag:
      gosh run skilift ... -l /path/to/your/skilift/clone ...
  4. Run Individual Lifters: The gosh run skilift command uses Skilift::lift_all to run all relevant conversions. However, Skilift provides individual functions (lifters) for each specific output type (e.g., lift_sv, lift_seg, lift_mutations). For focused debugging:

    • Load the cohort.rds object into an R session.
    • Load your modified Skilift code (devtools::load_all("/path/to/your/skilift/clone")).
    • Call the specific lifter function that failed directly on the cohort object (e.g., lift_sv(cohort, output_directory = "/path/to/skilift/output/dir")).
    • A list of available lifters and their purposes can be found in the Skilift README: https://github.com/mskilab-org/skilift?tab=readme-ov-file#lifters
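
For instance, a focused rerun of a single lifter from the shell might look like this (a sketch assuming Rscript is available; paths and the choice of lift_sv are placeholders for whichever lifter failed):

    Rscript -e '
      devtools::load_all("/path/to/your/skilift/clone")   # load your (possibly modified) Skilift code
      cohort <- readRDS("/path/to/skilift/output/dir/skilift_cohort.rds")
      lift_sv(cohort, output_directory = "/path/to/skilift/output/dir")
    '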

4. gOS Setup

You now have a directory containing the JSON and Arrow files ready for visualization in the gOS frontend. The final step is to make gOS aware of this new dataset.

Option 1: Use Existing Lab gOS Instance (NYU/NYGC)

This is the simplest method for members of the lab with access to the shared HPC resources.

  1. Edit datasets.json: Locate the central configuration file for the lab's gOS instance. This file lists all available cohorts.
    • NYU Path: /gpfs/data/imielinskilab/external/mskiweb/xanthc01/case-report/datasets.json
    • (Consult your local administrator for the NYGC path if different)
  2. Add New Entry: Open this JSON file in a text editor. Add a new object to the main array, describing your cohort using the following schema:
    • id: A short, unique, machine-readable identifier for your cohort (e.g., myproject_cohort1).
    • title: A human-readable name for your cohort displayed in the gOS dropdown (e.g., "My Project Cohort 1").
    • dataPath: The web-accessible URL path to the directory where you ran gosh run skilift (the -o directory containing the generated JSON/Arrow files). This path is typically relative to the web server's document root configured for the gOS instance.
      • For the NYU example, if your skilift output is in /gpfs/home/user/public_html/my_gos_cohort, the dataPath might be /external/imielinskilab/mskiweb/user/my_gos_cohort/. Verify the correct base path with your administrator.
    • datafilesPath: The web-accessible URL path to a specific datafiles.json file located within the dataPath directory. This file explicitly lists the generated JSON and Arrow files for gOS to load. You may need to create this datafiles.json file manually or check if your Skilift version generates it automatically. It should contain relative paths from dataPath to the actual data files.
[  
  // ... existing entries ...  
  {  
    "id": "myproject_cohort1",  
    "title": "My Project Cohort 1 (Processed 2025-05-01)",  
    "dataPath": "/external/imielinskilab/mskiweb/YOUR_USERNAME/path/to/skilift_output/", // <-- IMPORTANT: Path to the DIRECTORY  
    "datafilesPath": "/external/imielinskilab/mskiweb/YOUR_USERNAME/path/to/skilift_output/datafiles.json" // <-- IMPORTANT: Path to the JSON FILE listing data  
  }  
]
  3. Save the File: Save your changes to datasets.json.
  4. Access gOS: Open the lab's gOS instance URL in your web browser.
    • NYU URL: https://genome.med.nyu.edu/external/imielinskilab/mskiweb/xanthc01/case-report/
    You should now see your cohort listed in the dataset selection dropdown menu under the specified title.

Option 2: Create a New gOS Build

Use this option if you want to run a standalone instance of gOS or don't have access to the shared lab instance configuration.

  1. Clone gOS Repository:

    git clone https://github.com/mskilab-org/gOS  
    cd gOS
  2. Set Node.js Version: Ensure you are using the correct Node.js version. gOS builds are sensitive to this.

    • Required Version: v16.5.1 (check with node -v).
  3. Install Dependencies: npm install -f (The -f or --force flag might be needed to resolve potential peer dependency conflicts).

  4. Create datasets.json: Create a new file named datasets.json inside the public/ directory within your cloned gOS repository.

  5. Add Your Cohort: Populate public/datasets.json with an entry for your cohort, following the schema described in Option 1. The dataPath and datafilesPath here should be the relative paths from the public/ directory to where your skilifted data directory and datafiles.json file will be served. Often, you'll copy your skilift output into the public/ directory. Example: gOS/public/datasets.json

    [  
      {  
        "id": "my_standalone_cohort",  
        "title": "Locally built gOS with my data",  
        "dataPath": "my_skilift_output/", // Assuming data dir is gOS/public/my_skilift_output/  
        "datafilesPath": "my_skilift_output/datafiles.json" // Assuming file is gOS/public/my_skilift_output/datafiles.json  
      }  
    ]

    Note: Ensure the datafiles.json file exists at the specified location within your public/ directory.

  6. Copy Gene Tracks: gOS requires gene annotation files. Copy the genes/ directory from the shared lab instance into your public/ directory:

    • Source (NYU): /gpfs/data/imielinskilab/external/mskiweb/xanthc01/case-report/genes
    • Destination: gOS/public/
# On NYU HPC  
cp -r /gpfs/data/imielinskilab/external/mskiweb/xanthc01/case-report/genes /path/to/your/gOS/public/
  7. Copy Skilift Output: Copy the entire output directory from gosh run skilift (which should contain the JSON/Arrow files and the necessary datafiles.json) into your gOS/public/ directory. Update the paths in public/datasets.json accordingly.

    # Example  
    cp -r /path/to/skilift/output/dir /path/to/your/gOS/public/my_skilift_output  
    # Make sure /path/to/your/gOS/public/my_skilift_output/datafiles.json exists!
  8. Build gOS: Run the build command. This compiles the React application into static files.

    npm run build

    This will create a build/ directory containing the static HTML, CSS, and JavaScript files.

  9. Deploy: Copy the contents of the gOS/build/ directory to a location served by a web server.

    • NYU/NYGC HPC: Copy the contents into a subdirectory within your public_html directory.
# Example: Deploying to public_html/my_gos_instance  
mkdir -p /gpfs/home/$(whoami)/public_html/my_gos_instance  
cp -r /path/to/your/gOS/build/* /gpfs/home/$(whoami)/public_html/my_gos_instance/
  10. Access Your Instance: Navigate to the URL corresponding to where you deployed the build.
    • Example URL (NYU): https://genome.med.nyu.edu/external/imielinskilab/mskiweb/YOUR_USERNAME/my_gos_instance/

You have now successfully processed your samples using gosh-cli and nf-gOS, converted the results using Skilift, and set up the gOS frontend to visualize your genomic data.
