Batch qc script by anoronh4 · Pull Request #147 · mskcc/forte · GitHub

Batch qc script #147


Draft
wants to merge 3 commits into
base: dev


anoronh4 (Collaborator) commented Apr 17, 2025

Partially addresses #146. This is a short-term solution for now.

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs?
  • If necessary, also make a PR on the mskcc/forte branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

github-actions bot commented Apr 17, 2025

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 0ecf784

+| ✅ 212 tests passed       |+
#| ❔  14 tests were ignored |#
!| ❗  78 tests had warnings |!

❗ Test warnings:

  • nextflow_config - Config variable not found: validation.help.beforeText
  • nextflow_config - Config variable not found: validation.help.afterText
  • nextflow_config - Config variable not found: validation.summary.beforeText
  • nextflow_config - Config variable not found: validation.summary.afterText
  • nextflow_config - Config manifest.version should end in dev: 1.0.1
  • readme - README contains the placeholder zenodo.XXXXXXX. This should be replaced with the zenodo doi (after the first release).
  • pipeline_todos - TODO string in nextflow.config: Optionally, you can add a pipeline-specific nf-core config at https://github.com/nf-core/configs
  • pipeline_todos - TODO string in nextflow.config: Update the field with the details of the contributors to your pipeline. New with Nextflow version 24.10.0
  • pipeline_todos - TODO string in README.md: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file.
  • pipeline_todos - TODO string in ro-crate-metadata.json: "description": "# mskcc/forte\n\nGitHub Actions CI Status\nGitHub Actions Linting StatusCite with Zenodo\nnf-test\n\nNextflow\nrun with conda\nrun with docker\nrun with singularity\nLaunch on Seqera Platform\n\n## Introduction\n\nmskcc/forte is a bioinformatics pipeline that ...\n\n TODO nf-core:\n Complete this sentence with a 2-3 sentence summary of what types of data the pipeline ingests, a brief overview of the\n major pipeline sections and the types of output it produces. You're giving an overview to someone new\n to nf-core here, in 15-20 seconds. For an example, see https://github.com/nf-core/rnaseq/blob/master/README.md#introduction\n\n\n Include a figure that guides the user through the major workflow steps. Many nf-core\n workflows use the "tube map" design for that. See https://nf-co.re/docs/contributing/design_guidelines#examples for examples. \n Fill in short bullet-pointed list of the default steps in the pipeline 1. Read QC (FastQC)2. Present QC for raw reads (MultiQC)\n\n## Usage\n\n> [!NOTE]\n> If you are new to Nextflow and nf-core, please refer to this page on how to set-up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.\n\n Describe the minimum required steps to execute the pipeline, e.g. how to prepare samplesheets.\n Explain what rows and columns represent. 
For instance (please edit as appropriate):\n\nFirst, prepare a samplesheet with your input data that looks as follows:\n\nsamplesheet.csv:\n\ncsv\nsample,fastq_1,fastq_2\nCONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz\n\n\nEach row represents a fastq file (single-end) or a pair of fastq files (paired end).\n\n\n\nNow, you can run the pipeline using:\n\n update the following command to include all required parameters for a minimal example \n\nbash\nnextflow run mskcc/forte \\\n -profile <docker/singularity/.../institute> \\\n --input samplesheet.csv \\\n --outdir <OUTDIR>\n\n\n> [!WARNING]\n> Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.\n\n## Credits\n\nmskcc/forte was originally written by Anne Marie Noronha.\n\nWe thank the following people for their extensive assistance in the development of this pipeline:\n\n If applicable, make list of people who have also contributed \n\n## Contributions and Support\n\nIf you would like to contribute to this pipeline, please see the contributing guidelines.\n\n## Citations\n\n Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file. 
\n If you use mskcc/forte for your analysis, please cite it using the following doi: 10.5281/zenodo.XXXXXX \n\n Add bibliography of tools and data used in your pipeline \n\nAn extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.\n\nThis pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.\n\n> The nf-core framework for community-curated bioinformatics pipelines.\n>\n> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.\n>\n> Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.\n",
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • pipeline_todos - TODO string in test_full.config: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA)
  • pipeline_todos - TODO string in test_full.config: Give any required params for the test so that command line flags are not needed
  • pipeline_todos - TODO string in awsfulltest.yml: You can customise AWS full pipeline tests as required
  • schema_lint - Schema $id should be https://raw.githubusercontent.com/mskcc/forte/main/nextflow_schema.json or https://raw.githubusercontent.com/mskcc/forte/master/nextflow_schema.json.
    Found https://raw.githubusercontent.com/mskcc/forte//nextflow_schema.json
  • schema_description - Ungrouped param in schema: maf_input
  • schema_description - Ungrouped param in schema: extract_fq_read_group
  • schema_description - Ungrouped param in schema: genome
  • schema_description - Ungrouped param in schema: igenomes_base
  • schema_description - Ungrouped param in schema: igenomes_ignore
  • schema_description - Ungrouped param in schema: targets_base
  • schema_description - Ungrouped param in schema: skip_trimming
  • schema_description - Ungrouped param in schema: ignore_read_pair_suffixes
  • schema_description - Ungrouped param in schema: save_unaligned
  • schema_description - Ungrouped param in schema: save_align_intermeds
  • schema_description - Ungrouped param in schema: run_oncokb_fusionannotator
  • schema_description - Ungrouped param in schema: fusion_tool_cutoff
  • schema_description - Ungrouped param in schema: rseqc_modules
  • schema_description - Ungrouped param in schema: dedup_umi_for_kallisto
  • schema_description - Ungrouped param in schema: kallisto_fragment_len
  • schema_description - Ungrouped param in schema: kallisto_fragment_sd
  • schema_description - Ungrouped param in schema: expression_quantifier
  • schema_description - Ungrouped param in schema: multiqc_config
  • schema_description - Ungrouped param in schema: multiqc_title
  • schema_description - Ungrouped param in schema: multiqc_logo
  • schema_description - Ungrouped param in schema: max_multiqc_email_size
  • schema_description - Ungrouped param in schema: multiqc_methods_description
  • schema_description - Ungrouped param in schema: publish_dir_mode
  • schema_description - Ungrouped param in schema: email
  • schema_description - Ungrouped param in schema: email_on_fail
  • schema_description - Ungrouped param in schema: plaintext_email
  • schema_description - Ungrouped param in schema: monochrome_logs
  • schema_description - Ungrouped param in schema: hook_url
  • schema_description - Ungrouped param in schema: pipelines_testdata_base_path
  • schema_description - Ungrouped param in schema: config_profile_name
  • schema_description - Ungrouped param in schema: config_profile_description
  • schema_description - Ungrouped param in schema: custom_config_version
  • schema_description - Ungrouped param in schema: custom_config_base
  • schema_description - Ungrouped param in schema: config_profile_contact
  • schema_description - Ungrouped param in schema: config_profile_url
  • schema_description - Ungrouped param in schema: reference_base
  • schema_description - Ungrouped param in schema: fasta
  • schema_description - Ungrouped param in schema: gtf
  • schema_description - Ungrouped param in schema: starfusion_url
  • schema_description - Ungrouped param in schema: refflat
  • schema_description - Ungrouped param in schema: baits
  • schema_description - Ungrouped param in schema: cdna
  • schema_description - Ungrouped param in schema: arriba_blacklist
  • schema_description - Ungrouped param in schema: arriba_known_fusions
  • schema_description - Ungrouped param in schema: arriba_protein_domains
  • schema_description - Ungrouped param in schema: metafusion_blocklist
  • schema_description - Ungrouped param in schema: metafusion_gene_bed
  • schema_description - Ungrouped param in schema: metafusion_gene_info
  • schema_description - Ungrouped param in schema: ensembl_version
  • schema_description - Ungrouped param in schema: clinicalgenes
  • local_component_structure - fusion.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - preprocess_reads.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - fillout.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - group_reads.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - extract_dedup_fq.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - prepare_references.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - baits.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - quantification.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - align_reads.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - qc.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure

❔ Tests ignored:

  • files_exist - File is ignored: CODE_OF_CONDUCT.md
  • files_exist - File is ignored: assets/nf-core-forte_logo_light.png
  • files_exist - File is ignored: docs/images/nf-core-forte_logo_light.png
  • files_exist - File is ignored: docs/images/nf-core-forte_logo_dark.png
  • files_exist - File is ignored: .github/ISSUE_TEMPLATE/config.yml
  • files_exist - File is ignored: .github/workflows/awstest.yml
  • files_exist - File is ignored: .github/workflows/awsfulltest.yml
  • nextflow_config - Config variable ignored: manifest.name
  • nextflow_config - Config variable ignored: manifest.homePage
  • files_unchanged - File ignored due to lint config: CODE_OF_CONDUCT.md
  • files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/bug_report.yml
  • files_unchanged - File ignored due to lint config: assets/nf-core-forte_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-forte_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-forte_logo_dark.png

✅ Tests passed:

Run details

  • nf-core/tools version 3.2.0
  • Run at 2025-04-17 17:05:30

@anoronh4 anoronh4 requested review from carynhale and kofiamoah April 17, 2025 17:59
@anoronh4
Collaborator Author

How to use the script:

nextflow run /path/to/batch_qc.nf --input input.csv -profile juno,singularity --outdir /path/to/result --genome GRCh38

input.csv should simply be a list of folders to search, one path per line. For example:

/juno/cmo/ccs/noronhaa/forte_dev/validation_28_GRCh38/results/analysis/M18-39155_6
/juno/cmo/ccs/noronhaa/forte_dev/validation_28_GRCh37/results_2023-09-08

Each item in the list can point either to the folder for a specific sample or to the results folder for a whole forte run.
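
One way to prepare such a list is sketched below in Python. This is not part of the PR: the helper name write_input_list is hypothetical, and the sketch assumes the format shown in the example above (one folder path per line, no header row), which the comment does not state explicitly.

```python
from pathlib import Path
from typing import Iterable


def write_input_list(folders: Iterable[str], out_path: str, check_exists: bool = True) -> int:
    """Write one folder path per line to out_path and return the number written.

    Assumption: the batch QC script reads a header-less list of folders,
    as in the example in this comment. This helper is illustrative only.
    """
    seen = set()
    lines = []
    for folder in folders:
        path = folder.strip()
        if not path or path in seen:
            continue  # skip blank entries and duplicates
        if check_exists and not Path(path).is_dir():
            raise FileNotFoundError(f"not a directory: {path}")
        seen.add(path)
        lines.append(path)
    # One path per line, each terminated by a newline
    Path(out_path).write_text("".join(f"{p}\n" for p in lines))
    return len(lines)


# Example call (paths are placeholders; check_exists=False skips the directory check):
# write_input_list(["/path/to/sample_folder", "/path/to/forte_results"], "input.csv", check_exists=False)
```

The resulting file can then be passed to the command above via --input. Checking that each folder exists before launching is a convenience of this sketch, not a documented requirement of the script.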

@anoronh4 anoronh4 linked an issue May 9, 2025 that may be closed by this pull request
Development

Successfully merging this pull request may close these issues.

combine qc from different forte runs