recetox-aplcms `v0.10.1` by xtrojak · Pull Request #317 · RECETOX/galaxytools · GitHub

recetox-aplcms v0.10.1 #317


Merged
hechth merged 70 commits into RECETOX:master on Feb 13, 2023

Conversation

@xtrojak (Contributor) commented Dec 12, 2022

Preparation of wrappers for the new version v0.10.1 of recetox-aplcms.
Details on the introduced and changed wrappers are available in #308.

We have introduced an approach in the wrappers that stores the sample_name attribute in the .parquet files. This way, we don't have to rely on filenames and can instead use sample names to sort the feature tables and keep them in the correct order. The following example shows how this can be done (it is already implemented in the wrappers):

table <- arrow::read_parquet('file.parquet')
attributes(table) # no sample_name present
attr(table, "sample_name") <- "theSampleXYZ"
attributes(table) # now it contains sample_name
arrow::write_parquet(table, "output.parquet")

loaded_table <- arrow::read_parquet('output.parquet')
attributes(loaded_table) # still contains sample_name
attr(loaded_table, "sample_name") # this is how it can be accessed
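Once every table carries its sample_name attribute, the ordering step described above can be sketched in base R. This is only an illustration: sort_tables_by_sample_name is a hypothetical helper name, not a function from the wrappers or from recetox-aplcms, and it assumes that every table has the attribute set.

```r
# Hypothetical helper (not part of the wrappers): reorder a list of tables
# so that they follow the order given in `sample_names`.
sort_tables_by_sample_name <- function(tables, sample_names) {
  # Collect the sample_name attribute of every table; vapply fails loudly
  # if the attribute is missing from any of them.
  table_names <- vapply(
    tables,
    function(tbl) attr(tbl, "sample_name"),
    character(1)
  )
  idx <- match(sample_names, table_names)
  if (anyNA(idx)) {
    stop("No table found for sample(s): ",
         paste(sample_names[is.na(idx)], collapse = ", "))
  }
  tables[idx]
}

# Usage with two small in-memory tables given in the "wrong" order.
table_a <- data.frame(mz = c(100.1, 200.2))
attr(table_a, "sample_name") <- "sample_A"
table_b <- data.frame(mz = 300.3)
attr(table_b, "sample_name") <- "sample_B"

ordered <- sort_tables_by_sample_name(
  list(table_b, table_a),
  c("sample_A", "sample_B")
)
```

Matching on the attribute rather than on list position or filename is what makes the result independent of how the input files happen to be named.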

TODO list:

  • add all newly introduced wrappers and define their inputs, the respective recetox-aplcms function calls, and outputs.
  • remove obsolete wrappers
  • clean up macro file
  • update utils R functions and remove the obsolete ones
  • actually run and test the wrappers once the bioconda package is available, make sure all runs smoothly and keep in mind the individual wrappers need to "fit together" to form 'unsupervised' and 'hybrid' workflows
  • solve how to get value of sample_name from input mzml files (this concerns remove_noise and recover_weaker wrappers) - do we want to use filename? Are these values present inside the files? (see 28d28db)
  • finish select_table_with_sample_name (in utils.R) function - make sure it works (see recetox-aplcms v0.10.1 #317 (comment))
  • we will not include tests in the individual wrappers - instead, we will write workflow tests. For that we will reuse package test data. The main motivation is that meaningful tests require a larger dataset, and in workflow testing it can be downloaded from a remote location instead of being stored locally with the wrappers. This should also resolve aplcms: replace galaxy tests with real data #169.
  • so far, we are relying on the presence of the attribute sample_name in the .parquet files; the case where it is missing needs to be handled properly (raise a proper error?)
  • compute_clusters should have a switch to enter the input tolerances as numbers or use an input parquet file
  • recetox-aplcms-extract-features input form #291
  • aplcms: Improve documentation #184

Close #308, closes #184

@hechth (Member) commented Dec 13, 2022

so far, we are relying on the presence of the attribute sample_name in the .parquet, this needs to be handled properly in the case it is missing (raise a proper error? use the filename instead and hope for the best?)

There should be an exception raised in the wrapper when loading the data.
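A minimal sketch of such a check, assuming it runs right after the table has been loaded (for example with arrow::read_parquet). The name get_sample_name and the error message are hypothetical and are not taken from the wrappers' utils.R.

```r
# Hypothetical helper (not the actual utils.R code): return the sample_name
# attribute of a loaded table, or raise an error if it is missing.
get_sample_name <- function(table, source = "input table") {
  sample_name <- attr(table, "sample_name")
  if (is.null(sample_name)) {
    stop(
      sprintf("Attribute 'sample_name' is missing in %s.", source),
      call. = FALSE
    )
  }
  sample_name
}

# Usage: a table without the attribute triggers the error ...
features <- data.frame(mz = c(100.1, 200.2))
result <- tryCatch(
  get_sample_name(features, source = "features.parquet"),
  error = function(e) conditionMessage(e)
)

# ... and a table with the attribute returns its value.
attr(features, "sample_name") <- "theSampleXYZ"
name <- get_sample_name(features)
```

Failing at load time keeps the error close to the offending file instead of surfacing later as a mismatched or misordered feature table.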

@hechth (Member) commented Dec 13, 2022

solve how to get value of sample_name from input mzml files (this concerns remove_noise and recover_weaker wrappers) - do we want to use filename? Are these values present inside the files?

This should be read from the file using the mzR package

@maximskorik (Member) commented Dec 20, 2022

solve how to get value of sample_name from input mzml files (this concerns remove_noise and recover_weaker wrappers) - do we want to use filename? Are these values present inside the files?

This should be read from the file using the mzR package

After some research, it appears that the actual sample names are stored in the id attribute of the run subelement of the mzML XML file (http://www.peptideatlas.org/tmp/mzML1.1.0.html#run). Unfortunately, mzR doesn't read sample names: its fileName method always returns the path used to read the file. So far, the only option I have found to get that attribute is the pymzml package. Alternatively, it may be possible to parse the XML with some R package or a bash script (checking whether that's feasible).

UPD: see 28d28db. grep pattern: https://unix.stackexchange.com/a/529674/536940
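For illustration, the same idea (pulling the id attribute out of the run tag with a pattern match) can be sketched in base R. This is not the code from 28d28db: read_mzml_run_id is a hypothetical name, the regular expression is an assumption about how the run tag is written, and a proper XML parser would be more robust.

```r
# Hypothetical helper (not the code from the commit): extract the id
# attribute of the <run> element from an mzML file using base R only.
read_mzml_run_id <- function(path) {
  # Join all lines so a <run ...> tag split across lines is still matched.
  # Note: this reads the whole file, which may be slow for large mzML files.
  xml_text <- paste(readLines(path, warn = FALSE), collapse = " ")
  pattern <- '<run\\b[^>]*?\\bid="([^"]*)"'
  hit <- regmatches(xml_text, regexec(pattern, xml_text, perl = TRUE))[[1]]
  if (length(hit) < 2) {
    stop("No <run> element with an id attribute found in ", path)
  }
  hit[2]
}

# Usage on a minimal mzML-like snippet written to a temporary file.
tmp <- tempfile(fileext = ".mzML")
writeLines(
  c(
    "<mzML>",
    '  <run defaultInstrumentConfigurationRef="IC1" id="theSampleXYZ">',
    "  </run>",
    "</mzML>"
  ),
  tmp
)
run_id <- read_mzml_run_id(tmp)
```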

@xtrojak (Contributor, Author) commented Dec 30, 2022

The Bioconda package for v0.10.0 was released today (bioconda/bioconda-recipes#38281).

@xtrojak xtrojak self-assigned this Jan 5, 2023
@xtrojak xtrojak closed this Jan 5, 2023
@xtrojak xtrojak changed the title recetox-aplcms v0.10.0 recetox-aplcms v0.10.1 Feb 3, 2023
@xtrojak xtrojak marked this pull request as ready for review February 10, 2023 10:25
@hechth hechth merged commit beaa9e4 into RECETOX:master Feb 13, 2023
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

  • apLCMS: update wrappers to v0.10.0
  • aplcms: Improve documentation
  • aplcms: replace galaxy tests with real data
4 participants