-
Notifications
You must be signed in to change notification settings - Fork 8
Ro crate for the metaGOflow output #27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
As it, we keep both the output directory and the |
jprmachado
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, minor points to be considered but if it's merged in the current state I don't have any issues.
Particularly in cwl files I don't have a broad view to see the impact, it's passing the syntax check and I will assume that was tested.
From my side it is approved and changes are upon you decision.
|
|
||
| # For parallelization cases might worth decrease global value of threads | ||
| interproscan_threads: 20 | ||
| # As a rule of thumb keep that as floor(threads/8) where threads the previous parameter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good rule. Yet is not assure that will always split in 8 instances, at least afters reading their code on cwltool. This may require fine tuning on the usage, but /8 is a general good starting point.
|
|
||
| # Global | ||
| threads: 20 | ||
| threads: 40 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am comfortable with changing config.yml paramters, just not sure if it is what is intended in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's zorbas-oriented.
Feel free to have a nice config in the documentation branch and we keep the one from there! 👍
| "name": "Apache License 2.0" | ||
| } | ||
| ] | ||
| } No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
run_wf.sh
Outdated
| mv allfiles.gz ${prefix}".tsv.gz" | ||
| fi | ||
|
|
||
| cd ../../../ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we store previously the target path in VAR and then here go to there? These kind of navigations inside scripts always make me nervous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes you're right it's an extreme quick and dirty way here.
I ll fix that.
| # Load module | ||
| module load python/3.7.8 | ||
| module load singularity/3.7.1 | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does not require env setting here? I never run directly using sbatch just via toil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's a funny thing that your job may get envs from your local account - if i am not mistaken.
The sbatch file could be removed in general as it's only as you said for the HPC case and only if slurm is used.
We could keep that as an example but no strong opinion on that. Let's decide this on the documentation branch
| hmmer software HMMER searches biological sequence databases for homologous sequences, using either single sequences or multiple sequence alignments as queries. HMMER implements a technology called "profile hidden Markov models" (profile HMMs). 3.2.1 functional annotation https://github.com/EddyRivasLab/hmmer microbiomeinformatics/pipeline-v5.hmmer:v3.2.1 | ||
| megahit software MEGAHIT is an ultra-fast and memory-efficient NGS assembler. It is optimized for metagenomes, but also works well on generic single genome assembly (small or mammalian size) and single-cell assembly. 1.2.9 assembly https://github.com/voutcn/megahit quay.io/biocontainers/megahit:1.2.9--h2e03b76_1 | ||
| mOTUs software The mOTU profiler is a computational tool that estimates relative taxonomic abundance of known and currently unknown microbial community members using metagenomic shotgun sequencing data. 2.5.1 taxonomy inventory https://github.com/motu-tool/mOTUs microbiomeinformatics/pipeline-v5.motus:v2.5.1 | ||
| GO-slim script Format IPS output 1.0.0 functional annotation https://github.com/EBI-Metagenomics/pipeline-v5/blob/master/tools/GO-slim/go_summary_pipeline-1.0.py microbiomeinformatics/pipeline-v5.go-summary:v1.0 No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this!
| try: | ||
| pk = description["@id"] in entry.id | ||
| except: | ||
| pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could throw the exception here and print the stack trace
suggestion:
import traceback
e.g traceback.print_exc()
This PR aims at building RO-crates of the metaGOflow output.
It requires the installation of rocrate as a dependency.
In general, the approach used is not the best way to go as we do not exploit the cwl descriptions.
However, it does the job and we get valid ro-crates.