Skip to content

Conversation

@hariszaf
Copy link
Member

This PR aims at building RO-crates of the metaGOflow output.

It requires the installation of rocrate as a dependency.

In general, the approach used is not the best way to go as we do not exploit the cwl descriptions.
However, it does the job and we get valid ro-crates.

@hariszaf
Copy link
Member Author

As it, we keep both the output directory and the .zip file with the complete ro-crate.
It is my belief that we should remove either the output directory and keep just the .zip or vice versa.
No strong opinion though.

Copy link
Collaborator

@jprmachado jprmachado left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, minor points to be considered but if it's merged in the current state I don't have any issues.

Particularly in cwl files I don't have a broad view to see the impact, it's passing the syntax check and I will assume that was tested.

From my side it is approved and changes are upon you decision.


# For parallelization cases might worth decrease global value of threads
interproscan_threads: 20
# As a rule of thumb keep that as floor(threads/8) where threads the previous parameter
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good rule. Yet is not assure that will always split in 8 instances, at least afters reading their code on cwltool. This may require fine tuning on the usage, but /8 is a general good starting point.


# Global
threads: 20
threads: 40
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am comfortable with changing config.yml paramters, just not sure if it is what is intended in this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's zorbas-oriented.
Feel free to have a nice config in the documentation branch and we keep the one from there! 👍

"name": "Apache License 2.0"
}
]
} No newline at end of file
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

run_wf.sh Outdated
mv allfiles.gz ${prefix}".tsv.gz"
fi

cd ../../../
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we store previously the target path in VAR and then here go to there? These kind of navigations inside scripts always make me nervous.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you're right it's an extreme quick and dirty way here.
I ll fix that.

# Load module
module load python/3.7.8
module load singularity/3.7.1

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does not require env setting here? I never run directly using sbatch just via toil

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a funny thing that your job may get envs from your local account - if i am not mistaken.

The sbatch file could be removed in general as it's only as you said for the HPC case and only if slurm is used.

We could keep that as an example but no strong opinion on that. Let's decide this on the documentation branch

hmmer software HMMER searches biological sequence databases for homologous sequences, using either single sequences or multiple sequence alignments as queries. HMMER implements a technology called "profile hidden Markov models" (profile HMMs). 3.2.1 functional annotation https://github.com/EddyRivasLab/hmmer microbiomeinformatics/pipeline-v5.hmmer:v3.2.1
megahit software MEGAHIT is an ultra-fast and memory-efficient NGS assembler. It is optimized for metagenomes, but also works well on generic single genome assembly (small or mammalian size) and single-cell assembly. 1.2.9 assembly https://github.com/voutcn/megahit quay.io/biocontainers/megahit:1.2.9--h2e03b76_1
mOTUs software The mOTU profiler is a computational tool that estimates relative taxonomic abundance of known and currently unknown microbial community members using metagenomic shotgun sequencing data. 2.5.1 taxonomy inventory https://github.com/motu-tool/mOTUs microbiomeinformatics/pipeline-v5.motus:v2.5.1
GO-slim script Format IPS output 1.0.0 functional annotation https://github.com/EBI-Metagenomics/pipeline-v5/blob/master/tools/GO-slim/go_summary_pipeline-1.0.py microbiomeinformatics/pipeline-v5.go-summary:v1.0 No newline at end of file
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this!

try:
pk = description["@id"] in entry.id
except:
pass
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could throw the exception here and print the stack trace

suggestion:
import traceback
e.g traceback.print_exc()

@jprmachado
Copy link
Collaborator

The #26 and #24 should take into consideration the changes made by this PR

@hariszaf hariszaf merged commit 07b331c into develop Mar 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants