Ro crate for the metaGOflow output #27

hariszaf · 2023-03-25T00:13:16Z

This PR aims at building RO-crates of the metaGOflow output.

It requires the installation of rocrate as a dependency.

In general, this approach is not ideal, because it does not exploit the CWL descriptions.
However, it does the job and we get valid RO-Crates.
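
For readers unfamiliar with the format, here is a minimal sketch of what packaging an output directory as a zipped RO-Crate involves. It is not the code in this PR (which relies on the rocrate library); it uses only the standard library, and the names `build_metadata` and `write_crate_zip` are hypothetical. It only illustrates the layout of `ro-crate-metadata.json`; a complete crate would normally also carry a description, a license and richer per-file metadata.

```python
import json
import zipfile
from datetime import date
from pathlib import Path

METADATA_NAME = "ro-crate-metadata.json"


def build_metadata(name, files, date_published=None):
    """Return a minimal RO-Crate 1.1 metadata document as a dict."""
    graph = [
        {
            # Descriptor entity: points at the root dataset.
            "@id": METADATA_NAME,
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {
            # Root dataset: lists every packaged file.
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "datePublished": date_published or date.today().isoformat(),
            "hasPart": [{"@id": f} for f in files],
        },
    ]
    graph.extend({"@id": f, "@type": "File"} for f in files)
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


def write_crate_zip(output_dir, zip_path, name):
    """Zip every file under output_dir together with generated metadata."""
    output_dir = Path(output_dir)
    files = sorted(
        p.relative_to(output_dir).as_posix()
        for p in output_dir.rglob("*")
        if p.is_file() and p.name != METADATA_NAME
    )
    metadata = build_metadata(name, files)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(METADATA_NAME, json.dumps(metadata, indent=2))
        for rel in files:
            zf.write(output_dir / rel, arcname=rel)
    return metadata
```

Writing the archive outside the output directory keeps the .zip from being picked up as one of its own members on a re-run.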

hariszaf · 2023-03-25T00:22:30Z

As it is, we keep both the output directory and the .zip file with the complete RO-Crate.
I think we should keep only one of them: either remove the output directory and keep just the .zip, or vice versa.
No strong opinion though.

jprmachado

LGTM. There are minor points to consider, but if it is merged in its current state I don't have any issues.

For the CWL files in particular, I don't have a broad enough view to judge the impact; they pass the syntax check and I will assume they were tested.

From my side it is approved, and the changes are up to you.

jprmachado · 2023-03-25T15:48:18Z

config.yml


-# For parallelization cases might worth decrease global value of threads
-interproscan_threads: 20
+# As a rule of thumb keep that as floor(threads/8) where threads the previous parameter


This is a good rule. However, it is not guaranteed that the work will always be split into 8 instances, at least from my reading of the cwltool code. It may need fine-tuning in practice, but /8 is a good general starting point.
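
A small sketch of the rule of thumb from the config comment, assuming we never want the derived value to drop below 1 (that floor is my assumption, not something stated in config.yml). The function name is hypothetical.

```python
def interproscan_threads(threads, divisor=8):
    """Derive interproscan_threads as floor(threads / divisor), at least 1."""
    if threads < 1:
        raise ValueError("threads must be a positive integer")
    # Integer division implements floor() for positive values.
    return max(1, threads // divisor)


print(interproscan_threads(40))  # 40 // 8 = 5
print(interproscan_threads(20))  # 20 // 8 = 2
```

The divisor is kept as a parameter because, as noted above, the actual number of parallel instances may differ from 8.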

jprmachado · 2023-03-25T15:49:25Z

config.yml


 # Global
-threads: 20
+threads: 40


I am comfortable with changing config.yml parameters; I am just not sure whether that is what is intended in this PR.

That's zorbas-oriented.
Feel free to have a nice config in the documentation branch and we keep the one from there! 👍

jprmachado · 2023-03-25T15:50:29Z

ro-crate-metadata-example.json

+            "name": "Apache License 2.0"
+        }
+    ]
+}


jprmachado · 2023-03-25T15:54:00Z

run_wf.sh

+  mv allfiles.gz ${prefix}".tsv.gz"
+fi 
+
+cd ../../../


Could we store the target path in a variable beforehand and then cd to it here? This kind of navigation inside scripts always makes me nervous.

Yes, you're right, it's an extremely quick-and-dirty way here.
I'll fix that.

jprmachado · 2023-03-25T15:56:29Z

slurm_run.sh

 # Load module
 module load python/3.7.8
 module load singularity/3.7.1 



Doesn't this require setting up the environment here? I have never run it directly with sbatch, only via toil.

It's a funny thing that your job may inherit environment variables from your local account, if I am not mistaken.

The sbatch file could be removed altogether since, as you said, it only applies to the HPC case, and only if Slurm is used.

We could keep it as an example, but I have no strong opinion on that. Let's decide this on the documentation branch.

jprmachado · 2023-03-25T15:57:16Z

software_versions.tsv

+hmmer	software	HMMER searches biological sequence databases for homologous sequences, using either single sequences or multiple sequence alignments as queries. HMMER implements a technology called "profile hidden Markov models" (profile HMMs). 	3.2.1	functional annotation	https://github.com/EddyRivasLab/hmmer	microbiomeinformatics/pipeline-v5.hmmer:v3.2.1
+megahit	software	MEGAHIT is an ultra-fast and memory-efficient NGS assembler. It is optimized for metagenomes, but also works well on generic single genome assembly (small or mammalian size) and single-cell assembly.	1.2.9	assembly	https://github.com/voutcn/megahit	quay.io/biocontainers/megahit:1.2.9--h2e03b76_1
+mOTUs	software	The mOTU profiler is a computational tool that estimates relative taxonomic abundance of known and currently unknown microbial community members using metagenomic shotgun sequencing data.	2.5.1	taxonomy inventory	https://github.com/motu-tool/mOTUs	microbiomeinformatics/pipeline-v5.motus:v2.5.1
+GO-slim	script	Format IPS output 	1.0.0	functional annotation	https://github.com/EBI-Metagenomics/pipeline-v5/blob/master/tools/GO-slim/go_summary_pipeline-1.0.py	microbiomeinformatics/pipeline-v5.go-summary:v1.0


I like this!

jprmachado · 2023-03-25T16:01:01Z

utils/edit-ro-crate.py

+            try:
+                pk = description["@id"] in entry.id
+            except:
+                pass


We could surface the exception here and print the stack trace instead of silently ignoring it.

Suggestion:
import traceback
e.g. traceback.print_exc()
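
A sketch of what that could look like, assuming the only failures worth tolerating are a missing "@id" key or a non-iterable entry id. The helper name `id_matches` is hypothetical and the surrounding loop from edit-ro-crate.py is omitted.

```python
import traceback


def id_matches(description, entry_id):
    """Return True if description's "@id" occurs in entry_id.

    On a malformed record, print the stack trace and return False
    instead of swallowing the error with a bare `except: pass`.
    """
    try:
        return description["@id"] in entry_id
    except (KeyError, TypeError):
        # Still non-fatal, but the problem is now visible on stderr.
        traceback.print_exc()
        return False
```

Catching specific exception types rather than using a bare `except` also lets unrelated errors such as KeyboardInterrupt propagate.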

jprmachado · 2023-03-25T16:11:49Z

#26 and #24 should take the changes made by this PR into account.

hariszaf and others added 23 commits March 22, 2023 01:47

init branch for issue #18

a6ec747

software versions file

36e80fb

remove unecessary output files; init ro-crate script

5c85a97

remove unecessary output files; init ro-crate script

78333a1

Merge remote-tracking branch 'origin/develop' into ro-crate

7ed085d

minor edits

4f5e357

Merge branch 'develop' into ro-crate

29f5ff3

fix summary qc

08c62cc

Merge remote-tracking branch 'origin/develop' into ro-crate

664b825

fix qc_summary output filenames

e88632d

fix qc_summary output filenames

cfc54fd

remove prov output folder

e4dc458

all test cases

121c570

logic to address #18

5d96e20

build ro-crate as zip #18

a3d5ffd

ro-crate example #18

74ab4f2

edits to keep only for the record; remove file afterwards

2821479

drop script

06fa1f2

edit wf output to remove chunks and merge ips parts to single file

19e53db

add embrc records #18

746c603

return ips chunks

e5fa080

set threads properly to exploit fat node on zorbas hpc

7a9eb65

edit ENA description #18

d45aaca

hariszaf requested review from jprmachado and steninidak March 25, 2023 00:13

hariszaf added 2 commits March 25, 2023 14:35

minor edits reg #18

9a537aa

ignore dev outputs

a3db2c9

jprmachado approved these changes Mar 25, 2023

hariszaf added 4 commits March 25, 2023 20:31

cwd var

e2ceb74

add dataset metadata #18

c327600

remove output dir so we keep only the .zip file that includes the proper ro-crate-metadata

4e7f671

update the example ro-crate-metadata #18

d042146

hariszaf merged commit 07b331c into develop Mar 25, 2023


3 participants