@RosiTea RosiTea commented Oct 3, 2025

Some small bugs I encountered and fixed during my test run on the Cambridge HPC.

  1. Change the suffix on line 57 of SignificantKmerAnalysis.nf. For some reason, the GFF files passed to the process now have a .gff3 suffix instead of .gff, while the paths in the references.txt file generated by the WriteReferenceText process still end in .gff.

  2. Change lines 214-216 of main.nf (and, as a consequence, lines 88 and 95 of SignificantKmerAnalysis.nf). It seems that we were calling the R script by an absolute host path from inside the container. When the task runs under Singularity, that host path isn’t mounted in the container, so Rscript can’t see it and fails with “No such file or directory”. I worked around this by adding a plot_script channel that passes the script to the process, so Nextflow now treats the R script as a file and stages it into the work directory.

  3. Change line 5 in BaktaAnnotate.nf to update the container.

By the way, I also made some minor changes to the input file examples (e.g. cleaning the file paths to hide the Sanger-ness in them) and edited the README so that it speaks to a broader range of users.

@RosiTea RosiTea requested a review from Lfulcrum October 3, 2025 13:15
do
if [[ \${sample_id} != "sample_id" ]]; then
- echo -e "\${assembly_path}\\t\${sample_id}.gff\\tdraft"
+ echo -e "\${assembly_path}\\t\${sample_id}.gff3\\tdraft"

@Lfulcrum Lfulcrum Nov 11, 2025

Hmm, I wonder: is this related to the annotation method you chose? Perhaps, instead of assuming the file extension here when it could vary by tool (e.g. between bakta and prokka, or with user-specified annotation files), you could:

  • adapt the AnnotateKmers process to stage the gff files with a consistent extension OR
  • name the annotation files output by the tools consistently and document that a specific extension is required if using --mygff

The former is probably the better solution if possible. However, short of staging the gff/gff3 files and then making a symlink to them with a different name, I see no easy/elegant way to do this that also preserves caching 🤔

I suppose you could actually construct the reftext via channel operators joining manifest_ch and gff_files by sample_id? Maybe that would be an approach that prevents us from having to rename anywhere.

The issue I have with this change is that it may work for some input files, but not others.
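
As a lower-tech alternative to the channel-operator idea, the shell loop itself could check which annotation file is actually present instead of hard-coding one extension. The following is only a minimal sketch in plain bash, not the pipeline's code: the function name `write_references`, the comma-separated manifest layout (`sample_id,assembly_path`) and the `gff_dir` argument are all assumptions made for illustration. Inside a Nextflow script block, the `$` signs would also need escaping, as in the diff above.

```shell
# Hypothetical helper: print one references.txt line per sample, using
# whichever annotation file (.gff or .gff3) exists for that sample.
# Assumed manifest layout: sample_id,assembly_path (with a header row).
write_references() {
    local manifest="$1" gff_dir="$2"
    local sample_id assembly_path ext
    while IFS=, read -r sample_id assembly_path; do
        # Skip the header row, as the original loop does.
        [[ "${sample_id}" == "sample_id" ]] && continue
        for ext in gff gff3; do
            if [[ -e "${gff_dir}/${sample_id}.${ext}" ]]; then
                printf '%s\t%s\t%s\n' "${assembly_path}" "${sample_id}.${ext}" "draft"
                continue 2   # next sample
            fi
        done
        # Neither extension was found: fail loudly rather than guess.
        echo "no annotation file found for ${sample_id}" >&2
        return 1
    done < "${manifest}"
}
```

Called as `write_references manifest.csv . > references.txt`, this would write the name of the file that is really staged, whichever tool produced it. It still assumes the file is named after the sample_id, so it does not cover arbitrarily named user-supplied annotation files.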

genehit = AnnotateKmers.out.annotated_kmers_out

GeneHitPlot(genehit)
plot_script = Channel.value(file("${projectDir}/scripts/gene_hit_summary_plot.R"))

I find it a bit odd that this is necessary; it feels wrong to supply the script to the process as a value channel. Can't we work around this using a more native mechanism, e.g. putting the script inside a bin directory instead of scripts in the root of this repo? I think Nextflow will automatically make such scripts available, even when running within a Singularity container.

You might have to supply a shebang to your gene_hit_summary_plot.R script, like:

#!/usr/bin/env Rscript

And make the script executable with chmod. Then, within the GeneHitPlot process, we could simply have:

gene_hit_summary_plot.R ${genehit}
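
The shebang and chmod steps could be done once when moving the script. The following is a rough sketch in bash, assuming the script is copied into a bin directory at the repository root; the helper name `install_bin_script` is hypothetical and not part of the pipeline.

```shell
# Hypothetical helper: copy an R script into a bin/ directory, make sure it
# starts with an Rscript shebang, and mark it executable.
install_bin_script() {
    local src="$1" bin_dir="$2"
    local dest shebang='#!/usr/bin/env Rscript'
    dest="${bin_dir}/$(basename "${src}")"
    mkdir -p "${bin_dir}"
    if [[ "$(head -n 1 "${src}")" == "${shebang}" ]]; then
        # Shebang already present: copy the file unchanged.
        cp "${src}" "${dest}"
    else
        # Prepend the shebang, then the original script body.
        { printf '%s\n' "${shebang}"; cat "${src}"; } > "${dest}"
    fi
    chmod +x "${dest}"
}
```

For example, `install_bin_script scripts/gene_hit_summary_plot.R bin` run from the repository root. The executable bit has to be committed as well, so that it survives a fresh clone on the cluster.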
