You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Added merging of headers when using the sfm command with a directory
containing multiple files as input.
The merging respects the requirements outlined in the SAM spec:
- order of sequence dictionaries is kept
- read group identifiers are made unique if needed + optional rg tags of
reads are updated accordingly
- program line identifiers are made unique if needed + optional pg tags
of reads are updated accordingly
- comment lines are merged
- the sorting order is set to Unknown
If elprep cannot merge while guaranteeing the above constraints, an
error is produced.
Copy file name to clipboardExpand all lines: README.md
+1-1Lines changed: 1 addition & 1 deletion
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -772,7 +772,7 @@ file is created for each entry in the sequence dictionary.
772
772
773
773
The elprep split command requires two arguments: 1) the input file or a path to multiple input files and 2) a path to a directory where elPrep can store the split files. The input file(s) can be .sam or .bam. It is also possible to use /dev/stdin as the input for using Unix pipes. There are no structural requirements on the input file(s) for using elprep split. For example, it is not necessary to sort the input file, nor is it necessary to convert to .bam or index the input file.
774
774
775
-
Warning: If you pass a path to multiple input files to the elprep split command, elprep assumes that they all have the same (or compatible) headers, and just picks the first header it finds as the header for all input files. elprep currently does not make an attempt to resolve potential conflicts between headers, especially with regard to the @SQ, @RG, or @PG header records. We will include proper merging of different SAM/BAM files in a future version of elprep. In the meantime, if you need proper merging of SAM/BAM files, please use samtools merge, Picard MergeSamFiles, or a similar tool. (If such a tool produces SAM file as output, it can be piped into elprep using Unix pipes.)
775
+
If you pass a path to multiple input files to the elprep split command, elprep attempts to merge the headers, resolving potential conflicts by adhering to the SAM specification. Specifically, while merging headers: 1) the order of sequence dictionaries must be kept (@sq); 2) read group identifiers must be unique (@rg) and in case of collisions elprep makes the IDs unique and updates optional @rg tags in alignments accordingly; 3) program identifiers must be unique (@pg) and elprep changes IDs to be unique in case of collisions and updates optional @pg tags in alignments accordingly; 4) comment lines are all merged (any order); 5) the order of the header is updated (GO entry). In case the headers are incompatible and merging violates any of the SAM specification requirements, elPrep produces an error and aborts the execution of the command.
776
776
777
777
elPrep creates the output directory denoted by the output path, unless the directory already exists, in which case elPrep may override the existing files in that directory. Please make sure elPrep has the correct permissions for writing that directory.
0 commit comments