Description
pairtools/pairtools/cli/select.py, line 230 in 7e69d6c:
body_stream, condition, column_names, type_cast, startup_code
evaluate_stream receives the full, unmodified stream of pairs, yet column_names comes from the modified header, and column_scheme likewise refers to the reduced list of columns. Field positions are therefore looked up in a header that does not match the rows being evaluated. This can cause a "silent" bug: the output looks as if pairs have been filtered, but not all of the conditions are actually met.
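One possible way to keep the two consistent, sketched here under assumptions: resolve column names against the header that describes the rows actually being evaluated (the original, unmodified one), and drop columns only when writing the output. The function filter_then_drop and its parameters are hypothetical and are not meant to reflect how select.py is organized.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Set


# Hypothetical helper, not pairtools code: evaluate the condition using the
# header that matches the incoming rows, then remove columns on output only.
def filter_then_drop(
    rows: Iterable[List[str]],
    input_columns: List[str],
    removed: Set[str],
    condition: Callable[[Dict[str, str]], bool],
) -> Iterator[List[str]]:
    # Positions of the surviving columns, computed from the *input* header.
    keep_idx = [i for i, c in enumerate(input_columns) if c not in removed]
    for row in rows:
        # Name every field with the header that describes this row's layout.
        record = dict(zip(input_columns, row))
        if condition(record):
            yield [row[i] for i in keep_idx]


columns = ["readID", "mapq1", "read_len1"]
rows = [["a", "60", "150"], ["b", "3", "150"]]
kept = list(
    filter_then_drop(rows, columns, {"read_len1"}, lambda r: int(r["mapq1"]) >= 30)
)
print(kept)  # [['a', '60']]
```

Because the condition never sees the reduced header, removing columns cannot shift which field a name like mapq1 refers to.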
Example:
Say we start with a pairs file with these columns: #columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2 pair_type walk_pair_index walk_pair_type read_len1 read_len2 mapq1 mapq2 ...
and we pass --remove-columns read_len1,read_len2. Any filtering expression that refers to mapq1/mapq2 will then actually be evaluated on the columns holding read_len1/read_len2, leading to incorrect results.
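The mismatch in this example can be reproduced with plain list indexing. This is an illustrative sketch, not pairtools code: the column list is the one above truncated after mapq2, while the row values and the mapq1 >= 30 condition are made up for the demonstration.

```python
# Column names from the example above (truncated after mapq2).
ORIGINAL_COLUMNS = [
    "readID", "chrom1", "pos1", "chrom2", "pos2", "strand1", "strand2",
    "pair_type", "walk_pair_index", "walk_pair_type",
    "read_len1", "read_len2", "mapq1", "mapq2",
]
REMOVED = {"read_len1", "read_len2"}
# Header after --remove-columns: the two read_len columns are gone.
MODIFIED_COLUMNS = [c for c in ORIGINAL_COLUMNS if c not in REMOVED]

# One pair from the *unmodified* stream: it still has all 14 fields.
row = ["read1", "chr1", "100", "chr1", "5000", "+", "-", "UU", "1", "R1-2",
       "150", "148", "3", "60"]


def get_field(columns, row, name):
    """Look up a field by name, using the positions given by `columns`."""
    return row[columns.index(name)]


def passes(columns, row):
    # Hypothetical filter condition: keep pairs with mapq1 >= 30.
    return int(get_field(columns, row, "mapq1")) >= 30


print(get_field(MODIFIED_COLUMNS, row, "mapq1"))  # 150 (actually read_len1)
print(get_field(ORIGINAL_COLUMNS, row, "mapq1"))  # 3
print(passes(MODIFIED_COLUMNS, row))  # True: the low-MAPQ pair slips through
print(passes(ORIGINAL_COLUMNS, row))  # False
```

In the reduced header mapq1 sits at position 10, which in the full row is read_len1, so the condition silently tests read length instead of mapping quality.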