Sample Processing
When included in your sample prep, the UMI resides in the R2 reads and is handled by the Toolkit. If you only did single-end (SE) sequencing, you will not have a UMI and will not be able to deduplicate reads.
Deduplication starts with the barcode parser, which identifies the UMI by its position in the read and labels it with the XU tag for downstream processing. After the reads are aligned to the reference genome, the deduplication step finds each aligned sequence and its UMI, builds a graph that corrects for potential sequencing errors in the UMI, and collapses the graph to remove the reads determined to be duplicates.
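The graph-based collapse described above can be illustrated with a minimal Python sketch. This is not the Toolkit's implementation: the input tuple layout, the `dedup` function, and the one-mismatch threshold are all assumptions chosen for illustration, and the clustering shown is a plain connected-components approach. Reads at the same position whose UMIs differ by at most one base are treated as the same molecule.

```python
from collections import defaultdict
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))


def dedup(reads, max_mismatches=1):
    """Keep one read name per (alignment position, UMI cluster).

    reads: iterable of (read_name, chrom, pos, strand, umi) tuples.
    This tuple layout is hypothetical, not a Toolkit format.
    """
    # Group reads aligned to the same position and strand, then by UMI.
    by_position = defaultdict(lambda: defaultdict(list))
    for name, chrom, pos, strand, umi in reads:
        by_position[(chrom, pos, strand)][umi].append(name)

    kept = []
    for umis in by_position.values():
        # Graph: nodes are UMIs, edges join UMIs within max_mismatches.
        neighbours = {u: set() for u in umis}
        for a, b in combinations(umis, 2):
            if len(a) == len(b) and hamming(a, b) <= max_mismatches:
                neighbours[a].add(b)
                neighbours[b].add(a)

        # Collapse each connected component, starting from the most
        # abundant UMI so that it becomes the representative.
        seen = set()
        for start in sorted(umis, key=lambda u: (-len(umis[u]), u)):
            if start in seen:
                continue
            seen.add(start)
            stack = [start]
            while stack:
                node = stack.pop()
                for nxt in neighbours[node]:
                    if nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
            kept.append(umis[start][0])
    return kept
```

In this sketch, a UMI that appears once and differs by a single base from a more abundant UMI at the same position is assumed to be a sequencing error and is merged into it.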
UMI tagging is done at the BAM file level, in the UMI tagging process that runs as part of deduplication; this is where the UMI is added to the XU tag. Other tags can be used as well, and the SAM specification provides a full list.
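To show what such a tag looks like in practice, here is a small Python sketch that reads an optional tag from one SAM-formatted alignment line. In SAM text, optional fields follow the 11 mandatory columns and have the form `TAG:TYPE:VALUE`. The `get_tag` helper and the example record are hypothetical, and the sketch assumes the UMI is stored as a string (`Z`) value.

```python
def get_tag(sam_line, tag):
    """Return the value of an optional SAM tag, or None if it is absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # Columns 1-11 are mandatory; optional TAG:TYPE:VALUE fields follow.
    for field in fields[11:]:
        name, _type, value = field.split(":", 2)
        if name == tag:
            return value
    return None


# Made-up alignment record, for illustration only.
example = "\t".join([
    "read1", "0", "chr1", "100", "60", "8M", "*", "0", "0",
    "ACGTACGT", "IIIIIIII", "NH:i:1", "XU:Z:ACGTACGT",
])
umi = get_tag(example, "XU")
```

For real BAM files, a library such as pysam is the usual way to read tags; the plain-text parsing here is only meant to show the field layout.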
This can occur when SEQuoia Complete data sets are run in the SEQuoia Express Toolkit, or when the UMI was not added during sample prep.
The Toolkit has default trimming quality cutoffs. These defaults are suggestions and can be modified to suit your needs. After running the Toolkit for the first time, you will have FastQC output showing the quality of the reads, which allows for more informed trimming if low-quality reads are present.
The SEQuoia Express Toolkit has an option that lets users filter the reads based on a threshold. The option has two parts:
- minGeneType = "none": one of ["none", "reads", "RPKM", "TPM"]
- minGeneCutoff = 0: the threshold you want to use

The results of this filtering are not in the report folder; instead they are put in output/SampleFiles/sample_name/RNACounts
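As an illustration of how the two settings could interact, the following Python sketch filters a table of per-gene values. It is a guess at the semantics, not the Toolkit's code: the `filter_genes` function, the row layout, and the choice of "greater than or equal to" for the comparison are all assumptions.

```python
ALLOWED_TYPES = ("none", "reads", "RPKM", "TPM")


def filter_genes(rows, min_gene_type="none", min_gene_cutoff=0):
    """Keep genes whose chosen metric meets the cutoff.

    rows: list of dicts with "gene", "reads", "RPKM" and "TPM" keys
    (a hypothetical layout, not the Toolkit's output format).
    """
    if min_gene_type not in ALLOWED_TYPES:
        raise ValueError(f"minGeneType must be one of {ALLOWED_TYPES}")
    if min_gene_type == "none":
        # No filtering requested: return every gene.
        return list(rows)
    # Assumption: a gene passes when its value is >= the cutoff.
    return [row for row in rows if row[min_gene_type] >= min_gene_cutoff]


# Made-up values, for illustration only.
counts = [
    {"gene": "geneA", "reads": 120, "RPKM": 15.2, "TPM": 30.1},
    {"gene": "geneB", "reads": 3, "RPKM": 0.4, "TPM": 0.9},
]
passing = filter_genes(counts, min_gene_type="reads", min_gene_cutoff=10)
```

With the default `minGeneType = "none"`, no genes are removed regardless of the cutoff value.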