Commit d58fdd5

Merge pull request #19 from AlexandrovLab/dev_v3
SPA v0.0.6
2 parents 493f581 + e75a2a9 commit d58fdd5

30 files changed: 473 additions & 167 deletions

.DS_Store (6 KB): binary file not shown.

README.md

Lines changed: 86 additions & 14 deletions
@@ -1,8 +1,11 @@
 [![License](https://img.shields.io/badge/License-BSD\%202--Clause-orange.svg)](https://opensource.org/licenses/BSD-2-Clause)
 [![Build Status](https://api.travis-ci.com/AlexandrovLab/SigProfilerAssignment.svg)](https://app.travis-ci.com/AlexandrovLab/SigProfilerAssignment)
 
-# SigProfilerAssignment
 
+
+<img src="SigProfilerAssignment/src/figures/SigProfilerAssignment.png" alt="drawing" width="1000"/>
+
+# SigProfilerAssignment
 SigProfilerAssignment is a new mutational attribution and decomposition tool that performs the following functions:
 - Attributing a known set of mutational signatures to an individual sample or multiple samples.
 - Decomposing de novo signatures to COSMIC signature database.
@@ -25,9 +28,37 @@ Unzip the contents of SigProfilerExtractor-master.zip or the zip file of a corre
 $ cd SigProfilerAssignment-master
 $ pip install .
 ```
+## Signature Subtypes
+```python
+signature_subgroups = ['remove_MMR_deficiency_signatures',
+                       'remove_POL_deficiency_signatures',
+                       'remove_HR_deficiency_signatures' ,
+                       'remove_BER_deficiency_signatures',
+                       'remove_Chemotherapy_signatures',
+                       'remove_APOBEC_signatures',
+                       'remove_Tobacco_signatures',
+                       'remove_UV_signatures',
+                       'remove_AA_signatures',
+                       'remove_Colibactin_signatures',
+                       'remove_Artifact_signatures',
+                       'remove_Lymphoid_signatures']
+```
 
 
-Decomposes the De Novo Signatures into COSMIC Signatures and assigns COSMIC signatures into samples
+| Signature Subgroup | SBS Signatures that are excluded |
+| ----------- | ----------- |
+|MMR_deficiency_signatures| 6, 14, 15, 20, 21, 26, 44|
+|POL_deficiency_signatures| 10a, 10b, 10c, 10d, 28|
+|HR_deficiency_signatures| 3|
+|BER_deficiency_signatures| 30, 36|
+|Chemotherapy_signatures| 11, 25, 31, 35, 86, 87, 90|
+|APOBEC_signatures| 2, 13|
+|Tobacco_signatures |4, 29, 92|
+|UV_signatures| 7a, 7b, 7c, 7d, 38|
+|AA_signatures| 22|
+|Colibactin_signatures| 88|
+|Artifact_signatures| 27, 43, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60|
+|Lymphoid_signatures| 9, 84, 85|
 
 <!--
 ```python
@@ -36,29 +67,62 @@ spa_analyze( samples, output, signatures=None, signature_database=None,decompo
 genome_build="GRCh37", make_decomposition_plots=True, collapse_to_SBS96=True,connected_sigs=True, verbose=False):
 ``` -->
 ### Decompose Fit
+Decomposes the De Novo Signatures into COSMIC Signatures and assigns COSMIC signatures into samples.
+<img src="SigProfilerAssignment/src/figures/decomp_pic.jpg" alt="drawing" width="600"/>
+
 ```python
 from SigProfilerAssignment import Analyzer as Analyze
-Analyze.decompose_fit(samples, output, signatures=None, signature_database=None,genome_build="GRCh37", make_decomposition_plots=True, collapse_to_SBS96=True,connected_sigs=True, verbose=False)
+Analyze.decompose_fit(samples,
+                      output,
+                      signatures=signatures,
+                      signature_database=sigs,
+                      genome_build="GRCh37",
+                      verbose=False,
+                      new_signature_thresh_hold=0.8,
+                      signature_subgroups=signature_subgroups)
 ```
 ### *De Novo* Fit
+Attributes mutations of given Samples to input denovo signatures.
+<img src="SigProfilerAssignment/src/figures/denovo_fit.jpg" alt="drawing" width="600"/>
+
 ```python
 from SigProfilerAssignment import Analyzer as Analyze
-Analyze.denovo_fit(samples, output, signatures=None, signature_database=None,genome_build="GRCh37", make_decomposition_plots=True, collapse_to_SBS96=True,connected_sigs=True, verbose=False)
+Analyze.denovo_fit( samples,
+                    output,
+                    signatures=signatures,
+                    signature_database=sigs,
+                    genome_build="GRCh37",
+                    verbose=False)
 ```
-### Cosmic Fit
+### COSMIC Fit
+Attributes mutations of given Samples to input COSMIC signatures. Note that penalties associated with denovo fit and COSMIC fits are different.
+<img src="SigProfilerAssignment/src/figures/cosmic_fit.jpg" alt="drawing" width="600"/>
+
 ```python
 from SigProfilerAssignment import Analyzer as Analyze
-Analyze.cosmic_fit(samples, output, signatures=None, signature_database=None,genome_build="GRCh37", make_decomposition_plots=True, collapse_to_SBS96=True,connected_sigs=True, verbose=False)
+Analyze.cosmic_fit( samples,
+                    output,
+                    signatures=None,
+                    signature_database=sigs,
+                    genome_build="GRCh37",
+                    verbose=False,
+                    collapse_to_SBS96=False,
+                    signature_subgroups=signature_subgroups,
+                    make_plots=True)
 ```
-## Parameters
+## Main Parameters
 | Parameter | Variable Type | Parameter Description |
 | --------------------- | -------- |-------- |
 | **signatures** | String | Path to a tab delimited file that contains the signaure table where the rows are mutation types and colunms are signature IDs. |
 | **activities** | String | Path to a tab delimilted file that contains the activity table where the rows are sample IDs and colunms are signature IDs. |
 | **samples** | String | Path to a tab delimilted file that contains the activity table where the rows are mutation types and colunms are sample IDs. |
 | **output** | String | Path to the output folder. |
 | **genome_build** | String | The genome type. Example: "GRCh37", "GRCh38", "mm9", "mm10". The default value is "GRCh37" |
+| **new_signature_thresh_hold**|Float | Parameter in Cosine similarity to declare a new signature. Applicable for decompose fit only. The default value is 0.8 |
+| **make_plots** | Boolean | Toggle on and off for making and saving all plots. Default value is True. |
+| **signature_subgroups** | List | Removes the signatures corresponding to specific subtypes for better fitting. The usage is given above. Default value is None. |
 | **verbose** | Boolean | Prints statements. Default value is False. |
+
 
 
 #### SPA analysis Example
@@ -70,15 +134,23 @@ import SigProfilerAssignment as spa
 from SigProfilerAssignment import Analyzer as Analyze
 
 #set directories and paths to signatures and samples
-dir_inp = spa.__path__[0]+'/data/Examples/'
-signatures = dir_inp+"Results_scenario_8/SBS96/All_Solutions/SBS96_3_Signatures/Signatures/SBS96_S3_Signatures.txt"
-activities=dir_inp+"Results_scenario_8/SBS96/All_Solutions/SBS96_3_Signatures/Activities/SBS96_S3_NMF_Activities.txt"
-samples=dir_inp+"Input_scenario_8/Samples.txt"
-output="output_example/"
-sigs= "COSMIC_v3_SBS_GRCh37_noSBS84-85.txt" #Custom Signature Database
+dir_inp = spa.__path__[0]+'/data/Examples/'
+signatures = dir_inp+"Results_scenario_8/SBS96/All_Solutions/SBS96_3_Signatures/Signatures/SBS96_S3_Signatures.txt"
+activities = dir_inp+"Results_scenario_8/SBS96/All_Solutions/SBS96_3_Signatures/Activities/SBS96_S3_NMF_Activities.txt"
+samples = dir_inp+"Input_scenario_8/Samples.txt"
+output = "output_example/"
+sigs = "COSMIC_v3_SBS_GRCh37_noSBS84-85.txt" #Custom Signature Database
 
 #Analysis of SP Assignment
-Analyze.cosmic_fit( samples, output, signatures=None,signature_database=sigs,genome_build="GRCh37", verbose=False)
+Analyze.cosmic_fit( samples,
+                    output,
+                    signatures=None,
+                    signature_database=sigs,
+                    genome_build="GRCh37",
+                    verbose=False,
+                    collapse_to_SBS96=False,
+                    signature_subgroups=signature_subgroups,
+                    make_plots=True)
 
 ```
 ## <a name="copyright"></a> Copyright
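The subgroup table in the README diff maps each `remove_*` entry of `signature_subgroups` to the SBS signatures it excludes. The Python sketch below illustrates what that exclusion amounts to by filtering a list of COSMIC-style signature names. It is a minimal sketch under stated assumptions, not SigProfilerAssignment's internal code: the names `EXCLUDED_SBS`, `excluded_signatures` and `filter_signatures` are hypothetical, and it assumes signatures are named `SBS<id>` as in COSMIC.

```python
# Hypothetical mapping transcribed from the README table in this commit;
# it is NOT the library's internal data structure.
EXCLUDED_SBS = {
    "MMR_deficiency_signatures": ["6", "14", "15", "20", "21", "26", "44"],
    "POL_deficiency_signatures": ["10a", "10b", "10c", "10d", "28"],
    "HR_deficiency_signatures": ["3"],
    "BER_deficiency_signatures": ["30", "36"],
    "Chemotherapy_signatures": ["11", "25", "31", "35", "86", "87", "90"],
    "APOBEC_signatures": ["2", "13"],
    "Tobacco_signatures": ["4", "29", "92"],
    "UV_signatures": ["7a", "7b", "7c", "7d", "38"],
    "AA_signatures": ["22"],
    "Colibactin_signatures": ["88"],
    # 27, 43, then 45 through 60 inclusive
    "Artifact_signatures": ["27", "43"] + [str(i) for i in range(45, 61)],
    "Lymphoid_signatures": ["9", "84", "85"],
}


def excluded_signatures(signature_subgroups):
    """Return the set of SBS names removed by the given 'remove_*' flags (None means none)."""
    excluded = set()
    for flag in signature_subgroups or []:
        if not flag.startswith("remove_"):
            raise ValueError(f"unexpected subgroup flag: {flag!r}")
        key = flag[len("remove_"):]
        if key not in EXCLUDED_SBS:
            raise ValueError(f"unknown subgroup: {key!r}")
        # Assumes COSMIC-style names such as 'SBS7a'.
        excluded.update("SBS" + sig_id for sig_id in EXCLUDED_SBS[key])
    return excluded


def filter_signatures(signature_names, signature_subgroups):
    """Keep only signatures not excluded by the selected subgroups, preserving order."""
    excluded = excluded_signatures(signature_subgroups)
    return [name for name in signature_names if name not in excluded]


if __name__ == "__main__":
    cosmic = ["SBS1", "SBS2", "SBS5", "SBS7a", "SBS13", "SBS40"]
    print(filter_signatures(cosmic, ["remove_APOBEC_signatures", "remove_UV_signatures"]))
    # → ['SBS1', 'SBS5', 'SBS40']
```

Raising on an unrecognized flag is a design choice of this sketch; how the library itself treats an unknown entry is not stated in this commit.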

SigProfilerAssignment/.DS_Store (0 Bytes): binary file not shown.
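The README describes `new_signature_thresh_hold` as a cosine-similarity cutoff (default 0.8) for declaring a new signature, applicable to decompose fit only. A minimal sketch of that idea follows, assuming the check compares a de novo signature with its reconstruction from weighted reference signatures and flags it as new when the similarity falls below the threshold. The weights are taken as given; the NNLS-based fitting and penalties used by SigProfilerAssignment are not reproduced, and `reconstruct` and `is_new_signature` are hypothetical names.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def reconstruct(weights, references):
    """Hypothetical helper: weighted sum of reference signatures.

    `weights` maps reference names to exposures assumed to come from some
    upstream fit; the fitting step itself is not shown here.
    """
    length = len(next(iter(references.values())))
    total = [0.0] * length
    for name, weight in weights.items():
        for i, value in enumerate(references[name]):
            total[i] += weight * value
    return total


def is_new_signature(denovo, weights, references, thresh_hold=0.8):
    """Hypothetical check: flag `denovo` as new if its reconstruction is too dissimilar.

    Returns (is_new, similarity).
    """
    similarity = cosine_similarity(denovo, reconstruct(weights, references))
    return similarity < thresh_hold, similarity


if __name__ == "__main__":
    # Toy 4-channel "signatures"; real SBS96 signatures have 96 channels.
    refs = {"A": [1.0, 0.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0, 0.0]}
    print(is_new_signature([0.5, 0.5, 0.0, 0.0], {"A": 0.5, "B": 0.5}, refs))
    print(is_new_signature([0.0, 0.0, 1.0, 0.0], {"A": 1.0}, refs))  # → (True, 0.0)
```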

SigProfilerAssignment/Analyzer.py

Lines changed: 6 additions & 6 deletions
@@ -1,11 +1,11 @@
 from SigProfilerAssignment import decomposition as decomp
 
-def decompose_fit(samples, output, signatures=None, signature_database=None,nnls_add_penalty=0.05, nnls_remove_penalty=0.01, initial_remove_penalty=0.05,genome_build="GRCh37", make_decomposition_plots=True, collapse_to_SBS96=True,connected_sigs=True, verbose=False,devopts=None):
+def decompose_fit(samples, output, signatures=None, signature_database=None,nnls_add_penalty=0.05, nnls_remove_penalty=0.01, initial_remove_penalty=0.05,genome_build="GRCh37", make_plots=True, collapse_to_SBS96=True,connected_sigs=True, verbose=False,devopts=None,new_signature_thresh_hold=0.8,signature_subgroups=None):
 
-    decomp.spa_analyze(samples=samples, output=output, signatures=signatures, signature_database=signature_database,nnls_add_penalty=nnls_add_penalty, nnls_remove_penalty=nnls_remove_penalty, initial_remove_penalty=initial_remove_penalty,genome_build=genome_build, make_decomposition_plots=make_decomposition_plots, collapse_to_SBS96=collapse_to_SBS96,connected_sigs=connected_sigs, verbose=verbose,decompose_fit_option= True,denovo_refit_option=False,cosmic_fit_option=False,devopts=devopts)
+    decomp.spa_analyze(samples=samples, output=output, signatures=signatures, signature_database=signature_database,nnls_add_penalty=nnls_add_penalty, nnls_remove_penalty=nnls_remove_penalty, initial_remove_penalty=initial_remove_penalty,genome_build=genome_build, make_plots=make_plots, collapse_to_SBS96=collapse_to_SBS96,connected_sigs=connected_sigs, verbose=verbose,decompose_fit_option= True,denovo_refit_option=False,cosmic_fit_option=False,devopts=devopts,new_signature_thresh_hold=new_signature_thresh_hold,signature_subgroups=signature_subgroups)
 
-def denovo_fit(samples, output, signatures=None, signature_database=None,nnls_add_penalty=0.05,nnls_remove_penalty=0.01, initial_remove_penalty=0.05, genome_build="GRCh37", make_decomposition_plots=True, collapse_to_SBS96=True,connected_sigs=True, verbose=False,devopts=None):
-    decomp.spa_analyze(samples=samples, output=output, signatures=signatures, signature_database=signature_database,nnls_add_penalty=nnls_add_penalty, nnls_remove_penalty=nnls_remove_penalty, initial_remove_penalty=initial_remove_penalty,genome_build=genome_build, make_decomposition_plots=make_decomposition_plots, collapse_to_SBS96=collapse_to_SBS96,connected_sigs=connected_sigs, verbose=verbose,decompose_fit_option= False,denovo_refit_option=True,cosmic_fit_option=False,devopts=devopts)
+def denovo_fit(samples, output, signatures=None, signature_database=None,nnls_add_penalty=0.05,nnls_remove_penalty=0.01, initial_remove_penalty=0.05, genome_build="GRCh37", make_plots=True, collapse_to_SBS96=True,connected_sigs=True, verbose=False,devopts=None,new_signature_thresh_hold=0.8):
+    decomp.spa_analyze(samples=samples, output=output, signatures=signatures, signature_database=signature_database,nnls_add_penalty=nnls_add_penalty, nnls_remove_penalty=nnls_remove_penalty, initial_remove_penalty=initial_remove_penalty,genome_build=genome_build, make_plots=make_plots, collapse_to_SBS96=collapse_to_SBS96,connected_sigs=connected_sigs, verbose=verbose,decompose_fit_option= False,denovo_refit_option=True,cosmic_fit_option=False,devopts=devopts)
 
-def cosmic_fit(samples, output, signatures=None, signature_database=None,nnls_add_penalty=0.05, nnls_remove_penalty=0.01, initial_remove_penalty=0.05,genome_build="GRCh37", make_decomposition_plots=True, collapse_to_SBS96=True,connected_sigs=True, verbose=False,devopts=None):
-    decomp.spa_analyze(samples=samples, output=output, signatures=signatures, signature_database=signature_database,nnls_add_penalty=nnls_add_penalty, nnls_remove_penalty=nnls_remove_penalty, initial_remove_penalty=initial_remove_penalty,genome_build=genome_build, make_decomposition_plots=make_decomposition_plots, collapse_to_SBS96=collapse_to_SBS96,connected_sigs=connected_sigs, verbose=verbose,decompose_fit_option= False,denovo_refit_option=False,cosmic_fit_option=True,devopts=devopts)
+def cosmic_fit(samples, output, signatures=None, signature_database=None,nnls_add_penalty=0.05, nnls_remove_penalty=0.01, initial_remove_penalty=0.05,genome_build="GRCh37", make_plots=True, collapse_to_SBS96=True,connected_sigs=True, verbose=False,devopts=None,signature_subgroups=None):
+    decomp.spa_analyze(samples=samples, output=output, signatures=signatures, signature_database=signature_database,nnls_add_penalty=nnls_add_penalty, nnls_remove_penalty=nnls_remove_penalty, initial_remove_penalty=initial_remove_penalty,genome_build=genome_build, make_plots=make_plots, collapse_to_SBS96=collapse_to_SBS96,connected_sigs=connected_sigs, verbose=verbose,decompose_fit_option= False,denovo_refit_option=False,cosmic_fit_option=True,devopts=devopts,signature_subgroups=signature_subgroups)
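In the updated Analyzer.py, each wrapper passes its arguments through to `decomp.spa_analyze` and selects the analysis mode by setting exactly one of `decompose_fit_option`, `denovo_refit_option` and `cosmic_fit_option` to True. Reading the `+` lines, `denovo_fit` now accepts `new_signature_thresh_hold` but does not forward it, and it does not take `signature_subgroups`. The sketch below shows one way such a dropped argument could be caught in a test: `spa_analyze` is replaced by a hypothetical recorder, and a wrapper's signature is compared with the keywords that actually reach it. The wrappers here are trimmed stand-ins written for illustration, not the real functions, and `unforwarded_params` is a hypothetical helper.

```python
import inspect

# Keywords received by the most recent stand-in spa_analyze call.
recorded = {}


def spa_analyze(**kwargs):
    """Hypothetical stand-in for decomp.spa_analyze: records the keywords it receives."""
    recorded.clear()
    recorded.update(kwargs)


# Trimmed stand-ins mirroring two wrappers from this commit (most parameters omitted).
def denovo_fit(samples, output, signatures=None, make_plots=True,
               new_signature_thresh_hold=0.8):
    spa_analyze(samples=samples, output=output, signatures=signatures,
                make_plots=make_plots, decompose_fit_option=False,
                denovo_refit_option=True, cosmic_fit_option=False)


def cosmic_fit(samples, output, signatures=None, make_plots=True,
               signature_subgroups=None):
    spa_analyze(samples=samples, output=output, signatures=signatures,
                make_plots=make_plots, decompose_fit_option=False,
                denovo_refit_option=False, cosmic_fit_option=True,
                signature_subgroups=signature_subgroups)


def unforwarded_params(wrapper):
    """Call `wrapper` with placeholder values and list accepted-but-unforwarded names."""
    params = inspect.signature(wrapper).parameters
    required = [n for n, p in params.items() if p.default is inspect.Parameter.empty]
    placeholders = [None] * len(required)  # assumes required params are positional
    wrapper(*placeholders)
    return sorted(name for name in params if name not in recorded)


if __name__ == "__main__":
    print(unforwarded_params(denovo_fit))  # → ['new_signature_thresh_hold']
    print(unforwarded_params(cosmic_fit))  # → []
```

Against the real package, a test could instead patch `SigProfilerAssignment.decomposition.spa_analyze` (for example with `unittest.mock.patch`), since the wrappers look it up through the `decomp` module at call time.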
Six further binary files, contents not shown (listed sizes: 267 Bytes, 395 Bytes, -15 Bytes, 3 Bytes; two entries list no size).

0 commit comments
