SigProfilerAssignment is a new mutational attribution and decomposition tool that performs the following functions:
- - Attributing a known set of mutational signatures to an individual sample or multiple samples.
- - Decomposing de novo signatures to COSMIC signature database.
- - Attributing COSMIC database or a custom signature database to given samples.
+ SigProfilerAssignment enables assignment of previously known mutational signatures to individual samples and individual somatic mutations. The tool refits different types of reference mutational signatures, including [COSMIC signatures](https://cancer.sanger.ac.uk/signatures/), as well as custom signature databases. Refitting of known mutational signatures is a numerical optimization approach that not only identifies the set of operative mutational signatures in a particular sample, but also quantifies the number of mutations assigned to each signature found in that sample. SigProfilerAssignment makes use of [SigProfilerMatrixGenerator](https://github.com/AlexandrovLab/SigProfilerMatrixGenerator) and [SigProfilerPlotting](https://github.com/AlexandrovLab/SigProfilerPlotting), seamlessly integrating with other [SigProfiler tools](https://cancer.sanger.ac.uk/signatures/tools/).
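To make the idea of refitting concrete, the following is a minimal sketch of a non-negative refit on toy data. It is not SigProfilerAssignment's own optimization routine: the function name `refit`, the multiplicative-update scheme, the two four-channel signatures, and the sample counts are all assumptions chosen for illustration.

```python
def refit(sample, signatures, iterations=2000):
    """Estimate non-negative activities h with sum_k h[k] * signatures[k] ~= sample.

    Illustrative only: uses Lee-Seung style multiplicative updates for a
    least-squares fit, which is an assumption and not the tool's own solver.
    """
    n_sigs = len(signatures)
    channels = range(len(sample))
    # Start from an even split of the mutation burden across signatures.
    h = [sum(sample) / n_sigs] * n_sigs
    for _ in range(iterations):
        # Reconstruct the sample from the current activities.
        recon = [sum(h[k] * signatures[k][c] for k in range(n_sigs)) for c in channels]
        for k in range(n_sigs):
            num = sum(signatures[k][c] * sample[c] for c in channels)
            den = sum(signatures[k][c] * recon[c] for c in channels)
            if den > 0:
                h[k] *= num / den  # activities stay non-negative by construction
    return h


# Hypothetical toy signatures over four mutation channels (each sums to 1).
sig_a = [0.7, 0.1, 0.1, 0.1]
sig_b = [0.1, 0.1, 0.1, 0.7]
# A sample built from 30 mutations of sig_a and 70 mutations of sig_b.
sample = [30 * a + 70 * b for a, b in zip(sig_a, sig_b)]

activities = refit(sample, [sig_a, sig_b])
print([round(x, 2) for x in activities])
```

The recovered activities play the role of the per-signature mutation counts described above: one non-negative number per reference signature for each sample.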
+
+ For users who prefer working in an R environment, a wrapper package is provided and can be found and installed from: https://github.com/AlexandrovLab/SigProfilerAssignmentR. Detailed documentation can be found at: https://osf.io/mz79v/wiki/home/.

- The tool identifies the activity of each signature in the sample and assigns the probability for each signature to cause a specific mutation type in the sample. The tool makes use of SigProfilerMatrixGenerator, SigProfilerExtractor and SigProfilerPlotting.

+ ## Table of contents
+ - [Installation](#installation)
+ - [Running](#running)
+ - [Main Parameters](#parameters)
+ - [Signature Subgroups](#subgroups)
+ - [Examples](#examples)
+ - [_De novo_ extraction of mutational signatures downstream analysis](#denovo)
+ - [Copyright](#copyright)
+ - [Contact Information](#contact)

- ## Installs
- for installing from PyPi in new conda environment
+ ## <a name="installation"></a> Installation

+ Install the current stable PyPI version of SigProfilerAssignment:
  ```
  $ pip install SigProfilerAssignment
  ```

- Installing this package : git clone this repo or download the zip file.
- Unzip the contents of SigProfilerExtractor-master.zip or the zip file of a corresponding branch.
+ If mutation calling files (MAF, VCF, or simple text files) are used as input, please install your desired reference genome as follows (available reference genomes are: GRCh37, GRCh38, mm9, mm10, and rn6):
+ ```python
+ $ python
+ from SigProfilerMatrixGenerator import install as genInstall
+ genInstall.install('GRCh37')
+ ```
+ ## <a name="running"></a> Running
+
+ Assignment of known mutational signatures to individual samples is performed using the `cosmic_fit` function. Input samples are provided using the `samples` parameter in the form of mutation calling files (VCFs, MAFs, or simple text files), segmentation files or mutational matrices. COSMIC mutational signatures v3.3 are used as the default reference signatures, although previous COSMIC versions and custom signature databases are also supported using the `cosmic_version` and `signature_database` parameters. Results will be found in the folder specified in the `output` parameter.
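A call along these lines may help orient new users. This is a sketch under stated assumptions, not the README's own example: the helper names `build_cosmic_fit_kwargs` and `run_cosmic_fit` and the file paths are hypothetical, while the keyword names and default values are the ones listed in the parameter table of this README. The package import is deferred so the helper can be inspected without SigProfilerAssignment installed.

```python
def build_cosmic_fit_kwargs(samples, output, **overrides):
    """Collect keyword arguments for cosmic_fit, starting from the documented defaults."""
    kwargs = {
        "samples": samples,          # input file or folder
        "output": output,            # results folder
        "input_type": "matrix",      # documented default
        "context_type": "96",        # documented default
        "cosmic_version": 3.3,       # documented default
        "exome": False,              # documented default
        "genome_build": "GRCh37",    # documented default
    }
    kwargs.update(overrides)
    return kwargs


def run_cosmic_fit(samples, output, **overrides):
    """Run the assignment; requires SigProfilerAssignment to be installed."""
    from SigProfilerAssignment import Analyzer as Analyze  # deferred import

    Analyze.cosmic_fit(**build_cosmic_fit_kwargs(samples, output, **overrides))


# Hypothetical paths: a folder of VCF files and a results folder.
vcf_kwargs = build_cosmic_fit_kwargs(
    "path/to/vcf_folder",
    "path/to/output",
    input_type="vcf",
    genome_build="GRCh38",
)
print(vcf_kwargs["input_type"], vcf_kwargs["genome_build"])
```

Calling `run_cosmic_fit("path/to/vcf_folder", "path/to/output", input_type="vcf", genome_build="GRCh38")` would then pass the same arguments to `cosmic_fit`, provided the package and the chosen reference genome are installed.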

- ```bash
- $ cd SigProfilerAssignment-master
- $ pip install .
+ ```python
+ from SigProfilerAssignment import Analyzer as Analyze
| Parameter | Variable Type | Parameter Description |
+ | ------ | ----------- | ----------- |
+ | samples | String | Path to the input somatic mutations file (if using segmentation file/mutational matrix) or input folder (mutation calling file/s). |
+ | output | String | Path to the output folder. |
+ | input_type | String | Three accepted input types:<ul><li>"vcf": if using mutation calling file/s (VCF, MAF, simple text file) as input</li><li>"seg:TYPE": if using a segmentation file as input. Please check the required format at https://github.com/AlexandrovLab/SigProfilerMatrixGenerator#copy-number-matrix-generation. The accepted callers for TYPE are the following {"ASCAT", "ASCAT_NGS", "SEQUENZA", "ABSOLUTE", "BATTENBERG", "FACETS", "PURPLE", "TCGA"}. For example: "seg:BATTENBERG"</li><li>"matrix": if using a mutational matrix as input</li></ul>The default value is "matrix". |
+ | context_type | String | Required context type if `input_type` is "vcf". `context_type` specifies which context type of the input data is considered for assignment. Valid options include "96", "288", "1536", "DINUC", and "ID". The default value is "96". |
+ | cosmic_version | Float | Defines the version of the COSMIC reference signatures. Takes a positive float among 1, 2, 3, 3.1, 3.2 and 3.3. The default value is 3.3. |
+ | exome | Boolean | Defines if the exome renormalized COSMIC signatures will be used. The default value is False. |
+ | genome_build | String | The reference genome build, used to select the appropriate version of the COSMIC reference signatures, as well as for processing the mutation calling file/s. Supported genomes include "GRCh37", "GRCh38", "mm9", "mm10" and "rn6". The default value is "GRCh37". If the selected genome is not in the supported list, the default genome will be used. |
+ | signature_database | String | Path to the input set of known mutational signatures (only if COSMIC reference signatures are not used), a tab-delimited file that contains the signature matrix where the rows are mutation types and the columns are signature IDs. |
+ | exclude_signature_subgroups | List | Removes the signatures corresponding to specific subtypes to improve refitting (only available when using default COSMIC reference signatures). The usage is explained below. The default value is None, which corresponds to using all COSMIC signatures. |
+ | export_probabilities | Boolean | Defines if the probability matrix per mutational context for all samples is created. The default value is False. |
+ | export_probabilities_per_mutation | Boolean | Defines if the probability matrices per mutation for all samples are created. Only available when `input_type` is "vcf". The default value is False. |
+ | make_plots | Boolean | Toggle on and off for making and saving plots. The default value is False. |
+ | sample_reconstruction_plots | Boolean | Toggle on and off for making and saving sample reconstruction plots. The default value is False. |
+ | verbose | Boolean | Prints detailed statements. The default value is False. |
+
+
+
+ ### <a name="subgroups"></a> Signature Subgroups
+
+ When using COSMIC reference signatures, some subgroups of signatures can be removed to improve the refitting analysis. To use this feature, the `exclude_signature_subgroups` parameter should be added, following the syntax below:
| Parameter | Variable Type | Parameter Description |
- | --------------------- | -------- |-------- |
- |**samples**| String | Path to input file for `input_type`:<ul><li>"matrix"</li><li>"seg:TYPE"</li></ul> Path to input folder for `input_type`:<ul><li>"vcf"</li></ul>|
- |**output**| String | Path to the output folder. |
- |**input_type**| String | The type of input:<br><ul><li>"matrix": used for table format inputs using a tab-separated file where the rows are mutation types and the columns are sample IDs.</li><li>"vcf": used for mutation calling file inputs (VCFs, MAFs or simple text files).</li><li>"seg:TYPE": used for a multi-sample segmentation file for copy number analysis. Please check the required format at https://github.com/AlexandrovLab/SigProfilerMatrixGenerator#copy-number-matrix-generation. The accepted callers for TYPE are the following {"ASCAT", "ASCAT_NGS", "SEQUENZA", "ABSOLUTE", "BATTENBERG", "FACETS", "PURPLE", "TCGA"}. For example, when using segmentation file from BATTENBERG then set input_type to "seg:BATTENBERG".</li></ul> The default value is "matrix".|
- |**context_type**| String| Required context type if `input_type` is "vcf". `context_type` takes which context type of the input data is considered for assignment. Valid options include "96", "288", "1536", "DINUC", and "ID". The default value is "96".|
- |**signatures**| String | Path to a tab delimited file that contains the signature table where the rows are mutation types and columns are signature IDs. |
- |**genome_build**| String | The reference genome build. List of supported genomes: "GRCh37", "GRCh38", "mm9", "mm10" and "rn6". The default value is "GRCh37". If the selected genome is not in the supported list, the default genome will be used. |
- |**cosmic_version**| Float | Takes a positive float among 1, 2, 3, 3.1, 3.2 and 3.3. Defines the version of the COSMIC reference signatures. The default value is 3.3. |
- |**new_signature_thresh_hold**| Float | Parameter in cosine similarity to declare a new signature. Applicable for decompose_fit only. The default value is 0.8. |
- |**exclude_signature_subgroups**| List | Removes the signatures corresponding to specific subtypes for better fitting. The usage is given above. The default value is None. |
- |**exome**| Boolean | Defines if the exome renormalized signatures will be used. The default value is False. |
- |**export_probabilities**| Boolean | Defines if the probability matrix per mutational context for all samples is created. The default value is True. |
- |**export_probabilities_per_mutation**| Boolean | Defines if the probability matrices per mutation for all samples are created. Only available when `input_type` is "vcf". The default value is False. |
- |**make_plots**| Boolean | Toggle on and off for making and saving all plots. The default value is True. |
- |**verbose**| Boolean | Prints statements. The default value is False. |
-
-


- ## Examples
-
- ### SPA analysis - Example for a matrix
+ ## <a name="examples"></a> Examples

+ ### Using mutation calling files (VCFs) as input

  ```python
- #import modules
  import SigProfilerAssignment as spa
  from SigProfilerAssignment import Analyzer as Analyze

- #set directories and paths to signatures and samples
## <a name="denovo"></a> _De novo_ extraction of mutational signatures downstream analysis
+ Additional functionalities for downstream analysis of _de novo_ extraction of mutational signatures are also available as part of SigProfilerAssignment, including assignment of _de novo_ extracted mutational signatures and decomposition of _de novo_ signatures using a known set of signatures. More information can be found on the wiki page at https://osf.io/mz79v/wiki/5.%20Advanced%20mode/.
+
  ## <a name="copyright"></a> Copyright
  This software and its documentation are copyright 2022 as a part of the SigProfiler project. The SigProfilerAssignment framework is free software and is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.