MSMuSig2 is a tool for identifying driver genes in microsatellite-unstable (MSI) cancers. It fits statistical models (Exponential, Gaussian, Weibull, Log-Normal) to recurrent MS-indel mutation counts to identify significantly recurrent mutations.
Clone the repo:
git clone https://github.com/yourusername/MSMuSig2.git
cd MSMuSig2
Install R packages: Run this command in your R session:
install.packages(c("magrittr", "minpack.lm", "ggplot2", "gridExtra", "MASS", "dplyr", "data.table", "fitdistrplus", "survival", "car", "ggrepel", "ggpubr", "optparse", "metap"))
The main script is scripts/analysis.R.
Your input CSV file must contain the following columns:
- CHR
- START
- PATTERN
- REFERENCE_REPEATS
- hgnc_symbol
- COUNTS
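Before running the analysis, it can help to confirm that a file actually has these columns. The following is a minimal base-R sketch, assuming the required names are CHR, START, PATTERN, REFERENCE_REPEATS, hgnc_symbol and COUNTS. The helper `check_input_columns` and the example rows are hypothetical and are not part of MSMuSig2.

```r
# Column names the input CSV is expected to contain.
required_cols <- c("CHR", "START", "PATTERN", "REFERENCE_REPEATS",
                   "hgnc_symbol", "COUNTS")

# Hypothetical helper (not part of MSMuSig2): returns the required columns
# that are absent from a CSV file; character(0) means none are missing.
check_input_columns <- function(path, required = required_cols) {
  header <- names(read.csv(path, nrows = 1, check.names = FALSE))
  setdiff(required, header)
}

# Made-up rows, written to a temporary file purely for illustration.
example <- data.frame(
  CHR = c("1", "2"),
  START = c(1000L, 2000L),
  PATTERN = c("A", "T"),
  REFERENCE_REPEATS = c(10L, 12L),
  hgnc_symbol = c("GENE1", "GENE2"),
  COUNTS = c(5L, 3L)
)
tmp <- tempfile(fileext = ".csv")
write.csv(example, tmp, row.names = FALSE)

missing_cols <- check_input_columns(tmp)
if (length(missing_cols) > 0) {
  stop("Missing required columns: ", paste(missing_cols, collapse = ", "))
}
```

To check a real file, replace the temporary file path with the path to your own CSV.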
Rscript scripts/analysis.R --input data/your_data.csv --output results/
- -i or --input: (Required) Path to your input CSV file.
- -o or --output: (Optional) Path to the output directory. (Default: results)
- -m or --models: (Optional) Comma-separated list of models (e.g., lognormal,weibull). (Default: lognormal,exponential)
- -t or --threshold: (Optional) Minimum motif occurrence count to analyze. (Default: 45)
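Because optparse is among the listed dependencies, the options above could be declared roughly as shown here. This is a sketch of one possible declaration, not the actual contents of scripts/analysis.R. The help strings, the variable names, and the way the --models value is split are all assumptions.

```r
library(optparse)

# Assumed declaration of the four documented options, with their stated defaults.
option_list <- list(
  make_option(c("-i", "--input"), type = "character", default = NULL,
              help = "Path to the input CSV file (required)"),
  make_option(c("-o", "--output"), type = "character", default = "results",
              help = "Path to the output directory"),
  make_option(c("-m", "--models"), type = "character",
              default = "lognormal,exponential",
              help = "Comma-separated list of models"),
  make_option(c("-t", "--threshold"), type = "integer", default = 45L,
              help = "Minimum motif occurrence count to analyze")
)
parser <- OptionParser(option_list = option_list)

# Arguments are passed explicitly here so the sketch runs outside Rscript;
# a real script would normally let parse_args() read the command line.
opts <- parse_args(parser,
                   args = c("--input", "data/your_data.csv",
                            "-m", "lognormal,weibull"))

# Split the comma-separated model list into a character vector.
models <- strsplit(opts$models, ",", fixed = TRUE)[[1]]
```

Options that are not supplied, such as --output and --threshold in this call, fall back to their declared defaults.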
All results are saved to the output directory (default: results/).
Key Outputs:
- Raw Results: <model>.csv (e.g., lognormal.csv). Contains the p-values for every mutation.
- Aggregated Results: <model>_aggregated.csv. Contains gene-level combined p-values (using Fisher's method).
- Significant Hits: Significant_<model>.csv and Significant_<model>_aggregated.csv. Filtered lists of significant mutations and genes (adjusted p-value < 0.1).
- Plots: QQplot_<model>.svg and Indels_Models_fit.svg. Visualizations of p-value distributions and model fits.
- Model Comparison: Indels_Models_fit.csv. AIC/BIC scores for each model fit.
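For reference, Fisher's method combines k independent p-values through the statistic X = -2 Σ ln(p_i), which follows a chi-squared distribution with 2k degrees of freedom under the null hypothesis. The base-R sketch below illustrates that calculation, followed by a multiple-testing adjustment and the 0.1 cutoff. The p-values are made up and the function name `fisher_combine` is hypothetical. The Benjamini-Hochberg adjustment is also an assumption, since this README does not say which adjustment MSMuSig2 applies, and the tool itself may rely on the metap package rather than code like this.

```r
# Hypothetical helper: Fisher's combined p-value for a vector of p-values.
fisher_combine <- function(p) {
  stat <- -2 * sum(log(p))                      # Fisher's test statistic
  pchisq(stat, df = 2 * length(p), lower.tail = FALSE)
}

# Made-up per-mutation p-values grouped by gene, for illustration only.
pvals <- list(
  GENE1 = c(0.001, 0.02, 0.03),
  GENE2 = c(0.40, 0.65),
  GENE3 = c(0.04)
)

# One combined p-value per gene.
gene_p <- vapply(pvals, fisher_combine, numeric(1))

# Assumed adjustment method (Benjamini-Hochberg); the real tool may differ.
gene_q <- p.adjust(gene_p, method = "BH")

# Genes passing the documented cutoff of adjusted p-value < 0.1.
significant <- names(gene_p)[gene_q < 0.1]
```

With a single p-value, the combined value equals the input, which is a quick sanity check on the helper.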
This project is licensed under the MIT License.
Hagay Ladany, Dr. Yosef Maruvka.
For questions, please open an issue or contact [email protected].