Take a closer look at AI algorithms that focus on interpretability or explainability, especially those that can be applied to malware classification. Explore different feature extraction methods that help reduce data dimensionality, improve accuracy, and lower the false positive rate (FPR). Based on your findings, suggest a combination of AI models and feature extraction techniques that could boost detection accuracy while keeping the FPR low. Run experiments on the EMBER dataset and compare your results with those reported in the references.

| Task | Date | Status |
|---|---|---|
| Read papers on existing work to understand what we will be working with | ~30.10.2024 | ✅ |
| Research various decision tree types and libraries in Python | ~07.03.2025 | ✅ |
| Train a model on a small, reduced dataset and measure performance | ~14.03.2025 | ✅ |
| Try to load the largest dataset, either locally or explore cloud solutions | ~10.04.2025 | ✅ |
| Train a model on the largest dataset, reproduce the model from the referenced paper, and compare results | ~19.04.2025 | ✅ |
| Design an algorithm for section anonymization | ~26.04.2025 | ✅ |
| Train a model on anonymized dataset | ~27.04.2025 | ✅ |
| Set up a solution for experiment tracking | ~28.04.2025 | ✅ |
| Create a skeleton of the text part, write a few pages | ~10.05.2025 | ✅ |
| Fixed an issue in the anonymization algorithm and retrained the models | ~12.06.2025 | ✅ |
| Became familiar with PyTorch and used it to train a neural network | ~13.06.2025 | ✅ |
| Tried multiple TREPAN libraries and conducted experiments | ~15.06.2025 | ✅ |
| Experimented with post-hoc explainability methods, specifically SHAP and LIME | ~17.06.2025 | ✅ |
| Wrote several pages of the thesis covering the theoretical background | ~12.09.2025 | ✅ |
| Experimented with two other TREPAN libraries | ~14.09.2025 | ✅ |
| Trained a surrogate decision tree model on the predictions of the neural network | ~02.10.2025 | ✅ |
| Created graphs and visualizations of the results | ~18.10.2025 | ✅ |
| Wrote the "Experiments" chapter, which describes the results | ~11.11.2025 | ✅ |
| Tried to analyse misclassified examples, t-SNE dimensionality reduction | ~02.12.2025 | ✅ |