A machine learning project for automated pulsar detection in radio astronomy data. The project implements and compares multiple classification algorithms to address the challenge of identifying genuine pulsars among large volumes of radio telescope candidates, most of which are false positives.
- Implementation of four machine learning algorithms:
  - Logistic Regression (with Ridge, Lasso, and Elastic Net regularization)
  - K-Nearest Neighbors (K-NN)
  - Support Vector Machines (Linear and RBF kernels)
  - Random Forest
- Comprehensive data preprocessing pipeline
- Advanced model evaluation and comparison
- Handling of class imbalance
- Performance optimisation through hyperparameter tuning
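As a rough illustration of how the four model families and a tuning step could be set up in scikit-learn, here is a minimal sketch. It is not the project's actual code: the data is a synthetic stand-in generated with `make_classification` (8 features, roughly 91% negatives), and the helper `scaled`, the hyperparameter values, and the small K-NN grid are all assumptions chosen for brevity.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for HTRU2 (assumption): 8 numeric features, ~91% negatives.
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=5,
    weights=[0.91, 0.09], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def scaled(estimator):
    """Hypothetical helper: standardise features before a scale-sensitive model."""
    return make_pipeline(StandardScaler(), estimator)


# One entry per variant named in the list above; settings are illustrative only.
models = {
    "logreg_ridge": scaled(LogisticRegression(penalty="l2", max_iter=1000)),
    "logreg_lasso": scaled(
        LogisticRegression(penalty="l1", solver="saga", max_iter=5000)
    ),
    "logreg_elastic_net": scaled(
        LogisticRegression(
            penalty="elasticnet", l1_ratio=0.5, solver="saga", max_iter=5000
        )
    ),
    "knn": scaled(KNeighborsClassifier(n_neighbors=5)),
    "svm_linear": scaled(SVC(kernel="linear")),
    "svm_rbf": scaled(SVC(kernel="rbf")),
    # Tree ensembles do not depend on feature scale, so no scaler here.
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit every model and record held-out accuracy.
test_accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_accuracy[name] = model.score(X_test, y_test)

# Hyperparameter tuning sketch: grid search over k for K-NN.
param_grid = {"kneighborsclassifier__n_neighbors": [3, 5, 7, 9]}
search = GridSearchCV(
    scaled(KNeighborsClassifier()), param_grid, cv=5, scoring="balanced_accuracy"
)
search.fit(X_train, y_train)
best_k = search.best_params_["kneighborsclassifier__n_neighbors"]

for name, acc in test_accuracy.items():
    print(f"{name}: {acc:.3f}")
print("best k for K-NN:", best_k)
```

Balanced accuracy is used as the tuning score here because plain accuracy can look high on imbalanced data even when the minority class is poorly detected; the project may have used a different criterion.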
- Best performing model: Random Forest
  - 98.32% accuracy
  - 0.899 Kappa score
  - Fewest false negatives (26)
- Notable performances:
  - Lasso logistic regression: 97.45% accuracy
  - SVM (Linear): 97.08% accuracy
  - K-NN: Highest specificity (93.32%)
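For readers unfamiliar with the metrics quoted above, the sketch below shows how accuracy, Cohen's Kappa, specificity, and the false-negative count can all be derived from one set of predictions. The labels are a tiny made-up example, not project results, and `summarize` is a hypothetical helper name.

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score, confusion_matrix


def summarize(y_true, y_pred):
    """Return the headline metrics for a binary classifier (1 = pulsar)."""
    # With labels=[0, 1], ravel() yields counts in the order TN, FP, FN, TP.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        # Kappa corrects accuracy for agreement expected by chance,
        # which matters when one class dominates.
        "kappa": cohen_kappa_score(y_true, y_pred),
        # Specificity: share of true non-pulsars correctly rejected.
        "specificity": tn / (tn + fp),
        # Sensitivity (recall): share of true pulsars correctly detected.
        "sensitivity": tp / (tp + fn),
        # False negatives: real pulsars the model missed.
        "false_negatives": int(fn),
    }


# Toy labels for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

metrics = summarize(y_true, y_pred)
for key, value in metrics.items():
    print(key, value)
```

In a pulsar survey, a false negative is a missed discovery, which is why the false-negative count is reported alongside accuracy.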
HTRU2 Dataset features:
- 8 numerical features from pulsar candidates
- 17,898 total observations
- Class imbalance (~91% non-pulsars)
- Data preprocessing:
  - Feature scaling
  - Class balancing
  - Parameter optimisation
- Model evaluation:
  - 10-fold cross-validation
  - ROC curve analysis
  - Comprehensive metric comparison
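The preprocessing and evaluation steps above can be sketched as a single scikit-learn pipeline. This is one possible arrangement under stated assumptions, not the project's implementation: the data is synthetic, class balancing is approximated with `class_weight="balanced"` (the project may instead have resampled the data), and logistic regression stands in for whichever model is being evaluated.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for HTRU2 (assumption): 8 numeric features, ~91% negatives.
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=5,
    weights=[0.91, 0.09], random_state=0,
)

# Scaling lives inside the pipeline so it is re-fitted on each training fold,
# which keeps information from the held-out fold out of the scaler.
pipeline = make_pipeline(
    StandardScaler(),
    # class_weight="balanced" reweights errors inversely to class frequency;
    # it is used here as a simple stand-in for class balancing.
    LogisticRegression(class_weight="balanced", max_iter=1000),
)

# Stratified 10-fold CV preserves the class ratio in every fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Out-of-fold probability of the positive (pulsar) class for every sample.
scores = cross_val_predict(pipeline, X, y, cv=cv, method="predict_proba")[:, 1]

# ROC curve points and the area under the curve.
fpr, tpr, thresholds = roc_curve(y, scores)
auc = roc_auc_score(y, scores)
print(f"Out-of-fold ROC AUC: {auc:.3f}")
```

The `fpr` and `tpr` arrays can be passed directly to Matplotlib to draw the ROC curve for each model on shared axes.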
- Python
- Scikit-learn
- Caret
- NumPy
- Pandas
- Matplotlib/Seaborn
- Transparent methodology
- Reproducible results
- Ethical implications in astronomical research
- Scientific collaboration considerations
- Implementation of Convolutional Neural Networks
- Log transformations for linear models
- Integration with larger datasets
- Enhanced feature engineering
Click the thumbnail to view the demonstration video.
