A machine learning–powered geochemical classification tool designed to analyze and predict the type of mining samples.
This project leverages machine learning algorithms to automatically classify geochemical samples into three categories:
- Sterile – No economic interest
- Potential – Moderate mining potential
- Ore – High economic value
The system bases its predictions on 36 geochemical features.
- Automatic classification of geochemical samples
- Threshold analysis to refine classification criteria
- Smart preprocessing with handling of missing or censored values
- Algorithm benchmarking (SVM, Random Forest, etc.)
- Real-time predictions on new samples
- Detailed reports with performance metrics
src/
├── train.py # Model training
├── infer.py # Predictions on new samples
├── preprocess.py # Data preprocessing
├── analyze_ag.py # Silver-focused analysis
├── analyze_thresholds.py # Threshold optimization
├── geochemical_analysis.py # General geochemical analyses
├── potential_analysis.py # Analysis of potential samples
└── silver_analysis.py # Silver-specific analysis
- Python 3.8+
- pip
pip install -r requirements.txt
- scikit-learn – Machine learning algorithms
- pandas – Data handling
- numpy – Numerical computations
- joblib – Model serialization
- matplotlib – Data visualization
python src/train.py
This trains multiple models and saves the best one under models/geochem_pipeline.joblib.
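
The "train several models, keep the best" step can be sketched as follows. This is only an illustration, not the contents of src/train.py: the synthetic data from `make_classification`, the two candidate pipelines, the 3-fold cross-validation, and the temporary output directory are all assumptions made to keep the example self-contained.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real dataset: 36 features, 3 classes (assumption).
X, y = make_classification(
    n_samples=150, n_features=36, n_informative=8, n_classes=3, random_state=42
)

# Hypothetical candidates; the real script may benchmark other algorithms.
candidates = {
    "svm": make_pipeline(StandardScaler(), SVC(random_state=42)),
    "random_forest": make_pipeline(
        StandardScaler(), RandomForestClassifier(n_estimators=50, random_state=42)
    ),
}

# Score every candidate with 3-fold cross-validated accuracy.
scores = {
    name: cross_val_score(model, X, y, cv=3).mean()
    for name, model in candidates.items()
}

# Refit the best candidate on all data and serialize it with joblib.
best_name = max(scores, key=scores.get)
best_model = candidates[best_name].fit(X, y)

# Written to a temporary directory here instead of models/.
model_path = os.path.join(tempfile.mkdtemp(), "geochem_pipeline.joblib")
joblib.dump(best_model, model_path)

# Reload to confirm the saved pipeline can still predict.
reloaded = joblib.load(model_path)
print(best_name, round(scores[best_name], 3))
```

Saving the whole pipeline (scaler plus classifier) rather than the classifier alone means inference applies the same scaling that was fitted during training.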
from src.infer import predict_sample
# Example input (36 geochemical values)
sample = [57.86, 16.46, 7.96, 0.92, 3.39, 3.29, 0.11, 0.74, 0.23, 2.77,
5.5, 1400, 117, 1287, 0.9, 20, 4, 91, 119, 123, 10, 3, 49, 12,
43, 68, 65, 1.76, 9, 40, 20, 110, 23, 17, 164, 4.46]
prediction = predict_sample(sample)
print(f"Classification: {prediction}")
python src/analyze_thresholds.py
# General analysis
python src/geochemical_analysis.py
# Silver analysis
python src/silver_analysis.py
# Potential sample analysis
python src/potential_analysis.py
The model uses 36 geochemical parameters, including:
SiO₂, Al₂O₃, Fe₂O₃, MgO, CaO, Na₂O, K₂O, TiO₂, P₂O₅, MnO
Cu, Pb, Zn, Ag, Au, As, Sb, Bi, Cd, Co, Cr, Ni, Mo, W, Sn, V, Ba, Sr, etc.
Typical results:
- Accuracy: >85%
- Recall: >80%
- F1-score: >82%
Detailed metrics are stored in the training reports.
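
For readers who want to reproduce these three metrics from a list of predictions, here is a minimal stdlib-only sketch. The README does not say how recall and F1 are averaged across the three classes; macro-averaging (an unweighted mean over classes) is an assumption here, and `classification_metrics` is a hypothetical helper, not part of the project.

```python
def classification_metrics(y_true, y_pred, labels=("Sterile", "Potential", "Ore")):
    """Return accuracy, macro recall and macro F1 for two label lists."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    recalls, f1_scores = [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        # F1 is the harmonic mean of precision and recall.
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        recalls.append(recall)
        f1_scores.append(f1)

    return {
        "accuracy": accuracy,
        "recall": sum(recalls) / len(labels),
        "f1": sum(f1_scores) / len(labels),
    }


# Toy labels for illustration only, not project results.
y_true = ["Ore", "Ore", "Potential", "Sterile"]
y_pred = ["Ore", "Potential", "Potential", "Sterile"]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```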
Samples must be provided as a list of 36 values in the exact expected order.
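
Because the order and count of values matter, it can help to check a sample before passing it to the model. The following is a small sketch of such a check; `validate_sample` and `N_FEATURES` are hypothetical names and are not claimed to exist in src/infer.py.

```python
N_FEATURES = 36  # number of geochemical values the model expects


def validate_sample(sample):
    """Return the sample as a list of floats, or raise if the length is wrong."""
    values = [float(v) for v in sample]
    if len(values) != N_FEATURES:
        raise ValueError(f"expected {N_FEATURES} values, got {len(values)}")
    return values


# The example input from the usage section above.
sample = [57.86, 16.46, 7.96, 0.92, 3.39, 3.29, 0.11, 0.74, 0.23, 2.77,
          5.5, 1400, 117, 1287, 0.9, 20, 4, 91, 119, 123, 10, 3, 49, 12,
          43, 68, 65, 1.76, 9, 40, 20, 110, 23, 17, 164, 4.46]
checked = validate_sample(sample)
print(len(checked))
```

A length check cannot detect values given in the wrong order, so the expected column order still has to be respected by the caller.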
The preprocessing step automatically:
- Converts decimal commas → points
- Handles values like <0.1 → converted to half (0.05)
- Fills missing values → 0.0
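
The three rules above can be sketched for a single cell as follows. This is an illustrative stdlib-only version, not the code in src/preprocess.py; `clean_value` is a hypothetical name, and treating NaN as missing is an assumption.

```python
import math


def clean_value(raw):
    """Convert one raw cell to a float using the three rules listed above."""
    # Missing values (None, NaN, empty string) become 0.0.
    if raw is None:
        return 0.0
    if isinstance(raw, (int, float)):
        return 0.0 if math.isnan(raw) else float(raw)

    # Decimal commas become decimal points.
    text = str(raw).strip().replace(",", ".")
    if not text:
        return 0.0

    # Censored values such as "<0.1" are replaced by half the stated limit.
    if text.startswith("<"):
        return float(text[1:]) / 2

    return float(text)


row = ["57,86", "<0.1", "", None, 1400]
cleaned = [clean_value(cell) for cell in row]
print(cleaned)
```

Substituting half the detection limit is a common convention for censored geochemical data, while filling missing values with 0.0 treats an unmeasured element as absent, which is worth keeping in mind when interpreting results.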
Configurable parameters in the scripts:
- use_raw_data=True → Enables preprocessing of raw input
- test_size=0.3 → Share of data reserved for testing
- random_state=42 → Seed for reproducibility
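
To show what test_size and random_state control, here is a stdlib-only sketch of a seeded holdout split. The parameter names match scikit-learn's `train_test_split`, which the scripts may well use, but that is an assumption; `holdout_split` below is a hypothetical stand-in written only to make the behaviour visible.

```python
import random


def holdout_split(rows, test_size=0.3, random_state=42):
    """Split rows into (train, test); the same seed always gives the same split."""
    indices = list(range(len(rows)))
    # A dedicated generator keeps the shuffle independent of global random state.
    random.Random(random_state).shuffle(indices)
    n_test = round(len(rows) * test_size)
    test_idx = set(indices[:n_test])
    train = [row for i, row in enumerate(rows) if i not in test_idx]
    test = [row for i, row in enumerate(rows) if i in test_idx]
    return train, test


rows = list(range(10))
train, test = holdout_split(rows)
print(len(train), len(test))  # prints "7 3"
```

With test_size=0.3, 30% of the samples are held out for evaluation, and fixing random_state makes the train and test sets identical from run to run, so metric changes reflect model changes rather than a different split.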
TEST - Ore Sample:
Normalized data:
Min: -1.234
Max: 2.156
Mean: 0.045
Class probabilities:
Sterile: 0.123
Potential: 0.234
Ore: 0.643
Result: Ore
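
In the sample output above, the reported result is the class with the highest probability. Assuming that this is the decision rule (the README does not state it explicitly), the final step can be sketched as:

```python
def pick_class(probabilities):
    """Return the class label with the highest probability."""
    return max(probabilities, key=probabilities.get)


# Probabilities copied from the sample output above.
probs = {"Sterile": 0.123, "Potential": 0.234, "Ore": 0.643}
result = pick_class(probs)
print(f"Result: {result}")  # prints "Result: Ore"
```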
Contributions are welcome!
- Fork the repository
- Create a feature branch (git checkout -b feature/new-feature)
- Commit your changes (git commit -m 'Add new feature')
- Push your branch (git push origin feature/new-feature)
- Open a Pull Request
This project is under the MIT License. See the LICENSE file for details.
For questions or suggestions, please open an issue on GitHub.
⭐ If you find this project useful, don’t forget to give it a star!