In this repository, I share R and Python command-line scripts that I have learned and developed while studying machine learning. So many algorithms and methods now exist in machine learning that keeping track of all of them has become difficult. For this reason, I am collecting some of the ML exercises I have done, based on topics I have encountered in various sources. The repository is open to contributions. Please feel free to share your ideas via Pull Requests or Issues.

Python:

- Splitting Data into Training and Test Data Sets | Data: iris
- k-Nearest Neighbors Algorithm (Binary) | Data: breast_cancer
- CART - Decision Tree (Classification) | Data: iris
- Random Forest (Classification) | Data: faces
- Random Forest (Regression) | Data: california_housing
- Simple Linear Regression | Data: diabetes
- Multiple Linear Regression | Data: diabetes
- Logistic Regression (Binary) | Data: heart_disease
- Polynomial Regression | Data: california_housing
- MLP Neural Network (Regression) | Data: Hitters
- Lasso Regression (L1 Regularization) | Data: Hitters
- Ridge Regression (L2 Regularization) | Data: Hitters
- Elastic Net | Data: Hitters
- Naive Bayes | Data: iris
- Support Vector Machines (Classification) | Data: digits
- k-Means Clustering | Data: Generated via sklearn.datasets
- Hierarchical Clustering | Data: Generated via sklearn.datasets
- DBSCAN | Data: Generated via sklearn.datasets
- Principal Component Analysis (PCA) | Data: iris
- Linear Discriminant Analysis (LDA) | Data: MNIST
- t-Distributed Stochastic Neighbor Embedding (t-SNE) | Data: MNIST
- PCA versus LDA | Data: iris
- Grid Search CV versus Randomized Search CV (Over kNN) | Data: iris
- Grid Search CV versus Randomized Search CV (Over Decision Trees) | Data: iris
- Grid Search CV versus Randomized Search CV (Over SVM) | Data: iris
- Clustering Algorithms Comparison | Data: Generated via sklearn.datasets
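
As a rough illustration of the first topic in the list, splitting data into training and test sets, here is a minimal sketch in plain Python. It is not taken from the repository's scripts: the function name `train_test_split_indices`, its parameters, and the toy data are all hypothetical, and the scripts themselves may well rely on scikit-learn instead.

```python
import random


def train_test_split_indices(n_samples, test_ratio=0.25, seed=42):
    """Return (train_idx, test_idx): two disjoint lists of row indices.

    Hypothetical helper for illustration only.
    """
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be strictly between 0 and 1")
    indices = list(range(n_samples))
    # Seeded shuffle so the split is reproducible across runs.
    random.Random(seed).shuffle(indices)
    # Hold out at least one sample for testing.
    n_test = max(1, round(n_samples * test_ratio))
    return indices[n_test:], indices[:n_test]


# Toy data, made up for this example: 8 samples, one feature, binary label.
X = [[0.1], [0.4], [0.5], [0.9], [1.2], [1.5], [1.7], [2.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

train_idx, test_idx = train_test_split_indices(len(X), test_ratio=0.25)
X_train = [X[i] for i in train_idx]
y_train = [y[i] for i in train_idx]
X_test = [X[i] for i in test_idx]
y_test = [y[i] for i in test_idx]

print(len(X_train), len(X_test))
```

With 8 samples and a test ratio of 0.25, this holds out 2 samples and keeps 6 for training. In practice, scikit-learn's `train_test_split` covers the same ground and also supports stratified splits.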

R:

- Splitting Data into Training and Test Data Sets | Data: iris
- k-Nearest Neighbors Algorithm | Data: iris
- Bagging | Data: airquality
- Bagging | Data: penguins
- k-Means Clustering | Data: USArrests
- Simple Linear Regression | Data: diabetes
- Multiple Linear Regression | Data: diabetes

Note: Contributions are very welcome. Please do not hesitate to contribute.