This project demonstrates various data preprocessing and visualization techniques using a traffic accidents dataset. The dataset is downloaded from OpenML (data_id=42803) and analyzed using Python libraries such as Pandas, Matplotlib, Seaborn, and Missingno.
The dataset used is available here:
https://www.openml.org/search?type=data&sort=runs&id=42803&status=active

Requirements:
- Python 3.x
- Jupyter Notebook
- Pandas
- NumPy
- Scikit-learn
- Matplotlib
- Seaborn
- Missingno
You can install the required libraries using pip:
pip install pandas numpy scikit-learn matplotlib seaborn missingno

To run the analysis:
- Clone the repository:
  git clone https://github.com/your-username/EDA-analysis.git
- Navigate to the project directory:
  cd EDA-analysis
- Open the Jupyter notebook:
  jupyter notebook
- Run the notebook edamone.ipynb to see the data analysis in action.
The notebook covers the following steps:
- The dataset is downloaded from OpenML.
- Basic exploration includes displaying random samples, dataset size, and data types.
- Conversion of data types for specific columns.
- Descriptive statistics for non-numerical features.
- Computation of unique values for numerical features.
- Identification and removal of duplicate rows.
- Visualization and handling of missing values.
- Various plots for numerical and non-numerical features.
- Analysis of numerical and categorical features.
- Identification of continuous and discrete features.
- Pair plots and strip plots.
- Computation and visualization of correlation matrix.
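The first steps in the list (download, basic exploration, data type conversion) can be sketched as follows. This is a minimal sketch, not the notebook's own code: it assumes the download goes through scikit-learn's `fetch_openml`, and the helper names `load_accidents`, `overview` and `convert_types` are hypothetical, as are whichever columns you pass to `convert_types`.

```python
import pandas as pd


def load_accidents(data_id: int = 42803) -> pd.DataFrame:
    """Download the OpenML dataset as a DataFrame (needs network access)."""
    # Imported lazily so the helpers below work even without scikit-learn installed.
    from sklearn.datasets import fetch_openml

    # as_frame=True returns pandas objects; .frame holds features and target together.
    return fetch_openml(data_id=data_id, as_frame=True).frame


def overview(df: pd.DataFrame, n: int = 5, seed: int = 0) -> dict:
    """Collect a random sample, the (rows, columns) size and the column dtypes."""
    return {
        "sample": df.sample(n=min(n, len(df)), random_state=seed),
        "shape": df.shape,
        "dtypes": df.dtypes,
    }


def convert_types(df: pd.DataFrame, to_category=(), to_numeric=()) -> pd.DataFrame:
    """Return a copy with the selected columns cast to 'category' or to numbers."""
    out = df.copy()
    for col in to_category:
        out[col] = out[col].astype("category")
    for col in to_numeric:
        # Values that cannot be parsed become NaN instead of raising an error.
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out
```

Typical use: `df = load_accidents()`, then `overview(df)["shape"]` for the dataset size. Working on a copy in `convert_types` keeps the raw download untouched for comparison.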
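Descriptive statistics for non-numerical features, unique-value counts for numerical features, and duplicate removal map onto a few pandas calls. The sketch below is an illustration, not the notebook's code; the function names are hypothetical, and keeping the first copy of each duplicate is an assumption.

```python
import pandas as pd


def describe_non_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """count / unique / top / freq for every non-numerical column."""
    # Raises ValueError if the frame has no non-numerical columns at all.
    return df.select_dtypes(exclude="number").describe()


def numeric_unique_counts(df: pd.DataFrame) -> pd.Series:
    """Number of distinct values per numerical column, highest first."""
    return df.select_dtypes(include="number").nunique().sort_values(ascending=False)


def drop_duplicate_rows(df: pd.DataFrame) -> tuple[pd.DataFrame, int]:
    """Remove fully duplicated rows (keeping the first) and report how many were dropped."""
    n_dup = int(df.duplicated().sum())
    return df.drop_duplicates(keep="first").reset_index(drop=True), n_dup
```

Reporting the number of dropped rows alongside the cleaned frame makes it easy to check that deduplication did not remove more data than expected.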
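The missing-value step can be sketched as a summary table plus a simple handling policy. The policy here is an assumption, not necessarily what the notebook does: columns with more than 50% missing values are dropped, remaining numerical gaps are filled with the median, and other gaps with the most frequent value. `missing_summary` and `handle_missing` are hypothetical names.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, most affected first."""
    counts = df.isna().sum()
    summary = pd.DataFrame({"missing": counts, "percent": 100 * counts / len(df)})
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


def handle_missing(df: pd.DataFrame, max_missing_pct: float = 50.0) -> pd.DataFrame:
    """Drop mostly-empty columns, then impute the remaining gaps.

    Assumed policy: median for numerical columns, most frequent value otherwise.
    """
    pct = 100 * df.isna().mean()
    out = df.loc[:, pct <= max_missing_pct].copy()
    for col in out.columns:
        if not out[col].isna().any():
            continue
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            mode = out[col].mode(dropna=True)
            if not mode.empty:
                out[col] = out[col].fillna(mode.iloc[0])
    return out
```

For the visual side, `missingno.matrix(df)` and `missingno.bar(df)` show where the gaps are; it is worth looking at those plots before choosing a threshold or imputation strategy.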
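Separating categorical, discrete and continuous features can be sketched with a unique-value heuristic. The cut-off of 25 distinct values and the name `split_features` are assumptions for illustration; the notebook may use a different rule.

```python
import pandas as pd


def split_features(df: pd.DataFrame, max_discrete_unique: int = 25) -> dict:
    """Group columns into categorical, discrete and continuous features.

    Assumed heuristic: a numerical column is discrete when it has at most
    `max_discrete_unique` distinct values, otherwise it is continuous.
    """
    numerical = list(df.select_dtypes(include="number").columns)
    categorical = [c for c in df.columns if c not in numerical]
    discrete = [c for c in numerical if df[c].nunique() <= max_discrete_unique]
    continuous = [c for c in numerical if c not in discrete]
    return {"categorical": categorical, "discrete": discrete, "continuous": continuous}
```

The resulting groups can feed the plots from the list, e.g. `seaborn.pairplot(df[groups["continuous"]])` for continuous features or `seaborn.stripplot` for a discrete feature against a numerical one.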
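The correlation step could look like the sketch below: compute the matrix over numerical columns, list the strongest pairs, and draw a heatmap. Pearson correlation as the default and the names `correlation_matrix`, `top_correlations` and `plot_correlation` are assumptions; seaborn and matplotlib are imported only inside the plotting function.

```python
import pandas as pd


def correlation_matrix(df: pd.DataFrame, method: str = "pearson") -> pd.DataFrame:
    """Pairwise correlation between the numerical columns only."""
    return df.select_dtypes(include="number").corr(method=method)


def top_correlations(corr: pd.DataFrame, n: int = 10) -> pd.Series:
    """The n strongest pairs by absolute correlation, each pair listed once."""
    cols = list(corr.columns)
    # Upper triangle only, so (a, b) and (b, a) are not both reported.
    pairs = pd.Series({(a, b): corr.loc[a, b]
                       for i, a in enumerate(cols) for b in cols[i + 1:]}).dropna()
    return pairs.sort_values(key=lambda v: v.abs(), ascending=False).head(n)


def plot_correlation(corr: pd.DataFrame) -> None:
    """Draw the matrix as an annotated heatmap (needs seaborn and matplotlib)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, ax = plt.subplots(figsize=(10, 8))
    sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
    ax.set_title("Correlation matrix")
    fig.tight_layout()
    plt.show()
```

Fixing `vmin=-1` and `vmax=1` keeps colours comparable across runs; with many columns, `annot=False` may be easier to read.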
This project is licensed under the MIT License. See the LICENSE file for details.
Feel free to contribute to this project by submitting issues or pull requests. Happy analyzing!