An advanced data analysis of traffic accidents using a dataset from OpenML

vinod-polinati/EDA-Analysis

Traffic Accidents Analysis using Correlation Analysis & Feature Analysis

This project demonstrates various data preprocessing and visualization techniques using a traffic accidents dataset. The dataset is downloaded from OpenML (data_id=42803) and analyzed using Python libraries such as Pandas, Matplotlib, Seaborn, and Missingno.

Dataset

The dataset is available on OpenML:

https://www.openml.org/search?type=data&sort=runs&id=42803&status=active

Requirements

  • Python 3.x
  • Jupyter Notebook
  • Pandas
  • NumPy
  • Scikit-learn
  • Matplotlib
  • Seaborn
  • Missingno

You can install the required libraries using pip:

pip install pandas numpy scikit-learn matplotlib seaborn missingno

Usage

  1. Clone the repository:
git clone https://github.com/your-username/EDA-analysis.git
  2. Navigate to the project directory:
cd EDA-analysis
  3. Open the Jupyter notebook:
jupyter notebook
  4. Run the notebook edamone.ipynb to see the data analysis in action.

Notebook Overview

Data Download and Basic Exploration

  • The dataset is downloaded from OpenML.
  • Basic exploration includes displaying random samples, dataset size, and data types.
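
The download and first-look steps above might be sketched as follows. This is a minimal sketch, not the notebook's actual code: the helper names `load_accidents` and `explore` are hypothetical, and the toy frame uses made-up column names so the example runs without network access. `fetch_openml` is scikit-learn's OpenML loader; it is only called if you invoke `load_accidents()` yourself.

```python
import pandas as pd


def load_accidents(data_id: int = 42803) -> pd.DataFrame:
    """Download the dataset from OpenML (hypothetical helper; needs network)."""
    # Imported lazily so the offline demo below runs without scikit-learn.
    from sklearn.datasets import fetch_openml

    bunch = fetch_openml(data_id=data_id, as_frame=True)
    return bunch.frame


def explore(df: pd.DataFrame, n: int = 3, seed: int = 0) -> dict:
    """Collect a random sample, the dataset size, and the column dtypes."""
    return {
        "sample": df.sample(n=min(n, len(df)), random_state=seed),
        "shape": df.shape,
        "dtypes": df.dtypes,
    }


# Toy stand-in with hypothetical columns (not the real dataset schema).
toy = pd.DataFrame(
    {
        "speed_limit": [30, 60, 30, 70],
        "weather": ["fine", "rain", "fine", "fog"],
    }
)

summary = explore(toy)
print(summary["sample"])
print(summary["shape"])
print(summary["dtypes"])
```

To run this against the real data, replace `toy` with `load_accidents()`.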

Data Type Conversion and Descriptive Statistics

  • Conversion of data types for specific columns.
  • Descriptive statistics for non-numerical features.
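
As an illustration of the two steps above, the sketch below converts column types with `DataFrame.astype` and summarizes the non-numerical columns with `DataFrame.describe`. The column names and the chosen target types are assumptions made for the example; the README does not say which columns the notebook converts.

```python
import pandas as pd

# Toy frame with hypothetical columns; "urban" arrives as text on purpose.
toy = pd.DataFrame(
    {
        "accident_year": [2015, 2016, 2015, 2017],
        "weather": ["fine", "rain", "fine", "fog"],
        "urban": ["1", "0", "1", "1"],
    }
)

# Convert a text label to a categorical and a numeric string to an integer.
converted = toy.astype({"weather": "category", "urban": "int64"})

# Describe only the non-numerical columns: count, unique, top, freq.
non_numeric_stats = converted.describe(exclude="number")

print(converted.dtypes)
print(non_numeric_stats)
```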

Unique Values and Duplicates

  • Computation of unique values for numerical features.
  • Identification and removal of duplicate rows.
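
One way to carry out these two steps with pandas is sketched below, again on a toy frame with hypothetical column names rather than the real dataset.

```python
import pandas as pd

# Toy frame: rows 0, 2 and 4 are identical.
toy = pd.DataFrame(
    {
        "speed_limit": [30, 60, 30, 70, 30],
        "vehicles": [2, 1, 2, 3, 2],
        "weather": ["fine", "rain", "fine", "fog", "fine"],
    }
)

# Number of distinct values in each numerical column.
unique_counts = toy.select_dtypes(include="number").nunique()

# duplicated() flags every repeat of an earlier row (the first copy is kept).
n_duplicates = int(toy.duplicated().sum())

# Drop the repeats and renumber the index.
deduplicated = toy.drop_duplicates().reset_index(drop=True)

print(unique_counts)
print(f"duplicate rows: {n_duplicates}")
print(deduplicated)
```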

Missing Values

  • Visualization and handling of missing values.
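
The README does not state how the notebook handles missing values, so the following is only one plausible approach: measure the share of missing values per column, drop columns that are mostly empty, and impute the rest. The 50% threshold, the median/mode imputation, and the column names are all assumptions.

```python
import numpy as np
import pandas as pd

# Toy frame with hypothetical columns and deliberately missing entries.
toy = pd.DataFrame(
    {
        "speed_limit": [30.0, np.nan, 30.0, 70.0],
        "weather": ["fine", "rain", None, "fog"],
        "mostly_empty": [np.nan, np.nan, np.nan, 1.0],
    }
)

# Fraction of missing values per column.
missing_share = toy.isna().mean()

# For a visual overview in a notebook, Missingno offers:
#   import missingno as msno
#   msno.matrix(toy)

# Assumed policy: drop columns with more than half of their values missing.
kept = toy.loc[:, missing_share <= 0.5].copy()

# Assumed policy: median for numerical columns, most frequent value otherwise.
kept["speed_limit"] = kept["speed_limit"].fillna(kept["speed_limit"].median())
kept["weather"] = kept["weather"].fillna(kept["weather"].mode().iloc[0])

print(missing_share)
print(kept)
```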

Data Visualization

  • Various plots for numerical and non-numerical features.
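
The README does not list the specific plots, so the sketch below shows just two common choices under that caveat: a histogram for a numerical feature and a bar chart of category counts for a non-numerical one. Column names are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit this line inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Toy frame with hypothetical columns.
toy = pd.DataFrame(
    {
        "speed_limit": [30, 60, 30, 70, 40, 30],
        "weather": ["fine", "rain", "fine", "fog", "rain", "fine"],
    }
)

fig, (ax_num, ax_cat) = plt.subplots(1, 2, figsize=(8, 3))

# Numerical feature: distribution as a histogram.
ax_num.hist(toy["speed_limit"], bins=5)
ax_num.set_title("speed_limit")

# Non-numerical feature: one bar per category.
counts = toy["weather"].value_counts()
ax_cat.bar(counts.index.tolist(), counts.values)
ax_cat.set_title("weather")

fig.tight_layout()
```

Seaborn's `sns.histplot` and `sns.countplot` produce equivalent plots with less setup.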

Feature Analysis

  • Analysis of numerical and categorical features.
  • Identification of continuous and discrete features.
  • Pair plots and strip plots.
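
A common heuristic for separating continuous from discrete numerical features is to count distinct values. The sketch below uses that heuristic; the helper `split_numeric_features`, the cutoff of 10 distinct values, and the column names are assumptions, not taken from the notebook.

```python
import pandas as pd


def split_numeric_features(df: pd.DataFrame, max_discrete: int = 10):
    """Split numerical columns by their number of distinct values."""
    n_unique = df.select_dtypes(include="number").nunique()
    discrete = n_unique[n_unique <= max_discrete].index.tolist()
    continuous = n_unique[n_unique > max_discrete].index.tolist()
    return continuous, discrete


# Toy frame with hypothetical columns.
toy = pd.DataFrame(
    {
        "longitude": [x * 0.01 for x in range(20)],  # 20 distinct values
        "vehicles": [1, 2, 3, 2] * 5,                # 3 distinct values
        "weather": ["fine", "rain"] * 10,            # non-numerical
    }
)

continuous, discrete = split_numeric_features(toy)
categorical = toy.select_dtypes(exclude="number").columns.tolist()

print("continuous:", continuous)
print("discrete:", discrete)
print("categorical:", categorical)

# In a notebook, the plots mentioned above could then be drawn with Seaborn:
#   import seaborn as sns
#   sns.pairplot(toy[continuous + discrete])
#   sns.stripplot(data=toy, x="weather", y="longitude")
```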

Correlation Analysis

  • Computation and visualization of the correlation matrix.
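
A correlation matrix over the numerical columns can be computed as sketched below. Pearson correlation (the pandas default) is assumed here because the README does not name the method, and the column names are hypothetical.

```python
import pandas as pd

# Toy frame with hypothetical columns.
toy = pd.DataFrame(
    {
        "speed_limit": [30, 40, 50, 60, 70],
        "casualties": [1, 2, 2, 3, 4],
        "weather": ["fine", "rain", "fine", "fog", "rain"],
    }
)

# Restrict to numerical columns, then compute pairwise Pearson correlations.
corr = toy.select_dtypes(include="number").corr()
print(corr.round(3))

# In a notebook, the matrix could be visualized as a heatmap with Seaborn:
#   import seaborn as sns
#   sns.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="coolwarm")
```

Fixing `vmin` and `vmax` at -1 and 1 keeps the color scale comparable across different subsets of features.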

License

This project is licensed under the MIT License. See the LICENSE file for details.


Feel free to contribute to this project by submitting issues or pull requests. Happy analyzing!
