Hi! I'm Ali (PmD) PourMohammad! This project is just a warm-up, mostly for educational and training purposes.
data_description.txt has information about the data.
train.csv and House.csv are for training and evaluating a model.
test.csv doesn't have outcomes; it's for evaluation on the Kaggle site.
housePricePrediction.ipynp has the code that I've written to solve this problem.
- Drop columns with high outliers
- Drop some rows with outlier values
- Drop the ID column
- Convert all 'NaN' values in 14 categorical columns to 'NOT' (for example, "No Basement" is stored as "NaN")
- Drop columns in which most values are null
- Convert all categorical columns to numeric:
- Find correlated columns and drop one of each pair
- Apply get_dummies to all remaining categorical columns
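
Several of the steps above can be sketched with pandas roughly as follows. This is only an illustration, not the notebook's actual code: the function name `preprocess`, the column names `Id`, `BsmtQual` and `SalePrice`, and the thresholds `null_frac` and `corr_limit` are all assumptions. The outlier steps and the categorical-to-numeric conversion are left out because the list does not say how they are done.

```python
import numpy as np
import pandas as pd


def preprocess(df, id_col="Id", target="SalePrice", none_cols=("BsmtQual",),
               null_frac=0.5, corr_limit=0.8):
    """Hypothetical helper; every default here is an assumed, illustrative value."""
    out = df.copy()

    # Drop the ID column if it is present.
    out = out.drop(columns=[id_col], errors="ignore")

    # In categorical columns where NaN means "feature absent", store 'NOT' instead.
    for col in none_cols:
        if col in out.columns:
            out[col] = out[col].fillna("NOT")

    # Drop columns in which more than `null_frac` of the values are null.
    out = out.loc[:, out.isna().mean() <= null_frac]

    # Among numeric feature columns (the target is excluded), find pairs whose
    # absolute correlation exceeds the limit and drop the later column of each pair.
    features = out.select_dtypes(include="number").drop(columns=[target], errors="ignore")
    corr = features.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > corr_limit).any()]
    out = out.drop(columns=to_drop)

    # One-hot encode the categorical columns that remain.
    return pd.get_dummies(out)
```

Calling `preprocess(pd.read_csv("train.csv"))` would return a frame in which the remaining categorical columns are one-hot encoded; the thresholds and the list of "absent" columns would need tuning against the real data.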