Otomoto Data Mining

The goal is to scrape car advertisements from Otomoto and build models that predict a car's price from its listing details.

Data Storage

Data is saved as Parquet files in the data/ directory, with each page of results stored as page_XXX.parquet for efficient storage and analysis.
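A minimal sketch of how the per-page files could be addressed and loaded back for analysis. The three-digit zero padding (read from the `XXX` in the file name), the helper names `page_path`, `list_pages`, and `load_all`, and the use of pandas with a Parquet engine such as pyarrow are assumptions for illustration, not necessarily what the scripts in this repository do.

```python
from pathlib import Path

DATA_DIR = Path("data")  # directory named in this README


def page_path(page, data_dir=DATA_DIR):
    """Path for one page of results, e.g. data/page_007.parquet.

    Assumes XXX means a zero-padded, three-digit page number.
    """
    return Path(data_dir) / f"page_{page:03d}.parquet"


def list_pages(data_dir=DATA_DIR):
    """All page files in the data directory, sorted by file name."""
    return sorted(Path(data_dir).glob("page_*.parquet"))


def load_all(data_dir=DATA_DIR):
    """Concatenate every page file into a single DataFrame."""
    # Imported lazily so the path helpers work without pandas installed.
    import pandas as pd  # also needs a Parquet engine such as pyarrow

    files = list_pages(data_dir)
    if not files:
        raise FileNotFoundError(f"no page_*.parquet files in {data_dir}")
    return pd.concat((pd.read_parquet(p) for p in files), ignore_index=True)
```

With zero-padded names, a plain lexicographic sort returns the pages in numeric order, which is why `list_pages` does not parse the page number.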

Number of price classes per step (bin width):
- 14 classes -> 20_000 step
- 10 classes -> 30_000 step
- 7 classes -> 40_000 step

Price range:
- min_price = 20_000
- max_price = 300_000

*The steps above apply to linear bins. The final models use a logarithmic split instead, with the number of bins taken from the linear scheme.
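The class counts above follow from dividing the price range by the step and rounding up. The sketch below reproduces them and shows one way to build logarithmic bin edges with the same number of bins. The function names, the clamping of the last linear edge to `max_price`, and the geometric spacing of the logarithmic edges are assumptions for illustration; the models in this repository may define their bins differently.

```python
import bisect
import math

MIN_PRICE = 20_000
MAX_PRICE = 300_000


def n_classes(step, lo=MIN_PRICE, hi=MAX_PRICE):
    """Number of linear bins of width `step` needed to cover [lo, hi]."""
    return math.ceil((hi - lo) / step)


def linear_edges(step, lo=MIN_PRICE, hi=MAX_PRICE):
    """Linear bin edges; the last edge is clamped to `hi` (assumption),
    so the final bin may be narrower than `step`."""
    n = n_classes(step, lo, hi)
    return [min(lo + i * step, hi) for i in range(n + 1)]


def log_edges(n, lo=MIN_PRICE, hi=MAX_PRICE):
    """n bins whose edges grow by a constant ratio from lo to hi."""
    ratio = (hi / lo) ** (1 / n)
    edges = [lo * ratio**i for i in range(n)]
    edges.append(float(hi))  # set exactly to avoid floating-point drift
    return edges


def price_class(price, edges):
    """Index of the bin containing `price`; out-of-range prices are
    clamped to the first or last bin."""
    idx = bisect.bisect_right(edges, price) - 1
    return max(0, min(idx, len(edges) - 2))


for step in (20_000, 30_000, 40_000):
    n = n_classes(step)
    print(step, n, [round(e) for e in log_edges(n)])
```

Geometric spacing gives narrow bins at the low end of the range and wide bins at the high end, so each class spans the same relative price change.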
