thesis-dataset

Dataset Collection for Thesis.

Features

- Raw dataset (captions & comments) for Balinese text classification: instagram.json
- Some of the Instagram account sources:
- Example scraped data (captions & comments) from various social media platforms: dataset.json
- Notebook for scraping data: notebook.ipynb
- Notebook for pre-processing the dataset: preprocessing.ipynb
- Dataset will be annotated by Penyuluh Bahasa Bali: dataset.xlsx
- Dataset annotation result: thesis-ml/dataset
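
As a rough illustration of how the raw captions and comments might be loaded and cleaned before annotation, here is a minimal sketch. The record layout (a list of objects with `caption` and `comments` keys) and the cleaning steps (dropping URLs and @mentions, lowercasing) are assumptions made for this example; the actual schema of instagram.json and the steps in preprocessing.ipynb may differ. The helper names `load_posts`, `clean_text`, and `flatten` are hypothetical.

```python
import json
import re
from pathlib import Path

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@[\w.]+")
SPACE_RE = re.compile(r"\s+")


def load_posts(path: str) -> list[dict]:
    """Read a JSON file that is assumed to hold a list of post objects."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def clean_text(text: str) -> str:
    """Remove URLs and @mentions, collapse whitespace, and lowercase."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip().lower()


def flatten(posts: list[dict]) -> list[str]:
    """Turn posts into a flat list of cleaned, non-empty texts."""
    texts = []
    for post in posts:
        # Assumed keys: "caption" (str) and "comments" (list of str).
        texts.append(post.get("caption", ""))
        texts.extend(post.get("comments", []))
    cleaned = [clean_text(t) for t in texts]
    return [t for t in cleaned if t]


# Made-up records, used only to show the data flow.
posts = [
    {
        "caption": "Hello @someone  see https://example.com NOW",
        "comments": ["  Nice!  ", "@user.name"],
    }
]
cleaned = flatten(posts)
print(cleaned)
```

With a real file, `flatten(load_posts("instagram.json"))` would produce the same kind of list, provided the file matches the assumed layout.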

This project mainly uses Selenium and BeautifulSoup4 to scrape data from various social media platforms. To set it up, clone the repository and install the dependencies:

git clone https://github.com/putuwaw/thesis-dataset.git

pip install -r requirements.txt

pip install ipykernel selenium beautifulsoup4 pydantic pandas linggapy openpyxl requests
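
The Selenium plus BeautifulSoup4 combination mentioned above typically works in two stages: Selenium renders the page in a real browser, then BeautifulSoup parses the resulting HTML. The sketch below shows that split under stated assumptions. The function names and the `.caption` / `.comment` CSS selectors are hypothetical placeholders, not the selectors used in notebook.ipynb; real social media pages use different markup that changes often.

```python
import json

from bs4 import BeautifulSoup


def fetch_page_source(url: str) -> str:
    """Render a page in headless Chrome and return its HTML."""
    # Imported lazily so parse_post() can be used without a browser driver.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Always release the browser, even if loading the page fails.
        driver.quit()


def parse_post(html: str) -> dict:
    """Extract one caption and its comments from rendered HTML.

    The selectors are hypothetical; adjust them to the target page.
    """
    soup = BeautifulSoup(html, "html.parser")
    caption = soup.select_one(".caption")
    comments = soup.select(".comment")
    return {
        "caption": caption.get_text(strip=True) if caption else "",
        "comments": [c.get_text(strip=True) for c in comments],
    }


# Offline demonstration on a static, made-up snippet (no browser needed).
SAMPLE_HTML = """
<article>
  <p class="caption">Example caption</p>
  <ul>
    <li class="comment">First comment</li>
    <li class="comment">Second comment</li>
  </ul>
</article>
"""
record = parse_post(SAMPLE_HTML)
print(json.dumps(record, ensure_ascii=False, indent=2))
```

Keeping the browser step separate from the parsing step means the parser can be checked against saved HTML without launching Chrome. For a live page, `parse_post(fetch_page_source(url))` would chain the two.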
