Oto com crawler #10
.gitignore
Outdated
| @@ -1 +1,2 @@ | |||
| /venv/ | |||
| /carcrawlvenv/ | |||
I'll change this later so it matches anything ending in `venv`; no need to define this one explicitly.
crawlers/OtoComVnCrawler.py
Outdated
| def __init__(self, url): | ||
| super().__init__() | ||
| self.url = url | ||
| # self.log.info('Crawling %s' % self.url) |
Remove commented-out lines once you are done debugging; it keeps the code easier to read.
| content = req.content | ||
| self.soup = BeautifulSoup(content.decode('utf-8', 'ignore'), 'html.parser') | ||
|
|
||
| def _replace_all(self, text, dic): |
There is already a replace-all function in `common.normalize_price`; import it and use that instead, for convenience.
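For reference, a minimal sketch of what a shared replace-all helper typically looks like. The name `replace_all` and its signature are assumptions for illustration only; the actual helper in `common` may differ.

```python
def replace_all(text, replacements):
    """Apply every old -> new substitution in `replacements` to `text`.

    Hypothetical helper: the real function in `common` may use a
    different name or signature.
    """
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text


# Example: turn a price label into a digit string.
cleaned = replace_all("600 triệu", {" triệu": "000000"})
```

Keeping one such helper in a shared module avoids each crawler carrying its own private copy.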
| txt = self._remove_words(l.text.strip()) | ||
| info.append(txt) | ||
|
|
||
| if len(txt) == 8: |
Read up on `dict(zip(...))` so you don't have to write such long code here.
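A minimal sketch of the idea, assuming the scraped values arrive in a fixed order. The field names and their order below are hypothetical, not taken from the crawler.

```python
# Hypothetical field order; the real order depends on the page layout.
FIELDS = ["year", "type", "seats"]


def build_car(values):
    """Pair each scraped value with its field name in one step."""
    if len(values) != len(FIELDS):
        raise ValueError(
            "expected %d values, got %d" % (len(FIELDS), len(values))
        )
    return dict(zip(FIELDS, values))


car = build_car(["2019", "Sedan", "5"])
```

This replaces a chain of index-by-index assignments with a single expression, and the length check makes a layout change on the site fail loudly instead of silently shifting fields.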
crawlers/OtoComVnCrawler.py
Outdated
| car['seats'] = int(car['seats']) | ||
| car['type'] = self._normalize_type(car['type']) | ||
| car['brand'] = self._get_brand(car['brand']) | ||
| # car.pop('city') |
Delete the commented-out line if it is not needed or used.
| @@ -0,0 +1,1516 @@ | |||
| href | |||
Check the CSV files that have `href` as the first line; it makes the files harder to process when they are read later.
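If the header row stays, one way to keep it out of the data is to read the file with `csv.DictReader`, which consumes the first row as column names. This is only a sketch: the function name `read_hrefs` and the sample URLs are placeholders.

```python
import csv
import io


def read_hrefs(csv_file):
    """Return the values of the `href` column, skipping the header row."""
    reader = csv.DictReader(csv_file)
    return [row["href"].strip() for row in reader if row.get("href")]


# Placeholder data standing in for one of the crawled link files.
sample = io.StringIO("href\nhttps://example.com/a\nhttps://example.com/b\n")
links = read_hrefs(sample)
```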
data/otocomvn/otocomvn_data/kia.json
Outdated
| "source_url": "https://oto.com.vn/mua-ban-xe-kia-cerato-ha-noi/20-at-premium-san-xuat-2019-aidxc15513154", | ||
| "seats": 5, | ||
| "name": " Kia Cerato 2.0 AT Premium 2019 - 600 triệu\r\n ", | ||
| "year": " 2019", |
`year` should be of type int.
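A small sketch of the conversion, assuming the scraped value is a string such as `" 2019"`. The helper name `normalize_year` is hypothetical.

```python
def normalize_year(raw):
    """Strip surrounding whitespace and convert the year to an int."""
    return int(str(raw).strip())


year = normalize_year(" 2019")
```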
data/otocomvn/otocomvn_data/kia.json
Outdated
| "seats": 5, | ||
| "name": " Kia Cerato 2.0 AT Premium 2019 - 600 triệu\r\n ", | ||
| "year": " 2019", | ||
| "typeOfCar": " Sedan", |
Lowercase the text; there is also a leading space, so remember to normalize it.
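A sketch of one possible normalization, assuming the goal is a trimmed, lowercase value. The helper name `clean_type` is hypothetical and not part of the crawler.

```python
def clean_type(raw):
    """Collapse runs of whitespace, trim both ends, and lowercase."""
    return " ".join(raw.split()).lower()


car_type = clean_type(" Sedan")
```

`str.split()` with no arguments also drops `\r\n` and repeated spaces, so the same whitespace handling would cover values that carry trailing line breaks.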
data/otocomvn/otocomvn_data/kia.json
Outdated
| "price": 705000000, | ||
| "source_url": "https://oto.com.vn/mua-ban-xe-kia-sorento-ha-noi/2017-22dath-full-xang-cuc-moi-aidxc5561637", | ||
| "seats": 7, | ||
| "name": " Kia Sorento 2WD 2.2 DATH 2017 - 705 triệu\r\n ", |
Watch out for the leading space.
schema/schemas.py
Outdated
| 'additionalProperties': False | ||
| } | ||
| } | ||
|
|
Delete the last line so it isn't added to the schema.
Adding OtoComVnCrawler + otocomvn data folder