Conversation

@KieuSonTung
Collaborator

Adding OtoComVnCrawler + otocomvn data folder

.gitignore Outdated
@@ -1 +1,2 @@
/venv/
/carcrawlvenv/
Owner

I'll change this later so it matches anything ending in venv; there's no need to define each one explicitly like this.

def __init__(self, url):
    super().__init__()
    self.url = url
    # self.log.info('Crawling %s' % self.url)
Owner

You should uncomment this line; it will make debugging easier later.

    content = req.content
    self.soup = BeautifulSoup(content.decode('utf-8', 'ignore'), 'html.parser')

def _replace_all(self, text, dic):
Owner

There is already a replace-all function in common.normalize_price; import it and reuse that instead.
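
For reference, a minimal sketch of what a shared replace-all helper of this kind might look like. The real function in `common` may have a different name and signature, so `replace_all`, its arguments, and the replacement table below are assumptions for illustration only.

```python
def replace_all(text, replacements):
    """Apply every old -> new substitution in `replacements` to `text`."""
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text


# Hypothetical usage: turn a scraped price string into digits only.
cleaned = replace_all("600 triệu", {" triệu": "000000", ",": ""})
print(cleaned)  # 600000000
```

Keeping one copy of this helper in the shared module means each crawler only supplies its own replacement table.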

txt = self._remove_words(l.text.strip())
info.append(txt)

if len(txt) == 8:
Owner

Read up on zip(dict()) so you don't have to write such long code.
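
A short sketch of the idiom, usually spelled `dict(zip(keys, values))`. The check for 8 items in the diff suggests eight fields per listing, but the field names and their order below are hypothetical, not taken from the crawler.

```python
# Hypothetical field names; the real crawler's keys and order may differ.
FIELDS = ["year", "type", "status", "mileage", "city", "gearbox", "fuel", "seats"]


def build_car(values):
    """Pair each scraped value with its field name in one step."""
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(values)}")
    return dict(zip(FIELDS, values))


car = build_car(["2019", "Sedan", "new", "0 km", "Ha Noi", "automatic", "petrol", "5"])
print(car["year"], car["seats"])  # 2019 5
```

This replaces a chain of per-index assignments with a single expression, and the length check makes a malformed listing fail loudly rather than produce a partial record.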

car['seats'] = int(car['seats'])
car['type'] = self._normalize_type(car['type'])
car['brand'] = self._get_brand(car['brand'])
# car.pop('city')
Owner

Delete this comment if it isn't needed or used.

@@ -0,0 +1,1516 @@
href
Owner

Check the CSV files that have href on the first line; it makes the files harder to process when they are read later.
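
One possible way to make the reader tolerant of that first line, sketched with the standard `csv` module. The function name `read_links` and the placeholder URLs are assumptions; the sketch also assumes the file has a single column of links.

```python
import csv
import io


def read_links(handle, header="href"):
    """Return the URLs from a one-column CSV, skipping the header row if present."""
    rows = [row[0].strip() for row in csv.reader(handle) if row and row[0].strip()]
    if rows and rows[0] == header:
        rows = rows[1:]
    return rows


# In-memory stand-in for the real file; the URLs are placeholders.
sample = io.StringIO("href\nhttps://example.com/a\nhttps://example.com/b\n")
print(read_links(sample))  # ['https://example.com/a', 'https://example.com/b']
```

The same function works on files written without the header, so the CSVs do not all have to be regenerated at once.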

"source_url": "https://oto.com.vn/mua-ban-xe-kia-cerato-ha-noi/20-at-premium-san-xuat-2019-aidxc15513154",
"seats": 5,
"name": " Kia Cerato 2.0 AT Premium 2019 - 600 triệu\r\n ",
"year": " 2019",
Owner

year should be of type int.
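
A minimal sketch of the conversion, assuming the scraped value is a string such as `" 2019"`. The helper name `parse_year` is hypothetical, and returning `None` for non-numeric input is one design choice among several.

```python
def parse_year(raw):
    """Convert a scraped year such as ' 2019' to an int, or None if it is not numeric."""
    try:
        return int(str(raw).strip())
    except ValueError:
        return None


print(parse_year(" 2019"))  # 2019
print(parse_year("n/a"))    # None
```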

"seats": 5,
"name": " Kia Cerato 2.0 AT Premium 2019 - 600 triệu\r\n ",
"year": " 2019",
"typeOfCar": " Sedan",
Owner

Lowercase the text; there is also a leading space. Remember to normalize.
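
A small sketch of such a normalization step, assuming plain string fields. The name `normalize_text` and the `lowercase` flag are hypothetical; the flag is there because fields like `name` may need trimming without lowercasing.

```python
def normalize_text(raw, lowercase=True):
    """Collapse runs of whitespace (including CR/LF), trim the ends, optionally lowercase."""
    cleaned = " ".join(raw.split())
    return cleaned.lower() if lowercase else cleaned


print(normalize_text(" Sedan"))  # sedan
print(normalize_text(" Kia Sorento 2WD 2.2 DATH 2017 - 705 triệu\r\n ", lowercase=False))
# Kia Sorento 2WD 2.2 DATH 2017 - 705 triệu
```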

"price": 705000000,
"source_url": "https://oto.com.vn/mua-ban-xe-kia-sorento-ha-noi/2017-22dath-full-xang-cuc-moi-aidxc5561637",
"seats": 7,
"name": " Kia Sorento 2WD 2.2 DATH 2017 - 705 triệu\r\n ",
Owner

Watch out for the leading space.

'additionalProperties': False
}
}

Owner

Delete the last line so it isn't added to the schema.

3 participants