Open
Labels
enhancement (New feature or request)
Description
Follow the steps below to preprocess the data:
i. Sample 1M data points, because of compute and memory limitations.
ii. Separate the code snippets from the Body.
iii. Remove special characters from the question title and description (not from the code).
iv. Remove stop words (except ‘C’).
v. Remove HTML tags using regular expressions.
vi. Convert all characters to lowercase.
vii. Stem the words with SnowballStemmer.
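Steps ii to vii could be sketched roughly as follows. This is only an illustration under several assumptions: code snippets are taken to be wrapped in `<code>` tags, `STOP_WORDS` is a tiny stand-in list, and the names `split_code` and `clean_text` are hypothetical. The stemmer is passed in as a function so the sketch runs without extra packages. HTML tags are stripped before special characters here so that tag names do not leak into the text.

```python
import re

# Tiny stand-in stop-word list (an assumption); a fuller list such as
# NLTK's English stop words would normally be used instead.
STOP_WORDS = {"a", "an", "the", "is", "in", "to", "of", "and", "how", "do", "i", "my"}

CODE_RE = re.compile(r"<code>(.*?)</code>", flags=re.DOTALL)  # assumes <code> wrappers
TAG_RE = re.compile(r"<[^>]+>")
NON_ALPHA_RE = re.compile(r"[^a-z]+")


def split_code(body):
    """Step ii: return (body without code, list of code snippets)."""
    snippets = CODE_RE.findall(body)
    text = CODE_RE.sub(" ", body)
    return text, snippets


def clean_text(text, stem=lambda word: word, stop_words=STOP_WORDS):
    """Steps iii to vii on the non-code text; `stem` defaults to a no-op."""
    text = TAG_RE.sub(" ", text)        # step v: strip HTML tags
    text = text.lower()                 # step vi: lowercase
    text = NON_ALPHA_RE.sub(" ", text)  # step iii: drop special characters
    kept = []
    for word in text.split():
        # Step iv: drop stop words but always keep 'c' (the language name).
        # Dropping other single letters is an assumption of this sketch.
        if word != "c" and (word in stop_words or len(word) == 1):
            continue
        kept.append(stem(word))         # step vii: stem each remaining word
    return " ".join(kept)
```

If NLTK is installed, `SnowballStemmer("english").stem` from `nltk.stem` can be passed as the `stem` argument, for example `clean_text(text, stem=SnowballStemmer("english").stem)`.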
Below are example questions after preprocessing.
Then create a new database called ‘Processed.db’ and load the preprocessed data into it.
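One possible way to create the database is with Python's built-in `sqlite3` module. The table name `QuestionsProcessed`, its two columns, and the helper `save_processed` are hypothetical choices for this sketch, not something the issue specifies.

```python
import sqlite3


def save_processed(rows, db_path="Processed.db"):
    """Store (question, code) pairs in a SQLite file; the schema is assumed."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS QuestionsProcessed "
                "(question TEXT NOT NULL, code TEXT)"
            )
            conn.executemany(
                "INSERT INTO QuestionsProcessed (question, code) VALUES (?, ?)",
                rows,
            )
    finally:
        conn.close()
```

Calling `save_processed([("sort c", "qsort(a, n);")])` would create `Processed.db` in the current directory and insert one row.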