Conversation
….warn for warning message
…ting related tests
…like "XD" and ":D"
…; Add tests for 'num_named_entity'
xehu
left a comment
Looks really good! Just a few small comments/questions.
| """ | ||
| emoji_pattern = r'[:;]-?\)+' | ||
| emojis = re.findall(emoji_pattern, text) | ||
| # emoji_pattern = r'[:;]-?\)+' |
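As a quick illustration of what the pattern under review matches, here is a minimal sketch (the sample text is made up):

```python
import re

# The emoticon pattern under review: ':' or ';', an optional '-',
# then one or more ')' characters.
emoji_pattern = r'[:;]-?\)+'

text = "great :) thanks ;-) see you :))"  # hypothetical sample input
emojis = re.findall(emoji_pattern, text)
print(emojis)  # → [':)', ';-)', ':))']
```

Note that this pattern does not cover emoticons like "XD" and ":D", which the commits above handle separately.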
Note: remove the commented-out pattern before merging this in.
Note: it looks like this branch crashes locally. This can be reproduced by installing the … Here's what the error looks like:

It seems like the source of the crash is an extra conversation ID. The original input dataframe only has two conversation IDs -- I and J -- but for some weird reason, we're getting three: I, J, and …
This issue arose because I tried to optimize RAM usage: instead of concatenating data frames inside the for loop, I wrote each intermediate df to the output file in append mode. As a result, if the output file already exists and needs to be overwritten, it isn't deleted -- it just grows longer. The commit above resolves this: we now append the intermediate dfs to a list and concatenate only once at the end, which still saves RAM while making minimal changes.
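The fix described above can be sketched as follows (a minimal, hedged example; the batch data and column name are made up, and the loop stands in for the real `get_sentiment` calls):

```python
import pandas as pd

# Hypothetical per-batch results standing in for get_sentiment(batch, ...).
batches = [pd.DataFrame({"score": [0.1]}), pd.DataFrame({"score": [0.9]})]

# Collect intermediate dfs in a list instead of appending to the CSV
# inside the loop; appending to an existing output file makes it grow
# rather than being overwritten.
parts = []
for batch_df in batches:
    parts.append(batch_df)

# Concatenate once at the end, then write the file in a single shot.
result = pd.concat(parts, ignore_index=True)
result.to_csv("output.csv", index=False)  # one overwrite, no stale rows
```

This keeps memory usage low (only small per-batch frames are held until the final concat) while guaranteeing the output file reflects exactly one run.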
```diff
- batch_df = get_sentiment(batch)
- batch_sentiments_df = pd.concat([batch_sentiments_df, batch_df], ignore_index=True)
+ batch_df = get_sentiment(batch, model_bert, device)
+ batch_df.to_csv(output_path, mode='a', header=first, index=False)
```
Note to self - we're now appending here; Emily to run vector tests locally and confirm that everything passes
Added tests for all columns in the feature dict. Closes #220, closes #359.