A simple Naive Bayes classifier solution to profanity detection.
This project is now available under the permissive MIT license. I am testing it with a few real online communities and may continue to make changes in the near future, but this was a project for an Artificial Intelligence course I took, so it's for the most part "unmaintained". Feel free to open issues or suggest PRs as necessary, just please recognize that my free time to review and adopt changes is limited as I work full-time.
Many online communities include realtime chat. Human moderation often cannot keep up with the volume of messages sent in these communities (sometimes >100 messages a second in some large multiplayer game communities!), so automated filtering is needed to cover the shortfall and block obvious, low-effort insincere content.
In realtime chat environments, filter performance is still an important criterion, as messages cannot be deleted once they have been sent, and latency in messages being broadcast to other users is a critical concern. Because of this, more sophisticated filtering techniques are not always feasible.
Within many online gaming communities specifically, the most frequent filtering option is still Regex filtering based on a wordlist. This project hopes to provide a slightly more sophisticated and accurate filtering strategy for these communities, with the goals of being performant, accurate, and easy to integrate.
You can train your own filtering probability map if you wish. The src/main/.../train folder of the project includes an example of doing so: it creates a dataset in a format ProfanityPilot can read efficiently, using the profanity-check cleaned dataset, which itself sources data from t-davidson/hate-speech-and-offensive-language and Wikipedia comments from a Kaggle competition.
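For orientation, a probability map of this kind can be built from simple token counts. The sketch below is a minimal, hypothetical Naive Bayes trainer with add-one smoothing. The class `NaiveBayesSketch`, its method names, and the tokenizer are assumptions made for illustration; this is not the code in the train folder, and it does not read or write ProfanityPilot's dataset format.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical illustration only; not ProfanityPilot's actual training code. */
final class NaiveBayesSketch {
    private static final int[] ZERO = {0, 0};

    // token -> {count in clean messages, count in profane messages}
    private final Map<String, int[]> counts = new HashMap<>();
    private final int[] tokenTotals = new int[2];
    private final int[] docTotals = new int[2];

    /** Lowercases the text and splits it on anything that is not a letter, digit, or apostrophe. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9']+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Adds one labeled message to the counts. */
    void train(String text, boolean profane) {
        int label = profane ? 1 : 0;
        docTotals[label]++;
        for (String token : tokenize(text)) {
            counts.computeIfAbsent(token, k -> new int[2])[label]++;
            tokenTotals[label]++;
        }
    }

    /** Returns P(profane | text) under the naive independence assumption. */
    double probabilityProfane(String text) {
        int vocab = counts.size();
        double docs = docTotals[0] + docTotals[1] + 2.0;
        // Smoothed log priors, so an unseen class never produces log(0).
        double logProfane = Math.log((docTotals[1] + 1.0) / docs);
        double logClean = Math.log((docTotals[0] + 1.0) / docs);
        for (String token : tokenize(text)) {
            int[] c = counts.getOrDefault(token, ZERO);
            // Add-one smoothing; the extra +1 reserves probability mass for unseen tokens.
            logProfane += Math.log((c[1] + 1.0) / (tokenTotals[1] + vocab + 1.0));
            logClean += Math.log((c[0] + 1.0) / (tokenTotals[0] + vocab + 1.0));
        }
        // Normalize the two log scores into a probability between 0 and 1.
        return 1.0 / (1.0 + Math.exp(logClean - logProfane));
    }
}
```

Working in log space keeps long messages from underflowing to zero, and the returned value is the kind of score a filtering threshold can be compared against.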
If you wish to use this dataset, it is bundled with the application in compressed form. The compression is not strictly necessary; it is there to keep the offensive words from showing up in search indexing.
If you use the standard setup of the provided dataset & BayesianClassifier, the main decision left to you is the threshold at which messages are filtered.
The default threshold (0.4) is based on some trial and error and on analysis of a ROC curve computed on a test dataset. The table used for these recommendations is below:
| Threshold | Bayes Precision | Bayes Recall |
|---|---|---|
| 0 | 0.04685816 | 0.61516554 |
| 0.1 | 0.59925835 | 0.92604892 |
| 0.2 | 0.63765601 | 0.92443169 |
| 0.3 | 0.66296294 | 0.92308927 |
| 0.4 | 0.6830973 | 0.92159876 |
| 0.5 | 0.70138771 | 0.91981686 |
| 0.6 | 0.7184104 | 0.91795951 |
| 0.7 | 0.73605678 | 0.91584284 |
| 0.8 | 0.75751988 | 0.91304348 |
| 0.9 | 0.78607407 | 0.90838086 |
| 1 | 1 | 0.80013995 |
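If you train your own probability map, a similar table can be produced by sweeping thresholds over a labeled test set. The following is a minimal sketch of that calculation, not the script used to generate the numbers above. The class `ThresholdSweep` is hypothetical, and treating a score greater than or equal to the threshold as "filtered" is an assumption about the comparison.

```java
/** Hypothetical illustration of a precision/recall calculation at one threshold. */
final class ThresholdSweep {
    /**
     * Returns {precision, recall} when every message whose score is
     * greater than or equal to the threshold is flagged as profane.
     */
    static double[] precisionRecall(double[] scores, boolean[] profane, double threshold) {
        int truePositives = 0;
        int falsePositives = 0;
        int falseNegatives = 0;
        for (int i = 0; i < scores.length; i++) {
            boolean flagged = scores[i] >= threshold;
            if (flagged && profane[i]) {
                truePositives++;
            } else if (flagged) {
                falsePositives++;
            } else if (profane[i]) {
                falseNegatives++;
            }
        }
        int flaggedTotal = truePositives + falsePositives;
        int profaneTotal = truePositives + falseNegatives;
        // By convention here, an empty denominator counts as a perfect score.
        double precision = flaggedTotal == 0 ? 1.0 : (double) truePositives / flaggedTotal;
        double recall = profaneTotal == 0 ? 1.0 : (double) truePositives / profaneTotal;
        return new double[] {precision, recall};
    }
}
```

Calling `precisionRecall` once per candidate threshold (0.0, 0.1, and so on) with the same scores and labels yields one table row per call.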
Below are some additional recommendations:
Threshold recommendation: ~0.3 - ~0.4
Comments:
For a child-friendly community, it is important to prioritize catching as much profane content as possible to create a safe environment for children. Because of this, I suggest a lower threshold that results in higher recall (catching more profanity) at the cost of lower precision (more false positives).
Threshold recommendation: ~0.4 - ~0.5
Comments:
For an all-ages-friendly community, it's essential to strike a balance between ensuring a welcoming environment for all users and not over-filtering content. A value in this range balances precision and recall, so that most profane content is caught while false positives are kept to a minimum.
Threshold recommendation: ~0.55 - ~0.6
Comments:
In an adult-friendly online community, precision matters more, since adults can handle some level of profanity, but it's still important to filter out explicit content that goes beyond the community's standards. The recommended thresholds trade a small amount of recall for higher precision, maintaining a moderate level of filtering while keeping false positives in check.
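One way to wire these recommendations into an application is a small set of named presets. The sketch below is hypothetical: the enum `CommunityProfile` and its method are not part of ProfanityPilot, the values are simply the midpoints of the ranges above, and it assumes you already have a profanity score between 0 and 1 for each message.

```java
/** Hypothetical presets; the values are midpoints of the recommended ranges. */
enum CommunityProfile {
    CHILD_FRIENDLY(0.35),
    ALL_AGES(0.45),
    ADULT(0.575);

    final double threshold;

    CommunityProfile(double threshold) {
        this.threshold = threshold;
    }

    /** Returns true when a message with this score should be filtered (assumes score >= threshold filters). */
    boolean shouldFilter(double profanityScore) {
        return profanityScore >= threshold;
    }
}
```

With these presets, a message scoring 0.4 would be filtered in a child-friendly community but allowed in the other two.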