This is a labeled dataset for sentiment analysis. The original dataset is from Kaggle.
The original data is of somewhat poor quality: the given label does not always fit the actual emotion of the text. We are gradually going through the data to clean it, and in the process we added a fourth sentiment, neutral. Cleaning involves:
- removing mojibake and entries that are clearly not WhatsApp statuses (mostly residue from scraping)
- relabeling
- removing racist, homophobic and abusive comments
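
The mojibake step in the list above could be partly automated. The following is a minimal sketch, not the procedure actually used for this dataset: it assumes each entry is a dict with a hypothetical `text` key and flags entries containing character sequences that typically appear when UTF-8 text is decoded with a single-byte codec. The pattern is a non-exhaustive heuristic, so flagged entries are best reviewed by hand.

```python
import re

# Heuristic markers of mojibake (an assumed, non-exhaustive list).
MOJIBAKE_PATTERN = re.compile(
    "[\u00c3\u00c2][\u0080-\u00bf]"  # "Ã" or "Â" followed by a Latin-1 symbol
    "|\u00e2\u20ac"                  # "â€", the start of a mis-decoded curly quote
    "|\ufffd"                        # Unicode replacement character
)


def looks_like_mojibake(text: str) -> bool:
    """Return True if the text contains a typical mojibake sequence."""
    return MOJIBAKE_PATTERN.search(text) is not None


def drop_mojibake(rows):
    """Keep only the rows whose hypothetical 'text' field looks clean."""
    return [row for row in rows if not looks_like_mojibake(row["text"])]
```

Correctly encoded accented characters and emoji are not matched, because the pattern only looks for the specific two-character sequences that a wrong decoding produces.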
We label a text as happy if it sounds clearly happy, has a positive connotation, and expresses light-hearted and whimsical thoughts.
We label a text as angry if it sounds angry, sarcastic or sassy. Such texts are usually accompanied by strong language such as curse words and aggressive wording.
We label a text as sad if it expresses fear or insecurity. Such texts usually have a dark or gloomy tone.
Statuses that are neither happy nor sad, and that do not express any particular sentiment, are labeled as neutral. These can be advice or facts.
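
Since relabeling is done by hand, a quick consistency check can catch typos in the label column. The sketch below assumes each entry is a dict with a hypothetical `label` key and that the four labels are stored as the lowercase strings `happy`, `angry`, `sad` and `neutral`; the actual column name and spelling in the files may differ.

```python
from collections import Counter

# The four sentiments described above; the exact spelling is an assumption.
ALLOWED_LABELS = {"happy", "angry", "sad", "neutral"}


def check_labels(rows):
    """Count valid labels and collect (index, label) pairs for invalid ones."""
    counts = Counter()
    invalid = []
    for index, row in enumerate(rows):
        label = row["label"].strip().lower()
        if label in ALLOWED_LABELS:
            counts[label] += 1
        else:
            invalid.append((index, row["label"]))
    return counts, invalid
```

Calling `check_labels(rows)` returns the per-label counts, which also show how balanced the classes are, together with the positions of any entries that still need attention.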
We also try to add more entries by scraping them from different websites that offer inspiration for WhatsApp statuses.