Hate speech dataset annotated for Portuguese

Portuguese Hate Speech Twitter Dataset is a collection of Twitter messages manually annotated for hate speech. 5,668 messages from 1,156 distinct users were collected on Twitter and labeled using a hierarchical structure of classes, following a multiclass and multilabel approach. Two different formats of the dataset are provided, plus the hierarchy of classes. The text of the tweets is omitted from this dataset to comply with the terms and conditions of the Twitter API.
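As a rough illustration of the multilabel, dummy-variable format, the sketch below reads a small in-memory CSV in which each class is a 0/1 column and recovers the set of classes assigned to each message. The column names (row_id, hate_speech, sexism, racism), the sample rows, and the helper functions are hypothetical and are not taken from the dataset files; consult the readme for the actual layout.

```python
import csv
import io

# Hypothetical sample: column names and values are illustrative only,
# they are NOT copied from the real dataset.
SAMPLE = """row_id,hate_speech,sexism,racism
1,0,0,0
2,1,1,0
3,1,1,1
"""


def labels_per_row(csv_text, id_column="row_id"):
    """Map each row id to the class names whose dummy variable is 1 (column order)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    result = {}
    for row in reader:
        result[row[id_column]] = [
            name
            for name, value in row.items()
            if name != id_column and value == "1"
        ]
    return result


def class_counts(csv_text, id_column="row_id"):
    """Count how many rows carry each class label."""
    counts = {}
    for labels in labels_per_row(csv_text, id_column).values():
        for name in labels:
            counts[name] = counts.get(name, 0) + 1
    return counts


if __name__ == "__main__":
    print(labels_per_row(SAMPLE))
    print(class_counts(SAMPLE))
```

Because the annotation is multilabel, a single row may carry several classes at once, which is why each row maps to a list rather than a single value.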

Data and Resources

dataset dummy classes (CSV)
CSV file containing the dataset as a matrix with dummy variables for each...
graph hierarchical classes (CSV)
The classes follow a hierarchical organization. This hierarchy is represented...
dataset annotator classes (CSV)
First version of the dataset with the raw annotator classification. CSV file...
readme.txt (TXT)
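The hierarchical organization of the classes means that a message labeled with a specific class can also be treated as belonging to that class's ancestors. The sketch below shows one way to do this, assuming the hierarchy is available as a parent/child edge list. That layout, the column names (parent, child), and the class names are assumptions made for illustration; the actual structure of the graph file is described only in the dataset's own documentation.

```python
import csv
import io

# Hypothetical edge list: the real file's columns and class names may differ.
EDGES = """parent,child
hate_speech,sexism
hate_speech,racism
sexism,women
"""


def load_parents(csv_text):
    """Build a child -> set of parents mapping from a parent/child edge list."""
    parents = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        parents.setdefault(row["child"], set()).add(row["parent"])
    return parents


def ancestors(label, parents):
    """Return every class reachable by walking upward from the given label."""
    seen = set()
    stack = [label]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def expand(labels, parents):
    """Add all ancestor classes to a set of assigned labels."""
    expanded = set(labels)
    for label in labels:
        expanded |= ancestors(label, parents)
    return expanded


if __name__ == "__main__":
    parent_map = load_parents(EDGES)
    print(sorted(expand(["women"], parent_map)))
```

Storing a set of parents per class, rather than a single parent, keeps the sketch valid even if the hierarchy is a general graph in which a class has more than one parent.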

Additional Information

Field	Value
Author	Paula Fortuna
Version	2.0
Last updated	November 15, 2018, 10:40 (UTC)
Created	July 26, 2017, 13:10 (UTC)
DOI	http://doi.org/10.23728/b2share.9005efe2d6be4293b63c3cffd4cf193e
dc.Coverage.Temporal	8 to 9 March 2017
dc.Date	9 March 2017
dc.Format	*.csv
dc.Format.Extent	1.10 MB
dc.Language	PT
dc.Publisher	INESC TEC
dc.Relation	Master's thesis: FORTUNA, Paula (2017). Automatic detection of hate speech in text: an overview of the topic and dataset annotation with hierarchical classes. Porto: Faculdade de Engenharia da Universidade do Porto.
dc.Type	Tweets and classes taxonomy