Classification of online health messages

The dataset contains 487 annotated messages taken from Medhelp, an online health forum with several health communities (https://www.medhelp.org/). It was built as part of a master's thesis titled "Automatic categorization of health-related messages in online health communities", written for the Master in Informatics and Computing Engineering at the Faculty of Engineering of the University of Porto. It expands a dataset created in previous work [see Relation metadata], whose objective was to propose a classification scheme for analyzing messages exchanged in online health forums.

A website was built to allow the classification of additional messages collected from Medhelp. After using a Python script to scrape the five most recent discussions from popular forums (https://www.medhelp.org/forums/list), we sampled 285 of their messages for annotation. Between April 2022 and the end of May 2022, each message was classified three times by anonymous raters into 11 categories. For each message, the rater picked the categories that applied to it and its emotional polarity (positive, neutral, or negative).

Our dataset is organized in two CSV files: CrowdsourcingClassification.csv contains the 855 (= 3 × 285) classifications collected via crowdsourcing, and FinalClassification.csv contains the 487 messages with their final consensus classifications.

The readMe file provides detailed information about the two .csv files.
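As an illustration of how three ratings per message might be reduced to a single label set, the sketch below applies a simple majority vote. This is an assumption: the description above does not say how the consensus in FinalClassification.csv was actually reached. The column names (`message_id`, `categories`, `polarity`), the `;` separator, and the category names in the sample rows are all hypothetical; the readMe file documents the real layout.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical column names; check the readMe file for the real ones.
ID_COL, CAT_COL, POL_COL = "message_id", "categories", "polarity"


def majority_vote(rows, min_votes=2):
    """Aggregate per-rater rows into one label set per message.

    A category is kept when at least `min_votes` raters picked it; the
    polarity is kept when at least `min_votes` raters agree, else None.
    """
    cat_votes = defaultdict(Counter)
    pol_votes = defaultdict(Counter)
    for row in rows:
        msg = row[ID_COL]
        # Categories are assumed to be ';'-separated within one cell.
        cats = {c.strip() for c in row[CAT_COL].split(";") if c.strip()}
        cat_votes[msg].update(cats)
        pol_votes[msg][row[POL_COL].strip()] += 1

    result = {}
    for msg in pol_votes:
        cats = sorted(c for c, n in cat_votes[msg].items() if n >= min_votes)
        polarity, count = pol_votes[msg].most_common(1)[0]
        result[msg] = {
            "categories": cats,
            "polarity": polarity if count >= min_votes else None,
        }
    return result


# Tiny in-memory stand-in for CrowdsourcingClassification.csv (made-up rows).
SAMPLE = """message_id,categories,polarity
1,Question;Symptoms,negative
1,Question,negative
1,Question;Treatment,neutral
2,Support,positive
2,Advice,neutral
2,Support;Advice,negative
"""

consensus = majority_vote(csv.DictReader(io.StringIO(SAMPLE)))
```

To run it on the real file, replace the `io.StringIO(SAMPLE)` argument with an opened CrowdsourcingClassification.csv and adjust the three column constants. Messages where the raters disagree on polarity come back with `None`, so they can be reviewed by hand rather than resolved arbitrarily.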

Additional Info

Field Value
Source Medhelp
Author João Abelha, Carla Teixeira Lopes
Last Updated August 28, 2023, 08:27 (UTC)
Created July 6, 2022, 12:55 (UTC)
Citation Abelha, J., Lopes, C. T. (2022). Classification of online health messages [Data set]. INESC TEC. https://doi.org/10.25747/DG9G-A217
DOI https://doi.org/10.25747/DG9G-A217
dc.Coverage.Temporal April - May 2022
dc.File.Size 476 KB
dc.Format .csv
dc.Relation Carla Teixeira Lopes and Bárbara Guimarães Da Silva. "A classification scheme for analyses of messages exchanged in online health forums." Proceedings of the Information Behaviour Conference (ISIC 2018). 2018.