Wikipedia information quality assessment

Dataset from the second part of the Master's dissertation "Avaliação da qualidade da Wikipédia enquanto fonte de informação em saúde" (Wikipedia quality assessment as a health information source), at FEUP, in 2021. It contains the data collected to assess the 1000 most viewed health-related articles listed by WikiProject Medicine on English Wikipedia.

The MediaWiki API was used to collect the current state of each article’s contents and metadata, revision history, language links, internal wiki links, and external links. Data not available through the API was obtained from the article’s markup. In addition to the 7 metrics defined by Stvilia et al., four other proposed metrics and their respective features were assessed.
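As an illustration of the kind of request involved, the following minimal sketch builds a MediaWiki API query URL that asks for the latest revision metadata, language links, internal links, and external links of one article. It is not the authors' collection script: the function name `build_query_url`, the chosen `rvprop` fields, and the example title are assumptions made for this example.

```python
from urllib.parse import urlencode

# Public API endpoint of English Wikipedia.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_query_url(title, endpoint=API_ENDPOINT):
    """Build a MediaWiki API URL requesting several properties of one article.

    Hypothetical helper for illustration; not taken from the dataset's scripts.
    """
    params = {
        "action": "query",
        "format": "json",
        "titles": title,
        # revisions: revision metadata, langlinks: language links,
        # links: internal wiki links, extlinks: external links.
        "prop": "revisions|langlinks|links|extlinks",
        # Assumed selection of revision fields.
        "rvprop": "ids|timestamp|user|size",
    }
    return endpoint + "?" + urlencode(params)


# Example title chosen for illustration only.
url = build_query_url("Diabetes")
print(url)
```

The URL can then be fetched with any HTTP client. Long lists such as links are returned in batches, so a full collection would also need to follow the `continue` values in each response.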

This dataset can be used to analyze the quality, as well as other quantitative aspects, of health-related articles from English Wikipedia.
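A quick quantitative look at the CSV data might start as in the sketch below, which computes the mean of every numeric column using only the Python standard library. The column names and values in the sample are hypothetical placeholders, since the actual file layout is not described here.

```python
import csv
import io
from statistics import mean


def column_means(csv_file):
    """Return the mean of every column whose values all parse as numbers."""
    reader = csv.DictReader(csv_file)
    columns = {}
    for row in reader:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)

    means = {}
    for name, values in columns.items():
        try:
            means[name] = mean(float(v) for v in values)
        except (TypeError, ValueError):
            # Skip columns containing non-numeric or missing values.
            continue
    return means


# Hypothetical sample rows; the real column names may differ.
sample = io.StringIO(
    "title,completeness,currency\n"
    "Asthma,0.8,0.5\n"
    "Influenza,0.6,1.0\n"
)
result = column_means(sample)
print(result)
```

To run this on the dataset itself, pass an open file object, for example `open(path, newline="", encoding="utf-8")`, where `path` points to the downloaded CSV file.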

Additional Info

Field Value
Source Wikipedia - Creative Commons Attribution-ShareAlike 3.0 Unported License
Author Luis Couto and Carla Teixeira Lopes
Last Updated April 30, 2024, 12:45 (UTC)
Created July 29, 2021, 16:07 (UTC)
Citation COUTO, L., LOPES, C.T., DOMINGUES, G. Wikipedia information quality assessment [dataset]. 29 July 2021. INESC TEC research data repository. DOI: https://doi.org/10.25747/wfzk-h937
Contributor Gil Domingues
Coverage World Wide Web; English Wikipedia.
DOI https://doi.org/10.25747/wfzk-h937
Date of Creation July 2021
Format *.csv
Language EN
Relation Couto, L., & Lopes, C. T. (2021, April). Assessing the quality of health-related Wikipedia articles with generic and specific metrics. In Companion Proceedings of the Web Conference 2021 (pp. 640-647). DOI: https://doi.org/10.1145/3442442.3452355
Size 287 KB
Software Excel or equivalent spreadsheet software
Temporal Coverage 04.01.2021
Type Features and metrics results for Wikipedia health articles assessment
Type of Instrument Python scripts developed by Luis Couto and Gil Domingues; Wikimedia API