Dataset from the second part of my Master's dissertation, "Avaliação da qualidade da Wikipédia enquanto fonte de informação em saúde" (Wikipedia quality assessment as a health information source), at FEUP, in 2021. It contains the data collected to assess the 1000 most viewed health-related articles listed by WikiProject Medicine on the English Wikipedia.

The MediaWiki API was used to collect the current state of each article’s contents and its metadata, revision history, language links, internal wiki links, and external links. Data not available through the API was obtained from the article’s markup. Besides the 7 metrics defined by Stvilia et al., four other proposed metrics and their respective features were assessed.
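
As an illustration of the collection step described above, here is a minimal sketch of how one article's contents and links could be requested from the MediaWiki API. It is not the scripts used to build this dataset: the function names, the User-Agent string, and the choice of properties are assumptions made for the example, and the sketch ignores API continuation, so long link lists would be truncated.

```python
import json
import urllib.parse
import urllib.request

# Public endpoint of the English Wikipedia MediaWiki API.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_query_params(title):
    """Build query parameters for one article (hypothetical helper)."""
    return {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "titles": title,
        # Current revision, language links, internal links, external links.
        "prop": "revisions|langlinks|links|extlinks",
        "rvprop": "content|timestamp",
        "rvslots": "main",
        "lllimit": "max",
        "pllimit": "max",
        "ellimit": "max",
    }


def fetch_page(title):
    """Send the request and return the decoded JSON response."""
    url = API_URL + "?" + urllib.parse.urlencode(build_query_params(title))
    # The User-Agent value is a placeholder chosen for this example.
    request = urllib.request.Request(
        url, headers={"User-Agent": "dataset-example/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarize_page(data):
    """Count a few simple features in a formatversion=2 response."""
    page = data["query"]["pages"][0]
    revisions = page.get("revisions", [])
    content = revisions[0]["slots"]["main"]["content"] if revisions else ""
    return {
        "title": page["title"],
        "content_length": len(content),
        "language_links": len(page.get("langlinks", [])),
        "internal_links": len(page.get("links", [])),
        "external_links": len(page.get("extlinks", [])),
    }
```

Calling `summarize_page(fetch_page("Asthma"))` would return a small dictionary of counts for that article. The features and metrics stored in this dataset are richer than these counts.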

This dataset can be used to analyze the quality, as well as other quantitative aspects, of health-related articles from the English Wikipedia.
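
As a starting point for such an analysis, the sketch below loads a CSV file and summarizes one numeric column using only the Python standard library. The file path and column name are left as parameters because the actual file and column names are not listed in this record, so any names passed in are the user's assumption. Non-numeric or missing cells are skipped.

```python
import csv
import statistics


def load_rows(path):
    """Read a CSV file into a list of dictionaries keyed by column header."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def column_summary(rows, column):
    """Summarize one numeric column, skipping cells that are not numbers."""
    values = []
    for row in rows:
        try:
            values.append(float(row[column]))
        except (KeyError, ValueError, TypeError):
            # Missing column, empty cell, or non-numeric text: ignore the row.
            continue
    if not values:
        return None
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }
```

For example, `column_summary(load_rows(path), column)` returns the count, mean, median, minimum, and maximum of the chosen column, or `None` if the column holds no numeric values.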

Source Wikipedia - Creative Commons Attribution-ShareAlike 3.0 Unported License
Author Luis Couto and Carla Teixeira Lopes
Last Updated September 13, 2021, 06:32 (Europe/Lisbon)
Created July 29, 2021, 17:07 (Europe/Lisbon)
CiteAs COUTO, L., LOPES, C.T., DOMINGUES, G. Wikipedia information quality assessment [dataset]. 29 July 2021. INESC TEC research data repository. DOI:
dc.Contributor Gil Domingues
dc.Coverage.Spatial World Wide Web; English Wikipedia.
dc.Coverage.Temporal 04.01.2021
dc.Date July, 2021
dc.Format *.csv
dc.Format.Extent 287 KB
dc.Language EN
dc.Publisher FEUP
dc.Relation Couto, L., & Lopes, C. T. (2021, April). Assessing the quality of health-related Wikipedia articles with generic and specific metrics. In Companion Proceedings of the Web Conference 2021 (pp. 640-647). DOI:
dc.Type Features and metrics results for Wikipedia health articles assessment
ddi.Software Excel or equivalent spreadsheet software
ddi.TypeInstrument Python scripts developed by Luis Couto and Gil Domingues; MediaWiki API