Wikipedia information quality comparison between idioms

Source code and dataset from the first part of my Master's dissertation, "Avaliação da qualidade da Wikipédia enquanto fonte de informação em saúde" (Wikipedia quality assessment as a health information source), at FEUP, in 2021. It contains the data collected to assess health-related Wikipedia articles, covering the 1000 most viewed articles on the English Wikipedia as listed by WikiProject Medicine. The following languages were assessed: English, Chinese, Hindi, Arabic, Bengali, French, Russian, Portuguese, Urdu, Indonesian, German, Japanese, Turkish, Persian, Korean, Italian, Greek, Hebrew, and Catalan. We selected the languages available on Wikipedia that have at least 100 million native or second-language speakers, and extended the collection with six other languages chosen for their cultural or medical importance.

First, all articles written in English were collected from the list mentioned above. Data for the articles in the other languages was then obtained by following the language links in each English article and collecting each linked article in turn, using the MediaWiki API.
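The link-following step can be sketched as below. This is a minimal illustration, not the scripts shipped with the dataset: the function names, the User-Agent string, and the choice of language codes are assumptions made for the example. It relies only on the MediaWiki API's `action=query` with `prop=langlinks`, and follows the API's `continue` mechanism when an article has more links than fit in one response.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"

# Assumed Wikipedia language codes for the 18 non-English languages assessed.
TARGET_LANGS = {
    "zh", "hi", "ar", "bn", "fr", "ru", "pt", "ur", "id",
    "de", "ja", "tr", "fa", "ko", "it", "el", "he", "ca",
}


def build_langlinks_url(title, cont=None):
    """Build a langlinks query URL for one English article title."""
    params = {
        "action": "query",
        "prop": "langlinks",
        "titles": title,
        "lllimit": "max",
        "format": "json",
        "formatversion": "2",
    }
    params.update(cont or {})  # continuation tokens from a previous response
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_langlinks(response, wanted=TARGET_LANGS):
    """Map language code -> article title for the wanted languages."""
    links = {}
    for page in response.get("query", {}).get("pages", []):
        for link in page.get("langlinks", []):
            if link["lang"] in wanted:
                links[link["lang"]] = link["title"]
    return links


def http_get_json(url):
    """Fetch a URL and decode the JSON body (hypothetical User-Agent)."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "langlinks-sketch/0.1 (example)"}
    )
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)


def fetch_langlinks(title, fetch_json=http_get_json):
    """Collect all wanted language links, following API continuation."""
    links = {}
    cont = None
    while True:
        response = fetch_json(build_langlinks_url(title, cont))
        links.update(extract_langlinks(response))
        cont = response.get("continue")
        if not cont:
            return links
```

Calling `fetch_langlinks("Asthma")` would return a dictionary of language codes and article titles, which could then be requested one by one from each language's own Wikipedia. The `fetch_json` parameter exists only so the HTTP call can be replaced, for example when testing offline.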

This dataset can be used to analyze the quality, as well as other quantitative aspects, of health-related Wikipedia articles in different languages.

Data and Resources

Additional Info

Field Value
Source Wikipedia - Creative Commons Attribution-ShareAlike 3.0 Unported License
Author Luis Couto and Carla Teixeira Lopes
Last Updated September 13, 2021, 06:29 (Europe/Lisbon)
Created July 29, 2021, 16:56 (Europe/Lisbon)
CiteAs COUTO, L., LOPES, C.T., DOMINGUES, G. Wikipedia information quality comparison between idioms [dataset]. 29 July 2021. INESC TEC research data repository. DOI: https://doi.org/10.25747/ep0v-en19
DOI https://doi.org/10.25747/ep0v-en19
dc.Contributor Gil Domingues
dc.Coverage.Spatial World Wide Web; Wikipedia, in the following languages: English, Chinese, Hindi, Arabic, Bengali, French, Russian, Portuguese, Urdu, Indonesian, German, Japanese, Turkish, Persian, Korean, Italian, Greek, Hebrew, and Catalan.
dc.Coverage.Temporal 04.01.2021
dc.Date July, 2021
dc.Format *.csv; *.py
dc.Format.Extent 2.9 MB
dc.Language EN
dc.Publisher FEUP
dc.Relation Couto, L., & Lopes, C. T. (2021, September). Equal opportunities in the access to quality online health information? A multi-lingual study on Wikipedia. OpenSym 2021. DOI: https://doi.org/10.1145/3479986.3480000; Avaliação da qualidade da Wikipédia enquanto fonte de informação em saúde - Thesis: https://sigarra.up.pt/feup/pt/pub_geral.show_file?pi_doc_id=306430
dc.Type Features and metrics results for Wikipedia health articles assessment
ddi.Software Excel or equivalent spreadsheet software
ddi.TypeInstrument Python scripts developed by Luis Couto and Gil Domingues; MediaWiki API