Source

URL: https://rdm.inesctec.pt/dataset/fc2d2154-2056-4681-8b42-226c65017a34/resource/b7e4e8c9-575d-4cda-9e2f-1df36eeb39b4/download/source.zip

Folder with the Python scripts used to collect and process the data, namely:

get_dataset_lang.py: Generates the dataset from Wikipedia's list of most popular health articles;

ApiInterface.py: Makes requests to the Wikimedia API. Inserts data into the database with the help of DatabaseInterface.py;

DatabaseInterface.py: Manages the connection to the database;

DatasetProcessor.py: Processes each article present in the dataset, starting from its title, using ApiInterface;

GetAdmins.py: Updates the admin users in the database from the admin list;

CheckBrokenArticles.py: Registers linked articles that do not exist, including those that are not in the dataset but appear in the database as links;

CleanWikitextContent.py: Writes the content of an article to a file in wikitext format. Removes the markup and writes the plain text to another file. Calculates the Flesch and Kincaid readability values, as well as the length of the article;

ParseImagesFromWikitext.py: Gets the number of assets that are images, using wikitext content;

CSVGen.py: Calculates the metrics and writes them to a CSVGen.csv file;

VolatilityCalc.py: Calculates the volatility of the items and writes the results to a volatility.csv file;

GetWpAdmins.py: Registers users who are on the WikiProject Medicine admin list;

GetTranslated.py: Registers the articles that are present in the list of articles translated by the Health Translation TaskForce;

GetSections.py: Registers the sections of articles that are part of the list of sections recommended by WikiProject Medicine;

GetReputatedLinks.py: Registers the links that are on the list of links suggested by the National Institutes of Health;

GatherMedicineTemplates.py: Creates a file with all relevant medicine templates;

GenerateMedicineMetrics.py: Creates a file with all relevant medicine templates. Implemented for all languages, although only English was used/tested;

ParseTemplatesFromContent.py: Loads the medicine templates into the database. Developed to support languages other than English, but only English has been used/tested;

ParseMedicalInfoboxValues.py: Inserts the medical infobox values into the database;

PutInfoboxInDatabase.py: Places all relevant infoboxes in the database;

StripInfoFromContent.py: Gets the number of tables in the article (not used/tested). Inserts the medical indices into the database;

CategoryUpdate.py: Inserts the categories of the analyzed articles into the database and creates the links between articles and categories;

GetAssess.py: Inserts the classification of articles according to the WikiProject Medicine review lists;

GenerateMedicineMetrics.py: Calculates the specific proposed metrics and measures, and writes the results to a medicine_metrics.csv file.
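To illustrate the kind of readability calculation described for CleanWikitextContent.py, here is a minimal sketch. It is not taken from source.zip: the function names count_syllables and readability are hypothetical, the vowel-group syllable counter is a crude assumption, and the formulas are the standard Flesch Reading Ease and Flesch-Kincaid Grade Level, which may differ from what the script actually computes.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic (an assumption): one syllable per run of vowels, minimum one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def readability(text: str) -> dict:
    """Return the text length and two standard readability scores for plain text."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)

    return {
        "length": len(text),
        # Standard Flesch Reading Ease formula.
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syllables_per_word,
        # Standard Flesch-Kincaid Grade Level formula.
        "flesch_kincaid_grade": 0.39 * words_per_sentence
        + 11.8 * syllables_per_word
        - 15.59,
    }


sample = "The cat sat on the mat. It was happy."
scores = readability(sample)
```

Higher Flesch Reading Ease values indicate easier text, while the Flesch-Kincaid value approximates a school grade level. A dedicated syllable counter would be more accurate than the heuristic used here.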


Additional Information

Last updated: September 13, 2021
Created: September 13, 2021
Format: ZIP
License: Creative Commons Attribution Share-Alike