Source
Folder with Python scripts used to collect and process data, namely:
get_dataset_lang.py: Generates the dataset from Wikipedia's list of most popular health articles;
ApiInterface.py: Makes requests to the Wikimedia API. Inserts data into the database with the help of DatabaseInterface.py;
DatabaseInterface.py: Manages the connection to the database;
DatasetProcessor.py: Processes each article present in the dataset, starting from its title, using ApiInterface;
GetAdmins.py: Updates admin users in the database from the admin list;
CheckBrokenArticles.py: Registers articles that do not exist, based on links. This includes articles that are not in the dataset but appear in the database as link targets;
CleanWikitextContent.py: Writes the content of an article to a file in wikitext format. Removes the markup and writes the plain text to another file. Calculates the Flesch and Kincaid values (readability), as well as the length of the article;
ParseImagesFromWikitext.py: Gets the number of assets that are images, using wikitext content;
CSVGen.py: Calculates the metrics and writes them to a CSVGen.csv file;
VolatilityCalc.py: Calculates the volatility of the items and writes the results to a volatility.csv file;
GetWpAdmins.py: Registers users who are on the WikiProject Medicine admin list;
GetTranslated.py: Registers the articles that are present in the list of articles translated by the Health Translation TaskForce;
GetSections.py: Registers the sections of articles that are part of the list of sections recommended by WikiProject Medicine;
GetReputatedLinks.py: Registers the links that are on the list of links suggested by the National Institutes of Health;
GatherMedicineTemplates.py: Creates a file with all relevant medicine templates;
GenerateMedicineMetrics.py: Creates a file with all relevant medicine templates. Implemented for all languages, although only English was used/tested;
ParseTemplatesFromContent.py: Loads the medicine templates into the database. Developed to support languages other than English as well, but only English has been used/tested;
ParseMedicalInfoboxValues.py: Inserts the medical infobox values into the database;
PutInfoboxInDatabase.py: Places all relevant infoboxes in the database;
StripInfoFromContent.py: Gets the number of tables in the article (not used/tested). Inserts the medical indices into the database;
CategoryUpdate.py: Inserts the categories of the analyzed articles into the database and creates links between articles and categories;
GetAssess.py: Inserts the classification of articles according to the WikiProject Medicine review lists;
GenerateMedicineMetrics.py: Calculates the specific proposed metrics and measures, and writes the results to a medicine_metrics.csv file.
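The readability step listed for CleanWikitextContent.py can be illustrated with a minimal sketch. It assumes that the "Flesch and Kincaid values" are the standard Flesch Reading Ease and Flesch-Kincaid Grade Level formulas, and that the input is already plain text with the wikitext markup removed. The function names `count_syllables` and `readability` are hypothetical, and the syllable counter is a rough vowel-group heuristic; none of this is taken from the actual script.

```python
import re


def count_syllables(word: str) -> int:
    """Estimate syllables by counting vowel groups (heuristic, not a dictionary lookup)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing "e" as silent, except in "-le" endings such as "article".
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Return article length and two readability scores for plain English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return {
        # Length in characters; the original script may measure length differently.
        "length": len(text),
        # Standard Flesch Reading Ease formula (higher means easier to read).
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syllables_per_word,
        # Standard Flesch-Kincaid Grade Level formula (approximate US school grade).
        "flesch_kincaid_grade": 0.39 * words_per_sentence
        + 11.8 * syllables_per_word
        - 15.59,
    }


if __name__ == "__main__":
    scores = readability("The cat sat on the mat. It was happy.")
    for name, value in scores.items():
        print(name, value)
```

Both formulas are calibrated for English, which matches the note above that only English was used and tested. A production version would likely replace the heuristic syllable counter with a dictionary-based one.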
Additional information
Field | Value |
---|---|
Data last updated | September 13, 2021 |
Metadata last updated | September 13, 2021 |
Created | September 13, 2021 |
Format | ZIP |
License | Creative Commons Attribution Share-Alike |
Datastore active | False |
Has views | False |
Id | b7e4e8c9-575d-4cda-9e2f-1df36eeb39b4 |
Package id | fc2d2154-2056-4681-8b42-226c65017a34 |
Position | 0 |
State | active |
Url type | upload |