Multi-Language Datasets
URL: https://rdm.inesctec.pt/dataset/24f17b48-304f-4c07-8f5d-2c9b62e25730/resource/14dac21d-c8d9-4972-a8e2-5b306f8e64a7/download/multi-language-datasets.zip
These datasets were designed for assessing and comparing our model's performance across different Wikipedia versions. We used MediaWiki's API to obtain random Wikipedia articles of any quality, so they are extremely unbalanced. They have the same structure as "Default Dataset", but without the Network features. Quality values were obtained from the respective Wikipedia's content quality scale.
- multi-en-11096-csrh.csv: English dataset, contains 11096 articles.
- multi-en-11195-csrh.csv: French dataset, contains 11195 articles.
- multi-en-10525-csrh.csv: Portuguese dataset, contains 10525 articles.
- multi-en-10341-csrh.csv: Russian dataset, contains 10341 articles.
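Because the articles were sampled at random, it can help to check how unbalanced the quality classes are before training or evaluating on one of these files. The following is a minimal sketch, not part of the dataset's own tooling: the function name `label_distribution` is hypothetical, and the name of the quality column is an assumption, since the description does not list the CSV headers. It also assumes the CSV sits at the top level of the archive and is UTF-8 encoded.

```python
import csv
import io
import zipfile
from collections import Counter


def label_distribution(zip_source, member, label_column):
    """Count how often each value of `label_column` occurs in one CSV of the archive.

    zip_source   -- path to the downloaded ZIP, or a binary file-like object
    member       -- name of the CSV inside the archive
    label_column -- header of the quality column (assumed; check the real header)
    """
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(member) as raw:
            # Decode the binary stream so csv can parse it row by row.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.DictReader(text)
            if label_column not in (reader.fieldnames or []):
                raise KeyError(f"{label_column!r} not in {reader.fieldnames}")
            return Counter(row[label_column] for row in reader)


# Example call (the column name "quality" is a placeholder, not confirmed):
# counts = label_distribution("multi-language-datasets.zip",
#                             "multi-en-11096-csrh.csv", "quality")
# for label, n in counts.most_common():
#     print(label, n)
```

If the lookup fails with a `KeyError`, the message lists the headers actually present in the file, which is a quick way to find the real column name.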
Additional Information
Field | Value |
---|---|
Data last updated | 27 June 2022 |
Metadata last updated | 27 May 2024 |
Created | 27 June 2022 |
Format | ZIP |
License | Creative Commons Attribution |
Datastore active | False |
Has views | False |
Id | 14dac21d-c8d9-4972-a8e2-5b306f8e64a7 |
Package id | 24f17b48-304f-4c07-8f5d-2c9b62e25730 |
Position | 6 |
State | active |
Url type | upload |