- Labadain-ZSRunS: Sparse and Zero-Shot Dense Retrieval Runs with...
  1. Overview: Labadain-ZSRunS is a dataset consisting of run files produced by classical sparse and zero-shot dense retrieval models, resulting from experiments on Tetun ad-hoc...
- Labadain-Stopwords: A Curated List of 160 Tetun Stopwords
  Labadain-Stopwords is a curated list of 160 Tetun stopwords, compiled from the Labadain-30k+ dataset and validated by native speakers. It is well-suited for various Tetun...
- Labadain-30k+: A Monolingual Tetun Document-Level Audited Dataset
  Labadain-30k+ is a monolingual Tetun dataset containing 33,550 documents spanning from June 2001 to September 2023, excluding the years 2004 and 2005, for which no documents are...
- Twitter profiles with related topics and websites
  This dataset contains two files created for the dissertation "A Social Media Tool for Domain-Specific Information Retrieval - A Case Study in Human Trafficking" by Tito Griné...
You can also access this record using the API (see the API documentation).