Labadain-ZSRunS: Sparse and Zero-Shot Dense Retrieval Runs with...
Labadain-ZSRunS is a dataset of run files produced by classical sparse and zero-shot dense retrieval models, resulting from experiments on Tetun ad-hoc...
LabadainLog-17k+: Search Logs from Tetun-Speaking Users Across Chat, Web,...
LabadainLog-17k+ is a dataset of interaction logs in Tetun, collected from three different platforms: Labadain Chat (16,952 prompts), an LLM-powered conversational...
Labadain-Stopwords: A Curated List of 160 Tetun Stopwords
Labadain-Stopwords is a curated list of 160 Tetun stopwords, compiled from the Labadain-30k+ dataset and validated by native speakers. It is well-suited for various Tetun...
Labadain-Avaliadór: A Test Collection for Tetun Ad-hoc Text Retrieval
The Labadain-Avaliadór dataset is a test collection developed for the ad-hoc retrieval task. It comprises 59 topics, 33,550 documents, and 5,900 query-document relevance...
Labadain-30k+: A Monolingual Tetun Document-Level Audited Dataset
Labadain-30k+ is a monolingual Tetun dataset containing 33,550 documents spanning from June 2001 to September 2023, excluding the years 2004 and 2005, for which no documents are...
You can also access this registry via the API (see the API documentation).