INESC TEC - Organizations

LabadainLog-17k+: Search Logs from Tetun-Speaking Users Across Chat, Web,...

LabadainLog-17k+ is a dataset of interaction logs in Tetun, collected from three different platforms: Labadain Chat (16,952 prompts): An LLM-powered conversational...

Labadain-ZSRunS: Sparse and Zero-Shot Dense Retrieval Runs with...

Labadain-ZSRunS is a dataset of run files produced by classical sparse and zero-shot dense retrieval models, resulting from the experiments on Tetun ad-hoc...

ZIP

Labadain-Stopwords: A Curated List of 160 Tetun Stopwords

Labadain-Stopwords is a curated list of 160 Tetun stopwords, compiled from the Labadain-30k+ dataset and validated by native speakers. It is well-suited for various Tetun...

TXT
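
As a rough illustration of how a stopword list like this might be applied, the sketch below filters stopwords out of a tokenized sentence. It assumes the TXT file holds one stopword per line, which the listing does not confirm; the function names `load_stopwords` and `remove_stopwords` are hypothetical, and the sample entries are placeholders rather than words copied from the actual list.

```python
from typing import Iterable, List, Set


def load_stopwords(lines: Iterable[str]) -> Set[str]:
    """Build a lowercase stopword set from lines, skipping blank lines."""
    return {line.strip().lower() for line in lines if line.strip()}


def remove_stopwords(tokens: Iterable[str], stopwords: Set[str]) -> List[str]:
    """Keep only the tokens whose lowercase form is not a stopword."""
    return [tok for tok in tokens if tok.lower() not in stopwords]


# Placeholder entries for illustration only; not copied from the actual list.
# Assumed layout: one stopword per line.
sample_lines = ["iha\n", "no\n", "\n"]
stopwords = load_stopwords(sample_lines)

# Whitespace tokenization is a simplification; real text needs proper tokenizing.
filtered = remove_stopwords("Nia hela iha Dili no Baucau".split(), stopwords)
```

With a downloaded copy of the list, the same function could take an open file handle instead of `sample_lines`, for example `load_stopwords(open(path, encoding="utf-8"))`, where `path` is wherever the file was saved.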

Labadain-Avaliadór: A Test Collection for Tetun Ad-hoc Text Retrieval

The Labadain-Avaliadór dataset is a test collection developed for the ad-hoc retrieval task. It comprises 59 topics, 33,550 documents, and 5,900 query-document relevance...

Labadain-30k+: A Monolingual Tetun Document-Level Audited Dataset

Labadain-30k+ is a monolingual Tetun dataset containing 33,550 documents spanning from June 2001 to September 2023, excluding the years 2004 and 2005, for which no documents are...

TXT
PYTHON

5 datasets found
