- LabadainLog-17k+: Search Logs from Tetun-Speaking Users Across Chat, Web,...
1. Overview: LabadainLog-17k+ is a dataset of interaction logs in Tetun, collected from three different platforms: Labadain Chat (16,952 prompts): An LLM-powered conversational...
- Labadain-Stopwords: A Curated List of 160 Tetun Stopwords
Labadain-Stopwords is a curated list of 160 Tetun stopwords, compiled from the Labadain-30k+ dataset and validated by native speakers. It is well-suited for various Tetun...
- Labadain-30k+: A Monolingual Tetun Document-Level Audited Dataset
Labadain-30k+ is a monolingual Tetun dataset containing 33,550 documents spanning from June 2001 to September 2023, excluding the years 2004 and 2005, for which no documents are...
You can also access this catalog using the API (see the API documentation).