Welcome to the INESC TEC research data repository.
This data repository showcases datasets produced or used by INESC TEC researchers and their partners. It is an embodiment of our institutional commitment to Open Data in research.

CS: Computer Science
The Computer Science Cluster's mission is to contribute to the understanding of computing, to the rigorous development...
Semantic representation of the Registos de Baptismos da Paróquia de Aldoar...
This dataset comprises mappings of archival records from the National Archives of Portugal to the RiC-O (Records in Contexts Ontology) framework, namely the baptism registries...
Labadain-30k+: A Monolingual Tetun Document-Level Audited Dataset
Labadain-30k+ is a monolingual Tetun dataset containing 33,550 documents spanning from June 2001 to September 2023, excluding the years 2004 and 2005, for which no documents are...

INESC TEC
The Institute for Systems and Computer Engineering, Technology and Science – INESC TEC is an Associate Laboratory...
Labadain-Stopwords: A Curated List of 160 Tetun Stopwords
Labadain-Stopwords is a curated list of 160 Tetun stopwords, compiled from the Labadain-30k+ dataset and validated by native speakers. It is well-suited for various Tetun...
Labadain-Avaliadór: A Test Collection for Tetun Ad-hoc Text Retrieval
The Labadain-Avaliadór dataset is a test collection developed for the ad-hoc retrieval task. It comprises 59 topics, 33,550 documents, and 5,900 query-document relevance...