Dataset - CKAN

Interface Element Frequencies in Search Engine Results Pages (SERPs) Across...

This dataset contains the data produced for the dissertation "User Interface Variations in Search Engine Results Pages Across Types of Search Queries and Search Engines". The...
- CSV
- HTML
Dataset of synthetic clinical notes in European Portuguese generated using...

This dataset was generated using an open-source large language model and carefully curated prompts, simulating realistic clinical narratives while ensuring no real patient data...
- TXT
- ZIP
- CSV
Wikipedia and Simple Wikipedia Lead Section Pairs for Nine Categories

The dataset (categorized_dataset folder) contains 9 files in .csv format, each a collection of 10,000 lead section pairs sourced from Wikipedia (https://www.wikipedia.org/) and...
- CSV
- TXT
- ZIP
Metadata and Analysis of Clinical Information Extraction Publications Using...

This dataset contains all the data collected on all the papers analyzed in our publication, entitled "Harnessing Large Language Models for Clinical Information Extraction: A...
- XLSX
- CSV
- TXT
- ZIP
Content Analysis of Publications in Experimental domains

This dataset supports the proposal of manual content analysis as an approach to streamline the data curator workflow. We have performed manual content analysis over publications...
- CSV
- TXT
Typewritten Digital Representations of Portuguese Cultural Heritage...

The dataset has typewritten Portuguese documents extracted from the Arquivo Nacional da Torre do Tombo (https://digitarq.arquivos.pt/). It includes records from two fonds of the...
- ZIP
- CSV
Manual Transcriptions of Typewritten Digital Representations of Portuguese...

The dataset includes manual transcriptions of typewritten digital representations of Portuguese cultural heritage documents from the 20th century, extracted from the Arquivo...
- ZIP
- CSV
- TXT
Immersive Learning Thematic Network Data

Information for a database where practices, strategies and uses of Immersive Learning are connected with works in the field. These Practices, Strategies and Uses were found...
- XLSX
- CSV
- ZIP
Matrix profile analysis of Dansgaard-Oeschger events in palaeoclimate time series

This dataset includes all the datafiles and computational notebooks required to reproduce the work reported in the paper “Characterisation of Dansgaard-Oeschger events in...
- TXT
- R
- IPYNB
- CSV
- ZIP
ArchOnto ontology representation of Portuguese archival description units...

The content of the datasets includes an excerpt of the ArchOnto ontology representation of the DigitArq baptism records from Bragança District Archive and passports. These...
- OWL
- RDF
- CSV
Twitter profiles with related topics and websites

This dataset contains two files created for the dissertation "A Social Media Tool for Domain-Specific Information Retrieval - A Case Study in Human Trafficking" by Tito Griné...
- CSV
Wikipedia information quality assessment

Dataset from the second part of the Master Dissertation - "Avaliação da qualidade da Wikipédia enquanto fonte de informação em saúde" (Wikipedia quality assessment as health...
- CSV
Automatic Quality Assessment of Wikipedia Articles - A Systematic Literature...

This is the result dataset related to the article entitled "Automatic Quality Assessment of Wikipedia Articles - A Systematic Literature Review", which is a systematic...
- XLSX
- CSV
- PDF
- ZIP
- TXT
Classification of online health messages

The dataset has 487 annotated messages taken from Medhelp, an online health forum with several health communities...
- TXT
- CSV
Wikipedia information quality comparison between idioms

Source code and dataset from the first part of my Master Dissertation - "Avaliação da qualidade da Wikipédia enquanto fonte de informação em saúde" (Wikipedia quality assessment...
- ZIP
- CSV
SIGARRA News Corpus

This dataset was taken from the SIGARRA information system at the University of Porto (UP). Every organic unit has its own domain and produces academic news. We collected a...
- CSV
- ZIP
- XML
Hate speech dataset annotated for Portuguese

Portuguese Hate Speech Twitter Dataset is a dataset of Twitter messages manually annotated for Hate Speech using a hierarchical structure of classes. 5,668 messages were...
- CSV
- TXT

You can also access the catalogue via the API (see the API documentation).

17 datasets found
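
A listing like the one above can usually be retrieved programmatically through the CKAN Action API. The following is a minimal sketch, assuming the catalogue exposes the standard `package_search` action under `/api/3/action/`. The base URL `https://ckan.example.org` is a placeholder, not the address of this catalogue, and the helper names `build_search_url`, `summarize_search` and `fetch_search` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder base URL (assumption): replace with the real catalogue address.
BASE_URL = "https://ckan.example.org"


def build_search_url(base_url, query="", rows=20, start=0):
    """Build a URL for CKAN's package_search action."""
    params = urlencode({"q": query, "rows": rows, "start": start})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def summarize_search(payload):
    """Return (total count, [(title, sorted resource formats), ...])."""
    if not payload.get("success"):
        raise ValueError("CKAN reported a failed request")
    result = payload["result"]
    datasets = []
    for pkg in result["results"]:
        # Collect the distinct resource formats of each dataset, e.g. CSV, ZIP.
        formats = sorted(
            {r["format"].upper() for r in pkg.get("resources", []) if r.get("format")}
        )
        datasets.append((pkg.get("title") or pkg.get("name"), formats))
    return result["count"], datasets


def fetch_search(base_url=BASE_URL, query="", rows=20):
    """Query the catalogue over HTTP and summarize the response."""
    with urlopen(build_search_url(base_url, query, rows)) as response:
        return summarize_search(json.load(response))
```

Calling `fetch_search(query="wikipedia")` against a real CKAN instance should return the number of matching datasets together with each title and its resource formats, which mirrors the title and format lines shown in this listing.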