-
Wikipedia and Simple Wikipedia Lead Section Pairs for Nine Categories
The dataset (categorized_dataset folder) contains 9 files in .csv format, each a collection of 10,000 lead section pairs sourced from Wikipedia (https://www.wikipedia.org/) and... -
Metadata and Analysis of Clinical Information Extraction Publications Using...
This dataset contains all the data collected on all the papers analyzed in our publication, entitled "Harnessing Large Language Models for Clinical Information Extraction: A... -
Content Analysis of Publications in Experimental domains
This dataset support the proposal of manual content analysis as an approach to streamline the data curator workflow. We have performed manual context analysis over publications... -
Matrix profile analysis of Dansgaard-Oeschger events in palaeoclimate time series
This dataset includes all the datafiles and computational notebooks required to reproduce the work reported in the paper “Characterisation of Dansgaard-Oeschger events in... -
Geographically distributed solar power time series
This dataset contains electrical energy hourly time series from 44 small-PV (households) units located in the same region, with installed capacity ranging between 1.1 and 3.7... -
Classification of online health messages
Classification of online health messages The dataset has 487 annotated messages taken from Medhelp, an online health forum with several health communities... -
Solar power forecasting: measurements and numerical weather predictions
This dataset contains hourly power measurements from a 16.32 kW peak PV power plant located on the North of Portugal (Smart Grid and Electric Vehicle Lab – SGEVL, at INESC TEC),... -
Hate speech dataset annotated for Portuguese
Portuguese Hate Speech Twitter Dataset is a dataset of Twitter messages manually annotated for Hate Speech using a hierarchical structure of classes. 5,668 messages were...