-
Wikipedia and Simple Wikipedia Lead Section Pairs for Nine Categories
The dataset (categorized_dataset folder) contains 9 files in .csv format, each a collection of 10,000 lead section pairs sourced from Wikipedia (https://www.wikipedia.org/) and... -
Metadata and Analysis of Clinical Information Extraction Publications Using...
This dataset contains all the data collected on all the papers analyzed in our publication, entitled "Harnessing Large Language Models for Clinical Information Extraction: A... -
Content Analysis of Publications in Experimental domains
This dataset supports the proposal of manual content analysis as an approach to streamline the data curator workflow. We have performed manual context analysis over publications... -
Gamma radiation monitoring at the Azores ENA-ARM station (Graciosa Island)
Gamma-ray total counts in counts/minute (cpm), every 15 minutes, from the Gamma Radiation Monitoring campaign at the ENA site (Graciosa, Azores). Data were collected to study... -
ISAD(G) Descriptions of Archival Records With Entity Annotation
This dataset contains long text ISAD(G) fields from records from the Arquivo Nacional da Torre do Tombo annotated with entities. It was built to evaluate the effectiveness of... -
Manual Transcriptions of Typewritten Digital Representations of Portuguese...
The dataset includes manual transcriptions of typewritten digital representations of Portuguese cultural heritage documents from the 20th century, extracted from the Arquivo... -
Matrix profile analysis of Dansgaard-Oeschger events in palaeoclimate time series
This dataset includes all the datafiles and computational notebooks required to reproduce the work reported in the paper “Characterisation of Dansgaard-Oeschger events in... -
Analysis of baptism, marriage and death registers belonging to the District...
The datasets contain description units related to baptisms, marriages and deaths from the District Archive of Guarda; baptisms, marriages and deaths from the District... -
Automatic Quality Assessment of Wikipedia Articles - A Systematic Literature...
This is the result dataset related to the article entitled "Automatic Quality Assessment of Wikipedia Articles - A Systematic Literature Review", which is a systematic... -
Radon data from ENVRIplus TNA campaign RELECT at SMEAR II – HYYTIÄLÄ...
Radon concentration measurements (in Bq/m3) every 2 hours. -
Labadain-30k+: A Monolingual Tetun Document-Level Audited Dataset
Labadain-30k+ is a monolingual Tetun dataset containing 33,550 documents spanning from June 2001 to September 2023, excluding the years 2004 and 2005, for which no documents are... -
Images annotated according to their content: a study on the description of...
Data description is a fundamental step in Research Data Management (RDM). When it comes to images, the challenge is increased, as they have characteristics that differentiate... -
Evolution of Web search engine interfaces through SERP screenshots and HTML...
This dataset was extracted for a study on the evolution of Web search engine interfaces since their appearance. The well-known list of “10 blue links” has evolved into richer... -
Classification of online health messages
The dataset has 487 annotated messages taken from Medhelp, an online health forum with several health communities... -
Radon concentration (Bq.m-3) from INESC TEC station (Porto). Updated monthly.
The dataset consists of measurements, taken every 6 hours, of radon concentration on the roof of the INESC TEC main building. This dataset has a Jupyter notebook for visualization (for better... -
Meteorological data from LFC/FEUP station
The dataset consists of measurements taken every 5 minutes at the LFC/FEUP meteorological station, including air temperature (in degrees C), relative humidity (%), atmospheric... -
Atmospheric electric field from INESC TEC station (Porto)
The dataset consists of 1-min measurements of the atmospheric electric field by a CS110 field mill installed on the roof of the INESC TEC main building. This dataset has Jupyter... -
Gamma radiation from INESC TEC station (Porto)
The dataset consists of measurements of the total number of gamma rays counted by a NaI(Tl) scintillator on the roof of the INESC TEC main building. This dataset has Jupyter... -
Simple English Wikipedia Link Graph with Clickstream Transitions 2018-12
The Simple English Wikipedia Link Graph with Clickstream Transitions is a gzipped GML file representing the hyperlink graph of the Simple English Wikipedia. It was prepared... -
Music streaming and playlisting activity from music streaming service
This dataset is taken from the Portuguese music social network "Palco Principal", which brings together non-mainstream musicians and fans. The website allows free music streaming and...