-
Wikipedia information quality comparison between idioms
Source code and dataset from the first part of my master's dissertation, "Avaliação da qualidade da Wikipédia enquanto fonte de informação em saúde" (Wikipedia quality assessment... -
Research data management in image format - Survey
These datasets were the result of research on research data management in image format. Based on the data collected, it was possible to study the practices and habits in the... -
Meteorological data from LFC/FEUP station
The dataset consists of measurements taken every 5 minutes at the LFC/FEUP meteorological station, including air temperature (in degrees C), relative humidity (%), atmospheric... -
Atmospheric electric field from INESC TEC station (Porto)
The dataset consists of 1-min measurements of the atmospheric electric field by a CS110 field mill installed on the roof of the INESC TEC main building. This dataset has Jupyter... -
Gamma radiation from INESC TEC station (Porto)
The dataset consists of measurements of the total number of gamma rays counted by a NaI(Tl) scintillator on the roof of the INESC TEC main building. This dataset has Jupyter... -
SIGARRA News Corpus
This dataset was taken from the SIGARRA information system at the University of Porto (UP). Every organic unit has its own domain and produces academic news. We collected a... -
Simple English Wikipedia Link Graph with Clickstream Transitions 2018-12
The Simple English Wikipedia Link Graph with Clickstream Transitions is a gzipped GML file representing the hyperlink graph of the Simple English Wikipedia. It was prepared... -
Music streaming and playlisting activity from music streaming service
This dataset is taken from the Portuguese music social network "Palco Principal", which brings together non-mainstream musicians and their fans. The website allows free music streaming and... -
Gamma radiation data from ENVRIplus TNA campaign RELECT at SMEAR II –...
Gamma radiation measurements in counts/minute (cpm) every 5 minutes. -
UrbanSense environmental monitoring
The UrbanSense project is the environmental monitoring part of the Smart City initiative in the city of Porto, Portugal. This dataset contains observational data collected at 23... -
Electric field data from ENVRIplus TNA campaign RELECT at SMEAR II –...
Vertical electric field measurements (in V/m), 1-min averages. -
Hate speech dataset annotated for Portuguese
The Portuguese Hate Speech Twitter Dataset is a collection of Twitter messages manually annotated for hate speech using a hierarchical structure of classes. 5,668 messages were... -
HAREM NER Models for OpenNLP, Stanford CoreNLP, spaCy, NLTK
Pre-trained models for named entity recognition in Portuguese, using the categories, types and subtypes of the Second HAREM dataset as entity classes. -
SIGARRA News Corpus NER Models for OpenNLP, Stanford CoreNLP, spaCy, NLTK
Pre-trained models for named entity recognition in Portuguese, using the following entity classes: Hora (Hour), Evento (Event), Organizacao (Organization), Curso (Course),...