Text2Story Lusa

The Text2Story Lusa dataset contains 357 news articles published in European Portuguese by the Lusa news agency, most of them between October and December 2020. Each article is provided as a plain-text (.txt) file containing the publication date, location, headline, and content. A JSON file containing all the news articles is also included. This dataset was initially developed in the context of the project "Text2Story: Extracting journalistic narratives from text and representing them in a narrative modeling language" / NORTE-01-0145-FEDER-03185.
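As a rough illustration of how the bundled JSON file might be read once access has been granted, here is a minimal Python sketch. The file name `text2story_lusa.json` and the key `date` are hypothetical: this page lists the fields (publication date, location, headline, content) but not the JSON keys or the top-level layout, so the code accepts either a list of records or a mapping from identifiers to records. The monthly count also assumes ISO-like date strings ("YYYY-MM-DD"), which should be checked against the actual file.

```python
import json
from collections import Counter
from pathlib import Path

# Hypothetical key name: the dataset page lists the fields,
# but not how they are spelled in the JSON file.
DATE_KEY = "date"


def parse_articles(raw_json: str) -> list[dict]:
    """Parse the JSON text into a list of article records."""
    data = json.loads(raw_json)
    # Accept either a list of records or a mapping of id -> record.
    records = list(data.values()) if isinstance(data, dict) else list(data)
    # Skip anything that is not a record (for example, stray metadata values).
    return [record for record in records if isinstance(record, dict)]


def load_articles(path: str) -> list[dict]:
    """Read a UTF-8 JSON file and return its article records."""
    return parse_articles(Path(path).read_text(encoding="utf-8"))


def count_by_month(articles: list[dict], date_key: str = DATE_KEY) -> dict:
    """Count articles per month, assuming 'YYYY-MM-DD...' date strings."""
    counts = Counter()
    for article in articles:
        value = str(article.get(date_key, ""))
        # Dates that are missing or too short are grouped under "unknown".
        month = value[:7] if len(value) >= 7 else "unknown"
        counts[month] += 1
    return dict(counts)
```

With the real file, a call such as `count_by_month(load_articles("text2story_lusa.json"))` (file name assumed) would give a quick view of how the articles are distributed over the months covered by the dataset.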

To request access to this dataset, please fill out the request form (Text2Story Lusa - Request Form) and send it to joao.a.castro@inesctec.pt.

If you use this resource, please cite both the paper and the dataset:

Nunes, S., Jorge, A., Amorim, E., Sousa, H., Leal, A., Silvano, P., Cantante, I., & Campos, R. (2024). Text2Story Lusa: A Dataset for Narrative Analysis in European Portuguese News Articles. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024).

Nunes, S., Jorge, A., Leal, A., Amorim, E., Sousa, H., Cantante, I., Silvano, P., & Campos, R. (2023). Text2Story Lusa [Data set]. INESC TEC. https://doi.org/10.25747/ET95-BX90

Data and Resources

Additional Info

Field              Value
Author             Sérgio Nunes, Alípio Jorge, António Leal, Evelin Amorim, Hugo Sousa, Inês Cantante, Purificação Silvano, Ricardo Campos
Last Updated       April 29, 2024, 13:22 (UTC)
Created            May 16, 2023, 10:32 (UTC)
Citation           Nunes, S., Jorge, A., Leal, A., Amorim, E., Sousa, H., Cantante, I., Silvano, P., & Campos, R. (2023). Text2Story Lusa [Data set]. INESC TEC. https://doi.org/10.25747/ET95-BX90
Contributor        Lusa, Agência de Notícias de Portugal, S.A.
Creation Date      2023-01-07
DOI                https://doi.org/10.25747/et95-bx90
File Size          400 KB
Format             json
Language           European Portuguese
Temporal Coverage  2020 - 2021
Type               News articles in text format