Manual Transcriptions of Typewritten Digital Representations of Portuguese Cultural Heritage Documents from the 20th Century

The dataset includes manual transcriptions of typewritten digital representations of Portuguese cultural heritage documents from the 20th century, extracted from the Arquivo Nacional da Torre do Tombo (ANTT) (https://digitarq.arquivos.pt/). It has records from two fonds: the General Administration of National Treasury (DGFP) and the National Secretariat of Information (SNI). The manual transcriptions will serve as the reference text in the text recognition evaluation process, which supports the creation of tools that facilitate the work of cultural heritage professionals by automatically extracting and describing specific objects. The digital representations were drawn from a previously developed dataset (https://rdm.inesctec.pt/dataset/cs-2022-004). We selected 708 of its 12,041 typewritten digital representations, covering 5 document types: letters, structured reports, non-structured reports, process covers, and theatre play covers. The dataset has 162 letters, 328 structured reports, 98 non-structured reports, 60 process covers, and 60 theatre play covers.
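The description above does not state which metric the text recognition evaluation uses. As an illustration only, the sketch below assumes character error rate (CER), a common choice for comparing recognizer output against a manual transcription. The function names and the sample strings are hypothetical and are not part of the dataset.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    # Keep the shorter string in the inner loop to limit memory use.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the length of the reference transcription."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical strings: a manual transcription and a recognizer's output.
manual = "Secretariado Nacional"
recognized = "Secretariado Nacionel"
print(f"CER: {character_error_rate(manual, recognized):.3f}")
```

Under this assumption, a lower value means the recognizer output is closer to the manual transcription, and 0.0 means the two strings are identical.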

Additional Info

Field Value
Source https://calenda.org/236139?file=1
Author Margarida Falcão
Last Updated June 8, 2022, 14:04 (Europe/Lisbon)
Created June 8, 2022, 13:55 (Europe/Lisbon)
Citation Falcão, M. (2022). Manual Transcriptions of Typewritten Digital Representations of Portuguese Cultural Heritage Documents from the 20th Century [Data set]. INESC TEC. https://doi.org/10.25747/WPNA-JE39
DOI https://doi.org/10.25747/wpna-je39
dc.Contributor EPISA Project team
dc.Coverage.Spatial Arquivo Nacional da Torre do Tombo; Palácio Nacional da Ajuda
dc.Coverage.Temporal 1910 - 1974
dc.Created.Date June 2022
dc.Format .zip with *.tif, .zip with *.txt, .csv, .txt
dc.Language PT, EN
dc.Relation https://rdm.inesctec.pt/dataset/cs-2022-004
dc.Type Manual Transcriptions