Manual Transcriptions of Typewritten Digital Representations of Portuguese Cultural Heritage Documents from the 20th Century
We built this dataset in the context of the EPISA project (https://www.inesctec.pt/en/projects/episa), more specifically within the task "Document mining for automatic metadata records". The dataset includes manual transcriptions of typewritten digital representations of 20th-century Portuguese cultural heritage documents, extracted from the Arquivo Nacional Torre do Tombo (ANTT) (https://digitarq.arquivos.pt/). It includes records from two fonds: the General Administration of National Treasury (DGFP) and the National Secretariat of Information (SNI). The manual transcriptions will be used to evaluate text recognition, as part of building tools that support cultural heritage professionals by extracting and describing specific objects automatically. We extracted the digital representations from a previously developed dataset (https://rdm.inesctec.pt/dataset/cs-2022-004). We selected 708 of the 12,041 typewritten digital representations, covering 5 document types: letters, structured reports, non-structured reports, process covers, and theatre play covers. Each digital representation is associated with a specific series. For each document type, we randomly selected 5% of the digital representations in each associated series. Process covers are the only exception: because this type had few representations in the initial dataset, we selected 75% of each of its series. The dataset has 162 letters, 328 structured reports, 98 non-structured reports, 60 process covers, and 60 theatre play covers.
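The per-series selection described above is a form of stratified random sampling. The sketch below illustrates it under stated assumptions: the input is a list of hypothetical (filename, document_type, series) tuples, the document type label "process_cover" is hypothetical, and the rounding rule (round up, at least one item per series) is an assumption, since the description does not say how fractional counts were handled.

```python
import random
from collections import defaultdict


def sample_representations(records, default_percent=5, overrides=None, seed=0):
    """Select a percentage of the representations in every (document_type, series) group.

    records: iterable of (filename, document_type, series) tuples (hypothetical layout).
    default_percent: integer percentage applied to every group (5% in the description).
    overrides: per-document-type integer percentages, e.g. {"process_cover": 75}.
    """
    overrides = overrides or {}
    groups = defaultdict(list)
    for filename, doc_type, series in records:
        groups[(doc_type, series)].append(filename)

    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    selected = []
    for (doc_type, _series), files in sorted(groups.items()):
        percent = overrides.get(doc_type, default_percent)
        # Assumed rounding: integer ceiling of percent * n / 100, never fewer than 1.
        k = max(1, (percent * len(files) + 99) // 100)
        selected.extend(rng.sample(sorted(files), k))
    return selected
```

For example, a series of 40 letters yields 2 selected files at 5%, while a series of 4 process covers yields 3 at 75%. Integer percentages are used so the ceiling is computed exactly, avoiding floating-point rounding surprises.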
The selected typewritten digital representations were manually transcribed with the support of the OCR2edit application (https://www.ocr2edit.com/pt/converter-para-txt), following transcription rules that normalize the process: [...] marks words or parts of words whose reading was rejected (illegible), and (?) marks a doubtful reading (https://calenda.org/236139?file=1). The dataset is divided into three parts: a zip file containing the 708 pages of digital representations in .tif format, organized by document type; a zip file containing the 708 manual transcriptions of the typewritten digital representations in .txt format; and a CSV file with the classification of the digital representations, including the attributes filename and document_type.
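As a minimal sketch of how the CSV classification file and the transcription markers might be used, the code below counts representations per document type and counts the [...] and (?) markers in a transcription. It assumes a comma-delimited CSV with a header row containing the filename and document_type columns named in the description; the example type labels and filenames in the usage are hypothetical.

```python
import csv
import io
from collections import Counter


def count_by_type(csv_file):
    """Count rows per document_type in the classification CSV (comma delimiter assumed)."""
    reader = csv.DictReader(csv_file)
    return Counter(row["document_type"] for row in reader)


def count_markers(text):
    """Count the transcription-rule markers in one transcription.

    "[...]" marks an illegible (rejected) reading; "(?)" marks a doubtful reading.
    """
    return {"illegible": text.count("[...]"), "doubtful": text.count("(?)")}


if __name__ == "__main__":
    # Hypothetical rows; the real file lists all 708 representations.
    sample_csv = io.StringIO(
        "filename,document_type\n"
        "doc_001.tif,letter\n"
        "doc_002.tif,letter\n"
        "doc_003.tif,structured_report\n"
    )
    print(count_by_type(sample_csv))
    print(count_markers("Lisboa, [...] de Março (?) de 1945 [...]"))
```

Counting markers this way can help separate certain from uncertain text when transcriptions serve as ground truth for text recognition evaluation.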