"Harnessing Large Language Models for Clinical Information Extraction: A Systematic Literature Review" Dataset This folder contains the information retrieved related to the article "Harnessing Large Language Models for Clinical Information Extraction: A Systematic Literature Review", which is a systematic literature review of the use of LLMs on Clinical Information Extraction. The dataset includes the files described below. We also describe their CSV columns, when applicable. - Full.xlsx: This file contains a spreadsheet with the entire dataset, for users who do not wish to extract the raw data. - Identification.csv: This file contains information about the databases and queries used in the review, and the number of identified papers. - Duplicates.csv: This file contains information about the duplicate files encountered during database querying. - Id: Assigned paper Id - Title: Paper title - Source: Database that returned the paper. We used DUPLICATE whenever the paper existed previously, along with the ID to the first appearance - URL: Paper URL - Title.csv: This file contains information about the papers that advanced to the title screening stage (does not include citation tracking). - Id: Assigned paper Id - Title: Paper title - Source: Database that returned the paper - URL: Paper URL - Include: Whether or not to advance the paper to the abstract screening stage - Abstract.csv: This file contains information abouth the papers that advanced to the abstract screening stage (does not include citation tracking). - Id: Assigned paper Id - Title: Paper title - Abstract: Paper abstract - URL: Paper URL - Source: Database that returned the paper - Type of publication: Publication type of the paper - Include: Whether or not to advance the paper to the eligibility stage - Eligibility.csv: This file contains information about the papers that advancet do the eligibility stage (does not include citation tracking). 
  - Id: Assigned paper Id
  - Title: Paper title
  - Abstract: Paper abstract
  - URL: Paper URL
  - Source: Database that returned the paper
  - Type of publication: Publication type of the paper
  - Uses LLMs: Whether the paper satisfies inclusion criterion I1
  - Performs IE: Whether the paper satisfies inclusion criterion I2
  - Based on Clinical Field: Whether the paper satisfies inclusion criterion I3
  - Include: Whether or not to advance the paper to the inclusion stage
  - Reason for Exclusion: Context on the reason for excluding the paper
- Results.csv: This file contains the results of the database querying stage.
  - Id: Assigned paper Id
  - Title: Paper title
  - Abstract: Paper abstract
  - URL: Paper URL
  - Source: Database that returned the paper
  - Type of publication: Publication type of the paper
- Backwards-Citations.csv: This file contains all the backwards citations found in all articles gathered during database querying.
  - Id: Assigned paper Id
  - Title: Paper title
  - URL: Paper URL
  - SourceID: Id of the paper from database querying that returned the paper
- Forwards-Citations.csv: This file contains all the forwards citations found in all articles gathered during database querying.
  - Id: Assigned paper Id
  - Title: Paper title
  - URL: Paper URL
  - SourceID: Id of the paper from database querying that returned the paper
- Citation-Duplicates.csv: This file contains information about all the duplicate files encountered during citation tracking. Since the duplicates were discarded automatically, a paper that is present here is not a duplicate file.
  - Id: Assigned paper Id
  - Title: Paper title
  - URL: Paper URL
  - SourceID: Id of the paper from database querying that returned the paper
- Citation-Title.csv: This file contains information about the papers that advanced to the title screening stage from citation tracking.
  - Id: Assigned paper Id
  - Title: Paper title
  - URL: Paper URL
  - SourceID: Id of the paper from database querying that returned the paper
  - Citation Type: Whether the paper was found via forwards or backwards citation tracking
  - Include: Whether or not to advance the paper to the abstract screening stage. If a paper passed through the duplicate check but was found to be a duplicate here, 'DUPLICATE' is used instead. For all purposes, it counts as a No
- Citation-Abstract.csv: This file contains information about the papers that advanced to the abstract screening stage from citation tracking.
  - Id: Assigned paper Id
  - Title: Paper title
  - URL: Paper URL
  - SourceID: Id of the paper from database querying that returned the paper
  - Citation Type: Whether the paper was found via forwards or backwards citation tracking
  - Include: Whether or not to advance the paper to the eligibility stage. If a paper passed through the duplicate check but was found to be a duplicate here, 'DUPLICATE' is used instead. For all purposes, it counts as a No
- Citation-Full.csv: This file contains information about the papers that advanced to the full text screening stage from citation tracking.
  - Id: Assigned paper Id
  - Title: Paper title
  - URL: Paper URL
  - SourceID: Id of the paper from database querying that returned the paper
  - Citation Type: Whether the paper was found via forwards or backwards citation tracking
  - Include: Whether or not to advance the paper to the inclusion stage. If a paper passed through the duplicate check but was found to be a duplicate here, 'DUPLICATE' is used instead. For all purposes, it counts as a No
- Citation-Results.csv: This file contains the results of the citation tracking stage.
  - Id: Assigned paper Id
  - Title: Paper title
  - URL: Paper URL
  - SourceID: Id of the paper from database querying that returned the paper
  - Citation Type: Whether the paper was found via forwards or backwards citation tracking
- Features.csv: This file contains the data extracted from each article.
  - Id: Assigned paper Id
  - Title: Paper title
  - Year: Publication year
  - Publication Venue: Type of publication venue
  - Venue Name: Name of the publication venue
  - Citations: Number of total citations
  - Citations/Year: Number of citations per year since publication
  - Authors: Names of authors
  - Abstract: Paper abstract
  - Keywords: Paper keywords
  - Language: Languages the paper works on
  - Task: Tasks the paper tackles. When dealing with multiple tasks, & is used as a separator
  - Methods (Best Model): Approach used in the paper with the best results
  - Metrics: Metrics used in the paper
  - Results: Results per metric. If the reported metrics are Precision, Recall, and F-Score, only the F-Score is shown.
  - Data Source: Datasets used for evaluation
  - Notes: Notes to provide context and additional information when necessary
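
Two conventions described above may need handling when the CSV files are read programmatically: the Task column uses & to separate multiple tasks, and the Include column of the citation tracking files may hold 'DUPLICATE', which counts as a No. The following is a minimal Python sketch of both, not part of the dataset itself. The helper names `split_tasks` and `is_included` are hypothetical, the sample rows are made up for illustration, and the assumption that a positive Include value is spelled "Yes" should be checked against the actual files.

```python
import csv
import io


def split_tasks(task_field):
    """Split a Features.csv 'Task' value on '&' and trim surrounding whitespace."""
    return [task.strip() for task in task_field.split("&") if task.strip()]


def is_included(include_field):
    """Return True only for an explicit 'Yes' (assumed spelling).

    'DUPLICATE' and 'No' both yield False, matching the rule that
    DUPLICATE counts as a No.
    """
    return include_field.strip().lower() == "yes"


# Hypothetical rows for illustration only; they are not taken from the dataset.
sample = io.StringIO(
    "Id,Title,Include\n"
    "1,Example paper A,Yes\n"
    "2,Example paper B,DUPLICATE\n"
    "3,Example paper C,No\n"
)

rows = list(csv.DictReader(sample))
advanced_ids = [row["Id"] for row in rows if is_included(row["Include"])]

print(advanced_ids)  # ['1']
print(split_tasks("Named Entity Recognition & Relation Extraction"))
# ['Named Entity Recognition', 'Relation Extraction']
```

To apply the same helpers to a real file, the in-memory sample can be replaced with an opened file such as Citation-Title.csv. The comma delimiter and UTF-8 encoding are assumptions here and may need adjusting.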