"Harnessing Large Language Models for Clinical Information Extraction: A Systematic Literature Review" Dataset

This folder contains the data collected for the article "Harnessing Large Language Models for Clinical Information Extraction: A Systematic Literature Review", a systematic literature review of the use of LLMs for Clinical Information Extraction. The dataset includes the files described below, along with their CSV columns where applicable.

- Full.xlsx: This file contains a spreadsheet with the entire dataset, for users who do not wish to extract the raw data.

- Identification.csv: This file contains information about the databases and queries used in the review, and the number of identified papers.

- Duplicates.csv: This file contains information about the duplicate files encountered during database querying.
  - Id: Assigned paper Id
  - Title: Paper title
  - Source: Database that returned the paper. DUPLICATE is used whenever the paper had already appeared, along with the Id of its first appearance
  - URL: Paper URL
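
The DUPLICATE convention in the Source column can be used to separate first appearances from repeats. The following is a minimal sketch, not the procedure used in the review: the sample rows, the database names, and the exact cell format (`DUPLICATE` followed by an Id) are assumptions made for illustration.

```python
import csv
import io

# Hypothetical sample rows. The real Source cell format may differ;
# this sketch only assumes that the cell starts with "DUPLICATE".
SAMPLE = """Id,Title,Source,URL
1,Paper A,DatabaseA,https://example.org/a
2,Paper A,DUPLICATE 1,https://example.org/a
3,Paper B,DatabaseB,https://example.org/b
"""


def is_duplicate(row):
    """Return True when the Source cell marks the row as a duplicate."""
    return row["Source"].strip().upper().startswith("DUPLICATE")


def split_duplicates(rows):
    """Partition rows into (unique, duplicates) using the Source column."""
    unique, duplicates = [], []
    for row in rows:
        (duplicates if is_duplicate(row) else unique).append(row)
    return unique, duplicates


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
unique, duplicates = split_duplicates(rows)
print(len(unique), "unique,", len(duplicates), "duplicate")
```

To run this against the real file, pass an opened `Duplicates.csv` to `csv.DictReader` instead of the in-memory sample.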

- Title.csv: This file contains information about the papers that advanced to the title screening stage (does not include citation tracking).
  - Id: Assigned paper Id
  - Title: Paper title
  - Source: Database that returned the paper
  - URL: Paper URL
  - Include: Whether or not to advance the paper to the abstract screening stage

- Abstract.csv: This file contains information about the papers that advanced to the abstract screening stage (does not include citation tracking).
  - Id: Assigned paper Id
  - Title: Paper title
  - Abstract: Paper abstract
  - URL: Paper URL
  - Source: Database that returned the paper
  - Type of publication: Publication type of the paper
  - Include: Whether or not to advance the paper to the eligibility stage

- Eligibility.csv: This file contains information about the papers that advanced to the eligibility stage (does not include citation tracking).
  - Id: Assigned paper Id
  - Title: Paper title
  - Abstract: Paper abstract
  - URL: Paper URL
  - Source: Database that returned the paper
  - Type of publication: Publication type of the paper
  - Uses LLMs: Whether the paper meets inclusion criterion I1
  - Performs IE: Whether the paper meets inclusion criterion I2
  - Based on Clinical Field: Whether the paper meets inclusion criterion I3
  - Include: Whether or not to advance the paper to the inclusion stage
  - Reason for Exclusion: Context on the motive for excluding the paper
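
The three criterion columns of Eligibility.csv can be inspected together to see which inclusion criteria a paper does not meet. This is a minimal sketch under the assumption that these columns hold the value `Yes` when a criterion is met; the actual cell values are not documented here, so adjust `yes_value` as needed. The sample rows are hypothetical.

```python
# Maps each criterion column to the criterion label used in this README.
CRITERIA = {
    "Uses LLMs": "I1",
    "Performs IE": "I2",
    "Based on Clinical Field": "I3",
}


def failed_criteria(row, yes_value="Yes"):
    """Return the labels of the inclusion criteria the row does not meet.

    Assumption: a criterion is met when its cell equals yes_value
    (compared case-insensitively, ignoring surrounding whitespace).
    """
    expected = yes_value.strip().lower()
    return [
        label
        for column, label in CRITERIA.items()
        if row.get(column, "").strip().lower() != expected
    ]


# Hypothetical rows for illustration only.
rows = [
    {"Id": "10", "Uses LLMs": "Yes", "Performs IE": "Yes",
     "Based on Clinical Field": "Yes"},
    {"Id": "11", "Uses LLMs": "Yes", "Performs IE": "No",
     "Based on Clinical Field": "No"},
]

for row in rows:
    print(row["Id"], failed_criteria(row))
```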

- Results.csv: This file contains the results of the database querying stage.
  - Id: Assigned paper Id
  - Title: Paper title
  - Abstract: Paper abstract
  - URL: Paper URL
  - Source: Database that returned the paper
  - Type of publication: Publication type of the paper

- Backwards-Citations.csv: This file contains all the backwards citations found in all articles gathered during database querying.
  - Id: Assigned paper Id
  - Title: Paper title
  - URL: Paper URL
  - SourceID: Id of the paper from database querying that returned the paper

- Forwards-Citations.csv: This file contains all the forwards citations found in all articles gathered during database querying.
  - Id: Assigned paper Id
  - Title: Paper title
  - URL: Paper URL
  - SourceID: Id of the paper from database querying that returned the paper
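
In both citation files, SourceID points back to the paper from database querying that produced the citation. A minimal sketch of resolving that link is shown below. It assumes that SourceID matches the Id column of Results.csv, which this README does not state explicitly; the `SourceTitle` key and the sample rows are hypothetical.

```python
def attach_source_titles(citations, results):
    """Add a 'SourceTitle' key to each citation row.

    Assumption: SourceID refers to the Id column of Results.csv.
    Citations whose SourceID is not found get SourceTitle = None.
    """
    title_by_id = {row["Id"]: row["Title"] for row in results}
    return [
        {**citation, "SourceTitle": title_by_id.get(citation["SourceID"])}
        for citation in citations
    ]


# Hypothetical rows for illustration only.
results = [
    {"Id": "1", "Title": "Paper A"},
    {"Id": "2", "Title": "Paper B"},
]
citations = [
    {"Id": "C1", "Title": "Cited paper X", "SourceID": "2"},
    {"Id": "C2", "Title": "Cited paper Y", "SourceID": "9"},
]

linked = attach_source_titles(citations, results)
for row in linked:
    print(row["Id"], "<-", row["SourceTitle"])
```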

- Citation-Duplicates.csv: This file contains the papers that went through the duplicate check during citation tracking. Since duplicates were discarded automatically, any paper listed here is not a duplicate.
  - Id: Assigned paper Id
  - Title: Paper title
  - URL: Paper URL
  - SourceID: Id of the paper from database querying that returned the paper

- Citation-Title.csv: This file contains information about the papers that advanced to the title screening stage from citation tracking. 
  - Id: Assigned paper Id
  - Title: Paper title
  - URL: Paper URL
  - SourceID: Id of the paper from database querying that returned the paper
  - Citation Type: Whether the paper was found via forwards or backwards citation tracking
  - Include: Whether or not to advance the paper to the abstract screening stage. If a paper passed the duplicate check but was identified as a duplicate at this stage, 'DUPLICATE' is used instead. For all practical purposes, it counts as a No

- Citation-Abstract.csv: This file contains information about the papers that advanced to the abstract screening stage from citation tracking.
  - Id: Assigned paper Id
  - Title: Paper title
  - URL: Paper URL
  - SourceID: Id of the paper from database querying that returned the paper
  - Citation Type: Whether the paper was found via forwards or backwards citation tracking
  - Include: Whether or not to advance the paper to the eligibility stage. If a paper passed the duplicate check but was identified as a duplicate at this stage, 'DUPLICATE' is used instead. For all practical purposes, it counts as a No

- Citation-Full.csv: This file contains information about the papers that advanced to the full text screening stage from citation tracking.
  - Id: Assigned paper Id
  - Title: Paper title
  - URL: Paper URL
  - SourceID: Id of the paper from database querying that returned the paper
  - Citation Type: Whether the paper was found via forwards or backwards citation tracking
  - Include: Whether or not to advance the paper to the inclusion stage. If a paper passed the duplicate check but was identified as a duplicate at this stage, 'DUPLICATE' is used instead. For all practical purposes, it counts as a No
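
The Include column of Citation-Title.csv, Citation-Abstract.csv, and Citation-Full.csv mixes a decision with the 'DUPLICATE' marker, which counts as a No. The sketch below shows one way to summarize such a column. It assumes that the positive value is `Yes`, which is not documented here, and it uses hypothetical sample rows and category names.

```python
from collections import Counter


def include_summary(rows):
    """Count rows by outcome: advanced, rejected, or late duplicate.

    Assumption: 'Yes' (any casing) means the paper advances.
    'DUPLICATE' is tallied separately but, like 'No', does not advance.
    """
    counts = Counter()
    for row in rows:
        value = row["Include"].strip().upper()
        if value == "DUPLICATE":
            counts["duplicate"] += 1
        elif value == "YES":
            counts["advanced"] += 1
        else:
            counts["rejected"] += 1
    return counts


def advances(row):
    """Return True only when the row is marked to advance."""
    return row["Include"].strip().upper() == "YES"


# Hypothetical rows for illustration only.
rows = [
    {"Id": "C1", "Include": "Yes"},
    {"Id": "C2", "Include": "No"},
    {"Id": "C3", "Include": "DUPLICATE"},
    {"Id": "C4", "Include": "Yes"},
]

summary = include_summary(rows)
print(dict(summary))
print([row["Id"] for row in rows if advances(row)])
```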

- Citation-Results.csv: This file contains the results of the citation tracking stage.
  - Id: Assigned paper Id
  - Title: Paper title
  - URL: Paper URL
  - SourceID: Id of the paper from database querying that returned the paper
  - Citation Type: Whether the paper was found via forwards or backwards citation tracking

- Features.csv: This file contains the data extracted from each article.
  - Id: Assigned paper Id
  - Title: Paper title
  - Year: Publication year
  - Publication Venue: Type of publication venue
  - Venue Name: Name of the publication venue
  - Citations: Number of total citations
  - Citations/Year: Number of citations per year since publication
  - Authors: Names of authors
  - Abstract: Paper abstract
  - Keywords: Paper Keywords
  - Language: Languages the paper works on
  - Task: Tasks the paper tackles. When dealing with multiple tasks, & is used as a separator
  - Methods (Best Model): Approach used in the paper with the best results
  - Metrics: Metrics used in the paper
  - Results: Results per metric. If the reported metrics are Precision, Recall, and F-Score, only the F-Score is shown.
  - Data Source: Datasets used for evaluation
  - Notes: Notes to provide context and additional information when necessary
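
Because the Task column of Features.csv uses & to separate multiple tasks, the cell has to be split before tasks can be counted per paper. The following is a minimal sketch of that step; the task names in the sample rows are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def split_tasks(task_cell):
    """Split a Task cell on '&' and trim whitespace around each task."""
    return [task.strip() for task in task_cell.split("&") if task.strip()]


def task_counts(rows):
    """Count how many papers tackle each task."""
    counts = Counter()
    for row in rows:
        counts.update(split_tasks(row["Task"]))
    return counts


# Hypothetical rows for illustration only.
rows = [
    {"Id": "1", "Task": "Named Entity Recognition & Relation Extraction"},
    {"Id": "2", "Task": "Named Entity Recognition"},
    {"Id": "3", "Task": "Relation Extraction&Event Extraction"},
]

counts = task_counts(rows)
for task, count in counts.most_common():
    print(task, count)
```

Splitting on the bare & and trimming afterwards tolerates cells written with or without spaces around the separator.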