Table 1 Performance of named entity extraction using different LLM models on a sample of 1000 experimental paragraphs

From: Suitability of large language models for extraction of high-quality chemical reaction dataset from patent literature

| LLM/results | GPT-3.5 | Gemini 1.0 Pro | Claude 2.1 | Llama 2-13B |
|---|---|---|---|---|
| Successfully atom mapped | 929 | 920 | 889 | 594 |
| Product IUPAC Error^a | 9 | 4 | 29 | 34 |
| Missing atom^b | 53 | 74 | 76 | 354 |
| RXN mapper error^c | 4 | 2 | 6 | 18 |
| Extraction time (in mins)^d | 75 | 72 | 135 | 81 |

^a The Product IUPAC Error count is the number of product IUPAC names that could not be converted to SMILES.

^b The Missing atom count is the number of reactions in which the IUPAC conversion of at least one reactant failed, leading to a missing product atoms error.

^c The RXN mapper error count is the number of reactions for which an empty list was returned for either the reactants or the products.

^d Extraction time (in mins) is the time each LLM took to extract reaction entities from the 1000 reaction-containing paragraphs. The observed extraction time may vary with the load on the API and, for self-hosted models, with the compute resources available.
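The counts in Table 1 are easier to compare as rates. The following is a minimal sketch, not part of the original study, that turns the reported counts into an atom-mapping success percentage and an extraction throughput per model. The dictionary keys and the `summarize` helper are hypothetical names chosen for illustration; only the numbers come from the table.

```python
# Counts copied from Table 1 (sample of 1000 experimental paragraphs).
# Key names and the helper below are illustrative, not from the paper.
SAMPLE_SIZE = 1000

table1 = {
    "GPT-3.5":        {"mapped": 929, "product_iupac_error": 9,
                       "missing_atom": 53,  "rxn_mapper_error": 4,  "minutes": 75},
    "Gemini 1.0 Pro": {"mapped": 920, "product_iupac_error": 4,
                       "missing_atom": 74,  "rxn_mapper_error": 2,  "minutes": 72},
    "Claude 2.1":     {"mapped": 889, "product_iupac_error": 29,
                       "missing_atom": 76,  "rxn_mapper_error": 6,  "minutes": 135},
    "Llama 2-13B":    {"mapped": 594, "product_iupac_error": 34,
                       "missing_atom": 354, "rxn_mapper_error": 18, "minutes": 81},
}


def summarize(row, sample_size=SAMPLE_SIZE):
    """Derive simple rates from one column of Table 1."""
    outcome_total = (row["mapped"] + row["product_iupac_error"]
                     + row["missing_atom"] + row["rxn_mapper_error"])
    return {
        # Share of the sampled paragraphs that ended in an atom-mapped reaction.
        "success_pct": 100.0 * row["mapped"] / sample_size,
        # Paragraphs processed per minute of extraction time.
        "paragraphs_per_min": sample_size / row["minutes"],
        # Sum of the four outcome rows, as a sanity check on the column.
        "outcome_total": outcome_total,
    }


summary = {model: summarize(row) for model, row in table1.items()}

for model, s in summary.items():
    print(f"{model:15s} {s['success_pct']:5.1f}% mapped, "
          f"{s['paragraphs_per_min']:5.2f} paragraphs/min, "
          f"outcome rows sum to {s['outcome_total']}")
```

The throughput figure should be read with the caveat in footnote d: extraction time depends on API load and on the hardware used for self-hosted models, so it is not a property of the model alone. The `outcome_total` field simply adds the four outcome rows as printed in the table.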