Figure 4

Schematic workflow for extraction and interpretation of chemical reactions in patents. Stage 1: the patent is identified and downloaded. Stage 2: the document is deflattened and segmented. Stage 3: various tools (OPSIN, OSRA, OSCAR3) are used to identify key elements in the reaction and convert them to semantic form. Stage 4: ChemicalTagger is applied to the language of the chemical reaction to determine the roles and processes. Where successful, the extracted information is converted to reactions expressed in Chemical Markup Language (CML).