Fig. 3 | Journal of Cheminformatics

Fig. 3

From: Fifteen years of ChEMBL and its role in cheminformatics and drug discovery

Heatmaps showing the relationships between the identified topics and the top words associated with each topic, derived from topic modelling of PubMed articles that contain the term “ChEMBL” in the title or abstract. Darker colour indicates a higher word weight (higher prevalence). Words within a topic are ordered by increasing word weight from left to right. The left heatmap is based on 421 articles published between 2010 and 2019; the right heatmap is based on 511 articles published between 2020 and 2024. Topic modelling was performed using Latent Dirichlet Allocation (scikit-learn (sklearn) package in Python), retrieving 5 topics with 10 words each for each time period. Each topic is initially represented by a list of words, with weights indicating how important each word is to the topic. Further, a Sentence Transformer model (sentence-transformers library in Python) was used to match the topics to a predefined set of single representative terms using word embeddings and cosine similarity.
