The ScispaCy Clinical Term Extraction and Snomed CT Synonymy Elimination From Clinical Data For Clustering: A Novel Study

Authors

Keywords:

Clinical document, SNOMED CT ontology, Red-black tree

Abstract

A clinical document is a written or electronic record containing details of a patient’s medical procedures, clinical trials, or test outcomes. Standard information mining approaches struggle to cluster clinical documents because of their unstructured nature. This work introduces a new approach for grouping clinical documents that addresses problems related to synonymy, abbreviation expansion, and key feature extraction. The clinical document collection for coronary artery disease consists of 1304 records obtained from 296 patients. These records are first preprocessed to remove irregularities. A simple letter-matching algorithm then identifies and expands abbreviations, after which the scispaCy model extracts the relevant clinical terms. The extracted features are checked against the SNOMED CT ontology to eliminate synonymous medical terms, and the TF-IDF method converts the remaining features into vectors. BERT word embeddings were also evaluated as a feature representation, but TF-IDF outperformed BERT. Clustering uses an enhanced k-means algorithm that incorporates the Red-Black Tree data structure. The proposed strategy was compared with several existing clustering algorithms and produced clusters with higher Normalised Mutual Information (NMI) and accuracy scores. These results suggest that the model can identify patients with similar diseases and thereby assist healthcare professionals.
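
The steps named in the abstract (synonym elimination, TF-IDF vectorisation, NMI evaluation) can be illustrated with a minimal sketch. This is not the authors' implementation: the `SYNONYMS` table is a hypothetical stand-in for a SNOMED CT concept lookup, the function names are illustrative, the scispaCy extraction and the Red-Black Tree k-means steps are omitted, and the arithmetic-mean normalisation of NMI is an assumption, since the abstract does not state which variant was used.

```python
import math
from collections import Counter

# Hypothetical synonym table standing in for a SNOMED CT concept lookup.
# The entries are illustrative only, not taken from the study.
SYNONYMS = {
    "heart attack": "myocardial infarction",
    "mi": "myocardial infarction",
    "high blood pressure": "hypertension",
}

def canonicalise(terms):
    """Map each extracted term to one canonical label so synonyms collapse."""
    return [SYNONYMS.get(t.lower(), t.lower()) for t in terms]

def tfidf(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors

def entropy(counts, n):
    """Shannon entropy (natural log) of a label distribution given as counts."""
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def nmi(labels_a, labels_b):
    """Normalised Mutual Information, normalised by the mean of the entropies."""
    n = len(labels_a)
    pa, pb = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = sum(
        (c / n) * math.log(c * n / (pa[a] * pb[b]))
        for (a, b), c in joint.items()
    )
    denom = (entropy(pa, n) + entropy(pb, n)) / 2
    return mi / denom if denom > 0 else 1.0
```

In this sketch, two documents that mention "Heart attack" and "MI" end up sharing the single term "myocardial infarction", so the synonym pair no longer produces two separate TF-IDF dimensions. That is the effect synonymy elimination is meant to have before clustering.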

Published

14-11-2024

How to Cite

Jasila, E., Saleena, N., & Nazeer, K. A. A. (2024). The ScispaCy Clinical Term Extraction and Snomed CT Synonymy Elimination From Clinical Data For Clustering: A Novel Study. Communications in Mathematics and Applications, 15(2). Retrieved from https://rgnpublications.com/journals/index.php/cma/article/view/2939

Issue

Section

Research Article