The ScispaCy Clinical Term Extraction and Snomed CT Synonymy Elimination From Clinical Data For Clustering: A Novel Study
Keywords:
Clinical document, SNOMED CT ontology, Red-black tree

Abstract
A clinical document is a written or electronic record that contains details of a patient's medical procedure, clinical trial, or test outcomes. Because such documents are unstructured, standard information mining approaches struggle to cluster them. This work introduces a new approach for grouping clinical documents that addresses synonymy, abbreviation expansion, and key-feature extraction. The clinical document collection for coronary artery disease consists of 1304 records obtained from 296 patients. These records were preprocessed to remove irregularities. A simple letter-matching algorithm first identifies and expands abbreviations, after which the scispaCy model extracts the relevant clinical terms. The extracted features are then checked against the SNOMED CT ontology to eliminate medical terms that have the same meaning. The TF-IDF method converts the resulting features into vectors. Word embeddings from the BERT model were also evaluated as a feature representation; however, the TF-IDF model outperformed the BERT model. Clustering uses an enhanced k-means algorithm that incorporates the Red-Black Tree data structure. The proposed strategy was compared against several existing clustering algorithms and produced clusters with higher Normalised Mutual Information (NMI) and accuracy scores. Based on these results, the model can identify patients with similar diseases and assist healthcare professionals.
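The abstract mentions a "simple letter-matching algorithm" for abbreviation expansion but does not specify it. The sketch below assumes one plausible form: an all-caps token is expanded to the first candidate phrase whose word initials spell it. The `CANDIDATES` list and the helper names (`initials`, `expand_abbreviation`, `expand_text`) are hypothetical illustrations, not the authors' lexicon or code.

```python
from __future__ import annotations

import re

# Hypothetical candidate expansions; a real system would draw these from a
# clinical abbreviation lexicon (assumption: the source does not name one).
CANDIDATES = [
    "coronary artery disease",
    "myocardial infarction",
    "percutaneous coronary intervention",
    "left ventricular ejection fraction",
]


def initials(phrase: str) -> str:
    """Return the lowercase first letter of each word in a phrase."""
    return "".join(word[0] for word in phrase.lower().split())


def expand_abbreviation(abbr: str, candidates: list[str]) -> str | None:
    """Return the first candidate whose word initials spell the abbreviation."""
    letters = abbr.lower()
    for phrase in candidates:
        if initials(phrase) == letters:
            return phrase
    return None  # no letter match: leave the token unexpanded


def expand_text(text: str, candidates: list[str]) -> str:
    """Replace each all-caps token of two or more letters with its expansion."""
    def repl(match):
        expansion = expand_abbreviation(match.group(0), candidates)
        return expansion if expansion is not None else match.group(0)

    return re.sub(r"\b[A-Z]{2,}\b", repl, text)


if __name__ == "__main__":
    note = "History of CAD and prior MI; LVEF 45%, s/p PCI. ECG normal."
    print(expand_text(note, CANDIDATES))
```

Initials alone are ambiguous (for example, "MI" can also mean mitral insufficiency), so this first-match rule is only a starting point; a practical expander would need context to choose among competing candidates.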
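Synonym elimination and TF-IDF vectorisation can be illustrated together: once synonymous terms are mapped to one shared concept, they count as a single feature before weighting. This is a minimal sketch under stated assumptions. The `SYNONYM_TO_CONCEPT` table and its `CONCEPT_*` keys are hypothetical placeholders standing in for a SNOMED CT lookup (they are not real SNOMED CT identifiers). The weighting uses a plain, unsmoothed variant, tf = count / document length and idf = ln(N / df), because the abstract does not say which TF-IDF variant was used.

```python
import math
from collections import Counter

# Hypothetical synonym table mapping surface terms to a shared concept key.
# A real pipeline would obtain these from SNOMED CT; the keys below are
# placeholders, NOT real SNOMED CT concept identifiers.
SYNONYM_TO_CONCEPT = {
    "myocardial infarction": "CONCEPT_MI",
    "heart attack": "CONCEPT_MI",
    "coronary artery disease": "CONCEPT_CAD",
    "coronary heart disease": "CONCEPT_CAD",
    "chest pain": "CONCEPT_CHEST_PAIN",
}


def normalise(terms):
    """Replace each term with its concept key so synonyms become one feature."""
    return [SYNONYM_TO_CONCEPT.get(t.lower(), t.lower()) for t in terms]


def tfidf(documents):
    """Return one {feature: weight} dict per document, weight = tf * ln(N / df)."""
    n_docs = len(documents)
    # Document frequency: number of documents containing each feature.
    df = Counter(f for doc in documents for f in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append(
            {f: (c / len(doc)) * math.log(n_docs / df[f]) for f, c in counts.items()}
        )
    return vectors


if __name__ == "__main__":
    # Toy term lists standing in for scispaCy output from three notes.
    notes = [
        ["Heart attack", "chest pain"],
        ["myocardial infarction", "coronary artery disease"],
        ["coronary heart disease", "chest pain"],
    ]
    for vec in tfidf([normalise(n) for n in notes]):
        print(vec)
```

With this unsmoothed idf, a concept present in every note receives zero weight. scikit-learn's `TfidfVectorizer`, by contrast, applies a smoothed idf by default, so its weights will differ from this sketch.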
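The abstract does not explain how the Red-Black Tree enhances k-means. The sketch below therefore shows only the plain Lloyd's k-means baseline, together with the NMI score used in the evaluation. Two choices here are assumptions rather than details from the source: seeding the centroids with the first k points, and normalising NMI by the arithmetic mean of the two entropies (one common choice, and scikit-learn's default). The toy 2-D points stand in for document feature vectors.

```python
import math
from collections import Counter


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centroids[j])),
    )


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means; the first k points seed the centroids (assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def nmi(labels_true, labels_pred):
    """Normalised mutual information with arithmetic-mean normalisation."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)
    mutual_info = sum(
        (c / n) * math.log(c * n / (true_counts[t] * pred_counts[p]))
        for (t, p), c in joint.items()
    )
    h_true = -sum((c / n) * math.log(c / n) for c in true_counts.values())
    h_pred = -sum((c / n) * math.log(c / n) for c in pred_counts.values())
    if h_true + h_pred == 0:
        return 1.0  # both labelings put everything in one cluster
    return 2 * mutual_info / (h_true + h_pred)


if __name__ == "__main__":
    points = [[0.0, 0.0], [10.0, 10.0], [0.2, 0.1], [9.8, 10.1], [0.1, 0.3], [10.2, 9.9]]
    truth = ["A", "B", "A", "B", "A", "B"]
    labels = kmeans(points, k=2)
    print(labels, nmi(truth, labels))
```

NMI is invariant to how cluster indices are named, so a perfect grouping scores 1.0 even though the predicted labels (0, 1) differ from the reference labels ("A", "B"). Clustering accuracy, the other metric mentioned, additionally requires an optimal matching between clusters and classes and is not shown here.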
License
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.