Traditional Chinese Medicine Symptom Text Classification Based on Label Mask
Abstract
Traditional Chinese Medicine (TCM) symptom text classification uses computational methods to analyze TCM texts and identify the symptom categories they contain, thereby automatically labeling patients’ descriptions of their feelings and physical examination results. However, TCM clinical texts are often lengthy and complex, and the valuable information in them is frequently obscured by noise or redundancy. TCM texts also contain many obscure words and specialized terms, which prevents traditional models from interpreting TCM terminology accurately. To address these challenges, this paper first constructs a template using label masking and incorporates it into the input sequence fed to the BERT model. A Kolmogorov-Arnold Network (KAN) linear layer then classifies the contextual features extracted by BERT, with the KAN layer adapting its activation functions to better fit the data and improve classification accuracy. After the initial label prediction, a CorNet module is introduced to mine the correlations between labels. Lastly, a masked language modeling (MLM) task is included in the training phase, so that the model not only estimates label distributions but also predicts labels at the masked input positions. This enriches the training signal and further improves the model’s robustness. Experiments on several datasets show that the proposed model is effective and generalizes well to symptom text classification on real-world clinical data.
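To make the label-masking idea concrete, the following is a minimal sketch of how a label-mask template might be prepended to an input sequence. The abstract does not give the actual template, so the slot layout (one "label [MASK]" pair per label), the "yes"/"no" fill words, and the function names build_label_mask_input and fill_masks are all assumptions made for illustration. Tokenization is reduced to plain Python lists so the example runs without any model.

```python
from typing import List, Tuple

# Special tokens in the BERT style; the template layout itself is an assumption.
CLS, SEP, MASK = "[CLS]", "[SEP]", "[MASK]"


def build_label_mask_input(
    labels: List[str], text_tokens: List[str]
) -> Tuple[List[str], List[int]]:
    """Prepend one '<label> [MASK]' slot per label to the text tokens.

    Returns the full token sequence and the index of each label's mask slot,
    which is where a model would be asked to predict that label.
    """
    tokens = [CLS]
    mask_positions = []
    for label in labels:
        tokens.append(label)
        mask_positions.append(len(tokens))  # index the mask is about to occupy
        tokens.append(MASK)
    tokens.append(SEP)
    tokens.extend(text_tokens)
    tokens.append(SEP)
    return tokens, mask_positions


def fill_masks(
    tokens: List[str], mask_positions: List[int], decisions: List[bool]
) -> List[str]:
    """Build an MLM-style target by writing a (hypothetical) answer word
    into each mask slot: 'yes' if the label applies, 'no' otherwise."""
    filled = list(tokens)
    for pos, present in zip(mask_positions, decisions):
        filled[pos] = "yes" if present else "no"
    return filled


if __name__ == "__main__":
    labels = ["cough", "fever"]
    text = ["patient", "coughs", "at", "night"]
    tokens, positions = build_label_mask_input(labels, text)
    print(tokens)
    print(positions)
    print(fill_masks(tokens, positions, [True, False]))
```

In a full pipeline, the hidden states at the returned mask positions would be the features passed to the classification layer, and the filled sequence would serve as the target for the auxiliary MLM objective. Both of those steps depend on details the abstract does not specify.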