HMM-BiMM: Hidden Markov Model-based word segmentation via improved Bi-directional Maximal Matching algorithm

Text segmentation
DOI: 10.1016/j.compeleceng.2021.107354 Publication Date: 2021-08-09T17:50:54Z
ABSTRACT
This paper presents HMM-BiMM, a new word segmentation algorithm that combines the Hidden Markov Model (HMM) with the Bi-directional Maximal Matching (BiMM) algorithm. By matching against sub-dictionaries, it segments text quickly. The text is first segmented by BiMM; the leftover single words are then joined into remaining text, which is segmented again using the HMM. The newly segmented words produced by the HMM are used to update the dictionary dynamically, which in turn improves segmentation accuracy. In a comparison with five representative algorithms on real-world clinical symptom text, HMM-BiMM achieves the highest efficiency and accuracy for symptom text segmentation. Specifically, it improves precision by about 3% and running time by about 70% relative to BiMM.
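The first stage described above can be illustrated with a minimal sketch of plain bi-directional maximal matching, followed by a helper that joins the leftover single characters into the runs that a second-stage model would re-segment. This is not the authors' implementation: the function names, the toy vocabulary, and the tie-breaking rule (fewer tokens, then fewer single-character tokens, otherwise the backward result) are assumptions based on a commonly used heuristic, and the sub-dictionary lookup and the HMM stage are not shown.

```python
def forward_mm(text, vocab, max_len):
    """Greedy longest-match scan from left to right."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when nothing longer matches.
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_mm(text, vocab, max_len):
    """Greedy longest-match scan from right to left."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()
    return tokens


def bimm(text, vocab):
    """Run both scans and pick one result (assumed heuristic, see lead-in)."""
    max_len = max((len(w) for w in vocab), default=1)
    fwd = forward_mm(text, vocab, max_len)
    bwd = backward_mm(text, vocab, max_len)
    if len(fwd) != len(bwd):
        return fwd if len(fwd) < len(bwd) else bwd
    singles_fwd = sum(1 for t in fwd if len(t) == 1)
    singles_bwd = sum(1 for t in bwd if len(t) == 1)
    return fwd if singles_fwd < singles_bwd else bwd


def single_char_runs(tokens, vocab):
    """Join adjacent out-of-vocabulary single characters into runs."""
    runs, current = [], ""
    for token in tokens:
        if len(token) == 1 and token not in vocab:
            current += token
        elif current:
            runs.append(current)
            current = ""
    if current:
        runs.append(current)
    return runs


# Toy vocabulary, purely illustrative.
vocab = {"ab", "abc", "cd"}
tokens = bimm("abcd", vocab)            # forward gives abc|d, backward gives ab|cd
leftovers = single_char_runs(bimm("xyabz", {"ab"}), {"ab"})
```

In this sketch the two scans disagree on "abcd", and the result with fewer single-character tokens is kept. The runs returned by `single_char_runs` stand in for the "remaining text" that the abstract says is passed to the HMM.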