Improved BIO-Based Chinese Automatic Abstract-Generation Model

DOI: 10.1145/3643695 Publication Date: 2024-02-06T07:02:50Z
ABSTRACT
With its unique information-filtering function, text summarization technology has become a significant aspect of search engines and question-and-answer systems. However, existing models that include the copy mechanism often lack the ability to extract important fragments, so the generated content suffers from thematic deviation and insufficient generalization. In particular, Chinese automatic summarization with traditional generation methods loses semantics because of its reliance on word lists. To address these issues, we propose a novel BioCopy mechanism for this task. By training tags for the predicted words and reducing the range of the probability distribution over the glossary, we enhance the model's ability to generate continuous segments, which effectively solves the above problems. Additionally, we apply reinforced canonicality to the inputs to obtain better model results, making the model share sub-network weight parameters and sparsifying the output to reduce the prediction space. To further improve the model's performance, we calculate the bilingual evaluation understudy (BLEU) score on the English CNN/DailyMail dataset to filter thresholds, reducing the difficulty of word separation and the dependence on the word list. We fully fine-tune the model on the LCSTS dataset for the summarization task and conduct small-sample experiments on the CSL dataset. We also conduct ablation experiments. The experimental results demonstrate that the optimized model learns the semantic representation of the original text better than other models and performs well with small sample sizes.
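The abstract describes tagging predicted words and narrowing the probability distribution over the glossary so that copied text comes out as continuous segments. The following is a minimal sketch of that general BIO-copy idea, not the authors' implementation. It assumes character-level tokens, greedy longest-match labeling, and a minimum fragment length of 2. The function names `bio_tags`, `allowed_next` and `mask_logits` are hypothetical. In this sketch, B marks the first token of a fragment copied from the source, I marks a continuation, and O marks a freely generated token. At decoding time, a predicted B or I tag restricts the candidate set instead of scoring the whole vocabulary.

```python
from typing import Dict, List, Optional, Set


def bio_tags(source: List[str], target: List[str], min_len: int = 2) -> List[str]:
    """Hypothetical labeler: tag each target token B/I/O by greedily matching
    the longest fragment of `target` (starting at the current position) that
    also occurs contiguously in `source`. Matches shorter than `min_len`
    (an assumed threshold) are left as O."""
    tags = ["O"] * len(target)
    i = 0
    while i < len(target):
        best = 0
        for s in range(len(source)):
            k = 0
            while (i + k < len(target) and s + k < len(source)
                   and target[i + k] == source[s + k]):
                k += 1
            best = max(best, k)
        if best >= min_len:
            tags[i] = "B"                      # start of a copied fragment
            for j in range(i + 1, i + best):
                tags[j] = "I"                  # continuation of the fragment
            i += best
        else:
            i += 1                             # token is generated, not copied
    return tags


def allowed_next(source: List[str], fragment: List[str], tag: str) -> Optional[Set[str]]:
    """Candidate set for the next token given its predicted tag.
    None means no restriction (tag O: the full vocabulary is used)."""
    if tag == "O":
        return None
    if tag == "B":
        return set(source)                     # any source token may start a copy
    # tag == "I": only tokens that follow the current fragment somewhere in source
    n = len(fragment)
    return {source[s + n] for s in range(len(source) - n)
            if source[s:s + n] == fragment}


def mask_logits(logits: Dict[str, float], allowed: Optional[Set[str]]) -> Dict[str, float]:
    """Set scores of tokens outside the candidate set to -inf."""
    if allowed is None:
        return dict(logits)
    return {tok: (v if tok in allowed else float("-inf")) for tok, v in logits.items()}


if __name__ == "__main__":
    src = list("今天天气很好我们去公园")
    tgt = list("天气很好去公园")
    print(bio_tags(src, tgt))                  # ['B', 'I', 'I', 'I', 'B', 'I', 'I']
    print(allowed_next(src, list("天气"), "I"))  # {'很'}
```

With the I tag, the candidate set shrinks from the whole vocabulary to the handful of tokens that can extend the current fragment. This is the sense in which a tag-guided copy keeps copied segments contiguous.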