Automatic document metadata extraction using support vector machines
Header
DOI: 10.5555/827140.827146
Publication Date: 2003-05-27
AUTHORS (6)
ABSTRACT
Automatic metadata generation provides scalability and usability for digital libraries and their collections. Machine learning methods offer robust and adaptable automatic metadata extraction. We describe a support vector machine classification-based method for metadata extraction from the header part of research papers and show that it outperforms other machine learning methods on the same task. The method first classifies each line of the header into one or more of 15 classes. An iterative convergence procedure is then used to improve the line classification by using the predicted class labels of its neighbor lines in the previous round. Further metadata extraction is done by seeking the best chunk boundaries of each line. We found that discovery and use of the structural patterns of the data and domain-based word clustering can improve the metadata extraction performance. An appropriate feature normalization also greatly improves the classification performance. Our metadata extraction method was originally designed to improve the metadata extraction quality of the digital libraries Citeseer [S. Lawrence et al., (1999)] and EbizSearch [Y. Petinot et al., (2003)]. We believe it can be generalized to other digital libraries.
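The iterative convergence step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the label names, and the sample header lines are all hypothetical, and a small rule-based function stands in for the trained support vector machine classifiers. What the sketch does show is the control flow: each round relabels every line using the labels its neighbor lines received in the previous round, and the loop stops once the labels no longer change.

```python
from typing import Callable, List, Sequence, Set

# One set of class labels per header line (a line may carry several classes).
Labels = List[Set[str]]


def iterate_line_labels(
    lines: Sequence[str],
    classify: Callable[[str, Set[str], Set[str]], Set[str]],
    max_rounds: int = 10,
) -> Labels:
    """Relabel lines using neighbor labels from the previous round until stable."""
    # Round 0: independent classification, no neighbor context available yet.
    labels: Labels = [classify(line, set(), set()) for line in lines]
    for _ in range(max_rounds):
        new_labels: Labels = []
        for i, line in enumerate(lines):
            prev_ctx = labels[i - 1] if i > 0 else set()
            next_ctx = labels[i + 1] if i + 1 < len(lines) else set()
            new_labels.append(classify(line, prev_ctx, next_ctx))
        if new_labels == labels:  # converged: no label changed this round
            break
        labels = new_labels
    return labels


def toy_classify(line: str, prev_labels: Set[str], next_labels: Set[str]) -> Set[str]:
    """Hypothetical stand-in for the SVM classifiers (assumption, not the paper's model)."""
    text = line.lower()
    if "@" in text:
        return {"email"}
    if "university" in text or "department" in text:
        return {"affiliation"}
    # Contextual rule: a line right after a title is treated as an author line.
    if "title" in prev_labels:
        return {"author"}
    return {"title"}


# Made-up header lines, used only to exercise the loop.
HEADER = [
    "A Sample Paper Title",
    "Jane Doe",
    "Department of Examples, Example University",
    "jane.doe@example.org",
]

result = iterate_line_labels(HEADER, toy_classify)
```

In the first round the second line has no context and falls through to the default label; once its predecessor is known to be a title, the second round corrects it, and a third round confirms that nothing changes. A real system would replace `toy_classify` with one trained classifier per class, taking the neighbor labels as additional input features.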