A comparison of statistical association measures for identifying dependency-based collocations in various languages.
Association (psychology)
Adjective
Word Association
Parallel corpora
DOI: 10.18653/v1/w19-5107
Publication Date: 2019-09-12T18:39:56Z
AUTHORS (3)
ABSTRACT
This paper presents an exploration of different statistical association measures to automatically identify collocations from corpora in English, Portuguese, and Spanish. To evaluate the impact of the association metrics, we manually annotated corpora with three syntactic patterns of collocations (adjective-noun, verb-object, and nominal compounds). We took advantage of the PARSEME 1.1 Shared Task corpora by selecting a subset of 155k tokens in the three referred languages, in which we annotated 1,526 collocations with their corresponding Lexical Functions according to the Meaning-Text Theory. Using the resulting gold standard, we have carried out a comparison between frequency data and several well-known association measures, both symmetric and asymmetric. The results show that the combination of dependency triples with raw frequency information is as powerful as the best association measures in most syntactic patterns and languages. Furthermore, despite the asymmetric behaviour of collocations, directional approaches perform worse than the symmetric ones in the extraction of these phraseological combinations.
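The comparison described in the abstract can be illustrated with a minimal sketch that scores dependency triples by raw frequency, by one symmetric measure, and by one directional measure. The abstract does not name the measures it evaluates, so pointwise mutual information (PMI) and Delta P are used here only as familiar examples of each kind. The function name `association_scores`, the triple layout `(head, relation, dependent)`, and the toy counts are all assumptions made for illustration, not data or code from the paper.

```python
import math
from collections import Counter


def association_scores(triples):
    """Score (head, relation, dependent) triples.

    Hypothetical helper: computes raw frequency, PMI (symmetric), and
    Delta P in both directions (asymmetric), with counts taken per relation.
    """
    triple_counts = Counter(triples)
    head_counts = Counter((h, r) for h, r, _ in triples)
    dep_counts = Counter((r, d) for _, r, d in triples)
    rel_counts = Counter(r for _, r, _ in triples)

    scores = {}
    for (h, r, d), joint in triple_counts.items():
        n = rel_counts[r]              # all triples with this relation
        f_head = head_counts[(h, r)]   # triples with this head
        f_dep = dep_counts[(r, d)]     # triples with this dependent

        # PMI: log2 of observed over expected co-occurrence (symmetric).
        pmi = math.log2(joint * n / (f_head * f_dep))

        # Delta P(dep | head) = P(dep | head) - P(dep | other heads).
        other_heads = n - f_head
        dp_dep = joint / f_head - (
            (f_dep - joint) / other_heads if other_heads else 0.0
        )

        # Delta P(head | dep) = P(head | dep) - P(head | other dependents).
        other_deps = n - f_dep
        dp_head = joint / f_dep - (
            (f_head - joint) / other_deps if other_deps else 0.0
        )

        scores[(h, r, d)] = {
            "freq": joint,
            "pmi": pmi,
            "delta_p_dep_given_head": dp_dep,
            "delta_p_head_given_dep": dp_head,
        }
    return scores


# Toy adjective-noun triples, invented for this example.
toy_triples = (
    [("tea", "amod", "strong")] * 3
    + [("tea", "amod", "hot")]
    + [("coffee", "amod", "hot")] * 2
    + [("rain", "amod", "heavy")] * 2
)

scores = association_scores(toy_triples)
for triple, values in sorted(scores.items(), key=lambda kv: -kv[1]["freq"]):
    print(triple, values)
```

Ranking candidates by `freq` and by each measure, then comparing the rankings against an annotated gold standard, would mirror the kind of evaluation the abstract reports. The two Delta P values generally differ for the same triple, which is what makes the measure directional.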
REFERENCES (0)
CITATIONS (5)