A comparison of statistical association measures for identifying dependency-based collocations in various languages.

Association (psychology) Adjective Word Association Parallel corpora
DOI: 10.18653/v1/w19-5107 Publication Date: 2019-09-12T18:39:56Z
ABSTRACT
This paper presents an exploration of different statistical association measures to automatically identify collocations from corpora in English, Portuguese, and Spanish. To evaluate the impact of the association metrics, we manually annotated corpora with three syntactic patterns of collocations (adjective-noun, verb-object, and nominal compounds). We took advantage of the PARSEME 1.1 Shared Task corpora by selecting a subset of 155k tokens in the three referred languages, in which we annotated 1,526 collocations with their corresponding Lexical Functions according to the Meaning-Text Theory. Using the resulting gold standard, we have carried out a comparison between frequency data and several well-known association measures, both symmetric and asymmetric. The results show that the combination of dependency triples with raw frequency information is as powerful as the best association measures in most syntactic patterns and languages. Furthermore, despite the asymmetric behaviour of collocations, directional approaches perform worse than the symmetric ones in the extraction of these phraseological combinations.
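To make the comparison concrete, the following is a minimal sketch of how raw frequency, one symmetric measure, and one asymmetric measure could be computed over head-dependent pairs taken from dependency triples of a single syntactic pattern. The abstract does not name the measures that were compared, so pointwise mutual information (PMI) and Delta P are used here only as illustrative assumptions. The function name `score_pairs` and the toy pairs are hypothetical and are not taken from the paper's data.

```python
import math
from collections import Counter


def score_pairs(pairs):
    """Score (head, dependent) pairs from one syntactic pattern.

    Returns, per pair: raw frequency, PMI (symmetric), and Delta P in
    both directions (asymmetric). Illustrative sketch only.
    """
    pair_freq = Counter(pairs)
    head_freq = Counter(head for head, _ in pairs)
    dep_freq = Counter(dep for _, dep in pairs)
    n = len(pairs)

    scores = {}
    for (head, dep), f in pair_freq.items():
        # PMI: log2 of observed joint probability over expected under independence.
        pmi = math.log2((f / n) / ((head_freq[head] / n) * (dep_freq[dep] / n)))

        # Delta P(dep | head) = P(dep | head) - P(dep | not head)
        rest_heads = n - head_freq[head]
        dp_dep = f / head_freq[head] - (
            (dep_freq[dep] - f) / rest_heads if rest_heads else 0.0
        )

        # Delta P(head | dep) = P(head | dep) - P(head | not dep)
        rest_deps = n - dep_freq[dep]
        dp_head = f / dep_freq[dep] - (
            (head_freq[head] - f) / rest_deps if rest_deps else 0.0
        )

        scores[(head, dep)] = {
            "freq": f,
            "pmi": pmi,
            "delta_p_dep_given_head": dp_dep,
            "delta_p_head_given_dep": dp_head,
        }
    return scores


# Hypothetical adjective-noun pairs, written as (noun head, adjective dependent).
toy_pairs = (
    [("tea", "strong")] * 3
    + [("tea", "hot")]
    + [("rain", "heavy")] * 2
    + [("coffee", "hot")] * 2
)
scores = score_pairs(toy_pairs)

# Rank candidates by raw frequency, then by PMI, to compare the two orderings.
by_freq = sorted(scores, key=lambda p: scores[p]["freq"], reverse=True)
by_pmi = sorted(scores, key=lambda p: scores[p]["pmi"], reverse=True)
```

Ranking the same candidate list under each score and checking it against a gold standard is one way such measures can be compared. The two Delta P values differ for the same pair, which is what makes the measure directional.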