NFDI4DS | UHH-SEMS - Publication Details

Marcos García

ORCID: 0000-0002-6557-0210

About

Contact & Profiles

A5035151979

Research Areas

Natural Language Processing Techniques
Topic Modeling
Semantic Web and Ontologies
Text Readability and Simplification
linguistics and terminology studies
Lexicography and Language Studies
Tracheal and airway disorders
Second Language Acquisition and Learning
Speech and dialogue systems
Spanish Linguistics and Language Studies
Sentiment Analysis and Opinion Mining
Cancer Immunotherapy and Biomarkers
Translation Studies and Practices
Cancer Genomics and Diagnostics
Galician and Iberian cultural studies
Basque language and culture studies
Linguistic Studies and Language Acquisition
Language, Metaphor, and Cognition
Language and cultural evolution
Linguistic Variation and Morphology
Sports and Physical Education Studies
Authorship Attribution and Profiling
Web Data Mining and Analysis
Mathematics, Computing, and Information Processing
Interpreting and Communication in Healthcare

Universidade de Santiago de Compostela
2012-2024

Center for Research in Molecular Medicine and Chronic Diseases
2015-2024

University of Alicante
2022

Universidad Rey Juan Carlos
2017-2021

Universidade Federal do Rio Grande do Sul
2021

University of Sheffield
2021

Universidade da Coruña
2016-2020

Secretaria da Educação do Estado da Bahia
2020

San Antonio College
2020

Universitat Politècnica de Catalunya
2006-2019

Citius: A Naive-Bayes Strategy for Sentiment Analysis on English Tweets

OPENALEX - Publications

Pablo Gamallo, Marcos García

This article describes a strategy based on a naive-Bayes classifier for detecting the polarity of English tweets. The experiments have shown that the best performance is achieved by using a binary classifier between just two sharp polarity categories: positive and negative. In addition, in order to detect tweets with and without polarity, the system makes use of a very basic rule that searches for polarity words within the analysed tweets/texts. When the classifier is provided with a polarity lexicon and multiwords, it achieves 63% F-score.

10.3115/v1/s14-2026 article EN cc-by Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014) 2014-01-01

SemEval-2022 Task 2: Multilingual Idiomaticity Detection and Sentence Embedding

Harish Tayyar Madabushi, Edward Gow-Smith, Marcos García, Carolina Scarton, Marco Idiart, Aline Villavicencio

Harish Tayyar Madabushi, Edward Gow-Smith, Marcos Garcia, Carolina Scarton, Marco Idiart, Aline Villavicencio. Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022). 2022.

10.18653/v1/2022.semeval-1.13 article EN cc-by Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022) 2022-01-01

Probing for idiomaticity in vector space models

Marcos García, Tiago Kramer Vieira, Carolina Scarton, Marco Idiart, Aline Villavicencio

Marcos Garcia, Tiago Kramer Vieira, Carolina Scarton, Marco Idiart, Aline Villavicencio. Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. 2021.

10.18653/v1/2021.eacl-main.310 article EN cc-by 2021-01-01

A comparison of statistical association measures for identifying dependency-based collocations in various languages.

Marcos García, Marcos García Salido, Margarita Alonso Ramos

This paper presents an exploration of different statistical association measures to automatically identify collocations from corpora in English, Portuguese, and Spanish. To evaluate the impact of the association metrics, we manually annotated corpora with three syntactic patterns of collocations (adjective-noun, verb-object, and nominal compounds). We took advantage of the PARSEME 1.1 Shared Task corpora by selecting a subset of 155k tokens in the three referred languages, in which we annotated 1,526 collocations with their corresponding Lexical Functions according to the Meaning-Text Theory. Using the resulting...

10.18653/v1/w19-5107 article EN cc-by 2019-01-01

Acceso a la justicia y perspectiva de género en sectores sociales vulnerables del Valle de Uco, Mendoza

Rubén Alberto Ippoliti, Diego Olaiz, Marcos García, Dolores Godoy, Rocío Lorenzo

The Valle de Uco area in the Province of Mendoza comprises three departments: Tupungato, San Carlos, and Tunuyán. This article aims to analyze the access to justice available to the sectors we call "vulnerable", such as the elderly, young people, sexual minorities, people with scarce economic resources, informal workers, and women, among many others. In this last case, and given the advances in Argentine legislation, we believe that the gender perspective entails, in itself, a...

10.59872/icu.v8i12.534 article ES Investigación Ciencia y Universidad 2025-02-27

LinguaKit: A Big Data-Based Multilingual Tool for Linguistic Analysis and Information Extraction

Pablo Gamallo, Marcos García, César Piñeiro, Rodrigo Martínez-Castaño, Juan C. Pichel

This paper presents LinguaKit, a multilingual suite of tools for analysis, extraction, annotation, and linguistic correction, as well as its integration into a Big Data infrastructure. LinguaKit allows the user to perform different tasks such as PoS-tagging, syntactic parsing, and coreference resolution (among others), including applications for relation extraction, sentiment analysis, summarization, extraction of multiword expressions, or entity linking to DBpedia. Most modules work in four languages: Portuguese, Spanish, English,...

10.1109/snams.2018.8554689 article EN 2018-10-01

LinguaKit: uma ferramenta multilingue para a análise linguística e a extração de informação

Pablo Gamallo, Marcos García

This article presents LinguaKit, a multilingual suite of tools for linguistic analysis, extraction, annotation, and correction. LinguaKit supports tasks as diverse as lemmatization, part-of-speech tagging, and syntactic parsing (among others), and also includes applications for sentiment analysis (or opinion mining), multiword term extraction, conceptual annotation, and linking to encyclopedic resources such as DBpedia. Most modules work for four language varieties: Portuguese,...

10.21814/lm.9.1.243 article PT cc-by Linguamática 2017-06-28

Assessing the Representations of Idiomaticity in Vector Models with a Noun Compound Dataset Labeled at Type and Token Levels

Marcos García, Tiago Kramer Vieira, Carolina Scarton, Marco Idiart, Aline Villavicencio

Marcos Garcia, Tiago Kramer Vieira, Carolina Scarton, Marco Idiart, Aline Villavicencio. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.

10.18653/v1/2021.acl-long.212 article EN cc-by 2021-01-01

Investigating Idiomaticity in Word Representations

Wei He, Tiago Kramer Vieira, Marcos García, Carolina Scarton, Marco Idiart, and 1 more

Idiomatic expressions are an integral part of human languages, often used to express complex ideas in compressed or conventional ways (e.g. eager beaver as a keen and enthusiastic person). However, their interpretations may not be straightforwardly linked to the meanings of their individual components in isolation, and this may have an impact for compositional approaches. In this paper, we investigate to what extent word representation models are able to go beyond compositional word combinations and capture multiword expression idiomaticity and some...

10.1162/coli_a_00546 article EN cc-by-nc-nd Computational Linguistics 2024-11-19

Using bilingual word-embeddings for multilingual collocation extraction

Marcos García, Marcos García Salido, Margarita Alonso Ramos

This paper presents a new strategy for multilingual collocation extraction which takes advantage of parallel corpora to learn bilingual word-embeddings. Monolingual collocation candidates are retrieved using Universal Dependencies, while the distributional models are then applied to search for equivalents of the elements of each collocation in the target languages. The proposed method extracts not only collocation equivalents with direct translation between languages, but also other cases where the collocations in the two languages are not literal translations of each other. Several...

10.18653/v1/w17-1703 article EN cc-by 2017-01-01

New treebank or repurposed? On the feasibility of cross-lingual parsing of Romance languages with Universal Dependencies

Marcos García, Carlos Gómez‐Rodríguez, Miguel Á. Alonso

This paper addresses the feasibility of cross-lingual parsing with Universal Dependencies (UD) between Romance languages, analyzing its performance when compared to the use of manually annotated resources of the target languages. Several experiments take into account factors such as the lexical distance between the source and target varieties, the impact of delexicalization, the combination of different source treebanks, or the adaptation of resources to the target language, among others. The results of these evaluations show that the direct application of a parser from one...

10.1017/s1351324917000377 article EN Natural Language Engineering 2017-10-06
