Grzegorz Chrupała

ORCID: 0000-0001-9498-6912
Research Areas
  • Natural Language Processing Techniques
  • Topic Modeling
  • Multimodal Machine Learning Applications
  • Speech and Dialogue Systems
  • Speech Recognition and Synthesis
  • Authorship Attribution and Profiling
  • Hate Speech and Cyberbullying Detection
  • Domain Adaptation and Few-Shot Learning
  • Subtitles and Audiovisual Media
  • Text Readability and Simplification
  • Neural Networks and Applications
  • Software Engineering Research
  • Explainable Artificial Intelligence (XAI)
  • Semantic Web and Ontologies
  • Advanced Image and Video Retrieval Techniques
  • Language Development and Disorders
  • Text and Document Classification Technologies
  • Advanced Text Analysis Techniques
  • Advanced Research in Systems and Signal Processing
  • Language, Metaphor, and Cognition
  • Web Data Mining and Analysis
  • Spam and Phishing Detection
  • Handwritten Text Recognition Techniques
  • Algorithms and Data Compression
  • Mobile Crowdsensing and Crowdsourcing

Tilburg University
2014-2023

Institute for Cognitive Science Studies
2018-2021

University of Antwerp
2021

University of Copenhagen
2018

Microsoft Research (United Kingdom)
2018

Microsoft Research Montréal (Canada)
2018

Dublin City University
2006-2015

Saarland University
2010-2012

Japan External Trade Organization
2005

Universitat Rovira i Virgili
2003

We present novel methods for analyzing the activation patterns of recurrent neural networks from a linguistic point of view and explore the types of linguistic structure they learn. As a case study, we use a standard standalone language model and a multi-task gated recurrent network architecture consisting of two parallel pathways with shared word embeddings: the Visual pathway is trained on predicting the representations of the visual scene corresponding to an input sentence, while the Textual pathway is trained to predict the next word in the same sentence. We propose a method for estimating...

10.1162/coli_a_00300 article EN cc-by-nc-nd Computational Linguistics 2017-09-11

We present a visually grounded model of speech perception which projects spoken utterances and images to a joint semantic space. We use a multi-layer recurrent highway network to model the temporal nature of speech, and show that it learns to extract both form- and meaning-based linguistic knowledge from the input signal. We carry out an in-depth analysis of the representations used by different components of the trained model and show that encoding of semantic aspects tends to become richer as we go up the hierarchy of layers, whereas encoding of form-related aspects of language initially increases and then...

10.18653/v1/p17-1057 article EN cc-by Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2017-01-01

Analysis methods which enable us to better understand the representations and functioning of neural models of language are increasingly needed as deep learning becomes the dominant approach in NLP. Here we present two methods based on Representational Similarity Analysis (RSA) and Tree Kernels (TK) which allow us to directly quantify how strongly the information encoded in neural activation patterns corresponds to information represented by symbolic structures such as syntax trees. We first validate our methods on the case of a simple synthetic language for arithmetic expressions with...

10.18653/v1/p19-1283 preprint EN cc-by 2019-01-01
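The core RSA comparison described above can be sketched in a few lines: build a pairwise-similarity matrix for each representation space, then correlate their upper triangles (second-order similarity). This is a minimal pure-Python sketch; the function names and toy vectors are illustrative, not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def similarity_matrix(reps):
    """Pairwise cosine similarities for a list of representation vectors."""
    n = len(reps)
    return [[cosine(reps[i], reps[j]) for j in range(n)] for i in range(n)]

def upper_triangle(m):
    """Flatten the strict upper triangle of a square matrix."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rsa_score(reps_a, reps_b):
    """Second-order similarity: correlate the two similarity structures."""
    sa = upper_triangle(similarity_matrix(reps_a))
    sb = upper_triangle(similarity_matrix(reps_b))
    return pearson(sa, sb)

# Toy example: "activations" vs. symbolic feature vectors for 4 stimuli.
activations = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
features = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
score = rsa_score(activations, features)
```

A high score means the two spaces group the same stimuli together, even though the individual dimensions are not comparable; this is what makes RSA usable across activations and symbolic structures alike.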

Tweets often contain a large proportion of abbreviations, alternative spellings, novel words and other non-canonical language. These features are problematic for standard language analysis tools, and it can be desirable to convert them to canonical form. We propose a text normalization model based on learning edit operations from labeled data while incorporating features induced from unlabeled data via character-level neural text embeddings. The embeddings are generated using a Simple Recurrent Network. We find that enriching the...

10.3115/v1/p14-2111 article EN cc-by 2014-01-01
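One way to obtain the kind of labeled character-level edit operations such a model learns from is a diff between a noisy token and its canonical form. The sketch below uses Python's stdlib `difflib`; the example spellings and function name are illustrative, not the paper's actual pipeline.

```python
import difflib

def edit_operations(noisy, canonical):
    """Character-level edit operations turning a noisy token into its
    canonical form, as (tag, source_span, target_span) triples."""
    sm = difflib.SequenceMatcher(a=noisy, b=canonical)
    return [(tag, noisy[i1:i2], canonical[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes()
            if tag != "equal"]  # keep only the actual edits

# A common social-media spelling paired with its canonical form.
ops = edit_operations("2moro", "tomorrow")
```

Applying the `replace`/`insert`/`delete` opcodes in order reconstructs the canonical string, so these triples can serve directly as supervision for an edit-based normalizer.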

We study the representation and encoding of phonemes in a recurrent neural network model of grounded speech. We use a model which processes images and their spoken descriptions, and projects the visual and auditory representations into the same semantic space. We perform a number of analyses on how information about individual phonemes is encoded in the MFCC features extracted from the speech signal, and in the activations of the layers of the model. Via experiments with phoneme decoding and phoneme discrimination we show that phoneme representations are most salient in the lower layers of the model, where low-level signals...

10.18653/v1/k17-1037 preprint EN cc-by 2017-01-01
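Phoneme decoding of the kind described above is typically a supervised probe: a simple classifier is trained to predict phoneme labels from activation vectors, and its accuracy indicates how recoverable the information is at a given layer. A minimal sketch with a multiclass perceptron; the "activation vectors" and phoneme labels here are synthetic, made up purely for illustration.

```python
import random

def train_perceptron(data, epochs=20, lr=0.1):
    """Multiclass perceptron: one weight vector per label, error-driven updates."""
    labels = sorted({y for _, y in data})
    dim = len(data[0][0])
    w = {y: [0.0] * dim for y in labels}
    for _ in range(epochs):
        for x, y in data:
            pred = max(labels, key=lambda l: sum(a * b for a, b in zip(w[l], x)))
            if pred != y:
                w[y] = [a + lr * b for a, b in zip(w[y], x)]
                w[pred] = [a - lr * b for a, b in zip(w[pred], x)]
    return w, labels

def predict(w, labels, x):
    return max(labels, key=lambda l: sum(a * b for a, b in zip(w[l], x)))

# Synthetic, well-separated "activation vectors" for two phoneme classes.
random.seed(0)
data = ([([random.gauss(1, 0.2), random.gauss(0, 0.2)], "/a/") for _ in range(20)]
        + [([random.gauss(0, 0.2), random.gauss(1, 0.2)], "/i/") for _ in range(20)])
w, labels = train_perceptron(data)
acc = sum(predict(w, labels, x) == y for x, y in data) / len(data)
```

In a real probing study the classifier would be trained and evaluated on held-out data, and probe accuracy compared across layers rather than read off in absolute terms.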

Abstract The Empirical Methods in Natural Language Processing (EMNLP) 2018 workshop BlackboxNLP was dedicated to resources and techniques specifically developed for analyzing and understanding the inner workings and representations acquired by neural models of language. Approaches included: systematic manipulation of input to networks and investigating the impact on their performance, testing whether interpretable knowledge can be decoded from intermediate representations of networks, and proposing modifications to network architectures...

10.1017/s135132491900024x article EN Natural Language Engineering 2019-07-01

This survey provides an overview of the evolution of visually grounded models of spoken language over the last 20 years. Such models are inspired by the observation that when children pick up a language, they rely on a wide range of indirect and noisy clues, crucially including signals from the visual modality co-occurring with spoken utterances. Several fields have made important contributions to this approach to modeling or mimicking the process of learning language: Machine Learning, Natural Language and Speech Processing, Computer Vision...

10.1613/jair.1.12967 article EN cc-by Journal of Artificial Intelligence Research 2022-02-18

Community Question Answering (CQA) websites offer a new opportunity for users to provide, search and share knowledge. Although the idea of receiving a direct, targeted response to a question sounds very attractive, the quality of the question itself can have an important effect on the likelihood of getting useful answers. High-quality questions improve the CQA experience, and it is therefore essential for forums to better understand what characterizes questions that are more appealing to the forum community. In this survey, we review existing research in CQA websites....

10.1145/2830544.2830547 article EN ACM SIGKDD Explorations Newsletter 2015-09-29

Grzegorz Chrupała, Ákos Kádár, Afra Alishahi. Proceedings of the 53rd Annual Meeting Association for Computational Linguistics and 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). 2015.

10.3115/v1/p15-2019 article EN 2015-01-01

Sign language machine translation (SLMT)—the task of automatically translating between sign and spoken languages or between sign languages—is a complex task within the field of NLP. Its multi-modal and non-linear nature requires the joint efforts of sign language (SL) linguists, technical experts, and SL users. Effective user involvement is a challenge that can be addressed through co-creation. Co-creation has been formally defined in many fields, e.g., business, marketing, education, and others; however, in NLP, and in particular SLMT, there is no formal,...

10.3390/info16040290 article EN cc-by Information 2025-04-04

Tokenization is widely regarded as a solved problem due to the high accuracy that rule-based tokenizers achieve. But rule-based tokenizers are hard to maintain and their rules are language-specific. We show that high-accuracy word and sentence segmentation can be achieved by using supervised sequence labeling on the character level combined with unsupervised feature learning. We evaluated our method on three languages and obtained error rates of 0.27 ‰ (English), 0.35 ‰ (Dutch) and 0.76 ‰ (Italian) for our best models.

10.18653/v1/d13-1146 article EN Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing 2013-01-01
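Casting segmentation as character-level sequence labeling means every character gets a boundary tag that a tagger then learns to predict. A sketch of deriving such supervision from gold token spans; the B/I/O tag scheme used here is an assumption for illustration, not necessarily the paper's exact label set.

```python
def char_labels(text, token_spans):
    """Per-character boundary tags: 'B' at a token's first character,
    'I' inside a token, 'O' outside any token (e.g. whitespace)."""
    labels = ["O"] * len(text)
    for start, end in token_spans:
        labels[start] = "B"
        for i in range(start + 1, end):
            labels[i] = "I"
    return labels

# "Don't stop."  tokenized as  Don't | stop | .
text = "Don't stop."
spans = [(0, 5), (6, 10), (10, 11)]
tags = char_labels(text, spans)
```

Note how the apostrophe inside "Don't" gets an `I` tag while the final period gets its own `B`: exactly the kind of decision that language-specific tokenizer rules hard-code and a learned character tagger picks up from data.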

A widespread approach to processing spoken language is to first automatically transcribe it into text. An alternative is to use an end-to-end approach: recent works have proposed to learn semantic embeddings of spoken language from images with spoken captions, without an intermediate transcription step. We propose to use multitask learning to exploit existing transcribed speech within the end-to-end setting. We describe a three-task architecture which combines the objectives of matching spoken captions with corresponding images, speech with text, and text with images. We show that the addition...

10.18653/v1/p19-1647 preprint EN cc-by 2019-01-01

This paper describes the DCU-UVT team's participation in the Language Identification in Code-Switched Data shared task at the Workshop on Computational Approaches to Code Switching. Word-level classification experiments were carried out using a simple dictionary-based method, linear-kernel support vector machines (SVMs) with and without contextual clues, and a k-nearest neighbour approach. Based on these experiments, we select our SVM-based system with contextual clues as the final system and present results for the Nepali-English and Spanish-English datasets.

10.3115/v1/w14-3915 article EN cc-by 2014-01-01

Self-attention weights and their transformed variants have been the main source of information for analyzing token-to-token interactions in Transformer-based models. But despite their ease of interpretation, these weights are not faithful to the models' decisions, as they are only one part of an encoder, and other components of the encoder layer can have a considerable impact on information mixing in the output representations. In this work, by expanding the scope of analysis to the whole encoder block, we propose Value Zeroing, a novel context mixing score customized for Transformers...

10.18653/v1/2023.eacl-main.245 article EN cc-by 2023-01-01
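The intuition behind Value Zeroing can be illustrated on a toy single-head attention layer: zero out one token's value vector, recompute the outputs, and measure how much each output representation changes. This is a deliberately simplified sketch (pure Python, Euclidean distance, no residual or feed-forward components), not the paper's full method, which operates over the whole encoder block.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(queries, keys, values, zero_value=None):
    """Toy single-head attention; optionally zero one token's value vector."""
    vals = [([0.0] * len(v) if i == zero_value else v)
            for i, v in enumerate(values)]
    d = len(keys[0])
    out = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                           for k in keys])
        out.append([sum(w * v[j] for w, v in zip(weights, vals))
                    for j in range(len(vals[0]))])
    return out

def value_zeroing_scores(queries, keys, values):
    """scores[i][t]: how much output t changes when token i's value is zeroed."""
    base = attention(queries, keys, values)
    return [[math.dist(b, a) for b, a in
             zip(base, attention(queries, keys, values, zero_value=i))]
            for i in range(len(values))]

# Tiny hand-picked q/k/v vectors for three tokens.
q = k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
scores = value_zeroing_scores(q, k, v)
```

The point of the ablation view is that a token can receive high attention weight yet contribute little to the output (here, token 2 carries a zero value vector, so zeroing it changes nothing); attention weights alone cannot reveal that.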

The majority of research on extracting missing user attributes from social media profiles uses costly hand-annotated labels for supervised learning. Distantly supervised methods exist, although these generally rely on knowledge gathered using external sources. This paper demonstrates the effectiveness of gathering distant labels for self-reported gender on Twitter using simple queries. We confirm the reliability of this query heuristic by comparing it with manual annotation. Moreover, using distant supervision, we demonstrate a competitive model...

10.18653/v1/w17-4407 article EN cc-by 2017-01-01

Angeliki Lazaridou, Grzegorz Chrupała, Raquel Fernández, Marco Baroni. Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2016.

10.18653/v1/n16-1043 article EN cc-by Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2016-01-01

Given the fast development of analysis techniques for NLP and speech processing systems, few systematic studies have been conducted to compare the strengths and weaknesses of each method. As a step in this direction, we study the case of representations of phonology in neural network models of spoken language. We use two commonly applied analytical techniques, diagnostic classifiers and representational similarity analysis, to quantify to what extent neural activation patterns encode phonemes and phoneme sequences. We manipulate factors that...

10.18653/v1/2020.acl-main.381 preprint EN cc-by 2020-01-01

Abstract Recent computational models of the acquisition of spoken language via grounding in perception exploit associations between the spoken and visual modalities to learn to represent speech and visual data in a joint vector space. A major unresolved issue, from the point of view of ecological validity, is the training data, typically consisting of images or videos paired with spoken descriptions of what is depicted. Such a setup guarantees an unrealistically strong correlation between speech and the visual data. In the real world the coupling between the linguistic and the visual modality is loose, and often confounded by...

10.1162/tacl_a_00498 article EN cc-by Transactions of the Association for Computational Linguistics 2022-01-01

Learning word representations has recently seen much success in computational linguistics. However, assuming sequences of word tokens as input to linguistic analysis is often unjustified. For many languages word segmentation is a non-trivial task and naturally occurring text is sometimes a mixture of natural language strings and other character data. We propose to learn text representations directly from raw character strings by training a Simple Recurrent Network to predict the next character in the text. The network uses its hidden layer to evolve abstract representations of the text it sees. To demonstrate...

10.48550/arxiv.1309.4628 preprint EN other-oa arXiv (Cornell University) 2013-01-01
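The forward dynamics of such a Simple Recurrent (Elman) Network are easy to sketch: the hidden state at each step is a nonlinear function of the current character's input weights plus a recurrent transformation of the previous hidden state. The sketch below uses fixed random weights just to show how the state evolves over a string; there is no training or next-character softmax, and all sizes and names are illustrative.

```python
import math
import random

def srn_states(text, hidden_size=8, seed=0):
    """Forward pass of a Simple Recurrent (Elman) Network over characters,
    with fixed random weights: returns the hidden state after each character."""
    rng = random.Random(seed)
    vocab = sorted(set(text))
    # Random input weights per character and recurrent weight matrix (untrained).
    W_in = {c: [rng.uniform(-0.5, 0.5) for _ in range(hidden_size)] for c in vocab}
    W_rec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)]
             for _ in range(hidden_size)]
    h = [0.0] * hidden_size
    states = []
    for c in text:
        # h_t = tanh(W_in[c] + W_rec @ h_{t-1})
        h = [math.tanh(W_in[c][j]
                       + sum(W_rec[j][k] * h[k] for k in range(hidden_size)))
             for j in range(hidden_size)]
        states.append(h)
    return states

states = srn_states("banana")
```

Because the hidden state carries the whole history, the same character produces different states in different contexts; after training with a next-character objective, that history-dependent state is what serves as the text representation.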

Recent work has shown how to learn better visual-semantic embeddings by leveraging image descriptions in more than one language. Here, we investigate in detail which conditions affect the performance of this type of grounded language learning model. We show that multilingual training improves over bilingual training, and that low-resource languages benefit from training with higher-resource languages. We demonstrate that a model can be trained equally well on either translations or comparable sentence pairs, and that annotating...

10.18653/v1/k18-1039 article EN cc-by 2018-01-01