- Natural Language Processing Techniques
- Geographic Information Systems Studies
- Topic Modeling
- Data Management and Algorithms
- Semantic Web and Ontologies
- Web Data Mining and Analysis
- Multimodal Machine Learning Applications
- Advanced Text Analysis Techniques
- Advanced Image and Video Retrieval Techniques
- Human Mobility and Location-Based Analysis
- Sentiment Analysis and Opinion Mining
- Data-Driven Disease Surveillance
- Information Retrieval and Search Behavior
- Biomedical Text Mining and Ontologies
- Data Quality and Management
- Expert finding and Q&A systems
- Data Visualization and Analytics
- Domain Adaptation and Few-Shot Learning
- Automated Road and Building Extraction
- Motivation and Self-Concept in Sports
- Recommender Systems and Techniques
- Mobile Crowdsensing and Crowdsourcing
- Text and Document Classification Technologies
- Speech and dialogue systems
- Context-Aware Activity Recognition Systems
University of Lisbon
2016-2025
Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento
2016-2025
Instituto Superior Técnico
2012-2024
Artificial Intelligence in Medicine (Canada)
2024
Instituto Politécnico de Lisboa
2007-2023
Institute for Systems Engineering and Computers
2015-2023
Universidade Estadual Paulista (Unesp)
2022-2023
University of Copenhagen
2021-2023
Universidade Federal do Pará
2023
Universitat de les Illes Balears
2020-2022
The analysis of human location histories is currently receiving increasing attention, due to the widespread usage of geopositioning technologies such as GPS, and also of online location-based services that allow users to share this information. Tasks such as the prediction of human movement can be addressed through these data, in turn offering support for more advanced applications, such as adaptive mobile services with proactive context-based functions. This paper presents a hybrid method for predicting human mobility on the basis of Hidden Markov Models...
Recent advances in image captioning have focused on scaling the data and model size, substantially increasing the cost of pretraining and finetuning. As an alternative to large models, we present Smallcap, which generates a caption conditioned on an input image and related captions retrieved from a datastore. Our model is lightweight and fast to train, as the only learned parameters are in newly introduced cross-attention layers between a pre-trained CLIP encoder and GPT-2 decoder. Smallcap can transfer to new domains without additional...
This survey article describes previous research addressing text-based document geocoding, i.e. the task of predicting the geospatial coordinates of latitude and longitude that best correspond to an entire document, based on its textual contents. We describe (1) early document geocoding systems that use heuristics over place names mentioned in the text (e.g. cities and states), (2) probabilistic language modeling approaches, where generative models are built for different regions of the world (usually considering a...
Image captioning and cross-modal retrieval are examples of tasks that involve the joint analysis of visual and linguistic information. In connection to remote sensing imagery, these tasks can help non-expert users in extracting relevant Earth observation information for a variety of applications. Still, despite some previous efforts, the development and application of vision and language models to the remote sensing domain have been hindered by the relatively small size of the available datasets used in previous studies. In this work, we propose RS-CapRet, a Vision...
Toponym matching, i.e. pairing strings that represent the same real-world location, is a fundamental problem for several practical applications. The current state-of-the-art relies on string similarity metrics, either specifically developed for matching place names or integrated within methods that combine multiple metrics. However, these methods all rely on common sub-strings in order to establish similarity, and they do not effectively capture the character replacements involved in toponym changes due...
Inspired by retrieval-augmented language generation and pretrained Vision and Language (V&L) encoders, we present a new approach to image captioning that generates sentences given the input image and a set of captions retrieved from a datastore, as opposed to the image alone. The encoder in our model jointly processes the image and retrieved captions using a pretrained V&L BERT, while the decoder attends to the multimodal encoder representations, benefiting from the extra textual evidence from the retrieved captions. Experimental results on the COCO dataset show that image captioning can be effectively formulated from this new perspective. Our...
This paper discusses the problem of automatically identifying the language of a given Web document. Previous experiments in language guessing focused on analyzing "coherent" text sentences, whereas this work was validated on texts from the Web, often presenting harder problems. Our language "guessing" software uses a well-known n-gram based algorithm, complemented with heuristics and a new similarity measure. Both fast and robust, the software has been in use for the past two years, as part of the crawler of a search engine. Experiments show that it achieves...
Semi-supervised bootstrapping techniques for relationship extraction from text iteratively expand a set of initial seed relationships while limiting the semantic drift. We research bootstrapping for relationship extraction using word embeddings to find similar relationships. Experimental results show that relying on word embeddings achieves better performance on the task of extracting four types of relationships from a collection of newswire documents when compared with a baseline using TF-IDF to find similar relationships.
The field of Spatial Humanities has advanced substantially in the past years. The identification and extraction of toponyms and spatial information mentioned in historical text collections has allowed its use in innovative ways, making possible the application of spatial analysis and the mapping of these places with Geographic Information Systems. For instance, automated place name identification is nowadays possible with Named Entity Recognition (NER) systems. Statistical NER methods based on supervised learning, in particular, are highly successful with modern datasets....
This paper addresses document indexing and retrieval using geographical location. It discusses possible indexing structures and result ranking algorithms, surveying known approaches and showing how they can be combined to build an effective Geo-IR system.
This paper describes an approach for resolving user identifiers in the context of social networks, using techniques from the area of duplicate record detection [1]. We reduce the identity resolution problem into a binary classification task, where the goal is to classify pairs of user identifiers as either belonging to the same person or not. The pairs are represented as feature vectors that combine multiple sources of similarity (e.g. similarity between profile information, descriptions of people's interests, and friend lists). We report on a thorough evaluation...
Expert finding is an information retrieval task that is concerned with the search for the most knowledgeable people with respect to a specific topic, and the search is based on documents that describe people's activities. The task involves taking a user query as input and returning a list of people who are sorted by their level of expertise with respect to the user query. Despite recent interest in the area, the current state-of-the-art techniques lack principled approaches for optimally combining different sources of evidence. This article proposes two frameworks for combining multiple estimators...
Several tasks related to geographical information retrieval and to the geographical information sciences involve toponym matching, that is, the problem of matching place names that share a common referent. In this article, we present the results of a wide-ranging evaluation on the performance of different string similarity metrics over the toponym matching task. We also report on experiments involving the usage of supervised machine learning for combining multiple similarity metrics, which has the natural advantage of avoiding the manual tuning of similarity thresholds. Experiments with a very large dataset...
We present the approach followed by INESC-ID in the SemEval 2015 Twitter Sentiment Analysis challenge, subtask E. The goal was to determine the strength of the association of Twitter terms with positive sentiment. Using two labeled lexicons, we trained a regression model to predict the sentiment polarity and intensity of words and phrases. Terms were represented as word embeddings induced in an unsupervised fashion from a corpus of tweets. Our system attained the top ranking submission, attesting the general adequacy of the proposed approach.
Remote sensing image captioning involves generating a concise textual description for an input aerial image. The task has received significant attention, and several recent proposals are based on neural encoder-decoder models. Most previous methods are trained to generate discrete outputs corresponding to word tokens that match the reference sentences word-by-word, thereby optimizing the generation locally at the token-level instead of globally at the sentence-level. This paper explores an alternative method...
This paper describes an approach for performing the recognition and resolution of place names mentioned over the descriptive metadata records of typical digital libraries. Our approach exploits the evidence provided by the existing structured attributes within the metadata records to support the place name recognition and resolution, in order to achieve better results than by just using lexical evidence from the textual values of these attributes. In metadata records, lexical evidence is very often insufficient for this task, since short sentences and simple expressions are predominant. Our implementation uses a...