- Personal Information Management and User Behavior
- Topic Modeling
- Data Quality and Management
- Context-Aware Activity Recognition Systems
- Artificial Intelligence in Law
- Sentiment Analysis and Opinion Mining
- Natural Language Processing Techniques
- Hate Speech and Cyberbullying Detection
- Computational and Text Analysis Methods
- Text and Document Classification Technologies
- Climate Change and Health Impacts
- Imbalanced Data Classification Techniques
- Cognitive Computing and Networks
- Business Process Modeling and Analysis
- Web Data Mining and Analysis
- Chemical and Physical Properties of Materials
- Advanced Text Analysis Techniques
- Information Retrieval and Search Behavior
- Opportunistic and Delay-Tolerant Networks
- Language, Metaphor, and Cognition
- Service-Oriented Architecture and Web Services
- FinTech, Crowdfunding, Digital Finance
UniBrasil Centro Universitário
2024
Universidade Federal do Amazonas
2022-2023
Rutgers, The State University of New Jersey
2014-2022
Universidade Federal Fluminense
2002
Social media platforms, vital for debate and communication, also grapple with misinformation and hateful comments. This work examines the detection of hate speech in Portuguese, contemplating its unique linguistic and cultural nuances. Leveraging Transformer-based models with different training and activation strategies, eight variations of architecture, size, and pre-training corpora are evaluated. Our findings show that, even though large generative models enhanced with prompts exhibited promising results, tuned small...
Digital storage now acts as an archive of the memories of users worldwide, keeping a record of data as well as the context in which it was acquired. The massive amount of data available and the fact that it is fragmented across many services (e.g., Facebook) and devices (e.g., laptop) make it very difficult for users to find specific pieces of information that they remember having stored or accessed. Unifying this information into a single data set that includes contextual information would allow much better indexing and searching of personal information. Thus, we have developed extraction...
A large number of personal digital traces is constantly generated or available online from a variety of sources, such as social media, calendars, purchase history, etc. These data are fragmented and highly heterogeneous, raising the need for an integrated view of the user's activities. Prior research in Personal Information Management focused mostly on creating a static model of the world (objects and their relationships). We argue that a dynamic model is also helpful in making sense of collections of related documents, and propose...
A significant challenge in the legal domain is to organize and summarize a constantly growing collection of documents, uncovering hidden topics, or themes, that later can support tasks such as case retrieval and judgment prediction. This massive amount of digital documents, combined with the inherent complexity of judiciary systems worldwide, presents a promising scenario for Machine Learning solutions, mainly those taking advantage of all the advancements in the area of Natural Language Processing (NLP). It is in this scenario that Jusbrasil, the largest...
Digital traces of our lives are now constantly produced by various connected devices, internet services and interactions. Our actions result in a multitude of heterogeneous data objects, or traces, kept in various locations in the cloud or on local devices. Users have very few tools to organize, understand, and search the digital traces they produce. We propose a simple but flexible data model to aggregate, organize, and find personal information within a collection of a user's personal digital traces. Our model uses as basic dimensions the six questions: what, when, where,...
In Brazil, some cases of hate speech can be qualified as a crime. However, identifying and categorizing offensive comments among the vast number of interactions on social media is complex. Automatic detection of sensitive content is an expanding field, but it faces obstacles due to the subtleties of language and the varied forms of expression. Brazil's rich cultural diversity, shaped by its experiences, culture, traditions, and history of colonization, introduces additional challenges. This linguistic diversity plays a crucial...
Social networks, which play a significant role in modern debate and communication, face the contemporary challenge of a large, disorderly volume of harmful content, such as hate speech and misinformation. This article addresses the detection of hate speech in Portuguese, considering its linguistic particularities and cultural nuances. Using models derived from Transformers, together with various training and activation strategies, nine variations of architecture, size, and corpora...
This article describes a topic-based approach to the problem of legal case retrieval. The method consists of two phases: filtering and ranking. In the first phase, a topic modeling technique is applied to the whole dataset to select an initial set of candidates for each query. In the second phase, a ranking function is used to produce an ordered list of the cases relevant to the given query. Experimental results obtained using three different ranking functions, with collections in different languages, indicate that...
Several evaluation metrics for text generation have been proposed in recent years. However, many questions have arisen about how well they can assess the accuracy and quality of the generated text. In this work, we study how some of the most popular metrics behave when dealing with the summarization task in the legal domain in Portuguese. More specifically, we evaluate five metrics (ROUGE, BERTScore, BARTScore, BLEURT, and MoverScore), using a dataset containing 892 rulings (acórdãos) of the Superior Tribunal de Justiça. Each item is...
Digital traces of our lives are now constantly produced by various connected devices, internet services and interactions. Our actions result in a multitude of heterogeneous data objects, or traces, kept in various locations in the cloud or on local devices. Users have very few tools to organize, understand, and search the digital traces they produce. We propose a simple but flexible data model to aggregate, organize, and find personal information within a collection of a user's personal digital traces. Our model uses as basic dimensions the six questions: what, when, where, who, why,...
Personal digital traces are constantly produced by connected devices, internet services and interactions. These digital traces are typically small, heterogeneous and stored in various locations in the cloud or on local devices, making it a challenge for users to interact with and search their own data. By adopting a multidimensional data model based on six natural questions (what, when, where, who, why and how) to represent and unify personal digital traces, we can propose a learning-to-rank approach using the state of the art LambdaMART algorithm...
Personal digital traces are constantly produced by connected devices, internet services and interactions. These digital traces are typically small, heterogeneous and stored in various locations in the cloud or on local devices, making it a challenge for users to interact with and search their own data. By adopting a multidimensional data model based on six natural questions (what, when, where, who, why and how) to represent and unify personal digital traces, we propose a learning-to-rank approach using the state of the art LambdaMART algorithm and frequency-based...
The performance of parallel programs is frequently affected by different dynamic load-imbalance factors. A very common factor, present in non-dedicated environments, is the existence of other processes competing with the parallel application for computational resources. The heterogeneity and variation of this external load prevent a balanced distribution of the parallel application's tasks from being made in advance. The use of an adequate load balancing strategy is fundamental to reducing the effects caused by this...
Sentiment analysis in tweets is a research field of great importance, mainly due to the popularity of Twitter. However, collecting and annotating tweets is an expensive and time-consuming task, so that some domains have only a limited set of labeled data. A promising strategy to handle this issue is to leverage rich data and select instances to enrich target datasets. This paper proposes different strategies for selecting instances from source datasets in order to improve the performance of classifiers trained with the target dataset. Different approaches...