Álan L. V. Guedes

ORCID: 0000-0003-0110-9975
Research Areas
  • Multimedia Communication and Technology
  • Video Analysis and Summarization
  • Image and Video Quality Assessment
  • Human Pose and Action Recognition
  • Speech and dialogue systems
  • Advanced Image and Video Retrieval Techniques
  • Usability and User Interface Design
  • Peer-to-Peer Network Technologies
  • Advanced Image Processing Techniques
  • Web Applications and Data Management
  • Music and Audio Processing
  • Face recognition and analysis
  • Multimodal Machine Learning Applications
  • Telecommunications and Broadcasting Technologies
  • Image and Signal Denoising Methods
  • Digital Media Forensic Detection
  • Education and Digital Technologies
  • Video Surveillance and Tracking Methods
  • Speech and Audio Processing
  • Online Learning and Analytics
  • Biometric Identification and Security
  • Video Coding and Compression Technologies
  • Educational Games and Gamification
  • Power Systems and Technologies
  • Subtitles and Audiovisual Media

Pontifical Catholic University of Rio de Janeiro
2015-2024

University College London
2023

Universidade Federal da Paraíba
2011-2012

Traditionally, most multimedia content has been developed to stimulate two of the human senses, i.e., sight and hearing. Due to recent technological advancements, however, innovative services have emerged that provide more realistic, immersive, and engaging experiences to the audience. Omnidirectional (i.e., 360-degree) video, for instance, is becoming increasingly popular. It allows the viewer to navigate a full 360-degree view of a scene from a specific point. In particular, when consumed through head-mounted displays,...

10.1109/mmsp.2019.8901743 article EN 2019-09-01

The recent availability of consumer-level head-mounted displays and omnidirectional cameras has been driving an explosion of 360 video content. Transforming the original recorded content into meaningful interactive multimedia presentations that support viewers in tasks such as learning, entertainment, and telepresence, however, is not trivial and requires new tools. Such tools must provide an easy-to-use authoring model for the integration of different media objects and active user interface elements. In this paper, based...

10.1109/icmew46912.2020.9105958 article EN 2020-06-09

Most information technology courses focus on presenting the traditional static data processing model (e.g., DBMS) and neglect the data stream processing model. Such a model can be applied to fields such as the Internet of Things, business, finance, logistics, and smart cities, as well as to industry. Complex Event Processing (CEP) is a programming approach for handling data stream processing. It provides primitives to process streams and detect the occurrence of patterns in them. This short course has the objective of presenting complex event processing (CEP) as a means of dealing with...

10.1145/3323503.3345028 article EN 2019-10-10

Methods based on Deep Learning (DL) have become state-of-the-art in several multimedia challenges. However, there is a shortage of professionals able to apply them in industry. Therefore, this tutorial aims to present the foundations of, and ways to develop, applications that use such methods for video analysis tasks. Likewise, it is an opportunity for information technology students to qualify themselves. The main focus of the short course is the fundamentals of technologies such as DL.

10.1145/3323503.3345029 article EN 2019-10-10

Recent advances in technologies for speech, touch, and gesture recognition have given rise to a new class of user interfaces that not only explores multiple modalities but also allows for multiple interacting users. Even so, current declarative multimedia languages (e.g., HTML, SMIL, and NCL) support only limited forms of input (mainly keyboard and mouse) for a single user. In this paper, we aim at studying how the NCL language could take advantage of those technologies. To do so, we revisit the model behind NCL, named NCM (Nested Context...

10.1145/2976796.2976869 article EN 2016-11-07

An ergonomic evaluation is an observation of a person in order to identify work-related musculoskeletal disorders (WMSDs) caused by prolonged or repeated harmful poses that the person adopts during work tasks. Nowadays, an ergonomist or other health professional performs such evaluations based on a set of posture rules and checklists, which can be subjective and thus lead to erroneous risk classifications. Moreover, this usually takes place in the patient's environment. In order to make those evaluations more objective and concise, we propose a method using a mobile depth sensor....

10.1145/3323503.3349550 article EN 2019-10-10

The broad use of video capture and of services for its storage and transmission has enabled the production of a massive volume of video data. This usage presents a challenge in controlling the type of content that is uploaded to these services. The Internet slang NSFW (Not Safe For Work) is often used as a warning for media that contain inappropriate content, such as nudity, intense sexuality, violence, gore, or other potentially disturbing subject matter. Convolutional Neural Network (CNN) architectures, or ConvNets, have become the primary method...

10.1145/3323503.3360625 article EN 2019-10-10

Face recognition systems are present in many modern solutions and thousands of applications in our daily lives. However, current systems are not easily scalable, especially when it comes to the addition of new targeted people. We propose a cluster-matching-based approach for face recognition in video. In our approach, we use unsupervised learning to cluster the faces in both the dataset and the videos selected for recognition. Moreover, we design a matching heuristic to associate the clusters of both sets that is also capable of identifying when a face belongs to a non-registered person. Our...

10.1145/3428658.3430967 preprint EN 2020-11-25

Anime character stickers consist of a detailed illustration of characters that represents emotions. Generally, messaging apps let users save received stickers, but in some cases, the task of manually searching for a specific sticker may be frustrating when a large number of them are stored. In this work, we propose a CNN-based tool for emotion indexing of stickers. We built a dataset with 12,668 stickers labeled in 3 classes (Sad, Happy, and Angry). In experiments, our model achieves an 84.01% global F1-score. Additionally, we describe...

10.1109/ism46123.2019.00071 article EN 2019-12-01

Many recent works have successfully applied some types of Convolutional Neural Networks (CNNs) to reduce the noticeable distortion resulting from the lossy JPEG compression technique. Most of them are built upon processing made in the spatial domain. In this work, we propose an image decoder that is purely based on the frequency-to-frequency domain: it reads the quantized DCT coefficients received from a low-quality bitstream and, using a deep learning-based model, predicts the missing coefficients in order to recompose the same image with enhanced...

10.1145/3428658.3430966 article EN 2020-11-25

A large number of videos are uploaded to educational platforms every minute. Those platforms are responsible for any sensitive media uploaded by their users. An automated detection system to identify pornographic content could assist human workers by pre-selecting suspicious videos. In this paper, we propose a multimodal approach to adult content detection. We use two Deep Convolutional Neural Networks to extract high-level features from both the image and audio sources of a video. Then, we concatenate those features and evaluate the performance...

10.5753/cbie.sbie.2020.1253 article EN Anais do XXXI Simpósio Brasileiro de Informática na Educação (SBIE 2020) 2020-11-24

The recent availability of HMD (Head-Mounted Display) devices has spurred advances in Virtual Reality and Augmented Reality. It has allowed scenarios of immersion of users in realistic environments, e.g., synthesized 3D environments for games and simulation, and omnidirectional videos, namely 360° videos. These devices are commonly used in conjunction with home computers or in controlled environments (e.g., museums). However, despite these scenarios, current interactive TV systems still focus on using content in two-dimensional layouts. Such systems could...

10.1145/3428658.3430972 article EN 2020-11-25

A very large amount of multimedia data is continually being shared through social networks. In these public spaces, administrators are legally responsible for moderating and controlling the content uploaded or posted to their platforms. However, the traffic of media in some private spaces, such as chat rooms and messaging platforms, is often protected, sometimes by end-to-end encryption, and is therefore not subject to this kind of monitoring, which makes them prone to the spread of inappropriate content, such as pornography, violence, and other...

10.1145/3470482.3479639 article EN 2021-09-28

Due to the extensive use of video-sharing platforms and services for their storage, the amount of such media on the internet has become massive. This volume of data makes it difficult to control the kind of content that may be present in such video files. One of the main concerns regarding video content is whether it has an inappropriate subject matter, such as nudity, violence, or other potentially disturbing content. More than telling whether a video is either appropriate or inappropriate, it is also important to identify which parts of it contain such content, preserving parts that would be discarded in a simple...

10.48550/arxiv.1911.03974 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Audio quality degradation can have many causes. For musical applications, this fragmentation may lead to highly unpleasant experiences. Restoration algorithms may be employed to reconstruct missing parts of the audio in a similar way as for image reconstruction, an approach called audio inpainting. Current state-of-the-art methods for audio inpainting cover limited scenarios, with well-defined gap windows and little variety of musical genres. In this work, we propose a Deep-Learning-based (DL-based) method for audio inpainting accompanied by a dataset...

10.1145/3470482.3479635 article EN 2021-09-28

Recent advances in hardware and software technologies have given rise to a new class of human-computer interfaces that both explores multiple modalities and allows for multiple collaborating users. When compared to the development of traditional single-user WIMP (windows, icons, menus, pointer)-based applications, however, applications supporting the seamless integration of multimodal-multiuser interactions bring new specification and runtime requirements. With the aim of assisting the development of multimedia applications that integrate such interactions, this paper: (1)...

10.1145/3342558.3345400 article EN 2019-09-19

Before the COVID-19 pandemic, video was already one of the main media used on the internet. During the pandemic, video conferencing services became even more important, coming to be one of the main instruments to enable most social and professional human activities. Given the social distancing policies, people are spending more time using these online services for working, learning, and also for leisure activities. Videoconferencing software became the standard means of communication for home-office and remote learning. Nevertheless, there are still a lot of issues to be addressed in these platforms, and many different aspects...

10.5753/webmedia_estendido.2020.13082 article EN 2020-11-30

Recent works have successfully applied some types of Convolutional Neural Networks (CNNs) to reduce the noticeable distortion resulting from the lossy JPEG/MPEG compression technique. Most of them are built upon processing made in the spatial domain. In this work, we propose an MPEG video decoder that is purely based on the frequency-to-frequency domain: it reads the quantized DCT coefficients received from a low-quality I-frames bitstream and, using a deep learning-based model, predicts the missing coefficients in order to recompose the same...

10.1109/ism.2020.00012 article EN 2020-12-01

Discovering and accessing specific content within educational video bases is a challenging task, mainly because of the abundance of video content and its diversity. Recommender systems are often used to enhance the ability to find and select content. But recommendation mechanisms, especially those based on textual information, exhibit some limitations, such as being error-prone because of manually created keywords or imprecise speech recognition. This paper presents a method for generating recommendations using deep face-features...

10.1109/ism.2020.00034 article EN 2020-12-01

Advances in interactive digital TV have enabled the introduction of application scenarios that explore Internet content and multiple device interaction. However, authorship and interoperability for such scenarios are hampered by the diversity of technologies and devices involved. This paper presents a software architecture for a portable application store based on the H.761 ITU recommendation for IPTV services. The concept is implemented as a Ginga-NCL application, which retrieves and executes other applications. The description of the proposed architecture,...

10.1145/2526188.2526239 article EN 2013-11-05

This paper proposes an approach to integrate multimodal events (both user-generated, e.g., audio recognizer and motion sensors, and user-consumed, e.g., speech synthesizer and haptic synthesizer) into programming languages for the declarative specification of multimedia applications. More precisely, it presents extensions to the NCL (Nested Context Language) multimedia language. NCL is the standard declarative language for the development of interactive applications for Brazilian Digital TV and an ITU-T Recommendation for IPTV services. Applications extended with these features are...

10.1145/2820426.2820436 article EN 2015-10-27