Roger Zimmermann

ORCID: 0000-0002-7410-2590
Research Areas
  • Advanced Image and Video Retrieval Techniques
  • Video Analysis and Summarization
  • Image and Video Quality Assessment
  • Data Management and Algorithms
  • Peer-to-Peer Network Technologies
  • Caching and Content Delivery
  • Multimedia Communication and Technology
  • Video Coding and Compression Technologies
  • Particle physics theoretical and experimental studies
  • Advanced Data Storage Technologies
  • Topic Modeling
  • Multimodal Machine Learning Applications
  • Image Retrieval and Classification Techniques
  • Automated Road and Building Extraction
  • Speech and Audio Processing
  • Music and Audio Processing
  • High-Energy Particle Collisions Research
  • Geographic Information Systems Studies
  • Human Mobility and Location-Based Analysis
  • Traffic Prediction and Management Techniques
  • Sentiment Analysis and Opinion Mining
  • Advanced Text Analysis Techniques
  • Advanced Database Systems and Queries
  • Video Surveillance and Tracking Methods
  • Network Traffic and Congestion Control

National University of Singapore
2016-2025

Institute for Infocomm Research
2024

Singapore University of Technology and Design
2022

Indraprastha Institute of Information Technology Delhi
2018-2020

Indian Institute of Technology Delhi
2018-2020

Midas Multispeciality Hospital
2020

Bloomberg (United States)
2018-2020

Bridge University
2020

Indian Institute of Technology Guwahati
2018

Allen Institute for Artificial Intelligence
2018

Multimodal Sentiment Analysis is an active area of research that leverages multimodal signals for affective understanding of user-generated videos. The predominant approach, addressing this task, has been to develop sophisticated fusion techniques. However, the heterogeneous nature of the signals creates distributional modality gaps that pose significant challenges. In this paper, we aim to learn effective modality representations to aid the process of fusion. We propose a novel framework, MISA, which projects each modality to two distinct subspaces....

10.1145/3394171.3413678 article EN Proceedings of the 28th ACM International Conference on Multimedia 2020-10-12

Emotion recognition in conversations is crucial for the development of empathetic machines. Present methods mostly ignore the role of inter-speaker dependency relations while classifying emotions in conversations. In this paper, we address recognizing utterance-level emotions in dyadic conversational videos. We propose a deep neural framework, termed conversational memory network, which leverages contextual information from the conversation history. The framework takes a multimodal approach comprising audio, visual and textual...

10.18653/v1/n18-1193 article EN cc-by 2018-01-01

In this survey, we present state-of-the-art bitrate adaptation algorithms for HTTP adaptive streaming (HAS). As a key distinction from other streaming approaches, the bitrate adaptation algorithms in HAS are chiefly executed at each client, i.e., in a distributed manner. The objective of these algorithms is to ensure a high quality of experience (QoE) for viewers in the presence of bandwidth fluctuations due to factors like signal strength, network congestion, network reconvergence events, etc. While such fluctuations are common in the public Internet, they can also occur in home networks or even...

10.1109/comst.2018.2862938 article EN IEEE Communications Surveys & Tutorials 2018-08-03

Emotion recognition in conversations is crucial for building empathetic machines. Present works in this domain do not explicitly consider the inter-personal influences that thrive in the emotional dynamics of dialogues. To this end, we propose Interactive COnversational memory Network (ICON), a multimodal emotion detection framework that extracts multimodal features from conversational videos and hierarchically models the self- and inter-speaker emotional influences into global memories. Such memories generate contextual summaries which aid...

10.18653/v1/d18-1280 article EN cc-by Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing 2018-01-01

The recent rapid development of urbanization and the Internet of things (IoT) encourages more and more research on the Smart City, in which computing devices are widely distributed and a huge amount of dynamic real-time data is collected and processed. Although a vast volume of dynamic data is available for extracting new living patterns and making urban plans, efficient data processing and instant decision making are still key issues, especially in emergency situations requesting quick response with low latency. Fog Computing, as the extension of Cloud Computing, enables the computing tasks...

10.1109/bigmm.2016.53 article EN 2016-04-01

Santiago Castro, Devamanyu Hazarika, Verónica Pérez-Rosas, Roger Zimmermann, Rada Mihalcea, Soujanya Poria. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019.

10.18653/v1/p19-1455 article EN cc-by 2019-01-01

In the last decade, smart contract security issues have led to tremendous losses, which has attracted increasing public attention both in industry and in academia. Researchers have embarked on efforts with logic rules, symbolic analysis, and formal analysis to achieve encouraging results in smart contract vulnerability detection tasks. However, the existing detection tools are far from satisfactory. In this paper, we attempt to utilize a deep learning-based approach, namely bidirectional long-short term memory with attention mechanism (BLSTM-ATT), aiming...

10.1109/access.2020.2969429 article EN cc-by IEEE Access 2020-01-01

Air pollution is a crucial issue affecting human health and livelihoods, as well as one of the barriers to economic growth. Forecasting air quality has become an increasingly important endeavor with significant social impacts, especially in emerging countries. In this paper, we present a novel Transformer termed AirFormer to predict nationwide air quality in China, with an unprecedented fine spatial granularity covering thousands of locations. AirFormer decouples the learning process into two stages: 1) a bottom-up deterministic stage that...

10.1609/aaai.v37i12.26676 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2023-06-26

Photo aesthetic quality evaluation is a fundamental yet under-addressed task in the computer vision and image processing fields. Conventional approaches are frustrated by the following two drawbacks. First, both the local and global spatial arrangements of image regions play an important role in photo aesthetics. However, existing rules, e.g., visual balance, heuristically define which spatial distribution among the salient regions of a photo is aesthetically pleasing. Second, it is difficult to adjust visual cues from multiple channels automatically...

10.1109/tip.2014.2303650 article EN IEEE Transactions on Image Processing 2014-01-31

HTTP adaptive streaming (HAS) is being adopted with increasing frequency and is becoming the de facto standard for video streaming. However, the client-driven, on-off adaptation behavior of HAS results in uneven bandwidth competition, and this is exacerbated when a large number of clients share the same bottleneck network link and compete for the available bandwidth. With HAS, each client independently strives to maximize its individual share of the available bandwidth, which leads to a decrease in end-user quality of experience (QoE). The competition causes scalability...

10.1145/2964284.2964332 article EN Proceedings of the 24th ACM International Conference on Multimedia 2016-09-29

Anticipating the future motions of 3D articulate objects is challenging due to its non-linear and highly stochastic nature. Current approaches typically represent the skeleton of an articulate object as a set of 3D joints, which unfortunately ignores the relationship between joints and fails to encode fine-grained anatomical constraints. Moreover, conventional recurrent neural networks, such as LSTM and GRU, are employed to model motion contexts, which inherently have difficulties in capturing long-term dependencies. To address these problems, we...

10.1109/cvpr.2019.01024 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

Accurate and infrastructure-free indoor positioning can be very useful in a variety of applications. However, most of the existing approaches (e.g., WiFi and infrared-based methods) for indoor localization heavily rely on infrastructure, which is neither scalable nor pervasively available. In this paper, we propose a novel indoor localization and tracking approach, termed VMag, that does not require any infrastructure assistance. The user can be localized while simply holding a smartphone. To the best of our knowledge, the proposed method is the first...

10.1109/tmm.2016.2636750 article EN IEEE Transactions on Multimedia 2016-12-07

HTTP adaptive streaming (HAS) is receiving much attention from both industry and academia as it has become the de facto approach to stream media content over the Internet. Recently, we proposed a streaming architecture called SDNDASH [1] to address HAS scalability issues including video instability, quality of experience (QoE) unfairness, and network resource underutilization, while maximizing per player QoE. While SDNDASH was a significant step forward, there were three unresolved limitations: 1) it did not scale well when...

10.1109/tmm.2017.2733344 article EN IEEE Transactions on Multimedia 2017-07-28

Debanjan Mahata, John Kuriakose, Rajiv Ratn Shah, Roger Zimmermann. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). 2018.

10.18653/v1/n18-2100 article EN cc-by 2018-01-01

The literature in automated sarcasm detection has mainly focused on lexical, syntactic and semantic-level analysis of text. However, a sarcastic sentence can be expressed with contextual presumptions, background and commonsense knowledge. In this paper, we propose CASCADE (a ContextuAl SarCasm DEtector) that adopts a hybrid approach of both content and context-driven modeling for sarcasm detection in online social media discussions. For the latter, CASCADE aims at extracting contextual information from the discourse of a discussion thread. Also, since...

10.48550/arxiv.1805.06413 preprint EN cc-by-sa arXiv (Cornell University) 2018-01-01

Given input images, scene graph generation (SGG) aims to produce comprehensive, graphical representations describing visual relationships among salient objects. Recently, more efforts have been paid to the long tail problem in SGG; however, the imbalance in the fraction of missing labels of different classes, or reporting bias, exacerbating the long tail is rarely considered and cannot be solved by existing debiasing methods. In this paper we show that, due to the missing labels, SGG can be viewed as a "Learning from Positive and Unlabeled...

10.1145/3474085.3475297 article EN Proceedings of the 29th ACM International Conference on Multimedia 2021-10-17

Urban flow prediction benefits smart cities in many aspects, such as traffic management and risk assessment. However, a critical prerequisite for these benefits is having fine-grained knowledge of the city. Thus, unlike previous works that are limited to coarse-grained data, we extend the horizon of urban flow prediction to fine granularity, which raises specific challenges: 1) the predominance of inter-grid transitions observed in fine-grained data makes it more complicated to capture the spatial dependencies among grid cells at a global scale; 2) it is very...

10.1145/3442381.3449792 article EN 2021-04-19

Deep learning models are modern tools for spatio-temporal graph (STG) forecasting. Though successful, we argue that data scarcity is a key factor limiting their recent improvements. Meanwhile, contrastive learning has been an effective method for providing self-supervision signals and addressing data scarcity in various domains. In view of this, one may ask: can we leverage the additional signals from contrastive learning to alleviate data scarcity, so as to benefit STG forecasting? To answer this question, we present the first systematic exploration on...

10.1145/3557915.3560939 article EN Proceedings of the 30th International Conference on Advances in Geographic Information Systems 2022-11-01

Spatio-temporal graph neural networks (STGNN) have emerged as the dominant model for spatio-temporal graph (STG) forecasting. Despite their success, they fail to model intrinsic uncertainties within STG data, which cripples their practicality in downstream tasks for decision-making. To this end, this paper focuses on probabilistic STG forecasting, which is challenging due to the difficulty in modeling uncertainties and complex ST dependencies. In this study, we present the first attempt to generalize the popular de-noising diffusion probabilistic models to STGs, leading to a novel...

10.1145/3589132.3625614 article EN 2023-11-13

Panoptic Scene Graph Generation (PSG) parses objects and predicts their relationships (predicate) to connect human language and visual scenes. However, different language preferences of annotators and semantic overlaps between predicates lead to biased predicate annotations in the dataset, i.e., different predicates for the same object pairs. Biased predicate annotations make PSG models struggle in constructing a clear decision plane among predicates, which greatly hinders the real application of PSG models. To address the intrinsic bias above, we propose a novel framework...

10.1609/aaai.v38i4.28098 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2024-03-24