- Multimodal Machine Learning Applications
- Sentiment Analysis and Opinion Mining
- Human Pose and Action Recognition
- Video Surveillance and Tracking Methods
- Topic Modeling
- Image Retrieval and Classification Techniques
- Advanced Image and Video Retrieval Techniques
- Natural Language Processing Techniques
- Visual Attention and Saliency Detection
- Domain Adaptation and Few-Shot Learning
- Video Analysis and Summarization
- Complex Network Analysis Techniques
- Text and Document Classification Technologies
- Anomaly Detection Techniques and Applications
- Opinion Dynamics and Social Influence
- Machine Learning in Healthcare
- Misinformation and Its Impacts
- Media Influence and Health
- Digital Marketing and Social Media
- Generative Adversarial Networks and Image Synthesis
- Gait Recognition and Analysis
- Human-Animal Interaction Studies
- Privacy-Preserving Technologies in Data
- Face and Expression Recognition
- Text Readability and Simplification
Bellevue Hospital Center
2023-2025
CE Technologies (United Kingdom)
2024
Microsoft Research (United Kingdom)
2020-2023
Microsoft (United States)
2018-2022
University of Rochester
2013-2021
Dalian University of Technology
2011
Automatically generating a natural language description of an image has attracted interest recently both because of its importance in practical applications and because it connects two major artificial intelligence fields: computer vision and natural language processing. Existing approaches are either top-down, which start from a gist of an image and convert it into words, or bottom-up, which come up with words describing various aspects of an image and then combine them. In this paper, we propose a new algorithm that combines both approaches through a model of semantic attention. Our...
Sentiment analysis of online user generated content is important for many social media analytics tasks. Researchers have largely relied on textual sentiment analysis to develop systems to predict political elections, measure economic indicators, and so on. Recently, social media users are increasingly using images and videos to express their opinions and share their experiences. Sentiment analysis of such large scale visual content can help better extract user sentiments toward events or topics, such as those in image tweets, so that prediction of sentiment from visual content is complementary to textual sentiment analysis....
Psychological research results have confirmed that people can have different emotional reactions to different visual stimuli. Several papers have been published on the problem of visual emotion analysis. In particular, attempts have been made to analyze and predict people's emotional reaction towards images. To this end, different kinds of hand-tuned features are proposed. The results reported on several carefully selected and labeled small image data sets have confirmed the promise of such features. While the recent successes of many computer vision related tasks are due to the adoption of Convolutional...
Predicting the future health information of patients from the historical Electronic Health Records (EHR) is a core research task in the development of personalized healthcare. Patient EHR data consist of sequences of visits over time, where each visit contains multiple medical codes, including diagnosis, medication, and procedure codes. The most important challenges for this task are to model the temporality and high dimensionality of sequential EHR data and to interpret the prediction results. Existing work solves this problem by employing...
Visual content analysis has always been important yet challenging. Thanks to the popularity of social networks, images have become a convenient carrier for information diffusion among online users. To understand the patterns and different aspects of images, we need to interpret the images first. Similar to textual content, images also carry different levels of sentiment to their viewers. However, different from text, where sentiment analysis can use easily accessible semantic and context information, how to extract the sentiment of an image remains quite challenging. In this paper, we propose a prediction...
The goal of the diagnosis prediction task is to predict the future health information of patients from their historical Electronic Healthcare Records (EHR). The most important and challenging problem of diagnosis prediction is to design an accurate, robust and interpretable predictive model. Existing work solves this problem by employing recurrent neural networks (RNNs) with attention mechanisms, but these approaches suffer from the data sufficiency problem. To obtain good performance with insufficient data, graph-based models are proposed. However, when...
Sentiment analysis of online user generated content is important for many social media analytics tasks. Researchers have largely relied on textual sentiment analysis to develop systems to predict political elections, measure economic indicators, and so on. Recently, social media users are increasingly using additional images and videos to express their opinions and share their experiences. Sentiment analysis of such large-scale visual content can help better extract user sentiments toward events or topics. Motivated by the need to leverage multimedia content for sentiment analysis, we...
Tracking multiple objects in videos relies on modeling the spatial-temporal interactions of the objects. In this paper, we propose TransMOT, which leverages powerful graph transformers to efficiently model the spatial and temporal interactions among the objects. TransMOT is capable of effectively modeling the interactions of a large number of objects by arranging the trajectories of the tracked targets and detection candidates as a set of sparse weighted graphs, and constructing transformer encoder and decoder layers based on the graphs. Through end-to-end learning, TransMOT can exploit the spatial-temporal clues to directly...
Visual sentiment analysis, which studies the emotional response of humans to visual stimuli such as images and videos, has been an interesting and challenging problem. It tries to understand the high-level content of visual data. The success of current models can be attributed to the development of robust algorithms from computer vision. Most of the existing models try to solve the problem by proposing either robust features or more complex models. In particular, features from the whole image or video are the main proposed inputs. Little attention has been paid to local areas, which we...
Predicting the risk of potential diseases from Electronic Health Records (EHR) has attracted considerable attention in recent years, especially with the development of deep learning techniques. Compared with traditional machine learning models, deep learning based approaches achieve superior performance on the risk prediction task. However, none of the existing work explicitly takes prior medical knowledge (such as the relationships between diseases and corresponding risk factors) into account. In the medical domain, knowledge is usually represented by discrete and arbitrary rules....
Chatbots have become an important solution to rapidly increasing customer care demands on social media in recent years. However, current work on chatbots for customer care ignores a key to impacting user experience: tones. In this work, we create a novel tone-aware chatbot that generates toned responses to user requests on social media. We first conduct a formative research, in which the effects of tones are studied. Significant and various influences of different tones on user experience are uncovered in the study. With the knowledge of the effects of tones, we design a deep learning based chatbot that takes tone information...
Sentiment analysis is crucial for extracting social signals from social media content. Due to the huge variation in social media, the performance of sentiment classifiers using a single modality (visual or textual) still lags behind satisfaction. In this paper, we propose a new framework that integrates textual and visual information for robust sentiment analysis. Different from previous work, we believe visual and textual information should be treated jointly in a structural fashion. Our system first builds a semantic tree structure based on sentence parsing, aimed at...
Improving the generalization ability of Deep Neural Networks (DNNs) is critical for their practical uses, which has been a longstanding challenge. Some theoretical studies have uncovered that DNNs have preferences for some frequency components in the learning process and indicated that this may affect the robustness of learned features. In this paper, we propose Deep Frequency Filtering (DFF) for learning domain-generalizable features, which is the first endeavour to explicitly modulate the frequency components of different transfer difficulties across domains in the latent space...
Strong Artificial Intelligence (Strong AI) or Artificial General Intelligence (AGI) with abstract reasoning ability is the goal of next-generation AI. Recent advancements in Large Language Models (LLMs), along with the emerging field of Multimodal Large Language Models (MLLMs), have demonstrated impressive capabilities across a wide range of multimodal tasks and applications. Particularly, various MLLMs, each with distinct model architectures, training data, and training stages, have been evaluated across a broad range of MLLM benchmarks. These studies have, to varying degrees, revealed...
Real estate appraisal, which is the process of estimating the price for real estate properties, is crucial for both buyers and sellers as the basis for negotiation and transaction. Traditionally, the repeat sales model has been widely adopted to estimate real estate price. However, it depends on the design and calculation of a complex economic related index, which is challenging to estimate accurately. Today, real estate brokers provide easy access to detailed online information on real estate properties to their clients. We are interested in estimating the real estate price from these large amounts of easily accessed data. In...
Sentiment analysis on large-scale social media data is important to bridge the gaps between social media contents and real world activities including political election prediction, individual and public emotional status monitoring and analysis, and so on. Although textual sentiment analysis has been well studied based on platforms such as Twitter and Instagram, analysis of the role of extensive emoji uses in sentiment analysis remains light. In this paper, we propose a novel scheme for sentiment analysis with extra attention on emojis. We first learn bi-sense emoji embeddings under positive...
Unsupervised domain adaptive person re-identification (ReID) has been extensively investigated to mitigate the adverse effects of domain gaps. Those works assume the target domain data can be accessible all at once. However, for real-world streaming data, this hinders the timely adaptation to changing data statistics and sufficient exploitation of increasing samples. In this paper, to address more practical scenarios, we propose a new task, Lifelong Unsupervised Domain Adaptive (LUDA) person ReID. This is challenging because it requires...
Initializing an effective dictionary is an indispensable step for sparse representation. In this paper, we focus on the dictionary selection problem with the objective to select a compact subset of basis from the original training data instead of learning a new dictionary matrix as dictionary learning models do. We first design a new dictionary selection model via the l2,0 norm. For model optimization, we propose two methods: one is the standard forward-backward greedy algorithm, which is not suitable for large-scale problems; the other is based on the gradient cues at each forward iteration and speeds up the process...
Identifying user attributes from their social media activities has been an active research topic. The ability to predict user attributes such as age, gender, and interests is essential for personalization and recommender systems. Most of the techniques proposed for this purpose utilize the textual content created by a user, while multimedia content has gained popularity in social networks. In this paper, we propose a novel algorithm to infer a user's gender using the images posted on different
Automatic image captioning has recently approached human-level performance due to the latest advances in computer vision and natural language understanding. However, most of the current models can only generate plain factual descriptions about the content of a given image. For human beings, however, caption writing is quite flexible and diverse, where additional dimensions, such as emotion, humor, and language styles, are often incorporated to produce diverse, emotional, or appealing captions. In particular, we are interested in generating...