- Multimodal Machine Learning Applications
- Generative Adversarial Networks and Image Synthesis
- Computer Graphics and Visualization Techniques
- Advanced Image and Video Retrieval Techniques
- Human Pose and Action Recognition
- Natural Language Processing Techniques
- Topic Modeling
- Video Analysis and Summarization
- Image Retrieval and Classification Techniques
- Image and Signal Denoising Methods
- Domain Adaptation and Few-Shot Learning
- Speech and Dialogue Systems
- Aesthetic Perception and Analysis
- Human Motion and Animation
- Handwritten Text Recognition Techniques
- Advanced Vision and Imaging
- Sentiment Analysis and Opinion Mining
- Video Surveillance and Tracking Methods
- Seismic Imaging and Inversion Techniques
- Advanced Technologies in Various Fields
- Hate Speech and Cyberbullying Detection
- 3D Surveying and Cultural Heritage
- Anomaly Detection Techniques and Applications
- Drilling and Well Engineering
- 3D Shape Modeling and Analysis
Jingdong (China)
2020-2025
Meizu (China)
2025
JDSU (United States)
2024
China University of Petroleum, East China
2020-2021
Tohoku University
2007
Yiwei Wei, Shaozu Yuan, Ruosong Yang, Lei Shen, Zhangmeizhi Li, Longbiao Wang, Meng Chen. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2023.
The significance of visual emotion distribution learning (VEDL) has surged, particularly with the growing inclination to convey emotions through images. The key to VEDL lies in capturing both low- and high-level features within the same content, thus promoting the model's salient and subtle awareness. To learn the emotions involved in images, most previous works use coarse semantic knowledge without unbiased filtering. Consequently, they focus on the entire scene and suffer from redundant, semantic-irrelevant information, which diminishes...
Human conversations are complicated, and building a human-like dialogue agent is an extremely challenging task. With the rapid development of deep learning techniques, data-driven models, which need a huge amount of real conversation data, have become more prevalent. In this paper, we construct a large-scale real-scenario Chinese E-commerce conversation corpus, JDDC, with more than 1 million multi-turn dialogues, 20 million utterances, and 150 million words. The dataset reflects several characteristics of human-human conversations, e.g., goal-driven,...
Previous works on font generation mainly focus on standard print fonts, where the character's shape is stable and the strokes are clearly separated. There is little research on brush handwriting generation, which involves holistic structure changes and complex transfer. To address this issue, we propose a novel GAN-based image translation model that integrates skeleton information. We first extract the skeleton from the training images, then design an encoder to extract the corresponding features. A self-attentive refined attention module...
Multimodal sarcasm detection, which aims to detect the ironic sentiment within multimodal social data, has gained substantial popularity in both the natural language processing and computer vision communities. Recently, graph-based studies that draw sentimental relations have made notable advancements. However, they have neglected to exploit global semantic congruity from existing instances to facilitate prediction, which ultimately hinders the model's performance. In this paper, we introduce a new inference...
In this paper, we propose a novel bicubic method for digital image interpolation. Since the conventional bicubic method does not consider local image features, the interpolated images it produces often suffer from blurring. The proposed method uses both the asymmetry features and the gradient features of an image in the interpolation processing. Experimental results show that the proposed method can obtain high-accuracy interpolated images.
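For readers unfamiliar with the baseline this abstract refers to, the sketch below shows conventional bicubic interpolation only, not the proposed method. It assumes the widely used cubic convolution kernel with parameter a = -0.5 and clamped borders; the function names are hypothetical and do not come from the paper.

```python
import math


def cubic_kernel(x, a=-0.5):
    """Cubic convolution kernel; a = -0.5 is an assumed, commonly used value."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0


def cubic_interp_1d(samples, t):
    """Interpolate a 1D list of samples at fractional index t."""
    n = len(samples)
    base = math.floor(t)
    total = 0.0
    # Four nearest samples contribute; out-of-range indices are clamped.
    for k in range(base - 1, base + 3):
        idx = min(max(k, 0), n - 1)
        total += samples[idx] * cubic_kernel(t - k)
    return total


def bicubic_sample(image, y, x):
    """Sample a 2D image (list of rows) at fractional position (y, x).

    The 2D case is separable: interpolate along x in each of the four
    nearest rows, then combine those results along y.
    """
    h = len(image)
    y0 = math.floor(y)
    value = 0.0
    for r in range(y0 - 1, y0 + 3):
        row = image[min(max(r, 0), h - 1)]
        value += cubic_interp_1d(row, x) * cubic_kernel(y - r)
    return value
```

Because the kernel weights depend only on the distance to the sample, the same weights are applied in smooth regions and across edges, which is one common explanation for the blurring the abstract mentions.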
Image captioning based on reinforcement learning (RL) methods has achieved significant success recently. Most of these methods take the CIDEr score as the reward for the algorithm to compute gradients, thus refining the image caption baseline model. However, the CIDEr score is not the sole criterion for judging the quality of a generated caption. In this paper, a Hierarchical Attention Fusion (HAF) model is presented for RL, where multi-level feature maps of ResNet are integrated with hierarchical attention. A Revaluation Network (REN) is exploited for revaluating by...
Emotion plays a critical role in calligraphy composition, making the artwork impressive and giving it a soul. However, previous research on calligraphy generation has neglected emotion as a major contributor to the artistry of calligraphy. Such defects prevent these methods from generating aesthetic, stylistic, and diverse artworks; they produce only a static handwriting font library instead. To address this problem, we propose a novel cross-modal approach to automatically generate stylistic Chinese calligraphy driven by different emotions. We firstly...
In this paper, we present a novel system (denoted as Polaca) to generate poetic Chinese landscape painting with calligraphy. Unlike previous single image-to-image generation, Polaca takes classic poetry as input and outputs an artistic image with the corresponding calligraphy. It is equipped with three different modules to complete the whole piece of artwork: the first is a text-to-image module that generates the landscape painting image, the second is an image-to-image module that generates the stylistic calligraphy image, and the third is a fusion module that fuses the two images into one aesthetic artwork.
Applying the deformable transformer to dense video captioning has achieved great success recently. However, it only explores local-perspective perception by attending to a small set of key sampling points, which makes the decoder short-sighted and leads it to generate semantically incoherent and contradictory captions for long videos. In this paper, we propose a novel Multi-Perspective Perception Network to address this problem. We first introduce a hierarchical temporal-spatial summary method to obtain global-perspective context for each...
Recently, Visual Question Answering (VQA), which requires generating an answer by understanding both visual and textual content, has attracted considerable research interest. Most existing works extract features with a CNN and learn the feature embedding with an attention mechanism. However, this mechanism may ignore the interaction between entities in the image, which has a fuzzy impact on answer generation. To better explore the relationship between different entities, a novel Nested Attention Network with Graph Filtering (NANGF) is proposed....
Supplementing product attribute information is a critical step for E-commerce platforms, which further benefits various downstream tasks, including product recommendation, product search, and product knowledge graph construction. Intuitively, the visual information available on e-commerce platforms can effectively function as a primary source for certain product attributes. However, existing works either extract attribute values solely from textual product descriptions or leverage limited visual information (e.g., image features or optical character recognition tokens) to assist...
We present a novel Chinese calligraphy artwork composition system (MaLiang) which can generate aesthetic, stylistic, and diverse calligraphy images based on the emotion status of the input text. Unlike previous research, it is the first work to endow calligraphy synthesis with the ability to express fickle emotions and to compose a whole piece of discourse-level calligraphy artwork instead of single character images. The system consists of three modules: emotion detection, image generation, and layout prediction. As a creative form of interactive art, MaLiang has been exhibited in...
In this paper, we use a modified Gaussian filter to improve the enlargement accuracy of the arbitrary-scale LP method, which is based on the Laplacian pyramid representation (the so-called “LP method”). The parameters of the proposed algorithm are extracted through theoretical analysis and experimental estimation. Experimental results show that the proposed algorithm is effective for the arbitrary-scale LP method.
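As background for the Laplacian pyramid representation named in this abstract, the sketch below builds one pyramid level for a 1D signal and reconstructs the signal from it. It is an illustration under stated assumptions: it uses a plain normalized Gaussian kernel, not the paper's modified filter, a fixed factor of two rather than an arbitrary scale, and hypothetical function names.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalized 1D Gaussian weights (a plain Gaussian, assumed here)."""
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(signal, kernel):
    """Convolve with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)
            acc += w * signal[idx]
        out.append(acc)
    return out


def reduce_level(signal, kernel):
    """Low-pass filter, then keep every second sample."""
    return blur(signal, kernel)[::2]


def expand_level(coarse, length, kernel):
    """Repeat samples up to the target length, then smooth."""
    upsampled = [coarse[min(i // 2, len(coarse) - 1)] for i in range(length)]
    return blur(upsampled, kernel)


def decompose(signal, kernel):
    """Return (coarse, detail); detail is the Laplacian (band-pass) layer."""
    coarse = reduce_level(signal, kernel)
    predicted = expand_level(coarse, len(signal), kernel)
    detail = [s - p for s, p in zip(signal, predicted)]
    return coarse, detail


def reconstruct(coarse, detail, kernel):
    """Invert decompose: expand the coarse level and add the detail back."""
    predicted = expand_level(coarse, len(detail), kernel)
    return [p + d for p, d in zip(predicted, detail)]
```

Calling `decompose(signal, gaussian_kernel(1.0, 2))` and then `reconstruct` on the result recovers the original signal up to floating-point rounding. In an enlargement setting the detail layer of the larger image is not available and has to be estimated, which is where the choice of filter affects accuracy.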
Sketch plays a critical role in the human art creation process. As one of the functions of sketch, text-to-sketch may help artists catch fleeting inspirations efficiently. Unlike traditional text2image tasks, sketches consist of only a set of sparse lines and depend on very strict edge information, which requires the model to understand the text descriptions accurately and to control shape and texture at a fine-grained granularity. However, there has been little previous research on the challenging text2sketch task. In this paper, we...