- Multimodal Machine Learning Applications
- Advanced Image and Video Retrieval Techniques
- Human Pose and Action Recognition
- VLSI and FPGA Design Techniques
- Video Analysis and Summarization
- Domain Adaptation and Few-Shot Learning
- Embedded Systems Design Techniques
- VLSI and Analog Circuit Testing
- Advanced Vision and Imaging
- Low-power high-performance VLSI design
- Generative Adversarial Networks and Image Synthesis
- Anomaly Detection Techniques and Applications
- Image Retrieval and Classification Techniques
- Handwritten Text Recognition Techniques
- Image Enhancement Techniques
- Speech Recognition and Synthesis
- Advanced Computational Techniques and Applications
- Music and Audio Processing
- Image and Video Quality Assessment
- Interconnection Networks and Systems
- Advanced Neural Network Applications
- 3D IC and TSV technologies
- Robotic Path Planning Algorithms
- Rough Sets and Fuzzy Logic
- Sentiment Analysis and Opinion Mining
Tianjin University
2016-2025
Wuhan University of Technology
2013-2024
Hangzhou Dianzi University
2024
General Hospital of Shenyang Military Region
2024
Adobe Systems (United States)
2019-2024
Zhejiang University
2024
Shanghai Jiao Tong University
2019-2024
Academy of Art University
2024
Ningbo University
2024
Beijing Institute of Technology
2014-2023
Interactive object selection is a very important research problem and has many applications. Previous algorithms require substantial user interactions to estimate the foreground and background distributions. In this paper, we present a novel deep-learning-based algorithm which has a much better understanding of objectness and can reduce user interactions to just a few clicks. Our algorithm transforms user-provided positive and negative clicks into two Euclidean distance maps, which are then concatenated with the RGB channels of images to compose (image,...
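The click encoding described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the nested-list image layout, and the cap of 255 on distances are assumptions made only for this example.

```python
import math


def click_distance_map(height, width, clicks, cap=255.0):
    """Distance from every pixel to its nearest click.

    clicks is a list of (row, col) positions. The cap value is an
    assumption for this sketch; it is also used when there are no clicks.
    """
    dmap = []
    for r in range(height):
        row = []
        for c in range(width):
            nearest = min(
                (math.hypot(r - cr, c - cc) for cr, cc in clicks),
                default=cap,
            )
            row.append(min(nearest, cap))
        dmap.append(row)
    return dmap


def stack_channels(rgb, pos_map, neg_map):
    """Append the two distance maps to each RGB pixel, giving 5 channels."""
    return [
        [
            tuple(rgb[r][c]) + (pos_map[r][c], neg_map[r][c])
            for c in range(len(rgb[0]))
        ]
        for r in range(len(rgb))
    ]
```

Positive and negative clicks each get their own map, so a hypothetical network input for a 3x4 image would be `stack_channels(rgb, click_distance_map(3, 4, pos), click_distance_map(3, 4, neg))`.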
Internet of Vehicles (IoV), when empowered by aerial communications, provides vehicles with seamless connections and proximate computing services. The unpredictable network dynamics of aerial-assisted IoV pose challenges to resource allocation. In this article, a dynamic digital twin (DT) is established to capture the time-varying supply and demands, so that unified scheduling and allocation can be performed. We design a two-stage incentive mechanism for resource allocation based on a Stackelberg game where DT or road side units...
Image captioning is one of the primary goals in computer vision, which aims to automatically generate natural descriptions for images. Intuitively, the human visual system can notice some stimulating regions at first glance, and then volitionally focus on interesting objects within the region. For example, for a free-form sentence about "boy-catch-baseball", the region involving "boy" and "baseball" could be attended to guide salient object discovery and word-by-word generation. Till now, previous works mainly rely...
An algorithm based on independent component analysis (ICA) is introduced for P300 detection. After ICA decomposition, P300-related components are selected according to a priori knowledge of the P300 spatio-temporal pattern, and a clear P300 peak is reconstructed by back projection of the ICA. Applied to dataset IIb of BCI Competition 2003, the algorithm achieved an accuracy of 100% in P300 detection within five repetitions.
Combining complementary information from multiple modalities is intuitively appealing for improving the performance of learning-based approaches. However, it is challenging to fully leverage different modalities due to practical challenges such as varying levels of noise and conflicts between modalities. Existing methods do not adopt a joint approach to capturing synergies between the modalities while simultaneously filtering noise and resolving conflicts on a per sample basis. In this work we propose a novel deep neural network based technique that...
Dense video captioning is an extremely challenging task since an accurate and coherent description of events in a video requires holistic understanding of video contents as well as contextual reasoning about individual events. Most existing approaches handle this problem by first detecting event proposals from a video and then captioning on a subset of the proposals. As a result, the generated sentences are prone to be redundant or inconsistent since they fail to consider temporal dependency between events. To tackle this challenge, we propose a novel dense video captioning framework, which...
The air-ground network provides users with seamless connections and real-time services, while its resource constraint triggers a paradigm shift from machine learning to federated learning. Federated learning enables clients to collaboratively train models without sharing data. Digital twins provide a virtual representation of the air-ground networks to reflect the time-varying status, which in combination with federated learning reconcile the conflict between privacy protection and data training in the networks. In this paper, we consider a dynamic digital twin...
Most previous bounding-box-based segmentation methods assume the bounding box tightly covers the object of interest. However, it is common that a rectangle input could be too large or too small. In this paper, we propose a novel segmentation approach that uses a rectangle as a soft constraint by transforming it into a Euclidean distance map. A convolutional encoder-decoder network is trained end-to-end by concatenating images with these distance maps as inputs and predicting the object masks as outputs. Our approach gets accurate segmentation results given sloppy rectangles while being...
Recent progress in using recurrent neural networks (RNNs) for video description has attracted increasing interest, due to their capability to encode a sequence of frames for caption generation. While existing methods have studied various features (e.g., CNN, 3D CNN, and semantic attributes) for visual encoding, the representation and fusion of heterogeneous information from multi-modal spaces have not been fully explored. Considering that different modalities are often asynchronous, frame-level fusion (e.g., concatenation and linear fusion)...
Human action recognition is an active research area in both the computer vision and machine learning communities. In the past decades, the problem has evolved from the conventional single-view problem to cross-view learning, cross-domain learning, and multitask learning, where a large number of algorithms have been proposed in the literature. Despite having a large number of datasets, most of them are designed for a subset of the four problems, and comparisons between algorithms can be further limited by variances within datasets, experimental configurations, and other factors. To the best of our...
Different from the fully-supervised action detection problem that is dependent on expensive frame-level annotations, weakly supervised action detection (WSAD) only needs video-level annotations, making it more practical for real-world applications. Existing WSAD methods detect action instances by scoring each video segment (a stack of frames) individually. Most of them fail to model the temporal relations among video segments and cannot effectively characterize action instances possessing latent temporal structure. To alleviate this problem in WSAD, we propose structure...
Image captioning is one of the most challenging tasks in AI because it requires an understanding of both complex visuals and natural language. Because image captioning is essentially a sequential prediction task, recent advances in image captioning have used reinforcement learning (RL) to better explore the dynamics of word-by-word generation. However, existing RL-based image captioning methods rely primarily on a single policy network and reward function, an approach that is not well matched to the multi-level (word and sentence) and multi-modal (vision and language) nature...
Image captioning aims at understanding various semantic concepts (e.g., objects and relationships) from an image and integrating them in a sentence-level description. Hence, it is necessary to learn the interaction among these concepts. If we define the context of the interaction to be involved in a subject-predicate-object triplet, most current methods only focus on a single triplet for the first-order interaction to generate sentences....
Domain-invariant (view-invariant and modality-invariant) feature representation is essential for human action recognition. Moreover, given a discriminative visual representation, it is critical to discover the latent correlations among multiple actions in order to facilitate action modeling. To address these problems, we propose a multi-domain and multi-task learning (MDMTL) method to: 1) extract domain-invariant information for multi-view and multi-modal action representation and 2) explore the relatedness among multiple action categories. Specifically, we present a sparse...
Incremental learning aims at achieving good performance on new categories without forgetting old ones. Knowledge distillation has been shown to be critical in preserving the performance on old classes. Conventional methods, however, sequentially distill knowledge only from the last model, leading to performance degradation on the old classes in later incremental learning steps. In this paper, we propose a multi-model and multi-level knowledge distillation strategy. Instead of sequentially distilling knowledge only from the last model, we directly leverage all previous model snapshots. In addition, we incorporate an auxiliary...
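The idea of distilling from every earlier snapshot rather than only the most recent one can be sketched as follows. This is a hypothetical illustration, not the paper's loss: the temperature-softened KL term, the equal weighting of snapshots, and the convention that snapshot k scores the first len(logits) classes are all assumptions made for this example.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_distillation(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from softened teacher outputs to softened student outputs."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def multi_snapshot_distillation(snapshot_logits, student_logits, temperature=2.0):
    """Average the distillation term over all earlier snapshots.

    Assumption: each snapshot only knows the classes seen up to its step,
    so the student's logits are sliced to the same number of classes.
    """
    terms = [
        kl_distillation(teacher, student_logits[: len(teacher)], temperature)
        for teacher in snapshot_logits
    ]
    return sum(terms) / len(terms)
```

In this sketch the term is zero when the student reproduces every snapshot on that snapshot's classes, and it grows as the student drifts away from any of them.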
Knowledge-based Visual Question Answering (KB-VQA) aims to answer the image-aware question via external knowledge, which requires an agent to not only understand images but also explicitly retrieve and integrate knowledge facts. Intuitively, to accurately answer the question, we humans can validate the retrieved knowledge based on our memory, and then align the knowledge facts with image regions to infer answers. However, most existing methods ignore the process of knowledge validation and alignment. In this paper, we propose Multi-Modal Validation Domain...
Understanding the structures of oxygen vacancies in bulk ceria is crucial as they significantly impact the material's catalytic and electronic properties. The complex interaction between oxygen vacancies and Ce3+ ions presents challenges in characterizing ceria's defect chemistry. We introduced a machine learning-assisted cluster-expansion model to predict the energetics of defective configurations accurately within bulk ceria. This model effectively samples configurational spaces, detailing vacancy structures across different temperatures...
Interactive object selection is a very important research problem and has many applications. Previous algorithms require substantial user interactions to estimate the foreground and background distributions. In this paper, we present a novel deep learning based algorithm which has a much better understanding of objectness and thus can reduce user interactions to just a few clicks. Our algorithm transforms user provided positive and negative clicks into two Euclidean distance maps, which are then concatenated with the RGB channels of images to compose (image,...