- Advanced Neural Network Applications
- Topic Modeling
- Advanced Image and Video Retrieval Techniques
- Domain Adaptation and Few-Shot Learning
- Speech and dialogue systems
- Natural Language Processing Techniques
- Multimodal Machine Learning Applications
- Robotics and Sensor-Based Localization
- Distributed Control Multi-Agent Systems
- Video Surveillance and Tracking Methods
- Robotic Path Planning Algorithms
- Human Pose and Action Recognition
- AI in Service Interactions
- Stochastic Gradient Optimization Techniques
- Visual Attention and Saliency Detection
- Advanced Graph Neural Networks
- Sparse and Compressive Sensing Techniques
- Brain Tumor Detection and Classification
- Advanced Vision and Imaging
- Optical Network Technologies
- Quantum Information and Cryptography
- Machine Learning and Data Classification
- Anomaly Detection Techniques and Applications
- Software Testing and Debugging Techniques
- Digital Media Forensic Detection
Hohai University
2024-2025
Beijing Institute of Technology
2022-2025
Beihang University
2022-2023
Chongqing University of Technology
2023
China XD Group (China)
2023
Electric Power Research Institute
2023
QuantumCTek (China)
2023
Anhui University
2023
Megvii (China)
2022
Taizhou University
2022
Existing pose estimation approaches fall into two categories: single-stage and multi-stage methods. While multi-stage methods are seemingly more suited for the task, their performance in current practice is not as good as that of single-stage methods. This work studies this issue. We argue that multi-stage methods' unsatisfactory performance comes from insufficiency in various design choices. We propose several improvements, including module design, cross stage feature aggregation, and coarse-to-fine supervision. The resulting method establishes a new state-of-the-art on...
In this work, we present a unified framework for multi-modality 3D object detection, named UVTR. The proposed method aims to unify representations in the voxel space for accurate and robust single- or cross-modality detection. To this end, a modality-specific space is first designed to represent different inputs in the voxel feature space. Different from previous work, our approach preserves the voxel space without height compression to alleviate semantic ambiguity and enable spatial connections. To make full use of the sensors, an interaction is then proposed,...
In state-of-the-art image retrieval systems, an image is represented by a bag of visual words obtained by quantizing high-dimensional local descriptors, and scalable schemes inspired by text retrieval are then applied for large scale indexing and retrieval. Bag-of-words representations, however: 1) reduce the discriminative power of features due to feature quantization; and 2) ignore geometric relationships among visual words. Exploiting such geometric constraints, by estimating a 2D affine transformation between a query image and each candidate image, has...
Pre-trained models have proved to be powerful in enhancing task-oriented dialog systems. However, current pre-training methods mainly focus on dialog understanding and generation tasks while neglecting the exploitation of dialog policy. In this paper, we propose GALAXY, a novel pre-trained dialog model that explicitly learns dialog policy from limited labeled dialogs and large-scale unlabeled dialog corpora via semi-supervised learning. Specifically, we introduce a dialog act prediction task for policy optimization during pre-training and employ a consistency...
Digital twins are propelling the next generation of industrial revolution and serve as a key technology in enabling intelligent water conservancy. However, due to the diversity of objects within water conservancy scenarios and the complexity of related factors, the research and application of digital twins in this field remain immature. There are still significant challenges in constructing fine-grained, high-fidelity digital twins for these objects and their corresponding scenarios. In this context, taking polder areas as subjects, a polder area digital twin system is proposed, which includes...
The importance of building text-to-SQL parsers which can be applied to new databases has long been acknowledged, and a critical step to achieve this goal is schema linking, i.e., properly recognizing mentions of unseen columns or tables when generating SQLs. In this work, we propose a novel framework to elicit relational structures from large-scale pre-trained language models (PLMs) via a probing procedure based on the Poincaré distance metric, and use the induced relations to augment current graph-based parsers for better...
Text-to-SQL parsing is an essential and challenging task. The goal of text-to-SQL parsing is to convert a natural language (NL) question to its corresponding structured query language (SQL) based on the evidence provided by relational databases. Early text-to-SQL systems from the database community achieved noticeable progress at the cost of heavy human engineering and user interactions with the systems. In recent years, deep neural networks have significantly advanced this task with generation models, which automatically learn a mapping function from the input...
We revisit large kernel design in modern convolutional neural networks (CNNs). Inspired by recent advances in vision transformers (ViTs), in this paper, we demonstrate that using a few large kernels instead of a stack of small kernels could be a more powerful paradigm. We suggest five guidelines, e.g., applying re-parameterized large depth-wise convolutions, to design efficient high-performance large-kernel CNNs. Following the guidelines, we propose RepLKNet, a pure CNN architecture whose kernel size is as large as 31x31, in contrast to the commonly used 3x3. RepLKNet...
Yingxiu Zhao, Zhiliang Tian, Huaxiu Yao, Yinhe Zheng, Dongkyu Lee, Yiping Song, Jian Sun, Nevin Zhang. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.
In this paper, we propose a novel query design for transformer-based object detection. In previous transformer-based detectors, the object queries are a set of learned embeddings. However, each learned embedding does not have an explicit physical meaning, and we cannot explain where it will focus. It is also difficult to optimize, as the prediction slot of each object query does not have a specific mode. In other words, each object query does not focus on a specific region. To solve these problems, in our query design, object queries are based on anchor points, which are widely used in CNN-based detectors. So each object query focuses on the objects near its anchor point. Moreover, our query design can...
In this paper, we propose PETRv2, a unified framework for 3D perception from multi-view images. Based on PETR, PETRv2 explores the effectiveness of temporal modeling, which utilizes the temporal information of previous frames to boost 3D object detection. More specifically, we extend the 3D position embedding (3D PE) in PETR for temporal modeling. The 3D PE achieves temporal alignment across different frames. A feature-guided position encoder is further introduced to improve the data adaptability of the 3D PE. To support multi-task learning (e.g., BEV segmentation and lane...
This paper introduces Doc2Bot, a novel dataset for building machines that help users seek information via conversations. This is of particular interest to companies and organizations that own a large number of manuals or instruction books. Despite its potential, the nature of our task poses several challenges: (1) documents contain various structures that hinder the ability of machines to comprehend them, and (2) user needs are often underspecified. Compared to prior datasets that either focus on a single structural type or overlook the role of questioning...
In this paper, we propose a novel approach to extract mattes using a pair of flash/no-flash images. Our approach, which we call flash matting, was inspired by the simple observation that the most noticeable difference between the flash and no-flash images is the foreground object if the background scene is sufficiently distant. We apply a new matting algorithm called joint Bayesian flash matting to robustly recover the matte from flash/no-flash images, even for scenes in which the foreground and background are similar or the background is complex. Experimental results involving a variety of complex indoors and outdoors...
In this paper, we propose a simple but effective image prior, the dark channel prior, to remove haze from a single input image. The dark channel prior is a kind of statistics of haze-free outdoor images. It is based on a key observation: most local patches in haze-free outdoor images contain some pixels which have very low intensities in at least one color channel. Using this prior with the haze imaging model, we can directly estimate the thickness of the haze and recover a high quality haze-free image. Results on a variety of outdoor haze images demonstrate the power of the proposed prior. Moreover, a high quality depth map can also be obtained as a by-product of haze removal.
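The observation behind the prior can be made concrete with a small sketch. The code below is only an illustration, not the authors' implementation: it computes a dark channel by taking the minimum over color channels at each pixel and then the minimum over a local patch. The function name `dark_channel`, the `patch_radius` parameter, the border clipping, and the image layout (rows of `(r, g, b)` tuples with values in [0, 1]) are all assumptions made for this example. It does not cover haze thickness estimation or image recovery.

```python
def dark_channel(image, patch_radius=1):
    """Dark channel of an RGB image given as rows of (r, g, b) tuples.

    Illustrative sketch only: names and the border handling are assumptions.
    """
    height, width = len(image), len(image[0])
    # Step 1: per-pixel minimum over the color channels.
    channel_min = [[min(pixel) for pixel in row] for row in image]
    # Step 2: minimum over the square patch centered on each pixel,
    # clipping the patch where it would extend past the image border.
    dark = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            y0, y1 = max(0, y - patch_radius), min(height, y + patch_radius + 1)
            x0, x1 = max(0, x - patch_radius), min(width, x + patch_radius + 1)
            dark[y][x] = min(
                channel_min[j][i] for j in range(y0, y1) for i in range(x0, x1)
            )
    return dark


# A bright 3x3 image with one pixel that is dark in its red channel.
bright = (0.8, 0.9, 0.7)
image = [[bright] * 3 for _ in range(3)]
image[1][1] = (0.1, 0.5, 0.6)
result = dark_channel(image, patch_radius=1)
```

With `patch_radius=1`, every 3x3 patch in this tiny image contains the center pixel, so the whole dark channel takes that pixel's lowest channel value. In a patch with no such low-intensity pixel, the dark channel stays high, which is the cue the abstract associates with haze.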
We have developed a simple time-bin phase encoding quantum key distribution system, using the optical injection locking technique. This setup incorporates both the merits of simplicity and stability in encoding, and immunity to channel disturbance. We have demonstrated the field implementation over long-distance deployed aerial fiber, running automatically. During the 70-day test, we achieved a secure key rate of approximately 1.0 kbps with stable performance. Our work takes an important step toward widespread implementation of QKD systems in diverse...
Although there have been significant advances in the field of image restoration recently, the system complexity of state-of-the-art (SOTA) methods is increasing as well, which may hinder convenient analysis and comparison of methods. In this paper, we propose a simple baseline that exceeds the SOTA methods and is computationally efficient. To further simplify the baseline, we reveal that nonlinear activation functions, e.g. Sigmoid, ReLU, GELU, Softmax, etc., are not necessary: they could be replaced by multiplication or removed....
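One way an activation could be replaced by multiplication is a gate that splits the channels into two halves and multiplies them element-wise. The sketch below assumes that form; the abstract as given here does not specify it. The function name `simple_gate` and the flat list-of-channels input are hypothetical choices for illustration.

```python
def simple_gate(features):
    """Multiply the first half of the channels by the second half.

    Illustrative sketch: the channel split is an assumption, not taken
    from the abstract above.
    """
    if len(features) % 2 != 0:
        raise ValueError("expected an even number of channels")
    half = len(features) // 2
    # Element-wise product of the two halves; the output has half the channels.
    return [a * b for a, b in zip(features[:half], features[half:])]


gated = simple_gate([1.0, 2.0, 3.0, 4.0])
```

The product of two feature values is already nonlinear in the input, which is why a gate of this kind can stand in for an explicit activation function.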
Lifelong learning (LL) is vital for advanced task-oriented dialogue (ToD) systems. To address the catastrophic forgetting issue of LL, generative replay methods are widely employed to consolidate past knowledge with generated pseudo samples. However, most existing generative replay methods use only a single task-specific token to control their models. This scheme is usually not strong enough to constrain the generative model due to the insufficient information involved. In this paper, we propose a novel method, prompt conditioned VAE for lifelong...
Expression recognition has been an important research direction in the field of psychology. Because human feelings are expressed through the muscles at the corners of the mouth, the eyes, and the face, it can be used in traffic, medical, security, and criminal investigation settings. Most existing work uses convolutional neural networks (CNNs) to recognize face images and thus classify expressions, which does achieve good results, but CNNs do not have enough ability to extract global features. The Transformer has advantages for global feature extraction, but is more...
Recently, pre-training methods have shown remarkable success in task-oriented dialog (TOD) systems. However, most existing pre-trained models for TOD focus on either dialog understanding or dialog generation, but not both. In this paper, we propose SPACE-3, a novel unified semi-supervised pre-trained conversation model learning from large-scale dialog corpora with limited annotations, which can be effectively fine-tuned on a wide range of downstream dialog tasks. Specifically, SPACE-3 consists of four successive components in a single...