Jian Sun

ORCID: 0009-0006-9443-4046
Research Areas
  • Advanced Neural Network Applications
  • Topic Modeling
  • Advanced Image and Video Retrieval Techniques
  • Domain Adaptation and Few-Shot Learning
  • Speech and dialogue systems
  • Natural Language Processing Techniques
  • Multimodal Machine Learning Applications
  • Robotics and Sensor-Based Localization
  • Distributed Control Multi-Agent Systems
  • Video Surveillance and Tracking Methods
  • Robotic Path Planning Algorithms
  • Human Pose and Action Recognition
  • AI in Service Interactions
  • Stochastic Gradient Optimization Techniques
  • Visual Attention and Saliency Detection
  • Advanced Graph Neural Networks
  • Sparse and Compressive Sensing Techniques
  • Brain Tumor Detection and Classification
  • Advanced Vision and Imaging
  • Optical Network Technologies
  • Quantum Information and Cryptography
  • Machine Learning and Data Classification
  • Anomaly Detection Techniques and Applications
  • Software Testing and Debugging Techniques
  • Digital Media Forensic Detection

Hohai University
2024-2025

Beijing Institute of Technology
2022-2025

Beihang University
2022-2023

Chongqing University of Technology
2023

China XD Group (China)
2023

Electric Power Research Institute
2023

QuantumCTek (China)
2023

Anhui University
2023

Megvii (China)
2022

Taizhou University
2022

Existing pose estimation approaches fall into two categories: single-stage and multi-stage methods. While multi-stage methods are seemingly more suited for the task, their performance in current practice is not as good as that of single-stage methods. This work studies this issue. We argue that the current multi-stage methods' unsatisfactory performance comes from the insufficiency in various design choices. We propose several improvements, including the single-stage module design, cross stage feature aggregation, and coarse-to-fine supervision. The resulting method establishes the new state-of-the-art on...

10.48550/arxiv.1901.00148 preprint EN other-oa arXiv (Cornell University) 2019-01-01

In this work, we present a unified framework for multi-modality 3D object detection, named UVTR. The proposed method aims to unify multi-modality representations in the voxel space for accurate and robust single- or cross-modality 3D detection. To this end, the modality-specific space is first designed to represent different inputs in the voxel feature space. Different from previous work, our approach preserves the voxel space without height compression to alleviate semantic ambiguity and enable spatial connections. To make full use of multi-modality sensors, the cross-modality interaction is then proposed,...

10.48550/arxiv.2206.00630 preprint EN other-oa arXiv (Cornell University) 2022-01-01

In state-of-the-art image retrieval systems, an image is represented by a bag of visual words obtained by quantizing high-dimensional local image descriptors, and scalable schemes inspired by text retrieval are then applied for large scale image indexing and retrieval. Bag-of-words representations, however: 1) reduce the discriminative power of image features due to feature quantization; and 2) ignore geometric relationships among visual words. Exploiting such geometric constraints, by estimating a 2D affine transformation between a query image and each candidate image, has...

10.1109/cvprw.2009.5206566 article EN 2009 IEEE Conference on Computer Vision and Pattern Recognition 2009-06-01

Pre-trained models have proved to be powerful in enhancing task-oriented dialog systems. However, current pre-training methods mainly focus on enhancing dialog understanding and generation tasks while neglecting the exploitation of dialog policy. In this paper, we propose GALAXY, a novel pre-trained dialog model that explicitly learns dialog policy from limited labeled dialogs and large-scale unlabeled dialog corpora via semi-supervised learning. Specifically, we introduce a dialog act prediction task for policy optimization during pre-training and employ a consistency...

10.48550/arxiv.2111.14592 preprint EN cc-by arXiv (Cornell University) 2021-01-01

Digital twins are propelling the next generation of industrial revolution and serve as a key technology in enabling intelligent water conservancy. However, due to the diversity of objects within water conservancy scenarios and the complexity of related factors, the research and application of digital twins in this field remain immature. There are still significant challenges in constructing fine-grained, high-fidelity digital twins for water conservancy objects and their corresponding scenarios. In this context, taking polder areas as subjects, a digital twin polder area system is proposed, which includes...

10.1155/int/8899669 article EN cc-by International Journal of Intelligent Systems 2025-01-01

The importance of building text-to-SQL parsers which can be applied to new databases has long been acknowledged, and a critical step to achieve this goal is schema linking, i.e., properly recognizing mentions of unseen columns or tables when generating SQLs. In this work, we propose a novel framework to elicit relational structures from large-scale pre-trained language models (PLMs) via a probing procedure based on the Poincaré distance metric, and use the induced relations to augment current graph-based parsers for better...

10.1145/3534678.3539305 article EN Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2022-08-12

Text-to-SQL parsing is an essential and challenging task. The goal of text-to-SQL parsing is to convert a natural language (NL) question to its corresponding structured query language (SQL) based on the evidence provided by relational databases. Early text-to-SQL parsing systems from the database community achieved noticeable progress at the cost of heavy human engineering and user interactions with the systems. In recent years, deep neural networks have significantly advanced this task with neural generation models, which automatically learn a mapping function from an input...

10.48550/arxiv.2208.13629 preprint EN other-oa arXiv (Cornell University) 2022-01-01

We revisit large kernel design in modern convolutional neural networks (CNNs). Inspired by recent advances in vision transformers (ViTs), in this paper, we demonstrate that using a few large convolutional kernels instead of a stack of small kernels could be a more powerful paradigm. We suggest five guidelines, e.g., applying re-parameterized large depth-wise convolutions, to design efficient high-performance large-kernel CNNs. Following the guidelines, we propose RepLKNet, a pure CNN architecture whose kernel size is as large as 31x31, in contrast to the commonly used 3x3. RepLKNet...

10.48550/arxiv.2203.06717 preprint EN other-oa arXiv (Cornell University) 2022-01-01
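
The "re-parameterized" convolutions mentioned in the abstract above can be illustrated with a minimal sketch. It assumes, beyond what the abstract states, that a small kernel used in parallel with a large one is folded into the large kernel by zero-padding and addition, which works because convolution is linear. The function names and the 5x5/3x3 sizes are hypothetical and chosen only to keep the example small.

```python
def merge_kernels(large, small):
    """Fold a small odd-sized square kernel into a larger odd-sized one.

    Assumption for illustration: both kernels are applied to the same input
    with the same centre, so their responses simply add.
    """
    big, little = len(large), len(small)
    if big % 2 == 0 or little % 2 == 0 or little > big:
        raise ValueError("kernels must be odd-sized and small must fit in large")
    offset = (big - little) // 2          # zero-padding on each side
    merged = [row[:] for row in large]    # copy so the input is not mutated
    for i in range(little):
        for j in range(little):
            merged[offset + i][offset + j] += small[i][j]
    return merged


def response(patch, kernel):
    """Cross-correlate a kernel with the centre of a square patch."""
    offset = (len(patch) - len(kernel)) // 2
    size = len(kernel)
    return sum(
        kernel[i][j] * patch[offset + i][offset + j]
        for i in range(size)
        for j in range(size)
    )


large = [[1] * 5 for _ in range(5)]
small = [[2] * 3 for _ in range(3)]
merged = merge_kernels(large, small)
patch = [[r * 5 + c for c in range(5)] for r in range(5)]

# The two parallel branches and the single merged kernel agree on this patch.
two_branch = response(patch, large) + response(patch, small)
one_branch = response(patch, merged)
```

After merging, only the single large kernel needs to be evaluated, so the extra branch adds no cost once the merge has been done.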

Yingxiu Zhao, Zhiliang Tian, Huaxiu Yao, Yinhe Zheng, Dongkyu Lee, Yiping Song, Jian Sun, Nevin Zhang. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.

10.18653/v1/2022.acl-long.44 article EN cc-by Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022-01-01

In this paper, we propose a novel query design for transformer-based object detection. In previous transformer-based detectors, the object queries are a set of learned embeddings. However, each learned embedding does not have an explicit physical meaning and we cannot explain where it will focus. It is difficult to optimize, as the prediction slot of each object query does not have a specific mode. In other words, each object query will not focus on a specific region. To solve these problems, in our query design, object queries are based on anchor points, which are widely used in CNN-based detectors. So each object query focuses on the objects near the anchor point. Moreover, our query design can...

10.48550/arxiv.2109.07107 preprint EN other-oa arXiv (Cornell University) 2021-01-01

In this paper, we propose PETRv2, a unified framework for 3D perception from multi-view images. Based on PETR, PETRv2 explores the effectiveness of temporal modeling, which utilizes the temporal information of previous frames to boost 3D object detection. More specifically, we extend the 3D position embedding (3D PE) in PETR for temporal modeling. The 3D PE achieves the temporal alignment of object positions across different frames. A feature-guided position encoder is further introduced to improve the data adaptability of the 3D PE. To support multi-task learning (e.g., BEV segmentation and 3D lane...

10.48550/arxiv.2206.01256 preprint EN other-oa arXiv (Cornell University) 2022-01-01

10.1016/j.jfranklin.2023.02.036 article EN Journal of the Franklin Institute 2023-03-10

This paper introduces Doc2Bot, a novel dataset for building machines that help users seek information via conversations. This is of particular interest for companies and organizations that own a large number of manuals or instruction books. Despite its potential, the nature of our task poses several challenges: (1) documents contain various structures that hinder the ability of machines to comprehend, and (2) user information needs are often underspecified. Compared to prior datasets that either focus on a single structural type or overlook the role of questioning...

10.18653/v1/2022.findings-emnlp.131 article EN cc-by 2022-01-01

In this paper, we propose a novel approach to extract mattes using a pair of flash/no-flash images. Our approach, which we call flash matting, was inspired by the simple observation that the most noticeable difference between the flash and no-flash images is the foreground object if the background scene is sufficiently distant. We apply a new matting algorithm called joint Bayesian flash matting to robustly recover the matte from flash/no-flash images, even for scenes in which the foreground and the background are similar or the background is complex. Experimental results involving a variety of complex indoors and outdoors...

10.1145/1179352.1141954 article EN 2006-01-01

In this paper, we propose a simple but effective image prior - dark channel prior to remove haze from a single input image. The dark channel prior is a kind of statistics of the haze-free outdoor images. It is based on a key observation - most local patches in haze-free outdoor images contain some pixels which have very low intensities in at least one color channel. Using this prior with the haze imaging model, we can directly estimate the thickness of the haze and recover a high quality haze-free image. Results on a variety of outdoor haze images demonstrate the power of the proposed prior. Moreover, a high quality depth map can also be obtained as a by-product of haze removal.

10.1109/cvprw.2009.5206515 article EN 2009 IEEE Conference on Computer Vision and Pattern Recognition 2009-06-01
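
The key observation in the dark channel abstract above (low intensity in at least one color channel within a local patch) can be sketched in a few lines. This is a minimal illustration only: it computes a per-pixel minimum over the color channels followed by a minimum over a square neighbourhood. The function name, the list-of-tuples image layout, and the `patch_radius` parameter are assumptions made for this example, and the haze-thickness estimation and image recovery steps are not shown.

```python
def dark_channel(image, patch_radius=1):
    """Return the dark channel of an RGB image.

    image: list of rows, each row a list of (r, g, b) tuples in [0, 1].
    patch_radius: half-width of the square patch (hypothetical default).
    """
    height, width = len(image), len(image[0])
    # Step 1: minimum over the three color channels at every pixel.
    min_rgb = [[min(pixel) for pixel in row] for row in image]
    # Step 2: minimum over the local patch, clipped at the image border.
    dark = []
    for y in range(height):
        y0, y1 = max(0, y - patch_radius), min(height, y + patch_radius + 1)
        row = []
        for x in range(width):
            x0, x1 = max(0, x - patch_radius), min(width, x + patch_radius + 1)
            row.append(min(min_rgb[yy][xx]
                           for yy in range(y0, y1)
                           for xx in range(x0, x1)))
        dark.append(row)
    return dark


tiny = [
    [(0.9, 0.8, 0.7), (0.5, 0.6, 0.4)],
    [(0.3, 0.9, 0.9), (1.0, 1.0, 1.0)],
]
per_pixel = dark_channel(tiny, patch_radius=0)   # channel minimum only
patched = dark_channel(tiny, patch_radius=1)     # patch covers the whole 2x2 image
```

Under the prior, a haze-free outdoor patch is expected to give values near zero here, so larger dark-channel values hint at haze.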

We have developed a simple time-bin phase encoding quantum key distribution system, using the optical injection locking technique. This setup incorporates both the merits of simplicity and stability in encoding, and immunity to channel disturbance. We have demonstrated the field implementation of quantum key distribution over long-distance deployed aerial fiber automatically. During the 70-day field test, we achieved a secure key rate of approximately 1.0 kbps with stable performance. Our work takes an important step toward widespread implementation of QKD systems in diverse...

10.1364/oe.494318 article EN cc-by Optics Express 2023-07-04

Although there have been significant advances in the field of image restoration recently, the system complexity of the state-of-the-art (SOTA) methods is increasing as well, which may hinder the convenient analysis and comparison of methods. In this paper, we propose a simple baseline that exceeds the SOTA methods and is computationally efficient. To further simplify the baseline, we reveal that the nonlinear activation functions, e.g. Sigmoid, ReLU, GELU, Softmax, etc. are not necessary: they could be replaced by multiplication or removed....

10.48550/arxiv.2204.04676 preprint EN other-oa arXiv (Cornell University) 2022-01-01
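
One way to read "replaced by multiplication" in the abstract above is a gate that multiplies two halves of a feature vector element-wise. The sketch below shows only that reading. Splitting the channels into two equal halves is an assumption not stated in the abstract excerpt, and `simple_gate` is a hypothetical name for this example.

```python
def simple_gate(features):
    """Multiply the first half of a feature vector by the second half.

    Assumption for illustration: the channel count is even and the two
    halves are paired in order. No activation function is applied.
    """
    if len(features) % 2 != 0:
        raise ValueError("feature vector must have an even number of channels")
    half = len(features) // 2
    return [a * b for a, b in zip(features[:half], features[half:])]


gated = simple_gate([1.0, 2.0, 3.0, 4.0])
```

The output has half as many channels as the input. The product of two features is itself nonlinear in the input, which is why such a gate can stand in for an explicit activation.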

Lifelong learning (LL) is vital for advanced task-oriented dialogue (ToD) systems. To address the catastrophic forgetting issue of LL, generative replay methods are widely employed to consolidate past knowledge with generated pseudo samples. However, most existing generative replay methods use only a single task-specific token to control their models. This scheme is usually not strong enough to constrain the generative model due to the insufficient information involved. In this paper, we propose a novel method, prompt conditioned VAE for lifelong...

10.18653/v1/2022.emnlp-main.766 article EN cc-by Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing 2022-01-01

Expression recognition has been an important research direction in the field of psychology, which can be used in traffic, medical, security, and criminal investigation by expressing human feelings through the muscles in the corners of the mouth, eyes, and face. Most of the existing work uses convolutional neural networks (CNN) to recognize face images and thus classify expressions, which does achieve good results, but CNNs do not have enough ability to extract global features. The Transformer has advantages for global feature extraction, but is more...

10.3934/mfc.2022018 article EN Mathematical Foundations of Computing 2022-07-04

Recently, pre-training methods have shown remarkable success in task-oriented dialog (TOD) systems. However, most existing pre-trained models for TOD focus on either dialog understanding or dialog generation, but not both. In this paper, we propose SPACE-3, a novel unified semi-supervised pre-trained conversation model learning from large-scale dialog corpora with limited annotations, which can be effectively fine-tuned on a wide range of downstream dialog tasks. Specifically, SPACE-3 consists of four successive components in a single...

10.48550/arxiv.2209.06664 preprint EN cc-by arXiv (Cornell University) 2022-01-01