Jiajun Deng

ORCID: 0000-0001-9624-7451
Research Areas
  • Advanced Neural Network Applications
  • Advanced Image and Video Retrieval Techniques
  • Robotics and Sensor-Based Localization
  • Multimodal Machine Learning Applications
  • Domain Adaptation and Few-Shot Learning
  • 2D Materials and Applications
  • Human Pose and Action Recognition
  • Advanced Vision and Imaging
  • Handwritten Text Recognition Techniques
  • Visual Attention and Saliency Detection
  • Graphene research and applications
  • 3D Surveying and Cultural Heritage
  • Remote-Sensing Image Classification
  • Rock Mechanics and Modeling
  • MXene and MAX Phase Materials
  • Optical Wireless Communication Technologies
  • Advanced Image Processing Techniques
  • Image Processing Techniques and Applications
  • Digital Media Forensic Detection
  • Industrial Vision Systems and Defect Detection
  • Ga2O3 and related materials
  • Advanced Photocatalysis Techniques
  • Medical Image Segmentation Techniques
  • Advanced Optical Sensing Technologies
  • Grouting, Rheology, and Soil Mechanics

Sun Yat-sen University
2025

University of Science and Technology of China
2018-2024

Tongji University
2022-2024

Australian Centre for Robotic Vision
2023-2024

The University of Adelaide
2023-2024

National University of Defense Technology
2024

North China Electric Power University
2018-2024

The University of Sydney
2022-2024

Guizhou University
2024

Shanghai University
2024

Recent advances on 3D object detection heavily rely on how the data are represented, i.e., voxel-based or point-based representation. Many existing high-performance detectors are point-based because this structure can better retain precise point positions. Nevertheless, point-level features lead to computation overheads due to their unordered storage. In contrast, the voxel-based structure is better suited for feature extraction but often yields lower accuracy as the input is divided into grids. In this paper, we take a slightly different viewpoint --- we find that...

10.1609/aaai.v35i2.16207 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2021-05-18
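
The abstract above contrasts point-based and voxel-based representations; below is a minimal, hedged sketch of the voxel-based side, assuming a simple uniform grid with mean-pooled point positions as voxel features (the detector's actual voxelization is not specified here).

# Minimal voxelization sketch (assumption: uniform grid, mean-pooled points per voxel).
import numpy as np

def voxelize(points, voxel_size=0.2, max_voxels=20000):
    # points: (N, 3) array of x, y, z coordinates
    coords = np.floor(points / voxel_size).astype(np.int64)          # grid index per point
    uniq, inverse = np.unique(coords, axis=0, return_inverse=True)   # one row per occupied voxel
    inverse = inverse.reshape(-1)
    feats = np.zeros((len(uniq), 3), dtype=np.float32)
    counts = np.zeros(len(uniq), dtype=np.float32)
    np.add.at(feats, inverse, points)                                 # sum point coords per voxel
    np.add.at(counts, inverse, 1.0)
    feats /= counts[:, None]                                          # mean position as voxel feature
    return uniq[:max_voxels], feats[:max_voxels]

coords, feats = voxelize(np.random.rand(10000, 3) * 50.0)
print(coords.shape, feats.shape)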

3D object detection is receiving increasing attention from both industry and academia thanks to its wide applications in various fields. In this paper, we propose Point-Voxel Region-based Convolution Neural Networks (PV-RCNNs) for 3D object detection on point clouds. First, we propose a novel detector, PV-RCNN, which boosts the performance by deeply integrating the feature learning of point-based set abstraction and voxel-based sparse convolution through two steps, i.e., voxel-to-keypoint scene encoding and keypoint-to-grid...

10.1007/s11263-022-01710-9 article EN cc-by International Journal of Computer Vision 2022-11-24
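
A rough sketch of the voxel-to-keypoint idea mentioned above: sampled keypoints aggregate features from nearby voxel centers. The radius query, mean pooling, and random keypoint choice below are illustrative assumptions, not the paper's exact set-abstraction layers.

# Illustrative voxel-to-keypoint aggregation (assumed radius query + mean pooling).
import numpy as np

def voxel_to_keypoint(voxel_centers, voxel_feats, keypoints, radius=1.0):
    # voxel_centers: (V, 3), voxel_feats: (V, C), keypoints: (K, 3)
    agg = np.zeros((len(keypoints), voxel_feats.shape[1]), dtype=np.float32)
    for i, kp in enumerate(keypoints):
        mask = np.linalg.norm(voxel_centers - kp, axis=1) < radius    # voxels inside the ball
        if mask.any():
            agg[i] = voxel_feats[mask].mean(axis=0)                   # pool neighboring voxel features
    return agg

centers = np.random.rand(500, 3) * 20
feats = np.random.rand(500, 32).astype(np.float32)
kps = centers[np.random.choice(500, 16, replace=False)]               # stand-in for sampled keypoints
print(voxel_to_keypoint(centers, feats, kps).shape)                   # (16, 32)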

In this paper, we present a neat yet effective transformer-based framework for visual grounding, namely TransVG, to address the task of grounding a language query to the corresponding region of an image. The state-of-the-art methods, including two-stage or one-stage ones, rely on a complex module with manually-designed mechanisms to perform the query reasoning and multi-modal fusion. However, the involvement of certain mechanisms in the fusion module design, such as query decomposition and image scene graph, makes the models easily overfit to datasets...

10.1109/iccv48922.2021.00179 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01
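
A compact PyTorch sketch of the box-regression idea described above: visual tokens, language tokens, and a learnable query token pass through a plain Transformer encoder, and a small head regresses the box from the query token. The dimensions, token names, and head design are assumptions for illustration, not the published model.

# Hedged sketch: transformer fusion of visual/text tokens with direct box regression.
import torch
import torch.nn as nn

class GroundingSketch(nn.Module):
    def __init__(self, d=256, heads=8, layers=4):
        super().__init__()
        self.reg_token = nn.Parameter(torch.zeros(1, 1, d))           # learnable query token
        enc_layer = nn.TransformerEncoderLayer(d, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, layers)
        self.box_head = nn.Linear(d, 4)                               # (cx, cy, w, h), normalized

    def forward(self, vis_tokens, txt_tokens):
        # vis_tokens: (B, Nv, d) flattened image features; txt_tokens: (B, Nt, d)
        b = vis_tokens.size(0)
        x = torch.cat([self.reg_token.expand(b, -1, -1), vis_tokens, txt_tokens], dim=1)
        x = self.encoder(x)
        return self.box_head(x[:, 0]).sigmoid()                       # box regressed from the query token

model = GroundingSketch()
print(model(torch.randn(2, 400, 256), torch.randn(2, 20, 256)).shape)  # torch.Size([2, 4])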

It has been well recognized that modeling object-to-object relations would be helpful for object detection. Nevertheless, the problem is not trivial especially when exploring the interactions between objects to boost video object detectors. The difficulty originates from the aspect that reliable relations in a video should depend on not only the objects in the present frame but also all the supportive objects extracted over a long-range span of the video. In this paper, we introduce a new design to capture the interactions across objects in spatio-temporal context. Specifically, we present Relation...

10.1109/iccv.2019.00712 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01
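
A simplified sketch of relating current-frame proposals to supportive proposals from other frames via attention, in the spirit of the relation modeling the abstract refers to; the single-head scaled dot-product form and random projections below are stand-ins, not the paper's module.

# Hedged sketch: attention from present-frame proposals to supportive proposals of nearby frames.
import torch
import torch.nn.functional as F

def relate(cur_feats, support_feats, dim=128):
    # cur_feats: (N, C) proposals of the present frame; support_feats: (M, C) from nearby frames
    q = cur_feats @ torch.randn(cur_feats.size(1), dim)               # random stand-in for learned projections
    k = support_feats @ torch.randn(support_feats.size(1), dim)
    v = support_feats
    attn = F.softmax(q @ k.t() / dim ** 0.5, dim=-1)                  # proposal-to-proposal affinity
    return cur_feats + attn @ v                                       # residual relation-enhanced feature

out = relate(torch.randn(32, 256), torch.randn(96, 256))
print(out.shape)  # torch.Size([32, 256])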

It has been well recognized that fusing the complementary information from depth-aware LiDAR point clouds and semantic-rich stereo images would benefit 3D object detection. Nevertheless, it is non-trivial to explore the inherently unnatural interaction between sparse 3D points and dense 2D pixels. To ease this difficulty, recent approaches generally project the 3D points onto the image plane to sample the image data and then aggregate the data at the points. However, these approaches often suffer from the mismatch between the resolution of point clouds and RGB images, leading to sub-optimal...

10.1109/tmm.2022.3189778 article EN IEEE Transactions on Multimedia 2022-07-11
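
A small sketch of the point-to-pixel sampling step described above: LiDAR points are projected with a camera matrix and image features are bilinearly sampled at the projected locations. The calibration matrix and feature shapes are made up purely for illustration.

# Hedged sketch: project 3D points to the image plane and sample 2D features there.
import torch
import torch.nn.functional as F

def sample_image_feats(points, img_feats, proj):
    # points: (N, 3); img_feats: (1, C, H, W); proj: (3, 4) camera projection matrix
    homo = torch.cat([points, torch.ones(points.size(0), 1)], dim=1)  # homogeneous coordinates
    uvw = homo @ proj.t()
    uv = uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)                     # pixel coordinates
    h, w = img_feats.shape[-2:]
    grid = torch.stack([uv[:, 0] / (w - 1), uv[:, 1] / (h - 1)], dim=-1) * 2 - 1
    grid = grid.view(1, 1, -1, 2)                                     # normalized to [-1, 1] for grid_sample
    sampled = F.grid_sample(img_feats, grid, align_corners=True)      # bilinear sampling
    return sampled.view(img_feats.size(1), -1).t()                    # (N, C) per-point image features

pts = torch.rand(100, 3) * 10
feats = torch.randn(1, 64, 48, 160)
P = torch.tensor([[700., 0., 80., 0.], [0., 700., 24., 0.], [0., 0., 1., 0.]])
print(sample_image_feats(pts, feats, P).shape)  # torch.Size([100, 64])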

Coral reef limestone at different depositional depths and facies differs remarkably in its textural and mineralogical characteristics, owing to complex sedimentary diagenesis. To explore the effects of the pore structure and mineral composition associated with diagenetic variation on the mechanical behavior of the limestone, a series of quasi-static and dynamic compression tests along with microscopic examinations were performed on samples from shallow and deep burial depths. It is revealed that the shallow reef limestone (SRL) is classified as a porous aragonite-type carbonate rock...

10.1016/j.ijmst.2024.07.004 article EN cc-by-nc-nd International Journal of Mining Science and Technology 2024-07-01

As an emerging data modality with precise distance sensing, LiDAR point clouds have had great expectations placed on them for 3D scene understanding. However, point clouds are always sparsely distributed in 3D space and stored in an unstructured way, which makes it difficult to represent them for effective 3D object detection. To this end, in this work, we regard point clouds as hollow-3D data and propose a new architecture, namely Hallucinated Hollow-3D R-CNN (H2-3D R-CNN)...

10.1109/tcsvt.2021.3100848 article EN IEEE Transactions on Circuits and Systems for Video Technology 2021-07-28

Temporal language grounding (TLG) is a fundamental and challenging problem for vision understanding. Existing methods mainly focus on the fully supervised setting with temporal boundary labels for training, which, however, suffers from the expensive cost of annotation. In this work, we are dedicated to weakly supervised TLG, where multiple description sentences are given for an untrimmed video without temporal boundary labels. In this task, it is critical to learn a strong cross-modal semantic alignment between sentence semantics and visual content. To this end,...

10.1109/tmm.2021.3096087 article EN IEEE Transactions on Multimedia 2021-08-24
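
A toy sketch of cross-modal alignment scoring in this weakly supervised setting: cosine similarity between a sentence embedding and mean-pooled candidate segments selects the best-matching window. The encoders, proposal scheme, and pooling are placeholders, not the paper's model.

# Hedged sketch: score sliding-window video segments against a sentence embedding.
import torch
import torch.nn.functional as F

def best_segment(clip_feats, sent_feat, window=8):
    # clip_feats: (T, D) per-clip features; sent_feat: (D,) sentence embedding
    scores = []
    for s in range(0, clip_feats.size(0) - window + 1):
        seg = clip_feats[s:s + window].mean(dim=0)                    # mean-pooled segment representation
        scores.append(F.cosine_similarity(seg, sent_feat, dim=0))
    scores = torch.stack(scores)
    start = int(scores.argmax())
    return start, start + window, scores                              # predicted temporal boundaries

s, e, scores = best_segment(torch.randn(64, 512), torch.randn(512))
print(s, e, scores.shape)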

In this work, we explore neat yet effective Transformer-based frameworks for visual grounding. The previous methods generally address the core problem of grounding, i.e., multi-modal fusion and reasoning, with manually-designed mechanisms. Such heuristic designs are not only complicated but also make models easily overfit to specific data distributions. To avoid this, we first propose TransVG, which establishes multi-modal correspondences by Transformers and localizes referred regions by directly regressing box...

10.1109/tpami.2023.3296823 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2023-07-19

Single shot detectors that are potentially faster and simpler than two-stage detectors tend to be more applicable to object detection in videos. Nevertheless, the extension of such detectors from image to video is not trivial, especially when appearance deterioration exists in videos, e.g., motion blur or occlusion. A valid question is how to explore temporal coherence across frames for boosting detection. In this paper, we propose to address the problem by enhancing per-frame features through aggregation of neighboring frames....

10.1109/tmm.2020.2990070 article EN IEEE Transactions on Multimedia 2020-04-23
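
A minimal sketch of the per-frame enhancement described above: feature maps of neighboring frames are aggregated into the current frame with similarity-derived weights. The per-pixel cosine weighting below is an illustrative assumption, not the paper's aggregation scheme.

# Hedged sketch: aggregate neighboring-frame feature maps with per-pixel similarity weights.
import torch
import torch.nn.functional as F

def aggregate(cur, neighbors):
    # cur: (C, H, W); neighbors: (K, C, H, W) feature maps of nearby frames
    stack = torch.cat([cur.unsqueeze(0), neighbors], dim=0)           # include the current frame itself
    sims = F.cosine_similarity(stack, cur.unsqueeze(0), dim=1)        # (K+1, H, W) similarity to current frame
    weights = F.softmax(sims, dim=0).unsqueeze(1)                     # normalize weights over frames
    return (weights * stack).sum(dim=0)                               # enhanced per-frame feature

out = aggregate(torch.randn(256, 20, 20), torch.randn(4, 256, 20, 20))
print(out.shape)  # torch.Size([256, 20, 20])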

In this work, we propose a new framework, called Document Image Transformer (DocTr), to address the issue of geometry and illumination distortion of document images. Specifically, DocTr consists of a geometric unwarping transformer and an illumination correction transformer. By setting a set of learned query embeddings, the geometric unwarping transformer captures the global context of the document image by the self-attention mechanism and decodes the pixel-wise displacement solution to correct the distortion. After unwarping, our illumination correction transformer further removes shading artifacts to improve the visual quality and OCR...

10.1145/3474085.3475388 article EN Proceedings of the 30th ACM International Conference on Multimedia 2021-10-17
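
A short sketch of the unwarping step the abstract describes: a dense pixel-wise displacement field is applied to the distorted image via grid sampling. The displacement here is random noise purely to show the mechanics, not a predicted field.

# Hedged sketch: warp a distorted document image with a dense displacement field.
import torch
import torch.nn.functional as F

def unwarp(image, displacement):
    # image: (1, 3, H, W); displacement: (1, H, W, 2) offsets in normalized [-1, 1] coordinates
    _, _, h, w = image.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
    base_grid = torch.stack([xs, ys], dim=-1).unsqueeze(0)            # identity sampling grid
    return F.grid_sample(image, base_grid + displacement, align_corners=True)

img = torch.rand(1, 3, 128, 96)
disp = torch.randn(1, 128, 96, 2) * 0.01                              # stand-in for the predicted displacement
print(unwarp(img, disp).shape)  # torch.Size([1, 3, 128, 96])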

In pixel-based reinforcement learning (RL), the states are raw video frames, which are mapped into a hidden representation before being fed to a policy network. To improve the sample efficiency of state representation learning, the most prominent recent work is based on contrastive unsupervised representation learning. Witnessing that consecutive frames in a game are highly correlated, to further improve data efficiency we propose a new algorithm, i.e., masked contrastive representation learning for RL (M-CURL), which takes the correlation among consecutive inputs into consideration. In our architecture,...

10.1109/tpami.2022.3176413 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2022-01-01
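
A condensed sketch of the masked-contrastive idea the abstract outlines: some frames in a stacked observation are masked, and an InfoNCE-style loss pulls the masked view toward the clean encoding. The toy encoder, masking ratio, and loss details are assumptions, not the paper's architecture.

# Hedged sketch: mask frames in an observation stack and contrast against the clean encoding.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Flatten(), nn.Linear(4 * 84 * 84, 256))    # toy frame-stack encoder

def masked_contrastive_loss(obs, mask_prob=0.5, temperature=0.1):
    # obs: (B, 4, 84, 84) stacks of consecutive frames
    mask = (torch.rand(obs.size(0), obs.size(1), 1, 1) > mask_prob).float()
    z_masked = F.normalize(encoder(obs * mask), dim=1)                # masked view
    z_clean = F.normalize(encoder(obs), dim=1)                        # clean view (contrastive target)
    logits = z_masked @ z_clean.t() / temperature                     # all-pairs similarities
    labels = torch.arange(obs.size(0))                                # positives on the diagonal
    return F.cross_entropy(logits, labels)

print(masked_contrastive_loss(torch.rand(8, 4, 84, 84)).item())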

Recent progress on weakly supervised object detection (WSOD) is characterized by formulating WSOD as a Multiple Instance Learning (MIL) problem and taking online refinement with the selected region proposals from MIL. However, MIL inclines to select the most discriminative part rather than the entire instance as the top-scoring proposals, which leads to weak localization capability for the detectors. We attribute this to the limited intra-class diversity within a single image. Specifically, due to the lack of annotated bounding...

10.1609/aaai.v35i4.16429 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2021-05-18
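
A brief sketch of the MIL formulation mentioned above, in the spirit of a WSDDN-style weakly supervised baseline: per-proposal class scores and proposal weights are multiplied and summed into image-level predictions trained against image labels only. This is a generic baseline for illustration, not the paper's online refinement scheme.

# Hedged sketch: image-level MIL loss over region-proposal scores (WSDDN-style baseline).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MILHead(nn.Module):
    def __init__(self, feat_dim=512, num_classes=20):
        super().__init__()
        self.cls = nn.Linear(feat_dim, num_classes)    # what class each proposal looks like
        self.det = nn.Linear(feat_dim, num_classes)    # how much each proposal matters per class

    def forward(self, proposal_feats):
        # proposal_feats: (R, D) features of the selected region proposals
        cls = F.softmax(self.cls(proposal_feats), dim=1)
        det = F.softmax(self.det(proposal_feats), dim=0)
        return (cls * det).sum(dim=0).clamp(1e-6, 1 - 1e-6)           # (num_classes,) image-level scores

head = MILHead()
image_labels = torch.zeros(20); image_labels[[3, 7]] = 1.0             # image-level labels only
loss = F.binary_cross_entropy(head(torch.randn(300, 512)), image_labels)
print(loss.item())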

LiDAR and Radar are two complementary sensing approaches in that LiDAR specializes in capturing an object's 3D shape while Radar provides longer detection ranges as well as velocity hints. Though seemingly natural, how to efficiently combine them for improved feature representation is still unclear. The main challenge arises from the fact that Radar data are extremely sparse and lack height information. Therefore, directly integrating Radar features into LiDAR-centric detection networks is not optimal. In this work, we introduce a bi-directional...

10.1109/cvpr52729.2023.01287 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

A recent trend is to combine multiple sensors (i.e., cameras, LiDARs and millimeter-wave Radars) to achieve robust multi-modal perception for autonomous systems such as self-driving vehicles. Although quite a few sensor fusion algorithms have been proposed, some of which are top-ranked on various leaderboards, a systematic study of how to integrate these three types of sensors to develop effective 3D object...

10.1109/lra.2022.3193465 article EN IEEE Robotics and Automation Letters 2022-07-25

Atom-substituting doping by atmospheric-pressure chemical vapor deposition (AP-CVD) is an effective and promising strategy for changing the properties of two-dimensional transition-metal dichalcogenides (2D TMDs). In this paper, we successfully grew V-doped MoSe2 films. The photoluminescence (PL) spectra gradually red-shifted with the increase of doping concentration, the X-ray photoelectron spectroscopy (XPS) peaks after doping shifted toward a lower binding energy, and the change of polarity before and after doping can be seen in the transfer...

10.1021/acs.jpcc.3c06829 article EN The Journal of Physical Chemistry C 2024-01-11

Two-dimensional (2D) WSe2 has received increasing attention due to its unique optical properties and bipolar behavior. Several WSe2-based heterojunctions exhibit bidirectional rectification characteristics, but most devices have a lower rectification ratio. In this work, the Bi2O2Se/WSe2 heterojunction prepared by us has a type II band alignment, which can vastly suppress the channel current through the interface barrier, so that the device has a large rectification ratio of about 10^5. Meanwhile, under different gate voltage modulation,...

10.1088/1674-4926/45/1/012701 article EN Journal of Semiconductors 2024-01-01

Current 3D Large Multimodal Models (3D LMMs) have shown tremendous potential in 3D-vision-based dialogue and reasoning. However, how to further enhance 3D LMMs to achieve fine-grained scene understanding and facilitate flexible human-agent interaction remains a challenging problem. In this work, we introduce 3D-LLaVA, a simple yet highly powerful 3D LMM designed to act as an intelligent assistant in comprehending, reasoning, and interacting with the 3D world. Unlike existing top-performing methods that rely on...

10.48550/arxiv.2501.01163 preprint EN arXiv (Cornell University) 2025-01-02