Ning Xu

ORCID: 0000-0002-7526-4356
Research Areas
  • Multimodal Machine Learning Applications
  • Advanced Image and Video Retrieval Techniques
  • Human Pose and Action Recognition
  • VLSI and FPGA Design Techniques
  • Video Analysis and Summarization
  • Domain Adaptation and Few-Shot Learning
  • Embedded Systems Design Techniques
  • VLSI and Analog Circuit Testing
  • Advanced Vision and Imaging
  • Low-power high-performance VLSI design
  • Generative Adversarial Networks and Image Synthesis
  • Anomaly Detection Techniques and Applications
  • Image Retrieval and Classification Techniques
  • Handwritten Text Recognition Techniques
  • Image Enhancement Techniques
  • Speech Recognition and Synthesis
  • Advanced Computational Techniques and Applications
  • Music and Audio Processing
  • Image and Video Quality Assessment
  • Interconnection Networks and Systems
  • Advanced Neural Network Applications
  • 3D IC and TSV technologies
  • Robotic Path Planning Algorithms
  • Rough Sets and Fuzzy Logic
  • Sentiment Analysis and Opinion Mining

Affiliations

Tianjin University
2016-2025

Wuhan University of Technology
2013-2024

Hangzhou Dianzi University
2024

General Hospital of Shenyang Military Region
2024

Adobe Systems (United States)
2019-2024

Zhejiang University
2024

Shanghai Jiao Tong University
2019-2024

Academy of Art University
2024

Ningbo University
2024

Beijing Institute of Technology
2014-2023

Publications

Interactive object selection is a very important research problem with many applications. Previous algorithms require substantial user interactions to estimate the foreground and background distributions. In this paper, we present a novel deep-learning-based algorithm which has a much better understanding of objectness and thus can reduce user interactions to just a few clicks. Our algorithm transforms user-provided positive and negative clicks into two Euclidean distance maps, which are then concatenated with the RGB channels of images to compose (image,...

10.1109/cvpr.2016.47 preprint EN 2016-06-01
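
A minimal sketch of the input construction described in this abstract, assuming NumPy/SciPy are available; the function names, the 255 cap, and the handling of empty click lists are illustrative choices, not the paper's reference code.

```python
# Illustrative sketch only: user clicks -> Euclidean distance maps -> 5-channel input.
import numpy as np
from scipy.ndimage import distance_transform_edt

def clicks_to_distance_map(clicks, height, width, cap=255.0):
    """clicks: iterable of (row, col) pixel coordinates for one click type."""
    if not clicks:                       # no clicks of this type yet
        return np.full((height, width), cap, dtype=np.float32)
    mask = np.ones((height, width), dtype=bool)
    for r, c in clicks:
        mask[r, c] = False               # click locations become zeros
    dist = distance_transform_edt(mask)  # per-pixel distance to the nearest click
    return np.minimum(dist, cap).astype(np.float32)

def build_network_input(rgb, positive_clicks, negative_clicks):
    """rgb: HxWx3 array; returns an HxWx5 array to feed the segmentation network."""
    h, w = rgb.shape[:2]
    pos_map = clicks_to_distance_map(positive_clicks, h, w)
    neg_map = clicks_to_distance_map(negative_clicks, h, w)
    return np.dstack([rgb.astype(np.float32), pos_map, neg_map])
```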

The Internet of Vehicles (IoV), when empowered by aerial communications, provides vehicles with seamless connections and proximate computing services. The unpredictable network dynamics of aerial-assisted IoV pose challenges to resource allocation. In this article, a dynamic digital twin (DT) is established to capture the time-varying supply and demands, so that unified scheduling and allocation can be performed. We design a two-stage incentive mechanism based on the Stackelberg game, where the DT or road side units...

10.1109/jiot.2021.3058213 article EN IEEE Internet of Things Journal 2021-02-10

Image captioning is one of the primary goals in computer vision, aiming to automatically generate natural descriptions for images. Intuitively, the human visual system can notice some stimulating regions at first glance, and then volitionally focus on interesting objects within a region. For example, for a free-form sentence about "boy-catch-baseball", the region involving "boy" and "baseball" could be attended to guide salient object discovery and word-by-word generation. Till now, previous works mainly rely...

10.1109/tcsvt.2021.3107035 article EN IEEE Transactions on Circuits and Systems for Video Technology 2021-08-24

An algorithm based on independent component analysis (ICA) is introduced for P300 detection. After ICA decomposition, P300-related components are selected according to a priori knowledge of the spatio-temporal pattern, and a clear P300 peak is reconstructed by back projection of the selected ICA components. Applied to dataset IIb of BCI Competition 2003, the algorithm achieved an accuracy of 100% in P300 detection within five repetitions.

10.1109/tbme.2004.826699 article EN IEEE Transactions on Biomedical Engineering 2004-05-25
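
A rough sketch of the decompose-select-back-project pipeline summarized above, using scikit-learn's FastICA as a stand-in; how the P300-related components are chosen (here simply passed in by index) is the part the paper derives from spatio-temporal prior knowledge.

```python
# Illustrative sketch: ICA decomposition, keep selected components, back-project.
import numpy as np
from sklearn.decomposition import FastICA

def reconstruct_p300(eeg, p300_component_idx):
    """eeg: (n_samples, n_channels) EEG; p300_component_idx: components to keep."""
    ica = FastICA(n_components=eeg.shape[1], random_state=0)
    sources = ica.fit_transform(eeg)           # (n_samples, n_components)
    kept = np.zeros_like(sources)
    kept[:, p300_component_idx] = sources[:, p300_component_idx]
    return ica.inverse_transform(kept)         # back projection to channel space
```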

Combining complementary information from multiple modalities is intuitively appealing for improving the performance of learning-based approaches. However, it is challenging to fully leverage different modalities due to practical issues such as varying levels of noise and conflicts between modalities. Existing methods do not adopt a joint approach to capturing synergies while simultaneously filtering noise and resolving conflicts on a per-sample basis. In this work we propose a novel deep neural network based technique that...

10.48550/arxiv.1805.11730 preprint EN other-oa arXiv (Cornell University) 2018-01-01

Dense video captioning is an extremely challenging task since an accurate and coherent description of events in a video requires a holistic understanding of video contents as well as contextual reasoning over individual events. Most existing approaches handle this problem by first detecting event proposals from a video and then captioning a subset of the proposals. As a result, the generated sentences are prone to be redundant or inconsistent since they fail to consider the temporal dependency between events. To tackle this challenge, we propose a novel dense video captioning framework, which...

10.1109/cvpr.2019.00675 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

The air-ground network provides users with seamless connections and real-time services, while its resource constraint triggers a paradigm shift from machine learning to federated learning. Federated learning enables clients to collaboratively train models without sharing data. Digital twins provide a virtual representation of the networks that reflects their time-varying status, which in combination helps reconcile the conflict between privacy protection and data training in the networks. In this paper, we consider dynamic digital twin...

10.1109/tnse.2020.3048137 article EN IEEE Transactions on Network Science and Engineering 2020-12-30
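
The abstract's premise that federated learning lets clients collaboratively train models without sharing data reduces, in its simplest form, to server-side weighted averaging of client updates. The sketch below is generic federated averaging under that assumption; it does not reproduce the paper's digital-twin-driven scheduling, and the names are illustrative.

```python
# Generic federated-averaging step (illustrative; not the paper's DT scheme).
import numpy as np

def federated_average(client_weights, client_sample_counts):
    """client_weights: list of {param_name: np.ndarray}; counts weight each client."""
    total = float(sum(client_sample_counts))
    averaged = {}
    for name in client_weights[0]:
        averaged[name] = sum(
            w[name] * (n / total)
            for w, n in zip(client_weights, client_sample_counts)
        )
    return averaged
```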

Most previous bounding-box-based segmentation methods assume the bounding box tightly covers the object of interest. However, it is common that a rectangle input could be too large or too small. In this paper, we propose a novel approach that uses the bounding box as a soft constraint by transforming it into a Euclidean distance map. A convolutional encoder-decoder network is trained end-to-end by concatenating images with these distance maps as inputs and predicting the object masks as outputs. Our method gets accurate segmentation results given sloppy rectangles while being...

10.5244/c.31.182 preprint EN 2017-01-01
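
A sketch of the "bounding box as soft constraint" idea from this abstract, assuming the distance map measures each pixel's distance to the rectangle's boundary; the exact transform in the paper may differ, and the function and cap value are made up for illustration.

```python
# Illustrative only: loose rectangle -> Euclidean distance map extra channel.
import numpy as np
from scipy.ndimage import distance_transform_edt

def box_to_distance_map(height, width, top, left, bottom, right, cap=255.0):
    boundary = np.ones((height, width), dtype=bool)
    boundary[top, left:right + 1] = False      # mark the four box edges as zeros
    boundary[bottom, left:right + 1] = False
    boundary[top:bottom + 1, left] = False
    boundary[top:bottom + 1, right] = False
    dist = distance_transform_edt(boundary)    # distance to the nearest box edge
    return np.minimum(dist, cap)
```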

Recent progress in using recurrent neural networks (RNNs) for video description has attracted increasing interest, due to their capability to encode a sequence of frames for caption generation. While existing methods have studied various features (e.g., CNN, 3D CNN, and semantic attributes) for visual encoding, the representation and fusion of heterogeneous information from multi-modal spaces have not been fully explored. Considering that different modalities are often asynchronous, frame-level concatenation (or linear fusion)...

10.1109/tcsvt.2018.2867286 article EN IEEE Transactions on Circuits and Systems for Video Technology 2018-08-28

Human action recognition is an active research area in both the computer vision and machine learning communities. In the past decades, the problem has evolved from the conventional single-view setting to cross-view learning, cross-domain learning, and multitask learning, and a large number of algorithms have been proposed in the literature. Despite the existing datasets, most of them are designed for a subset of the four problems, and comparisons between algorithms can be further limited by variances within experimental configurations, among other factors. To the best of our...

10.1109/tcyb.2016.2582918 article EN IEEE Transactions on Cybernetics 2016-07-18

Different from the fully-supervised action detection problem that depends on expensive frame-level annotations, weakly supervised action detection (WSAD) only needs video-level annotations, making it more practical for real-world applications. Existing WSAD methods detect action instances by scoring each video segment (a stack of frames) individually. Most of them fail to model the temporal relations among segments and cannot effectively characterize action instances possessing latent structure. To alleviate this problem in WSAD, we propose a structure...

10.1109/iccv.2019.00562 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01

Image captioning is one of the most challenging tasks in AI because it requires an understanding of both complex visuals and natural language. Because image captioning is essentially a sequential prediction task, recent advances have used reinforcement learning (RL) to better explore the dynamics of word-by-word generation. However, existing RL-based methods rely primarily on a single policy network and reward function, an approach that is not well matched to the multi-level (word and sentence) and multi-modal (vision and language) nature...

10.1109/tmm.2019.2941820 article EN IEEE Transactions on Multimedia 2019-09-18

10.1016/j.jvcir.2018.12.027 article EN Journal of Visual Communication and Image Representation 2018-12-14

Image captioning aims at understanding various semantic concepts (e.g., objects and relationships) from an image and integrating them into a sentence-level description. Hence, it is necessary to learn the interaction among these concepts. If we define the context of a concept to be the involved subject-predicate-object triplet, most current methods only focus on a single triplet for the first-order interaction to generate sentences....

10.1109/tcsvt.2021.3121062 article EN IEEE Transactions on Circuits and Systems for Video Technology 2021-10-19

Domain-invariant (view-invariant and modality-invariant) feature representation is essential for human action recognition. Moreover, given a discriminative visual representation, it is critical to discover the latent correlations among multiple actions in order to facilitate action modeling. To address these problems, we propose a multi-domain multi-task learning (MDMTL) method to: 1) extract domain-invariant information from multi-view multi-modal data and 2) explore the relatedness among action categories. Specifically, we present a sparse...

10.1109/tip.2018.2872879 article EN IEEE Transactions on Image Processing 2018-09-28

Incremental learning targets achieving good performance on new categories without forgetting old ones. Knowledge distillation has been shown to be critical in preserving performance on old classes. Conventional methods, however, sequentially distill knowledge only from the last model, leading to degradation on old classes in later incremental steps. In this paper, we propose a multi-model and multi-level knowledge distillation strategy. Instead of sequentially distilling knowledge only from the last model, we directly leverage all previous model snapshots. In addition, we incorporate an auxiliary...

10.48550/arxiv.1904.01769 preprint EN other-oa arXiv (Cornell University) 2019-01-01
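
A bare-bones sketch of the "distill from all previous model snapshots" idea in the abstract, written against PyTorch; the plain temperature-scaled KL term and the equal weighting across snapshots are simplifications of mine, not the paper's full multi-level loss.

```python
# Illustrative multi-snapshot distillation loss (simplified, not the paper's exact loss).
import torch
import torch.nn.functional as F

def multi_snapshot_distillation_loss(student_logits, snapshot_logits_list,
                                     old_class_count, temperature=2.0):
    """Average KL between the student and every stored snapshot over old classes."""
    t = temperature
    losses = []
    for snap_logits in snapshot_logits_list:
        p_old = F.softmax(snap_logits[:, :old_class_count] / t, dim=1)
        log_p_new = F.log_softmax(student_logits[:, :old_class_count] / t, dim=1)
        losses.append(F.kl_div(log_p_new, p_old, reduction="batchmean") * t * t)
    return torch.stack(losses).mean()
```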

Knowledge-based Visual Question Answering (KB-VQA) aims to answer image-aware questions via external knowledge, which requires an agent to not only understand images but also explicitly retrieve and integrate knowledge facts. Intuitively, to accurately answer a question, we humans can validate the retrieved facts based on our memory, and then align those facts with image regions to infer answers. However, most existing methods ignore the process of validation and alignment. In this paper, we propose a Multi-Modal Validation Domain...

10.1109/tkde.2024.3384270 article EN IEEE Transactions on Knowledge and Data Engineering 2024-04-12

Understanding the structures of oxygen vacancies in bulk ceria is crucial as they significantly impact the material's catalytic and electronic properties. The complex interaction between Ce3+ ions presents challenges in characterizing ceria's defect chemistry. We introduced a machine learning-assisted cluster-expansion model to accurately predict the energetics of defective configurations within ceria. This model effectively samples configurational spaces, detailing vacancy behavior across different temperatures...

10.1021/acs.jpclett.4c00889 article EN The Journal of Physical Chemistry Letters 2024-05-28

Interactive object selection is a very important research problem with many applications. Previous algorithms require substantial user interactions to estimate the foreground and background distributions. In this paper, we present a novel deep learning based algorithm which has a much better understanding of objectness and thus can reduce user interactions to just a few clicks. Our algorithm transforms user-provided positive and negative clicks into two Euclidean distance maps, which are then concatenated with the RGB channels of images to compose (image,...

10.48550/arxiv.1603.04042 preprint EN other-oa arXiv (Cornell University) 2016-01-01