Li Niu

ORCID: 0000-0003-1970-8634
Research Areas
  • Multimodal Machine Learning Applications
  • Advanced Image and Video Retrieval Techniques
  • Domain Adaptation and Few-Shot Learning
  • Generative Adversarial Networks and Image Synthesis
  • Human Pose and Action Recognition
  • Advanced Image Processing Techniques
  • Image Enhancement Techniques
  • Advanced Neural Network Applications
  • Advanced Vision and Imaging
  • Visual Attention and Saliency Detection
  • Anomaly Detection Techniques and Applications
  • Image and Signal Denoising Methods
  • COVID-19 diagnosis using AI
  • Image Retrieval and Classification Techniques
  • Machine Learning and Data Classification
  • Sensorless Control of Electric Motors
  • Image Processing Techniques and Applications
  • Text and Document Classification Technologies
  • Video Surveillance and Tracking Methods
  • Color Science and Applications
  • Video Analysis and Summarization
  • Digital Media Forensic Detection
  • Aesthetic Perception and Analysis
  • Multilevel Inverters and Converters
  • Electric Motor Design and Analysis

Shanghai Jiao Tong University
2018-2025

Nanjing University
2023-2025

Harbin Institute of Technology
2011-2024

Xinyang Normal University
2023

Beijing University of Chemical Technology
2020-2021

Chinese Academy of Sciences
2021

Yunnan University
2018-2020

Shanghai Municipal Education Commission
2019-2020

Dongfang Electric Corporation (China)
2020

China General Nuclear Power Corporation (China)
2019

Textual-visual cross-modal retrieval has been a hot research topic in both the computer vision and natural language processing communities. Learning appropriate representations for multi-modal data is crucial to the retrieval performance. Unlike existing image-text embedding approaches that embed image-text pairs as single feature vectors in a common representational space, we propose to incorporate generative processes into the cross-modal feature embedding, through which we are able to learn not only the global abstract features but also the local grounded features...

10.1109/cvpr.2018.00750 preprint EN 2018-06-01
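
The shared-embedding retrieval step described above can be illustrated with a minimal numpy sketch. The projection matrices here are random stand-ins for learned weights, the feature dimensions are arbitrary, and the paper's generative components are omitted; this only shows how two modalities are projected into one space and ranked by cosine similarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def project(x, W):
    """Linear projection into the shared space, followed by L2 normalization."""
    z = x @ W
    return z / np.linalg.norm(z, axis=1, keepdims=True)

# Toy features: 4 images (2048-d CNN features) and 4 captions (300-d text features).
img_feats = rng.normal(size=(4, 2048))
txt_feats = rng.normal(size=(4, 300))

# Hypothetical learned projections into a 256-d common space.
W_img = rng.normal(size=(2048, 256))
W_txt = rng.normal(size=(300, 256))

img_emb = project(img_feats, W_img)
txt_emb = project(txt_feats, W_txt)

# Cross-modal retrieval: rank images for each caption by cosine similarity.
sim = txt_emb @ img_emb.T            # (4 captions) x (4 images)
ranking = np.argsort(-sim, axis=1)   # best-matching image first for each caption
```

With unit-normalized embeddings, the dot product equals cosine similarity, so a single matrix multiply scores every caption against every image.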

Image composition is an important operation in image processing, but the inconsistency between foreground and background significantly degrades the quality of the composite image. Image harmonization, aiming to make the foreground compatible with the background, is a promising yet challenging task. However, the lack of a high-quality publicly available dataset for image harmonization greatly hinders the development of image harmonization techniques. In this work, we contribute an image harmonization dataset iHarmony4 by generating synthesized composite images based on the COCO (resp., Adobe5k, Flickr, day2night) dataset...

10.1109/cvpr42600.2020.00842 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01
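
The dataset-construction idea above, building (composite, mask, real) triples by perturbing the foreground of a real image, can be sketched as a toy numpy routine. The gain/bias color shift here is purely illustrative and is not the paper's actual transfer functions.

```python
import numpy as np

rng = np.random.default_rng(0)

def synthesize_composite(real, mask, gain, bias):
    """Apply a color transfer only inside the foreground mask, so the
    (composite, mask, real) triple can supervise a harmonization model."""
    composite = real.astype(np.float64)
    fg = mask.astype(bool)
    composite[fg] = np.clip(composite[fg] * gain + bias, 0, 255)
    return composite.astype(np.uint8)

real = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
mask = np.zeros((64, 64), dtype=np.uint8)
mask[16:48, 16:48] = 1                      # square foreground region

composite = synthesize_composite(real, mask, gain=1.3, bias=10.0)

# Background pixels are untouched; foreground pixels are color-shifted.
assert np.array_equal(composite[mask == 0], real[mask == 0])
```

The real image then serves as the ground-truth harmonization target for the synthesized composite.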

Existing semantic segmentation models heavily rely on dense pixel-wise annotations. To reduce the annotation pressure, we focus on a challenging task named zero-shot semantic segmentation, which aims to segment unseen objects with zero annotations. This task can be accomplished by transferring knowledge across categories via semantic word embeddings. In this paper, we propose a novel context-aware feature generation method for zero-shot segmentation named CaGNet. In particular, with the observation that a pixel-wise feature highly depends on its contextual information, we insert a contextual module in the segmentation network...

10.1145/3394171.3413593 article EN Proceedings of the 30th ACM International Conference on Multimedia 2020-10-12

Recently, DETR and Deformable DETR have been proposed to eliminate the need for many hand-designed components in object detection while demonstrating good performance comparable to previous complex hand-crafted detectors. However, their performance on Video Object Detection (VOD) has not been well explored. In this paper, we present TransVOD, an end-to-end video object detection model based on a spatial-temporal Transformer architecture. The goal of this paper is to streamline the pipeline of VOD, effectively removing the need for many hand-crafted components for feature aggregation, e.g., optical flow...

10.1145/3474085.3475285 article EN Proceedings of the 30th ACM International Conference on Multimedia 2021-10-17

In order to generate images for a given category, existing deep generative models generally rely on abundant training images. However, extensive data acquisition is expensive, and fast learning ability from limited data is necessarily required in real-world applications. Also, these methods are not well-suited for fast adaptation to a new category. Few-shot image generation, aiming to generate images from only a few images for a new category, has attracted some research interest. In this paper, we propose a Fusing-and-Filling Generative Adversarial Network (F2GAN)...

10.1145/3394171.3413561 article EN Proceedings of the 30th ACM International Conference on Multimedia 2020-10-12

Given a composite image with inharmonious foreground and background, image harmonization aims to adjust the foreground to make it compatible with the background. Previous methods mainly focus on learning the mapping from composite image to real image, while ignoring the crucial guidance role that the background plays. In this work, we formulate the image harmonization task as background-guided domain translation. Specifically, we use a background domain code extractor to capture background information to guide the harmonization, which is regulated by well-tailored triplet losses. Extensive experiments on the benchmark...

10.1109/icme51207.2021.9428394 article EN 2021 IEEE International Conference on Multimedia and Expo (ICME) 2021-06-09
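
The "well-tailored triplet losses" mentioned above are typically built on a standard hinge-style triplet loss. A minimal numpy sketch follows; the vectors stand in for domain codes, and the loss form is the generic one, not necessarily the paper's exact variant.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: pull anchor toward positive,
    push it at least `margin` farther from negative (squared distances)."""
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return np.maximum(0.0, d_pos - d_neg + margin)

# Toy example: anchor already matches positive and is far from negative,
# so the hinge is inactive and the loss is zero.
loss = triplet_loss(np.zeros(4), np.zeros(4), np.ones(4))
```

In the harmonization setting, the anchor could be the code of the harmonized foreground, the positive the background domain code, and the negative the unharmonized composite's code.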

Given a composite image, image harmonization aims to adjust the foreground to make it compatible with the background. High-resolution image harmonization is in high demand, but still remains unexplored. Conventional image harmonization methods learn a global RGB-to-RGB transformation which could effortlessly scale to high resolution, but ignore diverse local context. Recent deep learning methods learn dense pixel-to-pixel transformations which could generate harmonious outputs, but are highly constrained in low resolution. In this work, we propose a high-resolution image harmonization network with Collaborative Dual...

10.1109/cvpr52688.2022.01792 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

In this work, we formulate a new weakly supervised domain generalization approach for visual recognition by using loosely labeled web images/videos as training data. Specifically, we aim to address two challenging issues when learning robust classifiers: 1) coping with noise in the labels of the training samples in the source domain; and 2) enhancing the generalization capability of the learnt classifiers to any unseen target domain. To address the first issue, we partition the training samples in each class into multiple clusters. By treating each cluster as a "bag" and the samples in it as "instances",...

10.1109/cvpr.2015.7298894 article EN 2015-06-01

This paper presents a novel on-line inertia identification method with a load torque observer to optimize the speed-loop PID parameters of a servo system. The proposed algorithm adopts fixed-order recursive empirical frequency-domain optimal parameter estimation to improve the identification performance. A load torque observer is employed in order to obtain a more precise value of the inertia. Then, the PI parameter optimization based on the identified inertia is deduced in the frequency domain. Compared with the recursive least square algorithm, the effectiveness of the proposed method is demonstrated by simulation and experimental results.

10.1109/tpel.2014.2307061 article EN IEEE Transactions on Power Electronics 2014-02-20
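
The recursive least squares baseline the paper compares against can be sketched for the scalar case. Assuming the simplified motion equation J·dω/dt = Te − TL with the load torque taken as observed, the speed increment per sample gives the regression y = θ·φ with θ = Ts/J. All plant constants below are illustrative, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative plant: J * dw/dt = Te - TL
J_true, T_load, Ts = 0.02, 0.5, 1e-3   # inertia [kg*m^2], load torque [N*m], sample time [s]

# RLS state for the scalar parameter theta = Ts / J.
theta, P, lam = 0.0, 1e6, 0.99         # estimate, covariance, forgetting factor

w = 0.0
for k in range(500):
    Te = 1.0 + 0.5 * np.sin(0.01 * k)             # exciting torque command
    w_next = w + Ts / J_true * (Te - T_load) + rng.normal(scale=1e-5)
    phi = Te - T_load                             # regressor (load torque assumed observed)
    y = w_next - w                                # measured speed increment
    K = P * phi / (lam + phi * P * phi)           # RLS gain
    theta += K * (y - phi * theta)                # parameter update
    P = (P - K * phi * P) / lam                   # covariance update
    w = w_next

J_est = Ts / theta                                # recover the inertia estimate
```

With persistent excitation and low noise, the estimate converges to the true inertia within a few samples; the paper's frequency-domain method targets the cases where this baseline degrades.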

Fine-grained image classification, which targets at distinguishing subtle distinctions among various subordinate categories, remains a very difficult task due to the high annotation cost of enormous fine-grained categories. To cope with the scarcity of well-labeled training images, existing works mainly follow two research directions: 1) utilize freely available web images without human annotation; 2) only annotate some fine-grained categories and transfer the knowledge to other fine-grained categories, which falls into the scope of zero-shot learning...

10.1109/cvpr.2018.00749 article EN 2018-06-01

In this paper, we propose a new multi-view domain generalization (MVDG) approach for visual recognition, in which we aim to use the source domain samples with multiple types of features (i.e., multi-view features) to learn robust classifiers that can generalize well to any unseen target domain. Considering that recent works show the generalization capability can be enhanced by fusing multiple SVM classifiers, we build upon exemplar SVMs to learn a set of SVM classifiers by using one positive sample and all negative samples each time. When the source samples come from multiple latent domains, we expect the weight vectors to be organized...

10.1109/iccv.2015.477 article EN 2015-12-01

Video understanding has achieved great success in representation learning, such as video caption, video object grounding, and descriptive question-answering. However, current methods still struggle on video reasoning, including evidence reasoning and commonsense reasoning. To facilitate deeper video understanding towards video reasoning, we present the task of Causal-VidQA, which includes four types of questions ranging from scene description (description) to evidence reasoning (explanation) and commonsense reasoning (prediction and counterfactual). For the reasoning questions, we set up a two-step solution by answering...

10.1109/cvpr52688.2022.02059 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Deep image inpainting can inpaint a corrupted image using feed-forward inference, but still fails to handle large missing areas or complex semantics. Recently, GAN inversion based methods propose to leverage the semantic information in a pretrained generator (e.g., StyleGAN) to solve the above issues. Different from feed-forward methods, they seek the closest latent code to the corrupted image and feed it to the generator. However, inferring the latent code is either time-consuming or inaccurate. In this paper, we develop a dual-path network with an inversion path and a feed-forward path, which provides...

10.1109/cvpr52688.2022.01113 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Affordance learning considers the interaction opportunities for an actor in a scene and thus has wide application in scene understanding and intelligent robotics. In this paper, we focus on contextual affordance learning, i.e., using the scene as context to generate a reasonable human pose in a scene. Existing scene-aware human pose generation methods could be divided into two categories depending on whether they use pose templates. Our proposed method belongs to the template-based category, which benefits from the representative pose templates. Moreover, inspired by recent...

10.1145/3581783.3612439 article EN 2023-10-26

Camera-based face detection and verification have advanced to the point where they are ready to be integrated into myriad applications, from household appliances to Internet of Things devices to drones. Many of these applications impose stringent constraints on the form-factor, weight, and cost of the camera package that cannot be met by current-generation lens-based imagers. Lensless imaging systems provide an increasingly promising alternative that radically changes the form-factor and reduces the weight of an imaging system. However, lensless...

10.1109/tci.2018.2889933 article EN IEEE Transactions on Computational Imaging 2018-12-27

To generate new images for a given category, most deep generative models require abundant training images from this category, which are often too expensive to acquire. To achieve the goal of generation based on only a few images, we propose a matching-based Generative Adversarial Network (GAN) for few-shot generation, which includes a matching generator and a matching discriminator. The matching generator can match random vectors with a few conditional images from the same category and generate new images by fusing their features. The matching discriminator extends the conventional GAN discriminator by matching the feature of the generated image...

10.1109/icme46284.2020.9102917 article EN 2020 IEEE International Conference on Multimedia and Expo (ICME) 2020-06-09

Image-text retrieval aims to capture the semantic correlation between images and texts. Existing image-text retrieval methods can be roughly categorized into the embedding learning paradigm and the pair-wise learning paradigm. The former paradigm fails to capture fine-grained correspondence, while the latter achieves fine-grained alignment between regions and words, but its high cost of computation leads to slow retrieval speed. In this paper, we propose a novel method named MEMBER, using Memory-based EMBedding Enhancement for image-text Retrieval, which introduces a global memory...

10.1109/tip.2021.3123553 article EN IEEE Transactions on Image Processing 2021-01-01

As a common image editing operation, image composition aims to combine the foreground from one image with another background image, resulting in a composite image. However, there are many issues that could make the composite images unrealistic. These issues can be summarized as the inconsistency between foreground and background, which includes appearance inconsistency (e.g., incompatible illumination), geometry inconsistency (e.g., unreasonable size), and semantic inconsistency (e.g., mismatched context). The image composition task can be decomposed into multiple sub-tasks, in which each sub-task targets one or more issues...

10.48550/arxiv.2106.14490 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Image composition targets at inserting a foreground object into a background image. Most previous image composition methods focus on adjusting the foreground to make it compatible with the background while ignoring the shadow effect of the foreground on the background. In this work, we focus on generating a plausible shadow for the foreground object in the composite image. First, we contribute a real-world shadow generation dataset DESOBA by generating synthetic composite images based on paired real images and deshadowed images. Then, we propose a novel shadow generation network SGRNet, which consists of a shadow mask prediction stage and a shadow filling stage. In the shadow mask prediction stage, foreground and background information are thoroughly...

10.1609/aaai.v36i1.19974 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2022-06-28

In few-shot image generation, directly training GAN models on just a handful of images faces the risk of overfitting. A popular solution is to transfer models pretrained on large source domains to small target ones. In this work, we introduce WeditGAN, which realizes model transfer by editing the intermediate latent codes w in StyleGANs with learned constant offsets (delta w), discovering and constructing target latent spaces by simply relocating the distribution of source latent spaces. The established one-to-one mapping between latent spaces can naturally prevent...

10.1609/aaai.v38i2.27932 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2024-03-24
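
The core relocation idea above, shifting every source latent code by one learned constant offset, reduces to a single vector addition. In the sketch below the offset is random rather than learned, and the StyleGAN mapping network is replaced by a Gaussian stand-in; it only illustrates why the mapping is one-to-one and structure-preserving.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for source-domain latent codes w (e.g., from a StyleGAN mapping network).
w_source = rng.normal(loc=0.0, scale=1.0, size=(1000, 8))

# A constant offset delta_w; in WeditGAN this would be learned per layer.
delta_w = rng.normal(size=(8,))

# Relocate the whole source distribution into the target latent space.
w_target = w_source + delta_w

# The mapping is one-to-one and distance-preserving, so the internal
# structure (and hence the diversity) of the source distribution is kept.
d_src = np.linalg.norm(w_source[0] - w_source[1])
d_tgt = np.linalg.norm(w_target[0] - w_target[1])
assert np.isclose(d_src, d_tgt)
```

Because a pure translation preserves all pairwise distances, relocating the distribution cannot collapse distinct source codes onto one target code, which is the overfitting-prevention property the abstract refers to.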

In this paper, we propose a new exemplar-based multi-view domain generalization (EMVDG) framework for visual recognition by learning robust classifiers that are able to generalize well to an arbitrary target domain based on the training samples with multiple types of features (i.e., multi-view features). In this framework, we aim to address two issues simultaneously. First, the data distribution of the training samples (i.e., the source domain) is often considerably different from that of the testing samples (i.e., the target domain), so the performance of the classifiers learnt on the source domain may drop significantly on the target domain. Moreover,...

10.1109/tnnls.2016.2615469 article EN IEEE Transactions on Neural Networks and Learning Systems 2016-11-03

Aesthetic image cropping is a practical but challenging task which aims at finding the best crops with the highest aesthetic quality in an image. Recently, many deep learning methods have been proposed to address this problem, but they did not reveal the intrinsic mechanism of aesthetic evaluation. In this paper, we propose an interpretable image cropping model to unveil the mystery. For each image, we use a fully convolutional network to produce an aesthetic score map, which is shared among all candidate crops during crop-level evaluation. Then, we require the score map to be both composition-aware and...

10.1609/aaai.v34i07.6889 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2020-04-03
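
The shared-score-map idea above, computing one map per image and scoring every candidate crop from it, can be sketched in numpy. The summed-area table is my implementation choice for reusing the map efficiently, and the random map and crop boxes are illustrative, not the model's output.

```python
import numpy as np

rng = np.random.default_rng(0)

def crop_scores(score_map, crops):
    """Mean aesthetic score inside each crop, computed from one shared
    score map via a summed-area table (so all crops reuse the same map)."""
    H, W = score_map.shape
    sat = np.zeros((H + 1, W + 1))
    sat[1:, 1:] = np.cumsum(np.cumsum(score_map, axis=0), axis=1)
    out = []
    for (y0, x0, y1, x1) in crops:          # half-open boxes [y0, y1) x [x0, x1)
        total = sat[y1, x1] - sat[y0, x1] - sat[y1, x0] + sat[y0, x0]
        out.append(total / ((y1 - y0) * (x1 - x0)))
    return np.array(out)

score_map = rng.random((32, 32))            # stand-in for the FCN's output
crops = [(0, 0, 16, 16), (8, 8, 32, 32), (4, 0, 28, 24)]
scores = crop_scores(score_map, crops)
best = crops[int(np.argmax(scores))]        # highest-scoring candidate crop
```

Scoring each crop costs O(1) after the table is built, which is what makes sharing one map across all candidate crops cheap.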