NFDI4DS | UHH-SEMS - Publication Details

Shiliang Pu

ORCID: 0000-0001-5269-7821

About

Contact & Profiles

OpenAlex ID: A5085955762

Research Areas

Domain Adaptation and Few-Shot Learning
Advanced Neural Network Applications
Multimodal Machine Learning Applications
Advanced Image and Video Retrieval Techniques
Handwritten Text Recognition Techniques
Human Pose and Action Recognition
Topic Modeling
Video Surveillance and Tracking Methods
Face recognition and analysis
Advanced Graph Neural Networks
Natural Language Processing Techniques
Advanced Image Processing Techniques
Image Retrieval and Classification Techniques
Anomaly Detection Techniques and Applications
Robotics and Sensor-Based Localization
3D Surveying and Cultural Heritage
3D Shape Modeling and Analysis
Image Enhancement Techniques
Advanced Vision and Imaging
Digital Media Forensic Detection
Gait Recognition and Analysis
Biometric Identification and Security
Text and Document Classification Technologies
Image Processing and 3D Reconstruction
COVID-19 diagnosis using AI

Hikvision (China)
2018-2024

InferVision (China)
2018-2024

Zhejiang University
2019-2023

Peking University
2023

Chongqing University
2023

South China University of Technology
2020

Cloud Computing Center
2020

Fudan University
2018

Co-occurrence Feature Learning from Skeleton Data for Action Recognition and Detection with Hierarchical Aggregation

OPENALEX - Publications

Chao Li, Qiaoyong Zhong, Di Xie, Shiliang Pu

Skeleton-based human action recognition has recently drawn increasing attention with the availability of large-scale skeleton datasets. The most crucial factors for this task lie in two aspects: the intra-frame representation for joint co-occurrences and the inter-frame representation for skeletons' temporal evolutions. In this paper we propose an end-to-end convolutional co-occurrence feature learning framework. The co-occurrence features are learned with a hierarchical methodology, in which different levels of contextual information are aggregated...

10.24963/ijcai.2018/109 preprint EN 2018-07-01

Focusing Attention: Towards Accurate Text Recognition in Natural Images

Zhanzhan Cheng, Fan Bai, Yunlu Xu, Gang Zheng, Shiliang Pu and 1 more

Scene text recognition has been a hot research topic in computer vision due to its various applications. The state of the art is the attention-based encoder-decoder framework that learns the mapping between input images and output sequences in a purely data-driven way. However, we observe that existing attention-based methods perform poorly on complicated and/or low-quality images. One major reason is that existing methods cannot get accurate alignments between feature areas and targets for such images. We call this phenomenon "attention drift". To tackle this problem,...

10.1109/iccv.2017.543 preprint EN 2017-10-01

AON: Towards Arbitrarily-Oriented Text Recognition

Zhanzhan Cheng, Yangliu Xu, Fan Bai, Yi Niu, Shiliang Pu and 1 more

Recognizing text from natural images is a hot research topic in computer vision due to its various applications. Despite several decades of research on optical character recognition (OCR), recognizing texts from natural images is still a challenging task. This is because scene texts are often in irregular (e.g. curved, arbitrarily-oriented or seriously distorted) arrangements, which have not yet been well addressed in the literature. Existing methods on text recognition mainly work with regular (horizontal and frontal) texts and cannot be trivially generalized...

10.1109/cvpr.2018.00584 article EN 2018-06-01

Counterfactual Samples Synthesizing for Robust Visual Question Answering

Long Chen, Xin Yan, Jun Xiao, Hanwang Zhang, Shiliang Pu and 1 more

Although Visual Question Answering (VQA) has achieved impressive progress over the last few years, today's VQA models tend to capture superficial linguistic correlations in the training set and fail to generalize to test sets with different QA distributions. To reduce language biases, several recent works introduce an auxiliary question-only model to regularize the training of the targeted VQA model, and achieve dominating performance on VQA-CP. However, owing to the complexity of their design, current methods are unable to equip ensemble-based...

10.1109/cvpr42600.2020.01081 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

Skeleton-based action recognition with convolutional neural networks

Chao Li, Qiaoyong Zhong, Di Xie, Shiliang Pu

Current state-of-the-art approaches to skeleton-based action recognition are mostly based on recurrent neural networks (RNN). In this paper, we propose a novel convolutional neural network (CNN) based framework for both action classification and detection. Raw skeleton coordinates as well as skeleton motion are fed directly into the CNN for label prediction. A transformer module is designed to rearrange and select important joints automatically. With a simple 7-layer network, we obtain 89.3% accuracy on the validation set of the NTU RGB+D dataset. For detection...

10.1109/icmew.2017.8026285 article EN 2017-07-01

Dynamic GCN: Context-enriched Topology Learning for Skeleton-based Action Recognition

Fanfan Ye, Shiliang Pu, Qiaoyong Zhong, Chao Li, Di Xie and 1 more

Graph Convolutional Networks (GCNs) have attracted increasing interest for the task of skeleton-based action recognition. The key lies in the design of the graph structure, which encodes skeleton topology information. In this paper, we propose Dynamic GCN, in which a novel convolutional neural network named Context-encoding Network (CeN) is introduced to learn skeleton topology automatically. In particular, when learning the dependency between two joints, contextual features from the rest of the joints are incorporated in a global manner. CeN...

10.1145/3394171.3413941 article EN Proceedings of the 28th ACM International Conference on Multimedia 2020-10-12

Advancing Image Understanding in Poor Visibility Environments: A Collective Benchmark Study

Wenhan Yang, Ye Yuan, Wenqi Ren, Jiaying Liu, Walter J. Scheirer and 63 more

Existing enhancement methods are empirically expected to help the high-level end computer vision task; however, that is observed to not always be the case in practice. We focus on object or face detection in poor visibility environments caused by bad weather (haze, rain) and low light conditions. To provide a more thorough examination and fair comparison, we introduce three benchmark sets collected in real-world hazy, rainy, and low-light conditions, respectively, with annotated objects/faces. We launched the UG <sup...

10.1109/tip.2020.2981922 article EN IEEE Transactions on Image Processing 2020-01-01

RPVNet: A Deep and Efficient Range-Point-Voxel Fusion Network for LiDAR Point Cloud Segmentation

Jianyun Xu, Ruixiang Zhang, Jian Dou, Zhu Yushi, Jie Sun and 1 more

Point clouds can be represented in many forms (views), typically point-based sets, voxel-based cells or range-based images (i.e., panoramic view). The point-based view is geometrically accurate, but it is disordered, which makes it difficult to find local neighbors efficiently. The voxel-based view is regular, but sparse, and computation grows cubically when voxel resolution increases. The range-based view is regular and generally dense; however, spherical projection makes physical dimensions distorted. Both voxel- and range-based views suffer from quantization loss, especially for...

10.1109/iccv48922.2021.01572 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

Edit Probability for Scene Text Recognition

Fan Bai, Zhanzhan Cheng, Yi Niu, Shiliang Pu, Shuigeng Zhou

We consider the scene text recognition problem under the attention-based encoder-decoder framework, which is the state of the art. The existing methods usually employ a frame-wise maximal likelihood loss to optimize the models. When we train the model, the misalignment between the ground truth strings and the attention's output sequences of probability distribution, which is caused by missing or superfluous characters, will confuse and mislead the training process, and consequently make the training costly and degrade the recognition accuracy. To handle this problem, we propose...

10.1109/cvpr.2018.00163 article EN 2018-06-01

Content-Aware Convolutional Neural Network for In-Loop Filtering in High Efficiency Video Coding

Chuanmin Jia, Shiqi Wang, Xinfeng Zhang, Shanshe Wang, Jiaying Liu and 2 more

Recently, convolutional neural network (CNN) has attracted tremendous attention and has achieved great success in many image processing tasks. In this paper, we focus on CNN technology combined with image restoration to facilitate video coding performance and propose the content-aware CNN based in-loop filtering for high-efficiency video coding (HEVC). In particular, we quantitatively analyze the structure of the proposed CNN model from multiple dimensions to make the model interpretable and optimal for CNN-based loop filtering. More specifically, each coding tree...

10.1109/tip.2019.2896489 article EN IEEE Transactions on Image Processing 2019-01-31

Counterfactual Critic Multi-Agent Training for Scene Graph Generation

Long Chen, Hanwang Zhang, Jun Xiao, Xiangnan He, Shiliang Pu and 1 more

Scene graphs (objects as nodes and visual relationships as edges) describe the whereabouts and interactions of objects in an image for comprehensive scene understanding. To generate coherent scene graphs, almost all existing methods exploit the fruitful visual context by modeling message passing among objects. For example, "person" on "bike" can help to determine the relationship "ride", which in turn contributes to the confidence of the two objects. However, we argue that the visual context is not properly learned by using the prevailing cross-entropy based supervised...

10.1109/iccv.2019.00471 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01

Forward Compatible Few-Shot Class-Incremental Learning

Da-Wei Zhou, Fuyun Wang, Han-Jia Ye, Liang Ma, Shiliang Pu and 1 more

Novel classes frequently arise in our dynamically changing world, e.g., new users in the authentication system, and a machine learning model should recognize new classes without forgetting old ones. This scenario becomes more challenging when new class instances are insufficient, which is called few-shot class-incremental learning (FSCIL). Current methods handle incremental learning retrospectively by making the updated model similar to the old one. By contrast, we suggest learning prospectively to prepare for future updates, and propose ForwArd Compatible...

10.1109/cvpr52688.2022.00884 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Divide-and-Assemble: Learning Block-wise Memory for Unsupervised Anomaly Detection

Jinlei Hou, Yingying Zhang, Qiaoyong Zhong, Di Xie, Shiliang Pu and 1 more

Reconstruction-based methods play an important role in unsupervised anomaly detection in images. Ideally, we expect a perfect reconstruction for normal samples and poor reconstruction for abnormal samples. Since the generalizability of deep neural networks is difficult to control, existing models such as autoencoder do not work well. In this work, we interpret the reconstruction of an image as a divide-and-assemble procedure. Surprisingly, by varying the granularity of division on feature maps, we are able to modulate the reconstruction capability of the model for both normal and abnormal samples. That is, finer...

10.1109/iccv48922.2021.00867 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

All You Need is Beyond a Good Init: Exploring Better Solution for Training Extremely Deep Convolutional Neural Networks with Orthonormality and Modulation

Di Xie, Jiang Xiong, Shiliang Pu

Deep neural networks are difficult to train, and this predicament becomes worse as the depth increases. The essence of this problem lies in the magnitude of backpropagated errors, which results in the gradient vanishing or exploding phenomenon. We show that a variant of regularizer which utilizes orthonormality among different filter banks can alleviate this problem. Moreover, we design a backward error modulation mechanism based on the quasi-isometry assumption between two consecutive parametric layers. Equipped with these...

10.1109/cvpr.2017.539 preprint EN 2017-07-01

Spatio-Temporal Deformable Convolution for Compressed Video Quality Enhancement

Jianing Deng, Li Wang, Shiliang Pu, Cheng Zhuo

Recent years have witnessed remarkable success of deep learning methods in quality enhancement for compressed video. To better explore temporal information, existing methods usually estimate optical flow for temporal motion compensation. However, since compressed video could be seriously distorted by various compression artifacts, the estimated optical flow tends to be inaccurate and unreliable, thereby resulting in ineffective quality enhancement. In addition, optical flow estimation for consecutive frames is generally conducted in a pairwise manner, which...

10.1609/aaai.v34i07.6697 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2020-04-03

Collaborative Spatiotemporal Feature Learning for Video Action Recognition

Chao Li, Qiaoyong Zhong, Di Xie, Shiliang Pu

Spatiotemporal feature learning is of central importance for action recognition in videos. Existing deep neural network models either learn spatial and temporal features independently (C2D) or jointly with unconstrained parameters (C3D). In this paper, we propose a novel neural operation which encodes spatiotemporal features collaboratively by imposing a weight-sharing constraint on the learnable parameters. In particular, we perform 2D convolution along three orthogonal views of volumetric video data, which learns...

10.1109/cvpr.2019.00806 preprint EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

A Free Lunch for Unsupervised Domain Adaptive Object Detection without Source Data

Xianfeng Li, Weijie Chen, Di Xie, Shicai Yang, Peng Yuan and 2 more

Unsupervised domain adaptation (UDA) assumes that source and target domain data are freely available and usually trained together to reduce the domain gap. However, considering data privacy and the inefficiency of data transmission, this is impractical in real scenarios. Hence, it draws our attention to optimizing the network in the target domain without accessing labeled source data. To explore this direction in object detection, for the first time, we propose a source data-free domain adaptive object detection (SFOD) framework via modeling it into a problem of learning with noisy labels. Generally,...

10.1609/aaai.v35i10.17029 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2021-05-18

Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting

Qiao Liang, Sanli Tang, Zhanzhan Cheng, Yunlu Xu, Yi Niu and 2 more

Many approaches have recently been proposed to detect irregular scene text and achieved promising results. However, their localization results may not well satisfy the following text recognition part mainly because of two reasons: 1) recognizing arbitrary shaped text is still a challenging task, and 2) prevalent non-trainable pipeline strategies between text detection and text recognition will lead to suboptimal performances. To handle this incompatibility problem, in this paper we propose an end-to-end trainable text spotting approach named...

10.1609/aaai.v34i07.6864 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2020-04-03

TRIE: End-to-End Text Reading and Information Extraction for Document Understanding

Peng Zhang, Yunlu Xu, Zhanzhan Cheng, Shiliang Pu, Jing Lu and 3 more

Since real-world ubiquitous documents (e.g., invoices, tickets, resumes and leaflets) contain rich information, automatic document image understanding has become a hot topic. Most existing works decouple the problem into two separate tasks, (1) text reading for detecting and recognizing texts in images and (2) information extraction for analyzing and extracting key elements from previously extracted plain text. However, they mainly focus on improving the information extraction task, while neglecting the fact that text reading and information extraction are mutually correlated....

10.1145/3394171.3413900 article EN Proceedings of the 28th ACM International Conference on Multimedia 2020-10-12

Segregated Temporal Assembly Recurrent Networks for Weakly Supervised Multiple Action Detection

Yunlu Xu, Chengwei Zhang, Zhanzhan Cheng, Jianwen Xie, Yi Niu and 2 more

This paper proposes a segregated temporal assembly recurrent (STAR) network for weakly-supervised multiple action detection. The model learns from untrimmed videos with only the supervision of video-level labels and predicts the intervals of multiple actions. Specifically, we first assemble video clips according to class labels by an attention mechanism that learns class-variable attention weights and thus helps relieve the noise from background or other actions. Secondly, we build the temporal relationship between actions by feeding the assembled features into...

10.1609/aaai.v33i01.33019070 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2019-07-17

Self-Domain Adaptation for Face Anti-Spoofing

Jingjing Wang, Jingyi Zhang, Ying Bian, Youyi Cai, Chunmao Wang and 1 more

Although current face anti-spoofing methods achieve promising results under intra-dataset testing, they suffer from poor generalization to unseen attacks. Most existing works adopt domain adaptation (DA) or domain generalization (DG) techniques to address this problem. However, the target domain is often unknown during training, which limits the utilization of DA methods. DG methods can conquer this by learning domain invariant features without seeing any target data. However, they fail in utilizing the information of target data. In this paper, we propose a self-domain adaptation framework to leverage...

10.1609/aaai.v35i4.16379 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2021-05-18

All You Need Is a Few Shifts: Designing Efficient Convolutional Neural Networks for Image Classification

Weijie Chen, Di Xie, Yuan Zhang, Shiliang Pu

Shift operation is an efficient alternative to depthwise separable convolution. However, it is still bottlenecked by its implementation manner, namely memory movement. To put this direction forward, a novel basic component named Sparse Shift Layer (SSL) is introduced in this paper to construct efficient convolutional neural networks. In this family of architectures, the basic block is only composed of 1x1 convolutional layers with only a few shift operations applied to the intermediate feature maps. To make this idea feasible, we introduce a shift operation penalty during...

10.1109/cvpr.2019.00741 preprint EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

MANGO: A Mask Attention Guided One-Stage Scene Text Spotter

Qiao Liang, Ying Chen, Zhanzhan Cheng, Yunlu Xu, Yi Niu and 2 more

Recently end-to-end scene text spotting has become a popular research topic due to its advantages of global optimization and high maintainability in real applications. Most methods attempt to develop various region of interest (RoI) operations to concatenate the detection part and the sequence recognition part into a two-stage text spotting framework. However, in such a framework, the recognition part is highly sensitive to the detected results (e.g., the compactness of text contours). To address this problem, in this paper, we propose a novel Mask AttentioN Guided One-stage...

10.1609/aaai.v35i3.16348 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2021-05-18

Few-Shot Class-Incremental Learning by Sampling Multi-Phase Tasks

Da-Wei Zhou, Han-Jia Ye, Liang Ma, Di Xie, Shiliang Pu and 1 more

New classes arise frequently in our ever-changing world, e.g., emerging topics in social media and new types of products in e-commerce. A model should recognize new classes and meanwhile maintain discriminability over old classes. Under severe circumstances, only limited novel instances are available to incrementally update the model. The task of recognizing few-shot new classes without forgetting old classes is called few-shot class-incremental learning (FSCIL). In this work, we propose a new paradigm for FSCIL based on meta-learning by LearnIng...

10.1109/tpami.2022.3200865 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2022-01-01
