Song Bai

ORCID: 0000-0002-2570-9118
Research Areas
  • Advanced Image and Video Retrieval Techniques
  • Advanced Neural Network Applications
  • Human Pose and Action Recognition
  • Video Surveillance and Tracking Methods
  • Domain Adaptation and Few-Shot Learning
  • Image Retrieval and Classification Techniques
  • Multimodal Machine Learning Applications
  • 3D Shape Modeling and Analysis
  • Anomaly Detection Techniques and Applications
  • Visual Attention and Saliency Detection
  • Genetic Mapping and Diversity in Plants and Animals
  • Generative Adversarial Networks and Image Synthesis
  • Advanced Vision and Imaging
  • Handwritten Text Recognition Techniques
  • Image Processing and 3D Reconstruction
  • Adversarial Robustness in Machine Learning
  • Robotics and Sensor-Based Localization
  • Video Analysis and Summarization
  • Image and Object Detection Techniques
  • COVID-19 diagnosis using AI
  • 3D Surveying and Cultural Heritage
  • Radiomics and Machine Learning in Medical Imaging
  • Privacy-Preserving Technologies in Data
  • Natural Language Processing Techniques
  • Artificial Intelligence in Healthcare and Education

Noorul Islam University
2025

Canegrowers (Australia)
2025

Harbin University of Commerce
2024-2025

Central South University
2008-2025

University of Oxford
2018-2024

Jinan University
2022-2024

Tianjin University
2022-2024

Southern University of Science and Technology
2022-2024

Xuzhou No.1 People's Hospital
2024

Xuzhou Medical College
2024

In object detection, keypoint-based approaches often suffer from a large number of incorrect bounding boxes, arguably due to the lack of an additional assessment inside the cropped regions. This paper presents an efficient solution that explores the visual patterns within individual cropped regions at minimal cost. We build our framework upon a representative one-stage keypoint-based detector named CornerNet. Our approach, CenterNet, detects each object as a triplet, rather than a pair, of keypoints, which improves both precision...

10.1109/iccv.2019.00667 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01
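The triplet idea can be illustrated with a small filter: a box proposed by a pair of corners is kept only if a center keypoint of the same class lands inside the box's central region. This is a sketch under stated assumptions. Boxes are `(tlx, tly, brx, bry)` tuples, center detections are `(x, y, cls)` tuples, and the central region is the middle 1/n of the box with a fixed `n`. The helper names `central_region` and `keep_box` are hypothetical, and the detector's scale-dependent choice of region size is not reproduced.

```python
def central_region(box, n=3):
    """Central region of box (tlx, tly, brx, bry): the middle 1/n of the box
    along each axis. n=3 is an illustrative default."""
    tlx, tly, brx, bry = box
    ctlx = ((n + 1) * tlx + (n - 1) * brx) / (2 * n)
    ctly = ((n + 1) * tly + (n - 1) * bry) / (2 * n)
    cbrx = ((n - 1) * tlx + (n + 1) * brx) / (2 * n)
    cbry = ((n - 1) * tly + (n + 1) * bry) / (2 * n)
    return ctlx, ctly, cbrx, cbry


def keep_box(box, box_cls, centers, n=3):
    """Keep a corner-pair box only if some detected center keypoint of the
    same class falls inside its central region.
    centers: iterable of (x, y, cls) tuples (hypothetical format)."""
    ctlx, ctly, cbrx, cbry = central_region(box, n)
    return any(
        c == box_cls and ctlx <= x <= cbrx and ctly <= y <= cbry
        for x, y, c in centers
    )


# A 90x90 box keeps its central third, [30, 60] on each axis.
print(keep_box((0, 0, 90, 90), 0, [(45, 45, 0)]))
```

With a fixed n = 3, a center keypoint near a box corner cannot confirm the box. This is how the extra keypoint filters out corner pairs that do not enclose a single object.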

Though CNNs have achieved state-of-the-art performance on various vision tasks, they are vulnerable to adversarial examples, which are crafted by adding human-imperceptible perturbations to clean images. However, most existing adversarial attacks only achieve relatively low success rates under the challenging black-box setting, where the attackers have no knowledge of the model structure and parameters. To this end, we propose to improve the transferability of adversarial examples by creating diverse input patterns. Instead of only using the original images to generate...

10.1109/cvpr.2019.00284 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01
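"Diverse input patterns" can be sketched as a random resize-and-pad transform applied to the image, with some probability, before each gradient step of an iterative attack. The function name `diverse_input`, the probability `p`, and the 299/330 sizes are illustrative assumptions, not the paper's exact settings. The resize uses simple nearest-neighbour indexing to stay dependency-free.

```python
import numpy as np


def diverse_input(img, out_size=330, p=0.5, rng=None):
    """Random resize-and-pad transform (illustrative sketch).
    img: (H, W, C) array with H == W < out_size.
    With probability 1 - p the image is returned unchanged (original size)."""
    rng = np.random.default_rng() if rng is None else rng
    h = img.shape[0]
    if rng.random() >= p:
        return img
    rnd = int(rng.integers(h, out_size))        # new side length in [h, out_size)
    idx = np.arange(rnd) * h // rnd             # nearest-neighbour source indices
    resized = img[idx][:, idx]                  # (rnd, rnd, C)
    pad_total = out_size - rnd
    top = int(rng.integers(0, pad_total + 1))   # random placement of the image
    left = int(rng.integers(0, pad_total + 1))
    return np.pad(
        resized,
        ((top, pad_total - top), (left, pad_total - left), (0, 0)),
    )                                           # zero padding up to out_size


img = np.ones((299, 299, 3))
print(diverse_input(img, p=1.0, rng=np.random.default_rng(0)).shape)
```

In an attack loop, the gradient would be taken through this transform at every iteration, so the perturbation is optimized against many slightly different views of the image rather than one fixed input.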

The non-local module works as a particularly useful technique for semantic segmentation, but it has been criticized for its prohibitive computation and GPU memory occupation. In this paper, we present the Asymmetric Non-local Neural Network for semantic segmentation, which has two prominent components: the Asymmetric Pyramid Non-local Block (APNB) and the Asymmetric Fusion Non-local Block (AFNB). APNB leverages pyramid sampling in the non-local block to largely reduce computation and memory consumption without sacrificing performance. AFNB is adapted from APNB to fuse the features of different levels under sufficient...

10.1109/iccv.2019.00068 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01
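The saving from pyramid sampling shows up in a simplified attention step. Instead of every query attending to all N = H×W positions, each query attends to S pooled anchors, so the affinity matrix is N×S rather than N×N. This sketch makes several simplifying assumptions: it omits the learned 1×1 projections for queries, keys, and values, uses one pooled set as both keys and values, and uses pooling sizes (1, 3, 6, 8), giving S = 110. The function names are hypothetical.

```python
import numpy as np


def adaptive_avg_pool(x, s):
    """Average-pool a (C, H, W) map to (C, s, s) using adaptive bin edges."""
    c, h, w = x.shape
    out = np.empty((c, s, s))
    for i in range(s):
        h0, h1 = (i * h) // s, -(-((i + 1) * h) // s)      # floor / ceil edges
        for j in range(s):
            w0, w1 = (j * w) // s, -(-((j + 1) * w) // s)
            out[:, i, j] = x[:, h0:h1, w0:w1].mean(axis=(1, 2))
    return out


def pyramid_sampled_attention(q_map, kv_map, sizes=(1, 3, 6, 8)):
    """Non-local-style attention whose keys/values are pyramid-pooled anchors.
    q_map, kv_map: (C, H, W). Returns (H*W, C)."""
    c, h, w = q_map.shape
    q = q_map.reshape(c, h * w).T                           # (N, C) queries
    anchors = np.concatenate(
        [adaptive_avg_pool(kv_map, s).reshape(c, -1) for s in sizes], axis=1
    ).T                                                     # (S, C), S = sum(s*s)
    logits = q @ anchors.T                                  # (N, S), not (N, N)
    logits -= logits.max(axis=1, keepdims=True)             # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)                 # softmax over anchors
    return attn @ anchors                                   # (N, C)


x = np.random.default_rng(0).standard_normal((4, 16, 16))
print(pyramid_sampled_attention(x, x).shape)
```

For a 16×16 map, the full non-local affinity would be 256×256. The pooled version is 256×110, and the gap grows quadratically with resolution.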

10.1016/j.patcog.2020.107637 article EN Pattern Recognition 2020-09-14

This letter introduces a robust representation of 3-D shapes, named DeepPano, learned with deep convolutional neural networks (CNN). First, each shape is converted into a panoramic view, namely a cylinder projection around its principal axis. Then, a variant of CNN is specifically designed for learning the representations directly from such views. Different from a typical CNN, a row-wise max-pooling layer is inserted between the convolution and fully-connected layers, making the learned representations invariant to rotation around the principal axis. Our approach...

10.1109/lsp.2015.2480802 article EN IEEE Signal Processing Letters 2015-09-22
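The rotation invariance given by row-wise max-pooling can be checked numerically. Suppose the panorama's width axis indexes the angle around the principal axis. Then a rotation about that axis is a circular shift along the width, and a maximum taken over each row ignores that shift. The sketch makes two assumptions: the feature map is (C, H, W) with the angle on the width axis, and the preceding convolutions shift along with the input (this holds exactly only with circular padding). It is an illustration, not the letter's network.

```python
import numpy as np


def row_wise_max_pool(feat):
    """Row-wise max-pooling: (C, H, W) -> (C, H), taking the max over the
    width axis, assumed here to index the angle around the principal axis."""
    return feat.max(axis=2)


rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 12, 36))       # hypothetical feature map
rotated = np.roll(feat, shift=5, axis=2)      # rotation = circular width shift
assert np.array_equal(row_wise_max_pool(feat), row_wise_max_pool(rotated))
```

Pooling over rows, not columns, is the point. Rows keep the height structure along the axis, which carries shape information, while the angular position of each feature is discarded.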

Weakly Supervised Object Detection (WSOD), which uses only image-level annotations to train object detectors, is of growing importance in object recognition. In this paper, we propose a novel deep network for WSOD. Unlike previous networks that transfer the object detection problem to an image classification problem using Multiple Instance Learning (MIL), our strategy generates proposal clusters to learn refined instance classifiers by an iterative process. The proposals in the same cluster are spatially adjacent and associated with...

10.1109/tpami.2018.2876304 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2018-10-16

Most existing 3D object recognition algorithms focus on leveraging the strong discriminative power of deep learning models with softmax loss for the classification of 3D data, while learning discriminative features with metric learning for 3D object retrieval is more or less neglected. In this paper, we study variants of metric learning losses for 3D object retrieval, which have not received enough attention in this area. First, two kinds of representative losses, triplet loss and center loss, are introduced, which can learn more discriminative features than the traditional classification loss. Then, we propose a novel loss named triplet-center loss, which can further...

10.1109/cvpr.2018.00208 preprint EN 2018-06-01
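The idea behind a triplet-center loss can be written as a hinge on distances to class centers. Each feature should be closer to its own class center than to the nearest other-class center by a margin $m$:

$L = \sum_i \max\big(D(f_i, c_{y_i}) + m - \min_{j \ne y_i} D(f_i, c_j),\ 0\big)$

A NumPy sketch follows. The squared Euclidean $D$, the fixed (rather than learned) centers, and the margin value are assumptions made for illustration.

```python
import numpy as np


def triplet_center_loss(feats, labels, centers, margin=1.0):
    """Hinge loss pulling each feature toward its class center and away from
    the nearest other-class center.
    feats: (B, D), labels: (B,) int class ids, centers: (K, D).
    D is squared Euclidean distance (an assumption of this sketch)."""
    d = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # (B, K)
    idx = np.arange(len(labels))
    pos = d[idx, labels]                   # distance to own class center
    d_neg = d.copy()
    d_neg[idx, labels] = np.inf            # exclude own center from the min
    neg = d_neg.min(axis=1)                # nearest other-class center
    return np.maximum(pos + margin - neg, 0.0).sum()


centers = np.array([[0.0, 0.0], [10.0, 0.0]])
print(triplet_center_loss(np.array([[5.0, 0.0]]), np.array([0]), centers))
```

In the example, the feature sits exactly halfway between the two centers. The positive and negative distances are equal, so only the margin contributes to the loss.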

Most existing person re-identification algorithms either extract robust visual features or learn discriminative metrics for person images. However, the underlying manifold on which those images reside is rarely investigated. This raises the problem that the learned metric is not smooth with respect to the local geometry structure of the data manifold. In this paper, we study manifold-based affinity learning, which has not received enough attention in this area. An unconventional manifold-preserving algorithm is proposed, which can 1)...

10.1109/cvpr.2017.358 article EN 2017-07-01

Projective analysis is an important solution for 3D shape retrieval, since human visual perceptions of 3D shapes rely on various 2D observations from different viewpoints. Although multiple informative and discriminative views are utilized, most projection-based retrieval systems suffer from heavy computational cost, and thus cannot satisfy the basic requirement of scalability for search engines. In this paper, we present a real-time 3D shape search engine based on the projective images of 3D shapes. The real-time property of our search engine results from the following...

10.1109/cvpr.2016.543 article EN 2016-06-01

A typical pipeline for multi-object tracking (MOT) is to use a detector for object localization, followed by re-identification (re-ID) for object association. This pipeline is partially motivated by recent progress in both object detection and re-ID, and partially by biases in existing tracking datasets, where most objects tend to have distinguishing appearance and re-ID models are sufficient for establishing associations. In response to such bias, we would like to re-emphasize that methods for multi-object tracking should also work when object appearance is not sufficiently discriminative. To this end, we propose...

10.1109/cvpr52688.2022.02032 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01
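The association step of such a pipeline can be sketched as bipartite matching on re-ID appearance similarity. The sketch makes the following assumptions: one embedding per track and per detection, cosine similarity, SciPy's `linear_sum_assignment` as the matching solver, and a hypothetical threshold `min_sim` for rejecting weak matches. It deliberately relies on appearance alone, which is the cue the abstract cautions may not be sufficiently discriminative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def associate(track_emb, det_emb, min_sim=0.5):
    """Match existing tracks to new detections by appearance.
    track_emb: (T, D), det_emb: (M, D) re-ID embeddings.
    Returns a list of (track_idx, det_idx) pairs."""
    t = track_emb / np.linalg.norm(track_emb, axis=1, keepdims=True)
    d = det_emb / np.linalg.norm(det_emb, axis=1, keepdims=True)
    sim = t @ d.T                               # cosine similarity, (T, M)
    rows, cols = linear_sum_assignment(-sim)    # maximize total similarity
    return [(int(r), int(c)) for r, c in zip(rows, cols) if sim[r, c] >= min_sim]


tracks = np.array([[1.0, 0.0], [0.0, 1.0]])
dets = np.array([[0.0, 2.0], [3.0, 0.1]])
print(associate(tracks, dets))
```

When two objects have near-identical embeddings, the similarity matrix becomes nearly flat and the assignment is essentially arbitrary. That is the failure mode motivating motion-aware association.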

Accurate multi-organ abdominal CT segmentation is essential to many clinical applications such as computer-aided intervention. As data annotation requires massive human labor from experienced radiologists, it is common that training data are only partially labeled, with unlabeled organs marked as background. However, these background labels can be misleading in multi-organ segmentation, since the "background" usually contains some other organs of interest. To address the background ambiguity in these partially-labeled datasets, we propose the Prior-aware Neural Network (PaNN) via explicitly...

10.1109/iccv.2019.01077 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01

Temporal action detection (TAD) aims to determine the semantic label and the temporal interval of every action instance in an untrimmed video. It is a fundamental and challenging task in video understanding. Previous methods tackle this task with complicated pipelines. They often need to train multiple networks and involve hand-designed operations, such as non-maximal suppression and anchor generation, which limit flexibility and prevent end-to-end learning. In this paper, we propose a Transformer-based method for TAD, termed TadTR...

10.1109/tip.2022.3195321 article EN IEEE Transactions on Image Processing 2022-01-01

Reading text in the wild is a very challenging task due to the diversity of text instances and the complexity of natural scenes. Recently, the community has paid increasing attention to the problem of recognizing text instances with irregular shapes. One intuitive and effective way to handle this problem is to rectify irregular text to a canonical form before recognition. However, these methods might struggle when dealing with highly curved or distorted text instances. To tackle this issue, we propose in this paper a Symmetry-constrained Rectification Network (ScRN) based on local attributes...

10.1109/iccv.2019.00924 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01

In multi-organ segmentation of abdominal CT scans, most existing fully supervised deep learning algorithms require lots of voxel-wise annotations, which are usually difficult, expensive, and slow to obtain. In comparison, massive unlabeled 3D CT volumes are usually easily accessible. Current mainstream works that address the semi-supervised biomedical image segmentation problem are mostly graph-based. By contrast, deep network-based methods have not drawn much attention in this field. In this work, we propose Deep Multi-Planar Co-Training (DMPCT),...

10.1109/wacv.2019.00020 article EN 2019-01-01

Dense crowd counting aims to predict thousands of human instances from an image by calculating integrals of a density map over image pixels. Existing approaches mainly suffer from extreme density variations. Such density pattern shift poses challenges even for multi-scale model ensembling. In this paper, we propose a simple yet effective approach to tackle this problem. First, a patch-level density map is extracted by a density estimation model and further grouped into several density levels, which are determined over full datasets. Second, each patch density map is automatically...

10.1109/iccv.2019.00847 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01
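The counting rule in the abstract, integrating a density map over pixels, can be sketched by building a ground-truth-style map from point annotations. Each annotated head contributes a Gaussian that sums to one, so the sum of the map, or of any patch, estimates the number of people in it. The Gaussian width `sigma`, the kernel `radius`, and the function name are hypothetical parameters. The sketch does not reproduce the paper's patch-level density-level grouping.

```python
import numpy as np


def density_map(points, shape, sigma=2.0, radius=6):
    """Place a normalized Gaussian at each (row, col) head annotation.
    Each kernel sums to 1, so the map integrates to the head count
    (slightly less if a kernel is cropped at the image border)."""
    yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    kernel = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()                       # each head contributes mass 1
    dmap = np.zeros(shape)
    for r, c in points:
        r0, r1 = max(r - radius, 0), min(r + radius + 1, shape[0])
        c0, c1 = max(c - radius, 0), min(c + radius + 1, shape[1])
        kr0, kc0 = r0 - (r - radius), c0 - (c - radius)   # kernel crop offsets
        dmap[r0:r1, c0:c1] += kernel[kr0:kr0 + (r1 - r0), kc0:kc0 + (c1 - c0)]
    return dmap


dmap = density_map([(20, 20), (30, 40), (50, 10)], (64, 64))
print(dmap.sum(), dmap[:32, :32].sum())   # whole-image count, patch count
```

Summing over a patch rather than the whole image gives a local count. This is the quantity a patch-level approach such as the one described above would work with.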

In this work we present SwiftNet for real-time semi-supervised video object segmentation (one-shot VOS), which reports 77.8% $\mathcal{J}\&\mathcal{F}$ and 70 FPS on the DAVIS 2017 validation dataset, leading all present solutions in overall accuracy and speed. We achieve this by elaborately compressing spatiotemporal redundancy in matching-based VOS via Pixel-Adaptive Memory (PAM). Temporally, PAM adaptively triggers memory updates on frames where objects display noteworthy inter-frame variations....

10.1109/cvpr46437.2021.00135 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01
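In the $\mathcal{J}\&\mathcal{F}$ score, $\mathcal{J}$ is the region similarity (mask intersection-over-union) and $\mathcal{F}$ is the contour accuracy (a boundary F-measure). $\mathcal{J}\&\mathcal{F}$ is the mean of the two. A minimal sketch of $\mathcal{J}$ for one frame follows. The boundary term is omitted, and the convention for two empty masks is an assumption of this sketch.

```python
import numpy as np


def region_similarity(pred, gt):
    """J: intersection-over-union of two binary masks of equal shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treated as a perfect match (convention)
    return float(np.logical_and(pred, gt).sum() / union)


pred = np.zeros((4, 4), dtype=int)
pred[:2, :] = 1            # top half
gt = np.zeros((4, 4), dtype=int)
gt[:, :2] = 1              # left half
print(region_similarity(pred, gt))   # 4 overlapping pixels out of 12 in the union
```

A per-sequence score would average this over frames (and objects), then average with the boundary measure.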

Previous works have shown that increasing the window size for Transformer-based image super-resolution models (e.g., SwinIR) can significantly improve model performance, but the computation overhead is also considerable. In this paper, we present SRFormer, a simple but novel method that can enjoy the benefit of large-window self-attention while introducing even less computational burden. The core of our SRFormer is permuted self-attention (PSA), which strikes an appropriate balance between channel and spatial information for self-attention. Our PSA can be...

10.1109/iccv51070.2023.01174 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

Open-vocabulary scene understanding aims to localize and recognize unseen categories beyond the annotated label space. The recent breakthrough in 2D open-vocabulary perception is largely driven by Internet-scale paired image-text data with rich vocabulary concepts. However, this success cannot be directly transferred to 3D scenarios due to the inaccessibility of large-scale 3D-text pairs. To this end, we propose to distill knowledge encoded in pretrained vision-language (VL) foundation models through captioning...

10.1109/cvpr52729.2023.00677 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

There are two mainstream approaches for object detection: top-down and bottom-up. The state-of-the-art approaches are mainly top-down methods. In this paper, we demonstrate that bottom-up approaches show competitive performance compared with top-down approaches and have higher recall rates. Our approach, named CenterNet, detects each object as a triplet of keypoints (the top-left and bottom-right corners and the center keypoint). We first group the corners according to some designed cues and then confirm the object locations based on the center keypoints. The corner keypoints allow the approach to detect objects of various scales...

10.1109/tpami.2023.3342120 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2023-12-13

Video object segmentation (VOS) aims at segmenting a particular object throughout the entire video clip sequence. State-of-the-art VOS methods have achieved excellent performance (e.g., 90+% $\mathcal{J}\&\mathcal{F}$) on existing datasets. However, since the target objects in these datasets are usually relatively salient, dominant, and isolated, VOS under complex scenes has rarely been studied. To revisit VOS and make it more applicable in the real world, we collect a new dataset called coMplex video Object SEgmentation...

10.1109/iccv51070.2023.01850 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

In this paper, we propose an extremely efficient algorithm for visual re-ranking. By considering the original pairwise distance in the contextual space, we develop a feature vector called sparse contextual activation (SCA) that encodes the local distribution of an image. Hence, the re-ranking task can be simply accomplished by vector comparison under the generalized Jaccard metric, which has its theoretical meaning in fuzzy set theory. In order to improve the time efficiency of the re-ranking procedure, an inverted index is successfully introduced to speed up...

10.1109/tip.2016.2514498 article EN IEEE Transactions on Image Processing 2016-01-05
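Here is a simplified version of the re-ranking idea. Each image is encoded as a vector over the database that is non-zero only on its nearest neighbours. Two such vectors are then compared with the generalized Jaccard distance $1 - \sum_j \min(a_j, b_j) / \sum_j \max(a_j, b_j)$. Several choices are assumptions of this sketch: the `exp(-d)` weighting, the neighbourhood size `k`, and dense NumPy storage in place of a sparse format. The paper's exact weighting and its inverted-index speed-up are not reproduced.

```python
import numpy as np


def contextual_activation(dist, k=3):
    """Encode each image by its k nearest neighbours (including itself).
    dist: (N, N) pairwise distances. Returns (N, N) vectors with
    exp(-d) on each row's k-NN set and 0 elsewhere (assumed weighting)."""
    n = dist.shape[0]
    act = np.zeros(dist.shape, dtype=float)
    nn = np.argsort(dist, axis=1)[:, :k]        # indices of k nearest neighbours
    rows = np.arange(n)[:, None]
    act[rows, nn] = np.exp(-dist[rows, nn])
    return act


def generalized_jaccard_distance(a, b):
    """1 - sum(min(a, b)) / sum(max(a, b)) for non-negative vectors
    (undefined if both vectors are all zeros)."""
    return 1.0 - np.minimum(a, b).sum() / np.maximum(a, b).sum()


dist = np.array([[0.0, 1.0, 5.0], [1.0, 0.0, 4.0], [5.0, 4.0, 0.0]])
act = contextual_activation(dist, k=2)
print(generalized_jaccard_distance(act[0], act[1]))
```

Because each vector has only k non-zero entries, the min and max sums touch few elements. This sparsity is what makes an inverted index over non-zero positions attractive.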