Yang Song

ORCID: 0000-0003-1283-1672
Research Areas
  • AI in cancer detection
  • Advanced Neural Network Applications
  • Cell Image Analysis Techniques
  • Advanced Image and Video Retrieval Techniques
  • Digital Imaging for Blood Diseases
  • Image Retrieval and Classification Techniques
  • Domain Adaptation and Few-Shot Learning
  • Radiomics and Machine Learning in Medical Imaging
  • Medical Image Segmentation Techniques
  • Image Processing Techniques and Applications
  • Multimodal Machine Learning Applications
  • Generative Adversarial Networks and Image Synthesis
  • COVID-19 diagnosis using AI
  • Advanced Neuroimaging Techniques and Applications
  • Topic Modeling
  • Anomaly Detection Techniques and Applications
  • Adversarial Robustness in Machine Learning
  • Brain Tumor Detection and Classification
  • Video Surveillance and Tracking Methods
  • Advanced Graph Neural Networks
  • Human Pose and Action Recognition
  • Advanced Image Processing Techniques
  • Natural Language Processing Techniques
  • Video Analysis and Summarization
  • Face recognition and analysis

UNSW Sydney
2018-2025

Hebei Agricultural University
2025

Northeast Forestry University
2025

Google (United States)
2010-2024

Siemens (China)
2023-2024

Tongji University
2024

Anhui Water Conservancy and Hydropower Survey and Design Institute
2022-2024

First Affiliated Hospital of Anhui Medical University
2024

Anhui Medical University
2024

Shanghai Changzheng Hospital
2024

The goal of this paper is to serve as a guide for selecting a detection architecture that achieves the right speed/memory/accuracy balance for a given application and platform. To this end, we investigate various ways to trade accuracy for speed and memory usage in modern convolutional object detection systems. A number of successful systems have been proposed in recent years, but apples-to-apples comparisons are difficult due to different base feature extractors (e.g., VGG, Residual Networks), different default image resolutions, as well as different hardware...

10.1109/cvpr.2017.351 preprint EN 2017-07-01

With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure...

10.1109/cvpr.2019.00949 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

Learning fine-grained image similarity is a challenging task. It needs to capture between-class and within-class differences. This paper proposes a deep ranking model that employs deep learning techniques to learn a similarity metric directly from images. It has higher learning capability than models based on hand-crafted features. A novel multiscale network structure has been developed to describe the images effectively. An efficient triplet sampling algorithm is also proposed to learn the model with distributed asynchronized stochastic gradient....

10.1109/cvpr.2014.180 article EN 2014 IEEE Conference on Computer Vision and Pattern Recognition 2014-06-01

Existing image classification datasets used in computer vision tend to have a uniform distribution of images across object categories. In contrast, the natural world is heavily imbalanced, as some species are more abundant and easier to photograph than others. To encourage further progress in challenging real-world conditions, we present the iNaturalist species classification and detection dataset, consisting of 859,000 images from over 5,000 different species of plants and animals. It features visually similar species, captured in a wide variety of situations, from all...

10.1109/cvpr.2018.00914 article EN 2018-06-01

If I provide you a face image of mine (without telling you the actual age when I took the picture) and a large amount of face images that I crawled (containing labeled faces of different ages but not necessarily paired), can you show me what I would look like when I am 80 or what I was like when I was 5? The answer is probably No. Most existing face aging works attempt to learn the transformation between age groups and thus would require paired samples as well as the labeled query image. In this paper, we look at the problem from a generative modeling perspective such that no paired samples are required. In addition, given an...

10.1109/cvpr.2017.463 article EN 2017-07-01

In this paper we address the issue of output instability of deep neural networks: small perturbations in the visual input can significantly distort the feature embeddings and output of a neural network. Such instability affects many deep architectures with state-of-the-art performance on a wide range of computer vision tasks. We present a general stability training method to stabilize deep networks against small input distortions that result from various types of common image processing, such as compression, rescaling, and cropping. We validate our method by stabilizing the state...

10.1109/cvpr.2016.485 article EN 2016-06-01

Transferring the knowledge learned from large scale datasets (e.g., ImageNet) via fine-tuning offers an effective solution for domain-specific fine-grained visual categorization (FGVC) tasks (e.g., recognizing bird species or car make & model). In such scenarios, data annotation often calls for specialized domain knowledge and thus is difficult to scale. In this work, we first tackle a problem in large scale FGVC. Our method won first place in the iNaturalist 2017 large scale species classification challenge. Central to the success of our approach is a training scheme...

10.1109/cvpr.2018.00432 article EN 2018-06-01

The accurate identification of malignant lung nodules on chest CT is critical for the early detection of lung cancer, which also offers patients the best chance of cure. Deep learning methods have recently been successfully introduced to computer vision problems, although substantial challenges remain in the detection of malignant nodules due to the lack of large training data sets. In this paper, we propose a multi-view knowledge-based collaborative (MV-KBC) deep model to separate malignant from benign nodules using limited chest CT data. Our model learns 3-D lung nodule characteristics by...

10.1109/tmi.2018.2876510 article EN IEEE Transactions on Medical Imaging 2018-10-17

Discrete point cloud objects lack sufficient shape descriptors of 3D geometries. In this paper, we present a novel method for aggregating hypothetical curves in point clouds. Sequences of connected points (curves) are initially grouped by taking guided walks in the point clouds, and then subsequently aggregated back to augment their point-wise features. We provide an effective implementation of the proposed aggregation strategy including a novel curve grouping operator followed by a curve aggregation operator. Our method was benchmarked on several...

10.1109/iccv48922.2021.00095 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

Given an arbitrary face image and an arbitrary speech clip, the proposed work attempts to generate the talking face video with accurate lip synchronization. Existing works either do not consider temporal dependency across video frames, thus yielding abrupt facial and lip movement, or are limited to the generation of talking face video for a specific person, thus lacking generalization capacity. We propose a novel conditional recurrent generation network that incorporates both image and audio features in the recurrent unit for temporal dependency. To achieve both image- and video-realism, a pair of spatial-temporal...

10.24963/ijcai.2019/129 article EN 2019-07-28

Guided image synthesis enables everyday users to create and edit photo-realistic images with minimum effort. The key challenge is balancing faithfulness to the user input (e.g., hand-drawn colored strokes) and realism of the synthesized image. Existing GAN-based methods attempt to achieve such balance using either conditional GANs or GAN inversions, which are challenging and often require additional training data or loss functions for individual applications. To address these issues, we introduce a new image synthesis and editing...

10.48550/arxiv.2108.01073 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Background and Hypothesis: Neuroimaging studies investigating the neural substrates of auditory verbal hallucinations (AVH) in schizophrenia have yielded mixed results, which may be reconciled by network localization. We sought to examine whether AVH-state and AVH-trait brain alterations in schizophrenia localize to common or distinct networks. Study Design: We initially identified AVH-state and AVH-trait brain alterations reported in 48 previous studies. By integrating these affected brain locations with large-scale discovery and validation resting-state...

10.1093/schbul/sbae020 article EN Schizophrenia Bulletin 2024-02-24

In this paper, we propose a new classification method for five categories of lung tissues in high-resolution computed tomography (HRCT) images, with feature-based image patch approximation. We design two new feature descriptors for higher feature descriptiveness, namely the rotation-invariant Gabor-local binary patterns (RGLBP) texture descriptor and the multi-coordinate histogram of oriented gradients (MCHOG) gradient descriptor. Together with intensity features, each image patch is then labeled based on its feature approximation from...

10.1109/tmi.2013.2241448 article EN IEEE Transactions on Medical Imaging 2013-01-18

The goal of this paper is to serve as a guide for selecting a detection architecture that achieves the right speed/memory/accuracy balance for a given application and platform. To this end, we investigate various ways to trade accuracy for speed and memory usage in modern convolutional object detection systems. A number of successful systems have been proposed in recent years, but apples-to-apples comparisons are difficult due to different base feature extractors (e.g., VGG, Residual Networks), different default image resolutions, as well as...

10.48550/arxiv.1611.10012 preprint EN other-oa arXiv (Cornell University) 2016-01-01

Human actions capture a wide variety of interactions between people and objects. As a result, the set of possible actions is extremely large and it is difficult to obtain sufficient training examples for all actions. However, we could compensate for this sparsity in supervision by leveraging the rich semantic relationship between different actions. A single action is often composed of other smaller actions and is exclusive of certain others. We need a method which can reason about such relationships and extrapolate unobserved actions from known actions. Hence, we propose a novel neural...

10.1109/cvpr.2015.7298713 article EN 2015-06-01

Accurate and reliable segmentation of the prostate gland using magnetic resonance (MR) imaging has critical importance for the diagnosis and treatment of prostate diseases, especially prostate cancer. Although many automated segmentation approaches, including those based on deep learning, have been proposed, the segmentation performance still has room for improvement due to the large variability in image appearance, imaging interference, and anisotropic spatial resolution. In this paper, we propose the 3D adversarial pyramid anisotropic convolutional deep neural network (3D APA-Net) for prostate segmentation in MR images....

10.1109/tmi.2019.2928056 article EN IEEE Transactions on Medical Imaging 2019-07-11

"If I provide you a face image of mine (without telling you the actual age when I took the picture) and a large amount of face images that I crawled (containing labeled faces of different ages but not necessarily paired), can you show me what I would look like when I am 80 or what I was like when I was 5?" The answer is probably "No." Most existing face aging works attempt to learn the transformation between age groups and thus would require paired samples as well as the labeled query image. In this paper, we look at the problem from a generative modeling perspective such that no paired samples are required. In addition,...

10.48550/arxiv.1702.08423 preprint EN cc-by arXiv (Cornell University) 2017-01-01

Learning fine-grained image similarity is a challenging task. It needs to capture between-class and within-class differences. This paper proposes a deep ranking model that employs deep learning techniques to learn a similarity metric directly from images. It has higher learning capability than models based on hand-crafted features. A novel multiscale network structure has been developed to describe the images effectively. An efficient triplet sampling algorithm is proposed to learn the model with distributed asynchronized stochastic gradient. Extensive...

10.48550/arxiv.1404.4661 preprint EN other-oa arXiv (Cornell University) 2014-01-01

With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distribution (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point will diminish. We introduce a novel theoretical framework to measure...

10.48550/arxiv.1901.05555 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Most existing recommender systems leverage user behavior data of one type only, such as the purchase behavior in E-commerce that is directly related to the business Key Performance Indicator (KPI) of conversion rate. Besides the key behavioral data, we argue that other forms of user behaviors also provide valuable signal, such as views, clicks, adding a product to shopping carts and so on. They should be taken into account properly to provide quality recommendation for users. In this work, we contribute a new solution named NMTR, short for Neural Multi-Task...

10.1109/tkde.2019.2958808 article EN IEEE Transactions on Knowledge and Data Engineering 2019-12-10