NFDI4DS | UHH-SEMS - Publication Details

Liangliang Cao

ORCID: 0000-0003-0900-1512

Contact & Profiles

A5103187717

Research Areas

Advanced Image and Video Retrieval Techniques
Image Retrieval and Classification Techniques
Video Analysis and Summarization
Multimodal Machine Learning Applications
Speech Recognition and Synthesis
Human Pose and Action Recognition
Video Surveillance and Tracking Methods
Topic Modeling
Domain Adaptation and Few-Shot Learning
Natural Language Processing Techniques
Speech and Audio Processing
Music and Audio Processing
3D Surveying and Cultural Heritage
Anomaly Detection Techniques and Applications
Face recognition and analysis
Complex Network Analysis Techniques
3D Shape Modeling and Analysis
Advanced Vision and Imaging
Remote-Sensing Image Classification
Advanced Neural Network Applications
Recommender Systems and Techniques
Geographic Information Systems Studies
Advanced Text Analysis Techniques
Data Management and Algorithms
Computer Graphics and Visualization Techniques

Apple (United States)
2023-2025

Wuxi Fourth People's Hospital
2024

XinHua Hospital
2024

Shanghai Jiao Tong University
2024

Jiangnan University
2024

University of Massachusetts Amherst
2019-2022

Google (United States)
2019-2022

Guangzhou Experimental Station
2022

Carnegie Mellon University
2020

Amherst College
2020

Learning from Noisy Labels with Distillation

OPENALEX - Publications

Yuncheng Li Shuicheng Yan Yale Song Liangliang Cao Jiebo Luo and 1 more

The ability of learning from noisy labels is very useful in many visual recognition tasks, as a vast amount of data with noisy labels are relatively easy to obtain. Traditionally, label noise has been treated as statistical outliers, and techniques such as importance re-weighting and bootstrapping have been proposed to alleviate the problem. According to our observation, the real-world noisy labels exhibit multimode characteristics as the true labels, rather than behaving like independent random outliers. In this work, we propose a unified distillation...

10.1109/iccv.2017.211 article EN 2017-10-01

Learning Locally-Adaptive Decision Functions for Person Verification

Zhen Li Shiyu Chang Feng Liang Thomas S. Huang Liangliang Cao and 1 more

This paper considers the person verification problem in modern surveillance and video retrieval systems. The problem is to identify whether a pair of face or human body images is about the same person, even if the person is not seen before. Traditional methods usually look for a distance (or similarity) measure between images (e.g., by metric learning algorithms), and make decisions based on a fixed threshold. We show that this is nevertheless insufficient and sub-optimal for the verification problem. This paper proposes to learn a decision function for verification that can be viewed as a joint model...

10.1109/cvpr.2013.463 article EN 2013 IEEE Conference on Computer Vision and Pattern Recognition 2013-06-01

Large-scale image classification: Fast feature extraction and SVM training

Yuanqing Lin Fengjun Lv Shenghuo Zhu Shuicheng Yan Timothée Cour and 3 more

Most research efforts on image classification so far have been focused on medium-scale datasets, which are often defined as datasets that can fit into the memory of a desktop (typically 4G~48G). There are two main reasons for the limited effort on large-scale image classification. First, until the emergence of the ImageNet dataset, there was almost no publicly available large-scale benchmark data for image classification. This is mostly because class labels are expensive to obtain. Second, large-scale classification is hard because it poses more challenges than its medium-scale counterparts. A key challenge is how...

10.1109/cvpr.2011.5995477 article EN 2011-06-01

Spatially Coherent Latent Topic Model for Concurrent Segmentation and Classification of Objects and Scenes

Liangliang Cao Li Fei-Fei

We present a novel generative model for simultaneously recognizing and segmenting object and scene classes. Our model is inspired by the traditional bag of words representation of texts and images as well as a number of related generative models, including probabilistic Latent Semantic Analysis (pLSA) and Latent Dirichlet Allocation (LDA). A major drawback of the pLSA and LDA models is the assumption that each patch in the image is independently generated given its corresponding latent topic. While such a representation provides an efficient computational method, it lacks the power...

10.1109/iccv.2007.4408965 article EN 2007-01-01

Geographical topic discovery and comparison

Zhijun Yin Liangliang Cao Jiawei Han ChengXiang Zhai Thomas S. Huang

This paper studies the problem of discovering and comparing geographical topics from GPS-associated documents. GPS-associated documents become popular with the pervasiveness of location-acquisition technologies. For example, in Flickr, the geo-tagged photos are associated with tags and GPS locations. In Twitter, the locations of the tweets can be identified by the GPS of smart phones. Many interesting concepts, including cultures, scenes, and product sales, correspond to specialized geographical distributions. In this paper, we are interested in two questions: (1) how...

10.1145/1963405.1963443 article EN 2011-03-28

Mining Fashion Outfit Composition Using an End-to-End Deep Learning Approach on Set Data

Yuncheng Li Liangliang Cao Jiang Zhu Jiebo Luo

Composing fashion outfits involves deep understanding of fashion standards while incorporating creativity for choosing multiple fashion items (e.g., Jewelry, Bag, Pants, Dress). In fashion websites, popular or high-quality fashion outfits are usually designed by fashion experts and followed by large audiences. In this paper, we propose a machine learning system to compose fashion outfits automatically. The core of the proposed automatic composition system is to score fashion outfit candidates based on the appearances and meta-data. We propose to leverage outfit popularity on fashion oriented websites to supervise...

10.1109/tmm.2017.2690144 article EN IEEE Transactions on Multimedia 2017-03-30

Designing Category-Level Attributes for Discriminative Visual Recognition

Felix X. Yu Liangliang Cao Rogério Feris John R. Smith Shih‐Fu Chang

Attribute-based representation has shown great promise for visual recognition due to its intuitive interpretation and cross-category generalization property. However, human efforts are usually involved in the attribute designing process, making the representation costly to obtain. In this paper, we propose a novel formulation to automatically design discriminative "category-level attributes", which can be efficiently encoded by a compact category-attribute matrix. The formulation allows us to achieve intuitive and critical design criteria...

10.1109/cvpr.2013.105 article EN 2013 IEEE Conference on Computer Vision and Pattern Recognition 2013-06-01

TGIF: A New Dataset and Benchmark on Animated GIF Description

Yuncheng Li Yale Song Liangliang Cao Joel Tetreault Larry Goldberg and 2 more

With the recent popularity of animated GIFs on social media, there is need for ways to index them with rich meta-data. To advance research on animated GIF understanding, we collected a new dataset, Tumblr GIF (TGIF), with 100K animated GIFs from Tumblr and 120K natural language descriptions obtained via crowdsourcing. The motivation for this work is to develop a testbed for image sequence description systems, where the task is to generate natural language descriptions for animated GIFs or video clips. To ensure a high quality dataset, we developed a series of novel quality controls to validate free-form text input from crowd-workers. We show...

10.1109/cvpr.2016.502 article EN 2016-06-01

Automatic Adaptation of Object Detectors to New Domains Using Self-Training

Aruni RoyChowdhury Prithvijit Chakrabarty Ashish Singh SouYoung Jin Huaizu Jiang and 2 more

This work addresses the unsupervised adaptation of an existing object detector to a new target domain. We assume that a large number of unlabeled videos from this domain are readily available. We automatically obtain labels on the target data by using high-confidence detections from the existing detector, augmented with hard (misclassified) examples acquired by exploiting temporal cues using a tracker. These automatically-obtained labels are then used for re-training the original model. A modified knowledge distillation loss is proposed, and we...

10.1109/cvpr.2019.00087 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition

Yu Zhang Daniel Park Wei Han James Qin Anmol Gulati and 21 more

We summarize the results of a host of efforts using giant automatic speech recognition (ASR) models pre-trained using large, diverse unlabeled datasets containing approximately a million hours of audio. We find that the combination of pre-training, self-training and scaling up model size greatly increases data efficiency, even for extremely large tasks with tens of thousands of hours of labeled data. In particular, on an ASR task with 34k hours of labeled data, by fine-tuning an 8 billion parameter pre-trained Conformer model we can match state-of-the-art (SoTA)...

10.1109/jstsp.2022.3182537 article EN IEEE Journal of Selected Topics in Signal Processing 2022-06-13

Microfluidic-based exosome isolation and highly sensitive aptamer exosome membrane protein detection for lung cancer diagnosis

Liang Zhao Hong Wang Jun Fu Xia Wu Xiao-ye Liang and 5 more

Non-invasive methods of detecting cancer by circulating exosomes are challenged by inefficient purification and identification. This study hereby proposed an automated centrifugal microfluidic disc system combined with functionalized membranes (Exo-CMDS) to isolate and enrich exosomes, which will then be processed by a novel aptamer fluorescence system (Exo-AFS) in order to detect the exosome surface proteins in an effective manner. Exo-CMDS features highly qualified yields with optimal exosomal concentration of 5.1 × 10^9...

10.1016/j.bios.2022.114487 article EN cc-by Biosensors and Bioelectronics 2022-06-18

Diffusion Model-Based Image Editing: A Survey

Yi Huang Jiancheng Huang Yifan Liu Mingfu Yan Jiaxi Lv and 5 more

Denoising diffusion models have emerged as a powerful tool for various image generation and editing tasks, facilitating the synthesis of visual content in an unconditional or input-conditional manner. The core idea behind them is learning to reverse the process of gradually adding noise to images, allowing them to generate high-quality samples from a complex distribution. In this survey, we provide an exhaustive overview of existing methods using diffusion models for image editing, covering both theoretical and practical aspects in the field. We delve...

10.1109/tpami.2025.3541625 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2025-02-13

Cross-dataset action detection

Liangliang Cao Zicheng Liu Thomas S. Huang

In recent years, many research works have been carried out to recognize human actions from video clips. To learn an effective action classifier, most of the previous approaches rely on enough training labels. When being required to recognize the action in a different dataset, these approaches have to re-train the model using new labels. However, labeling video sequences is a very tedious and time-consuming task, especially when detailed spatial locations and time durations are required. In this paper, we propose an adaptive action detection approach which reduces...

10.1109/cvpr.2010.5539875 article EN 2010-06-01

Gender recognition from body

Liangliang Cao Mert Dikmen Yun Fu Thomas S. Huang

This paper studies the problem of recognizing gender from full body images. This problem has not been addressed before, partly because of the variant nature of human bodies and clothing that can bring tough difficulties. However, gender recognition has high application potential, e.g. security surveillance and customer statistics collection in restaurants, supermarkets, and even building entrances. In this paper, we build a system of recognizing gender from full body images, taken from frontal or back views. Our contributions are three-fold. First, to handle the variety...

10.1145/1459359.1459470 article EN Proceedings of the 16th ACM International Conference on Multimedia 2008-10-26

Action detection in complex scenes with spatial and temporal ambiguities

Yuxiao Hu Liangliang Cao Fengjun Lv Shuicheng Yan Yihong Gong and 1 more

In this paper, we investigate the detection of semantic human actions in complex scenes. Unlike conventional action recognition in well-controlled environments, action detection in complex scenes suffers from cluttered backgrounds, heavy crowds, occluded bodies, and spatial-temporal boundary ambiguities caused by imperfect human detection and tracking. Conventional algorithms are likely to fail with such spatial-temporal ambiguities. In this work, the candidate regions of an action are treated as a bag of instances. Then a novel multiple-instance learning framework, named SMILE-SVM...

10.1109/iccv.2009.5459153 article EN 2009-09-01

Video2GIF: Automatic Generation of Animated GIFs from Video

Michael Gygli Yale Song Liangliang Cao

We introduce the novel problem of automatically generating animated GIFs from video. GIFs are short looping video with no sound, and a perfect combination between image and video that really capture our attention. GIFs tell a story, express emotion, turn events into humorous moments, and are the new wave of photojournalism. We pose the question: Can we automate the entirely manual and elaborate process of GIF creation by leveraging the plethora of user generated GIF content? We propose a Robust Deep RankNet that, given a video, generates a ranked list of its segments...

10.1109/cvpr.2016.114 article EN 2016-06-01

Focal Visual-Text Attention for Visual Question Answering

Junwei Liang Lu Jiang Liangliang Cao Li-Jia Li Alexander G. Hauptmann

Recent insights on language and vision with neural networks have been successfully applied to simple single-image visual question answering. However, to tackle real-life question answering problems on multimedia collections such as personal photos, we have to look at whole collections with sequences of photos or videos. When answering questions from a large collection, a natural problem is to identify snippets to support the answer. In this paper, we describe a novel neural network called Focal Visual-Text Attention (FVTA) for collective reasoning in...

10.1109/cvpr.2018.00642 article EN 2018-06-01

Robust Visual-Textual Sentiment Analysis

Quanzeng You Liangliang Cao Hailin Jin Jiebo Luo

Sentiment analysis is crucial for extracting social signals from social media content. Due to huge variation in social media, the performance of sentiment classifiers using single modality (visual or textual) still lags behind satisfaction. In this paper, we propose a new framework that integrates textual and visual information for robust sentiment analysis. Different from previous work, we believe visual and textual information should be treated jointly in a structural fashion. Our system first builds a semantic tree structure based on sentence parsing, aimed at...

10.1145/2964284.2964288 article EN Proceedings of the 24th ACM International Conference on Multimedia 2016-09-29

Detecting Sarcasm in Multimodal Social Platforms

Rossano Schifanella Paloma de Juan Joel Tetreault Liangliang Cao

Sarcasm is a peculiar form of sentiment expression, where the surface sentiment differs from the implied sentiment. The detection of sarcasm in social media platforms has been applied in the past mainly to textual utterances where lexical indicators (such as interjections and intensifiers), linguistic markers, and contextual information (such as user profiles, or past conversations) were used to detect the sarcastic tone. However, modern social media platforms allow users to create multimodal messages where audiovisual content is integrated with the text, making the analysis of a mode in isolation...

10.1145/2964284.2964321 preprint EN Proceedings of the 24th ACM International Conference on Multimedia 2016-09-29

Lip2Audspec: Speech Reconstruction from Silent Lip Movements Video

Hassan Akbari Himani Arora Liangliang Cao Nima Mesgarani

In this study, we propose a deep neural network for reconstructing intelligible speech from silent lip movement videos. We use auditory spectrogram as spectral representation of speech and its corresponding sound generation method, resulting in a more natural sounding reconstructed speech. Our proposed network consists of an autoencoder to extract bottleneck features from the auditory spectrogram, which is then used as target to our main lip reading network comprising CNN, LSTM and fully connected layers. Our experiments show that the autoencoder is able to reconstruct the original auditory spectrogram with...

10.1109/icassp.2018.8461856 article EN 2018-04-01

Ferret: Refer and Ground Anything Anywhere at Any Granularity

Haoxuan You Haotian Zhang Zhe Gan Xianzhi Du Bowen Zhang and 4 more

We introduce Ferret, a new Multimodal Large Language Model (MLLM) capable of understanding spatial referring of any shape or granularity within an image and accurately grounding open-vocabulary descriptions. To unify referring and grounding in the LLM paradigm, Ferret employs a novel and powerful hybrid region representation that integrates discrete coordinates and continuous features jointly to represent a region in the image. To extract the continuous features of versatile regions, we propose a spatial-aware visual sampler, adept at handling varying sparsity across...

10.48550/arxiv.2310.07704 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Multiple feature fusion by subspace learning

Yun Fu Liangliang Cao Guodong Guo Thomas S. Huang

Since the emergence of extensive multimedia data, feature fusion has been more and more important for image and video retrieval, indexing and annotation. Existing feature fusion techniques simply concatenate a pair of different features or use canonical correlation analysis based methods for joint dimensionality reduction in the feature space. However, how to fuse multiple features in a generalized way is still an open problem. In this paper, we reformulate the multiple feature fusion as a general subspace learning problem. The objective of the framework is to find a linear subspace in which the cumulative pairwise...

10.1145/1386352.1386373 article EN 2008-07-07

The wisdom of social multimedia

Xin Jin Andrew Gallagher Liangliang Cao Jiebo Luo Jiawei Han

Social multimedia hosting and sharing websites, such as Flickr, Facebook, Youtube, Picasa, ImageShack and Photobucket, are increasingly popular around the globe. A major trend in the current studies on social multimedia is using the social media sites as a source of huge amount of labeled data for solving large scale computer science problems in computer vision, data mining and multimedia. In this paper, we take a new path to explore the global trends and sentiments that can be drawn by analyzing the sharing patterns of uploaded and downloaded social multimedia. In a sense, each time an image or...

10.1145/1873951.1874196 article EN Proceedings of the 18th ACM International Conference on Multimedia 2010-10-25

A worldwide tourism recommendation system based on geotagged web photos

Liangliang Cao Jiebo Luo Andrew Gallagher Xin Jin Jiawei Han and 1 more

This work aims to build a system to suggest tourist destinations based on visual matching and minimal user input. A user can provide either a photo of the desired scenery or a keyword describing the place of interest, and the system will look into its database for places that share the visual characteristics. To that end, we first cluster a large-scale geotagged web photo collection into groups by location and then find the representative images for each group. Tourist destination recommendations are produced by comparing the query against the representative tags or representative images under the premise of "if you like...

10.1109/icassp.2010.5495905 article EN IEEE International Conference on Acoustics Speech and Signal Processing 2010-01-01

Latent Community Topic Analysis

Zhijun Yin Liangliang Cao Quanquan Gu Jiawei Han

This article studies the problem of latent community topic analysis in text-associated graphs. With the development of social media, a lot of user-generated content is available with user networks. Along with rich information in networks, user graphs can be extended with text information associated with nodes. Topic modeling is a classic problem in text mining and it is interesting to discover the latent topics in text-associated graphs. Different from traditional topic modeling methods considering links, we incorporate community discovery into topic analysis in text-associated graphs to guarantee the topical coherence in the communities so that users in the same community are closely linked...

10.1145/2337542.2337548 article EN ACM Transactions on Intelligent Systems and Technology 2012-09-01
