Shuqiang Jiang

ORCID: 0000-0002-1596-4326
Research Areas
  • Advanced Image and Video Retrieval Techniques
  • Image Retrieval and Classification Techniques
  • Video Analysis and Summarization
  • Multimodal Machine Learning Applications
  • Domain Adaptation and Few-Shot Learning
  • Advanced Neural Network Applications
  • Human Pose and Action Recognition
  • Video Surveillance and Tracking Methods
  • Advanced Chemical Sensor Technologies
  • Nutritional Studies and Diet
  • Music and Audio Processing
  • Robotics and Sensor-Based Localization
  • Multimedia Communication and Technology
  • Advanced Vision and Imaging
  • Visual Attention and Saliency Detection
  • Remote-Sensing Image Classification
  • Text and Document Classification Technologies
  • Culinary Culture and Tourism
  • Web Data Mining and Analysis
  • Image and Video Quality Assessment
  • Anomaly Detection Techniques and Applications
  • Handwritten Text Recognition Techniques
  • Spectroscopy and Chemometric Analyses
  • Hand Gesture Recognition Systems
  • Robotic Path Planning Algorithms

Chinese Academy of Sciences
2016-2025

Institute of Computing Technology
2016-2025

University of Chinese Academy of Sciences
2017-2024

Hebei GEO University
2024

Hebei University
2024

Center for Excellence in Brain Science and Intelligence Technology
2014-2023

Hubei University
2023

Peng Cheng Laboratory
2023

Association for Computing Machinery
2020

Universitat Autònoma de Barcelona
2019

Since scenes are composed in part of objects, accurate recognition of scenes requires knowledge about both scenes and objects. In this paper we address two related problems: 1) scale-induced dataset bias in multi-scale convolutional neural network (CNN) architectures, and 2) how to effectively combine scene-centric and object-centric knowledge (i.e., Places and ImageNet) in CNNs. An earlier attempt, Hybrid-CNN[23], showed that incorporating ImageNet did not help much. Here we propose an alternative method taking the scale into account, resulting...

10.1109/cvpr.2016.68 preprint EN 2016-06-01

Automatically describing the content of an image has been attracting considerable research attention in the multimedia field. To represent an image, many approaches directly utilize convolutional neural networks (CNNs) to extract visual representations, which are fed into recurrent neural networks to generate natural language. Recently, some approaches have detected semantic concepts from images and then encoded them into high-level representations. Although substantial progress has been achieved, most previous methods treat entities...

10.1109/tmm.2019.2896516 article EN IEEE Transactions on Multimedia 2019-01-30

Recently, food recognition has received more and more attention in image processing and computer vision for its great potential applications in human health. Most of the existing methods directly extract deep visual features via convolutional neural networks (CNNs) for food recognition. Such methods ignore the characteristics of food images and thus struggle to achieve optimal performance. In contrast to general object recognition, food images typically do not exhibit a distinctive spatial arrangement or common semantic patterns. In this paper, we...

10.1109/tip.2019.2929447 article EN IEEE Transactions on Image Processing 2019-07-29

Plant disease diagnosis is critical for agriculture due to its importance for increasing crop production. Recent advances in image processing offer a new way to solve this issue via visual plant disease analysis. However, there are few works in this area, not to mention systematic research. In this paper, we systematically investigate the problem of visual plant disease recognition for plant disease diagnosis. Compared with other types of images, plant disease images generally exhibit randomly distributed lesions, diverse symptoms and complex backgrounds, and are thus hard...

10.1109/tip.2021.3049334 article EN IEEE Transactions on Image Processing 2021-01-01

Food recognition plays an important role in food choice and intake, which is essential to the health and well-being of humans. It is thus of importance to the computer vision community, and can further support many food-oriented vision and multimodal tasks, e.g., food detection and segmentation, and cross-modal recipe retrieval and generation. Unfortunately, while we have witnessed remarkable advancements in generic visual recognition thanks to released large-scale datasets, progress still largely lags behind in the food domain. In this paper, we introduce Food2K, the largest food recognition dataset, with 2,000...

10.1109/tpami.2023.3237871 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2023-01-18

Notwithstanding its great success and wide adoption in the Bag-of-visual-Words representation, a visual vocabulary created from single-image local features is often shown to be ineffective, largely for three reasons. First, many detected local features are not stable enough, resulting in noisy and non-descriptive visual words in images. Second, a single visual word discards the rich spatial contextual information among local features, which has been proven valuable for visual matching. Third, the distance metric commonly used for generating a visual vocabulary does not take the semantic...

10.1145/1873951.1874018 article EN Proceedings of the 18th ACM International Conference on Multimedia 2010-10-25

A growing proportion of the global population is becoming overweight or obese, leading to various diseases (e.g., diabetes, ischemic heart disease and even cancer) due to unhealthy eating patterns, such as increased intake of food with high energy and high fat. Food recommendation is of paramount importance in alleviating this problem. Unfortunately, while modern multimedia research has enhanced the performance and experience of recommendation in many fields, such as movies and POIs, it still largely lags behind in the food domain. This article proposes a unified framework for...

10.1109/tmm.2019.2958761 article EN IEEE Transactions on Multimedia 2019-12-09

Cuisine is a style of cooking and is usually associated with a specific geographic region. Recipes from different cuisines shared on the web are an indicator of culinary cultures in different countries. Therefore, analysis of these recipes can lead to a deep understanding of food from a cultural perspective. In this paper, we perform the first cross-region recipe analysis by jointly using recipe ingredients, food images, and attributes such as cuisine and course (e.g., main dish and dessert). To this end, we propose a culinary culture analysis framework to discover topics...

10.1109/tmm.2017.2759499 article EN IEEE Transactions on Multimedia 2017-10-05

Food recognition has received more and more attention in the multimedia community for its various real-world applications, such as diet management and self-service restaurants. A large-scale ontology of food images is urgently needed for developing advanced large-scale food recognition algorithms, as well as for providing a benchmark dataset for such algorithms. To encourage further progress in food recognition, we introduce the dataset ISIA Food-500, with 500 categories from the list in Wikipedia and 399,726 images, a more comprehensive food dataset that surpasses existing popular benchmark datasets by category...

10.1145/3394171.3414031 article EN Proceedings of the 28th ACM International Conference on Multimedia 2020-10-12

This paper considers the problem of recipe-oriented image-ingredient correlation learning with multiple attributes for recipe retrieval and exploration. Existing methods mainly focus on visual information for food recognition, whereas we model visual information, textual content (e.g., ingredients), and attributes (e.g., cuisine and course) together to solve extended recipe-oriented problems, such as multimodal cuisine classification and attribute-enhanced food image retrieval. As a solution, we propose a multitask deep belief network...

10.1109/tmm.2016.2639382 article EN IEEE Transactions on Multimedia 2016-12-14

Recently, food recognition has been gaining more attention in the multimedia community due to its various applications, e.g., multimodal foodlogs and personalized healthcare. Most existing methods directly extract visual features from the whole image using popular deep networks for food recognition, without considering the characteristics of food images. Compared with other types of object images, food images generally do not exhibit a distinctive spatial arrangement or common semantic patterns, which makes it very hard to capture discriminative information...

10.1145/3343031.3350948 article EN Proceedings of the 27th ACM International Conference on Multimedia 2019-10-15

The goal of few-shot image recognition is to distinguish different categories with only one or a few training samples. Previous works on few-shot learning mainly focus on general object images, and current solutions usually learn a global image representation from training tasks to adapt to novel tasks. However, fine-grained categories are distinguished by subtle, local parts, which global representations cannot capture effectively. This may prevent existing few-shot learning approaches from handling fine-grained categories well. In this work, we propose a multi-attention meta-learning...

10.24963/ijcai.2020/152 article EN 2020-07-01

Logo detection has been gaining considerable attention because of its wide range of applications in the multimedia field, such as copyright infringement detection, brand visibility monitoring, and product brand management on social media. In this article, we introduce LogoDet-3K, the largest logo detection dataset with full annotation, which has 3,000 logo categories, about 200,000 manually annotated logo objects, and 158,652 images. LogoDet-3K creates a more challenging benchmark for logo detection, owing to its more comprehensive coverage and wider variety...

10.1145/3466780 article EN ACM Transactions on Multimedia Computing Communications and Applications 2022-01-27

Recently, visual food analysis has received more and more attention in the computer vision community due to its wide range of application scenarios, e.g., diet nutrition management, smart restaurants, and personalized diet recommendation. Considering that food images are unstructured, with complex and unfixed visual patterns, mining food-related semantic-aware regions is crucial. Furthermore, the ingredients contained in food images are semantically related to each other due to cooking habits, and have significant semantic relationships with food categories under the hierarchical...

10.1109/tip.2024.3374211 article EN IEEE Transactions on Image Processing 2024-01-01

Food-image recognition plays a pivotal role in intelligent nutrition management, and lightweight recognition methods based on deep learning are crucial for enabling mobile deployment. This capability empowers individuals to effectively manage their daily diet using devices such as smartphones. In this study, we propose an Efficient Hybrid Food Recognition Net (EHFR–Net), a novel neural network that integrates Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). We find that in the context of food-image...

10.3390/nu16020200 article EN Nutrients 2024-01-08

Most existing approaches to event detection in sports video are oriented toward a general audience. The extracted events are then presented to the audience without further analysis. However, professionals, such as soccer coaches, are more interested in the tactics used in the events. In this paper, we present a novel approach to extract tactic information from goal events in broadcast soccer video and present them in a tactic mode to coaches and sports professionals. We first extract goal events with far-view shots based on the analysis and alignment of web-casting text and broadcast video. For each detected goal event, we employ multi-object tracking...

10.1145/1291233.1291250 article EN Proceedings of the 15th ACM International Conference on Multimedia 2007-09-29

In modern times, the music video (MV) has become an important and favorite pastime for people because of its conciseness, convenience, and ability to bring both audio and visual experiences to audiences. As the number of MVs increases explosively, it has become an important task to develop new techniques for effective MV analysis, retrieval, and management. By simulating the human affective response mechanism, affective video content analysis extracts the affective information contained in videos, and, with this information, natural, user-friendly, and effective MV access strategies could be...

10.1109/tmm.2010.2059634 article EN IEEE Transactions on Multimedia 2010-09-15

Most existing approaches to sports video analysis have concentrated on semantic event detection. Sports professionals, however, are more interested in tactic analysis to help improve their performance. In this paper, we propose a novel approach to extract tactic information from attack events in broadcast soccer video and present the events in a tactic mode to coaches and sports professionals. We extract attack events with far-view shots using the analysis and alignment of web-casting text and broadcast video. For each detected event, two tactic representations, aggregate trajectory and play region sequence, are constructed...

10.1109/tmm.2008.2008918 article EN IEEE Transactions on Multimedia 2008-12-18

Food-related photos have become increasingly popular due to social networks, food recommendations, and dietary assessment systems. Reliable annotation is essential in those systems, but unconstrained automatic food recognition is still not accurate enough. Most works focus on exploiting only the visual content while ignoring the context. To address this limitation, in this paper we explore leveraging geolocation and external information about restaurants to simplify the classification problem. We propose a framework...

10.1109/tmm.2015.2438717 article EN IEEE Transactions on Multimedia 2015-06-03

Over the last several decades, research on visual object retrieval and recognition has achieved fast and remarkable success. However, while category-level tasks prevail in the community, instance-level tasks (especially recognition) have not yet received adequate attention. Applications such as content-based search engines and robot vision systems have raised awareness of the need to bring instance-level tasks into more realistic and challenging scenarios. Motivated by the limited scope of existing datasets, in this article we propose a new benchmark for...

10.1145/2700292 article EN ACM Transactions on Multimedia Computing Communications and Applications 2015-02-05