Jingjing Chen

ORCID: 0000-0003-3148-264X
Research Areas
  • Multimodal Machine Learning Applications
  • Domain Adaptation and Few-Shot Learning
  • Advanced Image and Video Retrieval Techniques
  • Adversarial Robustness in Machine Learning
  • Human Pose and Action Recognition
  • Generative Adversarial Networks and Image Synthesis
  • Anomaly Detection Techniques and Applications
  • Image Retrieval and Classification Techniques
  • Advanced Chemical Sensor Technologies
  • Video Analysis and Summarization
  • Digital Media Forensic Detection
  • Advanced Neural Network Applications
  • Visual Attention and Saliency Detection
  • Nutritional Studies and Diet
  • COVID-19 diagnosis using AI
  • Biochemical Analysis and Sensing Techniques
  • Retinal Imaging and Analysis
  • Topic Modeling
  • Nanopore and Nanochannel Transport Studies
  • Face recognition and analysis
  • Advanced Vision and Imaging
  • Natural Language Processing Techniques
  • Radio Frequency Integrated Circuit Design
  • Image Processing Techniques and Applications
  • Machine Learning and ELM

Tianjin University of Technology
2018-2025

Changzhou No.2 People's Hospital
2025

Nanjing Traditional Chinese Medicine Hospital
2025

Nanjing Medical University
2025

Fudan University
2019-2024

Kunming Medical University
2021-2024

Northeastern University
2023-2024

Chongqing Medical University
2024

Jilin Business and Technology College
2024

Zhengzhou University
2009-2024

Retrieving recipes corresponding to given dish pictures facilitates the estimation of nutrition facts, which is crucial for various health-related applications. Current approaches mostly focus on recognizing the food category based on global appearance, without explicit analysis of ingredient composition. Such approaches are incapable of retrieval for dishes of unknown categories, a problem referred to as zero-shot retrieval. On the other hand, content-based retrieval without knowledge of food categories is also difficult to attain satisfactory performance...

10.1145/2964284.2964315 article EN Proceedings of the 24th ACM International Conference on Multimedia 2016-09-29

In recent years, the abuse of a face swap technique called deepfake has raised enormous public concerns. So far, a large number of fake videos (known as "deepfakes") have been crafted and uploaded to the internet, calling for effective countermeasures. One promising countermeasure against deepfakes is deepfake detection. Several datasets have been released to support the training and testing of deepfake detectors, such as DeepfakeDetection [1] and FaceForensics++ [23]. While this has greatly advanced deepfake detection, most of the real videos in these datasets are filmed with a few...

10.1145/3394171.3413769 article EN Proceedings of the 28th ACM International Conference on Multimedia 2020-10-12

The widespread dissemination of Deepfakes demands effective approaches that can detect perceptually convincing forged images. In this paper, we aim to capture the subtle manipulation artifacts at different scales using transformer models. In particular, we introduce a Multi-modal Multi-scale TRansformer (M2TR), which operates on patches of different sizes to detect local inconsistencies in images at different spatial levels. M2TR further learns to detect forgery artifacts in the frequency domain to complement RGB information through a carefully designed cross...

10.1145/3512527.3531415 preprint EN 2022-06-23

Deep neural networks (DNNs) are vulnerable to backdoor attacks, which can hide triggers in DNNs by poisoning training data. A backdoored model behaves normally on clean test images, yet consistently predicts a particular target class for any test examples that contain the trigger pattern. As such, backdoor attacks are hard to detect, and have raised severe security concerns in real-world applications. Thus far, backdoor research has mostly been conducted in the image domain with image classification models. In this paper, we show that existing image backdoor attacks are far...

10.1109/cvpr42600.2020.01445 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01
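To make the data-poisoning mechanism described above concrete, here is a minimal, hypothetical sketch of stamping a visual trigger on a fraction of training images and relabeling them. The patch size, location, and function names are illustrative, not the paper's attack:

```python
import numpy as np

def poison(images, labels, target_class, frac=0.1, seed=0):
    """Stamp a small white patch (the 'trigger') on a random fraction of
    training images and relabel them to the attacker's target class.
    A generic illustration of backdoor data poisoning."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = max(1, int(frac * len(images)))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    images[idx, -3:, -3:] = 1.0      # 3x3 trigger in the bottom-right corner
    labels[idx] = target_class       # mislabel so the model associates trigger -> target
    return images, labels, idx
```

A model trained on the returned set behaves normally on clean inputs but predicts `target_class` whenever the corner patch is present.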

Unsupervised domain adaptation (UDA) aims to transfer knowledge learned from a fully-labeled source domain to a different unlabeled target domain. Most existing UDA methods learn domain-invariant feature representations by minimizing feature distances across domains. In this work, we build upon contrastive self-supervised learning to align features so as to reduce the domain discrepancy between training and testing sets. Exploring the same set of categories shared by both domains, we introduce a simple yet effective framework, CDCL, for...

10.1109/tmm.2022.3146744 article EN IEEE Transactions on Multimedia 2022-01-27
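The class-aware cross-domain alignment described above can be sketched with a simple InfoNCE-style loss that treats same-class samples from the other domain as positives. Everything here (function name, use of target pseudo-labels, temperature) is a generic illustration, not the CDCL implementation:

```python
import numpy as np

def cross_domain_contrastive_loss(src_feats, tgt_feats, src_labels, tgt_pseudo, tau=0.07):
    """InfoNCE-style loss pulling together same-class features across domains.

    src_feats, tgt_feats: L2-normalized feature matrices, shape (n, d).
    src_labels: ground-truth source labels; tgt_pseudo: target pseudo-labels
    (unlabeled targets are typically pseudo-labeled, e.g. by clustering)."""
    sims = src_feats @ tgt_feats.T / tau                    # (n_src, n_tgt) logits
    exp = np.exp(sims - sims.max(axis=1, keepdims=True))    # numerically stable softmax
    probs = exp / exp.sum(axis=1, keepdims=True)
    losses = []
    for i, y in enumerate(src_labels):
        pos = (tgt_pseudo == y)                             # cross-domain positives share a class
        if pos.any():
            losses.append(-np.log(probs[i, pos].mean() + 1e-12))
    return float(np.mean(losses))
```

Minimizing this pushes features of the same category together regardless of domain, which is one way to shrink the train/test discrepancy.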

Real-world data typically follow a long-tailed distribution, where a few majority categories occupy most of the data while the minority categories contain a limited number of samples. Classification models minimizing cross-entropy struggle to represent and classify the tail classes. Although the problem of learning unbiased classifiers has been well studied, methods for representing imbalanced data are under-explored. In this paper, we focus on representation learning for imbalanced data. Recently, supervised contrastive learning has shown promising performance on balanced...

10.1109/cvpr52688.2022.00678 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Recent advances in image editing techniques have posed serious challenges to the trustworthiness of multimedia data, which drives research on tampering detection. In this paper, we propose ObjectFormer to detect and localize image manipulations. To capture subtle manipulation traces that are no longer visible in the RGB domain, we extract high-frequency features of the images and combine them with RGB features as multimodal patch embeddings. Additionally, we use a set of learnable object prototypes as mid-level representations to model object-level...

10.1109/cvpr52688.2022.00240 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Semi-supervised action recognition is a challenging but critical task due to the high cost of video annotations. Existing approaches mainly use convolutional neural networks, yet current revolutionary vision transformer models have been less explored. In this paper, we investigate the use of transformer models under the SSL setting for action recognition. To this end, we introduce SVFormer, which adopts a steady pseudo-labeling framework (i.e., EMA-Teacher) to cope with unlabeled video samples. While a wide range of data augmentations have been shown effective...

10.1109/cvpr52729.2023.01804 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01
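The EMA-Teacher scheme mentioned above keeps a teacher model whose weights are an exponential moving average of the student's, so its pseudo-labels evolve smoothly. A minimal sketch, with the momentum value and parameter layout chosen for illustration:

```python
def ema_update(teacher, student, momentum=0.999):
    """Blend each teacher parameter toward the student's current value.
    The slowly-moving teacher then produces the pseudo-labels for
    unlabeled clips; parameters are modeled here as a name->value dict."""
    return {k: momentum * teacher[k] + (1.0 - momentum) * student[k]
            for k in teacher}
```

After every student optimization step the teacher is refreshed with `ema_update`, so its predictions change gradually and yield steadier pseudo-labels than the student's own.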

Fusing LiDAR and camera information is essential for accurate and reliable 3D object detection in autonomous driving systems. This is challenging due to the difficulty of combining multi-granularity geometric and semantic features from two drastically different modalities. Recent approaches aim at exploring the semantic densities of camera features through lifting points in 2D images (referred to as "seeds") into 3D space, and then incorporate 2D semantics via cross-modal interaction or fusion techniques. However, depth information is under-investigated in these approaches when...

10.1109/cvpr52729.2023.02073 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

We introduce a novel visual question answering (VQA) task in the context of autonomous driving, aiming to answer natural language questions based on street-view clues. Compared with traditional VQA tasks, the driving scenario presents more challenges. Firstly, the raw visual data are multi-modal, including images and point clouds captured by camera and LiDAR, respectively. Secondly, the data are multi-frame due to the continuous, real-time acquisition. Thirdly, the outdoor scenes exhibit both moving foreground and static background. Existing...

10.1609/aaai.v38i5.28253 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2024-03-24

Near-infrared (NIR) light photodiodes have been attracting increasing research interest due to their wide application in various fields. In this study, the fabrication of a new n-type GaAs nanocone (GaAsNC) array/monolayer graphene (MLG) Schottky junction is reported for NIR light detection. The photodetector (NIRPD) shows obvious rectifying behavior with a turn-on voltage of 0.6 V. Further device analysis reveals that the photovoltaic NIRPDs are highly sensitive to 850 nm illumination, with fast response speed and...

10.1002/adfm.201303368 article EN Advanced Functional Materials 2014-01-21

Representing procedure text such as a recipe for cross-modal retrieval is inherently a difficult problem, not to mention generating an image from a recipe for visualization. This paper studies a new version of GAN, named Recipe Retrieval Generative Adversarial Network (R2GAN), to explore the feasibility of the generation problem. The motivation for using GAN is twofold: learning compatible cross-modal features in an adversarial way, and explanation of search results by showing images generated from recipes. The novelty of R2GAN comes...

10.1109/cvpr.2019.01174 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

Carbon nanotube (CNT)-based flexible sensors have been intensively developed for physical sensing. However, great challenges remain in fabricating stretchable CNT films with high electrochemical performance for real-time chemical sensing, due to the large sheet resistance of the film and its further increase caused by the separation between individual nanotubes during stretching. Herein, we develop a facile and versatile strategy to construct single-walled carbon nanotube (SWNT)-based transparent stretchable sensors, by coating and binding SWNT...

10.1021/acs.analchem.6b04616 article EN Analytical Chemistry 2016-12-28

This paper proposes a Hyperbolic Visual Embedding Learning Network for zero-shot recognition. The network learns image embeddings in hyperbolic space, which is capable of preserving the hierarchical structure of semantic classes in low dimensions. Compared with existing zero-shot learning approaches, the network is more robust because the hyperbolic embedding space better represents the class hierarchy and thereby avoids misleading results caused by unrelated sibling classes. Our network outperforms existing baselines under the evaluation of an...

10.1109/cvpr42600.2020.00929 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01
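The appeal of hyperbolic space mentioned above is that distances grow rapidly toward the boundary of the Poincare ball, which lets tree-like class hierarchies embed in few dimensions with low distortion. The standard Poincare distance, sketched here independently of the paper's network:

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points inside the Poincare ball
    (||x|| < 1). Points near the boundary are exponentially far apart,
    which is what makes the space suit hierarchical embeddings."""
    uu = np.dot(u, u)
    vv = np.dot(v, v)
    duv = np.dot(u - v, u - v)
    x = 1.0 + 2.0 * duv / ((1.0 - uu) * (1.0 - vv) + eps)
    return np.arccosh(x)
```

Two points at Euclidean distance 1.8 near the boundary end up a hyperbolic distance of roughly 6 apart, leaving room to separate deep hierarchies even in 2D.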

Food recognition has captured numerous research attention for its importance in health-related applications. Existing approaches mostly focus on the categorization of food according to dish names, while ignoring the underlying ingredient composition. In reality, two dishes with the same name do not necessarily share the exact list of ingredients. Therefore, dishes under the same category are not necessarily equal in nutrition content. Nevertheless, due to the limited datasets available with ingredient labels, the problem is often overlooked....

10.1109/tip.2020.3045639 article EN IEEE Transactions on Image Processing 2020-12-23

Cross-modal text-to-video retrieval aims to find relevant videos for given text queries, and is crucial for various real-world applications. The key to addressing this task is to build the correspondence between videos and texts such that related samples from the two modalities can be aligned. As a text (sentence) contains both nouns and verbs representing objects as well as their interactions, retrieving the relevant videos requires a fine-grained understanding of video contents---not only the semantic concepts (i.e., objects) but also the interactions between them....

10.1109/tmm.2021.3090595 article EN IEEE Transactions on Multimedia 2021-06-23

Vision transformers (ViTs) have demonstrated impressive performance on a series of computer vision tasks, yet they still suffer from adversarial examples. In this paper, we posit that adversarial attacks on ViTs should be specially tailored for their architecture, jointly considering both patches and self-attention, in order to achieve high transferability. More specifically, we introduce a dual attack framework, which contains a Pay No Attention (PNA) attack and a PatchOut attack, to improve the transferability of adversarial samples across...

10.1609/aaai.v36i3.20169 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2022-06-28

Food is rich in visible (e.g., colour, shape) and procedural (e.g., cutting, cooking) attributes. Properly leveraging these attributes, particularly the interplay among ingredients, cutting and cooking methods, for health-related applications has not been previously explored. This paper investigates cross-modal retrieval of recipes, specifically to retrieve a text-based recipe given a food picture as query. As similar ingredient compositions can end up in wildly different dishes depending on the procedures,...

10.1145/3123266.3123428 article EN Proceedings of the 25th ACM International Conference on Multimedia 2017-10-20

We study the problem of attacking video recognition models in the black-box setting, where the model information is unknown and the adversary can only make queries to obtain the predicted top-1 class and its probability. Compared with attacks on images, attacking videos is more challenging, as the computation cost of searching for adversarial perturbations is much higher due to the high dimensionality of videos. To overcome this challenge, we propose a heuristic black-box attack that generates perturbations only on selected frames and regions. More specifically, a heuristic-based algorithm...

10.1609/aaai.v34i07.6918 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2020-04-03
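A minimal, hypothetical sketch of query-only perturbation search restricted to a few chosen frames, the idea the abstract describes. The slope estimator, step rule, and `query_fn` interface are assumptions for illustration, not the paper's algorithm, and frame selection is taken as given:

```python
import numpy as np

def attack_selected_frames(video, query_fn, frames, steps=50, sigma=1e-3, lr=0.5, seed=0):
    """Lower query_fn(video) -- the model's top-1 probability for the true
    class -- using only output queries, perturbing only `frames`.
    Restricting noise to a few frames shrinks the search space, which is
    the point of frame selection in black-box video attacks."""
    rng = np.random.default_rng(seed)
    adv = video.copy()
    for _ in range(steps):
        noise = np.zeros_like(adv)
        noise[frames] = rng.normal(size=adv[frames].shape)   # probe only chosen frames
        # two-sided query probe: does the noise direction raise or lower the probability?
        slope = query_fn(adv + sigma * noise) - query_fn(adv - sigma * noise)
        adv[frames] -= lr * np.sign(slope) * noise[frames]   # step downhill on that direction
    return adv
```

Each iteration costs two queries; untouched frames stay identical to the original video.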

Recognizing the ingredients of a given dish image is at the core of automatic dietary assessment, attracting increasing attention from both industry and academia. Nevertheless, the task is challenging due to the difficulty of collecting and labeling sufficient training data. On one hand, there are hundreds of thousands of food ingredients in the world, ranging from common to rare, and collecting samples for all ingredient categories is difficult. On the other hand, as ingredient appearances exhibit huge visual variance during food preparation, it requires collecting samples under different...

10.1609/aaai.v34i07.6626 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2020-04-03

Video moment retrieval aims to localize the most relevant moment in a video given a text query. Weakly supervised approaches leverage video-text pairs only for training, without temporal annotations. Most current methods align the proposed moments and queries in a joint embedding space. However, for lack of temporal annotations, the semantic gap between the two modalities makes it predominant for these methods to learn cross-modal feature alignment, with less emphasis on learning the visual representation itself. This paper improves the supervision in the visual domain, obtaining...

10.1145/3474085.3475278 article EN Proceedings of the 29th ACM International Conference on Multimedia 2021-10-17