Jia-Hong Huang

ORCID: 0000-0001-7943-2591
Research Areas
  • Multimodal Machine Learning Applications
  • Video Analysis and Summarization
  • Advanced Image and Video Retrieval Techniques
  • Music and Audio Processing
  • Retinal Imaging and Analysis
  • Domain Adaptation and Few-Shot Learning
  • Topic Modeling
  • Digital Imaging for Blood Diseases
  • Image Retrieval and Classification Techniques
  • Retinal and Optic Conditions
  • AI in Cancer Detection
  • Natural Language Processing Techniques
  • Multimedia Communication and Technology
  • Data Quality and Management
  • Advanced Database Systems and Queries
  • Semantic Web and Ontologies
  • Quantum Computing Algorithms and Architecture
  • Data Management and Algorithms
  • Viral Infections and Outbreaks Research
  • Machine Learning and Algorithms
  • Brain Tumor Detection and Classification
  • Data Stream Mining Techniques
  • Sparse and Compressive Sensing Techniques
  • Cell Image Analysis Techniques
  • Explainable Artificial Intelligence (XAI)

University of Amsterdam
2019-2024

Amsterdam University of the Arts
2020-2024

King Abdullah University of Science and Technology
2017-2019

In this paper, we discuss initial attempts at boosting the understanding of human language with deep-learning models enhanced by quantum computing. We successfully train a quantum-enhanced Long Short-Term Memory network to perform the part-of-speech tagging task via numerical simulations. Moreover, a quantum-enhanced Transformer is proposed to perform sentiment analysis on an existing dataset.

10.1109/icassp43922.2022.9747675 article EN ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022-04-27

In this work, we propose an AI-based method that aims to improve the conventional retinal disease treatment procedure and help ophthalmologists increase diagnosis efficiency and accuracy. The proposed method is composed of a deep neural network-based (DNN-based) module, which includes an identifier and a clinical description generator, and a DNN visual explanation module. To train and validate the effectiveness of our DNN-based module, we propose a large-scale image dataset. Also, as ground truth, we provide a manually labeled dataset to qualitatively...

10.1109/wacv48630.2021.00249 article EN 2021-01-01

When video collections become huge, exploring content both within and across videos efficiently is challenging. Video summarization is one way to tackle this issue. Traditional approaches limit the effectiveness of video exploration because they generate only one fixed summary for a given input video, independent of the user's information need. In this work, we introduce a method that takes a text-based query as input and generates a video summary corresponding to it. We do so by modeling video summarization as a supervised learning problem and propose an end-to-end deep learning-based...

10.1145/3372278.3390695 article EN 2020-06-02
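The difference between a fixed summary and a query-dependent one can be illustrated with a deliberately simple baseline: score every frame by the cosine similarity between its feature vector and a text-query embedding, then keep the top-k frames in temporal order. This is not the end-to-end deep learning method described above. The function name `query_summary` and the toy 2-D features are hypothetical; a real system would obtain frame and query vectors from learned video and text encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def query_summary(frame_feats, query_feat, k):
    """Hypothetical baseline: indices of the k frames most similar to the query,
    returned in temporal order so the summary plays back chronologically."""
    scores = [cosine(f, query_feat) for f in frame_feats]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


# Toy features: frames 0 and 3 lie along the first axis, frames 1 and 2 along the second.
frames = [[1.0, 0.1], [0.0, 1.0], [0.2, 0.9], [0.9, 0.0]]
print(query_summary(frames, [1.0, 0.0], k=2))  # [0, 3]
print(query_summary(frames, [0.0, 1.0], k=2))  # [1, 2]
```

The same video yields different summaries for different queries, which is exactly what a fixed, query-independent summary cannot do.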

Low-rank adaptation (LoRA) has been shown to be effective in reducing the number of trainable parameters when fine-tuning a large foundation model (LLM). However, it still encounters computational and memory challenges when scaling to larger models or addressing more complex task adaptation. In this work, we introduce Sparse Spectrum Adaptation via Discrete Hartley Transformation (SSH), a novel approach that significantly reduces the number of parameters while enhancing performance. It selects the most informative...

10.48550/arxiv.2502.05539 preprint EN arXiv (Cornell University) 2025-02-08
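SSH builds on the Discrete Hartley Transform (DHT). For a length-N sequence x, the DHT is H[k] = Σₙ x[n]·cas(2πnk/N), where cas θ = cos θ + sin θ. It maps real inputs to real coefficients and is its own inverse up to a factor of 1/N. The sketch below only illustrates the transform and the idea of keeping a sparse subset of spectral coefficients. The truncated abstract does not state how SSH selects or trains its coefficients, so the magnitude-based rule in `sparse_spectrum` and all helper names are assumptions.

```python
import math


def dht(x):
    """Discrete Hartley transform: H[k] = sum_n x[n] * cas(2*pi*n*k/N), cas = cos + sin."""
    n_len = len(x)
    return [
        sum(
            x[n] * (math.cos(2 * math.pi * n * k / n_len) + math.sin(2 * math.pi * n * k / n_len))
            for n in range(n_len)
        )
        for k in range(n_len)
    ]


def idht(h):
    """Inverse DHT: applying the DHT twice returns N * x, so divide by N."""
    return [v / len(h) for v in dht(h)]


def sparse_spectrum(x, k):
    """Keep only the k largest-magnitude Hartley coefficients of x (hypothetical selection rule)."""
    h = dht(x)
    keep = set(sorted(range(len(h)), key=lambda i: abs(h[i]), reverse=True)[:k])
    return [h[i] if i in keep else 0.0 for i in range(len(h))]


w = [0.5, -1.0, 2.0, 0.25, 0.0, 1.5, -0.75, 1.0]
recovered = idht(dht(w))               # round trip reproduces w up to float error
approx = idht(sparse_spectrum(w, k=3))  # real-valued approximation from 3 of 8 coefficients
```

Because the DHT is real-valued, a real weight vector never produces complex coefficients, so the k retained coefficients are plain real parameters.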

Few-shot learning is a nascent research topic, motivated by the fact that traditional deep learning requires tremendous amounts of data. In this work, we propose a new task in this direction, which we call few-shot common-localization. Given a few weakly-supervised support images, we aim to localize the common object in the query image without any box annotation. This task differs from standard few-shot settings, since we aim to address the localization problem rather than the global classification problem. To tackle this new problem, we propose a network that aims to get the most out of the support and query images...

10.1109/iccv.2019.00517 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01

Traditional video summarization methods generate fixed video representations regardless of user interest. Therefore, such methods limit users' expectations in content search and exploration scenarios. Multi-modal video summarization is one of the methods used to address this problem. When multi-modal video summarization is used to help video exploration, a text-based query is considered one of the main drivers of summary generation, because it is user-defined. Thus, encoding both the text-based query and the video effectively is important for the task of multi-modal video summarization. In this work, a new method is proposed that uses a specialized attention...

10.1145/3460426.3463662 article EN 2021-08-24
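The truncated abstract does not describe its specialized attention design. Purely as a generic illustration of how a text query can steer a video representation, the sketch below applies standard scaled dot-product attention, with the query embedding as the single query vector and the frame features as both keys and values. The function names and toy vectors are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def query_attention(query, frames):
    """Scaled dot-product attention: weight_i = softmax(q . f_i / sqrt(d)).
    Returns the per-frame weights and the query-aware context vector sum_i weight_i * f_i."""
    d = len(query)
    scores = [sum(q * f for q, f in zip(query, frame)) / math.sqrt(d) for frame in frames]
    weights = softmax(scores)
    context = [sum(w * frame[j] for w, frame in zip(weights, frames)) for j in range(d)]
    return weights, context


frames = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
weights, context = query_attention([2.0, 0.0], frames)
# Frames aligned with the query receive the largest weights.
```

The attention weights can also serve directly as per-frame relevance scores when selecting frames for a query-dependent summary.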

Image search is a pivotal task in multimedia and computer vision, with applications across diverse domains ranging from internet search to medical diagnostics. Conventional image search systems accept textual or visual queries and retrieve the top-relevant candidate results from a database. However, prevalent methods often rely on single-turn procedures, which can introduce inaccuracies and limit recall. These methods also face challenges such as vocabulary mismatch and the semantic gap, constraining their...

10.1145/3652583.3658032 preprint EN cc-by 2024-05-30

Deep neural networks have been playing an essential role in many computer vision tasks, including Visual Question Answering (VQA). Until recently, their accuracy was the main focus of research, but there is now a trend toward assessing the robustness of these models against adversarial attacks by evaluating their tolerance to varying noise levels. In VQA, attacks can target the image and/or the proposed question, yet the latter still lacks proper analysis. In this work, we propose a flexible framework that focuses on the language...

10.1609/aaai.v33i01.33018449 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2019-07-17

Automatic machine learning-based (ML-based) medical report generation systems for retinal images suffer from a relative lack of interpretability. Hence, such ML-based systems are still not widely accepted. The main reason is that trust is one of the important motivating aspects of interpretability, and humans do not trust blindly. Precise technical definitions of interpretability still lack consensus, which makes it difficult to build a human-comprehensible ML-based system. Heat maps/saliency maps, i.e., post-hoc explanation approaches, are used to improve the interpretability of such systems. However, they...

10.1109/wacv56688.2023.00190 article EN 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023-01-01

The goal of video summarization is to automatically shorten videos such that the summary conveys the overall story without losing relevant information. In many application scenarios, improper summarization can have a large impact. For example, in forensics, the quality of the generated summary will affect an investigator's judgment, while in journalism it might yield undesired bias. Because of this, modeling explainability is a key concern. One of the best ways to address this challenge is to uncover the causal relations that steer the process and lead to the result. Current...

10.1109/cvprw59228.2023.00262 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2023-06-01

Medical image captioning automatically generates a medical description of the content of a given image. Traditional models create the description based on a single image input only. Hence, an abstract description or concept is hard to generate with the traditional approach, which limits the effectiveness of medical image captioning. Multi-modal captioning is one of the approaches used to address this problem. In multi-modal medical image captioning, textual input, e.g., expert-defined keywords, is considered one of the main drivers of description generation. Thus, encoding the textual input and the image effectively are both...

10.1145/3460426.3463667 article EN 2021-08-24

Automatic clinical diagnosis of retinal diseases has emerged as a promising approach to facilitate discovery in areas with limited access to specialists. We propose a novel visual-assisted hybrid model based on the support vector machine (SVM) and deep neural networks (DNNs). The model combines the complementary strengths of DNNs and the SVM. Furthermore, we present a new retina label collection for ophthalmology incorporating 32 classes. Using EyeNet, our model achieves 89.73% accuracy, and its performance is comparable...

10.48550/arxiv.1806.06423 preprint EN other-oa arXiv (Cornell University) 2018-01-01

Automatically generating medical reports for retinal images is one of the promising ways to help ophthalmologists reduce their workload and improve work efficiency. In this work, we propose a new context-driven encoding network to automatically generate medical reports for retinal images. The proposed model is mainly composed of a multi-modal input encoder and a fused-feature decoder. Our experimental results show that our method is capable of effectively leveraging the interactive information between the input image and the context, i.e., keywords in our case...

10.1109/icip42928.2021.9506803 article EN 2021 IEEE International Conference on Image Processing (ICIP) 2021-08-23

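Given an image and a question as input, our method outputs a text-based answer to the question about the given image, a task known as Visual Question Answering (VQA). There are two main modules in our algorithm. Given a natural language question about an image, the first module takes the question as input and outputs the basic questions of the given main question. The second module takes the main question, the image, and these basic questions as input and outputs a text-based answer to the main question. We formulate the basic question generation problem as a LASSO optimization problem and also propose a criterion for how to exploit these basic questions to help answer the main question. Our method is evaluated on the challenging VQA dataset and yields state-of-the-art accuracy,...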

10.48550/arxiv.1703.06492 preprint EN cc-by arXiv (Cornell University) 2017-01-01
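The abstract states that basic question generation is formulated as a LASSO problem. One plausible reading, assumed here rather than taken from the paper, is to express the main question's embedding b as a sparse combination of candidate basic-question embeddings (the columns of A) by solving min_x ½‖Ax − b‖² + λ‖x‖₁, and then to rank the candidates by their coefficients. The sketch below solves this with ISTA (proximal gradient descent with soft-thresholding). The embeddings, λ, and function names are hypothetical toy values.

```python
import math


def lasso_ista(a_cols, b, lam, steps=500):
    """Minimize 0.5*||A x - b||^2 + lam*||x||_1 with ISTA.
    a_cols: list of column vectors, one per candidate basic question."""
    m, n = len(b), len(a_cols)
    # Step size 1/L; ||A||_F^2 upper-bounds the largest eigenvalue of A^T A, so it is a safe L.
    lip = sum(v * v for col in a_cols for v in col)
    t = 1.0 / lip
    x = [0.0] * n
    for _ in range(steps):
        r = [sum(a_cols[j][i] * x[j] for j in range(n)) - b[i] for i in range(m)]  # residual A x - b
        grad = [sum(a_cols[j][i] * r[i] for i in range(m)) for j in range(n)]     # gradient A^T r
        z = [x[j] - t * grad[j] for j in range(n)]
        x = [math.copysign(max(abs(v) - t * lam, 0.0), v) for v in z]  # soft-thresholding step
    return x


def rank_candidates(x):
    """Candidate indices ordered from largest to smallest LASSO coefficient."""
    return sorted(range(len(x)), key=lambda j: x[j], reverse=True)


# Toy embeddings: three orthonormal candidates; the main question mixes the first two.
candidates = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
main_question = [1.0, 0.5, 0.0]
coef = lasso_ista(candidates, main_question, lam=0.1)  # approx. [0.9, 0.4, 0.0]
ranking = rank_candidates(coef)                        # [0, 1, 2]
```

The ℓ1 penalty drives the coefficients of irrelevant candidates to exactly zero, so only a few basic questions receive nonzero scores.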

Visual Question Answering (VQA) models should have both high robustness and high accuracy. Unfortunately, most current VQA research focuses only on accuracy because proper methods to measure the robustness of VQA models are lacking. There are two main modules in our algorithm. Given a natural language question about an image, the first module takes the question as input and outputs ranked basic questions, with similarity scores, for the given main question. The second module takes the main question, the image, and these basic questions as input and outputs a text-based answer to the main question about the given image. We claim that...

10.48550/arxiv.1709.04625 preprint EN other-oa arXiv (Cornell University) 2017-01-01

Automatically generating medical reports from retinal images is a difficult task in which an algorithm must generate semantically coherent descriptions for a given image. Existing methods mainly rely on the input image to generate descriptions. However, many abstract concepts or descriptions cannot be generated based on image information only. In this work, we integrate additional information to help solve this task; we observe that early in the diagnosis process, ophthalmologists have usually written down a small set of keywords denoting important...

10.1109/wacv51458.2022.00331 article EN 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2022-01-01

Evaluating the quality of automatically generated image descriptions is challenging, requiring metrics that capture various aspects such as grammaticality, coverage, correctness, and truthfulness. While human evaluation offers valuable insights, its cost and time-consuming nature pose limitations. Existing automated metrics like BLEU, ROUGE, METEOR, and CIDEr aim to bridge this gap but often show weak correlations with human judgment. We address this challenge by introducing a novel framework rooted in modern large...

10.48550/arxiv.2408.01723 preprint EN arXiv (Cornell University) 2024-08-03

Large language models (LLMs) often produce unsupported or unverifiable information, known as "hallucinations." To mitigate this, retrieval-augmented LLMs incorporate citations, grounding the content in verifiable sources. Despite such developments, manually assessing how well a citation supports the associated statement remains a major challenge. Previous studies use faithfulness metrics to estimate citation support automatically but are limited to binary classification, overlooking fine-grained citation support in practical...

10.48550/arxiv.2406.15264 preprint EN arXiv (Cornell University) 2024-06-21

Recently, video summarization has been proposed as a method to help video exploration. However, traditional models generate only a fixed summary, which is usually independent of user-specific needs and hence limits the effectiveness of video exploration. Multi-modal video summarization is one of the approaches used to address this issue. It takes a video input and a text-based query input. Hence, effectively modeling the interaction between the video and the text-based query is essential to multi-modal video summarization. In this work, a new causality-based method named Causal Video Summarizer (CVS) is proposed to effectively capture the interactive...

10.1109/icme52920.2022.9859948 article EN 2022 IEEE International Conference on Multimedia and Expo (ICME) 2022-07-18