Bosheng Qin

ORCID: 0000-0003-1978-9999
Research Areas
  • Multimodal Machine Learning Applications
  • Generative Adversarial Networks and Image Synthesis
  • Video Analysis and Summarization
  • Face Recognition and Analysis
  • Advanced Image Processing Techniques
  • AI in Cancer Detection
  • Advanced Neural Network Applications
  • Data Visualization and Analytics
  • Video Surveillance and Tracking Methods
  • Image and Signal Denoising Methods
  • Image Enhancement Techniques
  • Infection Control and Ventilation
  • Down Syndrome and Intellectual Disability Research
  • Human Motion and Animation
  • Sparse and Compressive Sensing Techniques
  • Visual Attention and Saliency Detection
  • Cinema and Media Studies
  • Face and Expression Recognition
  • Advanced Data Compression Techniques
  • Natural Language Processing Techniques
  • Advanced Image and Video Retrieval Techniques
  • Psychedelics and Drug Studies
  • Image and Video Quality Assessment
  • Domain Adaptation and Few-Shot Learning

Zhejiang University of Science and Technology
2022-2025

Zhejiang University
2020-2024

The rapid worldwide spread of Coronavirus Disease 2019 (COVID-19) has resulted in a global pandemic. Correct facemask wearing is valuable for infectious disease control, but the effectiveness of facemasks has been diminished, mostly due to improper wearing. However, there have not been any published reports on the automatic identification of facemask-wearing conditions. In this study, we develop a new facemask-wearing condition identification method by combining image super-resolution and classification networks (SRCNet), which quantifies...

10.3390/s20185236 article EN cc-by Sensors 2020-09-14

The rapid worldwide spread of Coronavirus Disease 2019 (COVID-19) has resulted in a global pandemic. Correct facemask wearing is valuable for infectious disease control, but the effectiveness of facemasks has been diminished, mostly due to improper wearing. However, there have not been any published reports on the automatic identification of facemask-wearing conditions. In this study, we developed a new facemask-wearing condition identification method by combining image super-resolution with a classification network (SRCNet), which quantified a three...

10.21203/rs.3.rs-28668/v1 preprint EN cc-by Research Square (Research Square) 2020-05-14

Down syndrome is one of the most common genetic disorders. The distinctive facial features of Down syndrome provide an opportunity for automatic identification. Recent studies showed that facial recognition technologies have the capability to identify genetic disorders. However, there is a paucity of studies on the automatic identification of Down syndrome with facial recognition technologies, especially using deep convolutional neural networks. Here, we developed a Down syndrome identification method utilizing facial images and deep convolutional neural networks, which quantified the binary classification problem of distinguishing subjects with Down syndrome from healthy subjects based...

10.3390/diagnostics10070487 article EN cc-by Diagnostics 2020-07-17

Attention-based networks have recently become prevalent in visual question answering (VQA) due to their high performance. However, the extensive memory consumption of attention-based models places excessively high demands on implementation equipment, raising concerns about their future application scenarios. Therefore, designing an efficient and lightweight VQA model is central to expanding the possible application areas. Our work presents a novel VQA model, namely the residual weight-sharing attention network (RWSAN), consisting of residual weight-sharing attention (RWSA)...

10.1109/tmm.2022.3173131 article EN IEEE Transactions on Multimedia 2022-05-06

Many studies have aimed to improve Transformer model efficiency using low-rank-based methods that compress the sequence length with predetermined or learned compression matrices. However, these methods fix the compression coefficients for tokens in the same position during inference, ignoring sequence-specific variations. They also overlook the impact of hidden state dimensions on efficiency gains. To address these limitations, we propose dynamic bilinear low-rank attention (DBA), an efficient and effective attention mechanism that compresses...

10.1109/tnnls.2025.3527046 article EN IEEE Transactions on Neural Networks and Learning Systems 2025-01-01

The rising demand for creating lifelike avatars in the digital realm has led to an increased need for generating high-quality human videos guided by textual descriptions and poses. We propose Dancing Avatar, designed to fabricate human motion videos driven by poses and textual cues. Our approach employs a pretrained T2I diffusion model to generate each video frame in an autoregressive fashion. The crux of the innovation lies in our adept utilization of the T2I diffusion model for producing video frames successively while preserving contextual relevance. We surmount the hurdles posed...

10.48550/arxiv.2308.07749 preprint EN other-oa arXiv (Cornell University) 2023-01-01

We present an end-to-end diffusion-based method for editing videos with human language instructions, namely InstructVid2Vid. Our approach enables the editing of input videos based on natural language instructions without any per-example fine-tuning or inversion. The proposed InstructVid2Vid model combines a pretrained image generation model, Stable Diffusion, with a conditional 3D U-Net architecture to generate a time-dependent sequence of video frames. To obtain the training data, we incorporate the knowledge and expertise...

10.48550/arxiv.2305.12328 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Many genetic diseases are known to have distinctive facial phenotypes, which are highly informative and provide an opportunity for automated detection. However, the diagnostic performance of artificial intelligence in identifying genetic diseases with facial phenotypes requires further investigation. The objectives of this systematic review and meta-analysis are to evaluate the diagnostic accuracy of artificial intelligence in identifying genetic diseases with facial phenotypes and then to find the best algorithm. The systematic review will be conducted in accordance with the "Preferred Reporting Items for Systematic Reviews and Meta-Analyses Protocols" guidelines. The following...

10.1097/md.0000000000020989 article EN cc-by-nc Medicine 2020-06-29

10.1109/icme57554.2024.10687529 article EN 2024 IEEE International Conference on Multimedia and Expo (ICME) 2024-07-15

Multi-modal Large Language Models (MLLMs) tuned on machine-generated instruction-following data have demonstrated remarkable performance in various multi-modal understanding and generation tasks. However, the hallucinations inherent in machine-generated data, which could lead to hallucinatory outputs in MLLMs, remain under-explored. This work aims to investigate various hallucinations (i.e., object, relation, and attribute hallucinations) and mitigate those hallucinatory toxicities in large-scale machine-generated visual instruction datasets. Drawing on the human ability to identify factual...

10.48550/arxiv.2311.13614 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Many studies have been conducted to improve the efficiency of the Transformer from quadratic to linear. Among them, low-rank-based methods aim to learn projection matrices that compress the sequence length. However, the projection matrices are fixed once they have been learned, so they compress the sequence length with dedicated coefficients for tokens in the same position. Adopting such input-invariant projections ignores the fact that the most informative part of a sequence varies from sequence to sequence, thus failing to preserve the most useful information that lies in varied positions. In addition, previous efficient...

10.48550/arxiv.2211.16368 preprint EN other-oa arXiv (Cornell University) 2022-01-01