Yuzhong Zhao

ORCID: 0000-0002-2425-6786
Research Areas
  • Multimodal Machine Learning Applications
  • Advanced Neural Network Applications
  • Handwritten Text Recognition Techniques
  • Video Analysis and Summarization
  • Domain Adaptation and Few-Shot Learning
  • Generative Adversarial Networks and Image Synthesis
  • Advanced Image and Video Retrieval Techniques
  • Gene expression and cancer classification
  • Genetic Mapping and Diversity in Plants and Animals
  • Genetic Associations and Epidemiology
  • Vehicle License Plate Recognition
  • Machine Learning in Bioinformatics
  • Natural Language Processing Techniques
  • Bioinformatics and Genomic Networks
  • Advanced Data Storage Technologies
  • Advanced Algorithms and Applications
  • Distributed systems and fault tolerance
  • Biomedical Text Mining and Ontologies
  • Digital Games and Media
  • Metabolomics and Mass Spectrometry Studies
  • Genomics and Phylogenetic Studies
  • Cloud Computing and Resource Management
  • Human Motion and Animation
  • Face recognition and analysis
  • Advanced Research in Science and Engineering

Shanghai Jiao Tong University
2023-2025

University of Chinese Academy of Sciences
2022-2024

China Academy of Space Technology
2024

Antea Group (France)
2023

China Tobacco
2020

University of Science and Technology of China
2006-2019

Anhui Provincial Hospital
2019

Shandong First Medical University
2014

Chinese Academy of Medical Sciences & Peking Union Medical College
2014

University of Waterloo
2010

Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) stand as the two most popular foundation models for visual representation learning. While CNNs exhibit remarkable scalability with linear complexity w.r.t. image resolution, ViTs surpass them in fitting capabilities despite contending with quadratic complexity. A closer inspection reveals that ViTs achieve superior modeling performance through the incorporation of global receptive fields and dynamic weights. This observation motivates us to...

10.48550/arxiv.2401.10166 preprint EN cc-by-sa arXiv (Cornell University) 2024-01-01

Collecting and annotating images with pixel-wise labels is time-consuming and laborious. In contrast, synthetic data can be freely available using a generative model (e.g., DALL-E, Stable Diffusion). In this paper, we show that it is possible to automatically obtain accurate semantic masks of synthetic images generated by the off-the-shelf Stable Diffusion model, which uses only text-image pairs during training. Our approach, termed DiffuMask, exploits the potential of the cross-attention map between text and image, which is natural and seamless to extend...

10.1109/iccv51070.2023.00117 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

We have designed and developed OceanBase, a distributed relational database system, from the very basics for a decade. Being a scale-out multi-tenant system, OceanBase is cross-region fault tolerant, which is based on the shared-nothing architecture. Besides sharing many similar goals with alternative DBMS, such as horizontal scalability, fault-tolerance, etc., our design has been driven by the demands of typical RDBMS compatibility as well as both on-premise and off-premise deployments. OceanBase has fulfilled its goal. It...

10.14778/3554821.3554830 article EN Proceedings of the VLDB Endowment 2022-08-01

In this work, the stability of five phenolic compounds (catechol, protocatechualdehyde, salvianic acid A, protocatechuic acid and ferulic acid) in high-temperature water was investigated. The effects of two main factors in the stability experiments, such as time, were investigated. The decomposition of the three acids increases with increasing temperature, and the acids become less stable at longer heating times. The compounds showed little decomposition at 200 °C, and complete decomposition was observed at 300-350 °C. The products of protocatechuic and ferulic...

10.5935/0103-5053.20140201 article EN cc-by Journal of the Brazilian Chemical Society 2014-01-01

In the ongoing evolution of the OceanBase database system, it is essential to enhance its adaptability to small-scale enterprises. The system has demonstrated stability and effectiveness within Ant Group and other commercial organizations, besides through the TPC-C and TPC-H tests. In this paper, we have designed a stand-alone and distributed integrated architecture named Paetica to address the overhead caused by the distributed components in the stand-alone mode, with respect to the system. Paetica enables adaptive configuration that allows the system to support both serial and parallel...

10.14778/3611540.3611560 article EN Proceedings of the VLDB Endowment 2023-08-01

Current deep networks are very data-hungry and benefit from training on large-scale datasets, which are often time-consuming to collect and annotate. By contrast, synthetic data can be generated infinitely using generative models such as DALL-E and diffusion models, with minimal effort and cost. In this paper, we present DatasetDM, a generic dataset generation model that can produce diverse images and the corresponding high-quality perception annotations (e.g., segmentation masks and depth). Our method builds upon...

10.48550/arxiv.2308.06160 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Weakly supervised object localization (WSOL) remains challenging when learning object localization models from image category labels. Conventional methods that discriminatively train activation models ignore representative yet less discriminative object parts. In this study, we propose a generative prompt model (GenPromp), defining the first generative pipeline to localize less discriminative object parts by formulating WSOL as a conditional image denoising procedure. During training, GenPromp converts image category labels to learnable prompt embeddings, which are fed to a generative model to conditionally recover...

10.1109/iccv51070.2023.00584 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

Generally, pre-training and long-time training computation are necessary for obtaining a good-performance text detector based on deep networks. In this paper, we present a new scene text detection network (called FANet) with Fast convergence speed and Accurate text localization. The proposed FANet is an end-to-end text detector based on transformer feature learning and normalized Fourier descriptor modeling, where the Fourier Descriptor Proposal Network and Iterative Text Decoding Network are designed to efficiently and accurately identify text proposals....

10.1109/icme55011.2023.00035 article EN 2023 IEEE International Conference on Multimedia and Expo (ICME) 2023-07-01

Bone age assessment (BAA) is a widely used clinical practice for evaluating the biological development of adolescents. The Tanner-Whitehouse (TW) method is traditionally the mainstream method that manually extracts multiple regions of interest (ROIs) related to skeletal maturity to infer bone age. In this paper, we propose a deep learning-based method for fully automatic ROIs localization and BAA. The method consists of two parts: a U-net-based backbone, selected for its strong performance in semantic segmentation, which enables precise...

10.3934/mbe.2025007 article EN cc-by Mathematical Biosciences & Engineering 2025-01-01

Purpose: Liquid water, being the major constituent of the human body, is of fundamental importance in radiobiological research. Hence, knowledge of electron-water interaction physics, and particularly the secondary electron yield, is essential. However, to date, only very little is known experimentally on the low energy electron interaction with liquid water because of certain practical limitations. The purpose of this study was to gain some useful information about electron emission from liquid water using a Monte Carlo (MC) simulation technique that can numerically model...

10.1002/mp.13913 article EN Medical Physics 2019-11-08

Collecting and annotating images with pixel-wise labels is time-consuming and laborious. In contrast, synthetic data can be freely available using a generative model (e.g., DALL-E, Stable Diffusion). In this paper, we show that it is possible to automatically obtain accurate semantic masks of synthetic images generated by the off-the-shelf Stable Diffusion model, which uses only text-image pairs during training. Our approach, called DiffuMask, exploits the potential of the cross-attention map between text and image, which is natural and seamless to extend...

10.48550/arxiv.2303.11681 preprint EN cc-by arXiv (Cornell University) 2023-01-01

The exploding growth of the biomedical literature presents many challenges for biological researchers. One such challenge is from the use of a great deal of abbreviations. Extracting abbreviations and their definitions accurately is very helpful to biologists and also facilitates biomedical text analysis. Existing approaches fall into four broad categories: rule based, machine learning based, alignment based, and statistically based. State of the art methods either focus exclusively on acronym-type abbreviations, or could not recognize...

10.1186/1471-2105-10-14 article EN cc-by BMC Bioinformatics 2009-01-09

Current video text spotting methods can achieve preferable performance, powered with sufficient labeled training data. However, labeling data manually is time-consuming and labor-intensive. To overcome this, using low-cost synthetic data is a promising alternative. This paper introduces a novel video text synthesis technique called FlowText, which utilizes optical flow estimation to synthesize a large amount of text video data at a low cost for training robust video text spotters. Unlike existing methods that focus on image-level synthesis, FlowText...

10.1109/icme55011.2023.00262 article EN 2023 IEEE International Conference on Multimedia and Expo (ICME) 2023-07-01

Accurate determination of protein secondary structure from the chemical shift information is a key step for NMR tertiary structure determination. Relatively little work has been done on this subject. There needs to be a systematic investigation of algorithms that are (a) robust for large datasets; (b) easily extendable to (the dynamic) new databases; and (c) approaching the limit of accuracy. We introduce new approaches using the k-nearest neighbor algorithm to do the basic prediction and use the BCJR algorithm to smooth the predictions and combine different predictions from chemical shifts...

10.1142/s0219720010004987 article EN Journal of Bioinformatics and Computational Biology 2010-06-22

Image segmentation based on continual learning exhibits a critical drop of performance, mainly due to catastrophic forgetting and background shift, as they are required to incorporate new classes continually. In this paper, we propose a simple, yet effective Continual Image Segmentation method with incremental Dynamic Query (CISDQ), which decouples the representation learning of both old and new knowledge with lightweight query embedding. CISDQ includes three contributions: 1) We define...

10.1109/tcsvt.2023.3337884 article EN IEEE Transactions on Circuits and Systems for Video Technology 2023-11-30

In this paper, we propose a controllable dense captioner (ControlCap), which accommodates the user's intention to dense captioning by introducing linguistic guidance. ControlCap is defined as a multimodal embedding bridging architecture, which comprises a multimodal embedding generation (MEG) module and a bi-directional embedding bridging (BEB) module. While the MEG module represents objects/regions by combining embeddings of detailed information with context-aware ones, it also endows the captioner with the adaptability to specialized controls by utilizing them as linguistic guidance. The BEB module aligns the linguistic guidance with visual...

10.48550/arxiv.2401.17910 preprint EN arXiv (Cornell University) 2024-01-31

Air-writing is a challenging task that combines the fields of computer vision and natural language processing, offering an intuitive approach for human-computer interaction. However, current air-writing solutions face two primary challenges: (1) their dependency on complex sensors (e.g., Radar, EEGs and others) for capturing precise handwritten trajectories, and (2) the absence of a video-based air-writing dataset that covers...

10.1109/tcsvt.2024.3385851 article EN IEEE Transactions on Circuits and Systems for Video Technology 2024-04-10

Motivation: Haplotype played an important role in the association studies of disease gene and drug responsivity over the past years, but the low throughput of expensive biological experiments largely limited its application. Alternatively, some efficient statistical methods were developed to deduce haplotypes from genotypes directly. Because these algorithms usually needed to estimate the frequencies of numerous possible haplotypes, the partition and ligation strategy was widely adopted to reduce the time complexity....

10.1093/bioinformatics/btn519 article EN Bioinformatics 2008-10-09

Recently, video text detection, tracking, and recognition in natural scenes are becoming very popular in the computer vision community. However, most existing algorithms and benchmarks focus on common text cases (e.g., normal size, density) and single scenarios, while ignoring extreme video text challenges, i.e., dense and small text in various scenarios. In this competition report, we establish a video text reading benchmark, DSText, which focuses on dense and small text challenges with various scenarios. Compared with the previous datasets, the proposed dataset mainly includes three new...

10.48550/arxiv.2304.04376 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Most existing cross-modal language-to-video retrieval (VR) research focuses on single-modal input from video, i.e., visual representation, while the text is omnipresent in human environments and frequently critical to understand video. To study how to retrieve video with both modal inputs, i.e., visual and text semantic representations, we first introduce a large-scale cross-modal Video Retrieval dataset with text reading comprehension, TextVR, which contains 42.2k sentence queries for 10.5k videos of 8 scenario domains, i.e., Street View...

10.2139/ssrn.4419851 preprint EN 2023-01-01

The two-stage scene text detection algorithms based on Mask R-CNN have achieved good performances on multiple challenging benchmarks. However, their effectiveness is degraded due to artificially setting constant thresholds and the low localization quality of candidate boxes. In this paper, we present a novel scene text detection method. The proposed method, named LOAD, proposes an adaptive threshold module and a localization quality estimation module to address the above two problems. We propose two kinds of adaptive thresholds, which are used for filtering candidate boxes and binarization of pixels...

10.1109/tcsvt.2023.3274673 article EN IEEE Transactions on Circuits and Systems for Video Technology 2023-05-11

Background: Haplotype analysis has gained increasing attention in the context of association studies of disease genes and drug responsivities over the last years. The potential use of haplotypes has led to the initiation of the HapMap project, which is to investigate haplotype patterns in the human genome in different populations. Haplotype inference and frequency estimation are essential components of this endeavour. Results: We present a two-stage method to estimate haplotype frequencies in pedigrees, which includes a haplotyping stage and an estimation stage. In the haplotyping stage, we...

10.1186/1471-2105-7-s4-s5 article EN cc-by BMC Bioinformatics 2006-12-01