- Advanced Vision and Imaging
- Image Processing Techniques and Applications
- Cell Image Analysis Techniques
- Multimodal Machine Learning Applications
- Speech and Audio Processing
- Advanced Image and Video Retrieval Techniques
- Generative Adversarial Networks and Image Synthesis
- Video Surveillance and Tracking Methods
- Human Pose and Action Recognition
- Advanced Image Processing Techniques
- Advanced Neural Network Applications
- Image and Signal Denoising Methods
- Video Analysis and Summarization
- Domain Adaptation and Few-Shot Learning
- Medical Image Segmentation Techniques
- Music and Audio Processing
- Computer Graphics and Visualization Techniques
- Advanced Data Compression Techniques
- Digital Imaging for Blood Diseases
- AI in Cancer Detection
- Image Enhancement Techniques
- Retinal Imaging and Analysis
- Advanced Fluorescence Microscopy Techniques
- Hand Gesture Recognition Systems
- Face Recognition and Analysis
The University of Sydney
2018-2025
East China Jiaotong University
2024
Jingchu University of Technology
2020-2024
University of Electronic Science and Technology of China
2024
Shanghai Electric (China)
2024
Henan University of Technology
2024
Ningxia University
2023
Tianjin University
2023
Texas Instruments (United States)
2023
Inner Mongolia University
2022
Face aging simulation has attracted increasing research attention, yet it remains a challenge to generate convincing and natural age-progressed face images. In this paper, we present a novel approach to this issue using hidden factor analysis joint sparse representation. In contrast to most existing methods in the literature, which handle facial texture integrally, the proposed approach separately models the person-specific properties, which tend to be stable over a relatively long period, and the age-specific clues, which change gradually...
Three-dimensional (3D) volumetric neural image segmentation is crucial to reconstructing accurate neuron structures. However, due to the structural complexity of neurons and the diverse imaging qualities of microscopes, it is challenging to achieve both accuracy and efficiency. In this paper, we propose a teacher-student learning framework for fast segmentation. Inference is performed using a lightweight student network, which benefits from knowledge distillation from a teacher network with higher capacity. Evaluated on...
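The abstract does not give the distillation objective. A minimal sketch, assuming the common Hinton-style loss applied independently to each voxel (foreground/background logits): the student is trained on the hard label plus a temperature-softened KL term that pulls its output toward the teacher's. The function names, `temperature`, and `alpha` are hypothetical choices for illustration, not the paper's.

```python
import math

# Illustrative sketch only: generic knowledge-distillation loss, applied per voxel.
# Function names and default hyperparameters are hypothetical, not from the paper.

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions with strictly positive q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """alpha * hard cross-entropy + (1 - alpha) * T^2 * KL(teacher || student)."""
    hard = -math.log(softmax(student_logits)[label])
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    # The T^2 factor keeps soft-target gradients on a comparable scale as T grows.
    soft = kl_divergence(soft_teacher, soft_student) * temperature ** 2
    return alpha * hard + (1 - alpha) * soft

def volume_loss(student_volume, teacher_volume, labels, **kwargs):
    """Average the per-voxel loss over a flattened volume."""
    losses = [distillation_loss(s, t, y, **kwargs)
              for s, t, y in zip(student_volume, teacher_volume, labels)]
    return sum(losses) / len(losses)
```

When the student already matches the teacher, the KL term vanishes and only the hard-label term remains; this is the mechanism that lets a small student inherit the larger teacher's behaviour.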
State-of-the-art neural network models estimate large-displacement optical flow at multiple resolutions and use warping to propagate the estimate between two resolutions. Despite their impressive results, this approach is known to have two problems. First, multi-resolution estimation of optical flow fails in situations where small objects move fast. Second, warping creates artifacts when occlusion or dis-occlusion happens. In this paper, we propose a new module, Deformable Cost Volume, which alleviates both problems. Based on the designed Deformable Cost Volume...
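For readers unfamiliar with cost volumes, the sketch below builds a plain local correlation cost volume: each pixel's feature in the first frame is compared (by a channel-normalized dot product) against features in the second frame at a grid of candidate displacements. The `dilation` factor shows how sparser sampling reaches larger displacements without more candidates. This is an assumption-laden illustration of the general idea; the paper's deformable variant is not reproduced, and all names are hypothetical.

```python
# Illustrative sketch only: a plain (non-deformable) correlation cost volume.
# Function names and parameters are hypothetical, not from the paper.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cost_volume(feat1, feat2, radius=1, dilation=1):
    """Correlate each feat1 pixel with feat2 pixels at displacements
    (dy, dx) * dilation for dy, dx in [-radius, radius].

    feat1, feat2: H x W grids (nested lists) of equal-length feature vectors.
    Returns an H x W grid of dicts mapping (dy, dx) -> matching cost.
    Displacements that fall outside feat2 get a cost of 0.0.
    """
    h, w = len(feat1), len(feat1[0])
    channels = len(feat1[0][0])
    offsets = [(dy, dx) for dy in range(-radius, radius + 1)
               for dx in range(-radius, radius + 1)]
    volume = []
    for y in range(h):
        row = []
        for x in range(w):
            costs = {}
            for dy, dx in offsets:
                yy, xx = y + dy * dilation, x + dx * dilation
                if 0 <= yy < h and 0 <= xx < w:
                    costs[(dy, dx)] = dot(feat1[y][x], feat2[yy][xx]) / channels
                else:
                    costs[(dy, dx)] = 0.0
            row.append(costs)
        volume.append(row)
    return volume

def best_displacement(costs, dilation=1):
    """Winner-take-all flow estimate (in pixels) from one pixel's costs."""
    dy, dx = max(costs, key=costs.get)
    return dy * dilation, dx * dilation
```

A fixed search window like this one misses motions larger than `radius * dilation`, which is the kind of limitation (small objects moving fast) the abstract says coarse-to-fine warping also struggles with.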
Dense captioning in 3D point clouds is an emerging vision-and-language task involving object-level scene understanding. Apart from coarse semantic class prediction and bounding box regression as in traditional object detection, dense captioning aims at producing a further, finer instance-level label: a natural language description of the visual appearance and spatial relations of each object of interest. To detect and describe objects in a scene, following the spirit of neural machine translation, we propose a transformer-based...
Video-Language Pre-training models have recently significantly improved various multi-modal downstream tasks. Previous dominant works mainly adopt contrastive learning to achieve global feature alignment across modalities. However, the local associations between videos and texts are not modeled, restricting the pre-training models' generality, especially for tasks requiring the temporal video boundary for certain query texts. This work introduces a novel text-video localization pre-text task to enable...
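The "global feature alignment" through contrastive learning that the abstract attributes to previous works is typically a symmetric InfoNCE loss over a batch. A minimal sketch, assuming one pooled embedding per video and per text and a hypothetical `temperature` of 0.07; it illustrates the baseline objective, not the paper's localization pretext task.

```python
import math

# Illustrative sketch only: symmetric InfoNCE for global video-text alignment.
# Names and the default temperature are hypothetical.

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def info_nce(video_emb, text_emb, temperature=0.07):
    """Matched (video_i, text_i) pairs are positives; all other pairs
    in the batch act as negatives. Returns the mean of both directions."""
    v = [normalize(e) for e in video_emb]
    t = [normalize(e) for e in text_emb]
    n = len(v)
    # Cosine similarity matrix scaled by temperature: sim[i][j] = <v_i, t_j> / tau.
    sim = [[sum(a * b for a, b in zip(v[i], t[j])) / temperature for j in range(n)]
           for i in range(n)]

    def cross_entropy_rows(matrix):
        # Row i is a classification over columns with the target on the diagonal.
        total = 0.0
        for i, row in enumerate(matrix):
            m = max(row)
            log_sum = m + math.log(sum(math.exp(s - m) for s in row))
            total += log_sum - row[i]
        return total / len(matrix)

    video_to_text = cross_entropy_rows(sim)
    text_to_video = cross_entropy_rows([list(col) for col in zip(*sim)])
    return (video_to_text + text_to_video) / 2
```

Because each clip is reduced to a single vector, this objective says nothing about *where* in the video a text applies, which is the gap the abstract's localization pretext task targets.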
We propose PAniC-3D, a system to reconstruct stylized 3D character heads directly from illustrated (p)ortraits of (ani)me (c)haracters. Our anime-style domain poses unique challenges to single-view reconstruction: compared to natural images of human heads, portrait illustrations have hair and accessories with more complex and diverse geometry, and are shaded with non-photorealistic contour lines. In addition, there is a lack of both 3D model and illustration data suitable to train and evaluate this ambiguous reconstruction task....
Deep convolutional neural network (DCNN) based image codecs, consisting of an encoder, a quantizer and a decoder, have achieved promising image compression results. The major challenge in learning these DCNN models lies in the joint optimization of the encoder, quantizer and decoder, as well as their adaptivity to input images. In this paper, we propose a novel architecture for image compression in which the encoder, quantizer and decoder are jointly learned. Specifically, a fully convolutional vector quantization network (VQNet) is proposed to quantize the feature vectors of the image representation, where the representative vectors of VQNet are optimized with...
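To make the quantizer's role concrete: a vector quantizer replaces each feature vector by the index of its nearest "representative vector" (codeword), and the decoder looks the codeword back up. A minimal sketch, assuming squared-L2 nearest-neighbour assignment; the codebook update shown is a plain k-means (Lloyd) step, swapped in for illustration because the abstract is cut off before saying how VQNet's representative vectors are optimized. All names are hypothetical.

```python
# Illustrative sketch only: nearest-codeword vector quantization with a
# k-means-style codebook update (not necessarily VQNet's optimization).

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def vector_quantize(features, codebook):
    """Encoder side: map each feature vector to the index of its nearest codeword."""
    return [min(range(len(codebook)), key=lambda k: squared_distance(f, codebook[k]))
            for f in features]

def dequantize(indices, codebook):
    """Decoder side: replace each index by its codeword."""
    return [codebook[k] for k in indices]

def distortion(features, codebook):
    """Mean squared quantization error over all feature vectors."""
    recon = dequantize(vector_quantize(features, codebook), codebook)
    return sum(squared_distance(f, r) for f, r in zip(features, recon)) / len(features)

def update_codebook(features, codebook):
    """One Lloyd step: move each codeword to the mean of the features assigned to it."""
    idx = vector_quantize(features, codebook)
    new = []
    for k, c in enumerate(codebook):
        members = [f for f, i in zip(features, idx) if i == k]
        if members:
            new.append([sum(col) / len(members) for col in zip(*members)])
        else:
            new.append(list(c))  # keep unused codewords unchanged
    return new
```

In a codec built this way, only the integer indices need to be entropy-coded and transmitted; the codebook quality directly sets the reconstruction distortion.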
This study establishes a two-dimensional fluid theoretical model for a two-electrode spark gap switch, linking the behavior of microscopic particles with the macroscopic discharge phase through multiscale dynamic coupling. It further investigates the temporal characteristics of the switch's conductive process and streamer evolution from the perspective of microscopic particles. Using the finite element analysis method, the effects of factors such as gas pressure, operating voltage, electrode spacing, and electrode curvature on the time of key...
Digital neuron morphology reconstruction from three-dimensional (3D) volumetric optical microscope images is an important procedure to rebuild the connections and structures of neural circuits. Even though many approaches have been proposed to achieve precise tracing, it is still a challenging task, especially when the images are polluted by noise or the neurons show discontinuities in their structures. In this paper, we propose a new framework that overcomes these issues by performing segmentation prior to tracing. Our framework adopts a novel 3D...
The microscopic understanding of high-temperature superconductivity in cuprates has been hindered by the apparent complexity of the crystal structures of these materials. We used scanning tunneling microscopy and spectroscopy to study an electron-doped copper oxide compound, Sr$_{1-x}$Nd$_x$CuO$_2$, which has only bare cations separating the CuO$_2$ planes and thus the simplest infinite-layer structure among all cuprate superconductors. Tunneling conductance spectra of the major superconducting state revealed direct evidence...
Automatically evaluating vision-language tasks is challenging, especially when it comes to reflecting human judgments, due to limitations in accounting for fine-grained details. Although GPT-4V has shown promising results in various multi-modal tasks, leveraging GPT-4V as a generalist evaluator for these tasks has not yet been systematically explored. We comprehensively validate GPT-4V's capabilities for evaluation purposes, addressing tasks ranging from foundational image-to-text and text-to-image synthesis to high-level...
Digital neuron reconstruction from 3D microscopy images is an essential technique for investigating brain connectomics and neuron morphology. Existing frameworks use convolution-based segmentation networks to separate neurons from noisy backgrounds before applying a tracing algorithm. The results are therefore sensitive to the raw image quality and the segmentation accuracy. In this paper, we propose a novel framework for neuron reconstruction. Our key idea is to use the geometric representation power of point clouds to better explore the intrinsic structural information...
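The abstract does not say how the image is turned into a point cloud. One simple assumption, shown below, keeps every voxel brighter than a threshold as an `(x, y, z, intensity)` point; a k-nearest-neighbour query then exposes local geometric structure of the kind point-cloud networks operate on. The threshold rule and all names are hypothetical, not the paper's pipeline.

```python
# Illustrative sketch only: volume -> point cloud by intensity thresholding,
# plus a brute-force k-nearest-neighbour query. Names are hypothetical.

def volume_to_point_cloud(volume, threshold):
    """Convert a 3D intensity volume indexed [z][y][x] into a list of
    (x, y, z, intensity) tuples for voxels strictly above threshold."""
    points = []
    for z, plane in enumerate(volume):
        for y, row in enumerate(plane):
            for x, value in enumerate(row):
                if value > threshold:
                    points.append((x, y, z, value))
    return points

def k_nearest(points, index, k):
    """Indices of the k points closest (Euclidean, xyz only) to points[index]."""
    px, py, pz = points[index][:3]
    others = [i for i in range(len(points)) if i != index]
    others.sort(key=lambda i: (points[i][0] - px) ** 2
                + (points[i][1] - py) ** 2 + (points[i][2] - pz) ** 2)
    return others[:k]
```

Unlike a dense voxel grid, the point list stores only candidate neuron voxels, so neighbourhood queries follow the thin branching structure directly instead of scanning empty background.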
Vision-language models such as CLIP [27] learn a generic text-image embedding from large-scale training data. A vision-language model can be adapted to a new classification task through few-shot prompt tuning. We find that such a prompt tuning process is highly robust to label noise. This led us to study the key reasons contributing to the robustness of the prompt tuning paradigm. We conducted extensive experiments to explore this property and found that the key factors are: 1) the fixed classname tokens provide a strong regularization to the optimization of the model,...
The automatic reconstruction of single neuron cells from microscopic images is essential to establishing research on neuron morphology. However, the performance of reconstruction algorithms is constrained by both the quantity and quality of annotated 3D images, since annotating large-scale 3D neuron models is highly labour-intensive. We propose a framework for synthesizing microscopy-realistic 3D neuron images from simulated skeletons using conditional Generative Adversarial Networks (cGAN). We build the generator network with multi-resolution sub-modules to improve the output...
Recent advances in large video-language models have displayed promising outcomes in video comprehension. Current approaches straightforwardly convert video into language tokens and employ large language models for multi-modal tasks. However, this method often leads to the generation of irrelevant content, commonly known as "hallucination", as the length of the text increases and the impact of the video diminishes. To address this problem, we propose Vista-LLaMA, a novel framework that maintains a consistent distance between all visual tokens and any language tokens, irrespective...
The goal of crowd-counting techniques is to estimate the number of people in an image or video accurately and in real time. In recent years, with the development of deep learning, the accuracy of this task has steadily improved. However, it still faces great challenges in crowded scenarios with large variations in individual size. To cope with this situation, this paper proposes a new type of network, the Context-Scaled Fusion Network. Its details include (1) the design of a Multi-Scale Receptive Field Fusion Module (MRFF Module), which employs multiple...
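The abstract is truncated before it explains how a count is produced. A widely used formulation in CNN crowd counting, assumed here rather than confirmed for this paper, is to regress a density map whose sum equals the number of people. The sketch builds such a ground-truth map by placing one normalized Gaussian per annotated head; `sigma` and the function names are hypothetical.

```python
import math

# Illustrative sketch only: density-map ground truth and count-by-summation,
# a common crowd-counting formulation assumed (not confirmed) for this paper.

def density_map(height, width, heads, sigma=1.5):
    """One normalized 2D Gaussian per annotated head at (row, col).

    Each kernel is renormalized over the image grid, so every head adds
    exactly 1 to the total even when it lies near the border.
    """
    dmap = [[0.0] * width for _ in range(height)]
    for hr, hc in heads:
        kernel = [[math.exp(-((r - hr) ** 2 + (c - hc) ** 2) / (2 * sigma ** 2))
                   for c in range(width)] for r in range(height)]
        total = sum(sum(row) for row in kernel)
        for r in range(height):
            for c in range(width):
                dmap[r][c] += kernel[r][c] / total
    return dmap

def count(dmap):
    """The crowd count is the integral (sum) of the density map."""
    return sum(sum(row) for row in dmap)
```

A fixed `sigma` ignores perspective; in dense scenes with large size variation, which is the difficulty the abstract highlights, heads near the camera cover far more pixels than distant ones, motivating multi-scale receptive fields in the network.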
Terahertz (THz) technology has become a new trend in various fields due to its high penetration and its harmlessness to the human body and objects. The detection of concealed or hidden objects based on THz images is of great significance for ensuring public safety. However, the poor quality of the original THz images leads to insufficient target detection accuracy. Therefore, it is necessary to preprocess the images before performing detection. In this work, in order to investigate the impact of different pre-processing methods on object detection using THz images, we adopt two...
This study introduces HQ-Edit, a high-quality instruction-based image editing dataset with around 200,000 edits. Unlike prior approaches relying on attribute guidance or human feedback for building datasets, we devise a scalable data collection pipeline leveraging advanced foundation models, namely GPT-4V and DALL-E 3. To ensure its high quality, diverse examples are first collected online, expanded, and then used to create diptychs featuring input and output images with detailed text prompts, followed by...
3D reconstruction of neuronal morphology is crucial to solving neuron-related problems in neuroscience, as it is a key technique for investigating the connectivity and functionality of the neuron system. Many methods have been proposed to improve the accuracy of digital reconstruction. However, the large amount of computer memory and computation time they require to process large-scale images poses a new challenge. To solve this problem, we introduce a novel Memory (and Time) Efficient Image Tracing (MEIT) framework. Evaluated...
A gesture model updating and result forecasting algorithm based on Mean Shift is proposed to solve the problem that changes in the target affect tracking during the tracking process. First, background difference and skin color detection methods are used to detect the gesture and obtain its model; then Mean Shift is used to track the gesture and update the model; finally, a Kalman filter is used to predict the tracking results. The experimental results show that this algorithm reduces the influence of the surrounding environment on the tracking process and achieves better tracking results.
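The pipeline above names two standard components. A minimal sketch of both, under stated assumptions: the skin-colour detection step is omitted and replaced by a precomputed per-pixel weight map, Mean Shift moves a square window to the weighted centroid until it converges, and a per-axis constant-velocity Kalman filter forecasts the next position. The window size, `q`, `r`, and all names are hypothetical, not the paper's settings.

```python
# Illustrative sketch only: Mean Shift window tracking on a weight map plus a
# per-axis constant-velocity Kalman filter. Names and parameters are hypothetical.

def mean_shift(weights, center, half_size, max_iter=20, eps=0.5):
    """Shift a (2*half_size+1)-square window toward the weighted centroid of
    `weights` (e.g. a skin-colour likelihood map) until the move is < eps."""
    h, w = len(weights), len(weights[0])
    cy, cx = center
    for _ in range(max_iter):
        total = sy = sx = 0.0
        for r in range(max(0, round(cy) - half_size), min(h, round(cy) + half_size + 1)):
            for c in range(max(0, round(cx) - half_size), min(w, round(cx) + half_size + 1)):
                wt = weights[r][c]
                total += wt
                sy += wt * r
                sx += wt * c
        if total == 0:
            break  # no target evidence inside the window
        ny, nx = sy / total, sx / total
        shift = ((ny - cy) ** 2 + (nx - cx) ** 2) ** 0.5
        cy, cx = ny, nx
        if shift < eps:
            break
    return cy, cx

class ConstantVelocityKalman1D:
    """Kalman filter for one coordinate with state [position, velocity]."""

    def __init__(self, position, q=1e-2, r=1.0):
        self.x = [position, 0.0]
        self.P = [[1.0, 0.0], [0.0, 1.0]]
        self.q, self.r = q, r  # process and measurement noise variances

    def predict(self, dt=1.0):
        p, v = self.x
        self.x = [p + dt * v, v]
        (a, b), (c, d) = self.P
        # P <- F P F^T + Q with F = [[1, dt], [0, 1]] and Q = q * I
        self.P = [[a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
                  [c + dt * d, d + self.q]]
        return self.x[0]

    def update(self, z):
        (a, b), (c, d) = self.P
        s = a + self.r          # innovation variance, H = [1, 0]
        k0, k1 = a / s, c / s   # Kalman gain
        y = z - self.x[0]       # innovation
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        # P <- (I - K H) P
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]
```

In use, the Mean Shift result for each frame would be fed to one filter per axis via `update`, and `predict` gives the forecast window centre for the next frame, which can seed the next Mean Shift search.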