- Generative Adversarial Networks and Image Synthesis
- AI in cancer detection
- Radiomics and Machine Learning in Medical Imaging
- Advanced Vision and Imaging
- Advanced Image Processing Techniques
- Neonatal Respiratory Health Research
- Neonatal and fetal brain pathology
- Human Pose and Action Recognition
- Cell Image Analysis Techniques
- Ear Surgery and Otitis Media
- Acute Ischemic Stroke Management
- Medical Imaging and Analysis
- Infant Development and Preterm Care
- Advanced Image and Video Retrieval Techniques
- Cerebrovascular and Carotid Artery Diseases
- Video Analysis and Summarization
- Facial Nerve Paralysis Treatment and Research
- Digital Imaging for Blood Diseases
- Medical Image Segmentation Techniques
- Hand Gesture Recognition Systems
- Cerebral Venous Sinus Thrombosis
- Visual Attention and Saliency Detection
- Speech and Audio Processing
- Mycobacterium research and diagnosis
- Gait Recognition and Analysis
Pennsylvania State University
2020-2024
Institute of Computing Technology, Chinese Academy of Sciences
2017-2020
Conditional image-to-video (cI2V) generation aims to synthesize a new plausible video starting from an image (e.g., a person's face) and a condition (e.g., an action class label like "smile"). The key challenge of the cI2V task lies in the simultaneous generation of realistic spatial appearance and temporal dynamics corresponding to the given image and condition. In this paper, we propose an approach for cI2V using novel latent flow diffusion models (LFDM), which synthesize an optical flow sequence in the latent space based on the given condition to warp the given image. Compared with previous direct-synthesis-based works,...
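The core operation the abstract describes is warping a still image with a predicted flow field to produce the next frame. A minimal numpy sketch of that backward-warping step, with toy sizes and a hypothetical `warp_image` helper (not the paper's actual code):

```python
import numpy as np

def warp_image(img, flow):
    """Backward-warp img (H, W) with a dense flow field (H, W, 2).

    Each target pixel (y, x) samples the source image at
    (y + flow[y, x, 1], x + flow[y, x, 0]) with bilinear interpolation,
    the basic mechanism by which a flow-based generator turns one
    still image into successive video frames.
    """
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    src_x = np.clip(xs + flow[..., 0], 0, w - 1)
    src_y = np.clip(ys + flow[..., 1], 0, h - 1)
    x0 = np.floor(src_x).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    y0 = np.floor(src_y).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    wx, wy = src_x - x0, src_y - y0
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bot = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    return top * (1 - wy) + bot * wy

# Zero flow leaves the image unchanged; a constant flow shifts it.
img = np.arange(16, dtype=np.float64).reshape(4, 4)
same = warp_image(img, np.zeros((4, 4, 2)))
```

In LFDM the flow sequence lives in a learned latent space rather than pixel space, but the sampling operation is the same idea.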
Supervised learning methods are commonly applied in medical image analysis. However, the success of these approaches is highly dependent on the availability of large, manually annotated datasets with detailed labels. An automatic and refined segmentation of whole-slide images (WSIs) is therefore significant for alleviating the annotation workload of pathologists. However, most current methods can only output a rough prediction of lesion areas and consume much time on each slide. In this paper, we propose a fast framework for segmenting cancer regions, v3_DCNN, which...
In this paper, we propose a novel dual-branch Transformation-Synthesis network (TS-Net) for video motion retargeting. Given one subject video and one driving video, TS-Net can produce a new plausible video with the appearance of the subject video and the motion pattern of the driving video. TS-Net consists of a warp-based transformation branch and a warp-free synthesis branch. The design of dual branches combines the strengths of deformation-grid-based transformation and warp-free generation for better identity preservation and robustness to occlusion in the synthesized videos. A mask-aware similarity module is further...
Diffusion probabilistic models have achieved enormous success in the field of image generation and manipulation. In this paper, we explore a novel paradigm of using the diffusion model and classifier guidance in the latent semantic space for compositional visual tasks. Specifically, we train auxiliary latent classifiers to facilitate non-linear navigation of the latent representation of any pre-trained generative model with a semantic latent space. We demonstrate that such conditional generation via latent classifier guidance provably maximizes a lower bound of the conditional log probability during training. To...
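Classifier guidance, as sketched in the abstract, adds the gradient of a classifier's log-likelihood to the unconditional score during sampling. A toy 1-D illustration, assuming a standard-normal prior as the stand-in for the diffusion score and a logistic classifier (all names and constants here are illustrative, not the paper's):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def guided_step(z, scale=2.0, step=0.1, a=3.0):
    """One toy guidance update in a 1-D latent space.

    score_prior: gradient of log N(z; 0, 1), standing in for the
    unconditional score. score_cls: gradient of log p(y=1 | z) for a
    logistic classifier p = sigmoid(a * z). Guidance simply sums the
    two gradients, weighting the classifier term by `scale`, so
    samples drift toward the region the classifier assigns to y=1.
    """
    score_prior = -z
    score_cls = a * (1.0 - sigmoid(a * z))
    return z + step * (score_prior + scale * score_cls)

z = -1.0
for _ in range(200):
    z = guided_step(z)
# z settles on the positive side favored by the classifier, at the
# balance point between the prior pull and the guidance term.
```

In the paper's setting the same update is applied per denoising step in a semantic latent space, with a learned latent classifier supplying the gradient.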
Motion transfer of talking-head videos involves generating a new video with the appearance of a subject video and the motion pattern of a driving video. Current methodologies primarily depend on a limited number of subject images and 2D representations, thereby neglecting to fully utilize the multi-view appearance features inherent in the subject video. In this paper, we propose a novel 3D-aware talking-head motion transfer network, Head3D, which exploits this information by generating a visually interpretable 3D canonical head from the subject frames with a recurrent network. A key component of our approach is a self-supervised...
Text-conditioned image-to-video generation (TI2V) aims to synthesize a realistic video starting from a given image (e.g., a woman's photo) and a text description (e.g., "a woman is drinking water."). Existing TI2V frameworks often require costly training on video-text datasets and specific model designs for text and image conditioning. In this paper, we propose TI2V-Zero, a zero-shot, tuning-free method that empowers a pretrained text-to-video (T2V) diffusion model to be conditioned on a provided image, enabling TI2V generation without any optimization,...
In an emergency room (ER) setting, stroke triage or screening is a common challenge. A quick CT is usually performed instead of MRI due to MRI's slow throughput and high cost. Clinical tests are commonly referred to during the process, but the misdiagnosis rate remains high. We propose a novel multimodal deep learning framework, DeepStroke, to achieve computer-aided stroke presence assessment by recognizing patterns of minor facial muscle incoordination and speech inability in patients with suspected stroke in an acute setting. Our...
Deep learning based medical image recognition systems often require a substantial amount of training data with expert annotations, which can be expensive and time-consuming to obtain. Recently, synthetic augmentation techniques have been proposed to mitigate the issue by generating realistic images conditioned on class labels. However, the effectiveness of these methods heavily depends on the representation capability of the trained generative model, which cannot be guaranteed without sufficient labeled training data. To further...
General movement assessment (GMA) of infant movement videos (IMVs) is an effective method for the early detection of cerebral palsy (CP) in infants. Automated body parsing is a crucial step towards computer-aided GMA, in which body parts are segmented and tracked over time for movement analysis. However, acquiring fully annotated data for video-based body parsing is particularly expensive due to the large number of frames in IMVs. In this paper, we propose a semi-supervised body parsing model, termed SiamParseNet (SPN), to jointly learn single-frame parsing and label propagation between frames...
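Label propagation between frames, as mentioned in the abstract, can be reduced to matching per-pixel features across frames and copying labels along the matches. A minimal numpy sketch of that idea with a hypothetical `propagate_labels` helper and tiny toy inputs (SPN itself learns the features and propagation jointly; this only illustrates the correspondence step):

```python
import numpy as np

def propagate_labels(feat_src, labels_src, feat_tgt):
    """Toy inter-frame label propagation via feature matching.

    feat_src, feat_tgt: (N, D) per-pixel features of an annotated
    frame and an unlabeled frame; labels_src: (N,) part labels for
    the annotated frame. Each target pixel copies the label of its
    nearest source pixel in feature space, which is how labels from
    a few annotated frames can supervise the unlabeled ones.
    """
    # pairwise squared distances, shape (N_tgt, N_src)
    d = ((feat_tgt[:, None, :] - feat_src[None, :, :]) ** 2).sum(-1)
    return labels_src[np.argmin(d, axis=1)]

# Two source pixels with labels 1 and 2; two target pixels whose
# features lie close to one source pixel each.
feat_src = np.array([[0.0, 0.0], [1.0, 1.0]])
labels_src = np.array([1, 2])
feat_tgt = np.array([[0.1, 0.0], [0.9, 1.0]])
out = propagate_labels(feat_src, labels_src, feat_tgt)
```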
General movement assessment (GMA) of infant movement videos (IMVs) is an effective method for the early detection of cerebral palsy (CP) in infants. We demonstrate in this paper that end-to-end trainable neural networks for image sequence recognition can be applied to achieve good results in GMA, and, more importantly, that augmenting raw video with body parsing and pose estimation information can significantly improve performance. To solve the problem of efficiently utilizing partially labeled IMVs for body parsing, we propose a...