Enis Simsar

ORCID: 0000-0002-6662-3249
Research Areas
  • Generative Adversarial Networks and Image Synthesis
  • Multimodal Machine Learning Applications
  • AI in cancer detection
  • Advanced Vision and Imaging
  • Dental Radiography and Imaging
  • Image Retrieval and Classification Techniques
  • Image Processing and 3D Reconstruction
  • Cell Image Analysis Techniques
  • Radiomics and Machine Learning in Medical Imaging
  • Visual Attention and Saliency Detection
  • Advanced Image and Video Retrieval Techniques
  • Optical measurement and interference techniques
  • Medical Imaging and Analysis
  • Advanced Neural Network Applications
  • Image Processing Techniques and Applications
  • Computer Graphics and Visualization Techniques
  • Handwritten Text Recognition Techniques
  • Topic Modeling
  • Video Analysis and Summarization
  • Digital Media Forensic Detection
  • Advanced Steganography and Watermarking Techniques
  • Speech and dialogue systems
  • Advanced Neuroimaging Techniques and Applications
  • COVID-19 diagnosis using AI
  • Medical Imaging Techniques and Applications

ETH Zurich
2023-2024

Technical University of Munich
2021-2023

Istanbul Medipol University
2021

Boğaziçi University
2019-2021

Abstract In this paper, a new, powerful deep learning framework, named DENTECT, is developed to instantly detect five different dental treatment approaches and simultaneously number the dentition based on the FDI notation in panoramic X-ray images. This makes DENTECT the first system that focuses on the identification of multiple treatments, namely periapical lesion therapy, fillings, root canal treatment (RCT), surgical extraction, and conventional extraction, all of which are accurately located within their...

10.1038/s41598-021-90386-1 article EN cc-by Scientific Reports 2021-06-11
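
As a point of reference for the FDI notation mentioned above, the following minimal sketch (not part of DENTECT itself) composes the two-digit FDI tooth number from a quadrant and a position; the function name is illustrative.

```python
def fdi_number(quadrant: int, position: int) -> int:
    """Compose the two-digit FDI tooth number.

    quadrant: 1-4 for permanent teeth (upper right, upper left,
              lower left, lower right); position: 1-8 counted
              from the midline outward.
    """
    if quadrant not in range(1, 5) or position not in range(1, 9):
        raise ValueError("quadrant must be 1-4 and position 1-8")
    return quadrant * 10 + position

# e.g. the upper-left first molar is tooth 26
assert fdi_number(quadrant=2, position=6) == 26
```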

Recent research has shown that it is possible to find interpretable directions in the latent spaces of pre-trained Generative Adversarial Networks (GANs). These directions enable controllable image generation and support a wide range of semantic editing operations, such as zoom or rotation. The discovery of such directions is often done in a supervised or semi-supervised manner, which requires manual annotations and limits their use in practice. In comparison, unsupervised discovery allows finding subtle directions that are difficult to detect a priori. In this work, we...

10.1109/iccv48922.2021.01400 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01
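
To make the idea of latent-space editing concrete, here is a minimal sketch of applying a discovered direction to a latent code; `generator` and `direction` are placeholders for any pre-trained GAN generator and any learned direction, not the paper's actual API.

```python
import torch

def edit_latent(generator, z: torch.Tensor, direction: torch.Tensor,
                alpha: float = 3.0) -> torch.Tensor:
    """Move z along `direction` by `alpha` and decode the edited image."""
    direction = direction / direction.norm()   # keep the step scale-free
    z_edited = z + alpha * direction           # shift in latent space
    with torch.no_grad():
        return generator(z_edited)             # decode the edited image
```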

A major challenge in computational research on 3D medical imaging is the lack of comprehensive datasets. Addressing this issue, our study introduces CT-RATE, the first dataset that pairs 3D medical images with textual reports. CT-RATE consists of 25,692 non-contrast chest CT volumes, expanded to 50,188 through various reconstructions, from 21,304 unique patients, along with corresponding radiology text reports. Leveraging CT-RATE, we developed CT-CLIP, a CT-focused contrastive language-image pre-training framework. As a versatile,...

10.48550/arxiv.2403.17834 preprint EN arXiv (Cornell University) 2024-03-26
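
The contrastive language-image pre-training mentioned above typically uses a symmetric loss over paired embeddings. The sketch below shows that generic objective under the assumption of placeholder encoders; CT-CLIP's exact encoders and hyper-parameters may differ.

```python
import torch
import torch.nn.functional as F

def clip_style_loss(image_emb: torch.Tensor, text_emb: torch.Tensor,
                    temperature: float = 0.07) -> torch.Tensor:
    """Symmetric contrastive loss over a batch of paired embeddings.

    image_emb, text_emb: (batch, dim) outputs of a volume encoder and a
    report text encoder (placeholders for the framework's actual models).
    """
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature    # (batch, batch) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i = F.cross_entropy(logits, targets)          # image -> matching report
    loss_t = F.cross_entropy(logits.t(), targets)      # report -> matching image
    return (loss_i + loss_t) / 2
```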

In this project, we address the issue of infidelity in text-to-image generation, particularly for actions involving multiple objects. For this, we build on top of the CONFORM framework, which uses Contrastive Learning to improve the accuracy of the generated image. However, the depiction of actions involving different objects still has large room for improvement. To improve it, we employ semantically hypergraphic contrastive adjacency learning, a comprehension-enhanced structure, and a "contrast but link" technique. We further amend Stable...

10.48550/arxiv.2501.09055 preprint EN arXiv (Cornell University) 2025-01-15

Existing diffusion models show great potential for identity-preserving generation. However, personalized portrait generation remains challenging due to the diversity in user profiles, including variations in appearance and lighting conditions. To address these challenges, we propose IC-Portrait, a novel framework designed to accurately encode individual identities. Our key insight is that pre-trained diffusion models are fast learners (e.g., ~100-200 steps) of in-context dense correspondence matching, which motivates...

10.48550/arxiv.2501.17159 preprint EN arXiv (Cornell University) 2025-01-28

10.1109/wacv61041.2025.00032 article EN 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2025-02-26

Abstract While computer vision has achieved tremendous success with multimodal encoding and direct textual interaction with images via chat-based large language models, similar advancements in medical imaging AI, particularly in 3D imaging, have been limited due to the scarcity of comprehensive datasets. To address this critical gap, we introduce CT-RATE, the first dataset that pairs 3D images with corresponding textual reports. CT-RATE comprises 25,692 non-contrast chest CT scans from 21,304 unique patients....

10.21203/rs.3.rs-5271327/v1 preprint EN cc-by Research Square (Research Square) 2024-10-28

3D GANs have the ability to generate latent codes for entire volumes rather than only 2D images. These models offer desirable features like high-quality geometry and multi-view consistency, but, unlike their 2D counterparts, complex semantic image editing tasks have only been partially explored. To address this problem, we propose LatentSwap3D, a semantic edit approach based on latent space discovery that can be used with any off-the-shelf 3D or 2D GAN model and on any dataset. LatentSwap3D relies on identifying latent code dimensions...

10.1109/iccvw60793.2023.00312 article EN 2023-10-02
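
To illustrate the kind of dimension-level edit described above, the following sketch replaces the most attribute-relevant dimensions of a source latent code with those of a reference code. The importance scores and tensor names are assumptions for illustration, not the paper's exact procedure.

```python
import torch

def latent_dimension_swap(z_source: torch.Tensor, z_reference: torch.Tensor,
                          importance: torch.Tensor, k: int = 32) -> torch.Tensor:
    """Swap the k most attribute-relevant dimensions between two latent codes.

    `importance` is assumed to score how relevant each latent dimension is to
    the target attribute (e.g. from a feature-importance ranking). All tensors
    are 1-D latent codes of equal length.
    """
    top_dims = torch.topk(importance, k).indices   # most relevant dimensions
    z_edit = z_source.clone()
    z_edit[top_dims] = z_reference[top_dims]       # copy them from the reference
    return z_edit
```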

Panoramic X-rays are frequently used in dentistry for treatment planning, but their interpretation can be both time-consuming and prone to error. Artificial intelligence (AI) has the potential to aid the analysis of these X-rays, thereby improving the accuracy of dental diagnoses and treatment plans. Nevertheless, designing automated algorithms for this purpose poses significant challenges, mainly due to the scarcity of annotated data and variations in anatomical structure. To address these issues, the Dental Enumeration and Diagnosis on Panoramic X-rays Challenge...

10.48550/arxiv.2305.19112 preprint EN cc-by arXiv (Cornell University) 2023-01-01

10.1109/cvpr52733.2024.00860 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

The discovery of interpretable directions in the latent spaces of pre-trained GAN models has recently become a popular topic. In particular, StyleGAN2 has enabled various image generation and manipulation tasks due to its rich and disentangled latent spaces. However, such discovery is typically made either in a supervised manner, which requires annotated data for each desired manipulation, or in an unsupervised manner, which requires manual effort to identify the directions. As a result, existing work finds only a handful of directions along which controllable edits can be made. In this...

10.1109/wacv56688.2023.00471 article EN 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023-01-01

In this paper, we propose a graph-based image-to-image translation framework for generating images. We use rich data collected from the popular creativity platform Artbreeder, where users interpolate multiple GAN-generated images to create artworks. This unique approach to creating new images leads to a tree-like structure in which one can track historical data about the creation of a particular image. Inspired by...

10.1109/iccvw54120.2021.00227 article EN 2021-10-01

Low-Rank Adaptations (LoRAs) have emerged as a powerful and popular technique in the field of image generation, offering a highly effective way to adapt and refine pre-trained deep learning models for specific tasks without the need for comprehensive retraining. By employing pre-trained LoRA models, such as those representing a specific cat and a particular dog, the objective is to generate an image that faithfully embodies both animals as defined by the LoRAs. However, the task of seamlessly blending multiple concept LoRAs to capture a variety of concepts in one image proves to be...

10.48550/arxiv.2403.19776 preprint EN arXiv (Cornell University) 2024-03-28
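
For context on the setting above, a LoRA contributes a low-rank update B @ A to a base weight matrix; the sketch below shows the naive weighted sum of several such updates, which is the generic arithmetic, not the blending method the paper proposes (the paper addresses the interference this naive sum causes).

```python
import torch

def apply_loras(W_base: torch.Tensor, loras, weights) -> torch.Tensor:
    """Naively merge several LoRAs into one weight matrix.

    Each LoRA is a pair (A, B) of low-rank factors with update B @ A;
    `weights` scales each concept's contribution.
    Shapes: W_base (out, in), A (r, in), B (out, r).
    """
    W = W_base.clone()
    for (A, B), w in zip(loras, weights):
        W += w * (B @ A)   # low-rank update of rank r
    return W
```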

Text-to-image models are becoming increasingly popular, revolutionizing the landscape of digital art creation by enabling highly detailed and creative visual content generation. These models have been widely employed across various domains, particularly in art generation, where they facilitate a broad spectrum of creative expression and democratize access to artistic creation. In this paper, we introduce STYLEBREEDER, a comprehensive dataset of 6.8M images and 1.8M prompts generated by 95K users on Artbreeder, a platform...

10.48550/arxiv.2406.14599 preprint EN arXiv (Cornell University) 2024-06-20

Evaluating diffusion-based image-editing models is a crucial task in the field of Generative AI. Specifically, it is imperative to assess their capacity to execute diverse editing tasks while preserving image content and realism. While recent developments in generative models have opened up previously unheard-of possibilities for image editing, conducting a thorough evaluation of these models remains a challenging and open task. The absence of a standardized benchmark, primarily due to the inherent need for a post-edit reference image for evaluation, further...

10.48550/arxiv.2410.05710 preprint EN arXiv (Cornell University) 2024-10-08
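
As background on the two axes named above (edit success and content preservation), the sketch below computes two commonly used proxy scores with placeholder pre-trained models (a CLIP-style image/text encoder pair and an LPIPS metric); the benchmark in the paper may combine different or additional measures.

```python
import torch
import torch.nn.functional as F

def edit_scores(clip_image_encoder, clip_text_encoder, lpips_metric,
                source_img, edited_img, target_text):
    """Proxy scores for an edit: text agreement and content preservation."""
    with torch.no_grad():
        img_emb = F.normalize(clip_image_encoder(edited_img), dim=-1)
        txt_emb = F.normalize(clip_text_encoder(target_text), dim=-1)
        edit_success = (img_emb * txt_emb).sum(dim=-1)       # higher = edit matches text
        preservation = lpips_metric(source_img, edited_img)  # lower = content preserved
    return edit_success, preservation
```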

We propose MegaPortrait, an innovative system for creating personalized portrait images in computer vision. It has three modules: Identity Net, Shading Net, and Harmonization Net. Identity Net generates a learned identity using a customized model fine-tuned with source images. Shading Net re-renders portraits from the extracted representations. Harmonization Net fuses pasted faces with the reference image's body into coherent results. Our approach with off-the-shelf ControlNets is better than state-of-the-art AI products in identity preservation and image fidelity....

10.48550/arxiv.2411.04357 preprint EN arXiv (Cornell University) 2024-11-06

Recent advances in text-to-image customization have enabled high-fidelity, context-rich generation of personalized images, allowing specific concepts to appear in a variety of scenarios. However, current methods struggle with combining multiple personalized models, often leading to attribute entanglement or requiring separate training to preserve concept distinctiveness. We present LoRACLR, a novel approach for multi-concept image generation that merges multiple LoRA models, each fine-tuned for a distinct concept, into a single, unified model without...

10.48550/arxiv.2412.09622 preprint EN arXiv (Cornell University) 2024-12-12

We propose an unsupervised model for instruction-based image editing that eliminates the need for ground-truth edited images during training. Existing supervised methods depend on datasets containing triplets of input image, edited image, and edit instruction. These are generated either by existing editing methods or by human annotations, which introduce biases and limit their generalization ability. Our method addresses these challenges by introducing a novel mechanism called Cycle Edit Consistency (CEC), which applies forward and backward...

10.48550/arxiv.2412.15216 preprint EN arXiv (Cornell University) 2024-12-19
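
The cycle idea above can be sketched as follows: applying an edit instruction and then its reverse should reconstruct the input, so no ground-truth edited image is needed. `edit_model` and the reverse instruction are placeholders, and the paper's actual CEC formulation may include further terms.

```python
import torch
import torch.nn.functional as F

def cycle_edit_consistency_loss(edit_model, image, instruction, reverse_instruction):
    """Cycle-consistency objective for instruction-based editing (sketch)."""
    edited = edit_model(image, instruction)                   # forward edit
    reconstructed = edit_model(edited, reverse_instruction)   # backward edit
    return F.l1_loss(reconstructed, image)                    # penalize drift from input
```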

GenerateCT, the first approach to generating 3D medical imaging conditioned on free-form text prompts, incorporates a text encoder and three key components: a novel causal vision transformer for encoding CT volumes, a transformer for aligning CT and text tokens, and a text-conditional super-resolution diffusion model. Given the absence of directly comparable methods in 3D medical imaging, we established baselines with cutting-edge methods to demonstrate our method's effectiveness. GenerateCT significantly outperforms these baselines across all metrics....

10.48550/arxiv.2305.16037 preprint EN cc-by-sa arXiv (Cornell University) 2023-01-01

Recent work such as StyleCLIP aims to harness the power of CLIP embeddings for controlled text-based manipulations. Although these models are capable of manipulating images based on a text prompt, the success of the manipulation often depends on careful selection of an appropriate text prompt for the desired manipulation. This limitation makes it particularly difficult to perform text-based manipulations in domains where the user lacks expertise, such as fashion. To address this problem, we propose a method for automatically determining the most successful and...

10.1109/cvprw56347.2022.00255 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2022-06-01
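
As a rough illustration of selecting manipulations in CLIP space, the sketch below ranks candidate edited images by their cosine similarity to a target text prompt; the encoders are placeholder pre-trained CLIP towers, and the paper's actual selection procedure may use additional criteria.

```python
import torch
import torch.nn.functional as F

def rank_candidate_edits(clip_image_encoder, clip_text_encoder,
                         edited_images, text_prompt):
    """Order candidate manipulations by agreement with a text prompt."""
    with torch.no_grad():
        img_emb = F.normalize(clip_image_encoder(edited_images), dim=-1)  # (N, d)
        txt_emb = F.normalize(clip_text_encoder(text_prompt), dim=-1)     # (1, d)
        scores = img_emb @ txt_emb.t()        # cosine similarity per candidate
    return scores.squeeze(-1).argsort(descending=True)  # best candidates first
```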

3D GANs have the ability to generate latent codes for entire volumes rather than only 2D images. These models offer desirable features like high-quality geometry and multi-view consistency, but, unlike their 2D counterparts, complex semantic image editing tasks have only been partially explored. To address this problem, we propose LatentSwap3D, a semantic edit approach based on latent space discovery that can be used with any off-the-shelf 3D or 2D GAN model and on any dataset. LatentSwap3D relies on identifying latent code dimensions...

10.48550/arxiv.2212.01381 preprint EN cc-by arXiv (Cornell University) 2022-01-01