Ali Furkan Biten

ORCID: 0000-0003-2099-5554
Research Areas
  • Multimodal Machine Learning Applications
  • Advanced Image and Video Retrieval Techniques
  • Handwritten Text Recognition Techniques
  • Domain Adaptation and Few-Shot Learning
  • Natural Language Processing Techniques
  • Image Retrieval and Classification Techniques
  • Topic Modeling
  • Human Pose and Action Recognition
  • Video Analysis and Summarization
  • Islamic Thought and Society Studies
  • Psychology of Moral and Emotional Judgment
  • Generative Adversarial Networks and Image Synthesis
  • Families in Therapy and Culture
  • Speech Recognition and Synthesis
  • Values and Moral Education
  • Image Processing and 3D Reconstruction
  • Mathematics, Computing, and Information Processing
  • Social and Intergroup Psychology
  • Turkish Literature and Culture
  • Vehicle License Plate Recognition
  • Cultural Differences and Values
  • Face Recognition and Perception
  • Advanced Neural Network Applications
  • Religion, Spirituality, and Psychology
  • Evolutionary Psychology and Human Behavior

Computer Vision Center
2019-2023

Universitat Autònoma de Barcelona
2018-2023

Işık University
2020

Istanbul Bilgi University
2017-2020

Fatih University
2020

Bahçeşehir University
2020

Istanbul University
2020

Rogers (United States)
2020

Barcelona Supercomputing Center
2019

Artifex University
2019

Current visual question answering datasets do not consider the rich semantic information conveyed by text within an image. In this work, we present a new dataset, ST-VQA, that aims to highlight the importance of exploiting high-level semantic information present in images as textual cues in the Visual Question Answering process. We use this dataset to define a series of tasks of increasing difficulty for which reading the scene text in the context provided is necessary to reason and generate an appropriate answer. We propose an evaluation metric for these tasks that accounts for both reasoning...

10.1109/iccv.2019.00439 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01
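
The evaluation metric introduced with ST-VQA is commonly known as Average Normalized Levenshtein Similarity (ANLS): for each question, the best similarity against any ground-truth answer is kept, and similarities whose normalized edit distance reaches a threshold τ are set to zero. The sketch below is a minimal, unofficial illustration, assuming τ = 0.5 and answers compared after lowercasing and trimming whitespace; the function names `levenshtein` and `anls` are hypothetical and not taken from any official evaluation script.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert/delete/substitute, each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        prev = curr
    return prev[-1]


def anls(predictions: list[str], gold_answers: list[list[str]], tau: float = 0.5) -> float:
    """Average Normalized Levenshtein Similarity over a set of questions.

    Assumption: answers are normalized by lowercasing and stripping whitespace.
    """
    if not predictions:
        return 0.0
    total = 0.0
    for pred, golds in zip(predictions, gold_answers):
        p = pred.strip().lower()
        best = 0.0
        for gold in golds:
            g = gold.strip().lower()
            denom = max(len(p), len(g))
            nl = levenshtein(p, g) / denom if denom else 0.0
            # Near-misses earn partial credit; distances >= tau score zero.
            best = max(best, 1.0 - nl if nl < tau else 0.0)
        total += best
    return total / len(predictions)
```

For example, `anls(["coca cola"], [["coca-cola"]])` gives partial credit (one edit over nine characters), while a completely different answer such as "stop" for "exit" scores zero. The threshold is what lets the metric tolerate small recognition errors without rewarding wrong answers.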

Current image captioning systems perform at a merely descriptive level, essentially enumerating the objects in the scene and their relations. Humans, on the contrary, interpret images by integrating several sources of prior knowledge of the world. In this work, we aim to take a step closer to producing captions that offer a plausible interpretation of the scene, by integrating such contextual information into the captioning pipeline. For this work, we focus on images used to illustrate news articles. We propose a novel method that is able to leverage the contextual information provided by the text of the news articles associated...

10.1109/cvpr.2019.01275 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

We propose a novel multimodal architecture for Scene Text Visual Question Answering (STVQA), named Layout-Aware Transformer (LaTr). The task of STVQA requires models to reason over different modalities. Thus, we first investigate the impact of each modality and reveal the importance of the language module, especially when enriched with layout information. Accounting for this, we propose a single-objective pre-training scheme that requires only text and spatial cues. We show that applying this scheme on scanned documents has certain advantages over using...

10.1109/cvpr52688.2022.01605 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

In this paper, we propose a Text-Degradation Invariant Auto Encoder (Text-DIAE), a self-supervised model designed to tackle two tasks: text recognition (handwritten or scene-text) and document image enhancement. We start by employing a transformer-based architecture that incorporates three pretext tasks as learning objectives to be optimized during pre-training without the usage of labelled data. Each pretext task is specifically tailored for the final downstream tasks. We conduct several ablation experiments to confirm...

10.1609/aaai.v37i2.25328 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2023-06-26

Many sets of human facial photographs produced in Western cultures are available for scientific research. We report here on the development of a face database of Turkish undergraduate student targets. High-resolution standardized photographs were taken and supported by the following materials: (a) basic demographic and appearance-related information, (b) two types of landmark configurations (for Webmorph and for geometric morphometrics (GM)), (c) facial width-to-height ratio (fWHR) measurement, (d) information on photography parameters,...

10.1371/journal.pone.0192018 article EN cc-by PLoS ONE 2018-02-14
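
The facial width-to-height ratio mentioned above is a simple landmark-based measurement. A common operationalization divides bizygomatic width (left to right zygion) by upper-face height (upper lip to mid-brow); the excerpt does not state the exact protocol used for this database, so the landmark names and the use of straight-line distances below are assumptions for illustration only.

```python
import math

Point = tuple[float, float]  # (x, y) pixel coordinates of a landmark


def fwhr(left_zygion: Point, right_zygion: Point,
         upper_lip: Point, mid_brow: Point) -> float:
    """Facial width-to-height ratio under an assumed landmark definition."""
    width = math.dist(left_zygion, right_zygion)  # bizygomatic width
    height = math.dist(upper_lip, mid_brow)       # upper-face height
    if height == 0:
        raise ValueError("upper-face height must be non-zero")
    return width / height
```

With hypothetical landmarks 200 px apart horizontally and 150 px apart vertically, `fwhr((100, 300), (300, 300), (200, 400), (200, 250))` is 200 / 150, roughly 1.33. Because the ratio is scale-free, it does not depend on image resolution, which is one reason standardized photographs and consistent landmark placement matter more than pixel size.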

Explaining an image with missing or non-existent objects is known as object bias (hallucination) in image captioning. This behaviour is quite common in state-of-the-art captioning models and is not desirable to humans. To decrease hallucination in captioning, we propose three simple yet efficient training augmentation methods for sentences that require no new data and no increase in model size. Through extensive analysis, we show that the proposed methods can significantly diminish our models’ object bias on hallucination metrics. Moreover,...

10.1109/wacv51458.2022.00253 article EN 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2022-01-01

This paper presents the final results of the ICDAR 2019 Scene Text Visual Question Answering competition (ST-VQA). ST-VQA introduces an important aspect that is not addressed by any Visual Question Answering system to date, namely the incorporation of scene text to answer questions asked about an image. The competition introduces a new dataset comprising 23,038 images annotated with 31,791 question/answer pairs, where the answer is always grounded on scene text instances present in the image. The images are taken from 7 different public computer vision datasets, covering a wide range of scenarios...

10.1109/icdar.2019.00251 article EN 2019-09-01

Text contained in an image carries high-level semantics that can be exploited to achieve richer image understanding. In particular, the mere presence of text provides strong guiding content that should be employed to tackle a diversity of computer vision tasks such as image retrieval, fine-grained classification, and visual question answering. In this paper, we address the problem of fine-grained classification and image retrieval by leveraging textual information along with visual cues to comprehend the existing intrinsic relation between the two modalities. The...

10.1109/wacv45572.2020.9093373 article EN 2020-03-01

Scene text instances found in natural images carry explicit semantic information that can provide important cues to solve a wide array of computer vision problems. In this paper, we focus on leveraging multi-modal content in the form of visual and textual cues to tackle the task of fine-grained image classification and retrieval. First, we obtain the text instances from images by employing a text reading system. Then, we combine textual features with salient image regions to exploit the complementary information carried by the two sources. Specifically, we employ a Graph Convolutional Network...

10.1109/wacv48630.2021.00407 article EN 2021-01-01

The task of image-text matching aims to map representations from different modalities into a common joint visual-textual embedding. However, the most widely used datasets for this task, MSCOCO and Flickr30K, are actually image captioning datasets that offer a very limited set of relationships between images and sentences in their ground-truth annotations. This limited ground-truth information forces us to use evaluation metrics based on binary relevance: given a sentence query, we consider only one image as relevant. However, many other...

10.1109/wacv51458.2022.00254 article EN 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2022-01-01
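
The binary-relevance setting described above is what standard Recall@K measures: each sentence query has exactly one image counted as correct, and a query is a hit only if that image ranks within the top K. The sketch below is a minimal, hypothetical illustration (the function name, the similarity-matrix layout and the optimistic tie handling are assumptions), meant only to show why any other image that also fits the sentence is counted as a miss.

```python
def recall_at_k(similarity: list[list[float]], gt_image: list[int], k: int) -> float:
    """Recall@K for sentence-to-image retrieval under binary relevance.

    similarity[q][i]: score between sentence query q and image i.
    gt_image[q]: index of the single image treated as relevant for query q.
    """
    if not gt_image:
        return 0.0
    hits = 0
    for scores, gt in zip(similarity, gt_image):
        # 0-based rank of the ground-truth image: number of images scored
        # strictly higher (ties resolved in favour of the ground truth).
        rank = sum(1 for s in scores if s > scores[gt])
        if rank < k:
            hits += 1
    return hits / len(gt_image)
```

With `similarity = [[0.9, 0.1, 0.3], [0.2, 0.5, 0.7], [0.4, 0.8, 0.6]]` and `gt_image = [0, 1, 2]`, only the first query ranks its annotated image first, so Recall@1 is 1/3 and Recall@2 is 1.0, even if the higher-scored images for the other two queries were equally valid matches.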

Consonant with a functional view of moral emotions, we argue that morality is best analyzed within relationships rather than in individuals, and use Fiske's (1992) theory of relational models (RMs: communal sharing [CS], authority ranking [AR], equality matching [EM], market pricing [MP]) to predict that violations of different RMs will arouse different intensities of other-blaming emotions (anger, contempt, and disgust) in both observers and victims, together with self-blaming emotions (shame and guilt) in perpetrators, and that these patterns of emotion...

10.1037/emo0000736 article EN Emotion 2020-03-19

This paper explores the possibilities of image style transfer applied to text while maintaining the original transcriptions. Results on different text domains (scene text, machine-printed text and handwritten text) and cross-modal results demonstrate that this is feasible and open up new lines of research. Furthermore, two architectures for selective style transfer, which means transferring style only to the desired pixels, are proposed. Finally, scene text selective style transfer is evaluated as a data augmentation technique to expand scene text detection datasets, resulting in a boost...

10.1109/icdar.2019.00134 article EN 2019-09-01

Low-resource Handwritten Text Recognition (HTR) is a hard problem due to the scarce annotated data and the very limited linguistic information (dictionaries and language models). This is the case, for example, for historical ciphered manuscripts, which are usually written with invented alphabets to hide the message contents. Thus, in this paper we address this problem through a data generation technique based on Bayesian Program Learning (BPL). Contrary to traditional generation approaches, which require a huge amount of annotated images, our method is able to generate human-like...

10.1109/wacv51458.2022.00262 article EN 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2022-01-01

Humans exploit prior knowledge to describe images, and are able to adapt their explanation to the specific contextual information given, even to the extent of inventing plausible explanations when the contextual information and the images do not match. In this work, we propose the novel task of captioning Wikipedia images by integrating contextual knowledge. Specifically, we produce models that jointly reason over Wikipedia articles, Wikimedia images and their associated descriptions to produce contextualized captions. The same Wikimedia image can be used to illustrate different articles, and the produced caption needs to be adapted...

10.1609/aaai.v37i2.25285 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2023-06-26

Pretraining has proven successful in Document Intelligence tasks, where a deluge of documents is used to pretrain models that are only later finetuned on downstream tasks. One of the problems of pretraining approaches is the inconsistent usage of pretraining data with different OCR engines, leading to incomparable results between models. In other words, it is not obvious whether the performance gain comes from the diverse amount of data and distinct OCR engines or from the proposed models. To remedy this problem, we make public the OCR annotations for IDL documents using a commercial OCR engine given...

10.48550/arxiv.2202.12985 preprint EN cc-by arXiv (Cornell University) 2022-01-01

Shweder et al. (1997) rejected Kohlberg's (1971) assumptions that morality is universal and that justice is the most important virtue, and posited cultural diversity by proposing the "three basic ethics of morality", which are valued to different degrees in different cultures. Walker and Pitts (1998), in turn, state that one shortcoming of present-day moral research is that ordinary people's natural conceptualizations have not been studied. The aim of this study concerns how morality is conceptualized in our society...

10.31828/tpy1301996120200219m000021 article TR Türk Psikoloji Yazıları 2020-06-28