NFDI4DS | UHH-SEMS - Publication Details

Ali Furkan Biten

ORCID: 0000-0003-2099-5554

About

Contact & Profiles

A5019081944

Research Areas

Multimodal Machine Learning Applications
Advanced Image and Video Retrieval Techniques
Handwritten Text Recognition Techniques
Domain Adaptation and Few-Shot Learning
Natural Language Processing Techniques
Image Retrieval and Classification Techniques
Topic Modeling
Human Pose and Action Recognition
Video Analysis and Summarization
Islamic Thought and Society Studies
Psychology of Moral and Emotional Judgment
Generative Adversarial Networks and Image Synthesis
Families in Therapy and Culture
Speech Recognition and Synthesis
Values and Moral Education
Image Processing and 3D Reconstruction
Mathematics, Computing, and Information Processing
Social and Intergroup Psychology
Turkish Literature and Culture
Vehicle License Plate Recognition
Cultural Differences and Values
Face Recognition and Perception
Advanced Neural Network Applications
Religion, Spirituality, and Psychology
Evolutionary Psychology and Human Behavior

Computer Vision Center
2019-2023

Universitat Autònoma de Barcelona
2018-2023

Işık University
2020

Istanbul Bilgi University
2017-2020

Fatih University
2020

Bahçeşehir University
2020

Istanbul University
2020

Rogers (United States)
2020

Barcelona Supercomputing Center
2019

Artifex University
2019

Scene Text Visual Question Answering

OPENALEX - Publications

Ali Furkan Biten, Rubèn Tito, Andrés Mafla, Lluís Gómez, Marçal Rusiñol and 3 more

Current visual question answering datasets do not consider the rich semantic information conveyed by text within an image. In this work, we present a new dataset, ST-VQA, that aims to highlight the importance of exploiting high-level semantic information present in images as textual cues in the Visual Question Answering process. We use this dataset to define a series of tasks of increasing difficulty for which reading the scene text in the context provided by the visual information is necessary to reason and generate an appropriate answer. We propose a new evaluation metric for these tasks to account both for reasoning...

10.1109/iccv.2019.00439 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01

Good News, Everyone! Context Driven Entity-Aware Captioning for News Images

Ali Furkan Biten, Lluís Gómez, Marçal Rusiñol, Dimosthenis Karatzas

Current image captioning systems perform at a merely descriptive level, essentially enumerating the objects in the scene and their relations. Humans, on the contrary, interpret images by integrating several sources of prior knowledge of the world. In this work, we aim to take a step closer to producing captions that offer a plausible interpretation of the scene, by integrating such contextual information into the captioning pipeline. For this we focus on the captioning of images used to illustrate news articles. We propose a novel captioning method that is able to leverage contextual information provided by the text of news articles associated...

10.1109/cvpr.2019.01275 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

LaTr: Layout-Aware Transformer for Scene-Text VQA

Ali Furkan Biten, Ron Litman, Yusheng Xie, Srikar Appalaraju, R. Manmatha

We propose a novel multimodal architecture for Scene Text Visual Question Answering (STVQA), named Layout-Aware Transformer (LaTr). The task of STVQA requires models to reason over different modalities. Thus, we first investigate the impact of each modality, and reveal the importance of the language module, especially when enriched with layout information. Accounting for this, we propose a single objective pre-training scheme that requires only text and spatial cues. We show that applying this pre-training scheme on scanned documents has certain advantages over using...

10.1109/cvpr52688.2022.01605 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Text-DIAE: A Self-Supervised Degradation Invariant Autoencoder for Text Recognition and Document Enhancement

Mohamed Ali Souibgui, Sanket Biswas, Andrés Mafla, Ali Furkan Biten, Alícia Fornés and 4 more

In this paper, we propose a Text-Degradation Invariant Auto Encoder (Text-DIAE), a self-supervised model designed to tackle two tasks, text recognition (handwritten or scene-text) and document image enhancement. We start by employing a transformer-based architecture that incorporates three pretext tasks as learning objectives to be optimized during pre-training without the usage of labelled data. Each of the pretext objectives is specifically tailored for the final downstream tasks. We conduct several ablation experiments that confirm...

10.1609/aaai.v37i2.25328 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2023-06-26

The Bogazici face database: Standardized photographs of Turkish faces with supporting materials

S. Adil Sarıbay, Ali Furkan Biten, Erdem O. Meral, Pınar Aldan, Vít Třebický and 1 more

Many sets of human facial photographs produced in Western cultures are available for scientific research. We report here on the development of a face database of Turkish undergraduate student targets. High-resolution standardized photographs were taken and supported by the following materials: (a) basic demographic and appearance-related information, (b) two types of landmark configurations (for Webmorph and geometric morphometrics (GM)), (c) facial width-to-height ratio (fWHR) measurement, (d) information on photography parameters,...

10.1371/journal.pone.0192018 article EN cc-by PLoS ONE 2018-02-14

Let there be a clock on the beach: Reducing Object Hallucination in Image Captioning

Ali Furkan Biten, Lluís Gómez, Dimosthenis Karatzas

Explaining an image with missing or non-existent objects is known as object bias (hallucination) in image captioning. This behaviour is quite common in state-of-the-art captioning models and is not desirable to humans. To decrease object hallucination in captioning, we propose three simple yet efficient training augmentation methods for sentences, which require no new training data or increase in the model size. By extensive analysis, we show that the proposed methods can significantly diminish our models’ object bias on hallucination metrics. Moreover,...

10.1109/wacv51458.2022.00253 article EN 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2022-01-01

An Investigation of Moral Foundations Theory in Turkey Using Different Measures

Bilge Yalçındağ, Türker Özkan, Sevim Cesur, Onurcan Yılmaz, Beyza Tepe and 3 more

10.1007/s12144-017-9618-4 article EN Current Psychology 2017-06-09

ICDAR 2019 Competition on Scene Text Visual Question Answering

Ali Furkan Biten, Rubèn Tito, Andrés Mafla, Lluís Gómez, Marçal Rusiñol and 4 more

This paper presents the final results of the ICDAR 2019 Scene Text Visual Question Answering competition (ST-VQA). ST-VQA introduces an important aspect that is not addressed by any Visual Question Answering system up to date, namely the incorporation of scene text to answer questions asked about an image. The competition introduces a new dataset comprising 23,038 images annotated with 31,791 question / answer pairs where the answer is always grounded on text instances present in the image. The images are taken from 7 different public computer vision datasets, covering a wide range of scenarios. The competition was...

10.1109/icdar.2019.00251 article EN 2019-09-01

Fine-grained Image Classification and Retrieval by Combining Visual and Locally Pooled Textual Features

Andrés Mafla, Sounak Dey, Ali Furkan Biten, Lluís Gómez, Dimosthenis Karatzas

Text contained in an image carries high-level semantics that can be exploited to achieve richer image understanding. In particular, the mere presence of text provides strong guiding content that should be employed to tackle a diversity of computer vision tasks such as image retrieval, fine-grained classification, and visual question answering. In this paper, we address the problem of fine-grained classification and image retrieval by leveraging textual information along with visual cues to comprehend the existing intrinsic relation between the two modalities. The...

10.1109/wacv45572.2020.9093373 article EN 2020-03-01

Multi-Modal Reasoning Graph for Scene-Text Based Fine-Grained Image Classification and Retrieval

Andrés Mafla, Sounak Dey, Ali Furkan Biten, Lluís Gómez, Dimosthenis Karatzas

Scene text instances found in natural images carry explicit semantic information that can provide important cues to solve a wide array of computer vision problems. In this paper, we focus on leveraging multi-modal content in the form of visual and textual cues to tackle the task of fine-grained image classification and retrieval. First, we obtain the text instances from images by employing a text reading system. Then, we combine textual features with salient image regions to exploit the complementary information carried by the two sources. Specifically, we employ a Graph Convolutional Network...

10.1109/wacv48630.2021.00407 article EN 2021-01-01

Is An Image Worth Five Sentences? A New Look into Semantics for Image-Text Matching

Ali Furkan Biten, Andrés Mafla, Lluís Gómez, Dimosthenis Karatzas

The task of image-text matching aims to map representations from different modalities into a common joint visual-textual embedding. However, the most widely used datasets for this task, MSCOCO and Flickr30K, are actually image captioning datasets that offer a very limited set of relationships between images and sentences in their ground-truth annotations. This limited ground truth information forces us to use evaluation metrics based on binary relevance: given a sentence query we consider only one image as relevant. However, many other...

10.1109/wacv51458.2022.00254 article EN 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2022-01-01

Multimodal grid features and cell pointers for scene text visual question answering

Lluís Gómez, Ali Furkan Biten, Rubèn Tito, Andrés Mafla, Marçal Rusiñol and 2 more

10.1016/j.patrec.2021.06.026 article EN Pattern Recognition Letters 2021-07-20

People respond with different moral emotions to violations in different relational models: A cross-cultural comparison.

Diane Sunar, Sevim Cesur, Zeynep Ecem Piyale, Beyza Tepe, Ali Furkan Biten and 2 more

Consonant with a functional view of moral emotions, we argue that morality is best analyzed within relationships rather than in individuals, and use Fiske's (1992) theory of relational models (RMs: communal sharing [CS], authority ranking [AR], equality matching [EM], and market pricing [MP]) to predict that violations in different RMs will arouse different intensities of other-blaming emotions (anger, contempt and disgust) in both observers and victims, together with different intensities of self-blaming emotions (shame and guilt) in perpetrators, and that these patterns of emotion...

10.1037/emo0000736 article EN Emotion 2020-03-19

Selective Style Transfer for Text

Raúl Gómez, Ali Furkan Biten, Lluís Gómez, Jaume Gibert, Dimosthenis Karatzas and 1 more

This paper explores the possibilities of image style transfer applied to text while maintaining the original transcriptions. Results on different text domains (scene text, machine printed text and handwritten text) and cross-modal results demonstrate that this is feasible, and open different research lines. Furthermore, two architectures for selective style transfer, which means transferring style to only the desired image pixels, are proposed. Finally, scene text selective style transfer is evaluated as a data augmentation technique to expand scene text detection datasets, resulting in a boost...

10.1109/icdar.2019.00134 article EN 2019-09-01

One-shot Compositional Data Generation for Low Resource Handwritten Text Recognition

Mohamed Ali Souibgui, Ali Furkan Biten, Sounak Dey, Alícia Fornés, Yousri Kessentini and 3 more

Low resource Handwritten Text Recognition (HTR) is a hard problem due to the scarce annotated data and the very limited linguistic information (dictionaries and language models). This is the case, for example, with historical ciphered manuscripts, which are usually written with invented alphabets to hide the message contents. Thus, in this paper we address this problem through a data generation technique based on Bayesian Program Learning (BPL). Contrary to traditional generation approaches, which require a huge amount of annotated images, our method is able to generate human-like...

10.1109/wacv51458.2022.00262 article EN 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2022-01-01

Show, Interpret and Tell: Entity-Aware Contextualised Image Captioning in Wikipedia

Khanh Nguyen, Ali Furkan Biten, Andrés Mafla, Lluís Gómez, Dimosthenis Karatzas

Humans exploit prior knowledge to describe images, and are able to adapt their explanation to specific contextual information given, even to the extent of inventing plausible explanations when contextual information and images do not match. In this work, we propose the novel task of captioning Wikipedia images by integrating contextual knowledge. Specifically, we produce models that jointly reason over Wikipedia articles, Wikimedia images and their associated descriptions to produce contextualized captions. The same Wikimedia image can be used to illustrate different articles, and the produced caption needs to be adapted...

10.1609/aaai.v37i2.25285 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2023-06-26

OCR-IDL: OCR Annotations for Industry Document Library Dataset

Ali Furkan Biten, Rubèn Tito, Lluís Gómez, Ernest Valveny, Dimosthenis Karatzas

Pretraining has proven successful in Document Intelligence tasks, where a deluge of documents is used to pretrain the models, only later to be finetuned on downstream tasks. One of the problems of the pretraining approaches is the inconsistent usage of pretraining data with different OCR engines, leading to incomparable results between models. In other words, it is not obvious whether the performance gain is coming from the diverse usage of amounts of data and distinct OCR engines or from the proposed models. To remedy the problem, we make public the OCR annotations for IDL documents using a commercial OCR engine given...

10.48550/arxiv.2202.12985 preprint EN cc-by arXiv (Cornell University) 2022-01-01

“Bana göre” Ahlak: Sıradan İnsanın Ahlakı Kavramsallaştırması (Morality “According to Me”: The Ordinary Person's Conceptualization of Morality)

Sevim Cesur, Beyza Tepe, Zeynep Ecem Piyale, Diane Sunar, Ali Furkan Biten

Shweder et al. (1997) rejected Kohlberg's (1971) assumptions that morality is universal and that justice is the most important virtue, and assumed cultural diversity by proposing the "three basic ethics of morality," which are valued to different degrees in different cultures. Walker and Pitts (1998), in turn, state that one shortcoming of current morality research is that the ordinary person's natural conceptualizations of morality have not been studied. The aim of this study concerns how morality is conceptualized in our society...

10.31828/tpy1301996120200219m000021 article TR Türk Psikoloji Yazıları 2020-06-28