Matthias Springstein

ORCID: 0000-0002-6509-8534
Research Areas
  • Advanced Image and Video Retrieval Techniques
  • Multimodal Machine Learning Applications
  • Image Retrieval and Classification Techniques
  • Video Analysis and Summarization
  • Research Data Management Practices
  • Scientific Computing and Data Management
  • Topic Modeling
  • Misinformation and Its Impacts
  • Natural Language Processing Techniques
  • Handwritten Text Recognition Techniques
  • Text and Document Classification Technologies
  • Mathematics, Computing, and Information Processing
  • Generative Adversarial Networks and Image Synthesis
  • Human Pose and Action Recognition
  • Advanced Vision and Imaging
  • Music and Audio Processing
  • Digital Media Forensic Detection
  • Brain Tumor Detection and Classification
  • Aesthetic Perception and Analysis
  • Image Processing Techniques and Applications
  • Web Data Mining and Analysis
  • Academic Publishing and Open Access
  • Face recognition and analysis
  • Biomedical Text Mining and Ontologies
  • Robotics and Sensor-Based Localization

Technische Informationsbibliothek (TIB)
2016-2024

Leibniz University Hannover
2023

L3S Research Center
2023

The beneficial, complementary nature of visual and textual information to convey information is widely known, for example, in entertainment, news, advertisements, science, or education. While the complex interplay of image and text to form semantic meaning has been thoroughly studied in linguistics and communication sciences for several decades, computer vision and multimedia research has remained on the surface of the problem, more or less. An exception is previous work that introduced the two metrics Cross-Modal Mutual Information and Semantic...

10.1007/s13735-019-00187-6 article EN cc-by International Journal of Multimedia Information Retrieval 2020-01-22

Gesture as a language of non-verbal communication has been theoretically established since the 17th century. However, its relevance for the visual arts has been expressed only sporadically. This may be primarily due to the sheer overwhelming amount of data that traditionally had to be processed by hand. With the steady progress of digitization, though, a growing number of historical artifacts have been indexed and made available to the public, creating a need for automatic retrieval of art-historical motifs with similar body constellations or poses....

10.1145/3503161.3548371 article EN Proceedings of the 30th ACM International Conference on Multimedia 2022-10-10

Two modalities are often used to convey information in a complementary and beneficial manner, e.g., in online news, videos, educational resources, or scientific publications. The automatic understanding of semantic correlations between text and associated images as well as their interplay has great potential for enhanced multimodal web search and recommender systems. However, automatic understanding of multimodal information is still an unsolved research problem. Recent approaches such as image captioning focus on precisely describing visual content...

10.1145/3323873.3325049 preprint EN 2019-06-05

Iconography refers to the methodical study and interpretation of thematic content in the visual arts, distinguishing it, e.g., from purely formal or aesthetic considerations. In iconographic studies, Iconclass is a widely used taxonomy that encapsulates historical, biblical, and literary themes, among others. However, given the hierarchical nature and inherent complexity of such a taxonomy, it is highly desirable to use automated methods for (Iconclass-based) image classification. Previous studies either focused...

10.1109/wacv57701.2024.00705 article EN 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024-01-03

Event classification can add valuable information for semantic search and the increasingly important topic of fact validation in news. So far, only few approaches address image classification for newsworthy event types such as natural disasters, sports events, or elections. Previous work distinguishes between a limited number of event types and relies on rather small datasets for training. In this paper, we present a novel ontology-driven approach for event classification in images. We leverage a large number of real-world news events to pursue two objectives: First, we create...

10.1109/wacv48630.2021.00297 article EN 2021-01-01

The World Wide Web and social media platforms have become popular sources for news and information. Typically, multimodal information, e.g., image and text, is used to convey information more effectively and to attract attention. While in most cases image content is decorative or depicts additional information, it has also been leveraged to spread misinformation and rumors in recent years. In this paper, we present a web-based demo application that automatically quantifies the cross-modal relations of entities (persons, locations, and events)...

10.1145/3404835.3462796 article EN Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval 2021-07-11

In this paper, we introduce iART: an open Web platform for art-historical research that facilitates the process of comparative vision. The system integrates various machine learning techniques for keyword- and content-based image retrieval as well as category formation via clustering. An intuitive GUI supports users to define queries and explore results. By using a state-of-the-art cross-modal deep learning approach, it is possible to search for concepts that were not previously detected by trained classification models....

10.1145/3474085.3478564 preprint EN Proceedings of the 29th ACM International Conference on Multimedia 2021-10-17

The recognition of handwritten mathematical expressions in images and video frames is a difficult and as yet unsolved problem. Deep convolutional neural networks are, in principle, a promising approach, but they typically require a large amount of labeled training data. However, such a dataset does not exist for the task of formula recognition. In this paper, we introduce a system that creates a set of synthesized examples which are derived from LaTeX documents. For this purpose, we propose a novel attention-based generative adversarial...

10.1145/3463945.3469059 article EN 2021-08-21

Event classification in images plays a vital role in multimedia analysis, especially with the prevalence of fake news on social media and the Web. The majority of approaches for event classification rely on large sets of labeled training data. However, image labels for fine-grained event instances (e.g., 2016 Summer Olympics) can be sparse, incorrect, ambiguous, etc. A few approaches have addressed the lack of labeled data but cover only a few events. Moreover, vision-language models that allow zero-shot and few-shot prompting have not yet been extensively exploited. In...

10.1109/wacv57701.2024.00712 article EN 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024-01-03

In this paper, we present a novel approach to estimate the relative depth of regions in monocular images. There are several contributions. First, the task of depth estimation is considered as a learning-to-rank problem, which offers several advantages compared to regression approaches. Second, monocular depth clues of human perception are modeled in a systematic manner. Third, we show that these depth clues can be modeled and integrated appropriately in a Rankboost framework. For this purpose, a space-efficient version of Rankboost is derived that makes it applicable to rank a large number of objects, as posed...

10.1109/icme.2017.8019434 article EN 2017 IEEE International Conference on Multimedia and Expo (ICME) 2017-07-01

Deep neural networks have been successfully applied to the task of visual concept classification. However, they require a large number of training examples for learning. Although pre-trained deep neural networks are available for some domains, they usually have to be fine-tuned for an envisaged target domain. Recently, some approaches have been suggested that are aimed at incrementally (or even endlessly) learning visual concepts based on Web data. Since tags of Web images are often noisy, normally some filtering mechanisms are employed in order to remove "spam" images that are not...

10.1145/2911996.2912072 article EN 2016-06-06

Monocular depth estimation is an essential but ill-posed (computer) vision task. While human visual perception of depth relies on several monocular depth clues, such as occlusion of objects, relative height, usual object size, and linear perspective, deep learning models have to implicitly learn these cues from labeled training data to determine depth. In this paper, we investigate whether these criteria are violated for certain image instances given a model's predictions. We consider the task as a ranking problem, i.e.,...

10.1109/cvprw59228.2023.00385 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2023-06-01

Video analysis platforms that integrate automatic solutions for multimedia and information retrieval enable various applications in many disciplines including film and media studies, communication science, and education. However, current video analysis platforms either focus on manual annotations or include only a few tools for automatic content analysis. In this paper, we present a novel web-based video analysis platform called TIB AV-Analytics (TIB-AV-A). Unlike previous platforms, TIB-AV-A integrates state-of-the-art approaches in the fields of...

10.1145/3539618.3591820 article EN Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval 2023-07-18

arXiv is a popular pre-print server focusing on natural science disciplines (e.g. physics, computer science, quantitative biology). As a platform with a focus on easy publishing services, it does not provide enhanced search functionality, but it offers programming interfaces which allow external parties to add these services. This paper presents extensions of the open source framework arXiv Sanity Preserver (SP). With respect to the original framework, it derestricts the topical focus and allows for text-based search and visualisation...

10.48550/arxiv.1806.06796 preprint EN other-oa arXiv (Cornell University) 2018-01-01