- Video Analysis and Summarization
- Advanced Image and Video Retrieval Techniques
- Image Retrieval and Classification Techniques
- Multimodal Machine Learning Applications
- Topic Modeling
- Misinformation and Its Impacts
- Handwritten Text Recognition Techniques
- Advanced Vision and Imaging
- Music and Audio Processing
- Natural Language Processing Techniques
- Online Learning and Analytics
- Hate Speech and Cyberbullying Detection
- Semantic Web and Ontologies
- Human Pose and Action Recognition
- Text and Document Classification Technologies
- Scientific Computing and Data Management
- Information Retrieval and Search Behavior
- Recommender Systems and Techniques
- Advanced Text Analysis Techniques
- Sentiment Analysis and Opinion Mining
- Video Coding and Compression Technologies
- Advanced Graph Neural Networks
- Spam and Phishing Detection
- Research Data Management Practices
- Digital Media Forensic Detection
Leibniz University Hannover
2016-2024
Technische Informationsbibliothek (TIB)
2016-2024
L3S Research Center
2017-2024
PRG S&Tech (South Korea)
2019-2022
Carl Zeiss (Germany)
2015
Information Technology University
2014
Philipps University of Marburg
2006-2012
University of Siegen
2004-2007
The automatic semantic structuring of scientific text allows for more efficient reading of research articles and is an important indexing step for academic search engines. Sequential sentence classification is an essential task that targets the categorisation of sentences based on their content and context. However, the potential of transfer learning across different domains and text types, such as full papers and abstracts, has not yet been explored in prior work. In this paper, we present a systematic analysis of sequential...
Text localization and recognition in images is important for searching information in digital photo archives, video databases and Web sites. However, since text is often printed against a complex background, it is difficult to detect. In this paper, a robust approach is presented, which can automatically detect horizontally aligned text with different sizes, fonts, colors and languages. First, a wavelet transform is applied to the image and the distribution of high-frequency coefficients is considered to statistically characterize text and non-text...
Sports field registration in broadcast videos is typically interpreted as the task of homography estimation, which provides a mapping between a planar field and the corresponding visible area of the image. In contrast to previous approaches, we consider the task as a camera calibration problem. First, we introduce a differentiable objective function that is able to learn the camera pose and focal length from segment correspondences (e.g., lines, point clouds), based on pixel-level annotations for segments of a known object. The calibration module iteratively...
In the context of inclusive education, Universal Design for Learning (UDL) is a framework used worldwide to create learning opportunities accessible to all learners. While much research has focused on the design and students' perceptions of UDL-based learning settings, studies on usage patterns in UDL-guided elements, particularly in digital environments, are still scarce. Therefore, we analyze and cluster the usage patterns of 9th and 10th graders in a web-based learning platform called [anonymized project name]. The platform focuses on chemistry learning, and UDL principles...
The assignment of importance scores to particular frames or (short) segments in a video is crucial for summarization, but also a difficult task. Previous work utilizes only one source of visual features. In this paper, we suggest a novel model architecture that combines three feature sets for visual content and motion to predict importance scores. The proposed architecture utilizes an attention mechanism before fusing motion features and features representing the (static) visual content, i.e., derived from an image classification model. Comprehensive experimental evaluations...
The integration of learning analytics and artificial intelligence methods into education is part of the latest developments and significantly affects chemistry education (research): researchers might face the challenge of collecting and analyzing content-rich data sets involving interdisciplinary approaches from computer science, chemistry, and education. Developing a learning platform offers a higher degree of freedom compared to using existing Learning Management Systems. This paper presents a step-by-step overview of how we designed...
Information search has become essential for learning and knowledge acquisition, offering broad access to information and learning resources. The visual complexity of web pages is known to influence search behavior, with previous work suggesting that searchers make evaluative judgments within the first second on a page. However, there is a significant gap in our understanding of how visual complexity impacts searches specifically conducted with a learning intent. This is particularly relevant for the development of optimized information retrieval (IR) systems that effectively support...
The web has become a crucial source of information, but it is also used to spread disinformation, often conveyed through multiple modalities like images and text. The identification of inconsistent cross-modal information, in particular entities such as persons, locations, and events, is critical to detect disinformation. Previous works either identify out-of-context disinformation by assessing the consistency of images to the whole document, neglecting relations of individual entities, or focus on generic entities that are not relevant to news. So...
Patent figure classification facilitates faceted search in patent retrieval systems, enabling efficient prior art search. Existing approaches have explored patent figure classification for only a single aspect and for aspects with a limited number of concepts. In recent years, large vision-language models (LVLMs) have shown tremendous performance across numerous computer vision downstream tasks; however, they remain unexplored for patent figure classification. Our work explores the efficacy of LVLMs in visual question answering (VQA) and classification,...
Text detection in images or videos is an important step to achieve multimedia content retrieval. In this paper, an efficient algorithm which can automatically detect, localize and extract horizontally aligned text in images (and digital videos) with complex backgrounds is presented. The proposed approach is based on the application of a color reduction technique, a method for edge detection, and the localization of text regions using projection profile analyses and geometrical properties. The output of the algorithm are text boxes with a simplified background,...
The World Wide Web has become a popular source for gathering information and news. Multimodal information, e.g., enriching text with photos, is typically used to convey the news more effectively or to attract attention. The photos can be decorative, depict additional details, or even contain misleading information. Quantifying the cross-modal consistency of entity representations can assist human assessors in evaluating the overall multimodal message. In some cases such measures might give hints to detect fake news,...
Computer-aided support and analysis are becoming increasingly important in the modern world of sports. The scouting of potential prospective players, performance as well as match analysis, and the monitoring of training programs rely more and more on data-driven technologies to ensure success. Therefore, many approaches require large amounts of data, which are, however, not easy to obtain in general. In this paper, we propose a pipeline for the fully-automated extraction of positional data from broadcast video recordings of soccer...
The SoccerNet 2022 challenges were the second annual video understanding challenges organized by the SoccerNet team. In 2022, the challenges were composed of 6 vision-based tasks: (1) action spotting, focusing on retrieving action timestamps in long untrimmed videos, (2) replay grounding, focusing on retrieving the live moment of an action shown in a replay, (3) pitch localization, focusing on detecting line and goal part elements, (4) camera calibration, dedicated to retrieving the intrinsic and extrinsic camera parameters, (5) player re-identification, focusing on retrieving the same players across multiple views, and (6) multiple object tracking, focusing on tracking...
With the recent surge in digitalization across all levels of education, online video platforms gained educational relevance. Therefore, optimizing such platforms in line with learners' actual needs should be considered a priority for scientists and educators alike. In this project, we triangulate logfiles of a large German online platform for educational videos with behavioral data from a laboratory study and objective characteristics of selected videos. We aim to understand potential motives for why participants pause while watching online. Our...
Planet-scale photo geolocalization is the complex task of estimating the location depicted in an image solely based on its visual content. Due to the success of convolutional neural networks (CNNs), current approaches achieve superhuman performance. However, previous work has exclusively focused on optimizing geolocalization accuracy. Due to the black-box property of deep learning systems, their predictions are difficult to validate for humans. State-of-the-art methods treat the task as a classification problem, where the choice of the classes, that...
The video coding standard H.264 supports video compression with a higher coding efficiency than previous standards. However, this comes at the expense of an increased encoding complexity, in particular for motion estimation, which becomes a very time-consuming task even for today's central processing units (CPU). On the other hand, modern graphics hardware includes a powerful graphics processing unit (GPU) whose computing power remains idle most of the time. In this paper, we present a GPU-based approach to motion estimation for the purpose of H.264 video encoding. A small diamond search is...
The beneficial, complementary nature of visual and textual information to convey information is widely known, for example, in entertainment, news, advertisements, science, or education. While the complex interplay of image and text to form semantic meaning has been thoroughly studied in linguistics and communication sciences for several decades, computer vision and multimedia research remained on the surface of the problem more or less. An exception is previous work that introduced the two metrics Cross-Modal Mutual Information and Semantic...
In this paper, we suggest a novel method to help learners find relevant open educational videos to master skills demanded on the labour market. We have built a prototype, which 1) applies text classification and text mining methods on job vacancy announcements to match jobs and their required skills; 2) predicts the quality of videos; and 3) creates an open educational video recommender system to suggest personalized learning content to learners. For the first evaluation of the prototype we focused on the area of data science related jobs. Our prototype was evaluated by in-depth,...
The proliferation of news sources on the web amplifies the problem of disinformation and misinformation, impacting public perception and societal stability. These issues necessitate the identification of bias in news broadcasts, whereby the analysis and understanding of speaker roles and news contexts are essential prerequisites. Although there is prior research on multimodal speaker role recognition (mostly) in the news domain, modern feature representations have not been explored yet, and no comprehensive dataset is available. In this paper, we propose a novel...
Several algorithms have been proposed to solve the problem of camera motion estimation in digital videos. However, the distinction between translation along the x-axis (y-axis) and rotation around the y-axis (x-axis) has only rarely been considered, and no approach of this kind is known to us for the MPEG domain. In this paper, we present such an algorithm. For performance reasons it is reasonable to extract the motion vectors directly from the compressed stream. However, since the motion vectors are optimal with respect to compression, they often do not model the real motion adequately...