NFDI4DS | UHH-SEMS - Publication Details

Christian Wartena

ORCID: 0000-0001-5483-1529

About

OpenAlex author ID: A5086087681

Research Areas

Natural Language Processing Techniques
Topic Modeling
Advanced Text Analysis Techniques
Semantic Web and Ontologies
Text and Document Classification Technologies
Video Analysis and Summarization
Biomedical Text Mining and Ontologies
Recommender Systems and Techniques
Image Retrieval and Classification Techniques
Information Retrieval and Search Behavior
Linguistics and terminology studies
Speech and dialogue systems
Web Data Mining and Analysis
Multimodal Machine Learning Applications
Authorship Attribution and Profiling
Advanced Database Systems and Queries
Service-Oriented Architecture and Web Services
Data Mining Algorithms and Applications
Advanced Image and Video Retrieval Techniques
Law, logistics, and international trade
Information Architecture and Usability
Complex Network Analysis Techniques
Translation Studies and Practices
Web visibility and informetrics
Library Science and Information Systems

Affiliations

Hochschule Hannover
2012-2023

University of Hildesheim
2019-2022

Inform (Germany)
2021

Deutsche Nationalbibliothek
2021

Allgemeine Unfallversicherungsanstalt
2021

Institut für Automatisierung und Informatik
2021

East Stroudsburg University
2019

Heidelberg University
2019

National Technical University "Kharkiv Polytechnic Institute"
2019

Publications

Topic Detection by Clustering Keywords

OPENALEX - Publications

Christian Wartena, Rogier Brussee

We consider topic detection without any prior knowledge of category structure or possible categories. Keywords are extracted and clustered based on different similarity measures using the induced k-bisecting clustering algorithm. Evaluation on Wikipedia articles shows that clusters of keywords correlate strongly with the categories of the articles. In addition, we find that a distance measure based on the Jensen-Shannon divergence of probability distributions outperforms the cosine similarity. In particular, a newly proposed term...

DOI: 10.1109/dexa.2008.120 | article | EN | 2008-09-01
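The abstract reports that a distance based on the Jensen-Shannon divergence of probability distributions outperforms cosine similarity when clustering keywords. A minimal sketch of both measures, assuming each keyword is represented by a normalized distribution of co-occurring terms; the function names and toy counts are hypothetical, and the induced k-bisecting clustering step itself is not shown.

```python
import math

def normalize(counts):
    """Turn raw co-occurrence counts into a probability distribution."""
    total = sum(counts.values())
    return {term: c / total for term, c in counts.items()}

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; q must be > 0 wherever p > 0."""
    return sum(pv * math.log2(pv / q[t]) for t, pv in p.items() if pv > 0)

def jensen_shannon(p, q):
    """Jensen-Shannon divergence: symmetric, 0 for identical and 1 for disjoint distributions."""
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in set(p) | set(q)}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def cosine(p, q):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(v * q.get(t, 0.0) for t, v in p.items())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm

# Hypothetical co-occurrence counts for three keywords (toy data, not from the paper).
bank = normalize({"money": 8, "loan": 6, "river": 1})
credit = normalize({"money": 7, "loan": 5, "card": 3})
shore = normalize({"river": 9, "water": 6})

print(round(jensen_shannon(bank, credit), 3), round(jensen_shannon(bank, shore), 3))
print(round(cosine(bank, credit), 3), round(cosine(bank, shore), 3))
```

Note that the Jensen-Shannon divergence is a distance (lower means more similar), whereas cosine is a similarity, so a clustering algorithm has to be told which direction to optimize.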

Automatic tagging and geotagging in video collections and communities

Martha Larson, Mohammad Soleymani, Pavel Serdyukov, Stevan Rudinac, Christian Wartena and 4 more

Automatically generated tags and geotags hold great promise to improve access to video collections and online communities. We overview three tasks offered in the MediaEval 2010 benchmarking initiative, for each describing its use scenario, definition and the data set released. For each task, a reference algorithm is presented that was used within MediaEval 2010, and comments are included on lessons learned. The Tagging Task, Professional involves automatically matching episodes in a collection of Dutch television with subject...

DOI: 10.1145/1991996.1992047 | article | EN | 2011-04-18

Keyword Extraction Using Word Co-occurrence

Christian Wartena, Rogier Brussee, Wout Slakhorst

A common strategy to assign keywords to documents is to select the most appropriate words from the document text. One of the most important criteria for a word to be selected as a keyword is its relevance for the text. The tf.idf score of a term is a widely used relevance measure. While easy to compute and giving quite satisfactory results, this measure does not take (semantic) relations between words into account. In this paper we study some alternative relevance measures that do use relations between words. They are computed by defining co-occurrence distributions for words and comparing these with...

DOI: 10.1109/dexa.2010.32 | article | EN | 2010-08-01

Using Tag Co-occurrence for Recommendation

Christian Wartena, Rogier Brussee, Martin Wibbels

Tagging with free form tags is becoming an increasingly important indexing mechanism. However, free form tags have characteristics that require special treatment when used for searching or recommendation, because they show much more variation than controlled keywords. In this paper we present a method that puts this large variation to good use. We introduce second order co-occurrence of tags and a related distance measure for tag similarities that is robust against the variation in tags. From it, it is straightforward to derive methods to analyze user interest and to compute...

DOI: 10.1109/isda.2009.130 | article | EN | 2009-01-01
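The abstract introduces second order co-occurrence of tags but the excerpt does not define it. A minimal sketch under one assumed definition: the second order distribution of a tag t is the first order distributions of its co-occurring tags u, averaged with weights P1(u | t). The function names and the toy taggings are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations

def first_order(taggings):
    """P1(u | t): distribution of tags u that occur on the same item as tag t."""
    counts = defaultdict(Counter)
    for tags in taggings:
        for t, u in permutations(set(tags), 2):
            counts[t][u] += 1
    dists = {}
    for t, cnt in counts.items():
        total = sum(cnt.values())
        dists[t] = {u: c / total for u, c in cnt.items()}
    return dists

def second_order(p1):
    """Assumed definition: P2(v | t) = sum over u of P1(u | t) * P1(v | u)."""
    p2 = {}
    for t, dist in p1.items():
        acc = Counter()
        for u, p_u in dist.items():
            # Co-occurrence is symmetric, so every neighbour u also has a P1 entry.
            for v, p_v in p1[u].items():
                acc[v] += p_u * p_v
        p2[t] = dict(acc)
    return p2

# Toy taggings (hypothetical): "nyc" and "newyork" never appear on the same item.
taggings = [["nyc", "manhattan", "travel"], ["newyork", "manhattan"], ["newyork", "travel", "food"]]
p1 = first_order(taggings)
p2 = second_order(p1)
print("newyork" in p1["nyc"], round(p2["nyc"]["newyork"], 3))
```

Because the two spelling variants share neighbours, the second order distribution links them even though they never co-occur directly, which is one way such a representation can absorb tag variation.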

Thesaurus Based Term Ranking for Keyword Extraction

Luit Gazendam, Christian Wartena, Rogier Brussee

In many cases keywords from a restricted set of possible keywords have to be assigned to texts. A common way to find the best keywords is to rank the terms occurring in the text according to their tf.idf value. This requires a corpus of texts from which document frequencies can be derived. In this paper we show that we can obtain results of the same quality without the usage of a background corpus, by using the relations between terms provided by a thesaurus.

DOI: 10.1109/dexa.2010.31 | article | EN | 2010-08-01
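The abstract describes the baseline of ranking the terms of a text by tf.idf, which needs document frequencies from a background corpus. A minimal sketch of that baseline, assuming raw term counts for tf and idf = log(N / df), one common variant among several; the function name and toy corpus are hypothetical, and the thesaurus-based alternative proposed in the paper is not shown.

```python
import math
from collections import Counter

def rank_by_tf_idf(doc_tokens, corpus):
    """Rank the distinct terms of one document by tf.idf.

    Assumed variant: tf is the raw count in the document and
    idf = log(N / df), with df the number of corpus documents containing the term.
    """
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(doc))  # count each term at most once per document
    tf = Counter(doc_tokens)
    # Terms missing from the corpus are treated as if df = 1 (an assumption).
    scores = {term: count * math.log(n_docs / max(df[term], 1)) for term, count in tf.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy background corpus and document (hypothetical).
corpus = [
    ["the", "keyword", "extraction", "method"],
    ["the", "video", "retrieval", "task"],
    ["the", "tag", "recommendation", "method"],
]
doc = ["the", "keyword", "keyword", "extraction", "the"]
for term, score in rank_by_tf_idf(doc, corpus):
    print(term, round(score, 3))
```

The `corpus` argument is exactly the background corpus that, according to the abstract, the thesaurus-based ranking makes unnecessary.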

Using Word Embeddings for Unsupervised Acronym Disambiguation

Jean Charbonnier, Christian Wartena

Scientific papers from all disciplines contain many abbreviations and acronyms. In many cases these acronyms are ambiguous. We present a method to choose the contextually correct definition of an acronym that does not require training for each acronym and thus can be applied to a large number of different acronyms with only few instances. We constructed a set of 19,954 examples of 4,365 ambiguous acronyms from image captions in scientific papers, along with their contextually correct definitions from different domains. We learn word embeddings for all words in the corpus and compare the averaged context vector of the words in the expansion...

DOI: 10.25968/opus-1265 | article | EN | International Conference on Computational Linguistics | 2018-08-20
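The abstract compares an averaged context vector built from word embeddings with the candidate expansions of an acronym. A minimal sketch, assuming each expansion is represented by the average of the embeddings of its words and that the closest expansion by cosine similarity wins; the toy 2-dimensional vectors, the example acronym and the function names are invented for illustration, and at least one context word is assumed to have an embedding.

```python
import math

def average(vectors):
    """Component-wise mean of equally sized vectors."""
    return [sum(components) / len(vectors) for components in zip(*vectors)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def disambiguate(context_words, expansions, embeddings):
    """Return the expansion whose averaged word vector is most similar to the averaged context vector."""
    context_vec = average([embeddings[w] for w in context_words if w in embeddings])

    def score(expansion):
        vectors = [embeddings[w] for w in expansion.lower().split() if w in embeddings]
        # Expansions without any known word are ranked last.
        return cosine(context_vec, average(vectors)) if vectors else -1.0

    return max(expansions, key=score)

# Toy 2-dimensional embeddings (invented for illustration).
embeddings = {
    "convolutional": [0.9, 0.1], "neural": [0.8, 0.2], "network": [0.5, 0.5],
    "cable": [0.1, 0.9], "news": [0.05, 0.95],
    "layer": [0.85, 0.1], "training": [0.9, 0.2], "broadcast": [0.1, 0.8],
}
expansions = ["Convolutional Neural Network", "Cable News Network"]
print(disambiguate(["layer", "training"], expansions, embeddings))
```

Since no per-acronym classifier is trained, the same function applies to any acronym for which candidate expansions are known, which matches the motivation stated in the abstract.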

Automatic Annotation Suggestions for Audiovisual Archives: Evaluation Aspects

Luit Gazendam, Christian Wartena, Véronique Malaisé, Guus Schreiber, Annemieke de Jong and 1 more

In the context of large and ever growing archives, generating annotation suggestions automatically from textual resources related to the documents to be archived is an interesting option in theory. It could save a lot of work in the time consuming and expensive task of manual annotation, and it could help cataloguers attain a higher inter-annotator agreement. However, some questions arise in practice: what is the quality of the automatically produced annotations? How do they compare with manual annotations and with the requirements for annotation that were defined in the archive? If different from the manual annotations,...

DOI: 10.1179/174327909x441090 | article | EN | Interdisciplinary Science Reviews | 2009-07-10

A Probabilistic Morphology Model for German Lemmatization

Christian Wartena

Lemmatization is a central task in many NLP applications. Despite this importance, the number of (freely) available and easy to use tools for German is very limited. To fill this gap, we developed a simple lemmatizer that can be trained on any lemmatized corpus. For a full form word, the tagger tries to find the sequence of morphemes that is most likely to generate that word. From the tags, the stem, lemma and part of speech (PoS) can easily be derived. We show (i) that the quality of this approach is comparable to state of the art methods and (ii) that we can improve the results of Part-of-Speech tagging when...

DOI: 10.25968/opus-1527 | article | EN | 2019-01-01

Predicting Word Concreteness and Imagery

Jean Charbonnier, Christian Wartena

Concreteness of words has been studied extensively in the psycholinguistic literature. A number of datasets have been created with average values for the perceived concreteness of words. We show that we can train a regression model on these data, using word embeddings and morphological features, that can predict these concreteness values with high accuracy. We evaluate the model on 7 publicly available datasets. Only for a few small subsets of these datasets are prediction results found in the literature. Our results clearly outperform the reported results.

DOI: 10.18653/v1/w19-0415 | article | EN | CC BY | 2019-01-01

Comparing retrieval effectiveness of alternative content segmentation methods for Internet video search

Maria Eskevich, Gareth J. F. Jones, Christian Wartena, Martha Larson, Robin Aly and 2 more

We present an exploratory study of the retrieval of semiprofessional user-generated Internet video. The study is based on the MediaEval 2011 Rich Speech Retrieval (RSR) task, for which the dataset was taken from the video sharing platform blip.tv, and the search queries are associated with specific speech acts occurring in the video. We compare the results of three participant groups using: an automatic speech recognition system transcript (ASR), metadata manually assigned to each video by the user who uploaded it, and their combination. RSR is a known-item search task in which a single...

DOI: 10.1109/cbmi.2012.6269810 | article | EN | 2012-06-01

Selecting keywords for content based recommendation

Christian Wartena, Wout Slakhorst, Martin Wibbels

The continued growth of online content makes personalized recommendation an increasingly important tool for media consumption. While collaborative filtering techniques have been shown to be very successful in stable collections, content based approaches are necessary for recommending new items. Content based recommendation uses the similarity between new items and items consumed by the user to predict whether a new item is interesting for the user. Similarity is computed by comparing the content or meta-data of the items. In this paper we consider TV-broadcasts for which synopses are available. We thereby...

DOI: 10.1145/1871437.1871665 | article | EN | 2010-10-26

Comparing segmentation strategies for efficient video passage retrieval

Christian Wartena

We compare the effect of different text segmentation strategies on speech based passage retrieval of video. Passage retrieval has mainly been studied to improve document retrieval and to enable question answering. In these domains the best results were obtained using passages defined by the paragraph structure of the source documents or by using arbitrary overlapping passages. For the retrieval of relevant passages in a video, using speech transcripts, no author defined segmentation is available. We compare retrieval results from 4 types of segments based on the speech channel of the video: fixed length segments, a sliding window, semantically coherent...

DOI: 10.1109/cbmi.2012.6269850 | article | EN | 2012-06-01
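Two of the segment types named in the abstract, fixed length segments and a sliding window, can be sketched over a speech transcript given as a list of words. This is a minimal sketch: the segment length and step are hypothetical parameters (the values used in the paper are not given here), the transcript is invented, and semantically coherent segmentation is not shown.

```python
def fixed_length_segments(words, length):
    """Non-overlapping segments of `length` words; the last one may be shorter."""
    return [words[i:i + length] for i in range(0, len(words), length)]

def sliding_window_segments(words, length, step):
    """Overlapping windows of `length` words, shifted by `step` words each time."""
    if len(words) <= length:
        return [words]
    starts = range(0, len(words) - length + 1, step)
    segments = [words[s:s + length] for s in starts]
    if starts[-1] + length < len(words):
        # Add one last window so that the end of the transcript is always covered.
        segments.append(words[-length:])
    return segments

transcript = "so today we will talk about passage retrieval in video search".split()
print(fixed_length_segments(transcript, 4))
print(sliding_window_segments(transcript, 4, 2))
```

Overlapping windows make it less likely that a relevant stretch of speech is cut in half at a segment boundary, at the cost of indexing more (partly redundant) passages.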

HsH: Estimating Semantic Similarity of Words and Short Phrases with Frequency Normalized Distance Measures

Christian Wartena

This paper describes the approach of the Hochschule Hannover to the SemEval 2013 task Evaluating Phrasal Semantics. In order to compare a single word with a two word phrase, we compute various distributional similarities, including a new similarity measure based on the Jensen-Shannon divergence with a correction for frequency effects. The classification is done by a support vector machine that uses all similarities as features. The approach turned out to be the most successful one in the task.

DOI: 10.25968/opus-2077 | article | EN | Joint Conference on Lexical and Computational Semantics | 2013-06-01

Distributional Similarity of Words with Different Frequencies

Christian Wartena

Distributional semantics tries to characterize the meaning of words by the contexts in which they occur. Similarity of words hence can be derived from the similarity of contexts. Contexts of a word are usually vectors of words appearing near that word in a corpus. It was observed in previous research that similarity measures for the context vectors of two words depend on the frequency of these words. In the present paper we investigate this dependency in more detail for one similarity measure, the Jensen-Shannon divergence. We give an empirical model of this dependency and propose the deviation of the divergence from the divergence expected on the basis...

DOI: 10.25968/opus-335 | article | EN | 2013-04-29

On the effect of word frequency on distributional similarity

Christian Wartena

The dependency of word similarity in vector space models on the frequency of words has been noted in a few studies, but has received very little attention. We study the influence of word frequency in a set of 10 000 randomly selected word pairs for a number of different combinations of feature weighting schemes and similarity measures. We find that for all methods, except the one using singular value decomposition to reduce the dimensionality of the vector space, similarity is determined to a large extent by the frequency of the words. In a binary classification task on pairs of synonyms and unrelated words, we find that for all similarity measures the results can be improved...

DOI: 10.25968/opus-870 | article | EN | 2014-01-01

Classifying Medical Literature Using k-Nearest-Neighbours Algorithm

Andreas Lüschow, Christian Wartena

The number of papers published yearly has been increasing for decades. Libraries need to make these resources accessible and available, with classification being an important part of this process. This paper analyzes the prerequisites and possibilities of the automatic classification of medical literature. We explain the selection, preprocessing and analysis of data consisting of catalogue datasets from the library of the Hanover Medical School, Lower Saxony, Germany. In the present study, 19,348 documents, represented by notations of classification systems such as...

DOI: 10.25968/opus-1146 | article | EN | International Conference on Theory and Practice of Digital Libraries | 2017-01-01
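The title names the k-nearest-neighbours algorithm for classifying medical literature. A minimal sketch of a generic kNN classifier over bag-of-words vectors with cosine similarity and majority voting; the document representation, similarity measure, value of k, the function names and the class labels are all assumptions (the labels are invented placeholders, not notations from the Hanover Medical School catalogue).

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity of two sparse term-count vectors (dicts or Counters)."""
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def knn_classify(doc, training, k=3):
    """Majority class among the k training documents most similar to `doc`."""
    neighbours = sorted(training, key=lambda item: cosine(doc, item[0]), reverse=True)[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

def bag(text):
    """Hypothetical helper: lowercase whitespace tokenization into term counts."""
    return Counter(text.lower().split())

# Hypothetical training data: titles with invented class labels.
training = [
    (bag("heart failure therapy"), "cardiology"),
    (bag("cardiac arrhythmia heart"), "cardiology"),
    (bag("tumour chemotherapy therapy"), "oncology"),
    (bag("breast tumour screening"), "oncology"),
]
print(knn_classify(bag("heart arrhythmia treatment"), training, k=3))
```

kNN needs no training phase beyond storing the already classified catalogue records, which makes it a natural fit when a library holds many documents that already carry class notations.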

Regional Knowledge Maps - Potential and Challenges

Montserrat García Alsina, Christian Wartena, Sönke Lieberam-Schmidt

A regional knowledge map is a tool recently demanded by some actors at an institutional level to help regional policy and innovation in a territory. Moreover, knowledge maps facilitate the interaction between the actors of a territory and collective learning. This paper reports work in progress on a research project whose objective is to define a methodology to efficiently design territorial knowledge maps, by extracting information from big volumes of data contained in diverse sources related to the region. Knowledge management ... intellectual capital ... organisations....

DOI: 10.25968/opus-390 | article | EN | International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management | 2013-01-01

Challenges and Potentials for Keyword Extraction from Company Websites for the Development of Regional Knowledge Maps

Christian Wartena, Montserrat García Alsina

Regional Innovation Systems describe the relations between actors, structures and infrastructures in a region in order to stimulate innovation and regional development. For these systems the collection and organization of information is crucial. In the present paper we investigate the possibilities to extract information from the websites of companies. First, we describe the types of information that are necessary to create them. Then we discuss text mining and keyword extraction techniques for extracting this information from company websites. Finally, we present a small scale experiment in which keywords related to economic...

DOI: 10.25968/opus-391 | article | EN | International Conference on Knowledge Discovery and Information Retrieval | 2013-01-01