Christian Wartena

ORCID: 0000-0001-5483-1529
Research Areas
  • Natural Language Processing Techniques
  • Topic Modeling
  • Advanced Text Analysis Techniques
  • Semantic Web and Ontologies
  • Text and Document Classification Technologies
  • Video Analysis and Summarization
  • Biomedical Text Mining and Ontologies
  • Recommender Systems and Techniques
  • Image Retrieval and Classification Techniques
  • Information Retrieval and Search Behavior
  • Linguistics and terminology studies
  • Speech and dialogue systems
  • Web Data Mining and Analysis
  • Multimodal Machine Learning Applications
  • Authorship Attribution and Profiling
  • Advanced Database Systems and Queries
  • Service-Oriented Architecture and Web Services
  • Data Mining Algorithms and Applications
  • Advanced Image and Video Retrieval Techniques
  • Law, logistics, and international trade
  • Information Architecture and Usability
  • Complex Network Analysis Techniques
  • Translation Studies and Practices
  • Web visibility and informetrics
  • Library Science and Information Systems

Hochschule Hannover
2012-2023

University of Hildesheim
2019-2022

Inform (Germany)
2021

Deutsche Nationalbibliothek
2021

Allgemeine Unfallversicherungsanstalt
2021

Institut für Automatisierung und Informatik
2021

East Stroudsburg University
2019

Heidelberg University
2019

National Technical University "Kharkiv Polytechnic Institute"
2019

We consider topic detection without any prior knowledge of category structure or possible categories. Keywords are extracted and clustered based on different similarity measures using the induced k-bisecting clustering algorithm. Evaluation on Wikipedia articles shows that clusters of keywords correlate strongly with the categories of the articles. In addition, we find that a distance measure based on the Jensen-Shannon divergence of probability distributions outperforms the cosine similarity. In particular, a newly proposed term...

10.1109/dexa.2008.120 article EN 2008-09-01
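
The comparison between Jensen-Shannon divergence and cosine similarity mentioned in the abstract above can be illustrated with a minimal sketch. It assumes that each keyword is represented by a discrete probability distribution stored as a dictionary from terms to probabilities; the function names and the toy distributions are hypothetical and are not taken from the paper.

```python
import math


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (base 2) of two distributions given as dicts."""
    terms = set(p) | set(q)
    # Mixture distribution m = (p + q) / 2 over the union of both supports.
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in terms}

    def kl(a):
        # Kullback-Leibler divergence KL(a || m); terms with a[t] == 0 add nothing.
        return sum(a[t] * math.log2(a[t] / m[t]) for t in a if a[t] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def cosine_similarity(p, q):
    """Cosine similarity of two sparse non-zero vectors given as dicts."""
    dot = sum(v * q.get(t, 0.0) for t, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q)


# Hypothetical toy distributions for two keywords.
p = {"model": 0.5, "topic": 0.5}
q = {"topic": 0.5, "cluster": 0.5}
print(jensen_shannon_divergence(p, q), cosine_similarity(p, q))
```

Note that the divergence is a distance (0 for identical distributions, 1 for distributions with disjoint support when using base 2), whereas cosine is a similarity, so the two have to be compared with opposite orientation.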

Automatically generated tags and geotags hold great promise to improve access to video collections and online communities. We overview three tasks offered in the MediaEval 2010 benchmarking initiative, for each, describing its use scenario, definition and the data set released. For each task, a reference algorithm is presented that was used within MediaEval 2010 and comments are included on lessons learned. The Tagging Task, Professional involves automatically matching episodes in a collection of Dutch television with subject...

10.1145/1991996.1992047 article EN 2011-04-18

A common strategy to assign keywords to documents is to select the most appropriate words from the document text. One of the most important criteria for a word to be selected as a keyword is its relevance for the text. The tf.idf score of a term is a widely used relevance measure. While easy to compute and giving quite satisfactory results, this measure does not take (semantic) relations between words into account. In this paper we study some alternative relevance measures that do use relations between words. They are computed by defining co-occurrence distributions for words and comparing these with...

10.1109/dexa.2010.32 article EN 2010-08-01

Tagging with free form tags is becoming an increasingly important indexing mechanism. However, free form tags have characteristics that require special treatment when used for searching or recommendation because they show much more variation than controlled keywords. In this paper we present a method that puts this large variation to good use. We introduce second order co-occurrence and a related distance measure for tag similarities that is robust against the variation in tags. From this it is straightforward to derive methods to analyze user interest and compute...

10.1109/isda.2009.130 article EN 2009-01-01

In many cases keywords from a restricted set of possible keywords have to be assigned to texts. A common way to find the best keywords is to rank terms occurring in the text according to their tf.idf value. This requires a corpus of texts from which document frequencies can be derived. In this paper we show that we can obtain results of the same quality without the usage of a background corpus, using relations between keywords provided by a thesaurus.

10.1109/dexa.2010.31 article EN 2010-08-01
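
The tf.idf baseline described in the abstract above, which ranks the terms of a text using document frequencies from a background corpus, can be sketched as follows. This is only one common tf.idf variant: the smoothed idf formula, the function name `tfidf_ranking` and the toy corpus are assumptions made for illustration, not the setup used in the paper.

```python
import math
from collections import Counter


def tfidf_ranking(doc_tokens, corpus):
    """Rank the terms of one tokenized document by tf.idf, best first.

    `corpus` is a list of tokenized background documents from which the
    document frequencies are derived.
    """
    tf = Counter(doc_tokens)
    df = Counter()
    for doc in corpus:
        df.update(set(doc))  # count each term at most once per document
    n_docs = len(corpus)
    # Smoothed idf (an assumption): avoids division by zero for unseen terms.
    scores = {t: tf[t] * math.log((1 + n_docs) / (1 + df[t])) for t in tf}
    # Sort by descending score; ties are broken alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))


# Hypothetical background corpus and document.
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "thesaurus", "term"],
]
doc = ["the", "thesaurus", "the", "thesaurus", "term"]
print(tfidf_ranking(doc, corpus))
```

The sketch makes the dependency on the background corpus explicit: without `corpus` no document frequencies are available, which is the requirement the thesaurus-based approach in the abstract aims to remove.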

Scientific papers from all disciplines contain many abbreviations and acronyms. In many cases these acronyms are ambiguous. We present a method to choose the contextually correct definition of an acronym that does not require training for each acronym and thus can be applied to a large number of different acronyms with only few instances. We constructed a set of 19,954 examples of 4,365 ambiguous acronyms from image captions in scientific papers along with their contextually correct definition from different domains. We learn word embeddings for all words in the corpus and compare the averaged context vector of the words in the expansion...

10.25968/opus-1265 article EN International Conference on Computational Linguistics 2018-08-20

In the context of large and ever growing archives, generating annotation suggestions automatically from textual resources related to the documents to be archived is an interesting option in theory. It could save a lot of work in the time consuming and expensive task of manual annotation and it could help cataloguers attain a higher inter-annotator agreement. However, some questions arise in practice: what is the quality of the automatically produced annotations? How do they compare with manual annotations and with the requirements for annotation that were defined in the archive? If different from the manual annotations,...

10.1179/174327909x441090 article EN Interdisciplinary Science Reviews 2009-07-10

Lemmatization is a central task in many NLP applications. Despite this importance, the number of (freely) available and easy to use tools for German is very limited. To fill this gap, we developed a simple lemmatizer that can be trained on any lemmatized corpus. For a full form word the tagger tries to find the sequence of morphemes that is most likely to generate that word. From the tags we can easily derive the stem, the lemma and the part of speech (PoS) of the word. We show (i) that the quality of this approach is comparable to state of the art methods and (ii) that we can improve the results of Part-of-Speech tagging when...

10.25968/opus-1527 article EN 2019-01-01

Concreteness of words has been studied extensively in psycholinguistic literature. A number of datasets have been created with average values for perceived concreteness of words. We show that we can train a regression model on these data, using word embeddings and morphological features, that can predict these concreteness values with high accuracy. We evaluate the model on 7 publicly available datasets. Only for a few small subsets of these datasets are prediction results found in the literature. Our results clearly outperform the reported results.

10.18653/v1/w19-0415 article EN cc-by 2019-01-01

We present an exploratory study of the retrieval of semiprofessional user-generated Internet video. The study is based on the MediaEval 2011 Rich Speech Retrieval (RSR) task for which the dataset was taken from the Internet sharing platform blip.tv, and search queries associated with specific speech acts occurring in the video. We compare results from three participant groups using: automatic speech recognition system transcript (ASR), metadata manually assigned to each video by the user who uploaded it, and their combination. RSR is a known-item search for a single...

10.1109/cbmi.2012.6269810 article EN 2012-06-01

The continued growth of online content makes personalized recommendation an increasingly important tool for media consumption. While collaborative filtering techniques have been shown to be very successful in stable collections, content based approaches are necessary for recommending new items. Content based recommendation uses the similarity between new items and consumed items to predict whether a new item is interesting for the user. The similarity is computed by comparing the content or meta-data of the items. In this paper we consider TV-broadcasts for which synopses are available. We thereby...

10.1145/1871437.1871665 article EN 2010-10-26

We compare the effect of different text segmentation strategies on speech based passage retrieval of video. Passage retrieval has mainly been studied to improve document retrieval and to enable question answering. In these domains best results were obtained using passages defined by the paragraph structure of the source documents or by using arbitrary overlapping passages. For the retrieval of relevant passages in a video, using speech transcripts, no author-defined paragraph structure is available. We compare retrieval results from 4 different types of segments based on the speech channel of the video: fixed length segments, a sliding window, semantically coherent...

10.1109/cbmi.2012.6269850 article EN 2012-06-01

This paper describes the approach of the Hochschule Hannover to the SemEval 2013 Task Evaluating Phrasal Semantics. In order to compare a single word with a two word phrase we compute various distributional similarities, among which is a new similarity measure, based on Jensen-Shannon Divergence with a correction for frequency effects. The classification is done by a support vector machine that uses all similarities as features. The approach turned out to be the most successful one in the task.

10.25968/opus-2077 article EN Joint Conference on Lexical and Computational Semantics 2013-06-01

Distributional semantics tries to characterize the meaning of words by the contexts in which they occur. Similarity of words hence can be derived from the similarity of contexts. Contexts of a word are usually vectors of words appearing near that word in a corpus. It was observed in previous research that similarity measures for the context vectors of two words depend on the frequency of these words. In the present paper we investigate this dependency in more detail for one similarity measure, the Jensen-Shannon divergence. We give an empirical model of this dependency and propose the deviation of the observed Jensen-Shannon divergence from the divergence expected on the basis...

10.25968/opus-335 article EN 2013-04-29

The dependency of word similarity in vector space models on the frequency of words has been noted in a few studies, but has received very little attention. We study the influence of word frequency in a set of 10 000 randomly selected word pairs for a number of different combinations of feature weighting schemes and similarity measures. We find that the similarity for all methods, except for the one using singular value decomposition to reduce the dimensionality of the feature space, is determined to a large extent by the frequency of the words. In a binary classification task of pairs of synonyms and unrelated words we find that for all similarity measures the results can be improved...

10.25968/opus-870 article EN 2014-01-01

The number of papers published yearly has been increasing for decades. Libraries need to make these resources accessible and available, with classification being an important aspect and part of this process. This paper analyzes prerequisites and possibilities of automatic classification of medical literature. We explain the selection, preprocessing and analysis of data consisting of catalogue datasets from the library of the Hanover Medical School, Lower Saxony, Germany. In the present study, 19,348 documents, represented by notations of classification systems such as e.g....

10.25968/opus-1146 article EN International Conference Theory and Practice Digital Libraries 2017-01-01

A regional knowledge map is a tool recently demanded by some actors at an institutional level to help regional policy and innovation in a territory. Besides, knowledge maps facilitate the interaction between the actors of a territory and collective learning. This paper reports the work in progress of a research project whose objective is to define a methodology to efficiently design territorial knowledge maps, by extracting information from big volumes of data contained in diverse sources related to a region. Knowledge management intellectual capital organisations....

10.25968/opus-390 article EN International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management 2013-01-01

Regional Innovation Systems describe the relations between actors, structures and infrastructures in a region in order to stimulate innovation and regional development. For these systems the collection and organization of information is crucial. In the present paper we investigate the possibilities to extract information from websites of companies. First we describe the types of information that are necessary to create them. Then we discuss text mining and keyword extraction techniques to extract this information from company websites. Finally, we describe a small scale experiment in which keywords related to economic...

10.25968/opus-391 article EN International Conference on Knowledge Discovery and Information Retrieval 2013-01-01