- Topic Modeling
- Natural Language Processing Techniques
- Semantic Web and Ontologies
- Advanced Database Systems and Queries
- Web Data Mining and Analysis
- Data Management and Algorithms
- Peer-to-Peer Network Technologies
- Distributed systems and fault tolerance
- Advanced Data Storage Technologies
- Data Quality and Management
- Caching and Content Delivery
- Advanced Graph Neural Networks
- Service-Oriented Architecture and Web Services
- Distributed and Parallel Computing Systems
- Advanced Text Analysis Techniques
- Business Process Modeling and Analysis
- Algorithms and Data Compression
- Recommender Systems and Techniques
- Data Mining Algorithms and Applications
- Text and Document Classification Technologies
- Complex Network Analysis Techniques
- Biomedical Text Mining and Ontologies
- Spam and Phishing Detection
- Sentiment Analysis and Opinion Mining
- Parallel Computing and Optimization Techniques
Max Planck Institute for Informatics
2015-2024
Max Planck Society
2013-2024
Robert Bosch (India)
2023
University of Amsterdam
2023
Max Planck Institute for the History of Science
2008-2021
Microsoft Research (United Kingdom)
1998-2019
Institute of Informatics of the Slovak Academy of Sciences
2018
Cornell University
1995-2017
Klinikum Saarbrücken
2017
Hewlett-Packard (United States)
2011
This paper introduces a new approach to database disk buffering, called the LRU-K method. The basic idea of LRU-K is to keep track of the times of the last K references to popular database pages, using this information to statistically estimate the interarrival times of references on a page-by-page basis. Although the LRU-K approach performs optimal statistical inference under relatively standard assumptions, it is fairly simple and incurs little bookkeeping overhead. As we demonstrate with simulation experiments, the LRU-K algorithm surpasses conventional buffering algorithms in...
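The idea in the abstract above can be illustrated with a minimal sketch. This is not the paper's implementation: the class name `LRUKCache`, the logical clock, and the tie-breaking rule are assumptions made for illustration, and the sketch simply forgets the history of evicted pages and ignores correlated references. It evicts the resident page whose K-th most recent reference lies furthest in the past, treating pages with fewer than K recorded references as infinitely old.

```python
import math
from collections import deque


class LRUKCache:
    """Illustrative LRU-K buffer (hypothetical names, simplified bookkeeping)."""

    def __init__(self, capacity, k=2):
        self.capacity = capacity
        self.k = k
        self.clock = 0        # logical time, incremented on every reference
        self.history = {}     # page -> deque of its last k reference times
        self.resident = set() # pages currently held in the buffer

    def _backward_k_distance(self, page):
        # Time elapsed since the K-th most recent reference to the page;
        # infinite if the page has been referenced fewer than K times.
        times = self.history[page]
        if len(times) < self.k:
            return math.inf
        return self.clock - times[0]

    def access(self, page):
        """Reference a page; return True on a buffer hit, False on a miss."""
        self.clock += 1
        hit = page in self.resident
        if not hit and len(self.resident) >= self.capacity:
            # Victim: largest backward K-distance; ties are broken by the
            # oldest most-recent reference (plain LRU order).
            victim = max(
                self.resident,
                key=lambda p: (self._backward_k_distance(p), -self.history[p][-1]),
            )
            self.resident.remove(victim)
            del self.history[victim]  # simplification: history is not retained
        self.history.setdefault(page, deque(maxlen=self.k)).append(self.clock)
        self.resident.add(page)
        return hit
```

With `capacity=2` and `k=2`, the reference string a, a, b, c evicts b rather than a: b has only one recorded reference, so its backward K-distance is infinite, whereas plain LRU would have evicted a.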
No abstract available.
RDF is a data representation format for schema-free structured information that is gaining momentum in the context of Semantic-Web corpora, life sciences, and also Web 2.0 platforms. The "pay-as-you-go" nature of RDF and the flexible pattern-matching capabilities of its query language SPARQL entail efficiency and scalability challenges for complex queries including long join paths. This paper presents the RDF-3X engine, an implementation of SPARQL that achieves excellent performance by pursuing a RISC-style architecture with streamlined...
We present YAGO2, an extension of the YAGO knowledge base with focus on temporal and spatial knowledge. It is automatically built from Wikipedia, GeoNames, and WordNet, and contains nearly 10 million entities and events, as well as 80 million facts representing general world knowledge. An enhanced data representation introduces time and location as first-class citizens. The wealth of spatio-temporal information in YAGO can be explored either graphically or through a special time- and space-aware query language.
Misinformation such as fake news is one of the big challenges of our society. Research on automated fact-checking has proposed methods based on supervised learning, but these approaches do not consider external evidence apart from labeled training instances. Recent approaches counter this deficit by considering external sources related to a claim. However, these methods require substantial feature modeling and rich lexicons. This paper overcomes these limitations of prior work with an end-to-end model for evidence-aware credibility...
Rankings of people and items are at the heart of selection-making, match-making, and recommender systems, ranging from employment sites to sharing economy platforms. As ranking positions influence the amount of attention the ranked subjects receive, biases in rankings can lead to unfair distribution of opportunities and resources, such as jobs or income. This paper proposes new measures and mechanisms to quantify and mitigate unfairness from a bias inherent to all rankings, namely, the position bias, which leads to disproportionately less attention being...
Templates are an important asset for question answering over knowledge graphs, simplifying the semantic parsing of input utterances and generating structured queries for interpretable answers. State-of-the-art methods rely on hand-crafted templates with limited coverage. This paper presents QUINT, a system that automatically learns utterance-query templates solely from user questions paired with their answers. Additionally, QUINT is able to harness language compositionality for answering complex questions without having any templates for the entire question....
Measuring the semantic relatedness between two entities is the basis for numerous tasks in IR, NLP, and Web-based knowledge extraction. This paper focuses on disambiguating names in a Web or text document by jointly mapping all names onto semantically related entities registered in a knowledge base. To this end, we have developed a novel notion of semantic relatedness between two entities represented as sets of weighted (multi-word) keyphrases, with consideration of partially overlapping phrases. This measure improves the quality of prior link-based models, and also eliminates the need for (usually...
The web is a huge source of valuable information. However, in recent times, there is an increasing trend towards false claims in social media, other web-sources, and even in news. Thus, factchecking websites have become increasingly popular to identify such misinformation based on manual analysis. Recent research proposed methods to assess the credibility of claims automatically. However, there are major limitations: most works assume claims to be in a structured form, and a few deal with textual claims but require that sources of evidence or...
There is an increasing amount of false claims in news, social media, and other web sources. While prior work on truth discovery has focused on the case of checking factual statements, this paper addresses the novel task of assessing the credibility of arbitrary claims made in natural-language text, in an open-domain setting without any assumptions about the structure of the claim, or the community where it is made. Our solution is based on automatically finding sources in news and social media, and feeding these into a distantly supervised classifier for assessing the credibility of a claim (i.e., true...
One of the demands of database system transaction management is to achieve a high degree of concurrency by taking into consideration the semantics of high-level operations. On the other hand, the implementation of such operations must pay attention to conflicts on the storage representation levels below. To meet these requirements in a layered architecture, we propose a multilevel transaction management utilizing layer-specific semantics. Based on the theoretical notion of multilevel serializability, a family of concurrency control strategies is developed. Suitable recovery protocols are...
The Web has the potential to become the world's largest knowledge base. In order to unleash this potential, the wealth of information available on the Web needs to be extracted and organized. There is a need for new querying techniques that are simple yet more expressive than those provided by standard keyword-based search engines. Searching for knowledge rather than Web pages needs to consider inherent semantic structures like entities (person, organization, etc.) and relationships (isA, locatedIn, etc.). In this paper, we propose NAGA, a new semantic search engine. NAGA...
With the proliferation of the RDF data format, engines for RDF query processing are faced with very large graphs that contain hundreds of millions of RDF triples. This paper addresses the resulting scalability problems. Recent prior work along these lines has focused on indexing and other physical-design issues. The current paper focuses on join processing, as the fine-grained and schema-relaxed use of RDF often entails star- and chain-shaped join queries with many input streams from index scans.