- Advanced Database Systems and Queries
- Semigroups and Automata Theory
- Semantic Web and Ontologies
- Machine Learning and Algorithms
- Algorithms and Data Compression
- Natural Language Processing Techniques
- Formal Methods in Verification
- Complexity and Algorithms in Graphs
- Data Management and Algorithms
- Logic, Programming, and Type Systems
- Service-Oriented Architecture and Web Services
- Graph Theory and Algorithms
- Logic, Reasoning, and Knowledge
- Distributed Systems and Fault Tolerance
- Data Quality and Management
- Scientific Computing and Data Management
- Web Data Mining and Analysis
- Data Stream Mining Techniques
- Markov Chains and Monte Carlo Methods
- Optimization and Search Problems
- Constraint Satisfaction and Optimization
- Graph Theory and Applications
- Advanced Graph Neural Networks
- Chemical Synthesis and Analysis
- DNA and Biological Computing
Pontificia Universidad Católica de Chile (2015-2024)
Millennium Institute for Integrative Biology (2015-2024)
Chennai Mathematical Institute (2021)
Université de Bordeaux (2021)
Max Planck Institute for Software Systems (2021)
Millennium Institute (2020)
University of Oxford (2009-2013)
Science Oxford (2011-2012)
A schema mapping is a specification that describes how data from a source schema is to be mapped to a target schema. Once the data has been transferred to the target, a natural question is whether one can undo the process and recover the initial data, or at least part of it. In fact, it would be desirable to find a reverse mapping that specifies how to bring the exchanged data back. In this article, we introduce the notion of a recovery of a schema mapping: a mapping M′ is a recovery for M if it recovers sound data with respect to M. We further introduce an order relation on recoveries. This allows us to choose mappings that recover the maximum...
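To make the notion concrete, here is a small hypothetical example (ours, not the article's): a mapping M that projects away an attribute, together with a recovery M′ that brings the source tuples back up to an unknown value.

```latex
% Hypothetical mapping M: copy employees to the target, dropping the salary.
M  :\quad \forall n\,\forall d\,\forall s\;\big(\mathit{Emp}(n,d,s)\rightarrow \mathit{TEmp}(n,d)\big)
% A recovery M' for M: the salary is lost in the exchange, so it can only
% be brought back as an existentially quantified value.
M' :\quad \forall n\,\forall d\;\big(\mathit{TEmp}(n,d)\rightarrow \exists s\;\mathit{Emp}(n,d,s)\big)
```

Following M and then M′ loses no names or departments, which is the sense in which M′ recovers sound data with respect to M.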
We study the problem of enumerating results from a query over a compressed document. The model we use for compression is straight-line programs (SLPs), which are defined by a context-free grammar that produces a single string. For our queries, we use a model called Annotated Automata, an extension of regular automata that allows annotations on letters. This extends the notion of Regular Spanners, as it allows arbitrarily long outputs. Our main result is an algorithm that evaluates such an automaton over an SLP, enumerating all results with output-linear delay after a preprocessing phase that takes...
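As a minimal sketch of the compression model (our illustration; rule names are made up), an SLP is a grammar in which every nonterminal has exactly one rule and the whole grammar derives a single string:

```python
# A straight-line program: a context-free grammar with one rule per
# nonterminal, deriving exactly one string. Rules here are illustrative.
slp = {
    "X1": "a",           # terminal rules
    "X2": "b",
    "X3": ("X1", "X2"),  # X3 -> X1 X2   derives "ab"
    "X4": ("X3", "X3"),  # X4 -> X3 X3   derives "abab"
    "X5": ("X4", "X4"),  # X5 -> X4 X4   derives "abababab"
}

def expand(symbol):
    """Decompress a nonterminal into the string it derives."""
    rule = slp[symbol]
    if isinstance(rule, str):   # terminal rule
        return rule
    left, right = rule          # binary rule
    return expand(left) + expand(right)

print(expand("X5"))  # -> "abababab"
```

The doubling shape means a grammar with n rules can derive a string of length exponential in n, which is why evaluating queries directly over the SLP, rather than over the decompressed string, matters.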
Some of the most relevant document schemas used online, such as XML and JSON, have a nested format. In the last decade, the task of extracting data from nested documents over streams has become especially relevant. We focus on the streaming evaluation of queries with outputs of varied sizes over nested documents. We model this kind of query using Visibly Pushdown Annotators (VPAnn), a computational model that extends visibly pushdown automata and has the same expressive power as MSO over nested documents. Since processing a document through a VPAnn can generate a massive number of results, we are interested in...
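The "visibly pushdown" discipline behind this model is that the input symbol itself dictates the stack action: open symbols push and close symbols pop. A minimal sketch of such single-pass, stack-based streaming (our illustration, not the VPAnn construction):

```python
def scan(stream):
    """One streaming pass over a well-nested document: call symbols push,
    return symbols pop, so the nesting depth is known at every position."""
    stack = []
    for pos, c in enumerate(stream):
        if c == "(":            # call symbol: push
            stack.append(pos)
            yield (pos, c, len(stack))
        elif c == ")":          # return symbol: report, then pop
            yield (pos, c, len(stack))
            stack.pop()
        else:                   # internal symbol: stack untouched
            yield (pos, c, len(stack))

for pos, sym, depth in scan("(a(b)c)"):
    print(pos, sym, depth)
```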
Rule-based information extraction has lately received a fair amount of attention from the database community, with several languages appearing in the last few years. Although these systems are intended to deal with semistructured data, all the language proposals introduced so far are designed to output relations, thus making them incapable of handling incomplete information. To remedy this situation, we propose to extend such languages with the ability to use mappings, allowing us to work with documents which have missing or optional parts. Using this approach,...
Regular expressions and automata models with capture variables are core tools in rule-based information extraction. These formalisms, also called regular document spanners, use regular languages in order to locate the data that a user wants to extract from a text document, and then store this data into variables. Since document spanners can easily generate large outputs, it is important to have good evaluation algorithms that can produce the extracted data in quick succession, and with relatively little precomputation time. Towards this goal, we present a practical...
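A rough analogue in everyday tooling (ours; ordinary regex groups, not the spanner algorithms studied here): a regular expression with capture variables locates spans in a document and binds them to variables, producing one output tuple per match.

```python
import re

doc = "Alice <alice@example.org>, Bob <bob@example.org>"
# Named capture groups play the role of the spanner's capture variables.
pattern = re.compile(r"(?P<name>\w+) <(?P<email>[^>]+)>")

for m in pattern.finditer(doc):
    # Each match yields one output tuple: variable -> (span, content).
    print({v: (m.span(v), m.group(v)) for v in ("name", "email")})
```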
Composition and inversion of schema mappings. Marcelo Arenas (PUC Chile), Jorge Pérez, Juan Reutter (U. Edinburgh), Cristian Riveros (Oxford University). ACM SIGMOD Record, Volume 38, Issue 3, September 2009, pp. 17–28. https://doi.org/10.1145/1815933.1815938. Published online: 15 December 2010.
In the last few years, a lot of attention has been paid to the specification and subsequent manipulation of schema mappings, a problem which is of fundamental importance in metadata management. There have been many achievements in this area, and semantics have been defined for operators on schema mappings such as composition and inverse. However, little research has been pursued towards providing formal tools to compare schema mappings in terms of their ability to transfer data and avoid storing redundant information, which has hampered the development of foundations for more complex operations that involve...
While monadic second-order logic is a prominent logic for specifying languages of finite words, it lacks the power to compute quantitative properties, e.g. to count. An automata model capable of computing such properties is weighted automata, but logics equivalent to these automata have only recently emerged. We propose a new framework for adding quantities to Boolean logics over words. We use this framework to define Quantitative Monadic Second-Order Logic (QMSO). In this way we obtain a simple logic which is equally expressive to weighted automata. We analyse its evaluation complexity,...
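An illustrative formula in the spirit of QMSO (notation assumed, not quoted from the paper): summing, over all positions x of a word, the 0/1 value of a Boolean MSO formula yields a counting quantity that no purely Boolean sentence can output.

```latex
% Count the occurrences of the letter a in a word:
\varphi \;=\; \sum_{x} \big[\, P_a(x) \,\big]
% On w = abba the sum evaluates to 2; the corresponding weighted
% automaton adds weight 1 each time it reads an a.
```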
In this work, we study two simple yet general complexity classes, based on logspace Turing machines, which provide a unifying framework for efficient query evaluation in areas like information extraction and graph databases, among others. We investigate the complexity of three fundamental algorithmic problems for these classes: enumeration, counting, and uniform generation of solutions, and show that they have several desirable properties in this respect. Both classes are defined in terms of nondeterministic transducers (NL...
The inversion of schema mappings has been identified as one of the fundamental operators for the development of a general framework for metadata management. In fact, during the last few years three alternative notions of inversion have been proposed (Fagin-inverse [10], quasi-inverse [14] and maximum recovery [2]). However, the procedures that have been developed for computing these operators have some features that limit their practical applicability. First, the algorithms work in exponential time and produce inverses of exponential size. Second, they express inverses in languages which include features that are...
Complex Event Processing (CEP) has emerged as the unifying field for technologies that require processing and correlating distributed data sources in real-time. CEP finds applications in diverse domains, which has resulted in a large number of proposals for expressing complex events. However, existing languages lack a clear semantics, making them hard to understand and generalize. Moreover, there are no general techniques for evaluating CEP queries with performance guarantees. In this paper we embark on the task of giving...
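As a toy instance of the setting (ours, not the paper's language): the sequence pattern SEQ(Temp > 40, Smoke) asks for every temperature spike that is later followed by a smoke event, and even this tiny pattern can already produce a number of matches quadratic in the stream length.

```python
stream = [
    ("Temp", 25), ("Temp", 45), ("Smoke", 1), ("Temp", 50), ("Smoke", 1),
]

def seq_matches(stream):
    """Enumerate matches of SEQ(Temp > 40, Smoke) as index pairs (i, j)."""
    spikes = []                          # positions of open partial matches
    for j, (kind, value) in enumerate(stream):
        if kind == "Temp" and value > 40:
            spikes.append(j)             # open a partial match
        elif kind == "Smoke":
            for i in spikes:             # complete every open match
                yield (i, j)

print(list(seq_matches(stream)))  # -> [(1, 2), (1, 4), (3, 4)]
```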
We present the theoretical foundations and first experimental study of a new approach to centrality measures for graph data. The main principle is straightforward: the more relevant subgraphs around a vertex, the more central it is in the network. We formalize the notion of "relevant subgraphs" by choosing a family of subgraphs that, given a graph G and a vertex v, assigns a subset of connected subgraphs of G that contain v. Any such family defines a centrality measure by counting the number of subgraphs assigned to each vertex, i.e., a vertex will be more important for the network if it belongs to more subgraphs of the family. We show several examples of this approach...
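A brute-force sketch of the counting principle, using the family of all connected induced subgraphs (one natural instantiation; the four-vertex graph below is our own toy example):

```python
from itertools import combinations

edges = {("a", "b"), ("b", "c"), ("b", "d"), ("c", "d")}
nodes = {"a", "b", "c", "d"}

def connected(sub):
    """Check that the subgraph induced by `sub` is connected."""
    sub = set(sub)
    seen, frontier = set(), [next(iter(sub))]
    while frontier:
        u = frontier.pop()
        seen.add(u)
        for x, y in edges:
            for u2, v2 in ((x, y), (y, x)):
                if u2 == u and v2 in sub and v2 not in seen:
                    frontier.append(v2)
    return seen == sub

def centrality(v):
    """Count the connected induced subgraphs that contain v."""
    return sum(
        1
        for k in range(1, len(nodes) + 1)
        for sub in combinations(sorted(nodes), k)
        if v in sub and connected(sub)
    )

for v in sorted(nodes):
    print(v, centrality(v))  # b sits in the most connected subgraphs
```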
In this systems paper, we present MillenniumDB: a novel graph database engine that is modular, persistent, and open source. MillenniumDB is based on a graph data model, which we call domain graphs, that provides a simple abstraction upon which a variety of popular graph models can be supported, thus providing a flexible data management engine for diverse types of knowledge graphs. The engine itself is founded on a combination of tried and tested techniques from relational data management, state-of-the-art algorithms for worst-case-optimal joins, as well as...
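A sketch of the domain-graph abstraction as we read it (the field layout is ours): edge identifiers live in the same domain as nodes, so an edge can itself be the endpoint of further edges, which is what lets one data model capture property graphs and RDF-style reification.

```python
domain_graph = {
    # edge id: (source, label, target)
    "e1": ("alice", "knows", "bob"),
    "e2": ("e1", "since", "2015"),   # a statement about edge e1 itself
}

def edges_about(element):
    """Return every edge whose source or target is the given domain element."""
    return {
        eid: t for eid, t in domain_graph.items() if element in (t[0], t[2])
    }

print(edges_about("e1"))  # the edge e1 participates as a node in e2
```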
In this paper, we propose a simple and expressive framework for adding metadata to CSV documents and their noisy variants. The framework is based on annotating parts of the document, annotations that can be later used to read, query, or exchange the data. The core of our language is extended regular expressions that are used for selecting parts of the document. These selections are then combined using a set of rules in order to annotate the document. We study the computational complexity of implementing our framework and present an efficient evaluation algorithm that runs in time proportional to its output and linear in its input. As a proof of concept, we test...
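An illustrative reduction of the idea to plain regular expressions (ours; the paper's language is richer): each rule selects spans of a noisy CSV document and labels them with metadata.

```python
import re

csv_doc = "name,age\nAlice,34\nBob,N/A\n"

rules = [
    ("number",  re.compile(r"\b\d+\b")),  # well-formed numeric cells
    ("missing", re.compile(r"N/A")),      # noise: missing values
]

# Each annotation records which rule fired, where, and on what text,
# so the document can later be read or queried through its metadata.
annotations = [
    (label, m.span(), m.group())
    for label, pattern in rules
    for m in pattern.finditer(csv_doc)
]
print(annotations)  # [('number', (15, 17), '34'), ('missing', (22, 25), 'N/A')]
```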
Conjunctive queries are one of the most common classes of queries used in database systems, and among the best studied in the literature. A seminal result of Grohe, Schwentick, and Segoufin (STOC 2001) demonstrates that, for every class G of graphs, the evaluation of all conjunctive queries whose underlying graph is in G is tractable if, and only if, G has bounded treewidth. In this work, we extend this characterization to the counting problem for conjunctive queries. Specifically, for every class C of conjunctive queries with bounded treewidth, we introduce the first fully polynomial-time randomized approximation scheme (FPRAS) for counting answers to a query in C,...
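For intuition (our example, not the paper's): the underlying graph of a conjunctive query has the query's variables as vertices, with an edge between variables that co-occur in an atom.

```latex
% A path-shaped conjunctive query:
Q(x,z) \;\leftarrow\; E(x,y) \,\wedge\, E(y,z)
% Underlying graph: vertices \{x, y, z\} and edges \{xy, yz\}, a path,
% hence treewidth 1. Classes of such queries fall under the FPRAS above.
```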
What do you do if a computational object (e.g. a program trace) fails to satisfy a specification? An obvious approach is to perform repair: modify the object minimally to get something that satisfies the constraints. In this paper we study the repair of temporal constraints, given as automata or temporal logic formulas. We focus on determining the number of repairs that must be applied to a word satisfying a given input constraint in order to ensure that it satisfies a given target constraint. This number may well be unbounded; one of our main contributions is to isolate the complexity of the "bounded repair problem",...
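A minimal sketch of the word-level repair question (our illustration, with the target constraint given as a two-state DFA): the minimum number of insertions, deletions, and substitutions needed before a word satisfies the constraint.

```python
# Target constraint: words over {a, b} with an even number of b's.
ALPHABET = "ab"
delta = {("even", "a"): "even", ("even", "b"): "odd",
         ("odd", "a"): "odd", ("odd", "b"): "even"}
START, FINAL = "even", {"even"}

def close(dist):
    """Relax insertion edits: one edit lets the automaton take one step."""
    changed = True
    while changed:
        changed = False
        for q in list(dist):
            for a in ALPHABET:
                q2 = delta[(q, a)]
                if dist[q] + 1 < dist[q2]:
                    dist[q2] = dist[q] + 1
                    changed = True
    return dist

def repair_cost(word):
    """Minimum edits (insert/delete/substitute) to make `word` accepted."""
    INF = float("inf")
    dist = close({q: (0 if q == START else INF) for q in ("even", "odd")})
    for c in word:
        new = {q: dist[q] + 1 for q in dist}        # delete c
        for q, d in dist.items():
            for a in ALPHABET:
                q2 = delta[(q, a)]                   # read a in place of c
                cost = d + (0 if a == c else 1)      # match or substitute
                if cost < new[q2]:
                    new[q2] = cost
        dist = close(new)                            # insert after c
    return min(dist[q] for q in FINAL)

print(repair_cost("abb"))  # 0: already has an even number of b's
print(repair_cost("ab"))   # 1: e.g. delete the trailing b
```

The bounded repair problem asks the harder, language-level question: whether some fixed bound on this cost works for every word of the input constraint.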