- Advanced Database Systems and Queries
- Scientific Computing and Data Management
- Semantic Web and Ontologies
- Data Management and Algorithms
- Distributed and Parallel Computing Systems
- Logic, Reasoning, and Knowledge
- Research Data Management Practices
- Logic, programming, and type systems
- Service-Oriented Architecture and Web Services
- Data Quality and Management
- Distributed systems and fault tolerance
- Advanced Data Storage Technologies
- Advanced Algebra and Logic
- Formal Methods in Verification
- Business Process Modeling and Analysis
- Data Mining Algorithms and Applications
- Bayesian Modeling and Causal Inference
- Parallel Computing and Optimization Techniques
- Peer-to-Peer Network Technologies
- Optimization and Search Problems
- Algorithms and Data Compression
- Cloud Computing and Resource Management
- Genomics and Phylogenetic Studies
- Genetics, Bioinformatics, and Biomedical Research
- Mobile Agent-Based Network Management
University of Pennsylvania
2015-2024
California University of Pennsylvania
2006-2024
Pennsylvania State University
2024
Philadelphia University
1994-2023
We show that relational algebra calculations for incomplete databases, probabilistic databases, bag semantics and why-provenance are particular cases of the same general algorithms involving semirings. This further suggests a comprehensive provenance representation that uses semirings of polynomials. We extend these considerations to datalog and semirings of formal power series. We give algorithms for datalog provenance calculation as well as datalog evaluation for incomplete and probabilistic databases. Finally, we show that for some semirings containment of conjunctive queries is the same as for standard set semantics.
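The idea that one algorithm covers several semantics can be sketched as follows. This is a minimal illustration, not the paper's implementation: relations are modeled as dictionaries from tuples to semiring annotations, and the names `Semiring`, `join`, `project`, `two_hop`, and the sample edge data are all hypothetical. The `WITNESS` semiring (sets of sets of tuple identifiers) is an assumption of this sketch, used as one common way to record which input tuples jointly justify an output.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    zero: Any
    one: Any
    add: Callable[[Any, Any], Any]  # combines alternative derivations
    mul: Callable[[Any, Any], Any]  # combines jointly used tuples


# Bag semantics: annotations are multiplicities.
BAG = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)
# Set semantics: annotations are truth values.
BOOL = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)
# Witness sets: each annotation is a set of sets of tuple identifiers.
WITNESS = Semiring(
    frozenset(),
    frozenset({frozenset()}),
    lambda a, b: a | b,
    lambda a, b: frozenset(x | y for x in a for y in b),
)


def join(r, s, r_col, s_col, K):
    """Join on r[r_col] == s[s_col]; annotations are multiplied."""
    out = defaultdict(lambda: K.zero)
    for t1, a1 in r.items():
        for t2, a2 in s.items():
            if t1[r_col] == t2[s_col]:
                out[t1 + t2] = K.add(out[t1 + t2], K.mul(a1, a2))
    return dict(out)


def project(rel, cols, K):
    """Projection; annotations of tuples that collapse together are added."""
    out = defaultdict(lambda: K.zero)
    for tup, ann in rel.items():
        key = tuple(tup[c] for c in cols)
        out[key] = K.add(out[key], ann)
    return dict(out)


def two_hop(edges, K):
    """Pairs (x, z) such that edges contains (x, y) and (y, z)."""
    return project(join(edges, edges, 1, 0, K), (0, 3), K)


EDGES = [("a", "b"), ("b", "c"), ("a", "d"), ("d", "c")]
bag_edges = dict(zip(EDGES, [2, 3, 1, 5]))
bool_edges = {e: True for e in EDGES}
witness_edges = {
    e: frozenset({frozenset({f"t{i}"})}) for i, e in enumerate(EDGES, 1)
}

bag_result = two_hop(bag_edges, BAG)              # multiplicity of (a, c)
bool_result = two_hop(bool_edges, BOOL)           # is (a, c) in the answer?
witness_result = two_hop(witness_edges, WITNESS)  # which tuples justify it?
```

The query code never changes; only the semiring passed in does. With the sample data, the single answer `("a", "c")` carries multiplicity `2*3 + 1*5` under `BAG` and two alternative witness sets under `WITNESS`.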
The iPlant Collaborative (iPlant) is a United States National Science Foundation (NSF) funded project that aims to create an innovative, comprehensive, and foundational cyberinfrastructure in support of plant biology research (PSCIC, 2006). iPlant is developing cyberinfrastructure that uniquely enables scientists throughout the diverse fields that comprise plant biology to address Grand Challenges in new ways, to stimulate and facilitate cross-disciplinary research, to promote biology and computer science research interactions, and to train the next generation of scientists on the use of cyberinfrastructure in research and education. Meeting...
We present a new principle for the development of database query languages: that the primitive operations should be organized around types. Viewing a relational database as consisting of sets of records, this principle dictates that we should investigate separately operations for records and sets. There are two immediate advantages of this approach, which is partly inspired by basic ideas from category theory. First, it provides a language for structures in which record and set types may be freely combined: nested relations or complex objects. Second, the fundamental operations for sets are closely...
The syntax of comprehensions is very close to the syntax of a number of practical database query languages and is, we believe, a better starting point than first-order logic for the development of database languages. We give an informal account of a language based on comprehension syntax that deals uniformly with a variety of collection types; it also includes pattern matching, variant types and function definition. We show, again informally, how comprehension syntax is a natural fragment of structural recursion, a much more powerful programming paradigm for collection types. We show small...
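To give a feel for how close comprehension syntax is to a query language, here is a small sketch using Python comprehensions. Python is not the language the abstract describes, and the sample data and function names (`located`, `departments`, `headcount`) are hypothetical. It shows a join written as a comprehension, pattern matching through tuple unpacking, and the same syntax applied to lists, sets, and dictionaries.

```python
# Hypothetical sample data: (employee, department) and (department, building).
employees = [("ann", "db"), ("bob", "pl"), ("cy", "db")]
depts = [("db", "B1"), ("pl", "B2")]


def located(employees, depts):
    """Join: roughly SELECT name, building FROM employees, depts WHERE d1 = d2."""
    return [
        (name, building)
        for (name, d1) in employees      # tuple unpacking as pattern matching
        for (d2, building) in depts
        if d1 == d2
    ]


def departments(employees):
    """Projection into a set: duplicates are eliminated by the collection type."""
    return {dept for (_, dept) in employees}


def headcount(employees):
    """Grouping, written as a dictionary comprehension over a set comprehension."""
    return {
        dept: len([name for (name, d) in employees if d == dept])
        for dept in departments(employees)
    }
```

The list version keeps order and duplicates, the set version discards both, and the query text is nearly identical in each case, which is the uniformity across collection types that the abstract points to.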
The integrated access to heterogeneous data sources is a major challenge for the biomedical community. Several solution strategies have been explored: link-driven federation of databases, view integration, and warehousing. In this paper we report on our experiences with two systems that were developed at the University of Pennsylvania: K2, a view integration implementation, and GUS, a data warehouse. Although the view integration and warehouse approaches each have advantages, there is no clear "winner." Therefore, in selecting the best strategy...
Many advanced data management operations (e.g., incremental maintenance, trust assessment, debugging schema mappings, keyword search over databases, or query answering in probabilistic databases) involve computations that look at how a tuple was produced, e.g., to determine its score or existence. This requires answers to queries such as, "Is this data derivable from trusted tuples?"; "What tuples are derived from this relation?"; or "What score should this answer receive, given initial scores of the base tuples?". Such questions...
We study in this paper provenance information for queries with aggregation. Provenance information was studied in the context of various query languages that do not allow for aggregation, and recent work has suggested to capture provenance by annotating the different database tuples with elements of a commutative semiring and propagating the annotations through query evaluation. We show that aggregate queries pose novel challenges rendering this approach inapplicable. Consequently, we propose a new approach, where we annotate with provenance information not just tuples but also the individual values within tuples,...
Workflow provenance typically assumes that each module is a "black-box", so that each output depends on all inputs (coarse-grained dependencies). Furthermore, it does not model the internal state of a module, which can change between repeated executions. In practice, however, an output may depend on only a small subset of the inputs (fine-grained dependencies) as well as on the internal state of the module. We present a novel framework that marries database-style and workflow-style provenance, by using Pig Latin to expose the functionality of modules, thus capturing...
Let Σ₁, Σ₂ be two schemas, which may overlap, C a set of constraints on the joint schema Σ₁ ∪ Σ₂, and q a Σ₁-query. An (equivalent) reformulation of q in the presence of C is a Σ₂-query, q′, such that q′ gives the same answers as q on any Σ₁ ∪ Σ₂-database instance that satisfies C. In general, there may exist multiple such reformulations, and choosing among them may require, for example, a cost model.
We present a formal framework for capturing the provenance of data appearing in XQuery views of XML. Building on previous work on relations and their (positive) query languages, we decorate unordered XML with annotations from commutative semirings and show that these annotations suffice for a large positive fragment of XQuery applied to this data. In addition to tracking provenance metadata, the framework can be used to represent and process XML with repetitions, incomplete XML, and probabilistic XML, and provides a basis for enforcing access control policies in security applications.
Sharing structured data today requires standardizing upon a single schema, then mapping and cleaning all of the data. This results in a single queriable mediated data instance. However, for settings in which structured data is being collaboratively authored by a large community, e.g., in the sciences, there is often a lack of consensus about how it should be represented, what is correct, and which sources are authoritative. Moreover, such data is seldom static: it is frequently updated, cleaned, and annotated. The ORCHESTRA collaborative data sharing system develops a new...
ORCHESTRA: facilitating collaborative data sharing. Authors: Todd J. Green (University of Pennsylvania, Philadelphia, PA), Grigoris Karvounarakis, Nicholas E. Taylor, Olivier Biton, Zachary G. Ives, Val Tannen. SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data, June 2007, pages 1131–1133. https://doi.org/10.1145/1247480.1247631. Published 11 June 2007.
Provenance in scientific workflows is a double-edged sword. On the one hand, recording information about the module executions used to produce a data item, as well as the parameter settings and intermediate data items passed between module executions, enables transparency and reproducibility of results. On the other hand, a scientific workflow often contains private or confidential data and uses proprietary modules. Hence, providing exact answers to provenance queries over all executions of the workflow may reveal private information. In this paper we discuss privacy concerns -- data,...
Imagine a computational process that uses a complex input consisting of multiple "items" (e.g., files, tables, tuples, parameters, configuration rules). The provenance analysis of such a process allows us to understand how the different input items affect the output of the computation. It can be used, for example, to derive confidence in the output (given confidences in the input items), to derive the minimum access clearance for the output (given input items with different classifications), or to minimize the cost of obtaining the output (given a complex input item pricing scheme). It also applies to probabilistic reasoning about an output (given input item distributions), as well...
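The three uses named above (confidence, access clearance, cost) can be sketched as one provenance expression evaluated under three different semirings. This is an illustrative assumption rather than the authors' method: the provenance of an output is written as a sum of products over input item names, and `evaluate`, the item names `a`, `b`, `c`, the numeric values, and the clearance levels are all hypothetical.

```python
# Provenance of one output as a sum of products over input items:
# the output can be obtained from (a together with b) or from c alone.
PROVENANCE = [["a", "b"], ["c"]]


def evaluate(polynomial, valuation, add, mul, zero, one):
    """Evaluate a sum-of-products provenance expression in a semiring."""
    total = zero
    for monomial in polynomial:
        product = one
        for item in monomial:
            product = mul(product, valuation[item])
        total = add(total, product)
    return total


# Confidence: joint use multiplies confidences, alternatives take the best.
confidence = evaluate(
    PROVENANCE, {"a": 0.9, "b": 0.5, "c": 0.4},
    add=max, mul=lambda x, y: x * y, zero=0.0, one=1.0,
)

# Clearance levels (hypothetical): 0 public, 1 confidential, 2 secret,
# 3 no access. Joint use needs the higher level; alternatives need the lower.
clearance = evaluate(
    PROVENANCE, {"a": 2, "b": 0, "c": 1},
    add=min, mul=max, zero=3, one=0,
)

# Cost: joint use adds prices, alternatives take the cheapest derivation.
cost = evaluate(
    PROVENANCE, {"a": 3, "b": 4, "c": 10},
    add=min, mul=lambda x, y: x + y, zero=float("inf"), one=0,
)
```

Under these sample values the most confident derivation uses `a` and `b`, the least restricted derivation uses `c`, and the cheapest derivation uses `a` and `b`. The provenance expression is computed once and reinterpreted for each question.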
Incremental view maintenance (IVM) has long been a central problem in database theory. Many solutions have been proposed for restricted classes of database languages, such as the relational algebra, or Datalog. These techniques do not naturally generalize to richer languages. In this paper we give a general, heuristic-free solution to this problem in 3 steps: (1) we describe a simple but expressive language called DBSP for describing computations over data streams; (2) we give a new mathematical definition of IVM and a general algorithm for solving...
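As a rough illustration of what incremental maintenance of a view means, the sketch below maintains a join over a stream of changes. It is not the DBSP library's API and not the algorithm from the paper. It uses the standard delta rule for joins, Δ(L ⋈ R) = ΔL ⋈ R + L ⋈ ΔR + ΔL ⋈ ΔR, over relations represented as dictionaries from tuples to integer weights (positive for insertions, negative for deletions). The names `zadd`, `zjoin`, and `IncrementalJoin` are hypothetical.

```python
def zadd(a, b):
    """Add two weighted relations, dropping tuples whose weight becomes 0."""
    out = dict(a)
    for tup, w in b.items():
        new_w = out.get(tup, 0) + w
        if new_w == 0:
            out.pop(tup, None)
        else:
            out[tup] = new_w
    return out


def zjoin(left, right):
    """Join (k, v) with (k, w) on k, producing (k, v, w); weights multiply."""
    out = {}
    for (k1, v), w1 in left.items():
        for (k2, w), w2 in right.items():
            if k1 == k2:
                out = zadd(out, {(k1, v, w): w1 * w2})
    return out


class IncrementalJoin:
    """Keeps both inputs as state and emits only the change to the join."""

    def __init__(self):
        self.left = {}
        self.right = {}

    def step(self, d_left, d_right):
        d_out = zadd(
            zadd(zjoin(d_left, self.right), zjoin(self.left, d_right)),
            zjoin(d_left, d_right),
        )
        self.left = zadd(self.left, d_left)
        self.right = zadd(self.right, d_right)
        return d_out


view = IncrementalJoin()
delta1 = view.step({(1, "a"): 1, (2, "b"): 1}, {(1, "x"): 1})
# Second batch: delete (2, "b"), insert (1, "c") on the left; insert (2, "y") on the right.
delta2 = view.step({(2, "b"): -1, (1, "c"): 1}, {(2, "y"): 1})
maintained = zadd(delta1, delta2)
recomputed = zjoin(view.left, view.right)
```

Summing the emitted deltas gives the same result as recomputing the join from scratch on the accumulated inputs, while each step only touches tuples that match the changed rows.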