Val Tannen

ORCID: 0009-0008-6847-7274
Research Areas
  • Advanced Database Systems and Queries
  • Scientific Computing and Data Management
  • Semantic Web and Ontologies
  • Data Management and Algorithms
  • Distributed and Parallel Computing Systems
  • Logic, Reasoning, and Knowledge
  • Research Data Management Practices
  • Logic, programming, and type systems
  • Service-Oriented Architecture and Web Services
  • Data Quality and Management
  • Distributed systems and fault tolerance
  • Advanced Data Storage Technologies
  • Advanced Algebra and Logic
  • Formal Methods in Verification
  • Business Process Modeling and Analysis
  • Data Mining Algorithms and Applications
  • Bayesian Modeling and Causal Inference
  • Parallel Computing and Optimization Techniques
  • Peer-to-Peer Network Technologies
  • Optimization and Search Problems
  • Algorithms and Data Compression
  • Cloud Computing and Resource Management
  • Genomics and Phylogenetic Studies
  • Genetics, Bioinformatics, and Biomedical Research
  • Mobile Agent-Based Network Management

University of Pennsylvania
2015-2024

California University of Pennsylvania
2006-2024

Pennsylvania State University
2024

Philadelphia University
1994-2023

We show that relational algebra calculations for incomplete databases, probabilistic databases, bag semantics and why-provenance are particular cases of the same general algorithms involving semirings. This further suggests a comprehensive provenance representation that uses semirings of polynomials. We extend these considerations to datalog and semirings of formal power series. We give algorithms for datalog provenance calculation as well as datalog evaluation for incomplete and probabilistic databases. Finally, we show that for some semirings containment of conjunctive queries is the same as for standard set semantics.

10.1145/1265530.1265535 article EN 2007-06-11
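
The idea in the abstract above can be illustrated with a minimal sketch, assuming relations are stored as dictionaries from tuples to annotations. The names `Semiring`, `BAG`, `SET`, `union`, and `join_compose` are hypothetical and chosen only for illustration; they are not taken from the paper. Union combines annotations with the semiring's addition and join combines them with its multiplication, so the same two functions give bag semantics or set semantics depending on which semiring is passed in.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class Semiring:
    """A commutative semiring (hypothetical helper type for this sketch)."""
    zero: Any
    one: Any
    add: Callable[[Any, Any], Any]
    mul: Callable[[Any, Any], Any]


# Natural numbers with + and *: annotations are multiplicities (bag semantics).
BAG = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)
# Booleans with "or" and "and": annotations are membership flags (set semantics).
SET = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)

Relation = Dict[Tuple, Any]


def union(k: Semiring, r: Relation, s: Relation) -> Relation:
    """Union: annotations of equal tuples are added in the semiring."""
    out = dict(r)
    for t, a in s.items():
        out[t] = k.add(out.get(t, k.zero), a)
    return out


def join_compose(k: Semiring, r: Relation, s: Relation) -> Relation:
    """Join R(x, y) with S(y, z) on y and project onto (x, z).

    Joined annotations are multiplied; alternative derivations of the
    same output tuple are added.
    """
    out: Relation = {}
    for (x, y), a in r.items():
        for (y2, z), b in s.items():
            if y == y2:
                key = (x, z)
                out[key] = k.add(out.get(key, k.zero), k.mul(a, b))
    return out


# Example data (made up for illustration).
r_bag = {("a", "b"): 2, ("a", "c"): 1}
s_bag = {("b", "d"): 3, ("c", "d"): 1}
bag_result = join_compose(BAG, r_bag, s_bag)   # {("a", "d"): 2*3 + 1*1}

r_set = {t: True for t in r_bag}
s_set = {t: True for t in s_bag}
set_result = join_compose(SET, r_set, s_set)   # {("a", "d"): True}
```

Passing a different semiring changes what the annotations mean without touching the query code, which is the uniformity the abstract describes.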

The iPlant Collaborative (iPlant) is a United States National Science Foundation (NSF) funded project that aims to create an innovative, comprehensive, and foundational cyberinfrastructure in support of plant biology research (PSCIC, 2006). iPlant is developing cyberinfrastructure that uniquely enables scientists throughout the diverse fields that comprise plant biology to address Grand Challenges in new ways, to stimulate and facilitate cross-disciplinary research, to promote biology and computer science research interactions, and to train the next generation of scientists on the use of cyberinfrastructure in research and education. Meeting...

10.3389/fpls.2011.00034 article EN cc-by Frontiers in Plant Science 2011-01-01

We present a new principle for the development of database query languages: that the primitive operations should be organized around types. Viewing a relational database as consisting of sets of records, this principle dictates that we should investigate separately operations for records and sets. There are two immediate advantages of this approach, which is partly inspired by basic ideas from category theory. First, it provides a language for structures in which record and set types may be freely combined: nested relations or complex objects. Second, the fundamental operations for sets are closely...

10.1016/0304-3975(95)00024-q article EN cc-by-nc-nd Theoretical Computer Science 1995-09-01

The syntax of comprehensions is very close to the syntax of a number of practical database query languages and is, we believe, a better starting point than first-order logic for the development of database languages. We give an informal account of a language based on comprehension syntax that deals uniformly with a variety of collection types; it also includes pattern matching, variant types and function definition. We show, again informally, how comprehension syntax is a natural fragment of structural recursion, a much more powerful programming paradigm for collection types. We also show that a very small...

10.1145/181550.181564 article EN ACM SIGMOD Record 1994-03-01

The integrated access to heterogeneous data sources is a major challenge for the biomedical community. Several solution strategies have been explored: link-driven federation of databases, view integration, and warehousing. In this paper we report on our experiences with two systems that were developed at the University of Pennsylvania: K2, a view integration implementation, and GUS, a data warehouse. Although the view integration and warehouse approaches each have advantages, there is no clear "winner." Therefore, in selecting the best strategy...

10.1147/sj.402.0512 article EN IBM Systems Journal 2001-01-01

Many advanced data management operations (e.g., incremental maintenance, trust assessment, debugging schema mappings, keyword search over databases, or query answering in probabilistic databases) involve computations that look at how a tuple was produced, e.g., to determine its score or existence. This requires answers to queries such as, "Is this data derivable from trusted tuples?"; "What tuples are derived from this relation?"; or "What score should this answer receive, given initial scores of the base tuples?". Such questions...

10.1145/1807167.1807269 article EN 2010-06-06

We study in this paper provenance information for queries with aggregation. Provenance information was studied in the context of various query languages that do not allow for aggregation, and recent work has suggested to capture provenance by annotating the different database tuples with elements of a commutative semiring and propagating the annotations through query evaluation. We show that aggregate queries pose novel challenges rendering this approach inapplicable. Consequently, we propose a new approach, where we annotate with provenance information not just tuples but also the individual values within tuples,...

10.1145/1989284.1989302 article EN 2011-06-13

Workflow provenance typically assumes that each module is a "black-box", so that each output depends on all inputs (coarse-grained dependencies). Furthermore, it does not model the internal state of a module, which can change between repeated executions. In practice, however, an output may depend on only a small subset of the inputs (fine-grained dependencies) as well as on the internal state of the module. We present a novel framework that marries database-style and workflow-style provenance, by using Pig Latin to expose the functionality of modules, thus capturing...

10.14778/2095686.2095693 article EN Proceedings of the VLDB Endowment 2011-12-01

Let Σ₁, Σ₂ be two schemas, which may overlap, C be a set of constraints on the joint schema Σ₁ ∪ Σ₂, and q be a Σ₁-query. An (equivalent) reformulation of q in the presence of C is a Σ₂-query that gives the same answers as q on any (Σ₁ ∪ Σ₂)-database instance that satisfies C. In general, there may exist multiple such reformulations, and choosing among them may require, for example, a cost model.

10.1145/1121995.1122010 article EN ACM SIGMOD Record 2006-03-01

We present a formal framework for capturing the provenance of data appearing in XQuery views of XML. Building on previous work on relations and their (positive) query languages, we decorate unordered XML with annotations from commutative semirings and show that these annotations suffice for a large positive fragment of XQuery applied to this data. In addition to tracking provenance metadata, the framework can be used to represent and process XML with repetitions, incomplete XML, and probabilistic XML, and provides a basis for enforcing access control policies in security applications.

10.1145/1376916.1376954 article EN 2008-06-09

Sharing structured data today requires standardizing upon a single schema, then mapping and cleaning all of the data. This results in a single queriable mediated data instance. However, for settings in which structured data is being collaboratively authored by a large community, e.g., in the sciences, there is often a lack of consensus about how it should be represented, what is correct, and which sources are authoritative. Moreover, such data is seldom static: it is frequently updated, cleaned, and annotated. The ORCHESTRA collaborative data sharing system develops a new...

10.1145/1462571.1462577 article EN ACM SIGMOD Record 2008-09-30

ORCHESTRA: facilitating collaborative data sharing. Authors: Todd J. Green (University of Pennsylvania, Philadelphia, PA), Grigoris Karvounarakis, Nicholas E. Taylor, Olivier Biton, Zachary G. Ives, Val Tannen. SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data, June 2007, Pages 1131–1133. https://doi.org/10.1145/1247480.1247631. Published: 11 June 2007.

10.1145/1247480.1247631 article EN 2007-06-11

Provenance in scientific workflows is a double-edged sword. On the one hand, recording information about the module executions used to produce a data item, as well as the parameter settings and intermediate data items passed between module executions, enables transparency and reproducibility of results. On the other hand, a scientific workflow often contains private or confidential data and uses proprietary modules. Hence, providing exact answers to provenance queries over all executions of the workflow may reveal private information. In this paper we discuss privacy concerns in scientific workflows -- data,...

10.1145/1938551.1938554 article EN 2011-02-08

Imagine a computational process that uses a complex input consisting of multiple "items" (e.g., files, tables, tuples, parameters, configuration rules). The provenance analysis of such a process allows us to understand how the different input items affect the output of the computation. It can be used, for example, to derive confidence in the output (given confidences in the input items), to derive the minimum access clearance for the output (given input items with different classifications), or to minimize the cost of obtaining the output (given a complex input item pricing scheme). It also applies to probabilistic reasoning about an output (given input item distributions), as well...

10.1145/3034786.3056125 article EN 2017-05-09

Incremental view maintenance (IVM) has long been a central problem in database theory. Many solutions have been proposed for restricted classes of database languages, such as the relational algebra, or Datalog. These techniques do not naturally generalize to richer languages. In this paper we give a general, heuristic-free solution to this problem in 3 steps: (1) we describe a simple but expressive language called DBSP for describing computations over data streams; (2) we give a new mathematical definition of IVM and a general algorithm for solving...

10.14778/3587136.3587137 article EN Proceedings of the VLDB Endowment 2023-03-01
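
To make the notion of incremental view maintenance in the abstract above concrete, here is a minimal sketch. It assumes changes are represented as weighted collections: dictionaries from rows to integer weights, positive for insertions and negative for deletions. The function names (`zadd`, `zjoin`, `incremental_join`) are hypothetical, and this is not the DBSP implementation, only an illustration of maintaining a join view from deltas instead of recomputing it.

```python
from collections import defaultdict


def zadd(a, b):
    """Add two weighted collections, dropping rows whose weight becomes 0."""
    out = defaultdict(int)
    for coll in (a, b):
        for row, w in coll.items():
            out[row] += w
    return {row: w for row, w in out.items() if w != 0}


def zjoin(a, b):
    """Join (key, x) rows with (key, y) rows on key; weights multiply."""
    out = defaultdict(int)
    for (k1, x), wa in a.items():
        for (k2, y), wb in b.items():
            if k1 == k2:
                out[(k1, x, y)] += wa * wb
    return {row: w for row, w in out.items() if w != 0}


def incremental_join(a_old, b_old, da, db):
    """Change to the join view caused by input changes da and db.

    Because the join is bilinear in its inputs:
        delta(a JOIN b) = da JOIN db + a_old JOIN db + da JOIN b_old
    """
    return zadd(zadd(zjoin(da, db), zjoin(a_old, db)), zjoin(da, b_old))


# Example data (made up for illustration).
a_old = {(1, "x"): 1}
b_old = {(1, "p"): 1}
da = {(1, "y"): 1}                    # insert (1, "y") into a
db = {(1, "p"): -1, (1, "q"): 1}      # replace (1, "p") by (1, "q") in b

view_old = zjoin(a_old, b_old)
delta = incremental_join(a_old, b_old, da, db)
view_new = zadd(view_old, delta)

# Recomputing from scratch on the updated inputs gives the same view.
recomputed = zjoin(zadd(a_old, da), zadd(b_old, db))
```

The incremental path touches only the deltas and the stored old inputs, while producing the same view as a full recomputation on the updated inputs.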