- Advanced Database Systems and Queries
- Semantic Web and Ontologies
- Data Quality and Management
- Data Management and Algorithms
- Web Data Mining and Analysis
- Topic Modeling
- Natural Language Processing Techniques
- Scientific Computing and Data Management
- Service-Oriented Architecture and Web Services
- Sentiment Analysis and Opinion Mining
- Data Mining Algorithms and Applications
- Multimodal Machine Learning Applications
- Personal Information Management and User Behavior
- Misinformation and Its Impacts
- Spam and Phishing Detection
- Peer-to-Peer Network Technologies
- Big Data and Business Intelligence
- Advanced Text Analysis Techniques
- Big Data Technologies and Applications
- Logic, Reasoning, and Knowledge
- Advanced Data Storage Technologies
- Mental Health via Writing
- Biomedical Text Mining and Ontologies
- Geographic Information Systems Studies
- Web Visibility and Informetrics
Alpha Omega Alpha Medical Honor Society
2022-2024
Menlo School
2020-2024
Amazon (United States)
2024
Stanford University
2023
META Health
2022-2023
Cornell University
2023
Georgia Institute of Technology
2022
Meta (United States)
2020-2022
University of Washington
2000-2021
Meta (Israel)
2021
Problems that involve interacting with humans, such as natural language understanding, have not proven to be solvable by concise, neat formulas like F = ma. Instead, the best approach appears to be to embrace the complexity of the domain and to address it by harnessing the power of data: if other humans engage in tasks that generate large amounts of unlabeled, noisy data, new algorithms can be used to build high-quality models from that data.
The practice of crowdsourcing is transforming the Web and giving rise to a new field.
Ontologies play a prominent role on the Semantic Web. They make possible the widespread publication of machine-understandable data, opening myriad opportunities for automated information processing. However, because of the Web's distributed nature, data on it will inevitably come from many different ontologies. Information processing across ontologies is not possible without knowing the semantic mappings between their elements. Manually finding such mappings is tedious, error-prone, and clearly not possible at Web scale. Hence, the development...
A data-integration system provides access to a multitude of data sources through a single mediated schema. A key bottleneck in building such systems has been the laborious manual construction of semantic mappings between the source schemas and the mediated schema. We describe LSD, a system that employs and extends current machine-learning techniques to semi-automatically find such mappings. LSD first asks the user to provide mappings for a small set of sources, then uses these mappings together with the sources to train a set of learners. Each learner exploits a different type of information, either...
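The multi-learner idea in this abstract can be illustrated with a toy sketch: several base "learners" each score how well a source column matches a mediated-schema element, and a meta-learner combines their scores. The learners, weights, and scoring rules below are hypothetical stand-ins, not LSD's actual implementation.

```python
def name_learner(source_col, mediated_elem):
    """Score name similarity via a toy Jaccard over character trigrams."""
    grams = lambda s: {s[i:i + 3] for i in range(len(s) - 2)} or {s}
    a, b = grams(source_col["name"].lower()), grams(mediated_elem.lower())
    return len(a & b) / len(a | b)

def content_learner(source_col, mediated_elem):
    """Score data values with a toy rule: numeric columns suggest 'price'."""
    numeric = all(v.replace(".", "", 1).isdigit() for v in source_col["values"])
    return 1.0 if (numeric and mediated_elem == "price") else 0.0

def combined_score(source_col, mediated_elem, weights=(0.6, 0.4)):
    """Meta-learner: a weighted sum of the base learners' predictions."""
    scores = (name_learner(source_col, mediated_elem),
              content_learner(source_col, mediated_elem))
    return sum(w * s for w, s in zip(weights, scores))

def best_match(source_col, mediated_schema):
    """Map a source column to the highest-scoring mediated-schema element."""
    return max(mediated_schema, key=lambda e: combined_score(source_col, e))
```

For example, a source column named `list-price` with numeric values would score higher against a hypothetical mediated element `price` than against `location`, because both learners agree.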
The development of relational database management systems served to focus the data management community for decades, with spectacular results. In recent years, however, the rapidly expanding demands of "data everywhere" have led to a field comprised of interesting and productive efforts, but without a central focus or coordinated agenda. The most acute information management challenges today stem from organizations (e.g., enterprises, government agencies, libraries, "smart" homes) relying on a large number of diverse, interrelated data sources,...
The World-Wide Web consists of a huge number of unstructured documents, but it also contains structured data in the form of HTML tables. We extracted 14.1 billion tables from Google's general-purpose web crawl, and used statistical classification techniques to find an estimated 154M that contain high-quality relational data. Because each table has its own "schema" of labeled, typed columns, each such table can be considered a small database. The resulting corpus of databases is larger than any other we are aware of, by at...
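The filtering step described above can be sketched with a toy classifier: score an HTML-extracted table on simple features (consistent row widths, typed columns) and keep it only if the score clears a threshold. The features and threshold here are illustrative assumptions, not those of the actual system.

```python
def is_relational(table, threshold=0.5):
    """table: list of rows, each a list of cell strings."""
    if len(table) < 2 or not table[0]:
        return False  # too small to be a data table
    width = len(table[0])
    # Feature 1: every row has the same number of cells.
    consistent = all(len(row) == width for row in table)
    # Feature 2: fraction of columns whose body cells share a type
    # (all numeric or all text), suggesting labeled, typed columns.
    def col_typed(j):
        body = [row[j] for row in table[1:]]
        numeric = [c.replace(".", "", 1).isdigit() for c in body]
        return all(numeric) or not any(numeric)
    typed_frac = sum(col_typed(j) for j in range(width)) / width if consistent else 0.0
    score = 0.5 * consistent + 0.5 * typed_frac
    return score >= threshold
```

A layout table with ragged rows fails feature 1 and is discarded, while a table with a header row over uniformly typed columns passes.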
Reference reconciliation is the problem of identifying when different references (i.e., sets of attribute values) in a dataset correspond to the same real-world entity. Most previous literature assumed a single class of references that had a fair number of attributes (e.g., research publications). We consider complex information spaces: our references belong to multiple related classes and each reference may have very few attribute values. A prime example of such a space is Personal Information Management, where the goal is to provide a coherent view of all the information on...
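A minimal sketch of the reconciliation task, under simplifying assumptions: references are attribute dictionaries, two references are merged when their shared attributes agree, and merges are grouped with a union-find pass. The similarity rule and threshold are hypothetical, far simpler than the paper's propagation of evidence across related classes.

```python
def similarity(ref_a, ref_b):
    """Fraction of attributes defined by both references whose values agree."""
    shared = set(ref_a) & set(ref_b)
    if not shared:
        return 0.0
    return sum(ref_a[k] == ref_b[k] for k in shared) / len(shared)

def reconcile(refs, threshold=0.9):
    """Greedy union-find grouping of references above a similarity threshold."""
    parent = list(range(len(refs)))
    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i
    for i in range(len(refs)):
        for j in range(i + 1, len(refs)):
            if similarity(refs[i], refs[j]) >= threshold:
                parent[find(j)] = find(i)
    groups = {}
    for i in range(len(refs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Note how sparse the evidence can be: two references sharing only an email address are merged even though one of them carries no name at all.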
This paper introduces ULDBs, an extension of relational databases with simple yet expressive constructs for representing and manipulating both lineage and uncertainty. Uncertain data and data lineage are two important areas of data management that have been considered extensively in isolation; however, many applications require the two features in tandem. Fundamentally, lineage enables a simple and consistent representation of uncertain data, it correlates the uncertainty in query results with the uncertainty in the input data, and processing the two together presents computational benefits over...
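A toy rendering of the idea: an uncertain tuple carries a set of mutually exclusive alternatives, and a derived tuple records lineage pointing at the base alternative it came from, so its uncertainty stays correlated with the input. The structures and the `select_eq` "query" below are illustrative only, not ULDB syntax.

```python
# Base uncertain tuple: a sighting that was either a crow or a raven.
saw = {"id": "s1", "alternatives": [("crow",), ("raven",)]}

def select_eq(xtuple, value):
    """Keep alternatives equal to value, recording a lineage pointer."""
    return [(alt, (xtuple["id"], i))          # (data, lineage pointer)
            for i, alt in enumerate(xtuple["alternatives"])
            if alt == (value,)]

result = select_eq(saw, "crow")
# The result tuple exists exactly when alternative 0 of s1 is the true one;
# the lineage pointer makes that dependency explicit instead of losing it.
```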
Intuitively, data management and data integration tools should be well-suited for exchanging information in a semantically meaningful way. Unfortunately, they suffer from two significant problems: they typically require comprehensive schema design before they can be used to store or share information, and they are difficult to extend because schema evolution is heavyweight and may break backwards compatibility. As a result, many small-scale data sharing tasks are more easily facilitated by non-database-oriented tools that have little support...
Semantic integration has been a long-standing challenge for the database community. It has received steady attention over the past two decades, and has now become a prominent area of research. In this article, we first review applications that require semantic integration and discuss the difficulties underlying the process. We then describe recent progress and identify open research issues. We focus in particular on schema matching, a topic that has received much attention in the community, but we also discuss data matching (for example, tuple deduplication) and issues beyond matching...
Creating semantic matches between disparate data sources is fundamental to numerous data sharing efforts. Manually creating matches is extremely tedious and error-prone. Hence many recent works have focused on automating the matching process. To date, however, virtually all of these works deal only with one-to-one (1-1) matches, such as address = location. They do not consider the important class of more complex matches, such as address = concat(city, state) and room-price = room-rate * (1 + tax-rate). We describe the iMAP system, which semi-automatically...
Schema matching is the problem of identifying corresponding elements in different schemas. Discovering these correspondences or matches is inherently difficult to automate. Past solutions have proposed a principled combination of multiple algorithms. However, they sometimes perform rather poorly due to a lack of sufficient evidence in the schemas being matched. In this paper we show how a corpus of schemas and mappings can be used to augment the evidence about the schemas being matched, so they can be matched better. Such a corpus typically contains multiple schemas that model similar concepts...
This paper explores an inherent tension in modeling and querying uncertain data: simple, intuitive representations of uncertain data capture many application requirements, but these representations are generally incomplete, in that standard operations over the data may result in unrepresentable types of uncertainty. Complete models are theoretically attractive, but they can be nonintuitive and more complex than necessary for many applications. To address this tension, we propose a two-layer approach to managing uncertain data: an underlying logical model that is complete,...
The most acute information management challenges today stem from organizations relying on a large number of diverse, interrelated data sources, but having no means of managing them in a convenient, integrated, or principled fashion. These challenges arise in enterprise and government data management, digital libraries, "smart" homes and personal information management. We have proposed dataspaces as an abstraction for these diverse applications, and DataSpace Support Platforms (DSSPs) as the systems that should be built to provide the required...
The Deep Web, i.e., content hidden behind HTML forms, has long been acknowledged as a significant gap in search engine coverage. Since it represents a large portion of the structured data on the Web, accessing Deep-Web content has been a long-standing challenge for the database community. This paper describes a system for surfacing Deep-Web content, i.e., pre-computing submissions for each form and adding the resulting pages into a search-engine index. The results of our surfacing have been incorporated into Google and today drive more than a thousand queries per second to Deep-Web content. Surfacing the Deep Web poses...
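The pre-computation step can be sketched as follows: given a form and candidate values for each of its inputs, enumerate submissions from the cross product of those values, producing URLs that can be crawled and indexed like ordinary pages. The form URL, field names, and value lists below are invented for illustration; a real system must also select informative value combinations rather than enumerating blindly.

```python
from itertools import product
from urllib.parse import urlencode

def surface(action_url, candidate_values, limit=100):
    """Enumerate up to `limit` GET submissions for a form."""
    fields = sorted(candidate_values)
    urls = []
    for combo in product(*(candidate_values[f] for f in fields)):
        if len(urls) >= limit:
            break  # a real system budgets submissions per form
        urls.append(action_url + "?" + urlencode(dict(zip(fields, combo))))
    return urls

urls = surface("http://example.com/search",
               {"make": ["honda", "ford"], "year": ["2008", "2009"]})
```

Each generated URL denotes one filled-in form submission, so the two fields with two candidate values apiece yield four surfaced pages.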
The Web offers a corpus of over 100 million tables [6], but the meaning of each table is rarely explicit from the table itself. Header rows exist in few cases, and even when they do, the attribute names are typically useless. We describe a system that attempts to recover the semantics of tables by enriching them with additional annotations. Our annotations facilitate operations such as searching for tables and finding related tables. To annotate tables, we leverage a database of class labels and relationships automatically extracted from the Web. The database of classes has very...
The Semantic Web envisions a World Wide Web in which data is described with rich semantics and applications can pose complex queries. To this point, researchers have defined new languages for specifying the meanings of concepts and developed techniques for reasoning about them, using RDF as the underlying data model. For the Semantic Web to flourish, it needs to be able to accommodate the huge amounts of existing data and the applications operating on them. To achieve this, we are faced with two problems. First, most of the world's available data is not in RDF but in XML; XML and the applications consuming it rely not only on the domain structure...
As XML has developed over the past few years, its role has expanded beyond its original domain as a semantics-preserving markup language for online documents, and it is now also the de facto format for interchanging data between heterogeneous systems. Data sources export "views" of their data, and other systems can directly import or query these views. As a result, there has been great interest in languages and systems for expressing queries over XML data, whether it is stored in a repository or generated as a view over some other storage format.