- Data Quality and Management
- Semantic Web and Ontologies
- Data Mining Algorithms and Applications
- Advanced Database Systems and Queries
- Data Management and Algorithms
- Big Data and Business Intelligence
- Scientific Computing and Data Management
- Anomaly Detection Techniques and Applications
- Topic Modeling
- Image Retrieval and Classification Techniques
- Biomedical Text Mining and Ontologies
- Big Data Technologies and Applications
- Data Stream Mining Techniques
- Advanced Image and Video Retrieval Techniques
- Privacy-Preserving Technologies in Data
- Time Series Analysis and Forecasting
- Rough Sets and Fuzzy Logic
- Mobile Crowdsensing and Crowdsourcing
- Machine Learning and Data Classification
- Advanced Text Analysis Techniques
- Misinformation and Its Impacts
- Explainable Artificial Intelligence (XAI)
- Remote-Sensing Image Classification
- Web Data Mining and Analysis
- Data Visualization and Analytics
- Acteurs, Ressources et Territoires dans le Développement (2015-2025)
- Institut de Recherche pour le Développement (2016-2025)
- UMR Espace-Dev (2016-2025)
- Office of Scientific and Technical Information (2024)
- Oak Ridge National Laboratory (2024)
- Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier (2024)
- Aix-Marseille Université (2012-2023)
- Hamad bin Khalifa University (2015-2021)
- Laboratoire d'Informatique et Systèmes (2018-2021)
- Centre National de la Recherche Scientifique (2012-2021)
Many data management applications, such as setting up Web portals, managing enterprise data, managing community data, and sharing scientific data, require integrating data from multiple sources. Each of these sources provides a set of values, and different sources can often provide conflicting values. To present quality data to users, it is critical that data integration systems resolve conflicts and discover true values. Typically, we expect a true value to be provided by more sources than any particular false one, so we can take the value provided by the majority of sources as the truth. Unfortunately, a false value can be spread...
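As a minimal sketch of the majority-voting baseline mentioned above (toy data and source names are hypothetical; it is exactly this heuristic that copying between sources can defeat):

```python
from collections import Counter

# Hypothetical claims: five sources each report a value for one data item.
claims = {
    "source_A": "NYC",
    "source_B": "NYC",
    "source_C": "Boston",  # a false value copied among sources would
    "source_D": "Boston",  # inflate its count and fool the majority
    "source_E": "NYC",
}

def majority_vote(claims):
    """Return the value claimed by the most sources (ties broken arbitrarily)."""
    return Counter(claims.values()).most_common(1)[0][0]

print(majority_vote(claims))  # -> "NYC"
```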
Modern information management applications often require integrating data from a variety of sources, some of which may copy or buy data from other sources. When these sources model a dynamically changing world (e.g., people's contact information changes over time, restaurants open and go out of business), they can provide out-of-date data. Errors can also creep into data when sources are updated often. Given erroneous data provided by different, possibly dependent, sources, it is challenging for data integration systems to discover the true values. Straightforward...
Web technologies have enabled data sharing between sources but have also simplified copying (and often publishing without proper attribution). The copying relationships can be complex: some sources copy from multiple sources on different subsets of data; some co-copy from the same source, and some copy transitively from another. Understanding such relationships is desirable both for business purposes and for improving many key components in data integration, such as resolving conflicts across various sources, reconciling distinct references to the same real-world entity, and efficiently...
Various computational procedures or constraint-based methods for data repairing have been proposed over the last decades to identify errors and, when possible, correct them. However, these approaches have several limitations, including scalability and the quality of the values to be used as replacements for the errors. In this paper, we propose a new approach that is based on maximizing the likelihood of replacement data given the data distribution, which can be modeled using statistical machine learning techniques. This is a novel approach combining machine learning and likelihood methods for cleaning dirty...
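A minimal sketch of likelihood-based repair, assuming a simple empirical conditional model estimated from the rows not flagged as dirty (the paper's statistical model is far richer; the table, column names, and flagged indices here are hypothetical):

```python
import pandas as pd

# Hypothetical toy table: repair `city` in rows flagged as erroneous by
# picking the replacement with maximum empirical likelihood given `zipcode`.
df = pd.DataFrame({
    "zipcode": ["10001", "10001", "10001", "60601", "60601"],
    "city":    ["New York", "New York", "NY?", "Chicago", "Chicago"],
})
flagged = [2]  # indices of cells detected as errors (detection not shown)

clean = df.drop(index=flagged)
for i in flagged:
    zip_code = df.loc[i, "zipcode"]
    # empirical P(city | zipcode) from the clean rows sharing this zipcode
    likelihoods = clean.loc[clean["zipcode"] == zip_code, "city"].value_counts(normalize=True)
    if not likelihoods.empty:
        df.loc[i, "city"] = likelihoods.idxmax()  # maximum-likelihood repair

print(df)
```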
Quantitative Data Cleaning (QDC) is the use of statistical and other analytical techniques to detect, quantify, and correct data quality problems (or glitches). Current QDC approaches focus on addressing each category of glitch individually. However, in real-world data, different types of glitches co-occur in complex patterns. These patterns and interactions between glitches offer valuable clues for developing effective domain-specific quantitative cleaning strategies. In this paper, we address the shortcomings of extant...
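As an illustration of glitch co-occurrence analysis, a small sketch using two hypothetical detectors (missingness and robust median/MAD outliers) and counting the row-level patterns they form together:

```python
import pandas as pd

# Hypothetical data with two glitch types that sometimes co-occur.
df = pd.DataFrame({
    "age":    [34, None, 29, 31, 310, None, 28],
    "salary": [55000, 61000, None, 58000, 9_900_000, None, 57000],
})

# Per-cell glitch indicators: missingness and robust (median/MAD) outliers.
missing = df.isna()
med = df.median()
mad = (df - med).abs().median()
outlier = ((df - med).abs() / mad) > 3.5

# Which glitch types appear together on each row, and how often.
patterns = pd.DataFrame({
    "has_missing": missing.any(axis=1),
    "has_outlier": outlier.any(axis=1),
})
print(patterns.value_counts())  # frequency of each co-occurrence pattern
```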
A fundamental problem in data fusion is to determine the veracity of multi-source data in order to resolve conflicts. While previous work on truth discovery has proved to be useful in practice for specific settings, source behaviors, or data set characteristics, there has been limited systematic comparison of the competing methods in terms of efficiency, usability, and repeatability. We remedy this deficit by providing a comprehensive review of 12 state-of-the-art truth discovery algorithms. We provide reference implementations and an in-depth...
Social networks and the Web in general are characterized by multiple information sources often claiming conflicting data values. Data veracity is hard to estimate, especially when there is no prior knowledge about the sources or claims, and in time-dependent scenarios where initially very few observers can report the first information. Despite the wide set of recently proposed truth discovery approaches, no "one-fits-all" solution emerges for estimating data veracity in on-line and open contexts. However, analyzing the space of disagreeing sources might be...
Multimodal AI models are increasingly used in fields like healthcare, finance, and autonomous driving, where information is drawn from multiple sources or modalities such as images, texts, audios, and videos. However, effectively managing uncertainty - arising from noise, insufficient evidence, or conflicts between modalities - is crucial for reliable decision-making. Current uncertainty-aware ML methods leveraging, for example, evidence averaging or evidence accumulation underestimate uncertainties in high-conflict scenarios. Moreover,...
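A toy illustration of the averaging problem, with a Dempster-Shafer-style conflict mass as one way to make the disagreement explicit (numbers are hypothetical; this is not the paper's method):

```python
import numpy as np

# Hypothetical per-modality class probabilities for one input (2 classes).
p_image = np.array([0.95, 0.05])  # image modality: confident class 0
p_text  = np.array([0.05, 0.95])  # text modality: confident class 1

# Plain averaging hides the disagreement: the fused distribution looks
# merely "uncertain", not "conflicting".
p_avg = (p_image + p_text) / 2
print(p_avg)  # [0.5, 0.5]

# A Dempster-Shafer-style conflict mass surfaces it: the probability that
# the two modalities support different classes.
conflict = 1.0 - float(p_image @ p_text)
print(round(conflict, 3))  # 0.905 -> high conflict, low reliability
```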
The Web has enabled the availability of a huge amount of useful information, but has also eased the ability to spread false information and rumors across multiple sources, making it hard to distinguish between what is true and what is not. Recent examples include the premature Steve Jobs obituary, the second bankruptcy of United Airlines, the creation of Black Holes by the operation of the Large Hadron Collider, etc. Since it is important to permit the expression of dissenting and conflicting opinions, it would be a fallacy to try to ensure that the Web provides only consistent...
Functional dependencies (FDs) play an important role in maintaining data quality. They can be used to enforce consistency and to guide repairs over a database. In this work, we investigate the problem of missing values and its impact on FD discovery. When using existing FD discovery algorithms, some genuine FDs may not be detected precisely due to missing values, or non-genuine FDs may be discovered even though they are caused by missing values with certain NULL semantics. We define a notion of genuineness and propose algorithms to compute the genuineness score of a discovered FD. This...
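A minimal sketch of how NULL semantics changes whether a discovered FD looks genuine, using a hypothetical relation and two readings of NULL:

```python
import pandas as pd

# Hypothetical relation; does the FD zip -> city hold?
df = pd.DataFrame({
    "zip":  ["10001", "10001", "60601", None],
    "city": ["New York", None, "Chicago", "Chicago"],
})

def fd_holds(df, lhs, rhs, null_equals_null=True):
    """Check the FD lhs -> rhs under a chosen NULL semantics.

    null_equals_null=True treats NULLs as equal to each other (tuples with
    NULL group together); False drops tuples with NULLs, the skeptical
    reading under which NULL matches nothing.
    """
    data = df if null_equals_null else df.dropna(subset=[lhs, rhs])
    groups = data.groupby(lhs, dropna=False)[rhs]
    # The FD holds if each lhs group has at most one distinct rhs value.
    return bool((groups.nunique(dropna=False) <= 1).all())

print(fd_holds(df, "zip", "city", null_equals_null=True))   # False
print(fd_holds(df, "zip", "city", null_equals_null=False))  # True
```

The same FD is violated under one semantics and satisfied under the other, which is the ambiguity a genuineness score has to quantify.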
Data cleaning and preparation has been a long-standing challenge in data science to avoid incorrect results and misleading conclusions obtained from dirty data. For a given dataset and machine learning-based task, a plethora of preprocessing techniques and alternative data curation strategies may lead to dramatically different outputs with unequal quality performance. Most current work on automated machine learning, however, focuses on developing either automated algorithms or user-guided systems...
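A small sketch of the underlying point: alternative curation strategies for the same dataset and task can be compared by downstream model quality (synthetic data; the strategy names are scikit-learn's SimpleImputer options):

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X[rng.random(X.shape) < 0.3] = np.nan  # inject 30% missing values

# Alternative curation strategies, scored on the same task:
for strategy in ("mean", "median", "most_frequent"):
    pipe = make_pipeline(SimpleImputer(strategy=strategy), LogisticRegression())
    print(strategy, round(cross_val_score(pipe, X, y, cv=5).mean(), 3))
```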
On the Web, a massive amount of user-generated content is available through various channels (e.g., texts, tweets, Web tables, databases, multimedia-sharing platforms, etc.). Conflicting information...
Object-based image analysis (OBIA) has been widely adopted as a common paradigm to deal with very high-resolution remote sensing images. Nevertheless, OBIA methods strongly depend on the results of the segmentation step. Many segmentation quality metrics have been proposed. Supervised metrics give an accurate quality estimation but require a ground-truth reference. Unsupervised metrics only make use of intrinsic image and segment properties; yet most of them depend on the application and do not handle well the variability of objects in the image. Furthermore, few were developed in the remote sensing context, and mainly...
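As an illustration, one classic unsupervised metric of the kind discussed here is area-weighted intra-segment variance, which needs no ground truth (toy single-band image below; a real OBIA metric would work on multi-band segments):

```python
import numpy as np

def weighted_intra_segment_variance(image, labels):
    """Unsupervised segmentation quality: area-weighted variance of pixel
    values inside each segment (lower = more homogeneous segments).
    No ground-truth reference is needed, unlike supervised metrics."""
    total = 0.0
    for seg_id in np.unique(labels):
        pixels = image[labels == seg_id]
        total += pixels.size * pixels.var()
    return total / image.size

# Hypothetical 4x4 single-band image with a two-segment labeling.
img = np.array([[1, 1, 9, 9],
                [1, 1, 9, 9],
                [1, 2, 8, 9],
                [1, 1, 9, 9]], dtype=float)
labels = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1],
                   [0, 0, 1, 1],
                   [0, 0, 1, 1]])
print(weighted_intra_segment_variance(img, labels))
```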
Error detection is the process of identifying problematic data cells that are different from their ground truth. Functional dependencies (FDs) have been widely studied in support of this process. Oftentimes, it is assumed that the FDs are given by experts. Unfortunately, it is usually hard and expensive for experts to define such FDs. In addition, automatic FD profiling over dirty data in order to find correct FDs is known to be a hard problem. In this paper, we propose an end-to-end solution to detect FD-detectable errors in data. The broad intuition...
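A minimal sketch of FD-based error detection, flagging cells that disagree with their group under a hypothetical FD zipcode -> city (a real system must also cope with the dirty-profiling problem noted above):

```python
import pandas as pd

# Hypothetical table and FD: zipcode -> city. Cells that break the FD
# within their group are flagged as suspicious.
df = pd.DataFrame({
    "zipcode": ["10001", "10001", "10001", "60601"],
    "city":    ["New York", "New York", "Boston", "Chicago"],
})

def fd_violation_mask(df, lhs, rhs):
    """Flag rhs cells that disagree with the majority rhs value of their
    lhs group -- a simple stand-in for FD-based error detection."""
    majority = df.groupby(lhs)[rhs].transform(lambda s: s.mode().iloc[0])
    return df[rhs] != majority

print(df[fd_violation_mask(df, "zipcode", "city")])  # the "Boston" row
```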
Many emerging applications, from domains such as healthcare and oil & gas, require several data processing systems for complex analytics. This demo paper showcases a framework that provides multi-platform task execution for such applications. It features a three-layer abstraction and a new query optimization approach for multi-platform settings. We will demonstrate the strengths of the system using real-world scenarios from three different areas, namely machine learning, data cleaning, and data fusion.
Through extensive experience developing and explaining machine learning (ML) applications for real-world domains, we have learned that ML models are only as interpretable as their features. Even simple, highly interpretable model types such as regression can be difficult or impossible to understand if they use uninterpretable features. Different users, especially those using models for decision-making in their domains, may require different levels of feature interpretability. Furthermore, based on our experiences, we claim that the term "interpretable...
It is widely accepted that data preparation is one of the most time-consuming steps of the machine learning (ML) lifecycle. It is also one of the most important steps, as data quality directly influences the quality of a model. In this tutorial, we will discuss the importance and role of exploratory data analysis (EDA) and visualisation techniques in finding data quality issues relevant for data preparation and for building ML pipelines. We will cover the latest advances in these fields and bring out areas that need innovation. To make the tutorial actionable for practitioners, we will discuss popular open-source packages that one can get...
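A few lines of EDA of the kind such a tutorial covers, surfacing common quality issues with plain pandas (toy data; dedicated profiling packages go much further):

```python
import pandas as pd

# Quick EDA pass flagging typical data quality issues before ML preparation.
df = pd.DataFrame({
    "age":  [34, 29, None, 310, 29],
    "city": ["NYC", "nyc", "NYC", None, "nyc"],
})

print(df.isna().mean())                       # missingness rate per column
print(df.describe())                          # ranges reveal the 310 outlier
print(df["city"].str.lower().value_counts())  # casing inconsistencies ("NYC" vs "nyc")
print(df.duplicated().sum())                  # exact duplicate rows
```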
Anomaly detection on time series data is increasingly common across various industrial domains that monitor metrics in order to prevent potential accidents and economic losses. However, a scarcity of labeled data and ambiguous definitions of anomalies can complicate these efforts. Recent unsupervised machine learning methods have made remarkable progress in tackling this problem using either single-timestamp predictions or time series reconstructions. While traditionally considered separately, these methods are not mutually...
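A toy sketch of the two error signals and a simple combination (the differencing and smoothing steps are stand-ins for a learned forecaster and an autoencoder; hybrid methods learn both jointly):

```python
import numpy as np

# Toy series with a spike at index 4.
series = np.array([1.0, 1.1, 0.9, 1.0, 5.0, 1.0, 1.1, 0.9])

# "Prediction" error: deviation from the previous value (stand-in for a
# learned one-step-ahead forecaster).
pred_err = np.abs(np.diff(series, prepend=series[0]))

# "Reconstruction" error: deviation from a smoothed version of the series
# (stand-in for an autoencoder's reconstruction).
recon = np.convolve(series, np.ones(3) / 3, mode="same")
recon_err = np.abs(series - recon)

# Combine the two signals (average of min-max-normalized errors).
def norm(e):
    return (e - e.min()) / (e.max() - e.min() + 1e-9)

score = (norm(pred_err) + norm(recon_err)) / 2
print(np.argsort(score)[::-1][:2])  # indices with the highest anomaly scores
```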
Ensuring and maximizing the quality and integrity of information is a crucial process for today's enterprise information systems (EIS). It requires a clear understanding of the interdependencies between the dimensions characterizing the quality of data (QoD), the quality of the conceptual model (QoM) of the database, keystone of the EIS, and the quality of data management and integration processes (QoP). The improvement of one dimension (such as accuracy or expressiveness) may have negative consequences on other dimensions (e.g., freshness or completeness of data). In this paper, we briefly present a framework, called...
Estimation of data veracity is recognized as one of the grand challenges of big data. Typically, the goal of truth discovery is to determine the veracity of multi-source, conflicting data and to return, as outputs, a truth label and a confidence score for each value, along with the trustworthiness of the source claiming it. Although a plethora of methods has been proposed, it is unlikely that one technique dominates all the others across data sets. Furthermore, performance evaluation entirely depends on the availability of labeled ground truth (i.e., values whose veracity has been manually checked). In the context of Big...