Laure Berti‐Équille

ORCID: 0000-0002-8046-0570
Research Areas
  • Data Quality and Management
  • Semantic Web and Ontologies
  • Data Mining Algorithms and Applications
  • Advanced Database Systems and Queries
  • Data Management and Algorithms
  • Big Data and Business Intelligence
  • Scientific Computing and Data Management
  • Anomaly Detection Techniques and Applications
  • Topic Modeling
  • Image Retrieval and Classification Techniques
  • Biomedical Text Mining and Ontologies
  • Big Data Technologies and Applications
  • Data Stream Mining Techniques
  • Advanced Image and Video Retrieval Techniques
  • Privacy-Preserving Technologies in Data
  • Time Series Analysis and Forecasting
  • Rough Sets and Fuzzy Logic
  • Mobile Crowdsensing and Crowdsourcing
  • Machine Learning and Data Classification
  • Advanced Text Analysis Techniques
  • Misinformation and Its Impacts
  • Explainable Artificial Intelligence (XAI)
  • Remote-Sensing Image Classification
  • Web Data Mining and Analysis
  • Data Visualization and Analytics

Acteurs, Ressources et Territoires dans le Développement
2015-2025

Institut de Recherche pour le Développement
2016-2025

UMR Espace-Dev
2016-2025

Office of Scientific and Technical Information
2024

Oak Ridge National Laboratory
2024

Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier
2024

Aix-Marseille Université
2012-2023

Hamad bin Khalifa University
2015-2021

Laboratoire d’Informatique et Systèmes
2018-2021

Centre National de la Recherche Scientifique
2012-2021

Many data management applications, such as setting up Web portals, managing enterprise data, managing community data, and sharing scientific data, require integrating data from multiple sources. Each of these sources provides a set of values, and different sources can often provide conflicting values. To present quality data to users, it is critical that data integration systems resolve conflicts and discover true values. Typically, we expect a true value to be provided by more sources than any particular false one, so we can take the value provided by the majority of sources as the truth. Unfortunately, a false value can be spread...

10.14778/1687627.1687690 article EN Proceedings of the VLDB Endowment 2009-08-01
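
The majority-voting baseline that the abstract describes is easy to sketch; a minimal Python illustration (with hypothetical claims data, not code from the paper):

```python
from collections import Counter

def majority_vote(claims):
    """Pick, for each data item, the value asserted by the most sources.

    claims: dict mapping item -> list of (source, value) pairs.
    Returns: dict mapping item -> winning value.
    """
    truths = {}
    for item, assertions in claims.items():
        counts = Counter(value for _, value in assertions)
        truths[item] = counts.most_common(1)[0][0]
    return truths

# Hypothetical example: three sources disagree on an affiliation.
claims = {
    "affiliation(John)": [("S1", "MIT"), ("S2", "MIT"), ("S3", "UW")],
}
print(majority_vote(claims))  # {'affiliation(John)': 'MIT'}
```

As the abstract goes on to note, copying between sources breaks this baseline: a false value replicated by many dependent sources can win the vote.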

Modern information management applications often require integrating data from a variety of sources, some of which may copy or buy data from other sources. When these sources model a dynamically changing world (e.g., people's contact information changes over time, restaurants open and go out of business), sources can provide out-of-date data. Errors can also creep into data when sources are updated often. Given erroneous data provided by different, possibly dependent, sources, it is challenging for data integration systems to discover the true values. Straightforward...

10.14778/1687627.1687691 article EN Proceedings of the VLDB Endowment 2009-08-01

Web technologies have enabled data sharing between sources but have also simplified copying (and often publishing without proper attribution). The copying relationships can be complex: some sources copy from multiple sources on different subsets of data; some co-copy from the same source, and others copy transitively from one another. Understanding such relationships is desirable both for business purposes and for improving many key components in data integration, such as resolving conflicts across various sources, reconciling distinct references to the same real-world entity, and efficiently...

10.14778/1920841.1921008 article EN Proceedings of the VLDB Endowment 2010-09-01
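
As a toy illustration of a copy signal: two sources that share the same rare (and especially the same false) values are suspect. The overlap measure below is a naive stand-in for the paper's probabilistic dependence analysis:

```python
def overlap(claims_a, claims_b):
    """Jaccard overlap of (item, value) claims made by two sources.
    High overlap, especially on otherwise rare values, hints at copying."""
    a, b = set(claims_a), set(claims_b)
    return len(a & b) / len(a | b)

s1 = {("capital(FR)", "Paris"), ("capital(AU)", "Sydney")}   # shared error
s2 = {("capital(FR)", "Paris"), ("capital(AU)", "Sydney")}
s3 = {("capital(FR)", "Paris"), ("capital(AU)", "Canberra")}
print(overlap(s1, s2))  # 1.0: suspicious, they share the same mistake
print(overlap(s1, s3))  # ~0.33: agreement only on the common true value
```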

Various computational procedures or constraint-based methods for data repairing have been proposed over the last decades to identify errors and, when possible, correct them. However, these approaches have several limitations, including scalability and the quality of the values to be used as replacements for the errors. In this paper, we propose a new approach that is based on maximizing the likelihood of replacement data given the data distribution, which can be modeled using statistical machine learning techniques. This is a novel approach combining machine learning and likelihood methods for cleaning dirty...

10.1145/2463676.2463706 preprint EN 2013-06-22
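
A minimal sketch of likelihood-based repair, assuming a simple empirical conditional model with add-one smoothing; the function and data are illustrative, not the paper's actual method:

```python
import pandas as pd

def repair_cell(df, row_idx, target, context_cols):
    """Replace a suspect cell with the value that maximizes its
    empirical conditional likelihood given the row's context attributes.
    A toy stand-in for statistically modeled, likelihood-based repair."""
    row = df.loc[row_idx]
    best_value, best_score = None, -1.0
    for candidate in df[target].dropna().unique():
        # P(target = candidate | context) estimated by counting,
        # with add-one smoothing to avoid zero probabilities.
        match = (df[context_cols] == row[context_cols]).all(axis=1)
        num = (df.loc[match, target] == candidate).sum() + 1
        den = match.sum() + df[target].nunique()
        score = num / den
        if score > best_score:
            best_value, best_score = candidate, score
    return best_value

# Hypothetical dirty table: city should be determined by zip code.
df = pd.DataFrame({"zip": ["98101", "98101", "98101", "10001"],
                   "city": ["Seattle", "Seattle", "Sea_tle", "New York"]})
print(repair_cell(df, 2, "city", ["zip"]))  # -> 'Seattle'
```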

Quantitative Data Cleaning (QDC) is the use of statistical and other analytical techniques to detect, quantify, and correct data quality problems (or glitches). Current QDC approaches focus on addressing each category of glitch individually. However, in real-world data, different types of glitches co-occur in complex patterns. These patterns and interactions between glitches offer valuable clues for developing effective domain-specific quantitative cleaning strategies. In this paper, we address the shortcomings of extant...

10.1109/icde.2011.5767864 preprint EN 2011-04-01
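
To make "co-occurring glitches" concrete, a toy sketch that tabulates which glitch types appear together on the same record (the detectors are deliberately simplistic placeholders):

```python
from collections import Counter
from itertools import combinations

# Toy records: (value, unit) pairs; None marks a missing entry.
records = [(12.0, "kg"), (None, "kg"), (-5.0, None), (9999.0, "kg")]

def glitches(record):
    """Return the set of glitch types detected on one record."""
    value, unit = record
    found = set()
    if value is None or unit is None:
        found.add("missing")
    if value is not None and value < 0:
        found.add("negative")
    if value is not None and value > 1000:
        found.add("outlier")
    return found

# Tabulate single glitches and co-occurring pairs across records.
counts = Counter()
for r in records:
    g = sorted(glitches(r))
    counts.update((t,) for t in g)
    counts.update(combinations(g, 2))

print(counts)  # ('missing', 'negative') co-occurring is the interesting clue
```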

A fundamental problem in data fusion is to determine the veracity of multi-source data in order to resolve conflicts. While previous work in truth discovery has proved to be useful in practice for specific settings, sources' behavior, or data set characteristics, there has been limited systematic comparison of the competing methods in terms of efficiency, usability, and repeatability. We remedy this deficit by providing a comprehensive review of 12 state-of-the-art algorithms for truth discovery. We provide reference implementations and an in-depth...

10.48550/arxiv.1409.6428 preprint EN cc-by arXiv (Cornell University) 2014-01-01
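
Most of the surveyed algorithms share an iterative core: infer value truth from source trust, then re-estimate trust from agreement with the inferred truth. A generic sketch of that loop (not any specific one of the 12 algorithms):

```python
from collections import defaultdict

def iterative_truth_discovery(claims, iterations=10):
    """claims: list of (source, item, value) triples.
    Alternates weighted voting per item with source trust updates."""
    sources = {s for s, _, _ in claims}
    trust = {s: 1.0 for s in sources}
    truths = {}
    for _ in range(iterations):
        # Step 1: weighted vote per item using current source trust.
        votes = defaultdict(lambda: defaultdict(float))
        for s, item, v in claims:
            votes[item][v] += trust[s]
        truths = {item: max(vals, key=vals.get) for item, vals in votes.items()}
        # Step 2: trust = fraction of a source's claims matching current truths.
        correct = defaultdict(int)
        total = defaultdict(int)
        for s, item, v in claims:
            total[s] += 1
            correct[s] += (truths[item] == v)
        trust = {s: correct[s] / total[s] for s in sources}
    return truths, trust

claims = [("S1", "x", 1), ("S2", "x", 1), ("S3", "x", 2),
          ("S1", "y", 3), ("S3", "y", 4)]
print(iterative_truth_discovery(claims))
```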

Social networks and the Web in general are characterized by multiple information sources often claiming conflicting data values. Data veracity is hard to estimate, especially when there is no prior knowledge about the sources or the claims, and in time-dependent scenarios where initially very few observers can report the first information. Despite the wide set of recently proposed truth discovery approaches, no "one-fits-all" solution emerges for estimating data veracity in on-line and open contexts. However, analyzing the space of disagreeing sources might be...

10.1145/2872518.2890536 preprint EN 2016-01-01

Multimodal AI models are increasingly used in fields like healthcare, finance, and autonomous driving, where information is drawn from multiple sources or modalities such as images, texts, audio, and videos. However, effectively managing uncertainty - arising from noise, insufficient evidence, or conflicts between modalities - is crucial for reliable decision-making. Current uncertainty-aware ML methods leveraging, for example, evidence averaging or evidence accumulation underestimate uncertainties in high-conflict scenarios. Moreover,...

10.48550/arxiv.2412.18024 preprint EN other-oa 2025-02-13
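
A hedged toy comparison, in the style of evidential (Dirichlet-based) models, of why both evidence accumulation and evidence averaging can report low uncertainty under total conflict; all numbers are made up:

```python
def dirichlet_uncertainty(evidence):
    """Subjective-logic style: u = K / S with S = sum(evidence + 1)."""
    k = len(evidence)
    s = sum(e + 1 for e in evidence)
    beliefs = [e / s for e in evidence]
    return beliefs, k / s

# Two modalities in strong conflict over a binary decision (toy numbers).
modality_a = [20.0, 0.0]   # strongly supports class 0
modality_b = [0.0, 20.0]   # strongly supports class 1

accumulated = [a + b for a, b in zip(modality_a, modality_b)]
averaged = [(a + b) / 2 for a, b in zip(modality_a, modality_b)]

print(dirichlet_uncertainty(accumulated))  # u = 2/42 ~ 0.05: overconfident
print(dirichlet_uncertainty(averaged))     # u = 2/22 ~ 0.09: still low
# Despite total disagreement, both fusions report low uncertainty.
```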

The Web has enabled the availability of a huge amount of useful information, but it has also eased the ability to spread false information and rumors across multiple sources, making it hard to distinguish between what is true and what is not. Recent examples include the premature Steve Jobs obituary, the second bankruptcy of United Airlines, the creation of Black Holes by the operation of the Large Hadron Collider, etc. Since it is important to permit the expression of dissenting and conflicting opinions, it would be a fallacy to try to ensure that the Web provides only consistent...

10.48550/arxiv.0909.1776 preprint EN cc-by arXiv (Cornell University) 2009-01-01

Functional dependencies (FDs) play an important role in maintaining data quality. They can be used to enforce data consistency and to guide repairs over a database. In this work, we investigate the problem of missing values and its impact on FD discovery. When using existing FD discovery algorithms, some genuine FDs could not be detected precisely due to missing values, or some non-genuine FDs could be discovered even though they are caused by missing values with certain NULL semantics. We define a notion of genuineness and propose algorithms to compute the genuineness score of a discovered FD. This...

10.14778/3204028.3204032 article EN Proceedings of the VLDB Endowment 2018-04-01
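
A rough illustration of how NULL semantics flip the verdict on a candidate FD; the violation count below is a naive placeholder, not the paper's genuineness score:

```python
import pandas as pd

def fd_violations(df, lhs, rhs, null_equals_null=True):
    """Count violating LHS groups of a candidate FD lhs -> rhs.
    Under one NULL semantics, missing LHS values match each other;
    under the other, NULLs never match, hiding potential violations."""
    if not null_equals_null:
        df = df.dropna(subset=[lhs])       # NULLs can never violate
    groups = df.groupby(lhs, dropna=False)[rhs].nunique(dropna=False)
    return int((groups > 1).sum())         # LHS groups with >1 RHS value

df = pd.DataFrame({"zip":  ["98101", "98101", None, None],
                   "city": ["Seattle", "Seattle", "Boston", "Denver"]})
# zip -> city looks genuine if NULL != NULL, violated if NULL == NULL.
print(fd_violations(df, "zip", "city", null_equals_null=True))   # 1
print(fd_violations(df, "zip", "city", null_equals_null=False))  # 0
```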

Data cleaning and preparation has been a long-standing challenge in data science to avoid incorrect results and misleading conclusions obtained from dirty data. For a given dataset and a machine learning-based task, a plethora of preprocessing techniques and alternative data curation strategies may lead to dramatically different outputs with unequal quality performance. Most current work on automated machine learning, however, focuses on developing either cleaning algorithms or user-guided systems...

10.1145/3308558.3313602 preprint EN 2019-05-13
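
The underlying selection problem can be caricatured by brute force: try alternative preparation strategies and keep the one with the best downstream score. The sketch below uses scikit-learn and made-up data; the paper's approach learns the task sequencing rather than enumerating it:

```python
from itertools import product

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def prepare(df, impute, scale):
    """Apply one candidate preparation strategy to the numeric columns."""
    out = df.copy()
    num = out.select_dtypes("number").columns
    if impute == "mean":
        out[num] = out[num].fillna(out[num].mean())
    else:
        out[num] = out[num].fillna(0)
    if scale:
        out[num] = (out[num] - out[num].mean()) / out[num].std()
    return out

# Hypothetical dirty dataset with a binary label.
df = pd.DataFrame({"f1": [1.0, None, 3.0, 4.0, 5.0, None, 2.0, 8.0],
                   "f2": [10, 20, None, 40, 50, 60, 70, 80],
                   "y":  [0, 0, 0, 0, 1, 1, 1, 1]})

best = None
for impute, scale in product(["mean", "zero"], [True, False]):
    clean = prepare(df.drop(columns="y"), impute, scale)
    score = cross_val_score(LogisticRegression(), clean, df["y"], cv=2).mean()
    if best is None or score > best[0]:
        best = (score, impute, scale)
print(best)  # the strategy with the highest downstream accuracy
```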

On the Web, a massive amount of user-generated content is available through various channels (e.g., texts, tweets, Web tables, databases, multimedia-sharing platforms, etc.). Conflicting information,...

10.2200/s00676ed1v01y201509dtm042 article EN Synthesis lectures on data management 2015-12-23

Object-based image analysis (OBIA) has been widely adopted as a common paradigm to deal with very high-resolution remote sensing images. Nevertheless, OBIA methods strongly depend on the results of the segmentation step. Many segmentation quality metrics have been proposed. Supervised metrics give an accurate estimation of segmentation quality but require a ground-truth reference. Unsupervised metrics only make use of intrinsic image and segment properties; yet most of them are application-dependent and do not handle well the variability of objects in remote sensing images. Furthermore, few have been developed in this context, and mainly...

10.1109/jstars.2015.2424457 article EN IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 2015-05-01
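
As a generic example of an unsupervised metric built only from intrinsic segment properties, a toy area-weighted intra-segment variance (not one of the specific metrics evaluated in the paper):

```python
import numpy as np

def intra_segment_variance(image, segments):
    """Area-weighted mean of per-segment pixel variance.
    Lower is better: segments should be internally homogeneous."""
    total, weighted = image.size, 0.0
    for seg_id in np.unique(segments):
        pixels = image[segments == seg_id]
        weighted += pixels.size * pixels.var()
    return weighted / total

# Toy 4x4 single-band image split into two segments.
image = np.array([[1, 1, 9, 9],
                  [1, 1, 9, 9],
                  [1, 2, 9, 8],
                  [1, 1, 9, 9]], dtype=float)
good = np.array([[0, 0, 1, 1]] * 4)          # follows the intensity edge
bad = np.zeros_like(good)                     # one segment for everything
print(intra_segment_variance(image, good))    # small: homogeneous segments
print(intra_segment_variance(image, bad))     # large: heterogeneous segment
```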

Error detection is the process of identifying problematic data cells that are different from their ground truth. Functional dependencies (FDs) have been widely studied in support of this process. Oftentimes, it is assumed that FDs are given by experts. Unfortunately, it is usually hard and expensive for experts to define such FDs. In addition, automatic FD profiling over dirty data in order to find correct FDs is known to be a hard problem. In this paper, we propose an end-to-end solution to detect FD-detectable errors in the data. The broad intuition...

10.1145/3035918.3064024 preprint EN 2017-05-09
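
The entry point of FD-based error detection can be sketched as flagging the cells implicated in violations of a given FD; this ignores the paper's harder problem of obtaining trustworthy FDs in the first place:

```python
import pandas as pd

def fd_violating_cells(df, lhs, rhs):
    """Flag (row, column) cells that participate in a violation of lhs -> rhs:
    rows sharing an LHS value but disagreeing on the RHS value."""
    flagged = []
    for _, group in df.groupby(lhs):
        if group[rhs].nunique() > 1:          # the FD is violated here
            flagged += [(i, rhs) for i in group.index]
    return flagged

df = pd.DataFrame({"zip":  ["98101", "98101", "10001"],
                   "city": ["Seattle", "Sea_tle", "New York"]})
print(fd_violating_cells(df, "zip", "city"))  # [(0, 'city'), (1, 'city')]
```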

Many emerging applications, from domains such as healthcare and oil & gas, require several data processing systems for complex analytics. This demo paper showcases Rheem, a framework that provides multi-platform task execution for such applications. It features a three-layer data processing abstraction and a new query optimization approach for multi-platform settings. We will demonstrate the strengths of Rheem by using real-world scenarios for three different applications, namely, machine learning, data cleaning, and data fusion.

10.1145/2882903.2899414 article EN Proceedings of the 2016 International Conference on Management of Data 2016-06-16
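
The multi-platform idea can be caricatured with a small dispatcher that routes a logical task to whichever backend a cost model prefers; the platform names, cost numbers, and interfaces below are entirely hypothetical, not the system's real API:

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable

@dataclass
class Platform:
    name: str
    cost: Callable[[int], float]      # estimated cost given input size
    run: Callable[[list], list]       # executes the task

def word_count_local(data):
    return Counter(w for line in data for w in line.split()).most_common()

# Hypothetical cost model: a local backend wins on small inputs,
# a (stubbed) cluster backend wins on large ones.
platforms = [
    Platform("local-python", cost=lambda n: 0.001 * n, run=word_count_local),
    Platform("cluster-stub", cost=lambda n: 5.0 + 0.00001 * n,
             run=lambda data: word_count_local(data)),  # stand-in executor
]

def execute(task_input):
    """Pick the cheapest platform for this input, then run the task on it."""
    chosen = min(platforms, key=lambda p: p.cost(len(task_input)))
    print(f"optimizer chose: {chosen.name}")
    return chosen.run(task_input)

print(execute(["to be or not to be"] * 10))
```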

Through extensive experience developing and explaining machine learning (ML) applications for real-world domains, we have learned that ML models are only as interpretable as their features. Even simple, highly interpretable model types such as regression can be difficult or impossible to understand if they use uninterpretable features. Different users, especially those using ML for decision-making in their domains, may require different levels of feature interpretability. Furthermore, based on our experiences, we claim that the term "interpretable...

10.1145/3544903.3544905 article EN ACM SIGKDD Explorations Newsletter 2022-06-02

It is widely accepted that data preparation is one of the most time-consuming steps of the machine learning (ML) lifecycle. It is also one of the most important steps, as data quality directly influences the quality of a model. In this tutorial, we will discuss the importance and the role of exploratory data analysis (EDA) and visualisation techniques to find data quality issues relevant for data preparation and for building ML pipelines. We will cover the latest advances in these fields and bring out areas that need innovation. To make the tutorial actionable for practitioners, we will also discuss popular open-source packages that one can get...

10.1145/3534678.3542604 article EN Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2022-08-12
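
A minimal example of the kind of first-pass EDA the tutorial advocates, using only pandas on a made-up table:

```python
import pandas as pd

df = pd.DataFrame({"age": [25, 31, None, 29, 120, 27],
                   "income": [52_000, 61_000, 58_000, None, 59_000, 57_000],
                   "city": ["Paris", "Paris", "paris", "Lyon", "Lyon", None]})

# Basic profiling: summary statistics, missingness, and suspect categories.
print(df.describe(include="all"))             # distributions at a glance
print(df.isna().mean())                       # fraction missing per column
print(df["city"].value_counts(dropna=False))  # 'Paris' vs 'paris' pops out

# Simple range check flags age=120 as a likely outlier for review.
print(df[(df["age"] < 0) | (df["age"] > 110)])
```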

Anomaly detection on time series data is increasingly common across various industrial domains that monitor metrics in order to prevent potential accidents and economic losses. However, a scarcity of labeled data and ambiguous definitions of anomalies can complicate these efforts. Recent unsupervised machine learning methods have made remarkable progress in tackling this problem using either single-timestamp predictions or time series reconstructions. While traditionally considered separately, these methods are not mutually...

10.1109/bigdata55660.2022.10020857 article EN 2022 IEEE International Conference on Big Data (Big Data) 2022-12-17
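
A schematic of fusing prediction-based and reconstruction-based signals into one anomaly score; the "models" here are trivial stand-ins (a lag-1 predictor and a moving-average reconstruction), not the paper's architecture:

```python
import numpy as np

def anomaly_scores(series, window=3):
    """Combine a one-step prediction error with a reconstruction error.
    Both signals are z-normalized and summed into one score."""
    x = np.asarray(series, dtype=float)
    pred_err = np.abs(np.diff(x, prepend=x[0]))       # lag-1 "prediction" error
    smooth = np.convolve(x, np.ones(window) / window, mode="same")
    recon_err = np.abs(x - smooth)                    # smoothing "reconstruction"
    z = lambda e: (e - e.mean()) / (e.std() + 1e-9)
    return z(pred_err) + z(recon_err)

series = [1, 1, 1, 1, 10, 1, 1, 1, 1]                 # one obvious spike
print(np.argmax(anomaly_scores(series)))              # index 4, the anomaly
```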

Ensuring and maximizing the quality and integrity of information is a crucial process for today's enterprise information systems (EIS). It requires a clear understanding of the interdependencies between the dimensions characterizing the quality of data (QoD), the quality of the conceptual model (QoM) of the database, keystone of the EIS, and the quality of data management and integration processes (QoP). The improvement of one dimension (such as accuracy or expressiveness) may have negative consequences on other dimensions (e.g., freshness or completeness of data). In this paper we briefly present a framework, called...

10.5220/0002378301700175 preprint EN cc-by-nc-nd 2007-01-01

Estimation of data veracity is recognized as one of the grand challenges of big data. Typically, the goal of truth discovery is to determine the veracity of multi-source, conflicting data and to return, as outputs, a truth label and a confidence score for each data value, along with the trustworthiness of the source claiming it. Although a plethora of methods has been proposed, it is unlikely that one technique dominates all the others across all data sets. Furthermore, performance evaluation entirely depends on the availability of labeled ground truth (i.e., data whose veracity has been manually checked). In the context of Big...

10.1109/bigdata.2015.7364062 article EN 2015 IEEE International Conference on Big Data (Big Data) 2015-10-01