NFDI4DS | UHH-SEMS - Publication Details

Nezihe Merve Gürel

ORCID: 0000-0002-4747-2406

About

Contact & Profiles

A5016955253

Research Areas

Machine Learning and Data Classification
Advanced Graph Neural Networks
Topic Modeling
Data Quality and Management
Radio Astronomy Observations and Technology
Privacy-Preserving Technologies in Data
Machine Learning and Algorithms
Neural Networks and Applications
Bayesian Modeling and Causal Inference
Sparse and Compressive Sensing Techniques
Scientific Computing and Data Management
Data Stream Mining Techniques
Data Management and Algorithms
Advanced Neural Network Applications
Recommender Systems and Techniques
Adversarial Robustness in Machine Learning
Auction Theory and Applications
Advanced Image and Video Retrieval Techniques
Geophysics and Gravity Measurements
Statistical and numerical algorithms
Complexity and Algorithms in Graphs
Advanced Database Systems and Queries
Graph Theory and Algorithms
Mathematical Analysis and Transform Methods
Anomaly Detection Techniques and Applications

Delft University of Technology
2024

ETH Zurich
2018-2021

IBM Research - Zurich
2017

Efficient task-specific data valuation for nearest neighbor algorithms

OPENALEX - Publications

Ruoxi Jia David Dao Boxin Wang Frances Ann Hubis Nezihe Merve Gürel and 4 more

Given a data set D containing millions of data points and a data consumer who is willing to pay for $X to train a machine learning (ML) model over D, how should we distribute this $X to each data point to reflect its "value"? In this paper, we define the "relative value of data" via the Shapley value, as it uniquely possesses properties with appealing real-world interpretations, such as fairness, rationality and decentralizability. For general, bounded utility functions, the Shapley value is known to be challenging to compute: to get Shapley values for all N data points, it requires O(2^N)...

10.14778/3342263.3342637 article EN Proceedings of the VLDB Endowment 2019-07-01

A Data Quality-Driven View of MLOps

OPENALEX - Publications

Cédric Renggli Luka Rimanic Nezihe Merve Gürel Bojan Karlaš Wentao Wu and 1 more

Developing machine learning models can be seen as a process similar to the one established for traditional software development. A key difference between the two lies in the strong dependency between the quality of a machine learning model and the quality of the data used to train or perform evaluations. In this work, we demonstrate how different aspects of data quality propagate through various stages of machine learning development. By performing a joint analysis of the impact of well-known data quality dimensions and the downstream machine learning process, we show that different components of a typical MLOps pipeline can be efficiently designed, providing both...

10.48550/arxiv.2102.07750 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Nearest neighbor classifiers over incomplete information

OPENALEX - Publications

Bojan Karlaš Peng Li Renzhi Wu Nezihe Merve Gürel Xu Chu and 2 more

Machine learning (ML) applications have been thriving recently, largely attributed to the increasing availability of data. However, inconsistency and incomplete information are ubiquitous in real-world datasets, and their impact on ML applications remains elusive. In this paper, we present a formal study of this impact by extending the notion of Certain Answers for Codd tables, which has been explored by the database research community for decades, into the field of machine learning. Specifically, we focus on classification problems and propose the notion of "Certain...

10.14778/3430915.3430917 article EN Proceedings of the VLDB Endowment 2020-11-01

DeGNN

OPENALEX - Publications

Xupeng Miao Nezihe Merve Gürel Wentao Zhang Zhichao Han Bo Li and 15 more

Mining from graph-structured data is an integral component of graph data management. A recent trending technique, the graph convolutional network (GCN), has gained momentum in the graph mining field, and plays an essential part in numerous graph-related tasks. Although the emerging GCN optimization techniques bring improvements to specific scenarios, they perform diversely in different applications and introduce many trial-and-error costs for practitioners. Moreover, existing GCN models often suffer from the oversmoothing problem. Besides,...

10.1145/3447548.3467312 article EN 2021-08-13

FIVES: Feature Interaction Via Edge Search for Large-Scale Tabular Data

OPENALEX - Publications

Yuexiang Xie Zhen Wang Yaliang Li Bolin Ding Nezihe Merve Gürel and 4 more

High-order interactive features capture the correlation between different columns and thus are promising to enhance various learning tasks on ubiquitous tabular data. To automate the generation of interactive features, existing works either explicitly traverse the feature space or implicitly express the interactions via intermediate activations of some designed models. These two kinds of methods show that there is essentially a trade-off between feature interpretability and search efficiency. To possess both of their merits, we propose a novel...

10.1145/3447548.3467066 article EN 2021-08-13

Compressive Sensing Using Iterative Hard Thresholding With Low Precision Data Representation: Theory and Applications

OPENALEX - Publications

Nezihe Merve Gürel Kaan Kara Alen Stojanov Tyler Smith Thomas Lemmin and 3 more

Modern scientific instruments produce vast amounts of data, which can overwhelm the processing ability of computer systems. Lossy compression of data is an intriguing solution, but comes with its own drawbacks, such as potential signal loss, and the need for careful optimization of the compression ratio. In this work, we focus on a setting where this problem is especially acute: compressive sensing frameworks for interferometry and medical imaging. We ask the following question: can the precision of the data representation be lowered for all inputs, with recovery...

10.1109/tsp.2020.3010355 article EN IEEE Transactions on Signal Processing 2020-01-01

Towards Efficient Data Valuation Based on the Shapley Value

OPENALEX - Publications

Ruoxi Jia David Dao Boxin Wang Frances Ann Hubis Nick Hynes and 5 more

"How much is my data worth?" is an increasingly common question posed by organizations and individuals alike. An answer to this question could allow, for instance, fairly distributing profits among multiple data contributors and determining prospective compensation when data breaches happen. In this paper, we study the problem of data valuation by utilizing the Shapley value, a popular notion of value which originated in cooperative game theory. The Shapley value defines a unique payoff scheme that satisfies many desiderata for the notion of data value. However, the Shapley value often...

10.48550/arxiv.1902.10275 preprint EN other-oa arXiv (Cornell University) 2019-01-01

C-RAG: Certified Generation Risks for Retrieval-Augmented Language Models

OPENALEX - Publications

Mintong Kang Nezihe Merve Gürel Ning Yu Dawn Song Bo Li

Despite the impressive capabilities of large language models (LLMs) across diverse applications, they still suffer from trustworthiness issues, such as hallucinations and misalignments. Retrieval-augmented language models (RAG) have been proposed to enhance the credibility of generations by grounding external knowledge, but the theoretical understanding of their generation risks remains unexplored. In this paper, we answer: 1) whether RAG can indeed lead to low generation risks, 2) how to provide provable guarantees on the generation risks of RAG and vanilla LLMs, and 3)...

10.48550/arxiv.2402.03181 preprint EN arXiv (Cornell University) 2024-02-05

Nearest Neighbor Classifiers over Incomplete Information: From Certain Answers to Certain Predictions

OPENALEX - Publications

Bojan Karlaš Peng Li Renzhi Wu Nezihe Merve Gürel Xu Chu and 2 more

Machine learning (ML) applications have been thriving recently, largely attributed to the increasing availability of data. However, inconsistency and incomplete information are ubiquitous in real-world datasets, and their impact on ML applications remains elusive. In this paper, we present a formal study of this impact by extending the notion of Certain Answers for Codd tables, which has been explored by the database research community for decades, into the field of machine learning. Specifically, we focus on classification problems and propose the notion of "Certain Predictions" (CP)...

10.3929/ethz-b-000494739 article EN Very Large Data Bases 2020-05-11

Knowledge Enhanced Machine Learning Pipeline against Diverse Adversarial Attacks

OPENALEX - Publications

Nezihe Merve Gürel Xiangyu Qi Luka Rimanic Ce Zhang Bo Li

Despite the great successes achieved by deep neural networks (DNNs), recent studies show that they are vulnerable against adversarial examples, which aim to mislead DNNs by adding small adversarial perturbations. Several defenses have been proposed against such attacks, while many of them have been adaptively attacked. In this work, we aim to enhance ML robustness from a different perspective by leveraging domain knowledge: We propose a Knowledge Enhanced Machine Learning Pipeline (KEMLP) to integrate domain knowledge (i.e., logic relationships...

10.48550/arxiv.2106.06235 preprint EN cc-by arXiv (Cornell University) 2021-01-01

Online Active Model Selection for Pre-trained Classifiers

OPENALEX - Publications

Mohammad Reza Karimi Nezihe Merve Gürel Bojan Karlaš Johannes Rausch Ce Zhang and 1 more

Given $k$ pre-trained classifiers and a stream of unlabeled data examples, how can we actively decide when to query a label so that we can distinguish the best model from the rest while making a small number of queries? Answering this question has a profound impact on a range of practical scenarios. In this work, we design an online selective sampling approach that actively selects informative examples to label and outputs the best model with high probability at any round. Our algorithm can be used for online prediction tasks for both adversarial and stochastic streams. We establish...

10.48550/arxiv.2010.09818 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Repeated Random Sampling for Minimizing the Time-to-Accuracy of Learning

OPENALEX - Publications

Patrik Okanovic Roger Waleffe Vasilis Mageirakos Konstantinos E. Nikolakakis Amin Karbasi and 3 more

Methods for carefully selecting or generating a small set of training data to learn from, i.e., data pruning, coreset selection, and data distillation, have been shown to be effective in reducing the ever-increasing cost of training neural networks. Behind this success are rigorously designed strategies for identifying informative training examples out of large datasets. However, these strategies come with additional computational costs associated with subset selection or data distillation before training begins, and furthermore, many are shown to even under-perform random...

10.48550/arxiv.2305.18424 preprint EN other-oa arXiv (Cornell University) 2023-01-01

DMLR: Data-centric Machine Learning Research -- Past, Present and Future

OPENALEX - Publications

Luis Oala Manil Maskey Lilith Bat-Leah Alicia Parrish Nezihe Merve Gürel and 33 more

Drawing from discussions at the inaugural DMLR workshop at ICML 2023 and meetings prior, in this report we outline the relevance of community engagement and infrastructure development for the creation of next-generation public datasets that will advance machine learning science. We chart a path forward as a collective effort to sustain the creation and maintenance of these datasets and methods towards positive scientific, societal and business impact.

10.48550/arxiv.2311.13028 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Nearest Neighbor Classifiers over Incomplete Information: From Certain Answers to Certain Predictions

OPENALEX - Publications

Bojan Karlaš Peng Li Renzhi Wu Nezihe Merve Gürel Xu Chu and 2 more

Machine learning (ML) applications have been thriving recently, largely attributed to the increasing availability of data. However, inconsistency and incomplete information are ubiquitous in real-world datasets, and their impact on ML applications remains elusive. In this paper, we present a formal study of this impact by extending the notion of Certain Answers for Codd tables, which has been explored by the database research community for decades, into the field of machine learning. Specifically, we focus on classification problems and propose the notion of "Certain...

10.48550/arxiv.2005.05117 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Ease.ML: A Lifecycle Management System for Machine Learning

OPENALEX - Publications

Leonel Aguilar Melgar David Dao Shaoduo Gan Nezihe Merve Gürel Nora Hollenstein and 15 more

10.3929/ethz-b-000458916 article EN Conference on Innovative Data Systems Research 2021-01-01

COLEP: Certifiably Robust Learning-Reasoning Conformal Prediction via Probabilistic Circuits

OPENALEX - Publications

Mintong Kang Nezihe Merve Gürel Linyi Li Bo Li

Conformal prediction has shown spurring performance in constructing statistically rigorous prediction sets for arbitrary black-box machine learning models, assuming the data is exchangeable. However, even small adversarial perturbations during inference can violate the exchangeability assumption, challenge the coverage guarantees, and result in a subsequent decline in empirical coverage. In this work, we propose a certifiably robust learning-reasoning conformal prediction framework (COLEP) via probabilistic circuits, which...

10.48550/arxiv.2403.11348 preprint EN arXiv (Cornell University) 2024-03-17

Collaboratively Learning Federated Models from Noisy Decentralized Data

OPENALEX - Publications

Haoyuan Li Mathias Funk Nezihe Merve Gürel Aaqib Saeed

Federated learning (FL) has emerged as a prominent method for collaboratively training machine learning models using local data from edge devices, all while keeping data decentralized. However, accounting for the quality of data contributed by local clients remains a critical challenge in FL, as local data are often susceptible to corruption by various forms of noise and perturbations, which compromise the aggregation process and lead to a subpar global model. In this work, we focus on addressing the problem of noisy data in the input space, an under-explored area compared...

10.48550/arxiv.2409.02189 preprint EN arXiv (Cornell University) 2024-09-03

All models are wrong, some are useful: Model Selection with Limited Labels

OPENALEX - Publications

Patrik Okanovic Andreas Kirsch J. C. Kasper Torsten Hoefler Andreas Krause and 1 more

With the multitude of pretrained models available thanks to the advancements in large-scale supervised and self-supervised learning, choosing the right model is becoming increasingly pivotal in the machine learning lifecycle. However, much like the training process, choosing the best pretrained off-the-shelf model for raw, unlabeled data is a labor-intensive task. To overcome this, we introduce MODEL SELECTOR, a framework for label-efficient selection of pretrained classifiers. Given a pool of unlabeled target data, MODEL SELECTOR samples a small subset of highly informative examples...

10.48550/arxiv.2410.13609 preprint EN arXiv (Cornell University) 2024-10-17

Collaboratively Learning Federated Models from Noisy Decentralized Data

OPENALEX - Publications

Haoyuan Li Mathias Funk Nezihe Merve Gürel Aaqib Saeed

10.1109/bigdata62323.2024.10825502 article EN 2024 IEEE International Conference on Big Data (BigData) 2024-12-15

Denoising radio interferometric images by subspace clustering

OPENALEX - Publications

Nezihe Merve Gürel Paul Hurley Matthieu Simeoni

Radio interferometry usually compensates for high levels of noise in sensor/antenna electronics by throwing data and energy at the problem: observe longer, then store and process it all. Furthermore, only the end image is cleaned, reducing flexibility substantially. We propose instead a method to remove the noise explicitly before imaging. To this end, we developed an algorithm that first decomposes the sensor signals into components using Singular Spectrum Analysis and then clusters these components using a graph Laplacian matrix. We show...

10.1109/icip.2017.8296659 article EN 2017 IEEE International Conference on Image Processing (ICIP) 2017-09-01
