Nezihe Merve Gürel

ORCID: 0000-0002-4747-2406
Research Areas
  • Machine Learning and Data Classification
  • Advanced Graph Neural Networks
  • Topic Modeling
  • Data Quality and Management
  • Radio Astronomy Observations and Technology
  • Privacy-Preserving Technologies in Data
  • Machine Learning and Algorithms
  • Neural Networks and Applications
  • Bayesian Modeling and Causal Inference
  • Sparse and Compressive Sensing Techniques
  • Scientific Computing and Data Management
  • Data Stream Mining Techniques
  • Data Management and Algorithms
  • Advanced Neural Network Applications
  • Recommender Systems and Techniques
  • Adversarial Robustness in Machine Learning
  • Auction Theory and Applications
  • Advanced Image and Video Retrieval Techniques
  • Geophysics and Gravity Measurements
  • Statistical and numerical algorithms
  • Complexity and Algorithms in Graphs
  • Advanced Database Systems and Queries
  • Graph Theory and Algorithms
  • Mathematical Analysis and Transform Methods
  • Anomaly Detection Techniques and Applications

Delft University of Technology
2024

ETH Zurich
2018-2021

IBM Research - Zurich
2017

Given a data set D containing millions of data points and a data consumer who is willing to pay $X to train a machine learning (ML) model over D, how should we distribute this $X to each data point to reflect its "value"? In this paper, we define the "relative value of data" via the Shapley value, as it uniquely possesses properties with appealing real-world interpretations, such as fairness, rationality and decentralizability. For general, bounded utility functions, the Shapley value is known to be challenging to compute: to get Shapley values for all N data points, it requires O(2^N)...

10.14778/3342263.3342637 article EN Proceedings of the VLDB Endowment 2019-07-01
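The O(2^N) cost mentioned in the abstract above comes from enumerating every coalition of points. A minimal sketch of that exact computation follows; the utility function here (a plain sum of hypothetical per-point "qualities") is an illustrative stand-in for model accuracy, not the paper's actual valuation pipeline.

```python
from itertools import combinations
from math import factorial

def shapley_values(points, utility):
    """Exact Shapley values by enumerating all subsets: O(2^N) utility calls."""
    n = len(points)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for subset in combinations(others, k):
                # Marginal contribution of point i to this coalition,
                # weighted by how often this coalition precedes i in a
                # uniformly random ordering of all n points.
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                s = [points[j] for j in subset]
                values[i] += weight * (utility(s + [points[i]]) - utility(s))
    return values

# Toy utility: an additive "accuracy" proxy that just sums point qualities.
quality = [0.5, 0.3, 0.2]
vals = shapley_values(quality, utility=sum)
print(vals)  # for an additive utility, each point's value equals its own quality
```

For an additive utility the Shapley value of each point reduces to its own contribution, which makes the toy output easy to check by hand; for a real, non-additive model-accuracy utility, all 2^N subsets genuinely matter.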

Developing machine learning models can be seen as a process similar to the one established for traditional software development. A key difference between the two lies in the strong dependency between the quality of a model and the quality of the data used to train it or perform evaluations. In this work, we demonstrate how different aspects of data quality propagate through various stages of machine learning development. By performing a joint analysis of the impact of well-known data quality dimensions on the downstream process, we show that the components of a typical MLOps pipeline can be efficiently designed, providing both...

10.48550/arxiv.2102.07750 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Machine learning (ML) applications have been thriving recently, largely attributed to the increasing availability of data. However, inconsistency and incomplete information are ubiquitous in real-world datasets, and their impact on ML remains elusive. In this paper, we present a formal study of this impact by extending the notion of Certain Answers for Codd tables, which has been explored by the database research community for decades, into the field of machine learning. Specifically, we focus on classification problems and propose "Certain...

10.14778/3430915.3430917 article EN Proceedings of the VLDB Endowment 2020-11-01

Mining from graph-structured data is an integral component of graph data management. A recent trending technique, the graph convolutional network (GCN), has gained momentum in the graph mining field and plays an essential part in numerous graph-related tasks. Although emerging GCN optimization techniques bring improvements to specific scenarios, they perform diversely across different applications and introduce many trial-and-error costs for practitioners. Moreover, existing GCN models often suffer from the oversmoothing problem. Besides,...

10.1145/3447548.3467312 article EN 2021-08-13

High-order interactive features capture the correlation between different columns and thus are promising to enhance various learning tasks on ubiquitous tabular data. To automate the generation of interactive features, existing works either explicitly traverse the feature space or implicitly express the interactions via the intermediate activations of some specially designed models. These two kinds of methods show that there is essentially a trade-off between interpretability and search efficiency. To possess both of their merits, we propose a novel...

10.1145/3447548.3467066 article EN 2021-08-13
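As a toy illustration of what "explicitly" generated second-order interactive features look like, the sketch below forms pairwise products of tabular columns. The column names and the product-based interaction are hypothetical choices for illustration, not the model proposed in the paper.

```python
from itertools import combinations

def cross_features(row):
    """Second-order interactive features: pairwise products of numeric columns.

    `row` maps column names to values; returns one new feature per column pair.
    """
    keys = sorted(row)
    return {f"{a}*{b}": row[a] * row[b] for a, b in combinations(keys, 2)}

print(cross_features({"x1": 2.0, "x2": 3.0, "x3": 5.0}))
# {'x1*x2': 6.0, 'x1*x3': 10.0, 'x2*x3': 15.0}
```

Even this tiny example shows the trade-off the abstract mentions: the features are fully interpretable (each is a named column product), but enumerating all pairs, triples, and so on grows combinatorially, which is what motivates implicit, model-based alternatives.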

Modern scientific instruments produce vast amounts of data, which can overwhelm the processing ability of computer systems. Lossy compression of the data is an intriguing solution, but it comes with its own drawbacks, such as potential signal loss and the need for careful optimization of the compression ratio. In this work, we focus on a setting where this problem is especially acute: compressive sensing frameworks for interferometry and medical imaging. We ask the following question: can the precision of the data representation be lowered for all inputs, with recovery...

10.1109/tsp.2020.3010355 article EN IEEE Transactions on Signal Processing 2020-01-01

"How much is my data worth?" is an increasingly common question posed by organizations and individuals alike. An answer to this question could allow, for instance, fairly distributing profits among multiple data contributors and determining prospective compensation when data breaches happen. In this paper, we study the problem of data valuation by utilizing the Shapley value, a popular notion of value which originated in cooperative game theory. The Shapley value defines a unique payoff scheme that satisfies many desiderata for a notion of data value. However, the Shapley value often...

10.48550/arxiv.1902.10275 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Despite the impressive capabilities of large language models (LLMs) across diverse applications, they still suffer from trustworthiness issues, such as hallucinations and misalignments. Retrieval-augmented language models (RAG) have been proposed to enhance the credibility of generations by grounding external knowledge, but the theoretical understanding of their generation risks remains unexplored. In this paper, we answer: 1) whether RAG can indeed lead to low generation risks, 2) how to provide provable guarantees on the generation risks of RAG and vanilla LLMs, and 3)...

10.48550/arxiv.2402.03181 preprint EN arXiv (Cornell University) 2024-02-05

Machine learning (ML) applications have been thriving recently, largely attributed to the increasing availability of data. However, inconsistency and incomplete information are ubiquitous in real-world datasets, and their impact on ML remains elusive. In this paper, we present a formal study of this impact by extending the notion of Certain Answers for Codd tables, which has been explored by the database research community for decades, into the field of machine learning. Specifically, we focus on classification problems and propose Certain Predictions (CP)...

10.3929/ethz-b-000494739 article EN Very Large Data Bases 2020-05-11

Despite the great successes achieved by deep neural networks (DNNs), recent studies show that they are vulnerable to adversarial examples, which aim to mislead DNNs by adding small adversarial perturbations. Several defenses have been proposed against such attacks, while many of them have been adaptively attacked. In this work, we enhance ML robustness from a different perspective by leveraging domain knowledge: we propose a Knowledge Enhanced Machine Learning Pipeline (KEMLP) to integrate domain knowledge (i.e., logic relationships...

10.48550/arxiv.2106.06235 preprint EN cc-by arXiv (Cornell University) 2021-01-01

Given $k$ pre-trained classifiers and a stream of unlabeled data examples, how can we actively decide when to query a label so that we can distinguish the best model from the rest while making a small number of queries? Answering this question has a profound impact on a range of practical scenarios. In this work, we design an online selective sampling approach that actively selects informative examples to label and outputs the best model with high probability at any round. Our algorithm can be used for prediction tasks on both adversarial and stochastic streams. We establish...

10.48550/arxiv.2010.09818 preprint EN other-oa arXiv (Cornell University) 2020-01-01
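A simplified, disagreement-based sketch of this setting (not the paper's actual algorithm or its guarantees): query a label only when the pre-trained models disagree on an example, and return the model with the fewest errors on the queried points. The toy models and data stream below are hypothetical.

```python
import random

def selective_sampling(models, stream, budget):
    """Query labels only on disagreement; track each model's errors there.

    A disagreement-based sketch of selective sampling, not the paper's method.
    """
    errors = [0] * len(models)
    queries = 0
    for x, true_label in stream:
        preds = [m(x) for m in models]
        if len(set(preds)) > 1 and queries < budget:  # models disagree: query
            queries += 1
            for i, p in enumerate(preds):
                errors[i] += int(p != true_label)
    best = min(range(len(models)), key=lambda i: errors[i])
    return best, queries

# Toy models over integer inputs: true parity vs. two constant classifiers.
models = [lambda x: x % 2, lambda x: 0, lambda x: 1]
random.seed(1)
stream = [(x, x % 2) for x in (random.randrange(100) for _ in range(200))]
best, queries = selective_sampling(models, stream, budget=50)
print(best)  # model 0 (true parity) makes no errors on queried points
```

The intuition this captures is the abstract's core question: examples where the candidate models agree carry no information for ranking them, so labels are only worth paying for where they disagree.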

Methods for carefully selecting or generating a small set of training data to learn from, i.e., data pruning, coreset selection, and data distillation, have been shown to be effective in reducing the ever-increasing cost of training neural networks. Behind this success are rigorously designed strategies for identifying informative training examples out of large datasets. However, these strategies come with additional computational costs associated with subset selection or data distillation before training begins, and furthermore, many are shown to even under-perform random...

10.48550/arxiv.2305.18424 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Drawing from discussions at the inaugural DMLR workshop at ICML 2023 and meetings prior, in this report we outline the relevance of community engagement and infrastructure development for the creation of next-generation public datasets that will advance machine learning science. We chart a path forward as a collective effort to sustain the maintenance of these datasets and methods towards positive scientific, societal and business impact.

10.48550/arxiv.2311.13028 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Machine learning (ML) applications have been thriving recently, largely attributed to the increasing availability of data. However, inconsistency and incomplete information are ubiquitous in real-world datasets, and their impact on ML remains elusive. In this paper, we present a formal study of this impact by extending the notion of Certain Answers for Codd tables, which has been explored by the database research community for decades, into the field of machine learning. Specifically, we focus on classification problems and propose "Certain...

10.48550/arxiv.2005.05117 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Conformal prediction has shown spurring performance in constructing statistically rigorous prediction sets for arbitrary black-box machine learning models, assuming the data is exchangeable. However, even small adversarial perturbations during inference can violate the exchangeability assumption, challenge the coverage guarantees, and result in a subsequent decline in empirical coverage. In this work, we propose a certifiably robust learning-reasoning conformal prediction framework (COLEP) via probabilistic circuits, which...

10.48550/arxiv.2403.11348 preprint EN arXiv (Cornell University) 2024-03-17
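For context, the exchangeability-based guarantee that adversarial perturbations can break is the one behind vanilla split conformal prediction, sketched below with a trivial constant predictor and absolute-residual scores. This is an illustrative baseline under clean, exchangeable data, not COLEP itself.

```python
import math
import random

def conformal_margin(calibration_scores, alpha=0.1):
    """Split conformal prediction: return a margin q such that
    [prediction - q, prediction + q] covers a fresh exchangeable point
    with probability >= 1 - alpha."""
    n = len(calibration_scores)
    k = math.ceil((n + 1) * (1 - alpha))  # conformal quantile rank
    return sorted(calibration_scores)[min(k, n) - 1]

# Toy setup: the predictor always outputs 0, data ~ Uniform(-1, 1),
# and the nonconformity score is the absolute residual |y - prediction|.
random.seed(0)
cal = [abs(random.uniform(-1, 1)) for _ in range(999)]
q = conformal_margin(cal, alpha=0.1)
test = [abs(random.uniform(-1, 1)) for _ in range(10000)]
coverage = sum(s <= q for s in test) / len(test)
print(round(coverage, 2))  # empirically close to the 0.90 target
```

The guarantee holds only because calibration and test scores are exchangeable; an adversary who perturbs test inputs at inference time inflates their scores relative to calibration, which is exactly the failure mode the abstract describes.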

Federated learning (FL) has emerged as a prominent method for collaboratively training machine learning models using local data from edge devices, all while keeping the data decentralized. However, accounting for the quality of data contributed by clients remains a critical challenge in FL, as local datasets are often susceptible to corruption by various forms of noise and perturbations, which compromise the aggregation process and lead to a subpar global model. In this work, we focus on addressing the problem of noisy data in the input space, an under-explored area compared...

10.48550/arxiv.2409.02189 preprint EN arXiv (Cornell University) 2024-09-03

With the multitude of pretrained models available thanks to the advancements in large-scale supervised and self-supervised learning, choosing the right model is becoming increasingly pivotal in the machine learning lifecycle. However, much like the training process, choosing the best off-the-shelf model for raw, unlabeled data is a labor-intensive task. To overcome this, we introduce MODEL SELECTOR, a framework for label-efficient selection of pretrained classifiers. Given a pool of unlabeled target data, MODEL SELECTOR samples a small subset of highly informative examples...

10.48550/arxiv.2410.13609 preprint EN arXiv (Cornell University) 2024-10-17

10.1109/bigdata62323.2024.10825502 article EN 2024 IEEE International Conference on Big Data (Big Data) 2024-12-15

Radio interferometry usually compensates for high levels of noise in sensor/antenna electronics by throwing data and energy at the problem: observe longer, then store and process it all. Furthermore, only the end image is cleaned, reducing flexibility substantially. We propose instead a method to remove the noise explicitly before imaging. To this end, we developed an algorithm that first decomposes the sensor signals into components using Singular Spectrum Analysis and then clusters these components using a graph Laplacian matrix. We show...

10.1109/icip.2017.8296659 article EN 2017 IEEE International Conference on Image Processing (ICIP) 2017-09-01
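The clustering step in the abstract above relies on a graph Laplacian. As a minimal sketch, the unnormalized Laplacian L = D - A of a (hypothetical) component-affinity graph can be built as follows; the SSA decomposition and the actual affinity weights used in the paper are beyond this snippet.

```python
def graph_laplacian(adjacency):
    """Unnormalized graph Laplacian L = D - A for a symmetric adjacency matrix,
    where D is the diagonal degree matrix (row sums of A)."""
    n = len(adjacency)
    return [[(sum(adjacency[i]) if i == j else 0) - adjacency[i][j]
             for j in range(n)] for i in range(n)]

# Hypothetical 3-node path graph: component 1 is similar to 0 and 2.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
L = graph_laplacian(A)
print(L)  # [[1, -1, 0], [-1, 2, -1], [0, -1, 1]]
```

A useful sanity check is that every row of L sums to zero; in spectral methods the eigenvectors of L associated with its smallest eigenvalues are then used to separate the graph (here, the signal components) into clusters.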