Subhojyoti Mukherjee

ORCID: 0000-0003-0537-7184
Research Areas
  • Advanced Bandit Algorithms Research
  • Machine Learning and Algorithms
  • Numerical methods in engineering
  • Reinforcement Learning in Robotics
  • Optimization and Search Problems
  • Advanced Numerical Methods in Computational Mathematics
  • Stochastic Gradient Optimization Techniques
  • Distributed Sensor Networks and Detection Algorithms
  • Machine Learning and Data Classification
  • Data Stream Mining Techniques
  • Electromagnetic Simulation and Numerical Methods
  • Topic Modeling
  • Contact Mechanics and Variational Inequalities
  • Smart Grid Energy Management
  • Computability, Logic, AI Algorithms
  • Domain Adaptation and Few-Shot Learning
  • Bayesian Modeling and Causal Inference
  • Sparse and Compressive Sensing Techniques
  • Auction Theory and Applications
  • Composite Structure Analysis and Optimization
  • Computational Geometry and Mesh Generation
  • Computer Graphics and Visualization Techniques
  • Adversarial Robustness in Machine Learning
  • 3D Shape Modeling and Analysis
  • Mathematics, Computing, and Information Processing

University of Wisconsin–Madison
2018-2021

University of Massachusetts Amherst
2019

Indian Institute of Technology Madras
1995-2018

Cornell University
2007

Texas A&M University
2001

We propose a novel variant of the UCB algorithm (referred to as Efficient-UCB-Variance (EUCBV)) for minimizing cumulative regret in the stochastic multi-armed bandit (MAB) setting. EUCBV incorporates the arm elimination strategy proposed in UCB-Improved, while taking into account variance estimates to compute the arms' confidence bounds, similar to UCBV. Through a theoretical analysis we establish that EUCBV incurs a gap-dependent regret bound which is an improvement over that of existing state-of-the-art algorithms (such as UCB1, UCBV,...

10.1609/aaai.v32i1.12110 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2018-04-26
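To make the variance-aware elimination idea above concrete, here is a minimal Python sketch of a UCB index that uses empirical variance together with arm elimination. The constants, schedule, and function names are illustrative simplifications, not the exact EUCBV rule from the paper.

```python
# Minimal sketch of a variance-aware UCB index with arm elimination, in the spirit of
# EUCBV (constants and elimination schedule simplified; not the paper's exact rule).
import numpy as np

def variance_aware_elimination(means, horizon=20000, seed=0):
    rng = np.random.default_rng(seed)
    K = len(means)
    active = list(range(K))
    counts = np.zeros(K); sums = np.zeros(K); sq_sums = np.zeros(K)
    regret, best = 0.0, max(means)
    for t in range(1, horizon + 1):
        # Pull the least-sampled active arm (round-robin style exploration among survivors).
        arm = min(active, key=lambda a: counts[a])
        r = float(rng.random() < means[arm])                  # Bernoulli reward
        counts[arm] += 1; sums[arm] += r; sq_sums[arm] += r * r
        regret += best - means[arm]
        # Variance-aware confidence radius (UCBV-style) using the empirical variance.
        mu = sums / np.maximum(counts, 1)
        var = np.maximum(sq_sums / np.maximum(counts, 1) - mu ** 2, 0.0)
        rad = np.sqrt(2 * var * np.log(t + 1) / np.maximum(counts, 1)) \
              + 3 * np.log(t + 1) / np.maximum(counts, 1)
        # Eliminate any arm whose upper bound falls below the best lower bound.
        if all(counts[a] > 0 for a in active):
            best_lcb = max(mu[a] - rad[a] for a in active)
            active = [a for a in active if mu[a] + rad[a] >= best_lcb]
    return regret

print(variance_aware_elimination([0.9, 0.8, 0.5, 0.4]))
```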

Active Learning (AL) has been a powerful paradigm for improving model efficiency and performance by selecting the most informative data points for labeling and training. In recent active learning frameworks, Large Language Models (LLMs) have been employed not only for selection but also for generating entirely new data instances and for providing more cost-effective annotations. Motivated by the increasing importance of high-quality data and efficient model training in the era of LLMs, we present a comprehensive survey on LLM-based Active Learning. We introduce...

10.48550/arxiv.2502.11767 preprint EN arXiv (Cornell University) 2025-02-17

In this paper we propose the Augmented-UCB (AugUCB) algorithm for a fixed-budget version of the thresholding bandit problem (TBP), where the objective is to identify the set of arms whose quality is above a threshold. A key feature of AugUCB is that it uses both mean and variance estimates to eliminate arms that have been sufficiently explored; to the best of our knowledge this is the first algorithm to employ such an approach for the considered TBP. Theoretically, we obtain an upper bound on the loss (probability of mis-classification) incurred by AugUCB. Although UCBEV in the literature...

10.24963/ijcai.2017/350 preprint EN 2017-07-28
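The following is a minimal sketch of the mean-plus-variance elimination idea for a fixed-budget thresholding bandit. The confidence radius and stopping rule are simplified illustrations, not AugUCB's exact schedule, and the function name is mine.

```python
# Minimal sketch of a fixed-budget thresholding bandit that classifies arms against a
# threshold using both mean and variance estimates (simplified; not AugUCB's exact rule).
import numpy as np

def threshold_classify(means, tau, budget=5000, seed=0):
    rng = np.random.default_rng(seed)
    K = len(means)
    counts = np.zeros(K); sums = np.zeros(K); sq = np.zeros(K)
    active = set(range(K))
    decided = {}                                   # arm -> True if declared above tau
    for t in range(1, budget + 1):
        if not active:
            break
        arm = min(active, key=lambda a: counts[a])        # least-sampled active arm
        r = rng.normal(means[arm], 0.5)
        counts[arm] += 1; sums[arm] += r; sq[arm] += r * r
        mu = sums[arm] / counts[arm]
        var = max(sq[arm] / counts[arm] - mu ** 2, 1e-12)
        # Empirical-Bernstein-style radius: shrinks faster for low-variance arms.
        rad = np.sqrt(2 * var * np.log(budget) / counts[arm]) + 3 * np.log(budget) / counts[arm]
        if mu - rad > tau or mu + rad < tau:              # confidently classified
            decided[arm] = mu > tau
            active.discard(arm)
    # Any arm left undecided at the end of the budget is classified by its empirical mean.
    for a in active:
        decided[a] = (sums[a] / max(counts[a], 1)) > tau
    return decided

print(threshold_classify([0.2, 0.45, 0.55, 0.9], tau=0.5))
```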

We consider a finite-armed structured bandit problem in which the mean rewards of different arms are known functions of a common hidden parameter θ*. Since we do not place any restrictions on these functions, the setting subsumes several previously studied frameworks that assume linear or invertible reward functions. We propose a novel approach to gradually estimate θ* and use the estimate together with the reward functions to substantially reduce exploration of sub-optimal arms. This enables us to fundamentally generalize classical algorithms...

10.1109/jsait.2020.3041246 article EN publisher-specific-oa IEEE Journal on Selected Areas in Information Theory 2020-11-01
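A minimal sketch of the "estimate the hidden parameter, then act on arms consistent with it" idea follows. The grid-based confidence set, noise model, and example reward functions are assumptions for illustration, not the paper's construction.

```python
# Minimal sketch of a structured bandit where all arm means are known functions of a
# hidden scalar parameter theta*: maintain a confidence set over a grid of theta values
# and act optimistically within it (illustrative; not the paper's exact confidence set).
import numpy as np

def structured_bandit(funcs, theta_star, horizon=5000, seed=0):
    rng = np.random.default_rng(seed)
    K = len(funcs)
    grid = np.linspace(0.0, 1.0, 201)                 # candidate values for theta*
    counts = np.zeros(K); sums = np.zeros(K)
    true_means = np.array([f(theta_star) for f in funcs])
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= K:
            arm = t - 1                               # pull every arm once
        else:
            mu_hat = sums / counts
            # Weighted squared deviation of each candidate theta from the observed means.
            dev = np.array([np.sum(counts * (mu_hat - np.array([f(th) for f in funcs])) ** 2)
                            for th in grid])
            conf = grid[dev <= dev.min() + 2 * np.log(t)]   # plausible theta values
            # Optimistic arm: best mean achievable under any plausible theta.
            arm = int(np.argmax([max(f(th) for th in conf) for f in funcs]))
        r = funcs[arm](theta_star) + 0.1 * rng.standard_normal()
        counts[arm] += 1; sums[arm] += r
        regret += true_means.max() - true_means[arm]
    return regret

funcs = [lambda th: th, lambda th: 1 - th, lambda th: 0.5 + 0.2 * np.sin(6 * th)]
print(structured_bandit(funcs, theta_star=0.7))
```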

Transduction, the ability to include query-specific examples in the prompt at inference time, is one of the emergent abilities of large language models (LLMs). In this work, we propose a framework for adaptive prompt design called active transductive inference (ATI). We design the LLM prompt by adaptively choosing few-shot examples for a given inference query. The examples are initially unlabeled and we query the user to label the most informative ones, which maximally reduces the uncertainty in the LLM prediction. We propose two algorithms, GO and SAL, which differ in how the few-shot examples are chosen. We analyze these algorithms in linear models:...

10.48550/arxiv.2404.08846 preprint EN arXiv (Cornell University) 2024-04-12
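Below is a minimal sketch of selecting few-shot examples for a given query by greedily minimizing the query's predictive variance in a linear model, which is in the spirit of the GO-style objective described above. The function name, regularization, and greedy loop are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch: greedily choose the unlabeled example whose addition most reduces the
# posterior (ridge-regression) variance at the query point (illustrative; not the exact
# GO or SAL algorithms from the paper).
import numpy as np

def select_examples(X_pool, x_query, k=5, lam=1.0):
    d = X_pool.shape[1]
    A = lam * np.eye(d)                  # regularized design matrix of chosen examples
    chosen, remaining = [], list(range(len(X_pool)))
    for _ in range(k):
        best_i, best_var = None, np.inf
        for i in remaining:
            x = X_pool[i:i + 1].T                         # column vector (d, 1)
            A_new = A + x @ x.T
            var = float(x_query @ np.linalg.solve(A_new, x_query))  # variance at query
            if var < best_var:
                best_i, best_var = i, var
        chosen.append(best_i)
        remaining.remove(best_i)
        x = X_pool[best_i:best_i + 1].T
        A += x @ x.T
    return chosen

rng = np.random.default_rng(0)
X_pool = rng.standard_normal((100, 8))
x_query = rng.standard_normal(8)
print(select_examples(X_pool, x_query, k=5))
```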

Learning of preference models from human feedback has been central to recent advances in artificial intelligence. Motivated by this progress, and by the cost of obtaining high-quality human annotations, we study the problem of data collection for learning preference models. The key idea in our work is to generalize optimal designs, a tool for computing efficient data logging policies, to ranked lists. To show the generality of our ideas, we study both absolute and relative feedback on items in the list. We design efficient algorithms for both settings and analyze them. We prove that our preference model estimators...

10.48550/arxiv.2404.13895 preprint EN arXiv (Cornell University) 2024-04-22
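As a rough illustration of extending optimal designs to lists, here is a greedy D-optimal design over candidate ranked lists, where each list contributes the outer products of its item features. This is a simplification under my own assumptions, not the paper's objective or algorithm.

```python
# Minimal sketch of a greedy D-optimal design over ranked lists: each list contributes
# the outer products of its item features, and we repeatedly pick the list that most
# increases log det of the design matrix (a simplification for illustration only).
import numpy as np

def greedy_list_design(lists, item_feats, budget=10, lam=1e-3):
    d = item_feats.shape[1]
    A = lam * np.eye(d)
    picked = []
    for _ in range(budget):
        gains = []
        for L in lists:
            contrib = sum(np.outer(item_feats[i], item_feats[i]) for i in L)
            _, logdet = np.linalg.slogdet(A + contrib)    # information gain of this list
            gains.append(logdet)
        best = int(np.argmax(gains))
        picked.append(best)
        A += sum(np.outer(item_feats[i], item_feats[i]) for i in lists[best])
    return picked          # indices of lists to log feedback on (repetitions allowed)

rng = np.random.default_rng(1)
item_feats = rng.standard_normal((20, 6))
lists = [list(rng.choice(20, size=4, replace=False)) for _ in range(15)]
print(greedy_list_design(lists, item_feats, budget=5))
```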

In this paper, we study safe data collection for the purpose of policy evaluation in tabular Markov decision processes (MDPs). In policy evaluation, we are given a \textit{target} policy and asked to estimate the expected cumulative reward it will obtain. Policy evaluation requires data, and we are interested in the question of what \textit{behavior} policy should collect the data for the most accurate evaluation of the target policy. While prior work has considered behavior policy selection, we additionally consider a safety constraint on the behavior policy. Namely, we assume there exists a known default policy that incurs a particular...

10.48550/arxiv.2406.02165 preprint EN arXiv (Cornell University) 2024-06-04
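One simple way to picture a safety constraint on the data-collecting policy is to mix an exploratory behavior policy with a known safe default so that the mixture's expected return stays within a tolerance of the default's. The constraint form and function name below are my own illustrative assumptions, not the paper's algorithm.

```python
# Minimal sketch: choose the largest mixing weight w on an exploratory behavior policy
# such that the mixture (1 - w) * default + w * behavior keeps at least a (1 - delta)
# fraction of the default policy's expected return (illustrative constraint form only).
import numpy as np

def safe_mixture_weight(v_default, v_behavior, delta=0.1):
    """Largest w with (1 - w) * v_default + w * v_behavior >= (1 - delta) * v_default."""
    if v_behavior >= (1 - delta) * v_default:
        return 1.0                                  # behavior policy already safe enough
    w = delta * v_default / (v_default - v_behavior)   # solve the linear constraint for w
    return float(np.clip(w, 0.0, 1.0))

# Example: default policy worth 10.0, exploratory policy worth 6.0, allow a 10% drop.
w = safe_mixture_weight(10.0, 6.0, delta=0.1)
print(w, (1 - w) * 10.0 + w * 6.0)                  # mixture value stays >= 9.0
```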

In this paper, we study the multi-task structured bandit problem where the goal is to learn a near-optimal algorithm that minimizes cumulative regret. The tasks share a common structure, and the algorithm exploits the shared structure to minimize the cumulative regret for an unseen but related test task. We use a transformer as the decision-making algorithm so as to generalize to the test task. Prior work on pretrained decision transformers, such as DPT, requires access to the optimal action during training, which may be hard to obtain in several scenarios. Diverging from these works, our learning algorithm does...

10.48550/arxiv.2406.05064 preprint EN arXiv (Cornell University) 2024-06-07

Learning from human feedback has been central to recent advances in artificial intelligence and machine learning. Since the collection of human feedback is costly, a natural question to ask is if new feedback always needs to be collected. Or could we evaluate a new model with human feedback on the responses of another model? This motivates us to study off-policy evaluation from logged human feedback. We formalize the problem, propose both model-based and model-free estimators for policy values, and show how to optimize them. We analyze the unbiasedness of our estimators and evaluate them empirically. Our estimators can predict...

10.48550/arxiv.2406.10030 preprint EN arXiv (Cornell University) 2024-06-14
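For context, a standard model-free estimator for this kind of off-policy question is inverse propensity scoring: reweight the logged feedback by the ratio of new-policy to logging-policy propensities. The sketch below shows that standard estimator only; the paper's model-based and model-free estimators differ in their details.

```python
# Minimal sketch of a model-free off-policy value estimate via clipped inverse propensity
# scoring (a standard estimator, shown for illustration; not the paper's exact estimators).
import numpy as np

def ips_value(rewards, logged_probs, target_probs, clip=10.0):
    """Estimate the target policy's value from feedback logged under another policy."""
    w = np.minimum(np.asarray(target_probs) / np.asarray(logged_probs), clip)  # clipped ratios
    return float(np.mean(w * np.asarray(rewards)))

# Example: responses logged under pi0, scored by human feedback in [0, 1],
# evaluated for a target policy pi1 that puts more mass on the good responses.
rewards      = [1.0, 0.0, 1.0, 1.0, 0.0]
logged_probs = [0.2, 0.5, 0.1, 0.3, 0.4]   # pi0(response | prompt)
target_probs = [0.4, 0.2, 0.3, 0.3, 0.1]   # pi1(response | prompt)
print(ips_value(rewards, logged_probs, target_probs))
```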

Multi-objective alignment from human feedback (MOAHF) in large language models (LLMs) is a challenging problem as human preferences are complex, multifaceted, and often conflicting. Recent works on MOAHF considered a-priori multi-objective optimization (MOO), where the preferences are known at training or inference time. In contrast, when preferences are unknown or difficult to quantify, a natural approach is to cover the Pareto front by multiple diverse solutions. We propose an algorithm HaM for learning diverse LLM policies that maximizes their...

10.48550/arxiv.2412.05469 preprint EN arXiv (Cornell University) 2024-12-06

We consider a finite-armed structured bandit problem in which the mean rewards of different arms are known functions of a common hidden parameter $\theta^*$. Since we do not place any restrictions on these functions, the setting subsumes several previously studied frameworks that assume linear or invertible reward functions. We propose a novel approach to gradually estimate $\theta^*$ and use the estimate together with the reward functions to substantially reduce exploration of sub-optimal arms. This enables us to fundamentally generalize classic...

10.48550/arxiv.1810.08164 preprint EN other-oa arXiv (Cornell University) 2018-01-01

We consider the setup of stochastic multi-armed bandits in the case when the reward distributions are piecewise i.i.d. and bounded with unknown changepoints. We focus on the case when changes happen simultaneously in all arms, and, in stark contrast to the existing literature, we target gap-dependent (as opposed to only gap-independent) regret bounds involving the magnitude of the changepoint gaps $(Δ^{chg}_{i,g})$ and the optimality-gaps ($Δ^{opt}_{i,g}$). Diverging from previous works, we assume the more realistic scenario that there can be undetectable changepoint gaps...

10.48550/arxiv.1905.13159 preprint EN other-oa arXiv (Cornell University) 2019-01-01
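To picture the setting, here is a crude sketch of UCB with a window-based global changepoint detector that restarts all statistics when an arm's recent mean drifts away from its long-run mean. The detection rule, window, and thresholds are my own illustrative choices and do not reflect the paper's algorithm or gap conditions.

```python
# Minimal sketch of UCB with a crude changepoint detector: if any arm's recent-window
# mean deviates from its long-run mean beyond a confidence width, reset all statistics
# (illustration only; not the paper's detection rule or analysis).
import numpy as np

def ucb_with_restarts(reward_fn, K, horizon=10000, window=200, seed=0):
    rng = np.random.default_rng(seed)
    counts = np.zeros(K); sums = np.zeros(K)
    recent = [[] for _ in range(K)]                   # sliding window of rewards per arm
    total = 0.0
    for t in range(1, horizon + 1):
        if np.any(counts == 0):
            arm = int(np.argmin(counts))
        else:
            ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
            arm = int(np.argmax(ucb))
        r = reward_fn(arm, t, rng)
        total += r
        counts[arm] += 1; sums[arm] += r
        recent[arm] = (recent[arm] + [r])[-window:]
        if len(recent[arm]) >= window:                # changepoint check on a full window
            gap = abs(np.mean(recent[arm]) - sums[arm] / counts[arm])
            width = np.sqrt(2 * np.log(horizon) / window) \
                    + np.sqrt(2 * np.log(horizon) / counts[arm])
            if gap > width:                           # detected change: restart everything
                counts[:] = 0; sums[:] = 0
                recent = [[] for _ in range(K)]
    return total

# Two Bernoulli arms whose means swap halfway through the horizon.
def reward_fn(arm, t, rng):
    means = [0.8, 0.3] if t < 5000 else [0.3, 0.8]
    return float(rng.random() < means[arm])

print(ucb_with_restarts(reward_fn, K=2))
```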

We consider a finite-armed structured bandit problem in which the mean rewards of different arms are known functions of a common hidden parameter θ*. This setting subsumes several previously studied frameworks that assume linear or invertible reward functions. We propose a novel approach to gradually estimate the hidden parameter and use the estimate together with the reward functions to substantially reduce exploration of sub-optimal arms. This enables us...

10.1109/icassp39728.2021.9413628 article EN ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021-05-13

A superconvergent element patch based stress extraction strategy is proposed for general FE postprocessing and/or error estimation procedures in adaptive finite element analysis. Generalized weight functions are introduced in the discrete least-square functionals, and a corrected conjoint polynomial fitting procedure is presented to ensure accurate stress extraction from the element domain once the primary level parameters have been evaluated. A numerical example is used to fix the choice of weight functions. Several plane elasticity examples are solved using QUAD4 elements, and the results are compared...

10.1002/(sici)1099-0887(199808)14:8<731::aid-cnm179>3.0.co;2-7 article EN Communications in Numerical Methods in Engineering 1998-08-01
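The core of patch-based recovery is a discrete least-squares polynomial fit to stresses sampled at the elements' superconvergent points, evaluated afterwards at the patch node. The sketch below shows only that basic fit; the generalized weight functions and conjoint-polynomial correction described above are omitted, and all names are illustrative.

```python
# Minimal sketch of patch-based stress recovery by discrete least squares: fit a bilinear
# polynomial to stress samples at the patch's superconvergent points, then evaluate the
# smoothed stress at the patch node (weights and conjoint correction omitted).
import numpy as np

def recover_stress_at_node(sample_xy, sample_stress, node_xy):
    x, y = sample_xy[:, 0], sample_xy[:, 1]
    P = np.column_stack([np.ones_like(x), x, y, x * y])          # bilinear basis 1, x, y, xy
    coeffs, *_ = np.linalg.lstsq(P, sample_stress, rcond=None)   # discrete least squares
    xn, yn = node_xy
    return float(np.array([1.0, xn, yn, xn * yn]) @ coeffs)

# Example: noisy samples of a smooth stress field sigma(x, y) = 2x + 3y at Gauss-type points.
rng = np.random.default_rng(0)
pts = rng.uniform(-1, 1, size=(16, 2))
sigma = 2 * pts[:, 0] + 3 * pts[:, 1] + 0.01 * rng.standard_normal(16)
print(recover_stress_at_node(pts, sigma, node_xy=(0.0, 0.0)))    # close to 0.0 at the node
```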

A variant of the boundary element method, called the boundary contour method (BCM), offers a further reduction in dimensionality. Consequently, the analysis of two‐dimensional (2‐D) problems does not require any numerical integration at all. The method is thus very computationally effective and accurate, as shown in previous related studies. This paper presents the development of the BCM for multi‐region 2‐D elasticity, and an application of this development, coupled with the displacement correlation technique, to evaluating stress...

10.1002/cnm.1060 article EN Communications in Numerical Methods in Engineering 2007-10-11

This paper studies the problem of data collection for policy evaluation in Markov decision processes (MDPs). In policy evaluation, we are given a target policy and asked to estimate the expected cumulative reward it will obtain in an environment formalized as an MDP. We develop theory for optimal data collection within the class of tree-structured MDPs by first deriving an oracle data collection strategy that uses knowledge of the variance of the reward distributions. We then introduce the Reduced Variance Sampling (ReVar) algorithm, which approximates the oracle strategy when the variances are unknown a priori, and bound its...

10.48550/arxiv.2203.04510 preprint EN cc-by arXiv (Cornell University) 2022-01-01
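A one-step special case makes the oracle idea concrete: to estimate a target policy's value when each action's reward noise has a known standard deviation, the variance-minimizing allocation samples each action in proportion to policy probability times standard deviation (Neyman-style allocation). The paper's tree-MDP oracle and ReVar generalize this; the code below is only that simplified case.

```python
# Minimal sketch of the one-step oracle: to estimate V = sum_a pi(a) * mu(a) with known
# per-action noise std sigma(a), allocate samples n(a) proportional to pi(a) * sigma(a)
# (Neyman-style allocation; illustrative special case, not the paper's tree-MDP oracle).
import numpy as np

def oracle_allocation(pi, sigma, budget):
    weights = np.asarray(pi) * np.asarray(sigma)
    alloc = budget * weights / weights.sum()
    return np.maximum(np.round(alloc).astype(int), 1)    # at least one sample per action

pi    = np.array([0.7, 0.2, 0.1])    # target policy
sigma = np.array([1.0, 2.0, 0.1])    # reward noise std per action
print(oracle_allocation(pi, sigma, budget=1000))         # most samples where pi*sigma is large
```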

Active learning can reduce the number of samples needed to perform a hypothesis test and to estimate the parameters of a model. In this paper, we revisit the work of Chernoff that described an asymptotically optimal algorithm for performing a hypothesis test. We obtain a novel sample complexity bound for Chernoff's algorithm, with a non-asymptotic term that characterizes its performance at a fixed confidence level. We also develop an extension of Chernoff sampling that can be used to estimate the parameters of a wide variety of models, with a bound on the estimation error. We apply our extension to actively learn neural network models in...

10.48550/arxiv.2012.08073 preprint EN cc-by arXiv (Cornell University) 2020-01-01
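As a toy illustration of Chernoff-style active testing, the sketch below discriminates between two simple Gaussian hypotheses: at each step it samples the experiment with the largest KL divergence between the current maximum-likelihood hypothesis and its alternative, and stops once the log-likelihood ratio clears a confidence threshold. The model, constants, and names are assumptions for illustration.

```python
# Minimal sketch of Chernoff-style active hypothesis testing between two simple Gaussian
# hypotheses over two experiments (illustrative; not the paper's extension or bounds).
import numpy as np

def chernoff_test(true_means, h0=(1.0, 0.0), h1=(0.0, 2.0), delta=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    llr = 0.0                                      # log-likelihood ratio log P(data|H0)/P(data|H1)
    threshold = np.log(1.0 / delta)
    for n in range(1, 10000):
        mle, alt = (h0, h1) if llr >= 0 else (h1, h0)
        # KL divergence between unit-variance Gaussians for each experiment.
        kls = [0.5 * (mle[e] - alt[e]) ** 2 for e in range(2)]
        e = int(np.argmax(kls))                    # sample the most discriminating experiment
        y = rng.normal(true_means[e], 1.0)
        llr += 0.5 * ((y - h1[e]) ** 2 - (y - h0[e]) ** 2)   # Gaussian log-likelihood ratio
        if abs(llr) > threshold:
            return ("H0" if llr > 0 else "H1"), n
    return "undecided", n

print(chernoff_test(true_means=(1.0, 0.0)))        # data generated under H0
```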

A hybrid error estimator using a priori interior region estimates in an a posteriori framework is presented for linear elastostatics problems in FEA. It is shown that local rates of convergence are augmented by this technique and that global rates are not adversely affected. The effects of pollution are explained and derived using the concept of pollution loads. The hybrid error estimation approach can improve the performance of both conventional techniques. A series of numerical results is presented which demonstrates the superior performance of the proposed method over previously published approaches. © 1998 John Wiley & Sons, Ltd.

10.1002/(sici)1097-0207(19981015)43:3<507::aid-nme434>3.0.co;2-b article EN International Journal for Numerical Methods in Engineering 1998-10-15

The level set estimation problem seeks to find all points in a domain ${\cal X}$ where the value of an unknown function $f:{\cal X}\rightarrow \mathbb{R}$ exceeds a threshold $\alpha$. The estimation is based on noisy function evaluations that may be acquired at sequentially and adaptively chosen locations in ${\cal X}$. The threshold value $\alpha$ can either be \emph{explicit} and provided a priori, or \emph{implicit} and defined relative to the optimal function value, i.e. $\alpha = (1-\epsilon)f(x_\ast)$ for a given $\epsilon > 0$ where $f(x_\ast)$ is the maximal function value and is unknown. In this work...

10.48550/arxiv.2111.01768 preprint EN cc-by arXiv (Cornell University) 2021-01-01
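For the explicit-threshold case on a finite grid, a minimal confidence-bound sketch looks like the following: sample points, keep Hoeffding-style intervals, and classify a point as above or below $\alpha$ once its interval clears the threshold. The sampling heuristic and constants are my own simplifications; the paper also handles implicit thresholds and richer function classes.

```python
# Minimal sketch of confidence-bound level set estimation over a finite grid with an
# explicit threshold alpha (illustrative; the paper also treats implicit thresholds
# alpha = (1 - eps) * f(x*) and more general settings).
import numpy as np

def level_set_estimate(f, grid, alpha, noise=0.1, budget=3000, seed=0):
    rng = np.random.default_rng(seed)
    n = len(grid)
    counts = np.zeros(n); sums = np.zeros(n)
    unresolved = set(range(n))
    above = set()
    for t in range(1, budget + 1):
        if not unresolved:
            break
        # Sample the least-explored unresolved point (a simple heuristic stand-in for
        # picking the most ambiguous point).
        i = min(unresolved, key=lambda j: counts[j])
        y = f(grid[i]) + noise * rng.standard_normal()
        counts[i] += 1; sums[i] += y
        mu = sums[i] / counts[i]
        rad = noise * np.sqrt(2 * np.log(budget) / counts[i])     # Hoeffding-style width
        if mu - rad > alpha:
            above.add(i); unresolved.discard(i)                   # confidently above alpha
        elif mu + rad < alpha:
            unresolved.discard(i)                                 # confidently below alpha
    return sorted(grid[i] for i in above)

grid = np.linspace(0, 1, 21)
print(level_set_estimate(lambda x: np.sin(3 * x), grid, alpha=0.5))
```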