Abhishek Naik

ORCID: 0009-0008-1427-1609
Research Areas
  • Reinforcement Learning in Robotics
  • Traffic control and management
  • Receptor Mechanisms and Signaling
  • Adversarial Robustness in Machine Learning
  • Explainable Artificial Intelligence (XAI)
  • Autonomous Vehicle Technology and Safety
  • Advanced Bandit Algorithms Research
  • Data Mining Algorithms and Applications
  • Complex Network Analysis Techniques
  • Supply Chain and Inventory Management
  • Data Stream Mining Techniques
  • Heart Failure Treatment and Management
  • Optimization and Search Problems
  • Optical Wireless Communication Technologies
  • Transportation and Mobility Innovations
  • Traffic Prediction and Management Techniques
  • Artificial Intelligence in Games
  • Adaptive Dynamic Programming Control
  • Industrial Automation and Control Systems
  • Advanced Clustering Algorithms Research
  • Machine Learning in Healthcare
  • Evolutionary Algorithms and Applications
  • Advanced MIMO Systems Optimization
  • Robotic Path Planning Algorithms
  • Satellite Communication Systems

University of Alberta
2019-2023

Indian Institute of Technology Madras
2017-2018

Virginia Tech
2005

Internet of Things (IoT) devices have become increasingly ubiquitous, with applications not only in urban areas but in remote areas as well. These devices support industries such as agriculture, forestry, and resource extraction. Because the devices are located in remote areas, satellites are frequently used to collect and deliver IoT device data to customers. As these devices become increasingly advanced and numerous, the amount of data produced has rapidly increased, potentially straining radio frequency (RF) downlink capacity. Free space optical communications with their...

10.48550/arxiv.2501.11198 preprint EN arXiv (Cornell University) 2025-01-19

Discounted reinforcement learning is fundamentally incompatible with function approximation for control in continuing tasks. It is not an optimization problem in its usual formulation, so when using function approximation there is no optimal policy. We substantiate these claims, then go on to address some misconceptions about discounting and its connection to the average reward formulation. We encourage researchers to adopt rigorous optimization approaches, such as maximizing average reward, ...

10.48550/arxiv.1910.02140 preprint EN other-oa arXiv (Cornell University) 2019-01-01
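
The connection between discounting and average reward mentioned in the abstract above can be illustrated with a standard identity. This is a sketch under assumptions rather than the paper's own derivation: the symbols below ($\mu_\pi$ for the steady-state distribution under policy $\pi$, $v_\pi^\gamma$ for the discounted value function, $r(\pi)$ for the average reward) are notation chosen here, and an ergodic continuing task is assumed.

```latex
% Average reward of a policy \pi in a continuing task (assumed ergodic):
r(\pi) = \lim_{n \to \infty} \frac{1}{n} \sum_{t=1}^{n} \mathbb{E}_\pi\left[ R_t \right]

% Averaging the discounted values over the steady-state distribution \mu_\pi
% yields a scalar objective that is proportional to the average reward:
J_\gamma(\pi) = \sum_{s} \mu_\pi(s)\, v_\pi^\gamma(s) = \frac{r(\pi)}{1 - \gamma}
```

Under these assumptions, ranking policies by $J_\gamma$ gives the same order as ranking them by $r(\pi)$ for any $\gamma \in [0, 1)$, so the discount factor does not affect which policy is preferred.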

In this work, we present MADRaS, an open-source multi-agent driving simulator for use in the design and evaluation of motion planning algorithms for autonomous driving. MADRaS provides a platform for constructing a wide variety of highway and track driving scenarios where multiple driving agents can train for motion planning tasks using reinforcement learning and other machine learning algorithms. MADRaS is built on TORCS, an open-source car-racing simulator. TORCS offers a variety of cars with different dynamic properties and driving tracks with different geometries and surface properties. MADRaS inherits these functionalities...

10.1613/jair.1.12531 article EN cc-by Journal of Artificial Intelligence Research 2021-04-30

We introduce learning and planning algorithms for average-reward MDPs, including 1) the first general proven-convergent off-policy model-free control algorithm without reference states, 2) the first proven-convergent off-policy model-free prediction algorithm, and 3) the first off-policy learning algorithm that converges to the actual value function rather than to the value function plus an offset. All of our algorithms are based on using the temporal-difference error rather than the conventional error when updating the estimate of the average reward. Our proof techniques are a slight generalization of those by Abounadi, Bertsekas, and Borkar (2001). In experiments...

10.48550/arxiv.2006.16318 preprint EN other-oa arXiv (Cornell University) 2020-01-01
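
The abstract above says the average-reward estimate is updated with the temporal-difference error rather than the conventional error. A minimal tabular sketch of that idea follows, in the spirit of a differential Q-learning style update. The function names, the two-state toy task, and the step-size values are hypothetical choices made for illustration and are not taken from the paper.

```python
import random


def differential_q_update(q, avg_reward, s, a, r, s_next, alpha=0.1, eta=1.0):
    """Apply one tabular update in place; return the new average-reward estimate."""
    # Average-reward TD error: the reward is measured relative to the current
    # average-reward estimate, and there is no discount factor.
    delta = r - avg_reward + max(q[s_next]) - q[s][a]
    q[s][a] += alpha * delta
    # The average-reward estimate moves along the TD error itself,
    # not along the conventional error (r - avg_reward).
    return avg_reward + eta * alpha * delta


def run_toy(steps=5000, epsilon=0.1, seed=0):
    """Hypothetical continuing task with two states and two actions.

    Taking action 1 in state 0 yields reward 1 and leads to state 1;
    every other state-action pair yields reward 0 and leads to state 0.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]
    avg_reward, s = 0.0, 0
    for _ in range(steps):
        # Epsilon-greedy behaviour policy (an assumption of this sketch).
        if rng.random() < epsilon:
            a = rng.randrange(2)
        else:
            a = max(range(2), key=lambda i: q[s][i])
        r, s_next = (1.0, 1) if (s == 0 and a == 1) else (0.0, 0)
        avg_reward = differential_q_update(q, avg_reward, s, a, r, s_next)
        s = s_next
    return q, avg_reward
```

Calling `run_toy()` returns the learned action values together with the running average-reward estimate; `eta` scales the step size of that estimate relative to `alpha`.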

The DARPA Grand Challenge might be the greatest and most heralded control systems problem ever posed. The challenge was to build an autonomous vehicle that could navigate from Barstow, CA to Prim, NV, across hundreds of miles of rugged desert terrain. To win the million-dollar cash prize being offered by DARPA, the vehicle was required to traverse the course in less than ten hours with no operator intervention. Although no team came close to completing the course, the challenge opened a new era of unmanned ground vehicle navigation. From a field of more than one hundred...

10.1109/itsc.2004.1398947 article EN 2005-04-06

Recommender systems are used to suggest items to users based on the users' preferences. Such systems often deal with massive item sets and incredibly sparse user-item interactions, which makes it very challenging to generate high-quality personalized recommendations. Reinforcement learning (RL) is a framework for sequential decision making and naturally formulates recommender-system tasks: recommending items as actions in different user and context states to maximize long-term user experience. We investigate two RL policy...

10.1145/3543873.3587661 article EN 2023-04-28

We extend the options framework for temporal abstraction in reinforcement learning from discounted Markov decision processes (MDPs) to average-reward MDPs. Our contributions include general convergent off-policy inter-option learning algorithms, intra-option algorithms for learning values and models, as well as sample-based planning variants of our learning algorithms. Our algorithms and convergence proofs extend those recently developed by Wan, Naik, and Sutton. We also extend the notion of option-interrupting behavior from the discounted to the average-reward formulation. We show the efficacy of the proposed algorithms with...

10.48550/arxiv.2110.13855 preprint EN other-oa arXiv (Cornell University) 2021-01-01