Rodrigo Toro Icarte

ORCID: 0000-0002-7734-099X
Research Areas
  • Reinforcement Learning in Robotics
  • Formal Methods in Verification
  • Topic Modeling
  • Data Stream Mining Techniques
  • Machine Learning and Algorithms
  • Receptor Mechanisms and Signaling
  • Advanced Image and Video Retrieval Techniques
  • Multimodal Machine Learning Applications
  • Decision-Making and Behavioral Economics
  • Domain Adaptation and Few-Shot Learning
  • Smart Grid Energy Management
  • IoT and Edge/Fog Computing
  • Advanced Bandit Algorithms Research
  • Software Engineering Research
  • AI-based Problem Solving and Planning
  • Artificial Intelligence in Games
  • Machine Learning in Healthcare
  • Mobile Crowdsensing and Crowdsourcing
  • Adversarial Robustness in Machine Learning
  • Multi-Agent Systems and Negotiation
  • IoT Networks and Protocols
  • Smart Parking Systems Research
  • Evolutionary Algorithms and Applications
  • Advanced Software Engineering Methodologies
  • Simulation Techniques and Applications

Pontificia Universidad Católica de Chile
2022-2024

Vector Institute
2018-2022

University of Toronto
2017-2022

Samsung (United States)
2021

Reinforcement learning (RL) methods usually treat reward functions as black boxes. As such, these methods must extensively interact with the environment in order to discover rewards and optimal policies. In most RL applications, however, users have to program the reward function and, hence, there is the opportunity to make the reward function visible – to show the reward function’s code to the agent so that it can exploit the function’s internal structure to learn optimal policies in a more sample-efficient manner. In this paper, we show how to accomplish this idea in two steps. First, we propose reward machines, a type of...

10.1613/jair.1.12440 article EN cc-by Journal of Artificial Intelligence Research 2022-01-11
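
The core idea is easiest to see in miniature: a reward machine is a finite state machine whose transitions fire on high-level events and emit rewards, making reward structure visible to the agent. The following sketch is illustrative only; the class, event names, and example task are invented here, not taken from the paper's code.

```python
# Illustrative reward machine sketch: transitions map (state, event) pairs to
# (next_state, reward). All names below are hypothetical placeholders.
class RewardMachine:
    def __init__(self, transitions, initial_state=0):
        self.transitions = transitions          # (state, event) -> (next_state, reward)
        self.initial_state = initial_state
        self.state = initial_state

    def reset(self):
        self.state = self.initial_state

    def step(self, event):
        """Advance on an observed event; unmatched events leave the state as-is."""
        self.state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0))
        return reward

# Task "get coffee, then deliver it to the office", rewarded only on completion:
rm = RewardMachine({(0, "coffee"): (1, 0.0), (1, "office"): (2, 1.0)})
print(rm.step("office"), rm.step("coffee"), rm.step("office"))  # 0.0 0.0 1.0
```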

In Reinforcement Learning (RL), an agent is guided by the rewards it receives from the reward function. Unfortunately, it may take many interactions with the environment to learn from sparse rewards, and it can be challenging to specify reward functions that reflect complex reward-worthy behavior. We propose using reward machines (RMs), which are automata-based representations that expose reward function structure, as a normal form representation for reward functions. We show how specifications of reward in various formal languages, including LTL and other...

10.24963/ijcai.2019/840 article EN 2019-07-28
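
One standard mechanism behind such translations is formula progression: a temporal formula is rewritten against the propositions observed at each step, and whatever remains to be satisfied plays the role of an automaton state. Below is a toy sketch for a small LTL fragment; the tuple encoding and the fragment choice are ours, for illustration, not the paper's construction.

```python
# Progress a tiny LTL fragment through one step of observed propositions.
def progress(formula, obs):
    op = formula[0]
    if op == "prop":                      # atomic proposition
        return ("true",) if formula[1] in obs else ("false",)
    if op == "eventually":                # F f  ==  f or X(F f)
        inner = progress(formula[1], obs)
        return ("true",) if inner == ("true",) else formula
    if op == "and":
        left, right = progress(formula[1], obs), progress(formula[2], obs)
        if ("false",) in (left, right):
            return ("false",)
        if left == ("true",):
            return right
        if right == ("true",):
            return left
        return ("and", left, right)
    return formula                        # "true"/"false" are fixpoints

# Task: eventually visit a, and eventually visit b.
task = ("and", ("eventually", ("prop", "a")), ("eventually", ("prop", "b")))
task = progress(task, {"a"})
print(task)  # ('eventually', ('prop', 'b')): only b remains to be achieved
```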

Reinforcement learning (RL) agents seek to maximize the cumulative reward obtained when interacting with their environment. Users define tasks or goals for RL agents by designing specialized reward functions such that their maximization aligns with task satisfaction. This work explores the use of high-level symbolic action models as a framework for defining final-state goal tasks and automatically producing their corresponding reward functions. We also show how automated planning can be used to synthesize plans that guide hierarchical RL (HRL)...

10.1609/icaps.v30i1.6750 article EN Proceedings of the International Conference on Automated Planning and Scheduling 2020-06-01
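
At its simplest, the link between a symbolic goal and a reward function fits in a few lines: a final-state goal, given as a set of facts, induces a reward function that pays off exactly when the goal holds. This is a hedged sketch of that basic idea only; the fact strings are invented for illustration.

```python
# A symbolic final-state goal (a set of facts) automatically induces a reward
# function. Facts and names here are illustrative placeholders.
def make_goal_reward(goal_facts):
    goal = frozenset(goal_facts)
    def reward_fn(state_facts):
        # Reward 1.0 exactly when every goal fact holds in the current state.
        return 1.0 if goal <= set(state_facts) else 0.0
    return reward_fn

reward = make_goal_reward({"holding(key)", "door(open)"})
print(reward({"holding(key)"}))                # 0.0: goal not yet satisfied
print(reward({"holding(key)", "door(open)"}))  # 1.0: final-state goal reached
```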

Due to the widespread use of mobile and IoT devices, coupled with their continually expanding processing capabilities, dew computing environments have become a significant focus for researchers. These environments enable resource-constrained devices to contribute computing power to a local network. One major challenge within these environments revolves around task scheduling: specifically, determining the optimal distribution of jobs across the devices available in the network. This becomes particularly pronounced in dynamic environments where network conditions constantly change....

10.3390/app14083206 article EN cc-by Applied Sciences 2024-04-11

The knowledge representation community has built general-purpose ontologies which contain large amounts of commonsense knowledge over relevant aspects of the world, including useful visual information, e.g.: "a ball is used by a football player", "a tennis player is located at a tennis court". Current state-of-the-art approaches for visual recognition do not exploit these rule-based knowledge sources. Instead, they learn recognition models directly from training examples. In this paper, we study how general-purpose ontologies—specifically, MIT's ConceptNet...

10.24963/ijcai.2017/178 article EN 2017-07-28
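
As a rough illustration of how such rule-based knowledge could interact with a learned recognizer (this is our own toy re-scoring scheme, not the method of the paper), commonsense rules of the form "label X is supported by context Y" can boost a classifier's confidence when that context is detected.

```python
# Toy re-scoring of classifier confidences with commonsense rules, in the
# spirit of "a ball is used by a football player". The rule format, the bonus
# value, and all names are hypothetical, not the paper's method.
def rescore(scores, detected_context, rules, bonus=0.3):
    # scores: label -> classifier confidence; rules: (label, context) pairs.
    adjusted = dict(scores)
    for label, context in rules:
        if context in detected_context and label in adjusted:
            adjusted[label] += bonus        # boost labels supported by context
    total = sum(adjusted.values())
    return {k: v / total for k, v in adjusted.items()}

scores = {"ball": 0.40, "orange": 0.60}
rules = [("ball", "football player")]
print(rescore(scores, {"football player"}, rules))  # "ball" edges ahead
```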

Due to mobile and IoT devices’ ubiquity and their ever-growing processing potential, dew computing environments have been an emerging topic for researchers. These environments allow resource-constrained devices to contribute computing power to others in a local network. One major challenge in these environments is task scheduling: that is, how to distribute jobs across the devices available in the network. In this paper, we propose distributing jobs using artificial intelligence (AI). Specifically, we show how an AI agent, known as Proximal Policy Optimization (PPO), can learn to distribute jobs in a simulated...

10.3390/app12147137 article EN cc-by Applied Sciences 2022-07-15
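
To make the setup concrete, here is a minimal sketch under stated assumptions: a toy gymnasium environment that assigns each arriving job to one of several devices, trained with the PPO implementation from stable-baselines3. The dynamics, reward, and all names are simplified stand-ins, not the paper's simulator.

```python
# Hypothetical dew-computing scheduling environment trained with PPO.
# Assumes gymnasium and stable-baselines3 are installed.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO

class DewSchedulingEnv(gym.Env):
    """Assign each incoming job to one of n_devices; reward favours low load."""
    def __init__(self, n_devices=4, episode_len=50):
        super().__init__()
        self.n_devices, self.episode_len = n_devices, episode_len
        self.action_space = spaces.Discrete(n_devices)   # which device gets the job
        self.observation_space = spaces.Box(0.0, np.inf, (n_devices,), np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.load = np.zeros(self.n_devices, dtype=np.float32)  # pending work
        self.t = 0
        return self.load.copy(), {}

    def step(self, action):
        job_cost = self.np_random.uniform(0.5, 2.0)      # simulated job size
        self.load[action] += job_cost
        reward = -float(self.load.max())                 # penalize the bottleneck
        self.load = np.maximum(self.load - 1.0, 0.0)     # one unit processed/step
        self.t += 1
        return self.load.copy(), reward, self.t >= self.episode_len, False, {}

model = PPO("MlpPolicy", DewSchedulingEnv(), verbose=0)
model.learn(total_timesteps=10_000)
```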

Pluralistic alignment is concerned with ensuring that an AI system's objectives and behaviors are in harmony with the diversity of human values and perspectives. In this paper we study the notion of pluralistic alignment in the context of agentic AI, and in particular the case of an agent trying to learn a policy in a manner that is mindful of the perspectives of others in the environment. To this end, we show how being considerate of the future wellbeing and agency of other (human) agents can promote a form of pluralistic alignment.

10.48550/arxiv.2411.10613 preprint EN arXiv (Cornell University) 2024-11-15

Sequence classification is the task of predicting a class label given a sequence of observations. In many applications, such as healthcare monitoring or intrusion detection, early classification is crucial to prompt intervention. In this work, we learn classifiers that favour early classification from an evolving observation trace. While state-of-the-art classifiers are neural networks, and in particular LSTMs, ours take the form of finite state automata learned via discrete optimization. Our automata-based classifiers are interpretable---supporting explanation,...

10.1609/aaai.v35i11.17161 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2021-05-18
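
The automaton view of early classification admits a compact sketch: the machine consumes observations one at a time and commits to a label as soon as it reaches a decisive state, rather than waiting for the full trace. The class, the toy automaton, and its labels below are our own illustrations, not a learned model from the paper.

```python
# Automaton-based early classifier sketch: stop and emit a label as soon as
# a decisive state is reached. All names and the toy DFA are hypothetical.
class EarlyDFAClassifier:
    def __init__(self, transitions, state_labels, start=0):
        self.transitions = transitions      # (state, symbol) -> next state
        self.state_labels = state_labels    # state -> class label, or absent
        self.start = start

    def classify(self, trace):
        state = self.start
        for t, symbol in enumerate(trace):
            state = self.transitions.get((state, symbol), state)
            label = self.state_labels.get(state)
            if label is not None:           # decisive state: classify early
                return label, t + 1
        return None, len(trace)             # trace ended while undecided

# Toy automaton: emit "alert" once two consecutive "high" readings are seen.
dfa = EarlyDFAClassifier(
    transitions={(0, "high"): 1, (0, "low"): 0, (1, "high"): 2, (1, "low"): 0},
    state_labels={2: "alert"},
)
print(dfa.classify(["low", "high", "high", "low"]))  # ('alert', 3)
```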

Reward Machines provide an automata-inspired structure for specifying instructions, safety constraints, and other temporally extended reward-worthy behaviour. By exposing complex reward function structure, they enable counterfactual learning updates that have resulted in impressive sample efficiency gains. While Reward Machines have been employed in both tabular and deep RL settings, they have typically relied on a ground-truth interpretation of the domain-specific vocabulary that forms the building blocks of the reward function. Such interpretations...

10.48550/arxiv.2406.00120 preprint EN arXiv (Cornell University) 2024-05-31

Many real-world reinforcement learning (RL) problems necessitate learning complex, temporally extended behavior that may only receive a reward signal when the behavior is completed. If the reward-worthy behavior is known, it can be specified in terms of a non-Markovian reward function - a function that depends on aspects of the state-action history, rather than just the current state and action. Such reward functions yield sparse rewards, necessitating an inordinate number of experiences to find a policy that captures the reward-worthy pattern of behavior. Recent work has leveraged Knowledge...

10.48550/arxiv.2301.02952 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Human beings, even small children, quickly become adept at figuring out how to use applications on their mobile devices. Learning to use a new app is often achieved via trial-and-error, accelerated by the transfer of knowledge from past experiences with like apps. The prospect of building a smarter smartphone — one that can learn to achieve tasks using mobile apps — is tantalizing. In this paper we explore the use of Reinforcement Learning (RL) with the goal of advancing this aspiration. We introduce an RL-based framework for learning to accomplish tasks in mobile apps. RL...

10.21428/594757db.e57f0d1e article EN cc-by 2021-06-08

Deep reinforcement learning has shown promise in discrete domains requiring complex reasoning, including games such as Chess, Go, and Hanabi. However, this type of reasoning is less often observed in long-horizon, continuous domains with high-dimensional observations, where instead RL research has predominantly focused on problems with simple high-level structure (e.g., opening a drawer or moving a robot as fast as possible). Inspired by combinatorially hard optimization problems, we propose a set of robotics tasks which...

10.48550/arxiv.2206.01812 preprint EN cc-by arXiv (Cornell University) 2022-01-01

Natural and formal languages provide an effective mechanism for humans to specify instructions and reward functions. We investigate how to generate policies via RL when reward functions are specified in a symbolic language captured by Reward Machines, an increasingly popular automaton-inspired structure. We are interested in the case where the mapping of environment state to a symbolic (here, Reward Machine) vocabulary -- commonly known as the labelling function -- is uncertain from the perspective of the agent. We formulate the problem of policy learning in Reward Machines with...

10.48550/arxiv.2211.10902 preprint EN cc-by arXiv (Cornell University) 2022-01-01
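
One natural way to handle an uncertain labelling function, sketched here under our own assumptions (the paper's exact formulation may differ), is to maintain a Bayesian belief over Reward Machine states and update it each step with the detector's posterior over abstract labels.

```python
import numpy as np

# Hedged sketch: a Bayesian filter over Reward Machine states when the
# labelling function is noisy. The transition table and numbers below are
# illustrative assumptions, not taken from the paper.
def belief_update(belief, delta, label_posterior):
    # belief[u]: probability that the RM is currently in state u
    # delta[u][l]: successor RM state of u under abstract label l
    # label_posterior[l]: P(true label is l | noisy detector output this step)
    new_belief = np.zeros_like(belief)
    for u, b in enumerate(belief):
        for label, p in label_posterior.items():
            new_belief[delta[u][label]] += b * p
    return new_belief

# Two-state RM: seeing "goal" moves state 0 to state 1; state 1 is absorbing.
delta = {0: {"goal": 1, "none": 0}, 1: {"goal": 1, "none": 1}}
belief = np.array([1.0, 0.0])
# The event detector is 80% sure it just saw "goal":
belief = belief_update(belief, delta, {"goal": 0.8, "none": 0.2})
print(belief)  # [0.2 0.8]
```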

We address the problem of teaching a deep reinforcement learning (RL) agent to follow instructions in multi-task environments. Instructions are expressed in a well-known formal language -- linear temporal logic (LTL) -- and can specify a diversity of complex, temporally extended behaviours, including conditionals and alternative realizations. Our proposed approach exploits the compositional syntax and semantics of LTL, enabling our RL agent to learn task-conditioned policies that generalize to new instructions, not observed during...

10.48550/arxiv.2102.06858 preprint EN cc-by arXiv (Cornell University) 2021-01-01

The knowledge representation community has built general-purpose ontologies which contain large amounts of commonsense knowledge over relevant aspects of the world, including useful visual information, e.g.: "a ball is used by a football player", "a tennis player is located at a tennis court". Current state-of-the-art approaches for visual recognition do not exploit these rule-based knowledge sources. Instead, they learn recognition models directly from training examples. In this paper, we study how general-purpose ontologies---specifically, MIT's ConceptNet...

10.48550/arxiv.1705.08844 preprint EN other-oa arXiv (Cornell University) 2017-01-01

In Real-Time Heuristic Search (RTHS) we are given a search graph G and a heuristic, and the objective is to find a path from a start node to a goal node in G. As such, one does not impose any trajectory constraints on the path, besides reaching the goal. In this paper we consider a version of RTHS in which temporally extended goals can be defined over the form of the path. Such goals are specified in Linear Temporal Logic over Finite Traces (LTLf), an expressive language that has been considered in many other frameworks, such as Automated Planning, Synthesis,...

10.24963/ijcai.2022/663 article EN Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence 2022-07-01

Reinforcement Learning (RL) agents typically learn memoryless policies---policies that only consider the last observation when selecting actions. Learning memoryless policies is efficient and optimal in fully observable environments. However, some form of memory is necessary when RL agents are faced with partial observability. In this paper, we study a lightweight approach to tackle partial observability in RL. We provide the agent with an external memory and additional actions to control what, if anything, is written to memory. At every step, the current memory state is part...

10.48550/arxiv.2010.01753 preprint EN other-oa arXiv (Cornell University) 2020-01-01
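
The external-memory idea admits a very small sketch: a wrapper adds one writable memory bit to the observation and two extra actions that set it. The env interface (reset()/step() returning obs, reward, done) and all names below are our assumptions for illustration, not the paper's code.

```python
# Minimal memory-as-actions sketch: the agent observes (observation, memory bit)
# and gains two extra actions that write the bit. Interface names are assumed.
class ExternalMemoryWrapper:
    def __init__(self, env, n_base_actions):
        self.env = env
        self.n_base_actions = n_base_actions   # memory actions come after these
        self.memory = 0
        self._last_obs = None

    def reset(self):
        self.memory = 0
        self._last_obs = self.env.reset()
        return (self._last_obs, self.memory)   # memory is part of the observation

    def step(self, action):
        if action >= self.n_base_actions:      # actions n and n+1 write 0 or 1
            self.memory = action - self.n_base_actions
            return (self._last_obs, self.memory), 0.0, False
        obs, reward, done = self.env.step(action)   # ordinary environment action
        self._last_obs = obs
        return (obs, self.memory), reward, done
```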