ROBIO 2022

Skill-based Multi-objective Reinforcement Learning of Industrial Robot Tasks with Planning and Knowledge Integration

"Skill-based Multi-objective Reinforcement Learning of Industrial Robot Tasks with Planning and Knowledge Integration"
Matthias Mayr, Faseeh Ahmad, Konstantinos Chatzilygeroudis, Luigi Nardi and Volker Krueger
IEEE International Conference on Robotics and Biomimetics (ROBIO) 2022

Abstract:

In modern industrial settings with small batch sizes it should be easy to set up a robot system for a new task. Strategies exist, e.g. the use of skills, but when it comes to handling forces and torques, these systems often fall short. We introduce an approach that provides a combination of task-level planning with targeted learning of scenario-specific parameters for skill-based systems. We propose the following pipeline: (1) the user provides a task goal in the planning language PDDL, (2) a plan (i.e., a sequence of skills) is generated and the learnable parameters of the skills are automatically identified. An operator then chooses (3) reward functions and hyperparameters for the learning process. Two aspects of our methodology are critical: (a) learning is tightly integrated with a knowledge framework to support symbolic planning and to provide priors for learning, (b) using multi-objective optimization. This can help to balance key performance indicators (KPIs) such as safety and task performance since they can often affect each other. We adopt a multi-objective Bayesian optimization approach and learn entirely in simulation. We demonstrate the efficacy and versatility of our approach by learning skill parameters for two different contact-rich tasks. We show their successful execution on a real 7-DOF KUKA-iiwa manipulator and outperform the manual parameterization by human robot operators.
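As an illustration of why multi-objective optimization helps to balance KPIs, the sketch below filters a set of candidate parameterizations down to its Pareto front. It is a minimal, generic example and not the pipeline from the paper: the function names and the (task performance, safety) scores are hypothetical, and both objectives are assumed to be maximized.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is at least as good as b on every objective
    and strictly better on at least one (all objectives maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (task_performance, safety) scores of five skill parameterizations.
candidates = [(0.9, 0.2), (0.7, 0.7), (0.4, 0.9), (0.5, 0.5), (0.3, 0.3)]
front = pareto_front(candidates)
print(front)
```

The operator can then pick one trade-off from the front instead of committing to a fixed weighting of the KPIs before learning.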

This work was partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by Knut and Alice Wallenberg Foundation. This research was also supported in part by affiliate members and other supporters of the Stanford DAWN project: Ant Financial, Facebook, Google, InfoSys, Teradata, NEC, and VMware.

IROS 2021

Learning of Parameters in Behavior Trees for Movement Skills

"Learning of Parameters in Behavior Trees for Movement Skills"
Matthias Mayr, Konstantinos Chatzilygeroudis, Faseeh Ahmad, Luigi Nardi and Volker Krueger
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2021

Abstract:

Reinforcement Learning (RL) is a powerful mathematical framework that allows robots to learn complex skills by trial-and-error. Despite numerous successes in many applications, RL algorithms still require thousands of trials to converge to high-performing policies, can produce dangerous behaviors while learning, and the optimized policies (usually modeled as neural networks) give almost zero explanation when they fail to perform the task. For these reasons, the adoption of RL in industrial settings is not common. Behavior Trees (BTs), on the other hand, can provide a policy representation that a) supports modular and composable skills, b) allows for easy interpretation of the robot actions, and c) provides an advantageous low-dimensional parameter space. In this paper, we present a novel algorithm that can learn the parameters of a BT policy in simulation and then generalize to the physical robot without any additional training. We leverage a physical simulator with a digital twin of our workstation, and optimize the relevant parameters with a black-box optimizer. We showcase the efficacy of our method with a 7-DOF KUKA-iiwa manipulator in a task that includes obstacle avoidance and a contact-rich insertion (peg-in-hole), in which our method outperforms the baselines.
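The idea of optimizing a low-dimensional parameter vector with a black-box optimizer in simulation can be sketched as follows. This is only an illustration under stated assumptions: `simulate` is a hypothetical stand-in for a rollout of a parameterized BT policy in a physics simulator, and plain random search is used in place of whatever optimizer the paper employs.

```python
import random
from typing import List, Tuple


def simulate(params: Tuple[float, ...]) -> float:
    """Hypothetical stand-in for one simulated rollout of the policy.
    Returns a reward (higher is better); here a toy quadratic."""
    target = (0.3, -0.2)
    return -sum((p - t) ** 2 for p, t in zip(params, target))


def random_search(n_trials: int,
                  bounds: List[Tuple[float, float]],
                  seed: int = 0) -> Tuple[Tuple[float, ...], float]:
    """Black-box optimization: only reward values are used, no gradients."""
    rng = random.Random(seed)
    best_params, best_reward = None, float("-inf")
    for _ in range(n_trials):
        params = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        reward = simulate(params)
        if reward > best_reward:
            best_params, best_reward = params, reward
    return best_params, best_reward


bounds = [(-1.0, 1.0), (-1.0, 1.0)]
best_params, best_reward = random_search(200, bounds)
print(best_params, best_reward)
```

Because the optimizer never looks inside `simulate`, the same loop works for any policy representation whose parameters can be written as a bounded vector.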

This work was partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by Knut and Alice Wallenberg Foundation. This research was also supported in part by affiliate members and other supporters of the Stanford DAWN project: Ant Financial, Facebook, Google, InfoSys, Teradata, NEC, and VMware.

ICDL-EpiRob 2020

From human action understanding to robot action execution

"From human action understanding to robot action execution: how the physical properties of handled objects modulate non-verbal cues"
Nuno Ferreira Duarte, Konstantinos Chatzilygeroudis, José Santos-Victor, and Aude Billard
International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob) 2020

Abstract:

Humans manage to communicate action intentions in a non-verbal way, through body posture and movement. We start from this observation to investigate how a robot can decode a human's non-verbal cues during the manipulation of an object with specific physical properties, in order to learn the adequate level of "carefulness" to use when handling that object. We construct dynamical models of the human behaviour using a human-to-human handover dataset consisting of 3 different cups with different levels of fillings. We then include these models in the design of an online classifier that identifies the type of action based on the human wrist movement. We close the loop from action understanding to robot action execution with an adaptive and robust controller based on the learned classifier, and evaluate the entire pipeline on a collaborative task with a 7-DOF manipulator. Our results show that it is possible to correctly understand the "carefulness" behaviour of humans during object manipulation, even in the pick-and-place scenario, which was not part of the training set.

This work is supported by the CHIST-ERA programme through the project CORSMAL, under UK EPSRC grant EP/S031715/1 and Swiss NSF grant 20CH21180444; it also received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (SAHR project, ERC Advanced Grant, project ID:741945).

RA-L 2020

Benchmark for Bimanual Robotic Manipulation of Semi-deformable Objects

"Benchmark for Bimanual Robotic Manipulation of Semi-deformable Objects"
Konstantinos Chatzilygeroudis, Bernardo Fichera, Ilaria Lauzana, Fanjun Bu, Kunpeng Yao, Farshad Khadivar, and Aude Billard
Robotics and Automation Letters (RA-L): Special Issue on Benchmarking Protocols for Robotic Manipulation 2019

Abstract:

We propose a new benchmarking protocol to evaluate algorithms for bimanual robotic manipulation of semi-deformable objects. The benchmark is inspired by two real-world applications: (a) watchmaking craftsmanship, and (b) belt assembly in automobile engines. We provide two setups that try to highlight the following challenges: (a) manipulating objects via a tool, (b) placing irregularly shaped objects in the correct groove, (c) handling semi-deformable objects, and (d) bimanual coordination. We provide CAD drawings of the task pieces that can be easily 3D printed to ensure ease of reproduction, a detailed description of the tasks and protocol for successful reproduction, and meaningful metrics for comparison. We propose four categories of submission in an attempt to make the benchmark accessible to a wide range of related fields, spanning from adaptive control and motion planning to learning the tasks through trial and error.

This work received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (SAHR project, ERC Advanced Grant, project ID:741945).

Benchmark for Human-to-Robot Handovers of Unseen Containers with Unknown Filling

"Benchmark for Human-to-Robot Handovers of Unseen Containers with Unknown Filling"
Ricardo Sanchez-Matilla*, Konstantinos Chatzilygeroudis*, Apostolos Modas, Nuno Ferreira Duarte, Alessio Xompero, Pascal Frossard, Aude Billard, and Andrea Cavallaro
Robotics and Automation Letters (RA-L): Special Issue on Benchmarking Protocols for Robotic Manipulation 2019
*Equal contribution

Abstract:

The real-time estimation through vision of the physical properties of objects manipulated by humans is important to inform the control of robots for performing accurate and safe grasps of objects handed over by humans. However, estimating the 3D pose and dimensions of previously unseen objects using only inexpensive cameras is challenging due to illumination variations, transparencies, reflective surfaces, and occlusions caused both by the human and the robot. In this paper we present a benchmark for dynamic human-to-robot handovers that is based on an affordable experimental setup that does not use a motion capture system, markers, or prior knowledge of specific object models. The benchmark focuses on containers and specifically on plastic drinking cups with an unknown amount of unknown filling. The performance measures assess the overall system as well as its components in order to help isolate elements of the pipeline that need improvements. In addition to the task description and the performance measures, we also present and distribute as open source a baseline implementation for the overall task in order to enable comparisons and facilitate progress.

This work is supported by the CHIST-ERA programme through the project CORSMAL, under UK EPSRC grant EP/S031715/1 and Swiss NSF grant 20CH21180444; and the Research and Innovation programme ICT-2014-1, under grant agreement 643950-SecondHands.

CoRL 2018

Multi-objective Model-based Policy Search for Data-efficient Learning with Sparse Rewards

"Multi-objective Model-based Policy Search for Data-efficient Learning with Sparse Rewards"
Rituraj Kaushik, Konstantinos Chatzilygeroudis and Jean-Baptiste Mouret
Paper presented at the Conference on Robot Learning (CoRL) 2018

Abstract:

The most data-efficient algorithms for reinforcement learning in robotics are model-based policy search algorithms, which alternate between learning a dynamical model of the robot and optimizing a policy to maximize the expected return given the model and its uncertainties. However, the current algorithms lack an effective exploration strategy to deal with sparse or misleading reward scenarios: if they do not experience any state with a positive reward during the initial random exploration, they are very unlikely to solve the problem. Here, we propose a novel model-based policy search algorithm, Multi-DEX, that leverages a learned dynamical model to efficiently explore the task space and solve tasks with sparse rewards in a few episodes. To achieve this, we frame the policy search problem as a multi-objective, model-based policy optimization problem with three objectives: (1) generate maximally novel state trajectories, (2) maximize the expected return and (3) keep the system in state-space regions for which the model is as accurate as possible. We then optimize these objectives using a Pareto-based multi-objective optimization algorithm.

This work was supported by the ERC project “ResiBots” (grant agreement No 637972), funded by the European Research Council.

ICRA 2018

Using Parameterized Black-Box Priors to Scale Up Model-Based Policy Search for Robotics

"Using Parameterized Black-Box Priors to Scale Up Model-Based Policy Search for Robotics"
Konstantinos Chatzilygeroudis and Jean-Baptiste Mouret
Paper presented at the International Conference on Robotics and Automation (ICRA) 2018

Abstract:

The most data-efficient algorithms for reinforcement learning in robotics are model-based policy search algorithms, which alternate between learning a dynamical model of the robot and optimizing a policy to maximize the expected return given the model and its uncertainties. Among the few proposed approaches, the recently introduced Black-DROPS algorithm exploits a black-box optimization algorithm to achieve both high data-efficiency and good computation times when several cores are used; nevertheless, like all model-based policy search approaches, Black-DROPS does not scale to high dimensional state/action spaces. In this paper, we introduce a new model learning procedure in Black-DROPS that leverages parameterized black-box priors to (1) scale up to high-dimensional systems, and (2) be robust to large inaccuracies of the prior information. We demonstrate the effectiveness of our approach with the "pendubot" swing-up task in simulation and with a physical hexapod robot (48D state space, 18D action space) that has to walk forward as fast as possible. The results show that our new algorithm is more data-efficient than previous model-based policy search algorithms (with and without priors) and that it can allow a physical 6-legged robot to learn new gaits in only 16 to 30 seconds of interaction time.

This work was supported by the ERC project “ResiBots” (grant agreement No 637972), funded by the European Research Council.

Bayesian Optimization with Automatic Prior Selection for Data-Efficient Direct Policy Search

"Bayesian Optimization with Automatic Prior Selection for Data-Efficient Direct Policy Search"
Rémi Pautrat, Konstantinos Chatzilygeroudis and Jean-Baptiste Mouret
Paper presented at the International Conference on Robotics and Automation (ICRA) 2018. A short version of the paper was accepted at the non-archival track of the 1st Conference on Robot Learning (CoRL) 2017.

Abstract:

One of the most interesting features of Bayesian optimization for direct policy search is that it can leverage priors (e.g., from simulation or from previous tasks) to accelerate learning on a robot. In this paper, we are interested in situations for which several priors exist but we do not know in advance which one best fits the current situation. We tackle this problem by introducing a novel acquisition function, called Most Likely Expected Improvement (MLEI), that combines the likelihood of the priors and the expected improvement. We evaluate this new acquisition function on a transfer learning task for a 5-DOF planar arm and on a possibly damaged, 6-legged robot that has to learn to walk on flat ground and on stairs, with priors corresponding to different stairs and different kinds of damage. Our results show that MLEI effectively identifies and exploits the priors, even when there is no obvious match between the current situation and the priors.
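One plausible way to read "combines the likelihood of the priors and the expected improvement" is to score each prior by the product of the two and keep the best-scoring one. The sketch below does this for a single candidate point, using the standard closed-form expected improvement for a Gaussian prediction. The product form, the function names, and all numbers are assumptions made for illustration; the exact definition in the paper may differ.

```python
import math
from typing import Dict, Tuple


def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """Closed-form expected improvement over `best` (maximization)
    for a Gaussian prediction with mean `mu` and std `sigma`."""
    if sigma <= 0.0:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mu - best) * cdf + sigma * pdf


def most_likely_prior(priors: Dict[str, Tuple[float, float, float]],
                      best: float) -> Tuple[str, float]:
    """priors maps a name to (likelihood of the data under that prior,
    predicted mean, predicted std). Score = EI * likelihood (assumed form)."""
    scores = {name: expected_improvement(mu, sigma, best) * lik
              for name, (lik, mu, sigma) in priors.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Hypothetical priors: the "stairs" prior promises more reward,
# but the observations so far fit the "flat" prior much better.
priors = {"flat": (0.8, 0.5, 0.1), "stairs": (0.1, 0.9, 0.1)}
best_name, best_score = most_likely_prior(priors, best=0.4)
print(best_name, best_score)
```

Weighting by the likelihood keeps an optimistic but poorly matching prior from steering the search.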

This work was supported by the ERC project “ResiBots” (grant agreement No 637972), funded by the European Research Council.

RAS 2018

Reset-free Trial-and-Error Learning for Robot Damage Recovery

"Reset-free Trial-and-Error Learning for Robot Damage Recovery"
Konstantinos Chatzilygeroudis, Vassilis Vassiliades and Jean-Baptiste Mouret
Robotics and Autonomous Systems (RAS) 2018

Abstract:

The high probability of hardware failures prevents many advanced robots (e.g., legged robots) from being confidently deployed in real-world situations (e.g., post-disaster rescue). Instead of attempting to diagnose the failures, robots could adapt by trial-and-error in order to be able to complete their tasks. In this situation, damage recovery can be seen as a Reinforcement Learning (RL) problem. However, the best RL algorithms for robotics require the robot and the environment to be reset to an initial state after each episode, that is, the robot is not learning autonomously. In addition, most of the RL methods for robotics do not scale well with complex robots (e.g., walking robots) and either cannot be used at all or take too long to converge to a solution (e.g., hours of learning). In this paper, we introduce a novel learning algorithm called "Reset-free Trial-and-Error" (RTE) that (1) breaks the complexity by pre-generating hundreds of possible behaviors with a dynamics simulator of the intact robot, and (2) allows complex robots to quickly recover from damage while completing their tasks and taking the environment into account. We evaluate our algorithm on a simulated wheeled robot, a simulated six-legged robot, and a real six-legged walking robot that are damaged in several ways (e.g., a missing leg, a shortened leg, faulty motor, etc.) and whose objective is to reach a sequence of targets in an arena. Our experiments show that the robots can recover most of their locomotion abilities in an environment with obstacles, and without any human intervention.

This work was supported by the ERC project “ResiBots” (grant agreement No 637972), funded by the European Research Council.

IROS 2017

Black-Box Data-efficient Policy Search for Robotics

"Black-Box Data-efficient Policy Search for Robotics"
Konstantinos Chatzilygeroudis, Roberto Rama, Rituraj Kaushik, Dorian Goepp, Vassilis Vassiliades and Jean-Baptiste Mouret
Paper presented at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2017.

Abstract:

The most data-efficient algorithms for reinforcement learning (RL) in robotics are based on uncertain dynamical models: after each episode, they first learn a dynamical model of the robot, then they use an optimization algorithm to find a policy that maximizes the expected return given the model and its uncertainties. It is often believed that this optimization can be tractable only if analytical, gradient-based algorithms are used; however, these algorithms require using specific families of reward functions and policies, which greatly limits the flexibility of the overall approach. In this paper, we introduce a novel model-based RL algorithm, called Black-DROPS (Black-box Data-efficient RObot Policy Search), that: (1) does not impose any constraint on the reward function or the policy (they are treated as black boxes), (2) is as data-efficient as the state-of-the-art algorithm for data-efficient RL in robotics, and (3) is as fast as (or faster than) analytical approaches when several cores are available. The key idea is to replace the gradient-based optimization algorithm with a parallel, black-box algorithm that takes into account the model uncertainties. We demonstrate the performance of our new algorithm on two standard control benchmark problems (in simulation) and a low-cost robotic manipulator (with a real robot).

This work was supported by the ERC project “ResiBots” (grant agreement No 637972), funded by the European Research Council.

NIPS 2016 Bayesian Optimization Workshop

Safety-Aware Robot Damage Recovery Using Constrained Bayesian Optimization and Simulated Priors

"Safety-Aware Robot Damage Recovery Using Constrained Bayesian Optimization and Simulated Priors"
Vaios Papaspyros, Konstantinos Chatzilygeroudis, Vassilis Vassiliades and Jean-Baptiste Mouret
Paper presented at "Bayesian Optimization: Black-box Optimization and Beyond" (BayesOpt) workshop in NIPS 2016.

Abstract:

The recently introduced Intelligent Trial-and-Error (IT&E) algorithm showed that robots can adapt to damage in a matter of a few trials. The success of this algorithm relies on two components: prior knowledge acquired through simulation with an intact robot, and Bayesian optimization (BO) that operates on-line, on the damaged robot. While IT&E leads to fast damage recovery, it does not incorporate any safety constraints that prevent the robot from attempting harmful behaviors. In this work, we address this limitation by replacing the BO component with a constrained BO procedure. We evaluate our approach on a simulated damaged humanoid robot that needs to crawl as fast as possible, while performing as few unsafe trials as possible. We compare our new "safety-aware IT&E" algorithm to IT&E and a multi-objective version of IT&E in which the safety constraints are treated as separate objectives. Our results show that our algorithm outperforms the other approaches, both in crawling speed within the safe regions and in the number of unsafe trials.
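A common way to add a safety constraint to Bayesian optimization is to multiply the expected improvement by the predicted probability that the constraint holds. The sketch below shows that generic formulation; it is not claimed to be the exact procedure of the paper. The constraint model (a Gaussian prediction of a cost that must stay below `limit`) and all numbers are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """Closed-form expected improvement over `best` (maximization)."""
    if sigma <= 0.0:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mu - best) * normal_cdf(z) + sigma * pdf


def probability_safe(mu_c: float, sigma_c: float, limit: float) -> float:
    """Probability that a Gaussian-predicted cost stays at or below `limit`."""
    if sigma_c <= 0.0:
        return 1.0 if mu_c <= limit else 0.0
    return normal_cdf((limit - mu_c) / sigma_c)


def constrained_ei(mu, sigma, best, mu_c, sigma_c, limit):
    """Expected improvement weighted by the probability of being safe."""
    return expected_improvement(mu, sigma, best) * probability_safe(mu_c, sigma_c, limit)


# Hypothetical candidates: a fast gait predicted to be unsafe,
# and a slower gait predicted to be safe.
fast_but_risky = constrained_ei(1.0, 0.2, 0.5, mu_c=0.9, sigma_c=0.1, limit=0.5)
slow_but_safe = constrained_ei(0.7, 0.2, 0.5, mu_c=0.1, sigma_c=0.1, limit=0.5)
print(fast_but_risky, slow_but_safe)
```

With this weighting, the acquisition function prefers the slower gait because the faster one is almost certain to violate the safety limit.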

This video shows a damaged simulated iCub robot safely compensating for damage.

This work was supported by the ERC project “ResiBots” (grant agreement No 637972), funded by the European Research Council.

ICRA 2016 AILTA Workshop

Towards semi-episodic learning for robot damage recovery

"Towards semi-episodic learning for robot damage recovery"
Konstantinos Chatzilygeroudis, Antoine Cully and Jean-Baptiste Mouret
Paper presented (20 min talk) at "Artificial Intelligence for Long-Term Autonomy" (AILTA) workshop in ICRA 2016.

Abstract:

The recently introduced Intelligent Trial and Error algorithm (IT&E) enables robots to creatively adapt to damage in a matter of minutes by combining an off-line evolutionary algorithm and an on-line learning algorithm based on Bayesian Optimization. We extend the IT&E algorithm to allow for robots to learn to compensate for damages while executing their task(s). This leads to a semi-episodic learning scheme that increases the robot’s life-time autonomy and adaptivity. Preliminary experiments on a toy simulation and a 6-legged robot locomotion task show promising results.

This video shows a 6-legged robot performing locomotion tasks despite the left middle leg being removed using our technique.

This work was supported by the ERC project “ResiBots” (grant agreement No 637972), funded by the European Research Council.

Diploma Thesis

NAO Walking Simulation in Gazebo

Part of my diploma thesis: a NAO humanoid walking simulation in Gazebo using the NAOqi Simulator C++ SDK. More information is available in the GitHub repo.