GSoC’22 RoboComp project: Reinforcement Learning for pick and place operations
Created: 30th June 2022 · Updated: 8th September 2022
Objective
The aim of the project is to build an OpenAI Gym wrapper for the existing robotic arm model in CoppeliaSim. The Gym wrapper eases the process of training our agent, since the currently available library implementations of state-of-the-art deep RL algorithms require a custom environment to follow the gym.Env structure. A standard wrapper has been built so far; the environment supports both continuous and discrete action spaces.
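As a rough sketch, the wrapper follows the standard `gym.Env` interface. The helper names below (`_reset_simulation`, `_apply_action`, `_get_observation`, and so on) are hypothetical placeholders for the calls into CoppeliaSim, not the project's actual code:

```python
import gym
from gym import spaces
import numpy as np

class ArmGymEnv(gym.Env):
    """Sketch of a Gym wrapper around the CoppeliaSim arm (hypothetical helpers)."""

    def __init__(self, continuous=True):
        super().__init__()
        self.continuous = continuous
        # 29-dimensional continuous observation (see the state-space table below)
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(29,), dtype=np.float32)
        if continuous:
            # 5 actuators, each commanded in [-1, 1]
            self.action_space = spaces.Box(-1.0, 1.0, shape=(5,), dtype=np.float32)
        else:
            # each of the 5 actuators takes one of 3 values, mapped to {-1, 0, 1}
            self.action_space = spaces.MultiDiscrete([3] * 5)

    def reset(self):
        self._reset_simulation()        # hypothetical: restore the CoppeliaSim scene
        return self._get_observation()  # hypothetical: build the 29-dim state vector

    def step(self, action):
        self._apply_action(action)      # hypothetical: command the arm and gripper
        obs = self._get_observation()
        reward = self._compute_reward(obs)
        done = self._is_terminal(obs)
        return obs, reward, done, {}
```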
Environment Description
State Space
A 29-dimensional continuous state space is considered, comprising:
Info | Dimensions |
---|---|
Block pose: 3 position coords + 4 quaternion components | 7 |
Block velocity | 3 |
Block angular velocity | 3 |
Gripper tip position coords | 3 |
Relative position of block w.r.t. tip | 3 |
Grip force sensors (left & right) | 2 |
Finger force sensors (left & right) | 2 |
Relative position between left & right fingers | 3 |
Gripper velocity | 3 |
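For concreteness, one way to stack these entries into the 29-dimensional vector is sketched below; the argument names are hypothetical stand-ins for the corresponding CoppeliaSim sensor reads:

```python
import numpy as np

def build_observation(block_pose, block_vel, block_ang_vel, tip_pos,
                      grip_forces, finger_forces, finger_rel_pos, gripper_vel):
    """Stack the state-space table entries into one 29-dim float32 vector."""
    rel_block_pos = block_pose[:3] - tip_pos  # block position relative to the tip
    obs = np.concatenate([
        block_pose,       # 7: xyz position + quaternion
        block_vel,        # 3
        block_ang_vel,    # 3
        tip_pos,          # 3
        rel_block_pos,    # 3
        grip_forces,      # 2: left & right grip force sensors
        finger_forces,    # 2: left & right finger force sensors
        finger_rel_pos,   # 3: relative position between the fingers
        gripper_vel,      # 3
    ])
    assert obs.shape == (29,)
    return obs.astype(np.float32)
```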
Action space
A 5-dimensional action space is used, in either a discrete or a continuous setting.
Info | Discrete | Continuous |
---|---|---|
Move arm in x-direction | {-1,0,1} | [-1,1] |
Move arm in y-direction | {-1,0,1} | [-1,1] |
Move arm in z-direction | {-1,0,1} | [-1,1] |
Move wrist | {-1,0,1} | [-1,1] |
Open/Close the gripper | {-1,0,1} | [-1,1], but will be rounded off to {-1,0,1} |
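A small sketch of how one such action could be decoded, with the gripper component rounded as described in the last row (the mapping itself is an illustrative assumption):

```python
import numpy as np

def decode_action(action, continuous=True):
    """Split a 5-dim action into per-actuator commands (illustrative only)."""
    a = np.asarray(action, dtype=np.float32)
    if not continuous:
        a = a - 1.0              # map MultiDiscrete indices {0, 1, 2} to {-1, 0, 1}
    dx, dy, dz, wrist, grip = a
    grip = int(np.round(grip))   # gripper command is rounded off to {-1, 0, 1}
    return dx, dy, dz, wrist, grip
```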
Collision Detection
Collision detection is an important aspect of the environment, as it prevents the arm from crashing into the block, the table, and other objects. The force data from the left and right finger sensors is used: the magnitude of each sensed force is computed, and if it exceeds a certain threshold, a collision is detected. The threshold was fine-tuned from observations of various training episodes involving collisions.
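A minimal sketch of this check; the threshold value and the shape of the incoming force vectors are assumptions for illustration:

```python
import numpy as np

COLLISION_FORCE_THRESHOLD = 80.0  # hypothetical value, tuned from collision episodes

def collision_detected(left_finger_force, right_finger_force):
    """Flag a collision when either finger's force magnitude exceeds the threshold."""
    left_mag = np.linalg.norm(left_finger_force)
    right_mag = np.linalg.norm(right_finger_force)
    return max(left_mag, right_mag) > COLLISION_FORCE_THRESHOLD
```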
Grasp Detection
Similarly to collision detection, if the force magnitudes obtained from the gripper sensors exceed a certain fine-tuned threshold, a grasp is detected. In the training phase, this is a very useful signal to have in the reward function, where a certain reward is awarded for a successful grasp.
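An analogous sketch for grasp detection, together with one possible way to fold it into the reward; the threshold and the shaping term are illustrative assumptions, not the project's final reward:

```python
import numpy as np

GRASP_FORCE_THRESHOLD = 20.0  # hypothetical fine-tuned value

def grasp_detected(left_grip_force, right_grip_force):
    """Flag a grasp when both gripper force magnitudes exceed the threshold."""
    return (np.linalg.norm(left_grip_force) > GRASP_FORCE_THRESHOLD and
            np.linalg.norm(right_grip_force) > GRASP_FORCE_THRESHOLD)

def shaped_reward(rel_block_pos, grasped, grasp_bonus=1.0):
    """Example reward: negative block-to-tip distance plus a bonus on grasp."""
    reward = -np.linalg.norm(rel_block_pos)
    if grasped:
        reward += grasp_bonus  # hypothetical bonus for a successful grasp
    return reward
```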
Further steps
Goal Environment for goal-conditioning with HER
Since the task of pick and place is quite complex, we want to leverage the idea of goal-conditioning. With goal-conditioning, each episode can be treated as a success by relabelling the achieved terminal state as a virtual goal state. Hindsight Experience Replay (HER) is used to achieve goal-conditioning for our agent. In order to use HER, our environment needs to be modified into the gym.GoalEnv structure, where the observation space consists of the state, the achieved goal, and the desired goal, and the reward for each time step is computed from this structure. This goal environment will be created and tested.
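A minimal sketch of such a goal environment, assuming a Gym release that still ships `gym.GoalEnv` and taking the block's 3-D target position as the goal (both are assumptions):

```python
import gym
from gym import spaces
import numpy as np

class ArmGoalEnv(gym.GoalEnv):
    """Sketch of the goal-conditioned variant for HER (hypothetical names)."""

    def __init__(self):
        super().__init__()
        # GoalEnv observations are dicts of state, achieved goal and desired goal
        self.observation_space = spaces.Dict({
            "observation":   spaces.Box(-np.inf, np.inf, shape=(29,), dtype=np.float32),
            "achieved_goal": spaces.Box(-np.inf, np.inf, shape=(3,), dtype=np.float32),
            "desired_goal":  spaces.Box(-np.inf, np.inf, shape=(3,), dtype=np.float32),
        })
        self.action_space = spaces.Box(-1.0, 1.0, shape=(5,), dtype=np.float32)
        self.distance_threshold = 0.05  # hypothetical success tolerance in metres

    def compute_reward(self, achieved_goal, desired_goal, info):
        # Sparse reward: 0 when the achieved block position is within tolerance
        # of the desired one, -1 otherwise. HER relabels desired_goal in hindsight.
        d = np.linalg.norm(np.asarray(achieved_goal) - np.asarray(desired_goal), axis=-1)
        return -(d > self.distance_threshold).astype(np.float32)
```

With this dict structure and a vectorised `compute_reward`, off-the-shelf HER implementations (for example the one in stable-baselines3) can relabel stored transitions by substituting achieved goals for the desired ones.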
Vamsi Anumula