Gait Planning and Reinforcement Learning #3

Open
ekanshgupta92 opened this issue May 26, 2021 · 0 comments

ekanshgupta92 commented May 26, 2021

Gait planning

A fundamental requirement of legged robots is gait planning: the way each foot moves through a closed loop consisting of a stance phase (when the foot is in contact with the ground) and a swing phase (when the foot is off the ground). Quadrupeds use different gaits in different situations to optimise their energy consumption, mainly walk, trot, pace, and gallop. The difference between these gaits lies in the relative phase between the legs. Check this link to see them in action.
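For concreteness, here is a rough sketch of how those phase relationships are often written down: one offset per leg (FL, FR, RL, RR), expressed as a fraction of the gait cycle. These are idealized textbook-style values for illustration, not measurements from our bot:

```python
# Leg phase offsets (fraction of one gait cycle) that distinguish the gaits.
# FL/FR = front left/right, RL/RR = rear left/right.
# Idealized illustrative values; real gaits vary with speed and terrain.
GAIT_PHASE_OFFSETS = {
    "walk":   {"FL": 0.00, "FR": 0.50, "RL": 0.75, "RR": 0.25},  # four-beat, legs evenly staggered
    "trot":   {"FL": 0.00, "FR": 0.50, "RL": 0.50, "RR": 0.00},  # diagonal pairs move together
    "pace":   {"FL": 0.00, "FR": 0.50, "RL": 0.00, "RR": 0.50},  # lateral pairs move together
    "gallop": {"FL": 0.00, "FR": 0.10, "RL": 0.60, "RR": 0.50},  # asymmetric, front/rear pairs offset
}
```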

Gait algorithms and the need for RL

There are various methods for implementing the swing and stance phases, such as the spring-loaded inverted pendulum (SLIP), but we have decided to implement Bezier curves. However, this alone is not very stable on rough and uneven terrain, because the legs strictly follow a fixed loop. To counter this problem, we need the bot to adapt by adjusting the trajectory of its feet so that it stays stable. This is done using reinforcement learning (RL), a paradigm in which machines learn through trial and error from their own experience.

From a bird's-eye view, RL works as follows: there is an environment (a simulation, somewhat like a game) in which an agent (the player, a computer in this case) takes one action from a fixed set of actions. The agent is rewarded when the goal is achieved, and RL aims to maximise this reward by playing in the environment over many iterations called episodes. When an episode ends, or the player loses the game, a reset function is called, which restores the environment to its default state for a fresh episode.
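As a minimal sketch of the Bezier idea, the snippet below evaluates a cubic Bezier curve for one foot's swing phase in the leg's sagittal (x, z) plane. The control-point values are illustrative placeholders, not tuned numbers from our implementation:

```python
# Minimal sketch: cubic Bezier swing trajectory for one foot.
# Coordinates are (x, z) in the leg's sagittal plane; values are placeholders.
import numpy as np

def bezier_swing(t, p0, p1, p2, p3):
    """Point on a cubic Bezier curve at t in [0, 1] (start, 2 control points, end)."""
    t = np.clip(t, 0.0, 1.0)
    return ((1 - t)**3 * p0 + 3 * (1 - t)**2 * t * p1
            + 3 * (1 - t) * t**2 * p2 + t**3 * p3)

# Example swing: lift off behind the hip, touch down in front of it,
# with the two control points pulling the foot up through the swing apex.
p0 = np.array([-0.05, 0.00])   # lift-off position (metres)
p1 = np.array([-0.06, 0.06])   # shapes the early swing
p2 = np.array([ 0.06, 0.06])   # shapes the late swing
p3 = np.array([ 0.05, 0.00])   # touch-down position

foot_path = [bezier_swing(t, p0, p1, p2, p3) for t in np.linspace(0.0, 1.0, 50)]
```

An RL policy can then perturb the control points (or the touch-down position) at each step, which is exactly the kind of trajectory adjustment described above.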

A word of caution: don't confuse it with deep learning. The two might sound similar, but they differ: reinforcement learning learns from rewards and penalties obtained by taking exploratory steps in an environment, whereas deep learning learns by finding patterns in past data in order to make predictions on new data.

Libraries and resources

OpenAI's gym library comes in handy for creating custom environments in which to train our agent. However, the environment itself is simulated in PyBullet, so the team has decided to train the model there first and then export it to Gazebo and integrate it with the other systems. As for the RL agent, we think PPO2 or ARS would be a good choice, since our environment has a continuous action space and many other algorithms fail in that setting. According to the plan, we are currently training with both agents, and whichever proves more efficient will be used in the end. Building the environment from scratch would be a big task, so we are using the one created by Morisbots. We found this a useful guide for learning RL; another useful course was UAlberta's specialization. Besides that, to prepare your agents you need to learn either TensorFlow or PyTorch. It is better to have a basic idea of both, since online resources are limited and both are equally widely used.
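To make the setup concrete, here is a minimal sketch of a custom gym environment trained with PPO2 from stable-baselines. `QuadrupedEnv`, its observation/action sizes, and the reward are illustrative placeholders, not the Morisbots environment:

```python
# Minimal sketch: custom gym environment + PPO2 training (stable-baselines).
# QuadrupedEnv and all dimensions/rewards are placeholders for illustration.
import gym
import numpy as np
from gym import spaces
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv

class QuadrupedEnv(gym.Env):
    """Toy stand-in for the PyBullet quadruped environment."""

    def __init__(self):
        super().__init__()
        # Continuous action space, e.g. per-leg trajectory offsets.
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(8,), dtype=np.float32)
        # Observation, e.g. body orientation, joint angles, velocities.
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(24,), dtype=np.float32)
        self.steps = 0

    def reset(self):
        # Restore the default state for a fresh episode.
        self.steps = 0
        return np.zeros(24, dtype=np.float32)

    def step(self, action):
        self.steps += 1
        obs = np.zeros(24, dtype=np.float32)  # would come from the simulator
        reward = 1.0                          # e.g. forward progress minus energy cost
        done = self.steps >= 1000             # episode over -> reset() gets called
        return obs, reward, done, {}

env = DummyVecEnv([lambda: QuadrupedEnv()])  # PPO2 expects a vectorized env
model = PPO2('MlpPolicy', env, verbose=1)    # handles continuous action spaces
model.learn(total_timesteps=100000)
```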
