Overview

In this project, we implement Deep Deterministic Policy Gradient (DDPG) from scratch using only NumPy, without a deep learning framework such as TensorFlow.

DDPG

Key steps in DDPG

The key steps of DDPG and their corresponding code snippets are listed below. The DDPG algorithm is implemented in ddpg_numpy.py.

  1. Select action a_t according to the current policy and exploration noise:

a_t = actor.predict(np.reshape(s_t,(1,3)), ACTION_BOUND, target=False)+1./(1.+i+j)

  2. Execute action a_t, then observe the reward r_t and the new state s_{t+1}:

s_t_1, r_t, done, info = env.step(a_t[0])

  3. Create a replay buffer and sample a minibatch of transitions from it.
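
No snippet is listed for the replay buffer step, so here is a minimal sketch of what such a buffer could look like. The class name `ReplayBuffer`, its methods, and the capacity are hypothetical and are not taken from ddpg_numpy.py; only the variable names in the last line mirror the snippets in this README.

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Hypothetical fixed-size FIFO store of (s_t, a_t, r_t, s_{t+1}, done) transitions."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen silently drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, s_t, a_t, r_t, s_t_1, done):
        self.buffer.append((s_t, a_t, r_t, s_t_1, done))

    def sample(self, batch_size):
        # Sample without replacement; return fewer items while the buffer is filling.
        n = min(batch_size, len(self.buffer))
        batch = self.rng.sample(list(self.buffer), n)
        # Transpose the list of transitions into one array per field.
        return tuple(map(np.asarray, zip(*batch)))

    def __len__(self):
        return len(self.buffer)


buffer = ReplayBuffer(capacity=3, seed=0)
for k in range(5):
    buffer.add(np.full(3, float(k)), np.array([0.1 * k]), float(k),
               np.full(3, float(k + 1)), k == 4)

states_t, actions, rewards, states_t_1, dones = buffer.sample(batch_size=2)
```

With a capacity of 3, only the three most recent transitions remain available for sampling after five insertions.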

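  4. Set y_i according to the following equation, where $Q'$ and $\mu'$ are the target critic and target actor:

$$y_i = r_i + \gamma Q'(s_{i+1}, \mu'(s_{i+1}|\theta^{\mu'})|\theta^{Q'})$$

For terminal transitions, the code below uses $y_i = r_i$.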

y=np.zeros((len(batch), action_dim))
a_tgt=actor.predict(states_t_1, ACTION_BOUND, target=True)
Q_tgt = critic.predict(states_t_1, a_tgt,target=True)
for i in range(len(batch)):
    if dones[i]:
        y[i] = rewards[i]
    else:
        y[i] = rewards[i] + GAMMA*Q_tgt[i] 
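
The loop above can also be written without an explicit Python loop. The sketch below is one possible vectorized form; the helper name `td_targets`, the value of `GAMMA`, and the assumption that the target critic returns one Q-value per sample (shape `(N, 1)`) are mine, not taken from the repository.

```python
import numpy as np

GAMMA = 0.99  # assumed discount factor; the value used in ddpg_numpy.py is not shown here


def td_targets(rewards, dones, Q_tgt, gamma=GAMMA):
    """Return y_i = r_i + gamma * Q'_i, with the bootstrap term dropped when done_i is True."""
    rewards = np.asarray(rewards, dtype=float).reshape(-1, 1)
    not_done = 1.0 - np.asarray(dones, dtype=float).reshape(-1, 1)
    Q_tgt = np.asarray(Q_tgt, dtype=float).reshape(-1, 1)
    return rewards + gamma * not_done * Q_tgt


rewards = [1.0, 2.0, 3.0]
dones = [False, True, False]
Q_tgt = np.array([[10.0], [20.0], [30.0]])
y = td_targets(rewards, dones, Q_tgt)
```

Multiplying by `not_done` plays the role of the `if dones[i]` branch: the second transition is terminal, so its target is just its reward.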

  5. Update the critic network by minimizing the loss function:

$$L = \frac{1}{N}\sum_i \left(y_i - Q(s_i, a_i|\theta^Q)\right)^2$$

loss += critic.train(states_t, actions, y)
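
What `critic.train` computes internally is not shown here. Assuming it uses the mean squared error from the DDPG paper, the loss and the gradient that starts backpropagation could be sketched as follows; `critic_loss_and_grad` is a hypothetical helper, not a function from critic_net.py.

```python
import numpy as np


def critic_loss_and_grad(q_pred, y):
    """Mean squared error between predicted Q-values and targets, plus dL/dQ."""
    q_pred = np.asarray(q_pred, dtype=float)
    y = np.asarray(y, dtype=float)
    diff = q_pred - y
    loss = np.mean(diff ** 2)
    # Derivative of the mean of squares with respect to each prediction.
    dloss_dq = 2.0 * diff / diff.size
    return loss, dloss_dq


loss, dloss_dq = critic_loss_and_grad([[1.0], [3.0]], [[0.0], [1.0]])
```

The returned `dloss_dq` is what a hand-written backward pass would propagate through the critic's layers.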
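
  6. Update the actor policy using the sampled policy gradient:

$$\nabla_{\theta^\mu} J \approx \frac{1}{N}\sum_i \nabla_a Q(s, a|\theta^Q)\big|_{s=s_i,\, a=\mu(s_i)} \, \nabla_{\theta^\mu}\mu(s|\theta^\mu)\big|_{s=s_i}$$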

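which needs the gradient of the critic with respect to the action as input:

$$\nabla_a Q(s, a|\theta^Q)\big|_{s=s_i,\, a=\mu(s_i)}$$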

dQ_da = critic.evaluate_action_gradient(states_t,a_for_dQ_da)

which in turn relies on $a=\mu(s_i)$:

a_for_dQ_da=actor.predict(states_t, ACTION_BOUND, target=False)

Finally, the following code implements the actor policy update:

actor.train(states_t, dQ_da, ACTION_BOUND)
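
To illustrate how `dQ_da` drives the actor update through the chain rule, here is a toy sketch. It assumes a hypothetical one-layer actor `a = ACTION_BOUND * tanh(sW + b)` and a made-up critic gradient that prefers the action 1.0; the real network in actor_net.py is likely deeper, and how `actor.train` handles the sign of the gradient is not shown in this README.

```python
import numpy as np

ACTION_BOUND = 2.0  # assumed action range [-2, 2]


def actor_forward(params, states):
    """Hypothetical one-layer actor; returns the action and the tanh activation."""
    z = np.tanh(states @ params["W"] + params["b"])
    return ACTION_BOUND * z, z


def actor_ascent_step(params, states, dQ_da, lr=0.05):
    """Gradient ascent on Q: combine dQ/da with da/dtheta via the chain rule."""
    _, z = actor_forward(params, states)
    # dQ/d(pre-activation) = dQ/da * ACTION_BOUND * (1 - tanh^2)
    delta = dQ_da * ACTION_BOUND * (1.0 - z ** 2)
    params["W"] += lr * states.T @ delta / len(states)
    params["b"] += lr * delta.mean(axis=0)


def toy_dQ_da(actions, best_action=1.0):
    """Gradient of the toy critic Q(s, a) = -(a - best_action)^2 with respect to a."""
    return -2.0 * (actions - best_action)


rng = np.random.default_rng(0)
params = {"W": rng.normal(scale=0.1, size=(3, 1)), "b": np.zeros(1)}
states = rng.normal(size=(32, 3))

a_before, _ = actor_forward(params, states)
for _ in range(200):
    a_for_dQ_da, _ = actor_forward(params, states)
    actor_ascent_step(params, states, toy_dQ_da(a_for_dQ_da))
a_after, _ = actor_forward(params, states)
```

After the updates, the actor's outputs should sit closer to the action the toy critic prefers than they did at initialization.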

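  7. Update the target networks:

$$\theta^{Q'} \leftarrow \tau\theta^{Q} + (1-\tau)\theta^{Q'}, \qquad \theta^{\mu'} \leftarrow \tau\theta^{\mu} + (1-\tau)\theta^{\mu'}$$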

actor.train_target(TAU)
critic.train_target(TAU)
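
The soft target update from the DDPG paper blends each target parameter toward its online counterpart by a factor tau. The sketch below assumes parameters are stored in dictionaries of NumPy arrays; `soft_update` and this storage layout are illustrative and may differ from what `train_target` does internally.

```python
import numpy as np


def soft_update(target_params, online_params, tau):
    """In place: theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    for name in target_params:
        target_params[name] = tau * online_params[name] + (1.0 - tau) * target_params[name]


online = {"W": np.ones((2, 2)), "b": np.ones(2)}
target = {"W": np.zeros((2, 2)), "b": np.zeros(2)}
soft_update(target, online, tau=0.1)
```

A small tau makes the target networks change slowly, which keeps the targets y_i from shifting abruptly between critic updates.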

Actor (policy) Network

The actor network is implemented in actor_net.py.

Critic (value) Network

The critic network is implemented in critic_net.py. We follow the implementation described in the DDPG paper. The following sketch shows the architecture of the critic network.

[Figure: critic network architecture]
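
In the DDPG paper, the action is not fed into the critic until the second hidden layer. A forward pass with that structure could be sketched as below. The layer sizes (400 and 300, as in the paper), the initialization scales, and all parameter names are assumptions; critic_net.py may differ in these details.

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, action_dim, h1, h2 = 3, 1, 400, 300  # assumed dimensions

params = {
    "W1": rng.normal(scale=0.01, size=(state_dim, h1)), "b1": np.zeros(h1),
    "W2_s": rng.normal(scale=0.01, size=(h1, h2)),
    "W2_a": rng.normal(scale=0.01, size=(action_dim, h2)), "b2": np.zeros(h2),
    "W3": rng.normal(scale=0.003, size=(h2, 1)), "b3": np.zeros(1),
}


def critic_forward(states, actions, p):
    """Q(s, a): state-only first layer; the action joins at the second hidden layer."""
    hidden1 = np.maximum(0.0, states @ p["W1"] + p["b1"])  # ReLU
    hidden2 = np.maximum(0.0, hidden1 @ p["W2_s"] + actions @ p["W2_a"] + p["b2"])
    return hidden2 @ p["W3"] + p["b3"]  # linear output, one Q-value per sample


Q = critic_forward(rng.normal(size=(5, state_dim)),
                   rng.normal(size=(5, action_dim)), params)
```

Keeping the action path separate until the second layer also makes the action gradient easy to compute by hand, since the action only enters through `W2_a`.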

Results

[Figure: results]

Acknowledgement

While writing this code, I was informed and inspired by the coding practices, style, and techniques in the DDPG-Keras-Torcs GitHub repository (https://github.com/yanpanlau/DDPG-Keras-Torcs) and the CS231n assignment materials (http://cs231n.github.io/assignments2017/assignment2/).