-
Notifications
You must be signed in to change notification settings - Fork 0
Blog Post #3 (Planning)
We left off last week having a nearly completed implementation, with the exception of a dark cloud of gremlin-infestation and an urgent need to increase efficiency. In this, our final post, we solve all of these problems and even develop an MDP solver that performs faster than python's mdp toolbox!
Our first step to fixing our MDP implementation was mimicking mdptoolbox's roadmap structure, (num_actions, num_states, num_states) instead of our intial choice of the (num_states, num_states, num_actions) shape. We then verified that the P matrix and PS matrices were adding up to 1 along their rows (for the dot product with the value function to work). In the process, we noticed that these matrices would actually add up to 2.0 along some of the rows (yellow is probability 1, purple is probability 0).
It turns out that the roadmap was being built twice, once in the constructor of MDP and once manually in the simulation code. Additionally, the start state was actually on the vertical axis and the end state along the horizontal. Otherwise, the matrices seemed to make sense.
The final gremlin we discovered was another arbitrary decision that came back to bite us: our policy stored the actions as their actual values (LEFT:-1, FORWARD:0, RIGHT:1) instead of indices. Because we had first used the visualization code to examine the results of mdptoolbox, it was looking for the indices of the actions in the list of all actions. Once we standardized to using indices, suddenly, everything worked!
A useful simplification we made was generating states in a grid pattern rather than randomly. A grid gave us intuition as to what kind of transitions we should see from each state. On a grid, the moves from any given state should be forwards to the next grid point and left and right turns. Additionally, when we finally visualized a successful policy, it was very easy to verify that the actions the robot took were as expected.
100 states is not enough to develop a robust policy to cover a large space. The paper we read generated 20,000 states as good approximation; we started off with generating 5,000. Our MDP policy converged in 9 iterations, taking 1,032 seconds (~17 minutes). The green represents going forwards, red is left, magenta is turning right. The blue marker represents the simulated robot's position:
With a few tweaks and ignoring the provenance state when calculating transitions, we are sitting pretty at 40 seconds for 10,000 nodes. While certainly not real time, but enough to make the algorithm viable. From profiling, it seems like the improvement was not so much from querying a smaller kd-tree (which makes sense as a log(n) operation), but simply from making significantly fewer queries. We kept the orientation of the states sorted, allowing us to then find the best position/orientation combination with a binary search. If we were to make this a real time algorithm, we would want to look into ways of keeping most of the roadmap when the map updates, and only update the states and transitions where change occurs.
To be an effective planning algorithm, the policy iteration cannot take more time that it takes to build all of the possible transitions. We started with building 5,000 states in 17 minutes.
The first couple of changes we made regarded iteration and array indexing. We realized that building the PS matrix was simply taking a plane in the three-dimensional roadmap array, while building the P matrix was combining full rows from the roadmap into a 2D array. It turns out that numpy array native slicing is much faster than for loops; the time to build the P matrix went down from 5.259s to 0.021s for 1,000 states. Additionally, we replaced all calls to range() with xrange(), which would generate the next value instead of pre-generating the entire list to loop through.
The next time-hungry process was the solving of the value function. Since most entries in the matrices are 0s with a few high probability transitions, we transitioned to using sparse matrices. Most significantly, we realized that we were solving a homogenous system and could use the built-in spsolve function to solve Ax = b instead of taking the inverse of the matrix. This was a radical improvement. At first, we left in a calculation of the determinant to check whether the matrix was solveable. However, because Markov matrices should not run into this problem, we removed this statement, especially when we saw that at 5,000 states the determinant was taking an enormous amount of time compared to the rest of the calculations! Out of the total time to build the model and solve it, the MDP solver was taking only 20%. Compared to the 17 minutes it took to build 5,000 states in the beginning, we'd say 8 seconds is an improvement!
50,000 States | Profile |
---|---|
Using Determinant | |
MDPToolbox |
Lastly, we compared our algorithm to mdptoolbox's. Our target goal was to have around 10,000 states to cover a small map densely. The roadmap built in 70-76 seconds (we used a seed to always generate the same states). Our MDP solver finished in 25.5 s (20% of the total time), while mdptoolbox's policy solver completed in 183s (60% of the total time, which was 3 times as long).
10,000 States | Profile |
---|---|
Our implementation | |
MDPToolbox |
With a fairly optimized algorithm, we implemented the policy execution on a Neato, using the odometry as a positional estimate and initializing the odom->map transform by using the 2D pose estimate tool. The Neato only queried the policy for a new action if it detected that it had completed the action by turning or driving the given amount of radians or meters. In the simulation below, the robot's blue pose is updated only when it changes state in this way; the robot itself continues to execute the actions until it reaches an appropriate distance from its goal state.
RVIZ Simulation | Real-Time |
---|---|
The robot successfully executes the policy, matching what the simulation expects it to do. However, running the robot with a map of a specific area shows that there is much improvement on the localization and motion modeling side to do:
RVIZ Simulation | Real-Time |
---|---|
The map we got was reflected from the real entryway, and the Neato's odometry did not do a very good job approximating its travel distance. For a more robust robot implementation, we would need to improve our model for motion uncertainty in the planner or potentially tune the actions to distances that are more accurately detected by a positioning system. Additionally, integrating more sensors, overhead AR tag localization, or using the SLAM algorithm's localization output would output better results.