Abstract:
Agentstrainedwithdeepreinforcementlearningalgorithmsarecapableofperforming highly complex tasks including locomotion in continuous environments. In order to attain a human-level performance, the next step of research should be to investigate the ability to transfer the learning acquired in one task to unknown tasks. Concerns on generalization and overfitting in deep reinforcement learning are not usually addressed in current transfer learning research. This issue results in simplistic benchmarks and inaccurate algorithm comparisons due to rudimentary assessments. In this thesis, we propose novel regularization techniques exclusive to policy gradient algorithms for continuous control through the application of sample elimination and early stopping. By discarding samples that lead to overfitting via strict clipping we will generate robust policies for a humanoid with high generalization capacity. We also suggest the inclusion of training iteration to the hyperparameters in deep transfer learning problems. We recommend resorting to earlier snapshots of parameters depending on the target task due to the occurrence of overfitting to the source task. We demonstrate that a humanoid is capable of performing forward locomotion in unseen environments with different gravities and tangential frictions using strict clipping and early stopping. Furthermore, we evaluate our propositions on a delivery task where a humanoid is required to carry a heavy box while walking and inter-robot transfer tasks where the humanoid transfers its learning to taller and shorter robots. Because source task performance is not indicative of the generalization capacity of the algorithm we propose three different transfer learning evaluation methods. We increase the generalization capacity of a state-of-art adversarial algorithm by introducing entropy bonus, proposing different critic architectures and using simpler adversaries. Finally, we evaluate the robustness of these adversarial algorithms on morphologically modified hopper environments and environments with unknown gravities according to the criteria we proposed.