POWERED LANDING GUIDANCE ALGORITHMS USING REINFORCEMENT LEARNING METHODS FOR LUNAR LANDER CASE

Larasmoyo Nugroho; Novanna Rahma Zani; Nurul Qomariyah; Rini Akmeliawati; Rika Andiarti; Sastra Kusuma Wijaya

doi:10.30536/j.jtd.2021.v19.a3573

Authors

Larasmoyo Nugroho, Rocket Technology Center, National Institute of Aeronautics and Space (LAPAN), Indonesia
Novanna Rahma Zani, Electronics Department, Politeknik Elektronika Negeri Surabaya, Indonesia
Nurul Qomariyah, Electronics Department, Politeknik Elektronika Negeri Surabaya, Indonesia
Rini Akmeliawati, School of Mechanical Engineering, University of Adelaide, Australia
Rika Andiarti, Deputy of Aerospace Technology, National Institute of Aeronautics and Space (LAPAN), Indonesia
Sastra Kusuma Wijaya, Physics Department, Universitas Indonesia, Depok, Indonesia

DOI:

https://doi.org/10.30536/j.jtd.2021.v19.a3573

Keywords:

Planetary Landing, Lunar Lander, Q-Learning, DQN, DDQN, DDPG, PPO

Abstract

Future planetary landing missions, as demonstrated by the Perseverance Mars landing in 2021, require advanced guidance, navigation, and control algorithms for the powered landing phase so that the spacecraft can touch down on a designated target with pinpoint accuracy (circular error precision < 5 m radius). This requires a landing system capable of estimating the craft’s states and mapping them to thrust commands for each of the craft’s engines. Reinforcement learning is used here as an approach to learn this guidance mapping and translate it into engine thrust control commands. This work compares several reinforcement learning based approaches to the powered landing problem of a spacecraft in a two-dimensional (2-D) environment and identifies their advantages and disadvantages. Five reinforcement learning methods, namely Q-Learning, its extensions DQN and DDQN, and the policy optimization-based methods DDPG and PPO, are benchmarked in terms of rewards and the training time needed to land the Lunar Lander. The Q-Learning method is found to produce the highest efficiency. Another contribution of this paper is the use of different discount rates for terminal and shaping rewards, which significantly enhances optimization performance. We present simulation results that demonstrate the guidance and control system’s performance in a 2-D simulation environment, as well as its robustness to noise and system parameter uncertainty.
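The abstract does not spell out how the two discount rates are applied, so the following is only a minimal sketch of one plausible reading: the shaping and terminal rewards of an episode are kept as separate streams, each stream is discounted with its own rate, and the two discounted returns are summed per time step. The function names, the default values `gamma_shaping=0.90` and `gamma_terminal=0.995`, and the per-step reward lists are all assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def combined_returns(
    shaping_rewards: Sequence[float],
    terminal_rewards: Sequence[float],
    gamma_shaping: float = 0.90,    # hypothetical value, not from the paper
    gamma_terminal: float = 0.995,  # hypothetical value, not from the paper
) -> List[float]:
    """Discount each reward stream with its own rate, then sum per step."""
    if len(shaping_rewards) != len(terminal_rewards):
        raise ValueError("both reward streams must cover the same steps")
    g_shaping = discounted_returns(shaping_rewards, gamma_shaping)
    g_terminal = discounted_returns(terminal_rewards, gamma_terminal)
    return [s + t for s, t in zip(g_shaping, g_terminal)]


if __name__ == "__main__":
    # Three-step toy episode: small shaping reward each step,
    # one landing bonus on the final step.
    shaping = [1.0, 1.0, 1.0]
    terminal = [0.0, 0.0, 100.0]
    print(combined_returns(shaping, terminal))
```

Under this reading, a terminal discount rate close to 1 lets the landing outcome influence early time steps almost undiminished, while a smaller shaping discount rate keeps the dense shaping signal short-sighted.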

