Exploring the Training Robustness of Distributional Reinforcement Learning Against Noisy State Observations

  • Conference paper
Machine Learning and Knowledge Discovery in Databases: Research Track (ECML PKDD 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14173)

Abstract

In real scenarios, the state observations available to an agent may contain measurement errors or adversarial noise, misleading the agent into taking suboptimal actions or even causing training to collapse. In this paper, we study the training robustness of distributional Reinforcement Learning (RL), a class of state-of-the-art methods that estimate the whole distribution, as opposed to only the expectation, of the total return. First, we validate the contraction of distributional Bellman operators in the State-Noisy Markov Decision Process (SN-MDP), a typical tabular case that incorporates both random and adversarial state observation noise. In the noisy setting with function approximation, we then analyze the vulnerability of the least-squares loss in expectation-based RL with either linear or nonlinear function approximation. By contrast, we theoretically characterize the bounded gradient norm of the distributional RL loss based on the categorical parameterization equipped with the Kullback-Leibler (KL) divergence. The resulting stable gradients during optimization in distributional RL account for its better training robustness against state observation noise. Finally, extensive experiments on a suite of environments verify that distributional RL is less vulnerable to both random and adversarial noisy state observations than its expectation-based counterpart. (Code is available at https://github.com/datake/RobustDistRL. The extended version of the paper is at https://arxiv.org/abs/2109.08776.)
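The contrast drawn in the abstract between the least-squares loss and the categorical KL loss can be illustrated with a small numerical sketch. This is not the paper's theorem or code: it only shows the well-known fact that the gradient of a KL loss with respect to softmax logits equals the difference of two probability vectors (so its Euclidean norm is at most sqrt(2)), whereas the gradient of a squared loss grows linearly with the target error. The function names, the five-atom support, and the noise magnitudes are assumptions chosen for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_grad_wrt_logits(logits, target_probs):
    """Gradient of KL(target || softmax(logits)) w.r.t. the logits.

    For a target that sums to one this gradient is softmax(logits) - target.
    """
    probs = softmax(logits)
    return [p - q for p, q in zip(probs, target_probs)]


def squared_loss_grad(prediction, target):
    """Gradient of 0.5 * (prediction - target)**2 w.r.t. the prediction."""
    return prediction - target


def l2_norm(vec):
    return math.sqrt(sum(v * v for v in vec))


# Hypothetical setup: 5 atoms, arbitrary logits (illustrative values only).
logits = [2.0, 0.5, -1.0, 0.0, 1.0]

# Squared loss: a noisier scalar target yields a proportionally larger gradient.
for noise in (1.0, 10.0, 100.0):
    g = squared_loss_grad(1.0, 1.0 + noise)
    print("squared loss, target noise", noise, "-> |grad| =", abs(g))

# Categorical KL loss: however the target mass is moved across atoms,
# the gradient w.r.t. the logits stays within the sqrt(2) bound.
for atom in range(len(logits)):
    target = [1.0 if i == atom else 0.0 for i in range(len(logits))]
    g = kl_grad_wrt_logits(logits, target)
    print("KL loss, target mass on atom", atom, "-> ||grad|| =", l2_norm(g))
```

The paper's analysis concerns gradients with respect to the parameters of the function approximator under noisy state observations; the sketch above covers only the logit-level step of that chain and should be read as intuition rather than as a reproduction of the result.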



Acknowledgements

We would like to thank the anonymous reviewers for their valuable feedback on the paper. Dr. Kong was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC), the University of Alberta/Huawei Joint Innovation Collaboration, Huawei Technologies Canada Co., Ltd., the Canada Research Chair in Statistical Learning, the Alberta Machine Intelligence Institute (Amii), and the Canada CIFAR AI Chair (CCAI). Yingnan Zhao and Ke Sun were supported by the State Scholarship Fund from the China Scholarship Council (No: 202006120405 and No: 202006010082).

Author information

Corresponding author

Correspondence to Linglong Kong.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Sun, K., Zhao, Y., Jui, S., Kong, L. (2023). Exploring the Training Robustness of Distributional Reinforcement Learning Against Noisy State Observations. In: Koutra, D., Plant, C., Gomez Rodriguez, M., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Research Track. ECML PKDD 2023. Lecture Notes in Computer Science, vol 14173. Springer, Cham. https://doi.org/10.1007/978-3-031-43424-2_3

  • DOI: https://doi.org/10.1007/978-3-031-43424-2_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-43423-5

  • Online ISBN: 978-3-031-43424-2

  • eBook Packages: Computer Science (R0)
