An algorithm for learning a Markov decision process policy, used in reinforcement learning. 27.07.2023 17:54 aior
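The entry does not name the algorithm. One well-known algorithm that fits the description is tabular SARSA (state, action, reward, state, action), so the sketch below assumes that reading. The five-state chain environment, the `step` function, and all hyperparameter values are hypothetical, chosen only to keep the example small.

```python
import random

# Hypothetical toy MDP (not from the entry): states 0..4 on a line,
# state 4 is terminal, and the agent starts each episode in state 0.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic transition; reward 1.0 only when the terminal state is reached."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def epsilon_greedy(q, state, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one (ties broken randomly)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def sarsa(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular SARSA: on-policy temporal-difference learning of action values."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        action = epsilon_greedy(q, state, epsilon, rng)
        done = False
        while not done:
            nxt, reward, done = step(state, action)
            nxt_action = epsilon_greedy(q, nxt, epsilon, rng)
            # The target uses the action actually chosen in the next state,
            # which is what makes the update on-policy.
            target = reward if done else reward + gamma * q[nxt][nxt_action]
            q[state][action] += alpha * (target - q[state][action])
            state, action = nxt, nxt_action
    return q


def greedy_policy(q):
    """Greedy action for each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

Calling `greedy_policy(sarsa())` returns one action per non-terminal state. On this chain the learned policy is expected to prefer moving right, toward the rewarding terminal state.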