An algorithm for learning a policy for a Markov decision process, that is, a rule for choosing a suitable action to take in a given state. 27.07.2023 17:54 aior
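
The definition above does not name a specific method. As an illustration only, here is a minimal sketch that assumes tabular Q-learning as one such algorithm. The four-state chain environment, the names `step`, `q_learning`, and `greedy_policy`, and all hyperparameter values are hypothetical and chosen for the example.

```python
import random

# Hypothetical toy MDP (not from the source): states 0..3 on a line.
# State 3 is terminal; entering it yields reward 1, every other move yields 0.
N_STATES = 4
ACTIONS = (-1, +1)  # move left, move right


def step(state, action):
    """Return (next_state, reward, done) for the toy chain."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table q[state][action_index] with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Explore: pick a random action.
                a = rng.randrange(len(ACTIONS))
            else:
                # Exploit: pick a best-valued action, breaking ties randomly.
                best = max(q[state])
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: move the estimate toward
            # reward + discounted value of the best next action.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Map each non-terminal state to the action with the highest learned value."""
    return [
        ACTIONS[max(range(len(ACTIONS)), key=lambda i: q[s][i])]
        for s in range(N_STATES - 1)
    ]


if __name__ == "__main__":
    print(greedy_policy(q_learning()))
```

The learned policy is read off the table by taking, in each state, the action with the largest value. In this toy chain that should be "move right" in every non-terminal state, since the only reward lies at the right end.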