A class of reinforcement learning methods that maintain separate parameterizations for the policy and for the value function.
27.07.2023 17:54 aior
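Methods matching this description are commonly known as actor-critic methods. The following is a minimal sketch of the idea, not a reference implementation. It assumes a hypothetical one-state task with two actions, a softmax policy with its own parameters `theta`, and a separately stored value estimate `w`. The task, the function names, and the step sizes are illustrative assumptions and do not come from the definition above.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps=5000, alpha_policy=0.1, alpha_value=0.1, seed=0):
    """Run a one-step update loop with separate policy and value parameters."""
    rng = random.Random(seed)
    # Hypothetical one-state task: action 0 pays 0.0, action 1 pays 1.0.
    rewards = [0.0, 1.0]

    theta = [0.0, 0.0]  # policy parameters (one preference per action)
    w = 0.0             # value-function parameter (estimated value of the state)

    for _ in range(steps):
        probs = softmax(theta)
        action = 0 if rng.random() < probs[0] else 1
        reward = rewards[action]

        # Each episode ends after one step, so the TD error is reward - w.
        td_error = reward - w

        # Value update: move the estimate toward the observed reward.
        w += alpha_value * td_error

        # Policy update: the gradient of log softmax with respect to the
        # preference of action a is one_hot(action)[a] - probs[a].
        for a in range(len(theta)):
            grad_log_prob = (1.0 if a == action else 0.0) - probs[a]
            theta[a] += alpha_policy * td_error * grad_log_prob

    return theta, w


if __name__ == "__main__":
    theta, w = train()
    print("action probabilities:", softmax(theta))
    print("value estimate:", w)
```

The two parameter sets are updated by different rules: `w` is fitted to the observed rewards, while `theta` is adjusted using the error signal computed from `w`. Sharing no parameters between the two is the simplest arrangement that fits the definition.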