A (Long) Peek into Reinforcement Learning
[Updated on 2020-09-03: Updated the algorithm of SARSA and Q-learning so that the difference is more pronounced.]
[Updated on 2021-09-19: Thanks to 爱吃猫的鱼, we have this post in Chinese.]
(Image source: David Silver's RL course lecture 4: "Model-Free Prediction")

All the methods we have introduced above aim to learn the state/action value function and then to select actions accordingly.

Evolution strategies (ES), as a black-box optimization algorithm, is another approach to RL problems. (In my original writing, I used the phrase “a nice alternative”; Seita pointed me to this discussion and thus I updated my wording.)

Many architectures using deep learning models were proposed to resolve the problem, including DQN, which stabilizes training with experience replay and an occasionally frozen target network.
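To make the two stabilizing ideas attributed to DQN more concrete, here is a minimal sketch of experience replay and an occasionally frozen target. It is not the DQN implementation: it assumes a tabular Q-function stored in a dictionary instead of a neural network, and the names `ReplayBuffer`, `td_target`, `train_step`, and `sync_target` are hypothetical, chosen only for illustration.

```python
import random
from collections import defaultdict, deque


class ReplayBuffer:
    """Fixed-size store of (s, a, r, s_next, done) transitions (hypothetical helper)."""

    def __init__(self, capacity):
        # A bounded deque drops the oldest transition once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the temporal correlation between consecutive steps.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_target(q_frozen, r, s_next, done, gamma, n_actions):
    """Q-learning target computed from the frozen copy, not the live table."""
    if done:
        return r
    return r + gamma * max(q_frozen[(s_next, a)] for a in range(n_actions))


def train_step(q, q_frozen, buffer, batch_size, alpha, gamma, n_actions):
    """Update the live table q on a random minibatch of stored transitions."""
    for s, a, r, s_next, done in buffer.sample(batch_size):
        y = td_target(q_frozen, r, s_next, done, gamma, n_actions)
        q[(s, a)] += alpha * (y - q[(s, a)])


def sync_target(q, q_frozen):
    """Copy the live values into the frozen table; call this only occasionally."""
    q_frozen.clear()
    q_frozen.update(q)
```

In a training loop under these assumptions, one would `push` every observed transition, call `train_step` once the buffer holds at least `batch_size` entries, and call `sync_target` every fixed number of steps so that the targets stay stationary in between.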