Reinforcement Learning – A Reference

...

Solution: Use importance sampling to correct for the difference between behaviour and target policy -> Off-policy Monte Carlo

Problem: It's hard to pick the right value of n.
Solution: Calculate n-step returns for multiple n's and combine them -> TD(λ)

Solution: Compare returns to a reference value (baseline) to better judge if the outcome was actually above average from that state.
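
The three ideas above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the reference article's code: all function names are hypothetical, the importance-sampling variant shown is the ordinary (unweighted) per-episode estimator, `values` is assumed to hold one state-value estimate per time step plus a trailing 0.0 for the terminal state, and the TD(λ) part uses the forward-view λ-return on a finished episode.

```python
def importance_weighted_return(rewards, target_probs, behaviour_probs, gamma):
    """Off-policy Monte Carlo (ordinary importance sampling, assumed variant).

    target_probs[k] and behaviour_probs[k] are the probabilities that the
    target and behaviour policies assign to the action actually taken at step k.
    """
    rho = 1.0  # product of per-step probability ratios
    g = 0.0    # discounted return observed under the behaviour policy
    for k, (r, p, b) in enumerate(zip(rewards, target_probs, behaviour_probs)):
        rho *= p / b
        g += gamma ** k * r
    return rho * g


def n_step_return(rewards, values, t, n, gamma):
    """Up to n discounted rewards from step t, then bootstrap from values."""
    end = min(t + n, len(rewards))  # truncate at the end of the episode
    g = sum(gamma ** (k - t) * rewards[k] for k in range(t, end))
    return g + gamma ** (end - t) * values[end]


def lambda_return(rewards, values, t, lam, gamma):
    """Forward-view lambda-return: geometric mix of all n-step returns."""
    horizon = len(rewards) - t  # steps remaining until the episode ends
    total = 0.0
    for n in range(1, horizon):
        weight = (1 - lam) * lam ** (n - 1)
        total += weight * n_step_return(rewards, values, t, n, gamma)
    # The remaining weight goes to the full Monte Carlo return.
    total += lam ** (horizon - 1) * n_step_return(rewards, values, t, horizon, gamma)
    return total


def advantage(observed_return, baseline):
    """Positive when the outcome beat the reference value for that state."""
    return observed_return - baseline


# Hypothetical three-step episode, made up for illustration.
rewards = [1.0, 0.0, 2.0]
values = [0.5, 1.0, 1.5, 0.0]  # values[3] is the terminal state
gamma = 0.9

one_step = lambda_return(rewards, values, 0, 0.0, gamma)     # lam = 0: TD(0) target
monte_carlo = lambda_return(rewards, values, 0, 1.0, gamma)  # lam = 1: full return
```

Setting `lam` to 0 recovers the one-step TD target and setting it to 1 recovers the Monte Carlo return, which is why λ acts as a dial between the two instead of a single hand-picked n.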
