Reinforcement Learning – A Reference


...

Solution: Use importance sampling to correct for the difference between behaviour and target policy -> Off-policy Monte Carlo

Problem: It's hard to pick the right value of n.
Solution: Calculate n-step returns for multiple n's and combine them -> TD(λ)

Solution: Compare returns to a reference value (baseline) to better judge if the outcome was actually above average from that state.
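The three ideas in this excerpt can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the referenced article: the function names, the list-based episode format, and the convention that the terminal state has value 0 are all assumptions. It uses ordinary (unweighted) importance sampling and the forward-view λ-return, which is one standard way of combining n-step returns.

```python
from typing import List, Sequence


def importance_sampled_return(rewards: Sequence[float],
                              target_probs: Sequence[float],
                              behaviour_probs: Sequence[float],
                              gamma: float = 1.0) -> float:
    """Ordinary importance-sampling estimate for one episode (hypothetical helper).

    target_probs[k] and behaviour_probs[k] are the probabilities that the
    target and behaviour policies assign to the action taken at step k.
    """
    rho = 1.0        # product of per-step probability ratios
    g = 0.0          # discounted return observed under the behaviour policy
    discount = 1.0
    for r, pi, b in zip(rewards, target_probs, behaviour_probs):
        rho *= pi / b
        g += discount * r
        discount *= gamma
    return rho * g


def n_step_return(rewards: Sequence[float], values: Sequence[float],
                  n: int, gamma: float) -> float:
    """n-step return from the first state of the episode.

    values[k] is the value estimate of the state reached after k steps;
    the terminal state is assumed to have value 0.
    """
    T = len(rewards)
    n = min(n, T)
    g = sum(gamma ** k * rewards[k] for k in range(n))
    if n < T:  # bootstrap only if the episode has not ended yet
        g += gamma ** n * values[n]
    return g


def lambda_return(rewards: Sequence[float], values: Sequence[float],
                  lam: float, gamma: float) -> float:
    """Forward-view lambda-return: geometric mix of all n-step returns."""
    T = len(rewards)
    total = 0.0
    for n in range(1, T):
        total += (1 - lam) * lam ** (n - 1) * n_step_return(rewards, values, n, gamma)
    # The remaining weight goes to the full Monte Carlo return.
    total += lam ** (T - 1) * n_step_return(rewards, values, T, gamma)
    return total


def advantages(returns: Sequence[float], baselines: Sequence[float]) -> List[float]:
    """Return minus baseline: positive means better than expected from that state."""
    return [g - b for g, b in zip(returns, baselines)]
```

With λ = 0 the λ-return reduces to the one-step TD target, and with λ = 1 it reduces to the Monte Carlo return, so a single parameter replaces the choice of n.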
