Online Off-policy Prediction
This paper investigates online prediction learning, where learning proceeds continuously as the agent interacts with an environment. The agent's predictions are contingent on a particular way of behaving, represented as a value function. However, the behavior used to select actions and generate the data may differ from the one used to define the predictions, and thus the samples are generated off-policy. The ability to learn behavior-contingent predictions online and off-policy has long been advocated as a key capability of predictive-knowledge learning systems, but has remained an open algorithmic challenge for decades. The issue lies with the temporal difference (TD) learning update at the heart of most prediction algorithms: combining bootstrapping, off-policy sampling, and function approximation may cause the value estimate to diverge. A breakthrough came with the development of a new objective function that admitted stochastic gradient descent variants of TD. Since then, many sound online off-policy prediction algorithms have been developed, but there has been limited empirical work investigating the relative merits of all the variants. This paper aims to fill these empirical gaps and provide clarity on the key ideas behind each method. We summarize the large body of literature on off-policy learning, focusing on (1) methods that use computation linear in the number of features and are convergent under off-policy sampling, and (2) other methods that have proven useful with non-fixed, nonlinear function approximation. We provide an empirical study of off-policy prediction methods in two challenging microworlds. We report each method's parameter sensitivity, empirical convergence rate, and final performance, providing new insights that should enable practitioners to successfully extend these new methods to large-scale applications. [Abridged abstract]
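The abstract contrasts plain off-policy TD, which can diverge under function approximation, with gradient-TD variants that remain convergent. The sketch below illustrates one such update, TDC (a gradient-TD method), with linear features and importance-sampling ratios; the feature vectors, step sizes, and toy transition are illustrative assumptions and not the paper's experimental setup.

```python
import numpy as np

def offpolicy_tdc_update(w, h, x, r, x_next, rho, gamma=0.9,
                         alpha=0.01, beta=0.005):
    """One TDC update with linear features (illustrative sketch).

    w      -- value-function weight vector
    h      -- auxiliary weight vector used by the gradient correction
    x      -- feature vector of the current state
    r      -- reward on the transition
    x_next -- feature vector of the next state
    rho    -- importance-sampling ratio pi(a|s) / b(a|s)
    """
    delta = r + gamma * w @ x_next - w @ x              # TD error
    # Main update: TD step plus the gradient-correction term.
    w = w + alpha * rho * (delta * x - gamma * (h @ x) * x_next)
    # Auxiliary weights track an estimate of the expected TD error.
    h = h + beta * rho * (delta - h @ x) * x
    return w, h

# Toy usage: a single transition with 3-dimensional features.
w = np.zeros(3)
h = np.zeros(3)
x = np.array([1.0, 0.0, 0.0])
x_next = np.array([0.0, 1.0, 0.0])
w, h = offpolicy_tdc_update(w, h, x, r=1.0, x_next=x_next, rho=2.0)
```

Setting `rho = 1` everywhere recovers the on-policy case; dropping the correction term (the part involving `h`) recovers plain off-policy TD(0), the update whose divergence motivates the methods surveyed in the paper.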
Authors


Sina Ghiassian
Andrew Patterson
Martha White
Richard S. Sutton
Adam White
Other
Inserted: 11/07/18 06:05PM
Words Total: 27,013
Words Unique: 4,017
Tweets
arxiv_org: Online Off-policy Prediction. https://t.co/hGo8WpuUC6 https://t.co/Fc6oAGgTot
arxivml: "Online Off-policy Prediction", Sina Ghiassian, Andrew Patterson, Martha White, Richard S. Sutton, Adam White https://t.co/Uo36z3xEf5
BrundageBot: Online Off-policy Prediction. Sina Ghiassian, Andrew Patterson, Martha White, Richard S. Sutton, and Adam White https://t.co/BugXPYu38O
Miles_Brundage: "Online Off-policy Prediction," Ghiassian et al.: https://t.co/ynHjteGJws