An Online Prediction Algorithm for Reinforcement Learning with Linear Function Approximation using Cross Entropy Method
In this paper, we provide two new stable online algorithms for the problem of prediction in reinforcement learning, \emph{i.e.}, estimating the value function of a model-free Markov reward process using the linear function approximation architecture and with memory and computation costs scaling quadratically in the size of the feature set. The algorithms employ the multi-timescale stochastic approximation variant of the very popular cross entropy (CE) optimization method which is a model based search method to find the global optimum of a real-valued function. A proof of convergence of the algorithms using the ODE method is provided. We supplement our theoretical results with experimental comparisons. The algorithms achieve good performance fairly consistently on many RL benchmark problems with regards to computational efficiency, accuracy and stability.
NurtureToken New!

Token crowdsale for this paper ends in

Buy Nurture Tokens

Authors

Are you an author of this paper? Check the Twitter handle we have for you is correct.

Ajin George Joseph (add twitter)
Shalabh Bhatnagar (add twitter)
Ask The Authors

Ask the authors of this paper a question or leave a comment.

Read it. Rate it.
#1. Which part of the paper did you read?

#2. The paper contains new data or analyses that is openly accessible?
#3. The conclusion is supported by the data and analyses?
#4. The conclusion is of scientific interest?
#5. The result is likely to lead to future research?

Github
User:
None (add)
Repo:
None (add)
Stargazers:
0
Forks:
0
Open Issues:
0
Network:
0
Subscribers:
0
Language:
None
Youtube
Link:
None (add)
Views:
0
Likes:
0
Dislikes:
0
Favorites:
0
Comments:
0
Other
Sample Sizes (N=):
Inserted:
Words Total:
Words Unique:
Source:
Abstract:
None
06/23/18 05:05PM
20,946
4,024
Tweets
Nobody has tweeted about this paper.
Images
Related