experience.log: offline

1. Fern, A., Batch RL Via LSPI (slides)
2. Elkan, C. (2012) Least squares policy iteration (LSPI) (article)
3. Busoniu, L., Lazaric, A., Ghavamzadeh, M., Munos, R., Babuska, R., and De Schutter, B. (2011) Least-squares methods for policy iteration. Reinforcement Learning: State of the Art. Springer. (article)
4. Lagoudakis, M. G., & Parr, R. (2003) LSPI (article, code)
5. Lagoudakis, M. G., & Parr, R. (2001) Model-Free LSPI (proceedings)

Lazaric, A., Ghavamzadeh, M., & Munos, R. (2012) Finite-sample analysis of least-squares policy iteration. Journal of Machine Learning Research, 13:3041-3074.
Xu, X., Hu, D., & Lu, X. (2007) Kernel-Based LSPI for RL (article)
Ma, J., & Powell, W. B. (2009) Convergence Proofs of LSPI Algorithm for High-Dimensional Infinite Horizon Markov Decision Process Problems (article)
Thiery, C., & Scherrer, B. (2010) Least-Squares lambda Policy Iteration: Bias-Variance Trade-off in Control Problems (article)
