The PPO method with count-based exploration
Chapter 22: Beyond Model-Free – Imagination
The bilingual evaluation understudy (BLEU) score
Chapter 19: Trust Regions – PPO, TRPO, ACKTR, and SAC
The theoretical background of the cross-entropy method
Chapter 5: Tabular Learning and the Bellman Equation
Chapter 11: Policy Gradients – an Alternative
Chapter 13: Asynchronous Advantage Actor-Critic
The final glue – loss functions and optimizers
Chapter 1: What Is Reinforcement Learning?
Extra Gym functionality – wrappers and monitors