Numpy – How to get an array of the pattern gamma^t for some 0-t?

I am creating a basic gridworld RL problem and need to calculate the return for a given episode. I currently have the array of rewards, and I would like to multiply it element-wise with a list of the form [gamma**0, gamma**1, gamma**2, …] to get [r_0*gamma**0, r_1*gamma**1, r_2*gamma**2, …] and then…
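One possible way to build the discount vector is to raise gamma to an integer range. The sketch below assumes the rewards are stored in a 1-D sequence and that the truncated last step is a sum over the weighted rewards; the function name `discounted_return` and the sample values are hypothetical, not taken from the original post.

```python
import numpy as np

def discounted_return(rewards, gamma):
    """Sum of r_t * gamma**t over one episode (illustrative helper)."""
    rewards = np.asarray(rewards, dtype=float)
    # Exponents 0, 1, ..., T-1 give [gamma**0, gamma**1, ..., gamma**(T-1)]
    discounts = gamma ** np.arange(len(rewards))
    # Element-wise product, then sum (assumed final step)
    return float(np.sum(rewards * discounts))

# Made-up episode: 1*1 + 0*0.5 + 2*0.25
print(discounted_return([1.0, 0.0, 2.0], 0.5))
```

`np.arange` keeps the exponents as integers, so the powers are computed in one vectorized operation rather than in a Python loop.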

Relationship of Horizon and Discount factor in Reinforcement Learning

What is the connection between the discount factor gamma and the horizon in RL? What I have learned so far is that the horizon is the agent's time to live. Intuitively, an agent with a finite horizon will choose actions differently than one that has to live forever. In the latter case, the agent will try to maximize…
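One way to make the connection concrete, offered here as a rough illustration and not as the answer from the original post, is to compare the total weight that discounting places on a finite number of steps with its infinite-horizon limit. For 0 <= gamma < 1 the weights gamma**t form a geometric series that sums to 1 / (1 - gamma), a quantity often read informally as an "effective horizon". The helper name `discount_weight_sum` and the value of gamma are assumptions chosen for the example.

```python
def discount_weight_sum(gamma, horizon):
    """Total discount weight gamma**0 + ... + gamma**(horizon - 1)."""
    return sum(gamma ** t for t in range(horizon))

gamma = 0.9  # assumed value, for illustration only

# Weight accumulated over a short finite horizon
print(discount_weight_sum(gamma, 10))

# Infinite-horizon limit of the geometric series
print(1.0 / (1.0 - gamma))
```

As the horizon grows, the finite sum approaches the limit, which suggests why a discount factor below 1 makes rewards far in the future matter very little even when the agent nominally lives forever.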