
How are eligibility traces with sarsa calculated?

machine-learning,reinforcement-learning,sarsa

It's unfortunate that they've reused the variables s and a in two different scopes here, but yes, you adjust all e(s,a) values, e.g.:

    for every state s in your state space
        for every action a in your action space
            update Q(s,a)
            update e(s,a)

Note what's happening here. e(s,a) is getting...
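The loop above can be sketched as a single SARSA(λ) step with accumulating traces. This is a minimal illustration, not the original answer's code; the array shapes, `alpha`, `gamma`, and `lam` values are all assumed for the example.

```python
import numpy as np

def sarsa_lambda_step(Q, e, s, a, r, s_next, a_next,
                      alpha=0.1, gamma=0.9, lam=0.8):
    """One SARSA(lambda) step: update every Q(s,a) and decay every e(s,a)."""
    delta = r + gamma * Q[s_next, a_next] - Q[s, a]  # TD error for this step
    e[s, a] += 1.0            # bump the trace for the pair just visited
    Q += alpha * delta * e    # update ALL Q(s,a), each scaled by its trace
    e *= gamma * lam          # decay ALL traces toward zero
    return Q, e

Q = np.zeros((3, 2))
e = np.zeros((3, 2))
Q, e = sarsa_lambda_step(Q, e, s=0, a=0, r=1.0, s_next=1, a_next=1)
```

Because the whole e-matrix multiplies the TD error, states visited several steps ago still receive a (decayed) share of the credit, which is the point of the loop over every state and action.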

SARSA Implementation

machine-learning,sarsa

Why do we subtract Q(a,s)? r + D·Q(a',s1) is the return we observed on this run after taking action a in state s. In theory, this is the value that Q(a,s) should be set to. However, we won't always take the same action after getting to...
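In code form, the update the answer describes is: move Q(a,s) only a fraction of the way toward the sampled target r + D·Q(a',s1), so subtracting the old estimate leaves just the error term to nudge the value. A minimal sketch, with an assumed learning rate `alpha` (the fraction) and using `gamma` for the discount D:

```python
def sarsa_update(q_sa, r, q_next, alpha=0.5, gamma=0.9):
    """One-step SARSA: step q_sa a fraction alpha toward the TD target."""
    target = r + gamma * q_next       # what Q(a,s) "should" be on this run
    return q_sa + alpha * (target - q_sa)  # old value + alpha * TD error
```

With `alpha < 1`, a single noisy run only partially overwrites the estimate, which is why we don't set Q(a,s) to the target outright.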

Implementing SARSA using Gradient Descent

machine-learning,reinforcement-learning,sarsa

Summary: your current approach is correct, except that you shouldn't restrict your output values to be between 0 and 1. This page has a great explanation, which I will summarize here. It doesn't specifically discuss SARSA, but I think everything it says should translate. The values in the results vector...
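The "don't restrict output to [0, 1]" advice can be illustrated with semi-gradient SARSA on a linear approximator, where the value estimate is the raw dot product with no squashing. This is a sketch under assumed names: the feature vectors `x`, `x_next` and the step size `alpha` are illustrative, not taken from the question.

```python
import numpy as np

def semi_gradient_sarsa_step(w, x, r, x_next, alpha=0.01, gamma=0.9):
    """Linear semi-gradient SARSA: q = w . x is left unbounded (no sigmoid)."""
    q = w @ x                       # unbounded value estimate for (s, a)
    q_next = w @ x_next             # estimate for the next (s', a')
    delta = r + gamma * q_next - q  # TD error
    return w + alpha * delta * x    # gradient of a linear q w.r.t. w is x

w = np.zeros(2)
w = semi_gradient_sarsa_step(w, np.array([1.0, 0.0]), 1.0,
                             np.array([0.0, 1.0]))
```

Returns can easily exceed 1 (e.g. a reward of 1 per step summed over many steps), so squashing the output would cap the values the approximator can represent.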

Eligibility trace reinitialization between episodes in SARSA-Lambda implementation

machine-learning,reinforcement-learning,sarsa

I agree with you 100%. Failing to reset the e-matrix at the start of every episode has exactly the problems that you describe. As far as I can tell, this is an error in the pseudocode. The reference that you cite is very popular, so the error has been propagated...
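The fix this answer endorses is simply to zero the e-matrix at the top of each episode so credit cannot leak from one episode into the next. A minimal sketch; the inner "episode" is a stub that bumps one trace, only to show that `e` starts fresh every episode while `Q` persists:

```python
import numpy as np

def run_episodes(n_episodes, n_states=3, n_actions=2):
    """Q persists across episodes; the e-matrix is reinitialized each one."""
    Q = np.zeros((n_states, n_actions))
    trace_sums_at_start = []
    for _ in range(n_episodes):
        e = np.zeros_like(Q)              # reset traces every episode (the fix)
        trace_sums_at_start.append(e.sum())
        e[0, 0] += 1.0                    # stand-in for within-episode updates
        Q[0, 0] += 0.1 * e[0, 0]          # Q carries over between episodes
    return Q, trace_sums_at_start
```

Without the reset, a terminal-state transition's stale traces would let the first rewards of the next episode update state-action pairs from the previous one.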