Exercise 17.6 [reward-equivalence-exercise]
Sometimes MDPs are formulated with a reward function $R(s,a)$ that depends on the action taken, or with a reward function $R(s,a,s')$ that also depends on the outcome state.
- Write the Bellman equations for these formulations.
- Show how an MDP with reward function $R(s,a,s')$ can be transformed into a different MDP with reward function $R(s,a)$, such that optimal policies in the new MDP correspond exactly to optimal policies in the original MDP.
- Now do the same to convert MDPs with $R(s,a)$ into MDPs with $R(s)$. (Hedged sketches of all three parts follow this list.)
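For the first part, a minimal sketch of the Bellman equations, assuming the book's usual notation ($P(s' \mid s, a)$ for the transition model, $\gamma$ for the discount factor). With $R(s,a)$ the reward is earned on taking the action; with $R(s,a,s')$ it is averaged over outcome states:

$$U(s) = \max_{a} \left[ R(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, U(s') \right]$$

$$U(s) = \max_{a} \sum_{s'} P(s' \mid s,a) \left[ R(s,a,s') + \gamma\, U(s') \right]$$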
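For the second part, one standard construction (a sketch, not necessarily the intended solution) keeps the states, actions, and transition model unchanged and replaces the three-argument reward by its expectation over outcome states:

$$R(s,a) = \sum_{s'} P(s' \mid s,a)\, R(s,a,s')$$

Substituting this definition into the first Bellman equation above reproduces the second one exactly, so the utilities, and hence the optimal policies, of the two MDPs coincide.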
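The same construction in code, as a hedged sketch: the dictionary-based representation (`P[s][a]` mapping outcome states to probabilities, `R3[s][a][s2]` for the three-argument reward) is an assumption made for illustration, not a fixed API.

```python
def expected_reward(P, R3):
    """Collapse a three-argument reward R(s, a, s') into R(s, a).

    P  : dict s -> a -> dict s' -> probability  (transition model)
    R3 : dict s -> a -> dict s' -> reward       (three-argument reward)
    Returns R2 : dict s -> a -> expected reward over outcome states.
    """
    return {
        s: {
            a: sum(prob * R3[s][a][s2] for s2, prob in outcomes.items())
            for a, outcomes in actions.items()
        }
        for s, actions in P.items()
    }

# Tiny example: action "go" from s0 reaches s1 with probability 0.8.
P = {"s0": {"go": {"s1": 0.8, "s0": 0.2}}}
R3 = {"s0": {"go": {"s1": 10.0, "s0": -1.0}}}
print(expected_reward(P, R3))  # {'s0': {'go': 7.8}}
```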
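For the third part, the expectation trick no longer works, because $R(s,a)$ genuinely depends on the chosen action. One construction (a sketch; the names $t_{s,a}$ and $\gamma'$ are introduced here for illustration) inserts a "pre-state" $t_{s,a}$ between each state–action pair and its outcomes, and compensates for the extra step by taking the square root of the discount:

$$\gamma' = \sqrt{\gamma}, \qquad R'(s) = 0, \qquad R'(t_{s,a}) = \frac{R(s,a)}{\sqrt{\gamma}}$$

In the new MDP, taking $a$ in $s$ leads deterministically to $t_{s,a}$, from which a single default action leads to $s'$ with probability $P(s' \mid s,a)$. Two steps in the new MDP accumulate discount $\gamma'^2 = \gamma$, so unrolling the new Bellman equation over each pair of steps recovers the original one, and the optimal choice of $a$ in each original state $s$ is unchanged.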