Exercise 21.7 [approx-LMS-exercise]
Implement an exploring reinforcement learning agent that uses direct utility estimation. Make two versions—one with a tabular representation and one using the function approximator in Equation (4x3-linear-approx-equation). Compare their performance in three environments:
- The $4\times 3$ world described in the chapter.
- A ${10}\times {10}$ world with no obstacles and a +1 reward at (10,10).
- A ${10}\times {10}$ world with no obstacles and a +1 reward at (5,5).
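
As a starting point, the two estimators might be sketched as follows. This is a minimal sketch under stated assumptions, not a reference solution: the linear form $U(x,y) = \theta_0 + \theta_1 x + \theta_2 y$ is assumed to be what the referenced equation denotes, and the step reward of -0.04, the start state (1,1), the purely random exploration, and all function and class names are illustrative choices. The hypothetical `run_trial` environment covers only the obstacle-free worlds with a single +1 goal; the $4\times 3$ world would need its obstacle and second terminal state added.

```python
import random

def rewards_to_go(rewards, gamma=1.0):
    """Return the (discounted) reward-to-go observed from each step of one trial."""
    out, total = [], 0.0
    for r in reversed(rewards):
        total = r + gamma * total
        out.append(total)
    return out[::-1]

class TabularEstimator:
    """Direct utility estimation: average of the observed rewards-to-go per state."""
    def __init__(self):
        self.total, self.count = {}, {}

    def update(self, state, target):
        self.total[state] = self.total.get(state, 0.0) + target
        self.count[state] = self.count.get(state, 0) + 1

    def utility(self, state):
        n = self.count.get(state, 0)
        return self.total[state] / n if n else 0.0  # unvisited states default to 0

class LinearEstimator:
    """Assumed form U(x, y) = t0 + t1*x + t2*y, fitted with the Widrow-Hoff (LMS) rule."""
    def __init__(self, alpha=0.001):
        # Features are unscaled, so alpha must stay small for the update to be stable.
        self.theta = [0.0, 0.0, 0.0]
        self.alpha = alpha

    def utility(self, state):
        x, y = state
        return self.theta[0] + self.theta[1] * x + self.theta[2] * y

    def update(self, state, target):
        x, y = state
        error = target - self.utility(state)
        for i, feature in enumerate((1.0, x, y)):
            self.theta[i] += self.alpha * error * feature

def run_trial(size, goal, rng, step_reward=-0.04, max_steps=500):
    """Random-walk trial in an obstacle-free size x size grid (hypothetical environment).
    Returns the states visited and the reward received in each of them."""
    state, states, rewards = (1, 1), [], []
    for _ in range(max_steps):
        states.append(state)
        if state == goal:
            rewards.append(1.0)  # terminal +1 reward
            break
        rewards.append(step_reward)
        dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        # Moves that would leave the grid are clipped to the boundary.
        state = (min(max(state[0] + dx, 1), size), min(max(state[1] + dy, 1), size))
    return states, rewards

def train(estimator, size, goal, trials=200, seed=0):
    """Feed every (state, reward-to-go) pair from each trial to the estimator."""
    rng = random.Random(seed)
    for _ in range(trials):
        states, rewards = run_trial(size, goal, rng)
        for s, g in zip(states, rewards_to_go(rewards)):
            estimator.update(s, g)
    return estimator
```

Both estimators expose the same `update` and `utility` methods, so a call such as `train(TabularEstimator(), 10, (10, 10))` can be compared directly with `train(LinearEstimator(), 10, (10, 10))`. A plane in $x$ and $y$ can only rise or fall monotonically along each axis, which is worth keeping in mind when comparing the (10,10) and (5,5) goal placements.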
 