The chapter on complex decisions defined a proper policy for an MDP as one that is guaranteed to reach a terminal state. Show that it is possible for a passive ADP agent to learn a transition model for which its policy $\pi$ is improper even if $\pi$ is proper for the true MDP; with such models, the POLICY-EVALUATION step may fail if $\gamma = 1$. Show that this problem cannot arise if POLICY-EVALUATION is applied to the learned model only at the end of a trial.
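As an illustration of the phenomenon (not a full solution), here is a minimal sketch. The two-state chain, its transition probabilities, the step cost of $-0.04$, and the helper `policy_value` are all hypothetical choices made for this example, not taken from the exercise. Under the true model, $\pi$ moves from state 0 to the terminal state 1 with probability 0.9, so $\pi$ is proper; but a passive ADP agent that updates its maximum-likelihood model mid-trial, after a run of unlucky $0 \to 0$ transitions and no observed $0 \to 1$ transition, learns a model in which state 0 loops forever, and POLICY-EVALUATION with $\gamma = 1$ fails:

```python
import numpy as np

# Hypothetical two-state chain: under policy pi, state 0 reaches the
# absorbing terminal state 1 with probability 0.9 per step (pi is proper).
P_true = np.array([[0.1, 0.9],
                   [0.0, 1.0]])   # row 1 is the terminal state

# Mid-trial maximum-likelihood estimate after observing only 0 -> 0
# transitions so far: the learned model says state 0 loops forever,
# so pi looks improper in this model.
P_learned = np.array([[1.0, 0.0],
                      [0.0, 1.0]])

R = np.array([-0.04, 0.0])   # assumed step cost in state 0; terminal reward 0
gamma = 1.0

def policy_value(P, R, gamma):
    """Solve (I - gamma * P) V = R restricted to the nonterminal states.

    Here only state 0 is nonterminal, and the terminal value is fixed at 0,
    so the system reduces to (1 - gamma * P[0, 0]) * V(0) = R[0].
    """
    A = np.eye(1) - gamma * P[:1, :1]
    return np.linalg.solve(A, R[:1])

print(policy_value(P_true, R, gamma))       # finite: [-0.0444...]
try:
    print(policy_value(P_learned, R, gamma))
except np.linalg.LinAlgError as e:
    print("POLICY-EVALUATION fails:", e)    # singular matrix: V(0) diverges
```

The singular system is exactly the divergence the exercise points at: with $\gamma = 1$ and a learned self-loop of probability 1, $V(0)$ would be an infinite sum of step costs, so the linear system $(I - \gamma P)V = R$ has no solution.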