Consider the $4\times 3$ world shown in Figure sequential-decision-world-figure.
-
Implement an environment simulator for this environment, such that the specific geography of the environment is easily altered. Some code for doing this is already in the online code repository.
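A minimal Python sketch of such a simulator follows. It assumes the usual parameters of the $4\times 3$ world: the intended move succeeds with probability 0.8, the agent slips to each perpendicular direction with probability 0.1, bumping into a wall or obstacle leaves it in place, and every nonterminal state has reward -0.04. These values and the string encoding of the map are assumptions of this sketch. The names `GridWorld`, `step`, and `run` are hypothetical and are not taken from the online code repository. Changing the geography only means editing the list of strings.

```python
import random

# Hypothetical map encoding; edit the strings to change the geography.
#   '.' open square, '#' obstacle, '+' / '-' terminal worth +1 / -1, 'S' start square
WORLD_4x3 = [
    "...+",
    ".#.-",
    "S...",
]

MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}
SIDEWAYS = {"N": ("E", "W"), "S": ("E", "W"), "E": ("N", "S"), "W": ("N", "S")}


class GridWorld:
    """Stochastic grid world: the intended move succeeds with probability p_intended,
    otherwise the agent slips to one of the two perpendicular directions."""

    def __init__(self, layout, step_reward=-0.04, p_intended=0.8, seed=None):
        self.states, self.terminals, self.start = set(), {}, None
        for r, row in enumerate(layout):
            for c, ch in enumerate(row):
                if ch == "#":
                    continue
                self.states.add((r, c))
                if ch in "+-":
                    self.terminals[(r, c)] = 1.0 if ch == "+" else -1.0
                elif ch == "S":
                    self.start = (r, c)
        self.step_reward = step_reward
        self.p_intended = p_intended
        self.rng = random.Random(seed)

    def reward(self, s):
        return self.terminals.get(s, self.step_reward)

    def move(self, s, a):
        """Deterministic effect of direction a; walls and obstacles leave s unchanged."""
        nxt = (s[0] + MOVES[a][0], s[1] + MOVES[a][1])
        return nxt if nxt in self.states else s

    def transitions(self, s, a):
        """(probability, next_state) pairs; empty for terminal states."""
        if s in self.terminals:
            return []
        slip = (1.0 - self.p_intended) / 2
        side1, side2 = SIDEWAYS[a]
        return [(self.p_intended, self.move(s, a)),
                (slip, self.move(s, side1)),
                (slip, self.move(s, side2))]

    def step(self, s, a):
        """Sample one move from nonterminal s; returns (next_state, its reward, done)."""
        probs, nexts = zip(*self.transitions(s, a))
        nxt = self.rng.choices(nexts, weights=probs)[0]
        return nxt, self.reward(nxt), nxt in self.terminals

    def run(self, policy, start=None, max_steps=1000):
        """Undiscounted total reward of one run, counting R of every state visited.
        start defaults to the 'S' square (pass one explicitly if the map has none)."""
        s = self.start if start is None else start
        total = self.reward(s)
        done = s in self.terminals
        for _ in range(max_steps):
            if done:
                break
            s, r, done = self.step(s, policy(s))
            total += r
        return total


if __name__ == "__main__":
    world = GridWorld(WORLD_4x3, seed=0)

    def always_north(s):  # placeholder policy; any function state -> action works
        return "N"

    print(sum(world.run(always_north) for _ in range(1000)) / 1000)
```

Because the map is plain data, a different world is a single constructor call, for example `GridWorld(["S..#..+", ".....#-"])`.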
-
Create an agent that uses policy iteration, and measure its performance in the environment simulator from various starting states. Perform several experiments from each starting state, and compare the average total reward received per run with the utility of the state, as determined by your algorithm.
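One way to set up this experiment is sketched below in Python. It carries its own compact copy of the grid model so it runs on its own, and it assumes the same parameters as the usual $4\times 3$ world (0.8 / 0.1 / 0.1 transitions, reward -0.04). It also assumes a discount factor of 0.95 instead of 1. With no discount, an arbitrary initial policy may never reach a terminal state, and its utilities then diverge. The simulated runs are discounted by the same factor, so the mean return from a start state estimates U(s) under the computed policy. Policy evaluation here uses repeated in-place sweeps until the largest change falls below `tol`, not an exact linear solve. `Model`, `policy_iteration`, `discounted_return`, and `compare` are illustrative names. Setting `GAMMA = 1` gives the undiscounted comparison the exercise describes, but only if the initial policy reaches a terminal state with probability 1; otherwise evaluation never converges.

```python
import random

# Compact copy of the grid model so this block runs on its own.
# Hypothetical map encoding: '.' open, '#' obstacle, '+'/'-' terminals worth +1/-1.
LAYOUT = ["...+",
          ".#.-",
          "...."]
MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}
SIDEWAYS = {"N": "EW", "S": "EW", "E": "NS", "W": "NS"}
STEP_REWARD, P_INTENDED = -0.04, 0.8  # assumed 4x3-world parameters
GAMMA = 0.95  # assumed discount; GAMMA = 1 needs a policy that surely terminates


class Model:
    def __init__(self, layout):
        cells = [((r, c), ch) for r, row in enumerate(layout) for c, ch in enumerate(row)]
        self.states = {s for s, ch in cells if ch != "#"}
        self.terminals = {s: (1.0 if ch == "+" else -1.0) for s, ch in cells if ch in "+-"}

    def reward(self, s):
        return self.terminals.get(s, STEP_REWARD)

    def transitions(self, s, a):
        """(probability, next_state) pairs; empty for terminal states."""
        if s in self.terminals:
            return []

        def move(d):  # bumping into a wall or obstacle leaves the agent in place
            nxt = (s[0] + MOVES[d][0], s[1] + MOVES[d][1])
            return nxt if nxt in self.states else s

        slip = (1.0 - P_INTENDED) / 2
        return [(P_INTENDED, move(a))] + [(slip, move(d)) for d in SIDEWAYS[a]]

    def q(self, U, s, a):
        """R(s) + GAMMA * sum over s' of P(s' | s, a) * U(s')."""
        return self.reward(s) + GAMMA * sum(p * U[t] for p, t in self.transitions(s, a))


def policy_iteration(m, tol=1e-9):
    policy = {s: "N" for s in m.states if s not in m.terminals}  # arbitrary start
    U = {s: 0.0 for s in m.states}
    while True:
        # Policy evaluation by in-place sweeps until the largest change is below tol.
        while True:
            delta = 0.0
            for s in m.states:
                v = m.q(U, s, policy[s]) if s in policy else m.reward(s)
                delta = max(delta, abs(v - U[s]))
                U[s] = v
            if delta < tol:
                break
        # Greedy policy improvement; switch only on a clear gain so ties cannot cycle.
        changed = False
        for s in policy:
            best = max(MOVES, key=lambda a: m.q(U, s, a))
            if m.q(U, s, best) > m.q(U, s, policy[s]) + 1e-6:
                policy[s], changed = best, True
        if not changed:
            return policy, U


def discounted_return(m, policy, start, rng, max_steps=10_000):
    """Reward of one simulated run, discounted by GAMMA like the utilities."""
    s, total, discount = start, 0.0, 1.0
    for _ in range(max_steps):
        total += discount * m.reward(s)
        if s in m.terminals:
            break
        probs, nexts = zip(*m.transitions(s, policy[s]))
        s = rng.choices(nexts, weights=probs)[0]
        discount *= GAMMA
    return total


def compare(layout=LAYOUT, runs=1000, seed=0):
    """(start, U(start), mean return over `runs` runs) for every nonterminal start."""
    m = Model(layout)
    policy, U = policy_iteration(m)
    rng = random.Random(seed)
    rows = []
    for start in sorted(policy):
        mean = sum(discounted_return(m, policy, start, rng) for _ in range(runs)) / runs
        rows.append((start, U[start], mean))
    return rows


if __name__ == "__main__":
    for start, utility, mean in compare():
        print(f"start {start}: U = {utility:+.3f}  mean return = {mean:+.3f}")
```

Each printed row pairs a start state's utility with the sample mean of its runs. With enough runs the two should agree up to sampling noise.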
-
Experiment with increasing the size of the environment. How does the run time for policy iteration vary with the size of the environment?
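A possible timing harness is sketched below. It again carries a compact copy of the model and of policy iteration so that it runs on its own, with the same assumed parameters (0.8 / 0.1 / 0.1 transitions, reward -0.04, discount 0.95). `make_layout` is a hypothetical growth scheme. It builds a $4k\times 3k$ open grid with the +1 terminal in the top-right corner, the -1 terminal just below it, and one obstacle, and `make_layout(4, 3)` reproduces the original world. For each size the harness records the number of states n, the number of evaluation/improvement rounds, and the wall-clock time. Some costs are known in advance. Each state here has at most three successors, so one evaluation sweep is O(n), and one improvement pass is O(|A| n). An exact evaluation that solves the n×n linear system would instead cost O(n³) per round. The total also depends on how many rounds and sweeps are needed, which is what the measurement reveals.

```python
import time

# Compact copy of the grid model and policy iteration so this block runs on its own.
MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}
SIDEWAYS = {"N": "EW", "S": "EW", "E": "NS", "W": "NS"}
STEP_REWARD, P_INTENDED, GAMMA = -0.04, 0.8, 0.95  # assumed parameters


def make_layout(width, height):
    """Hypothetical growth scheme (width >= 3, height >= 2): open grid, +1 in the top-right
    corner, -1 just below it, one obstacle at row 1, column 1."""
    grid = [["."] * width for _ in range(height)]
    grid[0][width - 1], grid[1][width - 1], grid[1][1] = "+", "-", "#"
    return ["".join(row) for row in grid]


def build(layout):
    cells = [((r, c), ch) for r, row in enumerate(layout) for c, ch in enumerate(row)]
    states = {s for s, ch in cells if ch != "#"}
    terminals = {s: (1.0 if ch == "+" else -1.0) for s, ch in cells if ch in "+-"}
    return states, terminals


def q(states, terminals, U, s, a):
    """R(s) + GAMMA * expected U of the successor; a terminal state just returns its reward."""
    if s in terminals:
        return terminals[s]

    def move(d):  # bumping into a wall or obstacle leaves the agent in place
        nxt = (s[0] + MOVES[d][0], s[1] + MOVES[d][1])
        return nxt if nxt in states else s

    slip = (1.0 - P_INTENDED) / 2
    expected = P_INTENDED * U[move(a)] + sum(slip * U[move(d)] for d in SIDEWAYS[a])
    return STEP_REWARD + GAMMA * expected


def policy_iteration(states, terminals, tol=1e-9):
    """Returns (policy, U, rounds), where rounds counts evaluation + improvement passes."""
    policy = {s: "N" for s in states if s not in terminals}
    U = {s: 0.0 for s in states}
    rounds = 0
    while True:
        rounds += 1
        while True:  # iterative policy evaluation with in-place sweeps
            delta = 0.0
            for s in states:
                v = q(states, terminals, U, s, policy.get(s))
                delta = max(delta, abs(v - U[s]))
                U[s] = v
            if delta < tol:
                break
        changed = False
        for s in policy:  # greedy improvement, switching only on a clear gain
            best = max(MOVES, key=lambda a: q(states, terminals, U, s, a))
            if q(states, terminals, U, s, best) > q(states, terminals, U, s, policy[s]) + 1e-6:
                policy[s], changed = best, True
        if not changed:
            return policy, U, rounds


def time_policy_iteration(scales=(1, 2, 3)):
    """For each scale k, time policy iteration on a (4k x 3k) world."""
    results = []
    for k in scales:
        states, terminals = build(make_layout(4 * k, 3 * k))
        start = time.perf_counter()
        _, _, rounds = policy_iteration(states, terminals)
        results.append((len(states), rounds, time.perf_counter() - start))
    return results


if __name__ == "__main__":
    for n, rounds, seconds in time_policy_iteration():
        print(f"{n:5d} states  {rounds:3d} rounds  {seconds:8.3f} s")
```

Passing larger scale factors, for example `time_policy_iteration((1, 2, 4, 8))`, extends the experiment. Pure-Python sweeps get slow at the upper end, so the measured times reflect this particular implementation as much as the algorithm.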