HQ's advantages.
HQ's current limitations.
Still, HQ's remaining HSP may prevent it from learning an optimal policy. To deal with this HSP, one might consider using subgoal trees instead of subgoal sequences. All possible subgoal sequences can be represented by a tree whose branches are labeled with subgoals and whose nodes contain RPs for solving RPPs. Each node then stands for a particular history of subgoals and previously solved subtasks, so there is no HSP any more. Since the tree grows exponentially with the length of the subgoal sequence (with the number of possible subgoals as branching factor), however, full trees are practically infeasible for large-scale POMDPs. Perhaps a reasonable compromise between simple linear sequences and full-fledged trees can be found.
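The idea and its cost can be sketched as follows. This is a minimal illustration with hypothetical names (`SubgoalTreeNode`, `full_tree_size`); the RP stored at each node is left as an opaque placeholder, since its representation depends on the learning algorithm used:

```python
class SubgoalTreeNode:
    """One node per history of previously chosen subgoals.

    Each node carries its own reaction policy (RP); outgoing edges
    are labeled with subgoals, so a path from the root encodes a
    complete subgoal history and removes the HSP at that node.
    """

    def __init__(self):
        self.rp = None       # placeholder for this node's RP
        self.children = {}   # subgoal label -> SubgoalTreeNode

    def child(self, subgoal):
        # Lazily create the branch for a given subgoal.
        if subgoal not in self.children:
            self.children[subgoal] = SubgoalTreeNode()
        return self.children[subgoal]


def full_tree_size(num_subgoals, depth):
    """Nodes in a complete tree: sum of num_subgoals**i for i = 0..depth.

    Illustrates the exponential growth that makes full trees
    infeasible for large-scale POMDPs.
    """
    return sum(num_subgoals ** i for i in range(depth + 1))


# With 10 possible subgoals and subgoal histories of length 5,
# the complete tree already contains 111111 nodes.
print(full_tree_size(10, 5))
```

The lazy `child` construction hints at one possible compromise: rather than materializing the full tree, one could grow only the branches actually visited during learning, trading completeness for tractability.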