Diederik M. Roijers: dissertation

PhD: Multi-Objective Decision-Theoretic Planning

Decision making is hard. It often requires reasoning about uncertain environments, partial observability and action spaces that are too large to enumerate. In such complex decision-making tasks decision-theoretic agents, that can reason about their environments on the basis of mathematical models and produce policies that optimize the utility for their users, can often assist us.

In most research on decision-theoretic agents, the desirability of actions and their effects is codified in a scalar reward function. However, many real-world decision problems have multiple objectives. In such cases the problem is more naturally expressed using a vector-valued reward function. Rather than having a single optimal policy, we then want to produce a set of policies that covers all possible preferences between the objectives. We call such a set a coverage set.

In this dissertation, we focus on decision-theoretic planning algorithms that produce the convex coverage set (CCS), which is the optimal solution set when either: 1) the user utility can be expressed as a weighted sum over the values for each objective; or 2) policies can be stochastic.

We propose new methods based on two popular approaches to creating planning algorithms that produce an (approximate) CCS by building on an existing single-objective algorithm. In the inner loop approach, we replace the summations and maximizations in the inner most loops of the single-objective algorithm by cross-sums and pruning operations. In the outer loop approach, we solve a multi-objective problem as a series of scalarized problems by employing the single-objective method as a subroutine.

Our most important contribution is an outer loop framework that we call optimistic linear support (OLS). As an outer loop method OLS builds the CCS incrementally. We show that, contrary to existing outer loop methods, each intermediate result is a bounded approximation of the CCS with known bounds (even when the single-objective method used is a bounded approximate method as well) and is guaranteed to terminate in a finite number of iterations.

We apply OLS-based algorithms to a variety of multi-objective decision problems, and show that it is more memory-efficient, and faster than corresponding inner loop algorithms for moderate numbers of objectives. We show that exchanging subroutines in OLS is relatively easy and illustrate the importance on a complex planning problem. Finally, we show that it is often possible to reuse parts of the policies and values, found in earlier iterations of OLS, to hot-start later iterations of OLS. Using this last insight, we propose the first method for multi-objective POMDPs that employs point-based planning and can produce an ε-CCS in reasonable time.

Overall, the methods we propose bring us closer to truly practical multi-objective decision-theoretic planning.