Multi-objective reinforcement learning: a comprehensive survey of theories, algorithms ...

tandfonline.com · Jun 9, 2026

When an AI agent must balance several competing goals at once, classical reinforcement learning, built around a single scalar reward, can no longer represent the trade-offs involved. This survey sets out the mathematical frameworks that multi-objective decision-making requires.

Multi-Objective Reinforcement Learning · Markov Decision Process · Pareto Optimality · Arrow's Impossibility Theorem

Theory Briefing

The survey formalises the multi-objective Markov decision process (MOMDP), in which the agent receives a vector of rewards (one per objective) instead of a single scalar, giving researchers a unified mathematical scaffold for agents that balance multiple competing rewards.
Classical reinforcement learning optimises a single scalar reward, but real-world decisions involve trade-offs. MORL's Pareto optimality notions identify which solutions are non-dominated, i.e. those for which no other solution is at least as good on every objective and strictly better on at least one.
By cataloguing algorithms and optimality criteria together, the survey draws out an echo of Arrow's impossibility theorem in AI: when objectives conflict, there is generally no single policy that is best on every objective at once.
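For reference, the MOMDP and the Pareto dominance relation mentioned above are commonly written as follows. This is the standard textbook formulation, assuming all d objectives are maximised and a discounted infinite-horizon return; the survey's own notation may differ.

```latex
\begin{align*}
\mathcal{M} &= \langle \mathcal{S}, \mathcal{A}, P, \mathbf{r}, \mu, \gamma \rangle,
  \qquad \mathbf{r} : \mathcal{S} \times \mathcal{A} \times \mathcal{S} \to \mathbb{R}^{d} \\
\mathbf{V}^{\pi} &= \mathbb{E}_{\pi}\Big[ \textstyle\sum_{t=0}^{\infty} \gamma^{t}\, \mathbf{r}(s_t, a_t, s_{t+1}) \Big],
  \qquad s_0 \sim \mu \\
\mathbf{V}^{\pi} \succ \mathbf{V}^{\pi'} &\iff
  \big(\forall i:\ V^{\pi}_{i} \ge V^{\pi'}_{i}\big) \wedge \big(\exists j:\ V^{\pi}_{j} > V^{\pi'}_{j}\big) \\
\mathcal{F} &= \big\{ \mathbf{V}^{\pi} : \nexists\, \pi' \text{ such that } \mathbf{V}^{\pi'} \succ \mathbf{V}^{\pi} \big\}
\end{align*}
```

Here P is the transition kernel, μ the initial-state distribution, γ ∈ [0, 1) the discount factor, \mathbf{V}^{\pi} the vector of expected returns of policy π, and \mathcal{F} the Pareto front of non-dominated value vectors.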
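A minimal sketch of the Pareto non-dominance test, assuming every objective is maximised and each policy is summarised by a vector of expected returns. The policy values are hypothetical, and the pairwise O(n²) filter is chosen for clarity rather than speed.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b (all objectives maximised)."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(values: List[Vector]) -> List[Vector]:
    """Keep only the value vectors that no other vector dominates."""
    return [v for v in values if not any(dominates(u, v) for u in values)]


# Hypothetical (objective 1, objective 2) returns for four policies.
policy_values = [(10.0, 2.0), (7.0, 7.0), (2.0, 9.0), (5.0, 5.0)]
print(pareto_front(policy_values))
# -> [(10.0, 2.0), (7.0, 7.0), (2.0, 9.0)]
```

Only (5.0, 5.0) is removed, because (7.0, 7.0) is better on both objectives; the three survivors each trade one objective against the other, so none of them dominates the rest.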
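To illustrate why no single policy need be best on every objective, the sketch below uses linear scalarisation, one common (but not the only) way to reduce a vector of returns to a scalar: a weight vector collapses each policy's values into a weighted sum, and the policy with the largest sum is selected. The policy names, value vectors and weights are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def scalarise(values: Sequence[float], weights: Sequence[float]) -> float:
    """Linear scalarisation: weighted sum of the objective values."""
    return sum(w * v for w, v in zip(weights, values))


def best_policy(policies: Dict[str, Tuple[float, float]],
                weights: Sequence[float]) -> str:
    """Name of the policy whose scalarised value is largest for these weights."""
    return max(policies, key=lambda name: scalarise(policies[name], weights))


# Hypothetical policies whose two objectives conflict.
policies = {"fast": (10.0, 2.0), "balanced": (7.0, 7.0), "safe": (2.0, 9.0)}

for weights in [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9)]:
    print(weights, "->", best_policy(policies, weights))
# (0.9, 0.1) -> fast
# (0.5, 0.5) -> balanced
# (0.1, 0.9) -> safe
```

Each weight vector selects a different policy, so the "best" policy depends on how the objectives are traded off. Note that linear scalarisation can only recover policies on the convex hull of the Pareto front; non-convex regions of the front require other scalarisation functions.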
