mlpy.learners.offline.irl.ApprenticeshipLearner.learn
ApprenticeshipLearner.learn()
Learn the optimal policy via apprenticeship learning.
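The apprenticeship learning algorithm for finding a policy $\tilde{\pi}$ that induces feature expectations $\mu(\tilde{\pi})$ close to $\mu_E$ is as follows:

1. Randomly pick some policy $\pi^{(0)}$, compute (or approximate via Monte Carlo) $\mu^{(0)} = \mu(\pi^{(0)})$, and set $i = 1$.
2. Compute $t^{(i)} = \max_{w : \|w\|_2 \le 1} \min_{j \in \{0, \ldots, i-1\}} w^T (\mu_E - \mu^{(j)})$, and let $w^{(i)}$ be the value of $w$ that attains this maximum. This can be achieved by either the max-margin method or the projection method.
3. If $t^{(i)} \le \epsilon$, then terminate.
4. Using the RL algorithm, compute the optimal policy $\pi^{(i)}$ for the MDP using rewards $R = (w^{(i)})^T \phi$.
5. Compute (or estimate) $\mu^{(i)} = \mu(\pi^{(i)})$.
6. Set $i = i + 1$, and go back to step 2.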
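The loop described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not mlpy's implementation: it assumes a small tabular MDP with known transitions `P[a, s, s']`, state features `phi`, exact feature expectations in place of Monte Carlo estimates, value iteration as the RL step, and the projection method for the weight update. All function and variable names here are hypothetical.

```python
import numpy as np

def feature_expectations(policy, P, phi, d0, gamma):
    """Exact mu(pi) = E[sum_t gamma^t phi(s_t)] for a deterministic tabular policy."""
    n = P.shape[1]
    P_pi = P[policy, np.arange(n)]  # row s is P[policy[s], s, :]
    # Discounted state occupancy d solves (I - gamma * P_pi)^T d = d0.
    occupancy = np.linalg.solve(np.eye(n) - gamma * P_pi.T, d0)
    return occupancy @ phi

def optimal_policy(P, reward, gamma, n_iter=500):
    """Value iteration with state rewards; returns the greedy policy."""
    V = np.zeros(P.shape[1])
    for _ in range(n_iter):
        Q = reward[None, :] + gamma * (P @ V)  # shape (actions, states)
        V = Q.max(axis=0)
    return Q.argmax(axis=0)

def apprenticeship_learning(P, phi, d0, mu_E, gamma, eps=1e-3, max_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    n_actions, n_states = P.shape[0], P.shape[1]
    # Random initial policy and its feature expectations.
    policy = rng.integers(n_actions, size=n_states)
    mu_bar = feature_expectations(policy, P, phi, d0, gamma)
    policies = [policy]
    for _ in range(max_iter):
        # Projection method: w points from the current projection to mu_E,
        # and its norm plays the role of the margin t.
        w = mu_E - mu_bar
        if np.linalg.norm(w) <= eps:
            break
        # RL step with rewards R(s) = w^T phi(s).
        policy = optimal_policy(P, phi @ w, gamma)
        policies.append(policy)
        mu = feature_expectations(policy, P, phi, d0, gamma)
        # Orthogonally project mu_E onto the line through mu_bar and mu.
        diff = mu - mu_bar
        denom = diff @ diff
        if denom == 0.0:  # the new policy adds no new direction
            break
        mu_bar = mu_bar + (diff @ (mu_E - mu_bar)) / denom * diff
    w = mu_E - mu_bar
    return policies, w, float(np.linalg.norm(w))

# Toy 3-state chain: action 0 moves left, action 1 moves right.
n_states, n_actions, gamma = 3, 2, 0.9
P = np.zeros((n_actions, n_states, n_states))
for s in range(n_states):
    P[0, s, max(s - 1, 0)] = 1.0
    P[1, s, min(s + 1, n_states - 1)] = 1.0
phi = np.eye(n_states)               # one-hot state features
d0 = np.array([1.0, 0.0, 0.0])       # always start in state 0
true_reward = np.array([0.0, 0.0, 1.0])

# The "expert" is the optimal policy for the hidden true reward.
expert = optimal_policy(P, true_reward, gamma)
mu_E = feature_expectations(expert, P, phi, d0, gamma)
policies, w, t = apprenticeship_learning(P, phi, d0, mu_E, gamma)
print("policies tried:", len(policies), "final distance t:", t)
```

With the projection method, `t` is the distance between the expert's feature expectations and the current projected point, so it never increases from one iteration to the next. The max-margin variant would instead solve a quadratic program over all previously seen feature expectations at each iteration.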