mlpy.learners.offline.irl.IncrApprenticeshipLearner

class mlpy.learners.offline.irl.IncrApprenticeshipLearner(obs, planner, method=None, max_iter=None, thresh=None, gamma=None, nsamples=None, max_steps=None, filename=None, **kwargs)[source]

Bases: mlpy.learners.offline.irl.ApprenticeshipLearner

Incremental apprenticeship learner.

The model under which the apprenticeship learner operates is updated incrementally while it learns a policy that emulates the expert's demonstrations.

Parameters:

obs : array_like, shape (n, nfeatures, ni)

List of trajectories provided by the demonstrator, which the learner tries to emulate, where n is the number of demonstrations, ni is the length of the i-th demonstration, and each demonstration has nfeatures features.

planner : IPlanner

The planner to use to determine the best action.

method : {‘projection’, ‘maxmargin’}, optional

The IRL method to employ. Default is 'projection'.

max_iter : int, optional

The maximum number of iterations after which learning is terminated; at that point it is assumed that a policy close enough to the expert's demonstrations has been found. Default is inf.

thresh : float, optional

Learning is considered to have converged to the demonstrations once this threshold has been reached. Default is eps.

gamma : float, optional

The discount factor. Default is 0.9.

nsamples : int, optional

The number of samples taken during Monte Carlo sampling. Default is 100.

max_steps : int, optional

The maximum number of steps in an iteration (during Monte Carlo sampling). Default is 100.

filename : str, optional

The name of the file to save the learner state to after each iteration. If None is given, the learner state is not saved. Default is None.

Other Parameters:
 

mix_policies : bool

Whether to create a new policy by mixing the policies seen so far or by considering the best-valued action. Default is False.

rescale : bool

If set to True, the feature expectations are rescaled to be between 0 and 1. Default is False.

visualize : bool

Visualize each iteration of the IRL step if set to True. Default is False.
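
A minimal construction sketch (the planner instance and the demonstration data below are placeholders; any IPlanner implementation and real expert trajectories would take their place):

    import numpy as np

    from mlpy.learners.offline.irl import IncrApprenticeshipLearner

    # Placeholder demonstrations: n=5 sequences with nfeatures=3
    # features over ni=20 steps each (real demonstrations may vary
    # in length).
    obs = np.random.rand(5, 3, 20)

    # An IPlanner implementation is assumed; its construction is
    # omitted from this sketch.
    planner = ...

    learner = IncrApprenticeshipLearner(
        obs,
        planner,
        method='projection',     # or 'maxmargin'
        max_iter=50,             # give up after 50 iterations
        thresh=1e-3,             # convergence threshold
        gamma=0.9,               # discount factor
        nsamples=100,            # Monte Carlo rollouts per iteration
        max_steps=100,           # cap on steps per rollout
        filename='learner.pkl',  # checkpoint the state each iteration
    )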

Notes

Inverse reinforcement learning assumes knowledge of the underlying model; however, such knowledge is not always available. The incremental apprenticeship learner extends the original apprenticeship learner by updating its model after every iteration with the experience gathered from executing the current policy.
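
This alternation can be pictured with the following illustrative sketch; the real class drives the loop internally, and the environment interface (env.reset, env.step) as well as the experience format passed to execute() are assumptions of the sketch, not the library's exact API:

    def run_incremental(learner, env, max_iter=50, max_steps=100):
        for _ in range(max_iter):
            # IRL step: recover a reward function and plan a policy
            # that matches the expert's feature expectations under
            # the current model.
            if learner.learn():  # assumed to report convergence
                break

            # Model-update step: execute the current policy and fold
            # the observed transitions back into the model.
            state = env.reset()
            for _ in range(max_steps):
                action = learner.choose_action(state)
                next_state = env.step(action)
                learner.execute((state, action, next_state))  # assumed format
                state = next_state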

Attributes

mid : The module's unique identifier.
type : This learner is of type offline.

Methods

choose_action(state) : Choose the next action.
execute(experience) : Execute learning-specific updates.
learn() : Learn a policy from the experience.
load(filename) : Load the state of the module from file.
reset(t, **kwargs) : Reset the apprenticeship learner.
save(filename) : Save the current state of the module to file.
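
Because the learner checkpoints its state after every iteration when filename is given, an interrupted run can be resumed via load(). A hedged sketch, assuming the same constructor arguments as the original run:

    # Restore a previously checkpointed learner and continue learning.
    learner = IncrApprenticeshipLearner(obs, planner, filename='learner.pkl')
    learner.load('learner.pkl')  # restore the saved iteration state
    learner.learn()              # resume where the checkpoint left off

    # Query the learned policy for the current environment state
    # (obtaining `state` from the environment is assumed):
    action = learner.choose_action(state)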