mlpy.learners.offline.irl.IncrApprenticeshipLearner

class mlpy.learners.offline.irl.IncrApprenticeshipLearner(obs, planner, method=None, max_iter=None, thresh=None, gamma=None, nsamples=None, max_steps=None, filename=None, **kwargs)[source]

Bases: mlpy.learners.offline.irl.ApprenticeshipLearner

Incremental apprenticeship learner.
The model under which the apprenticeship is operating is updated incrementally while learning a policy that emulates the expert’s demonstrations.
Parameters:
obs : array_like, shape (n, nfeatures, ni)
List of n trajectories provided by the demonstrator, which the learner tries to emulate; ni is the length of the i-th demonstration, and each demonstration has nfeatures features.
planner : IPlanner
The planner used to determine the best action.
method : {‘projection’, ‘maxmargin’}, optional
The IRL method to employ. Default is 'projection'.
max_iter : int, optional
The maximum number of iterations after which learning is terminated; at that point it is assumed that a policy close enough to the expert's demonstrations was found. Default is inf.
thresh : float, optional
Learning is considered to have converged to the demonstrations once this threshold is reached. Default is eps.
gamma : float, optional
The discount factor. Default is 0.9.
nsamples : int, optional
The number of samples taken during Monte Carlo sampling. Default is 100.
max_steps : int, optional
The maximum number of steps in an iteration (during Monte Carlo sampling). Default is 100.
filename : str, optional
The name of the file to save the learner state to after each iteration. If None is given, the learner state is not saved. Default is None.
Other Parameters:
mix_policies : bool
Whether to create a new policy by mixing the policies seen so far or by considering the best-valued action. Default is False.
rescale : bool
If set to True, the feature expectations are rescaled to be between 0 and 1. Default is False.
visualize : bool
If set to True, each iteration of the IRL step is visualized. Default is False.
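The obs layout together with gamma ties into the discounted feature-expectation estimate at the core of apprenticeship learning. The following is a minimal sketch in plain Python, not mlpy's implementation; the per-step feature-vector layout is assumed from the shape documented above:

```python
def feature_expectations(trajectories, gamma=0.9):
    """Estimate mu = (1/n) * sum_i sum_t gamma^t * phi(s_t^i),
    the discounted feature expectations averaged over demonstrations.

    trajectories: list of n demonstrations; demonstration i is a list
    of ni feature vectors, each with nfeatures entries.
    """
    nfeatures = len(trajectories[0][0])
    mu = [0.0] * nfeatures
    for traj in trajectories:
        for t, phi in enumerate(traj):
            w = gamma ** t                    # discount for step t
            for j in range(nfeatures):
                mu[j] += w * phi[j]
    n = len(trajectories)
    return [m / n for m in mu]                # average over the n demonstrations
```

With rescale enabled, mlpy additionally maps the resulting expectations into [0, 1]; the sketch above omits that step.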
Notes
Inverse reinforcement learning assumes knowledge of the underlying model. However, such knowledge is not always available. The incremental apprenticeship learner updates its model after every iteration by executing the current policy, and thus extends the original apprenticeship learner.
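The iterate-execute-update cycle described in the notes can be sketched as the following skeleton. The helper names (compute_mu, improve_policy, update_model) are hypothetical stand-ins, not mlpy's API; the loop only illustrates how the max_iter and thresh parameters bound convergence toward the expert's feature expectations:

```python
def incremental_apprenticeship(expert_mu, compute_mu, improve_policy,
                               update_model, max_iter=50, thresh=1e-6):
    """Skeleton of the incremental loop (hypothetical helpers):
    after each IRL iteration the current policy is executed and the
    model is updated before the next iteration begins."""
    policy = None
    for i in range(max_iter):
        mu = compute_mu(policy)          # feature expectations of current policy
        gap = max(abs(e - m) for e, m in zip(expert_mu, mu))
        if gap <= thresh:                # close enough to the demonstrations
            return policy, i
        policy = improve_policy(expert_mu, mu)
        update_model(policy)             # incremental step: refit the model
    return policy, max_iter
```

The real learner performs the IRL step ('projection' or 'maxmargin') inside improve_policy and uses the planner to execute the resulting policy.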
Attributes

mid
The module's unique identifier.

type
This learner is of type offline.

Methods

choose_action(state)
Choose the next action.

execute(experience)
Execute learning-specific updates.

learn()
Learn a policy from the experience.

load(filename)
Load the state of the module from file.

reset(t, **kwargs)
Reset the apprenticeship learner.

save(filename)
Save the current state of the module to file.