mlpy.planners.discrete.ValueIteration

class mlpy.planners.discrete.ValueIteration(model, explorer=None, gamma=None, ignore_unreachable=False)

Bases: mlpy.planners.IPlanner

Planning through value iteration.
Parameters:

model : DiscreteModel
The Markov decision model.
explorer : Explorer, optional
The exploration strategy to employ. Available explorers are:
EGreedyExplorer
With a given probability a random action is chosen; otherwise the action with the highest q-value is selected.
SoftmaxExplorer
The softmax explorer varies the action probability as a graded function of estimated value. The greedy action is still given the highest selection probability, but all the others are ranked and weighted according to their value estimates.
By default no explorer is used and the greedy action is chosen.
gamma : float, optional
The discount factor. Default is 0.9.
ignore_unreachable : bool, optional
Whether to ignore unreachable states or not. Unreachability is determined by how many steps a state is away from the closest neighboring state. Default is False.
Raises:

AttributeError
If both the Markov model and the planner define an explorer. Only one explorer can be specified.
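The two exploration strategies named above can be sketched generically. This is not mlpy's EGreedyExplorer or SoftmaxExplorer implementation, only an illustration of the two selection rules, assuming the q-values for one state are given as a plain list:

```python
import math
import random

def egreedy_action(q_values, epsilon=0.1, rng=random):
    # With probability epsilon pick a uniformly random action;
    # otherwise pick the action with the highest q-value (greedy).
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def softmax_probabilities(q_values, tau=1.0):
    # Grade selection probability by estimated value: the greedy action
    # gets the highest probability, but every action keeps a nonzero
    # chance, weighted by exp(q / tau).
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / tau) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]
```

For example, with q_values = [1.0, 2.0, 0.5], softmax still assigns the second action the largest probability, while epsilon-greedy with epsilon=0 reduces to the purely greedy choice.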
Attributes

mid
The module's unique identifier.
model
The Markov decision process model.

Methods

activate_exploration()
Turn the explorer on.
create_policy([func])
Create a policy (i.e., a state-action association).
deactivate_exploration()
Turn the explorer off.
get_best_action(state)
Choose the best next action for the agent to take.
get_next_action(state[, use_policy])
Return the optimal action for a state according to the current policy.
load(filename)
Load the state of the module from file.
plan()
Plan for the optimal policy.
save(filename)
Save the current state of the module to file.
visualize()
Visualize the planning data.
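For reference, the Bellman backup that a plan() call iterates until convergence can be sketched on a tiny discrete MDP. This is a standalone illustration of value iteration with the documented default discount gamma=0.9, not mlpy's code; the P and R tables stand in for what a DiscreteModel would provide:

```python
def value_iteration(n_states, n_actions, P, R, gamma=0.9, tol=1e-6):
    """Compute optimal state values by repeated Bellman backups.

    P[s][a] is a list of (next_state, probability) pairs and R[s][a]
    is the immediate reward for taking action a in state s.
    """
    V = [0.0] * n_states
    while True:
        delta = 0.0
        for s in range(n_states):
            # One-step lookahead: q-value of each action in state s.
            q = [R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                 for a in range(n_actions)]
            best = max(q)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:  # stop once no state value changed noticeably
            return V

# Two-state chain: in state 0, action 0 stays put (reward 0) and
# action 1 moves to the absorbing state 1 (reward 1).
P = [[[(0, 1.0)], [(1, 1.0)]],
     [[(1, 1.0)], [(1, 1.0)]]]
R = [[0.0, 1.0],
     [0.0, 0.0]]
V = value_iteration(2, 2, P, R)  # converges to V = [1.0, 0.0]
```

A greedy policy read off these values (as create_policy or get_best_action would do against the model) picks, in each state, the action maximizing the same one-step lookahead q-value.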