mlpy.planners.discrete.ValueIteration

class mlpy.planners.discrete.ValueIteration(model, explorer=None, gamma=None, ignore_unreachable=False)[source]

Bases: mlpy.planners.IPlanner

Planning through value iteration.
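The planner repeatedly applies the Bellman optimality backup until the value function converges. A minimal, self-contained sketch of that sweep (illustrative only, not mlpy's implementation; the transition structure T is a hypothetical stand-in for the queries a DiscreteModel would answer):

    # Illustrative value-iteration sweep. `T[s][a]` is assumed to yield
    # (next_state, probability, reward) triples -- a stand-in for the
    # transition/reward lookups a DiscreteModel provides.
    def value_iteration(states, actions, T, gamma=0.9, tol=1e-6):
        V = {s: 0.0 for s in states}
        while True:
            delta = 0.0
            for s in states:
                q = [sum(p * (r + gamma * V[s2]) for s2, p, r in T[s][a])
                     for a in actions]
                best = max(q)
                delta = max(delta, abs(best - V[s]))
                V[s] = best          # in-place (Gauss-Seidel) update
            if delta < tol:          # stop once the largest change is tiny
                return V

The discount factor gamma here plays the same role as the gamma constructor parameter below.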

Parameters:

model : DiscreteModel

The Markov decision model.

explorer : Explorer, optional

The exploration strategy to employ. Available explorers are:

EGreedyExplorer

With probability ε, a random action is chosen; otherwise the action with the highest Q-value is selected.

SoftmaxExplorer

The softmax explorer varies the action probability as a graded function of estimated value. The greedy action is still given the highest selection probability, but all the others are ranked and weighted according to their value estimates.

By default no explorer is used and the greedy action is always chosen. (Both selection rules are sketched below, after the parameter list.)

gamma : float, optional

The discount factor. Default is 0.9.

ignore_unreachable : bool, optional

Whether to ignore unreachable states. Unreachability is determined by how many steps a state is away from its closest neighboring state. Default is False.

Raises:

AttributeError

If both the Markov model and the planner define an explorer. Only one explorer can be specified.
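The two exploration strategies differ only in how an action is drawn from the Q-values. A minimal sketch of both selection rules (hypothetical helper functions, not mlpy's Explorer API):

    import math
    import random

    # Epsilon-greedy: with probability epsilon pick a uniformly random
    # action, otherwise the action with the highest Q-value.
    def egreedy(q_values, epsilon=0.1):
        if random.random() < epsilon:
            return random.randrange(len(q_values))
        return max(range(len(q_values)), key=lambda a: q_values[a])

    # Softmax: selection probability is a graded function of the value
    # estimates; a low temperature tau concentrates probability mass on
    # the greedy action.
    def softmax(q_values, tau=1.0):
        m = max(q_values)                      # subtract max for stability
        weights = [math.exp((q - m) / tau) for q in q_values]
        r, acc = random.random() * sum(weights), 0.0
        for a, w in enumerate(weights):
            acc += w
            if r <= acc:
                return a
        return len(q_values) - 1               # guard against rounding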

Attributes

mid The module’s unique identifier.
model The Markov decision process model.

Methods

activate_exploration() Turn the explorer on.
create_policy([func]) Create a policy (i.e., a state-action association).
deactivate_exploration() Turn the explorer off.
get_best_action(state) Choose the best next action for the agent to take.
get_next_action(state[, use_policy]) Return the optimal action for a state according to the current policy.
load(filename) Load the state of the module from file.
plan() Plan for the optimal policy.
save(filename) Save the current state of the module to file.
visualize() Visualize the planning data.
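
Examples

A minimal sketch of the typical call sequence, assuming an already-populated DiscreteModel instance named model and some current state; the variable names and the filename are placeholders, and building the model is outside the scope of this page:

    from mlpy.planners.discrete import ValueIteration

    # `model` is assumed to be a fully specified DiscreteModel.
    planner = ValueIteration(model, gamma=0.95)
    planner.plan()                            # solve for the optimal policy

    action = planner.get_next_action(state)  # query the resulting policy
    planner.save("vi_planner.pkl")           # persist the module's state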