mlpy.planners.explorers.discrete.SoftmaxExplorer
class mlpy.planners.explorers.discrete.SoftmaxExplorer(tau=None, decay=None)
Bases: mlpy.planners.explorers.discrete.DiscreteExplorer

The softmax explorer.
The softmax explorer varies the action probability as a graded function of estimated value. The greedy action is still given the highest selection probability, but all the others are ranked and weighted according to their value estimates.
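As a rough illustration of this graded weighting, the sketch below computes selection probabilities from a list of action-value estimates using the Gibbs (Boltzmann) distribution described in the Notes. The helper name `gibbs_probabilities` is hypothetical and is not part of mlpy. Subtracting the largest value before exponentiating is a standard numerical-stability step; it cancels in the ratio and leaves the probabilities unchanged.

```python
import math
from typing import List


def gibbs_probabilities(qvalues: List[float], tau: float) -> List[float]:
    """Return softmax (Gibbs) selection probabilities, one per action.

    Hypothetical helper, not part of mlpy. ``tau`` is the temperature
    and must be positive.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    # Shift by the maximum Q-value so exp() cannot overflow; the shift
    # cancels between numerator and denominator.
    q_max = max(qvalues)
    weights = [math.exp((q - q_max) / tau) for q in qvalues]
    total = sum(weights)
    return [w / total for w in weights]


q = [1.0, 2.0, 3.0]
print(gibbs_probabilities(q, tau=1.0))    # approximately [0.090, 0.245, 0.665]
print(gibbs_probabilities(q, tau=100.0))  # close to uniform
print(gibbs_probabilities(q, tau=0.01))   # nearly all mass on the greedy action
```

The greedy action (value 3.0) always receives the largest share, and lower-valued actions are ranked by their estimates rather than ignored. Raising tau flattens the distribution, while shrinking it concentrates probability on the highest-valued action.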
Parameters: tau : float, optional
The temperature value. Default is 2.0.
decay : float, optional
The factor by which the temperature τ decays. This value should be between 0 and 1: the temperature τ decreases over time by a factor of decay. Set this value to 1 if τ should remain the same throughout the experiment.
Default is 1.
Notes
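The softmax function implemented uses the Gibbs distribution. It chooses action a on the t-th play with probability:

$$P_t(a) = \frac{e^{Q_t(a)/\tau}}{\sum_{b=1}^{n} e^{Q_t(b)/\tau}}$$

where Q_t(a) is the estimated value of action a at play t, n is the number of available actions, and τ is a positive parameter called the temperature.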
High temperatures cause all actions to be (nearly) equiprobable. Low temperatures cause a greater difference in selection probability between actions with different value estimates. For τ close to zero, the action selection becomes the same as greedy action selection.
Methods
activate()                         Turn on exploration mode.
choose_action(actions, qvalues)    Choose the next action.
deactivate()                       Turn off exploration mode.
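The interface listed above could be mirrored by a small standalone class like the one below. This is a hypothetical sketch, not mlpy's implementation: the class name `SoftmaxExplorerSketch`, the `seed` argument, the `_MIN_TAU` floor, applying decay once after every exploratory choice, starting with exploration switched on, and falling back to greedy selection while exploration is deactivated are all assumptions made for illustration. Only the constructor defaults (tau = 2.0, decay = 1) and the method names come from the documentation.

```python
import math
import random
from typing import Optional, Sequence, TypeVar

A = TypeVar("A")

# Hypothetical floor so repeated decay never drives tau to exactly 0.
_MIN_TAU = 1e-12


class SoftmaxExplorerSketch:
    """Hypothetical stand-in mirroring the documented SoftmaxExplorer interface."""

    def __init__(self, tau: Optional[float] = None, decay: Optional[float] = None,
                 seed: Optional[int] = None) -> None:
        # Documented defaults: tau = 2.0, decay = 1 (temperature stays fixed).
        self.tau = 2.0 if tau is None else tau
        self.decay = 1.0 if decay is None else decay
        if self.tau <= 0:
            raise ValueError("tau must be positive")
        if not 0.0 < self.decay <= 1.0:
            raise ValueError("decay should lie in (0, 1]")
        self._rng = random.Random(seed)
        self._active = True  # assumption: exploration starts switched on

    def activate(self) -> None:
        """Turn on exploration mode."""
        self._active = True

    def deactivate(self) -> None:
        """Turn off exploration mode."""
        self._active = False

    def choose_action(self, actions: Sequence[A], qvalues: Sequence[float]) -> A:
        """Choose the next action from parallel sequences of actions and Q-values."""
        if not actions or len(actions) != len(qvalues):
            raise ValueError("actions and qvalues must be non-empty and equal length")
        if not self._active:
            # Assumption: with exploration off, pick the highest-valued action.
            best = max(range(len(qvalues)), key=lambda i: qvalues[i])
            return actions[best]
        # Unnormalized Gibbs weights, shifted by the max Q-value for stability;
        # random.Random.choices normalizes the weights itself.
        q_max = max(qvalues)
        weights = [math.exp((q - q_max) / self.tau) for q in qvalues]
        action = self._rng.choices(list(actions), weights=weights, k=1)[0]
        # Assumption: decay the temperature once per exploratory choice.
        self.tau = max(self.tau * self.decay, _MIN_TAU)
        return action


explorer = SoftmaxExplorerSketch(tau=2.0, decay=0.5, seed=42)
actions = ["left", "right"]
qvalues = [0.0, 1.0]
print(explorer.choose_action(actions, qvalues))  # "left" or "right", sampled
print(explorer.tau)                              # 1.0 after one decay step
explorer.deactivate()
print(explorer.choose_action(actions, qvalues))  # "right" (greedy)
```

Because decay is multiplicative in this sketch, after k exploratory choices the temperature is tau · decay^k, so decay = 1 keeps it constant as the parameter description states.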