mlpy.planners.explorers.discrete.SoftmaxExplorer
class mlpy.planners.explorers.discrete.SoftmaxExplorer(tau=None, decay=None)
Bases: mlpy.planners.explorers.discrete.DiscreteExplorer
The softmax explorer.
The softmax explorer varies the action probability as a graded function of estimated value. The greedy action is still given the highest selection probability, but all the others are ranked and weighted according to their value estimates.
Parameters:
tau : float, optional
The temperature value. Default is 2.0.
decay : float, optional
The factor by which the temperature τ decays. This value should be between 0 and 1. The temperature decreases over time by a factor of decay. Set this value to 1 if τ should remain the same throughout the experiment. Default is 1.
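The effect of decay on the temperature can be illustrated with a minimal sketch. It assumes the decay is applied multiplicatively once per update; the page does not say when the update happens (per action, per episode, or otherwise). The function name decayed_temperature is hypothetical and is not part of mlpy.

```python
def decayed_temperature(tau, decay, steps):
    """Return the temperature after `steps` multiplicative decay updates.

    Assumption: each update multiplies the temperature by `decay`.
    This helper is illustrative only and not part of mlpy.
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay should be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return tau * decay ** steps


if __name__ == "__main__":
    # With decay=1 the temperature stays constant; with decay<1 it shrinks.
    for decay in (1.0, 0.9):
        print(decay, [decayed_temperature(2.0, decay, n) for n in range(4)])
```

Under this assumption, a decay below 1 gradually moves the explorer from near-uniform exploration toward greedy selection.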
Notes
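The softmax function implemented uses the Gibbs distribution. It chooses action a on the t-th play with probability:
\[ P_t(a) = \frac{e^{Q_t(a)/\tau}}{\sum_{b=1}^{n} e^{Q_t(b)/\tau}} \]
where Q_t(a) is the estimated value of action a on the t-th play, n is the number of actions, and τ is a positive parameter called the temperature. High temperatures cause all actions to be nearly equiprobable. Low temperatures cause a greater difference in selection probability between actions with different value estimates. For τ close to zero, the action selection becomes the same as greedy.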
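The Gibbs distribution described in these notes can be sketched in a few lines of standard-library Python. This is not the mlpy implementation: the names softmax_probabilities and sample_action are hypothetical, and subtracting the maximum value before exponentiating is a common numerical-stability step added here, which leaves the probabilities unchanged.

```python
import math
import random


def softmax_probabilities(qvalues, tau):
    """Gibbs (softmax) probabilities for a list of value estimates.

    Illustrative helper, not part of mlpy.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    # Shift by the maximum so the largest exponent is 0 (avoids overflow).
    q_max = max(qvalues)
    weights = [math.exp((q - q_max) / tau) for q in qvalues]
    total = sum(weights)
    return [w / total for w in weights]


def sample_action(actions, qvalues, tau, rng=random):
    """Draw one action with probability given by the Gibbs distribution."""
    probs = softmax_probabilities(qvalues, tau)
    return rng.choices(actions, weights=probs, k=1)[0]


if __name__ == "__main__":
    qvalues = [1.0, 2.0, 3.0]
    for tau in (100.0, 2.0, 0.1):
        # High tau: close to uniform. Low tau: close to greedy.
        print(tau, softmax_probabilities(qvalues, tau))
    print(sample_action(["left", "stay", "right"], qvalues, tau=2.0))
```

Running the loop with a large, moderate, and small temperature shows the behaviour described above: the probabilities flatten as τ grows and concentrate on the highest-valued action as τ approaches zero.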
Methods
activate()    Turn on exploration mode.
choose_action(actions, qvalues)    Choose the next action.
deactivate()    Turn off exploration mode.