mlpy.mdp.discrete.UnknownBonusExplorer.update

UnknownBonusExplorer.update(model)

Update the reward model.

Update the reward model according to an RMax-based exploration policy. States for which the decision tree was unable to predict a reward are considered unknown. These states are given a bonus of RMax to drive exploration.

Parameters:

model : StateActionInfo

The state-action information.
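
Examples:

The following is a minimal sketch of the RMax bonus rule described above, not the library's implementation. The class `ToyStateActionInfo`, its fields, and the function `apply_unknown_bonus` are hypothetical stand-ins introduced for illustration only. The sketch assumes that a missing prediction is represented as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyStateActionInfo:
    """Hypothetical stand-in for StateActionInfo; the fields are assumptions."""
    predicted_reward: Optional[float]  # None: the model could not predict a reward
    reward: float = 0.0
    known: bool = False


def apply_unknown_bonus(info: ToyStateActionInfo, rmax: float) -> ToyStateActionInfo:
    """Apply an RMax-style bonus: unknown entries receive the reward rmax."""
    if info.predicted_reward is None:
        # No prediction available: treat as unknown and assign the bonus.
        info.known = False
        info.reward = rmax
    else:
        # Prediction available: keep the predicted reward unchanged.
        info.known = True
        info.reward = info.predicted_reward
    return info


unknown = apply_unknown_bonus(ToyStateActionInfo(predicted_reward=None), rmax=1.0)
known = apply_unknown_bonus(ToyStateActionInfo(predicted_reward=0.2), rmax=1.0)
```

Because the unknown entry is assigned the largest reward, a planner that maximizes expected reward is drawn toward it, which is what drives exploration.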