mlpy.mdp.discrete.UnknownBonusExplorer.update

UnknownBonusExplorer.update(model)

Update the reward model.

Update the reward model according to an RMax-based exploration policy. States for which the decision tree could not predict a reward are considered unknown. These states are assigned a bonus of RMax to drive exploration.
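The bonus rule described above can be illustrated with a minimal, standalone sketch. It does not use or reproduce mlpy's internals: the function name `apply_unknown_bonus`, the dictionary-based reward table, and the use of `None` to mark a state with no predicted reward are all assumptions made for illustration only.

```python
from typing import Dict, Hashable, Optional


def apply_unknown_bonus(
    predicted: Dict[Hashable, Optional[float]], rmax: float
) -> Dict[Hashable, float]:
    """Return a reward table in which unknown states receive the RMax bonus.

    `predicted` maps each state to the reward predicted for it, or to
    None when no prediction is available (hypothetical convention).
    """
    return {
        state: (rmax if reward is None else reward)
        for state, reward in predicted.items()
    }


# Hypothetical predictions: state "s1" has no predicted reward.
predicted = {"s0": 0.5, "s1": None, "s2": -1.0}
rewards = apply_unknown_bonus(predicted, rmax=10.0)
```

Because unknown states look maximally rewarding, a planner acting greedily on `rewards` is drawn toward them until enough experience is gathered to predict their true reward.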