GSO-2011: Tobias Jung
Empowerment for continuous agent-environment systems
In this talk we will discuss a recently introduced extension of the empowerment framework [Klyubin, Polani & Nehaniv 2005, 2008] to agent-environment systems with continuous, multi-dimensional state spaces. Empowerment is an information-theoretic quantity motivated by hypotheses about the efficiency of the sensorimotor loop in biological organisms, as well as by considerations from curiosity-driven learning. For any agent-environment system with stochastic transitions (which includes deterministic transitions as a special case), empowerment measures how much influence an agent has on its environment, i.e., to what extent the agent can affect the future development of the system through its own actions. Empowerment can be seen as an information-theoretic generalization of the joint controllability (influence on the environment) and observability (measurement by sensors) of the environment by the agent, both of which are usually defined in control theory via the dimensionality of the control and observation spaces.

Earlier work has shown that empowerment has various interesting and relevant properties: for example, it allows us to identify salient states using only the dynamics of the environment, and it can act as a natural intrinsic reward, as opposed to having humans synthetically define an external one. However, that earlier work was limited to small, discrete domains, and state transition probabilities were assumed to be known. In the work of [Jung, Polani & Stone 2011] discussed here, empowerment is extended to the significantly more important and relevant case of continuous, vector-valued state spaces and initially unknown state transition probabilities. We show how empowerment can be defined for continuous state spaces and then approximately calculated by Monte-Carlo integration.
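To give a flavor of the Monte-Carlo computation: empowerment at a state is the Shannon channel capacity between the agent's actions and the resulting successor states. The sketch below is a deliberate simplification of the continuous treatment (all names are illustrative): successor states are sampled per action, binned into a discrete channel, and the capacity of that channel is computed by the Blahut-Arimoto algorithm.

```python
import numpy as np

def _kl_rows(P, q):
    """Per-row KL divergence D(P[a] || q) in bits; 0*log(0) treated as 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(P > 0, P * np.log2(P / np.maximum(q, 1e-300)), 0.0)
    return terms.sum(axis=1)

def blahut_arimoto(P, iters=200, tol=1e-10):
    """Capacity (in bits) of a discrete channel P[a, s'] = p(s' | a)."""
    n_a = P.shape[0]
    p = np.full(n_a, 1.0 / n_a)            # distribution over actions
    for _ in range(iters):
        d = _kl_rows(P, p @ P)             # p @ P = marginal over successors
        new_p = p * np.exp2(d)
        new_p /= new_p.sum()
        if np.max(np.abs(new_p - p)) < tol:
            p = new_p
            break
        p = new_p
    return float(np.sum(p * _kl_rows(P, p @ P)))

def empowerment_mc(step, state, actions, n_samples=300, n_bins=30, rng=None):
    """Monte-Carlo empowerment of `state`: sample successor states for each
    action, discretize them into bins, and take the capacity of the induced
    action -> successor-state channel."""
    rng = np.random.default_rng(rng)
    samples = np.array([[step(state, a, rng) for _ in range(n_samples)]
                        for a in actions])
    edges = np.linspace(samples.min(), samples.max() + 1e-9, n_bins + 1)
    P = np.array([np.histogram(row, bins=edges)[0] for row in samples],
                 dtype=float)
    P /= P.sum(axis=1, keepdims=True)
    return blahut_arimoto(P)
```

For a noisy 1-D system s' = s + a + noise with a few well-separated actions, this yields roughly log2 of the number of actions: the agent can reliably inject that many distinguishable outcomes into its environment.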
In addition, we show how unknown transition probabilities can be learned by Gaussian process regression, and how multi-step predictions can be made by iterated forecasting. We then go on to present some rather interesting practical results: we examine the dynamics induced by empowerment in a number of well-known, difficult control tasks (such as inverted balance of the double pendulum or the acrobot) and discuss applications to exploration and online model learning.
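As a rough illustration of the model-learning side (all names are illustrative, not the paper's): a minimal GP regressor fit to observed one-step transitions, with multi-step predictions obtained by iterated forecasting, i.e. feeding the one-step posterior mean back in as the next input. This sketch uses a single output dimension, fixed kernel hyperparameters, and the posterior mean only.

```python
import numpy as np

def rbf(X1, X2, lengthscale=0.3):
    """Squared-exponential kernel matrix between two sets of inputs."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

class GPDynamics:
    """GP posterior mean for one-step dynamics s' = f(s, a) + noise."""
    def __init__(self, X, y, noise=1e-4, lengthscale=0.3):
        self.X, self.ls = X, lengthscale
        K = rbf(X, X, lengthscale) + noise * np.eye(len(X))
        self.alpha = np.linalg.solve(K, y)     # (K + noise*I)^{-1} y

    def predict(self, Xs):
        return rbf(Xs, self.X, self.ls) @ self.alpha

def iterated_forecast(model, s0, actions):
    """Multi-step prediction: feed each one-step mean back in as input."""
    s, traj = s0, []
    for a in actions:
        s = model.predict(np.array([[s, a]]))[0]
        traj.append(s)
    return traj
```

Feeding back only the posterior mean ignores how predictive uncertainty accumulates over successive steps; a fuller treatment of iterated forecasting would also propagate the predictive variance.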