Autonomous learning requires that a system learn without prior knowledge, prespecified rules of behavior, or built-in internal system values.  The system learns by interacting with its environment, using occasional reward signals to determine which actions are most rewarding.  Until recently, such systems could only use hierarchical reinforcement learning or its derivatives augmented with artificial curiosity.  While such systems can learn efficiently in environments whose rules do not change as a result of the agent's actions, they are inefficient in environments that the agent's actions do change, making it easier or harder for the agent to accomplish its goals.

If the environment changes dynamically as a result of the agent's actions, the agent should be able to observe such changes and learn from them.  For instance, an agent rewarded for heating its house may learn to pick up wood and burn it.  But as a result of this action there is less and less wood, and finding enough of it to maintain a proper house temperature takes longer.  The agent may then discover that when it is out of wood, it can buy a full load of wood from the sawmill.  The agent gets no reward for this action, yet the action modifies the environment to the agent's advantage.  Since the agent now needs money to buy the wood, it may learn that working in a factory provides the money it needs.  Notice that the agent not only learns proper actions but also introduces new goals (buy wood, get money) for which it is not directly rewarded.
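The goal-creation step in this example can be sketched as walking a resource-dependency chain backwards: each requirement the agent cannot currently meet becomes a new subgoal, even though no external reward is attached to it. The function name and the dependency table below are purely hypothetical, chosen to mirror the wood-and-money story.

```python
def derive_subgoals(goal, requires, have):
    """Walk resource dependencies backwards from a rewarded goal.
    Each unmet requirement spawns a new, unrewarded subgoal
    (illustrative sketch; names and structure are hypothetical)."""
    chain = [goal]
    resource = requires.get(goal)
    while resource is not None and resource not in have:
        # The agent cannot satisfy this requirement directly,
        # so obtaining the resource becomes a goal of its own.
        chain.append("obtain " + resource)
        resource = requires.get("obtain " + resource)
    return chain

# Heating the house needs wood; buying wood needs money.
requires = {"heat house": "wood", "obtain wood": "money"}
```

With an empty stock, `derive_subgoals("heat house", requires, set())` yields the chain heat house → obtain wood → obtain money, matching how the agent in the example introduces "buy wood" and "get money" as goals it is never directly rewarded for.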

Black Box Environment Scenario

Black Box software