The inverted pendulum problem is cited very often in the robotics literature. The reason is that it is a simple underactuated kinematic chain which can nevertheless become very difficult to control. The inverted pendulum problem is in the same category as the “ball on a beam” task. In both cases, the system has an actuator which provides a force, but the system also has a freely movable joint which can’t be influenced directly. How can we solve the issue elegantly?
The first thing to do is not to focus on a certain technique like neural networks or Q-learning, but to define the difference between the AI controller and the prediction engine. The AI controller is the component which controls the pendulum: when it is activated, it will bring the pendulum into the upright position. The more interesting part is the prediction engine, which is a subcomponent of the AI controller. The terms in the literature are sometimes a bit confusing. In control theory books, the prediction engine is usually discussed under the heading of “system identification”. In gaming-oriented communities, this software module is called the physics engine. If we want to solve the inverted pendulum problem, we have to focus on this part.
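To make the separation concrete, here is a minimal sketch in Python. The class names and the hand-written dynamics are illustrative assumptions; in the sense of this article, the prediction engine would normally be obtained by system identification rather than written down by hand:

```python
import math

class PredictionEngine:
    """Forward model: answers "what happens next?" for a state and action."""
    def predict(self, angle, velocity, torque, dt=0.05):
        # Hand-written pendulum dynamics as a stand-in for a learned model.
        accel = 9.81 * math.sin(angle) + torque   # angle 0 = upright
        new_velocity = velocity + accel * dt
        return angle + new_velocity * dt, new_velocity

class AIController:
    """Uses the prediction engine to choose the next motor command."""
    def __init__(self, engine):
        self.engine = engine  # the subcomponent discussed in the text

    def select_action(self, angle, velocity):
        # Ask the model a what-if question for each candidate torque and
        # take the one predicted to end up closest to upright.
        candidates = (-2.0, 0.0, 2.0)
        return min(candidates,
                   key=lambda u: abs(self.engine.predict(angle, velocity, u)[0]))

controller = AIController(PredictionEngine())
print(controller.select_action(angle=0.3, velocity=0.0))  # -> -2.0
```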
The first question is: why do we need to predict the system at all, if we only want to figure out what the correct motor control command is? Isn’t it easier to create the AI controller directly, for example in the form of a rule like “if angle=10, then push left”? To see why not, let us look at a typical entry of a prediction engine instead:

angle=10, velocity=2, action=left → newangle=-10, velocity=-2
What does that mean? It describes the behavior of the pendulum: it predicts a future state. On the left side of the equation, the current state is given; on the right side, the consequence. The funny thing is that from the perspective of an AI controller, this prediction looks useless, because it doesn’t answer the question of how to bring the pendulum upright. But on closer inspection, the equation is more powerful than it first appears, because it answers what result a given control signal will produce. In the reinforcement learning literature, this concept is called model-based reinforcement learning. The model is the prediction model.
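Expressed as code, such a prediction is nothing more than a mapping from (state, action) to next state. The following sketch stores a few transitions in a plain lookup table; the first entry is the example rule from above, the other numbers are assumed values for illustration:

```python
# A prediction model as a plain lookup table:
# (angle, velocity, action) -> (new_angle, new_velocity).
# In practice this mapping would be a fitted function (e.g. a neural
# network), but a table shows the idea with no machinery at all.
model = {
    (10, 2, "left"):  (-10, -2),   # the example rule from the text
    (10, 2, "right"): (14, 3),     # assumed value for illustration
    (10, 2, "none"):  (12, 2),     # assumed value for illustration
}

def predict(angle, velocity, action):
    return model[(angle, velocity, action)]

# The model answers what-if questions: what happens for each action?
for action in ("left", "right", "none"):
    print(action, "->", predict(10, 2, action))
```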
But why do we need such a model at all? Isn’t the environment in which the pendulum is moving able to provide the physics engine? Yes and no. It is correct that Box2D can simulate a pendulum. This simulation is called the environment, or the game. But it does not allow us to create the controller directly.
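As a stand-in for a Box2D scene, here is a minimal hand-rolled environment. The `step` interface is an assumption modelled on common reinforcement learning environments, not Box2D’s real API; the point is that the controller only sees the returned states, not the equations inside:

```python
import math

class PendulumEnv:
    """The 'game': a black box the controller can interact with,
    but whose internal equations it is not supposed to read."""
    def __init__(self):
        self._angle, self._velocity = math.pi - 0.3, 0.0  # near hanging down

    def step(self, torque, dt=0.05):
        accel = 9.81 * math.sin(self._angle) + torque
        self._velocity += accel * dt
        self._angle += self._velocity * dt
        return self._angle, self._velocity  # observations only

env = PendulumEnv()
for torque in (1.0, 1.0, -1.0):
    print(env.step(torque))  # transitions we can record, but not yet explain
```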
The picture shows the principle of model predictive control. There are two physics engines: the main one at the bottom, which simulates the game, and a second one which has to be created from scratch. The process of doing so is called “system identification”. The good news is that once the system identification has succeeded, the inverted pendulum problem is solved; there are no further challenges. The prediction model can be used very easily for building the AI controller, and the AI controller will bring the system into any desired state. What exactly is system identification? In game theory, two problems are discussed: the first is how to play an existing game, for example Tic-tac-toe, and the second is how to learn the rules of an unknown game. Learning the game rules is equivalent to system identification. A game is different from an AI controller: a game provides options, that is, possible action sequences. For example, the Tic-tac-toe game provides the option to place a mark on one of the 9 fields. Then the game engine marks the field as non-empty, which reduces the options for the next player. A prediction engine for the inverted pendulum works the same way.
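A minimal version of this pipeline fits in one script: collect transitions from the environment, fit a model to them, then let the controller use only the fitted model. The least-squares fit on hand-picked features and the one-step lookahead are simplifying assumptions; real systems need richer models and longer planning horizons:

```python
import math
import numpy as np

# --- the ground-truth environment (the "game") -------------------
def true_step(angle, velocity, torque, dt=0.05):
    accel = 9.81 * math.sin(angle) + torque
    new_velocity = velocity + accel * dt
    return angle + new_velocity * dt, new_velocity

# --- 1. system identification: record data, fit a model ----------
rng = np.random.default_rng(0)
X, Y = [], []
for _ in range(500):
    s = rng.uniform([-math.pi, -4.0, -2.0], [math.pi, 4.0, 2.0])
    X.append([s[0], math.sin(s[0]), s[1], s[2]])   # assumed feature set
    Y.append(true_step(*s))                        # observed next state
W, *_ = np.linalg.lstsq(np.array(X), np.array(Y), rcond=None)

def predicted_step(angle, velocity, torque):
    f = np.array([angle, math.sin(angle), velocity, torque])
    return f @ W  # the learned forward model

# --- 2. the controller uses only the learned model ---------------
def select_action(angle, velocity, candidates=(-2.0, 0.0, 2.0)):
    # Pick the torque whose predicted next state is closest to upright.
    return min(candidates,
               key=lambda u: abs(predicted_step(angle, velocity, u)[0]))

print(select_action(0.3, 0.0))  # -> -2.0
```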
Let us make the advantages of a prediction engine clearer with the problem of driving a car. What programmers sometimes believe is that a self-driving car has to provide a certain reaction. For example, if the traffic light is red, the car has to stop. This kind of input-output relationship is indeed observed in reality: the car approaches the situation, the light is red, and as a consequence the car stops. But this kind of rule is not what drivers use to generate the action. What a good driver does is calculate alternatives; he answers what-if questions. A possible case is the question of what will happen if the car doesn’t stop at the red light. If the aim is only to generate the next action, this question makes no sense, but if the idea is to understand how driving works, then it is important.
Suppose we ignore potential future states of the system and describe only what a car should do in each situation. The result is that the world is no longer predicted accurately. The problem is that in reality, although each driver should stop at the red light, from a physical perspective he has more options. He can, in theory, decide to ignore the red light. The traffic system in general won’t prevent such cases. If a driver moves onto the crossing even though he is not allowed to do so, the game won’t freeze because the driver has made a mistake. On the contrary, the traffic system will run ahead. The result is an accident. To explain it more clearly: if a large group of potential actions is ignored, the system identification is not complete. It is important to state for each situation what the outcome is, no matter whether the car has stopped at the light or not.
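In code, this completeness requirement means the forward model has to cover the forbidden actions too. The state and action names below are invented for illustration:

```python
# A complete forward model answers every (situation, action) pair,
# including actions the traffic rules forbid.
transitions = {
    ("red_light", "stop"):    "waiting_at_light",
    ("red_light", "drive"):   "accident",        # forbidden, but still predictable
    ("green_light", "stop"):  "blocking_traffic",
    ("green_light", "drive"): "crossed_safely",
}

def predict(situation, action):
    return transitions[(situation, action)]

print(predict("red_light", "drive"))  # -> accident: the world runs ahead
```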