Creating self-modifying code is not that hard. First, the current file is read into memory. Then the modification is done by converting the content into a list of lines, editing the list, and joining it back into a string. Finally, the string is written back to the hard drive.
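The loop described above can be sketched in a few lines of Python. The counter line and its format are invented for illustration; a real script would modify whatever part of its own source it needs to.

```python
# Sketch of the self-modification loop: read the file, edit it as a list
# of lines, join it back into a string, write it to disk.
# The "RUNS = n" counter line is a hypothetical example.
import re

def increment_run_counter(path):
    # Read the current file into memory.
    with open(path) as f:
        lines = f.read().splitlines()
    # Modify the content as a list: bump a counter stored in the source.
    for i, line in enumerate(lines):
        m = re.match(r"RUNS = (\d+)", line)
        if m:
            lines[i] = "RUNS = %d" % (int(m.group(1)) + 1)
    # Convert back to a single string and write it to the hard drive.
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
```

A script would call this on `__file__` to modify itself each time it runs.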
The first impression in robotics might be that some kind of magic algorithm is available which is able to control a robot. Such an algorithm doesn't exist; instead, programming a robot is an iterative workflow which consists of substeps. Let us describe the creation of a robot system in simple steps.
At the beginning, the programming of a robot simulator is recommended. Such a system is able to simulate a robot arm on the screen. The underlying technology consists of mainstream programming languages like Python and out-of-the-box physics engines like Bullet. With a robot simulator it is possible to create a task, for example by creating some boxes and a robot arm, and then the human operator can move the arm manually.
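The skeleton of such a manually controlled simulator can be sketched without a real physics engine. The `World` class, its boxes and the two-joint arm are invented stand-ins; a real project would delegate `step()` to pybullet or a similar engine.

```python
# Minimal stand-in for the simulator loop described above: some boxes, a
# simplified arm, and manual control by the human operator. Everything
# here is a hypothetical sketch, not a real physics engine.
class World:
    def __init__(self):
        self.boxes = [(2.0, 0.0), (3.0, 0.0)]   # box positions on the ground
        self.arm = [0.0, 0.0]                   # two joint angles in radians

    def move_arm(self, joint, delta):
        # Manual control: the human operator nudges one joint at a time.
        self.arm[joint] += delta

    def step(self):
        # A physics engine would integrate forces here; the stub does nothing.
        pass

world = World()
world.move_arm(0, 0.1)   # operator input, e.g. mapped to a key press
world.step()
```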
The next step on top of the simulator is to program an annotation engine. That is a module which prints out information in natural language on the screen. This information might help the human operator. An annotation engine is not able to control the robot, that is the task of the human operator, but it is able to formalize domain knowledge. An annotation engine is organized as a hierarchical event taxonomy in which all the events are detected. An event can be a colliding object, an action which was started by the human, a high-level event like the fulfillment of a subgoal, or the recognition of a trajectory. The annotation engine can only print text and visual information to the screen. It creates a verbose layer on top of the game.
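A minimal sketch of such an annotation engine is a set of detectors, one per event level, each producing a line of text. The event names, state fields and thresholds below are invented for illustration.

```python
# Sketch of an annotation engine: detectors for events on several levels
# of a hierarchy, each producing a natural language line for the screen.
# Field names and thresholds are hypothetical.
def detect_events(state):
    events = []
    # Low-level event: collision between gripper and box.
    if abs(state["gripper_x"] - state["box_x"]) < 1.0:
        events.append("collision: gripper touches box")
    # Mid-level event: an action started by the human operator.
    if state["key_pressed"]:
        events.append("action: operator moves the arm")
    # High-level event: a subgoal was fulfilled.
    if state["box_x"] > 10.0:
        events.append("subgoal: box reached the target zone")
    return events

state = {"gripper_x": 5.2, "box_x": 5.8, "key_pressed": True}
for line in detect_events(state):
    print(line)   # the verbose layer on top of the game
```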
Now the robotics project consists of two parts: a simulator and an annotation engine. What is missing is an AI controller. Such a module can be realized with model predictive control for creating a game tree, plus a solver for selecting a node in the tree. Model predictive control means using the model created by the annotation engine to predict future events, for example to analyze what will happen if the human executes a certain action.
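The game tree plus solver can be sketched on a toy one-dimensional state. The forward model, actions and scoring below are invented; in the real pipeline the simulator and annotation engine would play these roles.

```python
# Sketch of model predictive control: build a small game tree by trying
# each action sequence a few steps ahead, let the solver pick the best
# node. The model, actions and score are hypothetical toy values.
def predict(state, action):
    # Forward model: in the real pipeline this reuses the simulator.
    return state + {"left": -1, "stay": 0, "right": +1}[action]

def score(state, goal):
    return -abs(state - goal)   # closer to the goal is better

def mpc(state, goal, depth):
    # Build the game tree by trying every action sequence up to 'depth'.
    if depth == 0:
        return score(state, goal), []
    best = None
    for action in ("left", "stay", "right"):
        s, plan = mpc(predict(state, action), goal, depth - 1)
        if best is None or s > best[0]:
            best = (s, [action] + plan)
    return best

value, plan = mpc(state=0, goal=3, depth=4)
print(plan)   # a four-step plan that reaches the goal
```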
From a technical point of view, the robot control system is now complete. It consists of four parts: a simulator, an annotation engine, a model predictive control engine and a solver. Such a system is able to control a robot autonomously. The precondition is that all the parts are implemented and tested for a certain domain. Most robotics projects don't have such an advanced system available. Writing even a simple simulator is hard, and the same holds for the other parts. What is available in reality is a non-working robot simulator, a failed project with an annotation engine, plans for model predictive control and an imaginary solver which isn't ready right now.
The good news is that for easy tasks like a pick&place robot the complete pipeline is available. That means it is possible to program all the parts. For more complex domains like human walking or dexterous grasping it is more complicated to create such a pipeline. The reason is that in reality a grasping task consists of hundreds of events, and implementing all of them in software is a large-scale science project.
The problem with the Protege OWL editor is that the software can't be downloaded from the Fedora repository, and on the official website the download starts only after a registration form is filled in. Other OWL editors for creating ontologies have the same problem. In addition, the documentation makes no sense and gives no hint why ontologies are here to stay.
Let us go a step backward and describe what the idea behind Protege is, and how to simplify the process of ontology engineering. Usually an ontology is created to store domain knowledge in a hierarchy. This is useful for automatic event recognition and text parsing. Full-blown OWL ontologies are the most complicated way of doing so; a lightweight alternative is a taxonomy stored in JSON syntax plus non-hierarchical tags. A tag is some kind of annotated information. The idea is that a parser analyzes the content and, if it detects something, a tag is stored next to the original information.
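The lightweight version can be sketched with the standard library alone: a taxonomy in JSON plus flat tags stored next to the original information. The category names and the detection rule are invented for illustration.

```python
# Lightweight alternative to a full OWL ontology: a taxonomy in JSON
# syntax plus non-hierarchical tags stored near the original data.
# The categories and the keyword rule are hypothetical.
import json

taxonomy = json.loads("""
{
  "event": {
    "collision": {},
    "action": {"grasp": {}, "release": {}}
  }
}
""")

# A parser analyzes the content; if it detects something, a tag is
# stored next to the original information.
document = {"text": "the gripper closes around the box", "tags": []}
if "closes" in document["text"]:
    document["tags"].append("event/action/grasp")

print(document["tags"])
```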
The remarkable feature of tags is that they don't need an ontology. Both tags and ontologies can be classified as data-centric metadata. The idea is to structure existing information, for example blog post entries, a full-text document or a video stream.
The main reason why the usage of Protege and similar tools is not recommended is that their advantage over simple tagging is small. In theory a full-blown OWL model is better than 20 simple tags, but the effort for creating ontologies from scratch is huge. Let us make an example.
In the pong game, the ball is near the block. The idea is to analyze the scene, extract domain knowledge and put the event in natural language on the screen. For doing so, two modules are needed: first the event parser itself, that is computer code which measures the spatial relationship between the ball and the block, and secondly a tagger. The tagger takes the result of the parser and stores the recognized event on the timeline.
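The two modules can be sketched directly. The distance threshold and the coordinates are invented; a real parser would read them from the game state.

```python
# Sketch of the two modules for the pong example: an event parser that
# measures the spatial relation between ball and block, and a tagger that
# stores the recognized event on the timeline. The threshold is invented.
import math

def parse_event(ball, block):
    # Event parser: measure the distance between ball and block.
    distance = math.hypot(ball[0] - block[0], ball[1] - block[1])
    if distance < 50.0:
        return "ball is near the block"
    return None

timeline = []   # (frame, tag) pairs produced by the tagger

def tag(frame, ball, block):
    # Tagger: take the parser result and store it on the timeline.
    event = parse_event(ball, block)
    if event is not None:
        timeline.append((frame, event))

tag(frame=120, ball=(100, 80), block=(110, 60))
print(timeline)
```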
The principle was originally used in compiler design. For example, the Python parser has to identify if a statement is an if, a for or a variable, and the result of the parsing is stored in a parse tree. Exactly this principle is used for semantic annotations. Ontologies and Protege are not the only way of doing so. Instead, the pipeline can be much easier.
Thinking annotations from the user GUI perspective
If somebody asks for a tutorial for the Protege software, he will find some examples in which it is explained how to create classes and things within Protege. Such handbooks don't make much sense. The better way of understanding ontologies is the other way around. Suppose an OWL file is already there and connected with a database: what will the end user see on the screen?
He will see a first-class database, something like what is used on shopping websites in which the user can search for categories. He can enter a term, or he can search by a price range. And he can filter the result page to see only books but no DVDs. The ontology and tagging mechanism in the background makes the access to an existing database more user-friendly. The question is not how to use the Protege editor; the question is how to build the database frontend.
The easiest way of creating a database frontend is Python/tkinter with a sqlite backend. The resulting GUI is equal to the domain knowledge. That means the buttons and text fields on the screen for searching the database provide an understanding of the domain. The user sees a tag cloud and can search for certain annotated data.
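The sqlite part of such a frontend can be sketched on its own; the tkinter widgets are left out so the example stays self-contained. The table layout and the sample rows are invented for illustration.

```python
# Sketch of the sqlite backend behind a database frontend. A tkinter GUI
# would feed user input into queries like the one below. Table layout
# and data are hypothetical.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (title TEXT, category TEXT, price REAL)")
con.executemany("INSERT INTO items VALUES (?, ?, ?)", [
    ("Robotics Primer", "book", 25.0),
    ("Blade Runner",    "dvd",  10.0),
    ("AI Handbook",     "book", 60.0),
])

# The GUI buttons and text fields end up in a query like this one:
# filter by category and price range, as on a shopping website.
rows = con.execute(
    "SELECT title FROM items WHERE category = ? AND price <= ?",
    ("book", 30.0),
).fetchall()
print(rows)   # only books within the price range
```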
Suppose we have created in the Box2D physics engine a robot gripper which is able to pick&place objects. At first impression, the software is working great. But something is missing: an explicit domain knowledge description. This kind of machine-readable information is not there, or it stays hidden within the program of an Artificial Intelligence. The first step in robotics control is to make the domain knowledge visible.
A good way of doing so is to create a database which consists of tasks, subgoals, events, actions, objects and relationships. Let us make some examples to explain why. Suppose the robot gripper touches the box. The result is a set of triggered events: the object “robotgripper” gets a spatial relationship with the object “box”, and this action is part of a larger workflow called “grasp the object”. It sounds a bit unnatural to explain this in detail, but this is how semantic annotation works. The idea is to describe the situation in terms. That means that all the actions have a meaning and it's up to the parser to detect it.
The nice thing about events is that they are separated from actions. The semantic parser can be working while the robot is controlled manually. That means there is no need to program a dedicated AI; the event tagger alone is enough. From a technical point of view, there are two layers: first the physics engine itself, which calculates the forces on the objects, and on top of the Box2D engine the event parser, which translates the actions on the screen into natural language. If the verbose message is shown in the status bar, the domain knowledge is made explicit. That means the human operator sees what he is doing now. There is some kind of sense in his actions.
An event parser is basically a data-to-natural-language converter. It takes numerical values as input and returns a description in structured English. If a scene is described in English, the domain is formalized. That means the natural language description can be utilized by an observer, which can be a human or a text parser in software. At first glance this seems to be double work, because from the beginning everything is already digital. So why do we need to convert binary data into binary knowledge? The reason is that the low-level data provided by the Box2D physics engine doesn't contain a meaning. The engine knows only that the gripper's position is (200,100), but the system is not aware that the name is “gripper” or that a gripper can grasp the box. Such a meaning is provided by a different layer, called a domain ontology. In the literature this is sometimes called grounding, because the scene is connected to natural language.
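The grounding layer can be sketched as a small lookup plus a sentence template: the engine delivers coordinates, the domain ontology adds the names, and the converter emits structured English. The names, the screen width and the sentence format are invented for illustration.

```python
# Sketch of the grounding layer: the physics engine only knows body ids
# and coordinates; the domain ontology maps ids to names; the converter
# produces structured English. Names and thresholds are hypothetical.
ontology = {1: "gripper", 2: "box"}   # body id -> domain name

def ground(body_id, position):
    name = ontology[body_id]
    # Invented convention: screen is 640 px wide, so x < 320 is "left".
    side = "left" if position[0] < 320 else "right"
    return "the %s is on the %s side at %s" % (name, side, position)

print(ground(1, (200, 100)))
```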
Grounding is different from Artificial Intelligence. Grounding means taking input and converting it into output. A grounding module takes numerical information and translates it into a textual description. Sometimes it is enriched with additional information which has to do with the overall task. That means a single pick operation is useless on its own, but in a larger workflow it helps to fulfill subgoals.
The picture shows a small Box2D game which was utilized before. The human operator has the obligation to control the plate under the ball and prevent the ball from leaving the system. He can rotate the beam left or right. Controlling this game manually is always possible; then it is called a normal game. The more interesting question is how to control the system with a software program, better known as Artificial Intelligence.
The first step towards this aim is to create linguistic features. The ball can be in the middle, left, or right. The ball's velocity can be standstill, moveslow or movefast. A handcrafted feature set is then recorded in a text file:
position   velocity    angle
=============================
middle     standstill  normal
middle     standstill  right
middle     rightslow   right
middle     rightslow   right
right      rightslow   normal
right      rightslow   left
right      rightslow   normal
right      rightslow   normal
right      rightslow   normal
rightside  rightslow   normal
rightside  standstill  left
rightside  standstill  left
rightside  leftslow    left
right      leftslow    normal
right      leftslow    normal
right      leftslow    normal
right      leftslow    normal
middle     leftslow    normal
middle     leftslow    normal
middle     leftslow    normal
middle     leftslow    normal
middle     leftslow    normal
middle     leftslow    normal
middle     leftslow    normal
left       leftslow    normal
left       leftslow    normal
left       leftslow    normal
left       leftslow    normal
The text file contains all possible game states. The amount is low, because the game state is encoded as categories. What can be used next is the ID3 decision tree learning algorithm. The algorithm goes through the text file and identifies logical patterns. It will recognize under which situations the human operator has to rotate the beam and in which situations not.
Let us take an example: in the recorded gameplay, the ball was on the right side. In such a situation, the human operator has made the decision to rotate the beam in the opposite direction, because he wanted to prevent the ball from falling down. This movement is given in the table: the angle of the beam was set to “left”.
For an easier understanding, we can analyze the table by the beam angle. We have three classes: the beam is normal, the beam is rotated left, and the beam is rotated right. That is the steering movement seen on the screen. Now the question is under which condition each of the beam angles has to be realized. This is explained by the other two features, namely ball position and velocity. From the data it is possible to generate rules which will replicate the control pattern. A possible rule would be:
if ball = rightside and vel=rightslow then rotateleft
The idea is to use machine learning with the ID3 algorithm and the grounded features of a game to create a controller which is able to imitate human actions.
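A minimal ID3-style learner fits in a few dozen lines: pick the attribute with the highest information gain, split on it, and recurse. The dataset below is a small, simplified excerpt of the recorded table, not the full 28 rows; rows are (position, velocity, angle).

```python
# Minimal ID3 sketch for the recorded gameplay: entropy, information
# gain, recursive splitting. Dataset is a simplified excerpt of the table.
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum(c / total * math.log2(c / total)
                for c in Counter(labels).values())

def gain(rows, attr):
    # Information gain of splitting the rows on attribute 'attr'.
    labels = [r[-1] for r in rows]
    parts = {}
    for r in rows:
        parts.setdefault(r[attr], []).append(r[-1])
    rest = sum(len(p) / len(rows) * entropy(p) for p in parts.values())
    return entropy(labels) - rest

def id3(rows, attrs):
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1:
        return labels[0]                        # pure leaf: one beam angle
    if not attrs:
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(rows, a))
    node = {}
    for value in set(r[best] for r in rows):
        subset = [r for r in rows if r[best] == value]
        node[(best, value)] = id3(subset, [a for a in attrs if a != best])
    return node

data = [  # (position, velocity, angle) -- attribute 0, 1, class
    ("middle",    "standstill", "normal"),
    ("middle",    "rightslow",  "right"),
    ("right",     "rightslow",  "normal"),
    ("rightside", "standstill", "left"),
    ("rightside", "rightslow",  "left"),
    ("left",      "leftslow",   "normal"),
]
tree = id3(data, attrs=[0, 1])
```

On this excerpt the tree splits on the ball position first and reproduces the rule from the text: a ball on the right side leads to a left rotation of the beam.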
Regular readers of the trollheaven blog are perhaps familiar with the GUI shown in the screenshot above. It is the same one used in the previous project. Instead of balancing a ball, this time a jumping robot is in the middle. The surprising information is that realizing different kinds of robots has nothing to do with Artificial Intelligence directly, but with the preprocessing steps. Let us go into the details.
Before an AI controller for a robot can be realized, some kind of simulation environment is needed. A typical feature of such environments is that they are manually controlled. That means the system has no autonomous mode; instead the user has to press keys and type in commands for moving the robot. Somebody may think that the simulation environment is not a big thing and be motivated to skip this step, but in reality the environment is more important than the AI controller itself. What we see in the screenshot is a combination of Python/tkinter, pygame, Box2D and a parser for motion primitives. All elements are important; it is not recommended to leave out the tkinter GUI or any other part.
Such an environment can be used to realize different kinds of robots. In my first project a ball-on-beam situation was created; now a jumping robot is visible. It consists of a torso, which is the larger box, and two rotating cubes. On the cubes, legs are mounted with a linear joint. That means it is not a lifelike dog, but a simplified version. The interesting feature is that after entering the jump command, the linear joints stretch fast and this lets the robot jump in the Box2D engine. It looks highly realistic.
From a historical perspective, it is a copy of the famous Marc Raibert jumping robots from the beginning of the MIT Leg Lab. I've seen the old videos and rebuilt the structure in my robot simulator. Let us focus on the Tk GUI at the right of the screen. This form is the core of the simulator. It shows information, for example the current frame and the angle of the legs, but it's also possible to enter text commands interactively. The concept is not new; it was used before in the Karel the Robot project, which was a famous educational tool for teaching structured programming with Pascal and Java. The idea is that the robot has motion primitives which can be activated, and the user has to enter the primitives in the textbox. For example, if the user enters “rotrback”, the right leg will rotate backwards by a small amount. This allows the user to interact with the robot on the screen.
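The command parser behind the textbox can be sketched as a dictionary of motion primitives. Only the "rotrback" name comes from the text; the other command names, the robot state and the step size are invented for illustration.

```python
# Sketch of the motion primitive parser behind the Tk textbox: each word
# names a primitive that changes the robot state by a small amount.
# All names except "rotrback" and the 5-degree step are hypothetical.
robot = {"left_leg": 0.0, "right_leg": 0.0}   # leg angles in degrees

primitives = {
    "rotrback":    ("right_leg", -5.0),   # rotate right leg backwards
    "rotrforward": ("right_leg", +5.0),
    "rotlback":    ("left_leg",  -5.0),
    "rotlforward": ("left_leg",  +5.0),
}

def execute(command):
    # Look the word up and apply the primitive to the robot state.
    joint, delta = primitives[command]
    robot[joint] += delta

execute("rotrback")   # as entered by the user in the textbox
print(robot["right_leg"])
```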
From a programming perspective the overall simulator isn't very complicated. The project has exactly 400 lines of code so far. The GUI, the pygame part, the Box2D part, the parser and the command words are all included in this amount of code lines. What is missing is the automatic mode. That means the AI itself isn't available. Such an AI can be created by putting the words into a program, similar to the “Karel the robot” challenge. For example, the program lets the robot jump, checks in the air if the angles of the legs are right, and then executes another action. Like I mentioned before, such a routine is missing right now, but it can be constructed easily.
What I want to explain is that the robotics simulator is more important than the AI controller which runs in the simulator. If the environment is programmed well, it is easy to build the AI engine on top. Most robotics projects fail because the environment is not working. The funny thing is that a jumping robot is very easy to build, because the jumping is not the result of the controller; the jumping is calculated by the Box2D engine. Let us take a look at how jumping was realized in the robot simulator:
speed = -3
self.setmotor(2, speed)
self.setmotor(3, speed)
time.sleep(.1)
speed = 0
self.setmotor(2, speed)
self.setmotor(3, speed)
In these 7 lines of code, the linear joints of the Box2D engine get a signal for a short time and then they stop. As a result, the Box2D engine calculates that the robot jumps into the air. That means it is calculated by the physics engine and not by a highly developed AI. Entering the jump command in the textbox will activate the routine, and in the background Box2D determines everything else.
In the Box2D vocabulary such a linear joint is called a PrismaticJoint. It can move back and forth in a linear fashion. The spring effect is the result of moving the servo at high speed. In mechanical engineering, such a device is called a “linear actuator with a spring return mechanism”. It produces a unidirectional force and makes every robot jump. That means the jumping is not the result of wishful thinking or motivated by Artificial Intelligence; it is simply a mechanical feature. What can happen during the jump is that the robot loses its balance. That means the robot is not able to land on its legs but comes down the wrong way. This has to be adjusted by an AI controller.
Somebody may suggest that enough tutorials about using the version control system “git” are available. Nope, there is something which is explained wrong in them. A typical git tutorial explains in detail what a branch is. Without any need, this makes things more complicated. In the following text, a simplified git tutorial is given which needs only two commands and works great for 95% of the users.
We are starting the journey to agile development by creating a new project folder. Then we are using command #1, which is “git init”. This will create the hidden folder “.git”, which can be made visible with “ls -lisa”. It will hold the version history and grows quickly together with the project. Now the project folder is ready to take some new files, which are created from scratch. After some example files have been created, we can use git command #2, which is the commit command but includes adding all files to the git tree:
git add --all && git commit -m "comment"
git add --all && git commit -m "comment2"
git add --all && git commit -m "comment3"
From now on, we will need only git command #2, which is the “git add && git commit” statement. The only thing which has to be modified is the comment at the end. The workflow consists of making some changes in the project folder, for example adding some text to a file, and then executing the git command. Other commands are not necessary.
The interesting question is what will happen if the first project version is done and we want to create version 2, which is more stable. In a classical git tutorial this would be explained under the term branch. But it's not a good idea to use branches as directories. The better idea is to ignore the git features and use the normal UNIX tool for creating subfolders, which is “mkdir new-version”. That means all the files are stored in folders and the only available git branch is the master branch.
The advantage is that there is no need for switching branches, merging branches and resolving conflicts. Instead, all the commits are executed with the same command, given before. The resulting version log looks similar to what the Wikipedia community is using. That means all the users write their changes into the same master branch and there is no need to merge anything between branches.
If somebody is working alone on a project this is the best-practice method, and small teams will love the “single branch” idea too. Instead of creating branches, every user gets his own working directory. If somebody is trying out new things, he executes the command “mkdir user2-prototype-may07”. That means git is used like a filesystem, not like a project management tool. The advantage is that apart from two simple commands, “git init” and “git add && git commit”, no other actions are needed.