Tutorial for a research workflow

Here is the C++ random generator, which gets started if no one presses the vote button:

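#include <iostream>
#include <random>

int main() {
  // Seed a Mersenne Twister from the system's entropy source.
  std::mt19937 gen(std::random_device{}());
  // Uniform integer between 0 and 2 (inclusive), one value per poll option.
  std::uniform_int_distribution<int> dist(0, 2);
  std::cout << dist(gen) << std::endl;
}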

Update 2018-10-22

The number of votes right now is one. That is a good starting point, but I will wait a bit; perhaps somebody else wants to press the button too. In the meantime, a new poll has started for the next article popularity contest. That means two polls are running in parallel.

Update 2018-10-27

The topic “Tutorial for a research workflow” has won the poll with two votes. Here comes the article.

The Google Scholar website is the most important source for academic information. But how exactly can the website be used for gaining knowledge about a subject? A research workflow consists of steps which are connected together:

1. identify a paper on Google Scholar
2. make notes in an outline editor
3. write a text in your own words
4. run an experiment
5. publish the paper

The second step on the list is shown in the screenshot. The outliner tool is called LyX. The sections are visible on the left of the screen, and the notes are created on the right. It is very similar to drawing a mindmap, with the exception that everything is text. It is possible to enter a short sentence, to copy a quote from a paper, to enter a reference to a paper and, very importantly, to color important notes in blue.

The advantage of making notes in an outline editor is that it is possible to scroll back in time. Suppose the researcher wants to know what he read 7 days ago. He only has to scroll up a bit to see his notes. If all the information is structured in sections, it becomes even clearer.

Such an outline is a backlog which makes the reading history at Google Scholar visible. It is possible to read a huge number of documents and write down what the papers say. If the step “make notes in an outline editor” is combined with the others (write a text in your own words, publish the paper), a dedicated research workflow becomes apparent. It is a kind of cycle which can be run over and over and which produces a lot of papers, notes and individual learning experiences. Somebody may argue that it is not a big deal to create a note list, especially if it consists only of three-word sentences. But these words are a specialized vocabulary. Knowing the correct terms helps to refine later search queries.

Personalized search history

What is shown in the figure is an individual reading list of information found on the internet. It is a kind of Google search log, not an automatically generated one but a manually created one. The notes somebody takes depend on the subject and on his individual background knowledge. That means person A will make different notes on a paper than person B. The main reason why such a note list has value is that it helps the learner to understand a topic. The memorizing process becomes visible.

But how exactly can a note list be transformed into a paper? It is easier than expected, because the note list isn’t abstract information which is hard to understand; it was written down by the student himself. That means he knows the meaning of his own notes. So it becomes easy to aggregate the notes into full text.

Perhaps some remarks on the LyX outline editor. The tool was originally created for formatting LaTeX documents, but it can also be used as an outliner. The reason is that the program shows a hierarchical topic map of the sections on the left of the screen. This makes it possible to structure a complex topic into small subsections. For example, suppose we want to write something about computers. Then the first step is to create subtopics like software, hardware and networking, and the notes are created within these subtopics. It is the same principle as creating a mindmap, but in a text-only version.

The most important feature of taking notes is that they get stored in a file. Everything that is written down as a note has been explored. That means the student made the note in the past, so the knowledge is already in his brain. If the student scrolls back to the notes from one month ago, he will remember. That means he can create a paper from them. He only has to use the notes from last month and he can start writing, because it is all in his brain.

In contrast, everything that is not in the note list is unexplored for the student. That means he hasn’t researched the topic on Google Scholar and is not familiar with the subject. If he wants to write a paper about such an area, he will probably fail.

Step by step instruction

Suppose there is a new topic which has to be researched in detail. The first step is to create a new chapter in the outliner software. If the topic is bigger, subsections are needed too. Then a first literature search is started with Google Scholar, and the results are written down as notes in the outline tool. A result can be:

– an interesting point
– a literature reference which can be copied by copy&paste
– a quote
– an open question which came up while reading

The next step after making notes is to formulate the full text. The full text summarizes the notes and helps the author to elaborate on things. He will also recognize which points remain unclear. After writing the full text, the next step depends on the individual situation. The text can be published in a weblog, in an academic paper, on a Wikipedia page or in a conference paper. Each publication format has different requirements in style and quality. In a simple blog article, it is not necessary to enrich the writing with literature references. In contrast, they are a must-have for a Wikipedia text, otherwise an opposing admin will argue that the content can be deleted quickly.

Sometimes a literature survey alone is not enough to discuss a complex question, and a dedicated experiment is needed. The cheapest way of doing so is to write a computer program and use existing simulation tools. In robotics, for example, it is a game in which an algorithm is tested, while in mathematics the experiment will be a Python notebook in which a formula is plotted. Additionally, it helps to visualize the content with graphics software.

A single paper costs 0.5 million US$

Global statistics on spending for research and development say that around 1.7 trillion US$ (1.7 × 10^12) is spent every year. According to this table https://www.scimagojr.com/countryrank.php?year=2017 the number of scientific papers produced per year is 3536878. If we set both numbers in relation, a single paper costs about 480650 US$. That is not the article processing charge (APC) the author has to pay; it is the overall cost, which includes the writing of the paper.

The number is stable across different countries. I’ve made separate statistics for research funding in European countries and figured out that 529057 EUR has to be paid for each paper. So in general we can assume that a single paper costs around 0.5 million US$. Is it possible to reduce the costs? Oh yes, it is possible. I would guess there is potential for an improvement by a factor of 10. Reducing the costs further is hard, because writing a paper takes time, and maintaining a high quality standard is expensive. A long time passes until a paper is created from scratch, peer-reviewed and published. The most important goal is to reduce the cost of a single paper from today’s 0.5 million US$ down to smaller values. The result would be that more papers can be written for the same amount of funding.

Academic journals are a permanent call for papers

The technical side of an academic journal can be explained very easily. A journal is a kind of newspaper about scientific topics; it comes in PDF format and consists of text and images. Creating a journal from scratch is possible with open source software like LaTeX and a bibliography manager.

But this explanation ignores the social context in which academic journals are created. Why does someone submit a paper to a journal? Why are papers usually (>99%) created by teams of authors? Why are the prices for journals so high? The answers to all these questions are given in this blog post.

Starting a journal is a social role play. On one side, there are researchers from the field of Artificial Intelligence. They have long experience with algorithms and software and are able to write a paper in English. On the other side of the table sit customers who have a need for a paper. The customer initiates a “call for papers”. That is a description of a problem, together with a payment which shows how important the topic is to him. In today’s scientific landscape, the customers are the government and large companies who are interested in a detailed question. For example, a car company wants to build an autonomous car and so has questions about the vision system.

After the call for papers is written down, the other side (the researchers) can answer the call. They do the experiments, write the paper and deliver it to the customer. He reads the results, decides that they are great and pays the money. The overall workflow is interactive; it is a dialogue between the customer and the research team.

What modern academic journals are doing is a permanent “call for papers”. That means they have an ongoing need for new knowledge. And each paper they publish is paid for, which means money is transferred from the journal to the authors. How much does a single paper cost? A lot. Let us take a look at the well-documented EU FP7 research project. According to the overview https://ec.europa.eu/research/fp7/pdf/fp7-inbrief_en.pdf the total costs are 50 billion euros. According to a different website https://www.openaire.eu/fp7-stats the result of the effort was 244763 publications, which means papers. So a single paper costs about 204279 EUR.

Researchers publish content in journals and work together in larger teams because they are paid by the customer. The customer is a research organization, either the government or a for-profit company, which finances a conference or a journal. Without a customer, research wouldn’t work. That means a single researcher is not writing a paper because it is fun, but because it is his job.

For the untrained eye, the combination of government-funded research and researchers who submit papers looks unusual. It looks as if the game is faked and more money is paid than should be. The main problem in scientific research is to find a customer who is interested in the result. Customers who have a need for a paper are rare. This can be seen in the visit counters of existing papers. A typical document created in that domain has fewer than 10 visits worldwide. That is surprising, because on the one hand technological progress is important for society, and on the other hand nobody seems to need it. To overcome the bottleneck, the government and large corporations have built an artificial need for scientific research. That means the government defines itself as a research customer who is deeply interested in the topic. This produces a stable amount of “calls for papers”. Suppose the government decides to no longer play the customer in the game. What would happen to the EU FP7 research project? Right, it would be canceled. The 50 billion EUR would not be spent on writing papers. And without a “call for papers” there is no demand for the researchers to fulfill; they become obsolete and do nothing. The result would be a collapsing research system.

Over decades, modern societies have built an environment which generates a need for new papers. In the literature this is called research funding, and it means a social role play which produces opportunities for new researchers to engage in the community. After the EU FP7 project was over, the next large-scale project started, called Horizon 2020. The idea is the same. Large-scale customers initiate a “call for papers” with slightly modified topics, and research groups work together to fulfill the needs. After a transaction is completed, it gets paid.

Disadvantages

In its own perception, the current publicly funded research landscape works great. But in reality some problems are visible. The major one is that the research is done within government, universities and companies but is not transparent to the public. That means a large company has a need for a paper, a large university provides the researchers to fulfill the order, and the resulting PDF paper is hidden behind a paywall. Open Science tries to open this process a bit. The first option, called Open Access, is about removing existing paywalls. Open Science goes a step further and asks whether other customers and other researchers can be included in the workflow. Are large companies the only possible customers who have a need for a paper? Or can the process be financed with crowdsourcing and Kickstarter? And are universities the only place in which research can be done, or is a Wikipedia-like crowd also able to fulfill tasks?

From Polymath to group working – a short lesson in modern science

A Polymath is a single researcher from the Renaissance period. He worked outside of groups. For 200 years, Polymath education has been obsolete. That means science is equal to group work. That might be surprising, because the assumption is that researchers are not team players, but the opposite is the case. 100% of all papers on Google Scholar are the result of a team. In the author field at least 3 names are given, sometimes more. Even PhD theses are the result of group work. That was not always the case. If we go back to the Renaissance, group work was the exception.

But why has the situation changed? A common myth is that science has become so complex that a single person is no longer able to handle all that knowledge. But if no single person is familiar with Artificial Intelligence, how should the group invent something important? Right, that is not the real explanation. The correct answer is that group work increases the public appeal. That means if a group wants to publish a paper and requests money, it will be successful. If a single person tries to publish something, he fails. Another advantage of team work is that the group can come to an agreement before going public. That means they discuss the theory internally first, and only if the group is sure do they go live and inform the public. This second layer increases the quality of academic work, because the public can trust the scientific community. It is a sign of professionalism if a dedicated peer review takes place. And only if the peer review was successful does the public get informed.

Peer review makes no sense for a single researcher. By definition, a single researcher is not in a position to peer-review his own work. Without a group, the quality is lower, and this is equal to unscientific work. Team work is mandatory for modern researchers. They don’t get to decide; they are forced to work as team players. If somebody leaves a group and doesn’t find a new one, he is no longer a researcher. He can no longer publish his ideas, and he is no longer allowed to speak for the community. Instead he can call himself a retired researcher.

How important is group working in Academia?

Suppose somebody has written a paper and has formatted the layout. The only question now is how to publish the paper. The answer is: it is not possible. Even if the paper is great, nobody will allow him to publish it. Why?

The answer has to do with how Academia works. The first surprising fact is that the quality of a paper is not important. That means an academic paper which is great won’t be published, while a paper which is boring will get published. The reason has to do with group work. Here is the working hypothesis of how academic publishing works in reality.

If somebody was involved in group work, he gets published, no matter whether he has a formal PhD title or not. But if someone has written a paper alone, he won’t get published, even if he has a formal PhD title. The content of the paper doesn’t matter.

Let us take a deeper look at the papers which get published in the official Google Scholar directory. The common feature is that all of them were written by an author collective: at least two authors, but most papers are written by 3-5 authors. What does that mean? Does group work equal better quality? No, it does not. Many papers on Google Scholar have low quality and are not reproducible. Group work only means that more than one person has contributed content and that the topic was discussed before publication.

The shared feature of all papers on Google Scholar is that at least one team member in the author list is an official professor at a university, and he has connections to the publisher. That is how Academia works. Let us construct some examples from that hypothesis to evaluate whether it is valid or not.

Suppose a single researcher has written a really good paper over the last 10 years. He has invested a lot of energy, and from a content perspective the paper is great. That means it contains valuable information, brings science forward and is written in easy-to-understand language. The funny thing is that this paper will never get published, at least not by an academic publisher. If the person submits the PDF file to Elsevier or PLOS, it will get rejected because the author list contains only a single person. That means the paper was not discussed with anybody, and this is the rejection criterion.

Now let us construct the opposite case. A professor at a small university writes a paper together with 2 assistants. The quality of the paper is bad. That means the team has no idea about the subject. Only the professor has an official title; his two assistants don’t even have a PhD. This paper will get published, because it is the result of group work.

Let us explain the situation the other way around. What professional publishing companies like Elsevier and Springer want to see is that a group of people has written a paper and that this group can convince the publisher to publish it.

At first glance this might look surprising, because a paper is usually judged by its content and not by the number of people who were involved. But this assumption is not formalized anywhere, and in reality it is not how the system works. The publication system does not work on the basis of content; it works on interaction. This prevents Academia from being divided into chunks. If only papers created by groups are allowed, every single person who wants to be part of the system must first convince somebody else to work together with him.

The question is not whether independent researchers are allowed to publish their papers; the question is whether a single researcher is allowed to write down his idea. In the current system the answer is no. I would guess that even a long-term university professor who is well informed about a subject is not allowed to publish a paper alone, that is, without forming a group of at least 3 authors. If he tries to work alone, he is out of the system. Not because his idea is wrong, but because he is not interested in group work.

What exactly is group work? I have no idea. Once this becomes clear, it is possible to publish a paper. From an Artificial Intelligence perspective, group work is a kind of interactive intelligence. That means a piece of work is distributed between different persons. Wikipedia is an example of group work. The result is the product of a community, which is different from the work of a single user. Suppose a workgroup decides to set up a wiki in an intranet, and after a year they try to publish it. According to the hypothesis this is possible, even if the content has low quality and the persons are not involved in higher education.

I would also guess that the famous “peer review” system is not a quality control but a group work control. That means if a paper was peer-reviewed, this shows that the work was not done by a single researcher.

The school of the future

According to older publications, teaching in schools is hard and is based on concepts which can be learned in teacher training courses. No, it’s not. It is very easy to transform a school into a super-school, and it has to do with buying things which are provided by external suppliers.

Let us research companies and products for the educational sector. What kind of goods can a school buy on the free market? First, the school building itself, which is usually located in a central place in a city. After the school building, the gymnasium is important, which consists of a football field and equipment like basketballs and stopwatches for running trials. Inside the school building, a lot of other infrastructure has to be bought, for example electricity, a fast internet connection and PC rooms. Good schools and universities also have specialized lab rooms. They can contain 3D printers, music equipment, a recording studio, a robotics room, a movie theater and so on. The funny thing is that all the equipment can be bought for money. It is nothing that somebody has to learn; it is something that is provided by external companies.

The role of human teachers in school is simple: introduce the equipment to the students. That means a sports teacher should be familiar with the tennis court, a music teacher should be able to handle the MIDI keyboard, while a robotics teacher should be able to program the Nao robot. If the teacher is also able to give formal lessons, the school will be great. That means the students will love their school years, and they will learn a lot.

Whether this concept has a name in the official literature is unknown. I would call it an infrastructure-guided education plan. That means the idea is to organize the school around hard, measurable things which cost money, and the teaching itself will follow once the infrastructure is there. Or to make it clear: if the class is in the music room, every student has a keyboard in front of him, and the teacher is an experienced reggae producer, it is very hard to do anything which isn’t about music.

Some teachers and companies argue that Google Chromebooks will be the future. No, they won’t. Chromebooks are the standard today; the future is something which works better. This is called Robotics and Artificial Intelligence. That means the trend is to use Lego Mindstorms and Nao humanoid robots together with Chromebooks. This helps to improve the education system a lot and makes the students ready for the future. And if a school is already equipped with robots, the next hot thing is 3D printers to print out the self-designed robot, and if the school already has 3D printers, the next big thing is tissue printers to create new teeth for mice in the biology lab …

How to improve education in schools

Some European school teachers are struggling to create an encouraging atmosphere in schools, and the result is frustrated students who are not motivated to learn. But what is the best-practice method in education? The main idea is to establish an infrastructure which allows organizations outside of the school to gain influence in a pedagogically meaningful way. That means a learning atmosphere is nothing that a school can provide from within; it is something the school requests from society.

Let us define some potential stakeholders who provide tools which improve the learning experience. First, there is the company that delivers the building itself, which means bright rooms and nice chairs. A food supplier is also important for a school; this can be a cafeteria supplier or, to be specific, McDonalds.

After the school building is available and the food supply chain works, we can talk about the technical infrastructure. That is usually a computer network, which means a server, WLAN connections, computers and tablets. An internet service provider is also needed for a fast internet connection. Once this technical side works, the next step is to analyze which kinds of educational material help to support learning in schools. The most famous one is perhaps Wikipedia. Students who browse through this encyclopedia to answer questions, or simply because they play the Wikipedia game (how to get from word1 to word2?), are doing a lot to improve their skills. The next step after Wikipedia is the introduction of robotics in the classroom. The cheapest way of doing so is the Lego Mindstorms EV3 kit, which costs around 300 US$ each.

Lego is an external supplier which specializes in providing educational resources to schools. Like the school building itself and the computer pool, it is nothing that the school has by itself, but something which comes from the outside. That means the Mindstorms EV3 kit is provided to the school, and the school pays money to Lego in return. If the school is rich, it will come to the conclusion that Mindstorms is a nice starting point, but the Nao humanoid robot is the better choice for teaching students programming. The Nao robot costs more (around 10000 US$ each) but motivates the students better. They will love the device, especially because many students do not have enough money to buy such a toy for themselves at home. That means they go to school because that is the place where they can meet their friend Nao.

To make all this happen in a teaching environment, teachers are needed. A teacher is a person who is familiar with robotics and also with the social role of teaching the subject to students. It is a good idea if the teachers are better informed about Artificial Intelligence than the students. This helps to build a trustworthy situation in which students can ask for help if they have an issue. Especially in the domain of robotics they will have a lot of issues, because software that doesn’t work and a broken network connection are the normal case. What good teachers and good students do is fix open problems all the time, and this is how they become familiar with a subject.

What I want to explain is that a positive learning atmosphere is something which has to be bought from outside the school. It is delivered by tech companies and projects on the internet. A good school has to select between these deals and pay for the services. Or to explain it the other way around: if the aim is to establish a non-working school, then the school secretary has to cancel the contract with the internet service provider, cancel the contract with the food supplier, cancel the contract with Apple, cancel the contract with Lego/Aldebaran Robotics/Wikimedia, and as a result the school will become the worst teaching environment ever, with little discipline and frustrated students who hate the institution.