A milestone in the history of the git version-control tool is Microsoft's tech talk about adopting git inside the company: “Git Is Taking Over Microsoft, Saeed Noursalehi – Git Merge 2016”, https://www.youtube.com/watch?v=rKgBV4yfK3g. It is probably more than coincidence that Microsoft adopted git at the same time it built the Linux subsystem into Windows, so that git acts as a Trojan horse through which open-source software takes over the Redmond company.
But I want to direct the focus to one detail of the git migration. In the talk cited above, Microsoft explains what the concrete problems are. A team has around 500 software repositories, all managed with git, and a new developer in the company has to find out which of them matter for a given task. This is a real-life problem and not specific to Microsoft. The bad news is that git itself has no answer to it, because git only provides version-control commands like “git push” or “git commit”. Technically, the developer can use git to run a full-text search in each of the 500 repositories, but this does not make the job any easier.
The problem is one of knowledge management. The knowledge is stored inside the source-code repositories, and the individual developer needs access to it. Software alone cannot solve this task; the better way is to structure the knowledge in classical papers. In other words, Microsoft's problem is a lack of documentation. There are too few PDF papers in which the developers explain what their code is doing. What Microsoft needs is an internal PDF repository.
The good news is that git can help to create such a storage. Git can track not only C# files, it can also be used to commit changes to LaTeX files. The best practice is to document the source code and the projects the engineers are working on in natural language, store the result in PDF format on a server, and give the developers a full-text search engine for retrieving the information.
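The retrieval step can be illustrated with a small inverted index. This is only a minimal sketch, not a description of what Microsoft uses: it assumes the text of each paper has already been extracted from the LaTeX or PDF file, and the function names and file names are made up for the example.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document names containing it.

    `documents` is a dict of {name: plain text}. Extracting the text from
    LaTeX or PDF files is assumed to have happened before this step.
    """
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the sorted names of documents that contain every query word."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


# Hypothetical papers, for illustration only.
papers = {
    "build.tex": "How the build pipeline signs packages",
    "auth.tex": "Token signing in the auth service",
    "notes.tex": "Build notes for the auth service",
}
index = build_index(papers)
print(search(index, "auth service"))
```

A query returns only the papers that contain all of its words, which is the simplest useful behaviour for a developer who has to narrow down hundreds of documents.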
Some programmers believe that access to the source code is enough for others to understand it and learn from it. In the case of Microsoft, this means giving the developers read/write access to the Microsoft source code. But in the end, it will not work. Suppose a single developer can search in 500 internal Microsoft repositories: will he ever find the source code he needs, or a helpful comment? Perhaps not. The situation is the same for someone searching the repositories on GitHub. At first, it is better than nothing, but source code alone does not convey knowledge about software programming. The better way of communicating with each other is PDF papers, written in English. They describe how a piece of software works, and figures are used to make the point clear. If a project or a company has a problem with information management, in most cases the reason is a lack of PDF papers that describe the internal workflow. Or to be more specific: the costs of creating new PDF papers are too high. Reasons can be:
• culture in the company
• no read/write access to the pdf-repository
• lack of understanding of how to write a paper
• no full text search engine available for retrieving existing information
The idea is to use the version-control system git to reduce the costs of creating new papers. Writing a paper that documents the source code should be as normal as writing the code itself. And it makes even more sense to write a paper in teams, because this helps to collect information that was previously distributed over many people. Once the paper is written, the information is stored in the PDF file and can be read by new developers.
Using git for managing LaTeX documents is not very common right now. In most cases, git is used only for programming projects. The main advantage is the same as in software projects: the activities of the users can be tracked. If someone uses git for writing his papers, the “.git” folder can be analyzed by statistics programs. That means it is possible to say how many words someone has written in a week and how many commits he has made to a LaTeX file. Writing a paper without git is technically possible, but then the writing process can’t be analyzed. Instead, the PDF paper is simply finished one day, and no one knows over which time period it was created.
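How such an analysis could look is sketched below. The snippet counts commits per ISO calendar week from a list of commit dates. It is a minimal illustration, not the way gitstats works internally; the helper names and the example file name are assumptions, and the second function needs git to be installed and a real repository to run against.

```python
import subprocess
from collections import Counter
from datetime import date


def commits_per_week(iso_dates):
    """Count commits per ISO calendar week from 'YYYY-MM-DD' strings."""
    counts = Counter()
    for text in iso_dates:
        text = text.strip()
        if not text:
            continue  # skip empty lines in the git output
        year, week, _ = date.fromisoformat(text).isocalendar()
        counts[f"{year}-W{week:02d}"] += 1
    return dict(counts)


def read_commit_dates(repo_path, tracked_file=None):
    """Read the author date of every commit, optionally for one file only.

    Runs `git log --date=short --pretty=format:%ad`, which prints one
    date per commit.
    """
    cmd = ["git", "-C", repo_path, "log", "--date=short", "--pretty=format:%ad"]
    if tracked_file:
        cmd += ["--", tracked_file]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


# Hand-written sample dates, so the example runs without a repository.
sample_dates = ["2017-03-06", "2017-03-07", "2017-03-13"]
print(commits_per_week(sample_dates))
```

Against a real project the call would be something like commits_per_week(read_commit_dates(".", "paper.tex")), where "paper.tex" stands for whatever file the author is tracking.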
Picture 1 shows the result of gitstats. The number of added lines is very high because the tracked project was a LyX document.
In Picture 2 the statistics for each file type are given. 80% of the edits were made in the .lyx text file and 20% in the bibliography (.bib file). The interesting aspect of using git for tracking scientific writing is that we get detailed statistics for every week. In theory, it is possible to say how many commits were made on any single day of a year, in order to measure the individual productivity of an author. I would guess this is the most interesting potential of git, and it is not used very often today.
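A per-file-type breakdown like this can also be computed by hand from the output of `git log --numstat --pretty=format:`, which prints one tab-separated line per changed file with the added lines, the deleted lines and the path. The following is a rough sketch with an invented function name and invented sample numbers; it ignores renames, which git prints in a special "old => new" form, and it is not how gitstats itself computes its tables.

```python
import os
from collections import Counter


def line_share_by_extension(numstat_text):
    """Return the percentage of changed lines per file extension.

    Expects lines of the form "<added>\t<deleted>\t<path>".
    """
    changed = Counter()
    for line in numstat_text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # blank separator lines between commits
        added, deleted, path = parts
        if not (added.isdigit() and deleted.isdigit()):
            continue  # binary files are reported as "-"
        ext = os.path.splitext(path)[1] or "(none)"
        changed[ext] += int(added) + int(deleted)
    total = sum(changed.values())
    if total == 0:
        return {}
    return {ext: 100.0 * n / total for ext, n in changed.items()}


# Invented sample in numstat format, for illustration only.
sample = "60\t20\tpaper.lyx\n15\t5\trefs.bib\n-\t-\tfigure.png\n"
print(line_share_by_extension(sample))
```

Counting added plus deleted lines is only one possible definition of an "edit"; counting commits that touch each file type would be an equally valid choice and can give different percentages.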
The disadvantages of using git for source-code management and LaTeX projects have to do with the possible effects on the human workforce. Traditionally, only blue-collar workers are tracked with a stopwatch to measure their productivity. In contrast, white-collar workers, a group that includes programmers and scientific writers, remain untracked. The typical programmer today is not carefully measured, and the same is true for the typical PDF author. Both work under special conditions which give them the freedom not to be tracked. And this is the reason why git is so powerful: it can change the situation. With git, white-collar workers (and especially highly skilled PhD students) can also be tracked on an hourly basis. If someone commits his progress to a git repository every 10 minutes, the boss can analyze the statistics in detail and gets clear information about what the worker is doing all day long. This explains why git is not used very much in LaTeX projects today: most authors are trying to prevent such transparency.