Electronic mail: the hidden giant

In the age of Web 2.0, Google and Facebook, good old e-mail is often called outdated. Measured by the number of messages sent, however, e-mail is still one of the most heavily used services on the Internet. But what exactly is the difference between e-mail and the World Wide Web? The World Wide Web, which is often used as a synonym for the Internet although it is only one service running on top of it, is a kind of online community. That means that anybody can read a website. Once content has been published under a URL, the document can be accessed from all over the world. The communication is called one-to-all: a single person writes a blog post and 7 billion people can read it. This is equal to what in the past was called the Gutenberg galaxy or mass communication. The information is created for an anonymous audience.

E-mail works differently. By default, the information is sent to a single person. An e-mail received by a user cannot be read by other people, unless the user forwards it or posts it on his blog. E-mail is a one-to-one medium. It is comparable to the postal service: even though many millions of letters are sent every day, the public is not informed about their content.

Let us make an example to explain the difference between e-mail and the WWW. Suppose someone enters the URL of Wikipedia in his browser. He will see the start page. This website looks the same for every visitor. That means user1 sees the same Wikipedia as user2. Information stored on Wikipedia is shared content; it is a kind of blackboard for a worldwide audience. But what happens if user1 checks his e-mail account? Who else can see his messages? Answer: nobody else. The mailbox is protected by a login and is not accessible to the public. E-mail is a protected communication stream between sender and receiver. It is not shared with the world.

At first glance, protected and secure communication seems to be more important. But in reality, it is the other way around. In most cases e-mails contain information with a short lifespan, for example an invoice or a status update. They are read by the receiver and deleted after a while. Even large companies which receive a lot of e-mails will delete them after 10 years or so, because the information in them is no longer important. Public information on the Internet, on the other hand, for example blog posts, YouTube videos and academic content, has a higher value. It is interesting not only for a small audience but for everybody. Usually it will not be deleted; academic papers in particular get archived for the long term.

WWW vs. E-Mail

The World Wide Web supports static language. Even if a modern website is generated with the PHP scripting language, it is always a kind of archive. The user uploads information to the Internet and then other people can read it. In contrast, e-mail supports workflow-based communication. That means the message is sent to a specific person. The purpose of e-mail is to coordinate group decisions, while the WWW contains declarative knowledge in which no decisions are made. A typical e-mail example is a customer request to a company, while a typical WWW example is a news article.

E-mail is surrounded by a kind of mystery. Even though it is one of the most dominant communication media on the Internet, little research is done on it. Apart from the Enron corpus, hardly any large e-mail archive has been made public. That means it is largely unknown how e-mail is used in reality. Sure, everybody has sent e-mails, but linguists struggle to describe the phenomenon. Only speech on the public Internet, that means communication with an anonymous audience, is well researched. The reason is that blogs and websites are accessible by default. The so-called social networks (Twitter, Facebook) have a mixed status between e-mail and the WWW. A Facebook message is semi-public. That means it is not completely hidden like an e-mail, but it is not part of the normal WWW either.


Modelling paraphrasing with natural grounding

Communication which supports group work is usually discussed with real-life situations in mind, mainly because humans are the only species familiar with natural language. The problem is that such a setting is very complex, and it is hard to focus on speech acts which have the purpose of framing, that means of modifying the discussion itself. First we need a simpler model which is language based but can be simulated on a computer. A good starting point is natural language processing, especially the kind which is grounded in computer games.

Let us make an example. We have a text adventure in which the player can move his character with natural commands: left, right, up, down, takeobject and putobject. He can now form a sequence of actions by using the words in the right order, for example: 1. up, 2. up, 3. takeobject. The advantage over a real-life situation, which contains natural language too, is that we can predict the system behavior. It is possible to send random actions against the parser and observe what will happen. This option only exists in a simulation; it makes no sense to try this out in a real-life situation.
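To make the idea more concrete, here is a minimal sketch of such a parser in Python. The command names are taken from the text above; the grid size, the state layout and the rule that moves off the grid are ignored are my own assumptions for illustration.

```python
import random

# Command words follow the text; the grid rules below are assumptions.
MOVES = {"left": (-1, 0), "right": (1, 0), "up": (0, -1), "down": (0, 1)}
COMMANDS = list(MOVES) + ["takeobject", "putobject"]


def step(state, command, size=5):
    """Apply one command and return the new state.

    A state is (x, y, carrying, object_position); object_position is None
    while the player carries the object.
    """
    x, y, carrying, obj = state
    if command in MOVES:
        dx, dy = MOVES[command]
        nx, ny = x + dx, y + dy
        # Moves that would leave the grid are ignored.
        if 0 <= nx < size and 0 <= ny < size:
            x, y = nx, ny
    elif command == "takeobject":
        if not carrying and obj == (x, y):
            carrying, obj = True, None
    elif command == "putobject":
        if carrying:
            carrying, obj = False, (x, y)
    else:
        raise ValueError(f"unknown command: {command}")
    return (x, y, carrying, obj)


def run(state, commands, size=5):
    """Execute a whole sequence of commands."""
    for command in commands:
        state = step(state, command, size)
    return state


start = (2, 4, False, (2, 2))
print(run(start, ["up", "up", "takeobject"]))

# Send random actions against the parser and observe what happens.
rng = random.Random(0)
random_plan = [rng.choice(COMMANDS) for _ in range(20)]
print(run(start, random_plan))
```

Because the whole world is a small tuple, any sequence of words can be replayed as often as needed, which is exactly what is not possible in a real-life situation.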

First I want to describe a normal goal-oriented speech act with grounded language. The agent is in a maze and has to fulfill a task. He has to go to the middle, take the object and bring it to the goal. The problem can be solved with the correct sequence of action primitives. If we modify the task and put the object not in the middle of the maze but in a corner, the correct action sequence is different.

What I have not explained until now is that this task is a dedicated single-user task. That means the agent gets a task, has to figure out the correct plan and executes it. It is not necessary for the agent to coordinate his behavior in a group. This assumption simplifies the planning, but it is not very realistic. In most cases the agent is not alone on an island like Robinson Crusoe, but part of a larger group, and he has to coordinate his decisions with others. Can we model group work with grounded language too?

First it is important to define group work. Group work means that the agents communicate over a proxy, which is called the group moderator. He has the obligation to distort language, which helps the agents to cope with contradictory situations. A contradictory situation can occur if it is unclear what the location of an agent is. If the agent doesn’t know his own position, he is not able to plan his actions. That means the starting point and the goal are slightly noisy, and the task of the group moderator is to communicate with the agents to help them make better decisions.

The speech process works as a dialog. First an agent speaks, then the moderator. Then the next agent speaks, then the moderator, and so forth. Agent1 makes a plan, which is the best he can do given his skills and his knowledge about the environment. He assumes that the plan “up, up, pickup, up, up” will solve one of the tasks. The moderator looks at the plan and interprets it. He can say: “Under the assumption that you are at position A the plan will work. But perhaps you are at position B. Think about it.”

It is very hard to describe exactly how such a moderated speech act has to be generated. In linguistic theory there is a description available called paraphrasing. The idea is to translate a sentence into a slightly different sentence. Paraphrasing works according to the “Chinese whispers” principle, that means it is speech which forms a kind of workflow:

agent1: "up, up, pickup, up, up"
moderator: "He said: up, left, pickup, up, up"
agent2: assumes a certain position of agent1

This is only a reduced example of what paraphrasing means. The main idea is that during the transition from the agent to the moderator a small mistake is injected into the sentence. On the technical level the moderator has understood perfectly well what the agent said to him, because the cable which connects them works with 100% accuracy. But he misinterprets him because this is his job. But what exactly is the algorithm which makes the moderator change the second “up” into a “left” command?

In discussions between humans or in political campaigns this is very difficult to answer, because real language is much more complex. In our simplified model it is possible to find an answer. Our model has a limited number of possible actions and the source code of all the agents is known. The misinterpretation (or we can call it moderation) by the moderator has to do with his role in the group. The moderator knows that he takes the input from agent1 and produces the input for agent2. That means, if he wants to control the position of agent2 he must feed him with a certain speech act.
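The relay step can be sketched in a few lines of Python. This is only an illustration under my own assumptions: agent2 is modelled as simply replaying the relayed plan from a known start cell, “pickup” is treated as a word that does not move the agent, and the function names are hypothetical.

```python
MOVES = {"left": (-1, 0), "right": (1, 0), "up": (0, -1), "down": (0, 1)}


def infer_position(start, plan):
    """Assumed model of agent2: replay the relayed plan from a known start."""
    x, y = start
    for word in plan:
        dx, dy = MOVES.get(word, (0, 0))  # "pickup" does not move the agent
        x, y = x + dx, y + dy
    return (x, y)


def relay(plan, index=None, replacement=None):
    """Moderator: pass the plan through, or swap one word before forwarding."""
    relayed = list(plan)
    if index is not None:
        relayed[index] = replacement
    return relayed


plan = ["up", "up", "pickup", "up", "up"]
start = (2, 4)
print(infer_position(start, relay(plan)))             # unmodified relay
print(infer_position(start, relay(plan, 1, "left")))  # second "up" becomes "left"
```

The two calls differ only in the relayed sentence, yet agent2 ends up with a different belief about where agent1 is. Which word to swap is the moderator's decision; the sketch does not answer that yet.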

To understand speech manipulation better we must make clear what the purpose of moderation is. Let us assume that framing and moderation are always bad, because it is not allowed to modify speech. In this situation the moderator is not allowed to do any kind of distortion; instead he must repeat the input without any modification:

agent1: "up, up, pickup, up, up"
moderator: "He said: up, up, pickup, up, up"
agent2: assumes a certain position of agent1

This time, no moderation is visible. That means the proxy moderator gets an input and transfers it to agent2. He is out of the game. The result is that all agents are working as individuals. The moderator is like a cable which transmits messages, and the agents can be sure that the cable works. It is equal to an unmoderated group. One agent posts something on the blackboard, and a second agent finds the unmodified message. The decision making is done by individuals. That means the group has no moderator; every agent is his own moderator.

Goal-oriented group discussion

The example described above, with grounded language in a simulation, is great for explaining how goal-oriented moderation works. The most important part of the group is the moderator. His obligation is to slightly modify language with the aim that the group reaches a certain goal. For example, the moderator should bring his group into the right corner of the maze. He cannot give a direct order to the agents; instead he must modify the packets which are sent as speech acts in the group. The moderator has a model of agent1 and agent2. The question is: what kind of dialog is necessary so that both agents go to the right corner?

Finding an answer to the problem is surprisingly simple. The moderator must first simulate possible workflows, and then he selects one of the planned workflows to learn how to distort incoming speech. Let us go into the details of what a workflow simulator is.

A workflow is a dialog between the group members: agent1 sends a message to the moderator. The moderator sends a message to agent2. Agent2 sends a message to the moderator, the moderator sends a message to agent1, and so on. After a while, one hundred messages have been sent, and the agents have executed some actions along the way. The overall group is in a new situation.

Now we can look at what happens in workflow 2. This time, the moderator changes his behavior. He adjusts his role and modifies incoming messages according to a certain principle. The result after 100 sent messages is that the group is in a different situation than after workflow 1.

Under the assumption that the moderator can predict the behavior of his agents, he can plan which kind of speech moderation is needed to bring the group into a certain state. This is equal to work delegation, because the moderator is not able to reach the goal alone. He needs his agents to fulfill a task. That means he is physically weak and his only choice is to communicate with the group in the right way. Work delegation is equal to distorting language by moderation.
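The simulate-then-select idea can be written down as a small Python sketch. Everything concrete in it is an assumption of mine: the agents blindly execute what they hear, a moderation policy is a single word substitution, the “right corner” is the top-right cell of a 5x5 grid, and one workflow is a single round of messages instead of one hundred.

```python
from itertools import product

MOVES = {"left": (-1, 0), "right": (1, 0), "up": (0, -1), "down": (0, 1)}
SIZE = 5
GOAL = (SIZE - 1, 0)  # assumed "right corner": the top-right cell


def execute(position, plan):
    """Assumed agent model: follow every word heard, staying inside the grid."""
    x, y = position
    for word in plan:
        dx, dy = MOVES[word]
        x = min(max(x + dx, 0), SIZE - 1)
        y = min(max(y + dy, 0), SIZE - 1)
    return (x, y)


def simulate(policy, positions, plans):
    """One workflow: each agent's plan is rewritten by the moderator
    and then executed by the other agent."""
    a1, a2 = positions
    p1, p2 = plans
    relayed_to_2 = [policy.get(w, w) for w in p1]
    relayed_to_1 = [policy.get(w, w) for w in p2]
    return execute(a1, relayed_to_1), execute(a2, relayed_to_2)


def distance_to_goal(positions):
    """Sum of Manhattan distances of all agents to the goal cell."""
    return sum(abs(x - GOAL[0]) + abs(y - GOAL[1]) for x, y in positions)


def choose_policy(positions, plans):
    """Simulate every single-word substitution (plus no change), keep the best."""
    candidates = [{}] + [{a: b} for a, b in product(MOVES, MOVES) if a != b]
    return min(candidates,
               key=lambda p: distance_to_goal(simulate(p, positions, plans)))


positions = ((0, 4), (1, 4))
plans = (["up", "up", "left"], ["up", "left", "left"])
best = choose_policy(positions, plans)
print(best, simulate(best, positions, plans))
```

The moderator never moves an agent himself. He only compares the outcomes of the simulated workflows and then applies the substitution which brings the group closest to the goal.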

How to frame a discussion

In management theory and in political debate it is often explained what framing means. It is a kind of manipulation strategy to move a message forward. That means the person who frames is not part of the normal debate, but uses a language which is able to change the discussion itself. But how exactly does framing work? This is explained in the following post.

The possibility to manipulate groups has to do with the fact that workgroups are everywhere. A company consists of groups, and so does a hospital; a school class and the voters of a party are teams too. The term group means that all the individuals in it are walking in the same direction. All voters of party A vote for that party, and all employees of Enron deal with energy, not with software development. The term group manipulation is a bit misleading, because it assumes that a group can work differently from its purpose, which is that all members share the same idea. No, that is not the principle. If not all employees of Enron are thinking about how to earn money with energy, Enron is no longer a group. That means a group either exists or it does not.

Let us explain how group communication works. It is a workflow which runs through a group. The workflow consists of individual communication between people. The people form a hierarchical “Chinese whispers” decision chain, and at the end all of them have the same opinion. The question is: how can the communication workflow in a group be modified? If this is possible, it is called framing.

The workflow is not driven randomly. A hierarchical structure is visible in every group. That means every company has a CEO and every school has a principal. In a modern social media dialogue the leader of a group is called the influencer. He has the power to interact with the group. Not in the sense that the leader delegates tasks, but in the sense that the leader has a role in the group. He has the obligation to moderate the decision making in the group, and this also includes moderating the choice of a new influencer if the old one becomes obsolete. This process is chaotic and is often a topic in the evening news, when scandals and internal conflicts become visible.

The surprising information is that it is not possible to frame a group into a certain direction. Even the leader can’t do this, because modern groups are too large, too complicated and too closely connected with other groups. Even if the CEO of Enron tells his employees what they need to know, they watch the evening news too, and if the news shows them another reality, they are warned. Humans have a kind of natural nonsense parser built in, which prevents them from believing everything. No, the only way to frame a group is to raise the amount of communication in the workflow.

Let us go into the details of how to do this. Suppose the Enron company has 10k e-mails per day going back and forth in its intranet. If we want to frame the group, it is necessary to double that number or more. That means, for example, that 30k e-mails per day is the new standard. What the result of the increased traffic will be is unclear. The only sure thing is that after the framing took place, the group will act differently.

In general, the amount of traffic measured in packets and the efficiency of a group go hand in hand. More communication is better than less communication. I doubt that it is possible to frame a group while leaving the amount of communication the same.

Framing on the topic level

What is the difference between a speech act within an existing structure and a speech act which is able to modify the group structure? Linguists call the phenomenon a language code. The first one is called monolog and the second one dialog. Let us go into the details.

Suppose someone has to write an essay about robotics. He will write down what the servo motor is doing, what a microcontroller is, and that he has written a program in C which lets the robot follow a line. This speech act is a legitimate contribution to the domain of robotics. It will be recognized by the group as valid.

Somebody may think that a paper in which the topic of robotics is described badly is the opposite of that, but even a bad paper is a valid contribution. It can be discussed on the topic level too. For example, the spelling can be wrong or the C program can have bugs. These contributions were also created with the topic in mind.

Now I want to describe how a dialog works which frames the discussion itself. This time, the speech is different. Not the language itself, which is still English, but there is a switch from the elaborated language code which is used to describe knowledge to a workflow-oriented language which is used for process needs. If framing a discussion is the main goal, then a workflow-oriented language is stronger. But what exactly is “workflow language”?

Workflow language is a space in which the group coordinates itself. It consists of people and dates. Workflow language is spoken on the mailing list of the robotics group. It is not about the difference between 8-bit and 16-bit microcontrollers but about whether Timmy or Andrew is the better programmer.

Workflow-oriented language has a small audience. An e-mail written by the boss of a company is not for everyone but only for a small number of people. Not because the e-mail is full of internal company secrets, but because the boss tries to moderate something. Let us make an example. In Wikipedia it is written that Enron was involved in a scandal. The language code in Wikipedia is formal, that means it is written for the public with the aim of giving an abstract, emotionless description of reality. Somebody may think that the boss of Enron is not able to frame the Wikipedia article. But he can. He can write the following e-mail to his employees.

“All of you have probably read the Wikipedia article in which Enron was presented as a company under fire. They have written that Enron is close to bankruptcy and will probably lose its property. End of e-mail.”

Let us read the message again. It is a short e-mail, but where is the framing? Or let me ask differently: is this e-mail trying to frame something, which means to steer the Enron employees into a certain direction? Yes, it is doing so. How exactly this takes place is hidden between the lines. The difference is that such an e-mail is written in a different language. It is called a workflow language because it has to do with decision making in groups.

I want to describe the phenomenon on a more abstract level. The language code in the public Internet (which means what is indexed by Google) is topic-oriented language. If someone wants to know whether an 8-bit or a 16-bit microcontroller is the better choice, he can enter these keywords and browse through the result list. Such a language cannot frame a group discussion; it is language owned by nobody, and it forms the so-called Gutenberg galaxy. It is the shared knowledge of the world.

In contrast, the language used in the intranet of a company is workflow language. Sometimes the public has full-text access to it, sometimes not. The difference is that in personal communication no topic-level meaning is visible. That means it makes little sense to use a search engine for finding information in the internal e-mail communication of a company. Sure, the grep tool (or any search engine) will find some e-mails, but they are not written in the same language as documents in the Gutenberg galaxy. The reason is that the intranet of a company is used for workflow coordination. That means the employees form not an online community but a social network. And their main topics are feelings, persons they know, and tasks they have to organize.

To make the difference more explicit let us describe the inner working of a web crawler. Web crawlers became widely known through early search engines such as AltaVista. In its simplest form a crawler is a script, for example a Perl script on a UNIX workstation, which retrieves the index.html file of a web server and follows all the links. The content is stored in a full-text database so that it can be searched. The Google search engine works on the same principle but is more powerful than the AltaVista tool.
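A minimal sketch of this principle in Python (instead of Perl) could look as follows. It uses only the standard library; the in-memory `PAGES` dictionary and the example.org URLs are invented so that the sketch runs without network access, and the plain dictionary is only a stand-in for a real full-text database.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the href targets of all <a> tags and the visible text."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text.append(data)


def crawl(start_url, fetch, limit=100):
    """Breadth-first crawl; returns {url: text} as a toy full-text store."""
    index, queue, seen = {}, deque([start_url]), {start_url}
    while queue and len(index) < limit:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page not available
            continue
        parser = LinkParser()
        parser.feed(html)
        index[url] = " ".join(" ".join(parser.text).split())
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return index


# Hypothetical in-memory "web" so the sketch runs offline.
PAGES = {
    "http://example.org/index.html": '<a href="robots.html">Robots</a> start page',
    "http://example.org/robots.html": 'line follower in C <a href="index.html">home</a>',
}
index = crawl("http://example.org/index.html", PAGES.get)
print([url for url, text in index.items() if "line follower" in text])
```

For a real crawl the `fetch` argument would be replaced by a function that downloads the page, for example one built on `urllib.request.urlopen`.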

Web crawlers are a useful tool for indexing texts in an abstract language. It makes sense to let robots search for PDF files, HTML files, plain text files and blogs. All the content is copied into the database and can be searched. Google does a great job and provides access to the knowledge which is free for everybody. Let us now describe what will happen if we use a web crawler to index e-mails. On a technical level it works. The crawler creates an e-mail archive, which is simply a .zip file of 100 GB in size. And now what? Sometimes the situation is discussed from the perspective of data protection, for example that it is not allowed to create e-mail archives. But let us assume a case in which the information is available, for example the Enron corpus, or a case in which the system administrator has access to the e-mails. What can we do with the e-mails? The surprising answer is: nothing.

Let us describe what exactly the problem is. On a technical level, workflow communication consists of English words which can be searched in full text. The problem is that the search will not find useful information. A search engine doesn’t help to understand the linguistic code. Even though the Enron corpus was published openly, and even though it has been described in many papers, nobody has deciphered the information until now. Not because the e-mails are encrypted; they are not. The Enron archive is distributed as a simple “.tar” file. The problem is that a full-text search engine is the wrong tool for analyzing workflow language. The reason why a group communicates has nothing to do with topics, but with something else. Unfortunately I must admit that I have no alternative to offer which helps to understand workflow language more easily. I can only say that a normal “grep-like” program will fail; the result list is useless. The hypothesis is that the information has to be structured not by topics but by people. That means, if person A often cites person B, this carries meaning. And if person C often refers to the meeting room, this is also valuable knowledge.
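What “structured by people” could mean on the simplest level is shown in the following Python sketch. It is not a solution to the problem, only an illustration of the hypothesis: instead of searching for words, it counts who writes to whom. The three messages and their addresses are invented; only the standard `email` package is used.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses

# Hypothetical messages; the addresses are invented for illustration.
RAW_MAILS = [
    "From: a@example.com\nTo: b@example.com\nSubject: meeting\n\nSee you in the meeting room.",
    "From: a@example.com\nTo: b@example.com, c@example.com\nSubject: re: meeting\n\nMoved to 3 pm.",
    "From: c@example.com\nTo: a@example.com\nSubject: budget\n\nPlease check.",
]


def contact_counts(raw_mails):
    """Count how often each sender addresses each recipient."""
    counts = Counter()
    for raw in raw_mails:
        msg = message_from_string(raw)
        senders = [addr for _, addr in getaddresses(msg.get_all("From", []))]
        recipients = [
            addr
            for _, addr in getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
        ]
        for sender in senders:
            for recipient in recipients:
                counts[(sender, recipient)] += 1
    return counts


for (sender, recipient), n in contact_counts(RAW_MAILS).most_common():
    print(sender, "->", recipient, n)
```

The result is a table of relations between people, not a list of documents, which is a different kind of output than a grep-like search produces.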

Emergency management in “The Sims”

In the business literature the topic “workflow management” is discussed on a theoretical basis. Sometimes psychology-like theories are used to explain the behavior of individuals. That is a good starting point for focusing on the problem of how to organize a group, but it is possible to increase the understanding with more modern tools. In the following blog post I want to explain how to use a computer game as a training simulation for emergency management.

Instead of programming the game from scratch, it makes sense to use COTS software, for example “The Sims”. After installing the game on the hard drive we need an emergency scenario, preferably with lots of emotions in it. No, I’m not talking about an escaped cat but about an ambulance and a flood crisis. The game starts with a problem; the people are running around and arguing about what they can do. Now it is important to manage the situation. This is only possible with group work. The best tool for organizing a group is to track them. “Them” means the group in the game. If the simulation has only 10 participants that is fine, because the maximum number of rows in MS Excel is limited. The first column is the name of a person, and the second column is called activity. In the activity column all important events are written down by date. That means, if the manager talks to person B, the result is written down in the Excel sheet. After a while and lots of meaningful interactions the group structure becomes visible in the Excel sheet. It allows the manager to organize the group much better. He knows not only who is a friend of whom, but also which person has said what to which other person in the past.
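The same group tracker can also be kept as a plain CSV file, which Excel can open too. The following Python sketch shows the idea; the column names, the dates and the log entries are invented for illustration.

```python
import csv
import io
from collections import defaultdict

# One row per observed event: date, name, activity. The rows are invented.
LOG = """date,name,activity
2019-03-01,Person A,argues about the evacuation route
2019-03-01,Person B,told by manager to call the ambulance
2019-03-02,Person A,reports that the street is flooded
"""


def load_tracker(csv_text):
    """Group the activity log by person, keeping the rows in file order."""
    tracker = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        tracker[row["name"]].append((row["date"], row["activity"]))
    return dict(tracker)


tracker = load_tracker(LOG)
for date, activity in tracker["Person A"]:
    print(date, activity)
```

Looking up a name returns the diary of that person, which is the information the manager needs before he talks to someone.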

On a formal level such a group tracker is only an Excel file. It is about 100 kB in size and is a kind of group diary. But it is very useful because it allows the manager (and everybody else who has read access) to become familiar with the group. Somebody may ask what a name directory has to do with crisis simulation. Isn’t an ambulance game about fire departments and extinguishers? Not exactly. In single-player mode perhaps, but in group work the task has to be solved via communication only. That means the problem is not how to splash water on the flames; the question is how to explain to the group what they should do. As I mentioned above, the group can be called “them”, because it consists of individuals who interact or not, who have desires or not, and the group has to be tracked.

The complexity of such a list will rise if not one manager is responsible but more than one. That means manager 1 creates a list of people in the game, and manager 2 also creates a list. The names are the same, but the comments are different. Now the problem is how to merge these lists into a global directory, so that under one name the entries from both manager 1 and manager 2 are visible. Realizing this with Excel is a bit complicated, but with manual copy and paste it is possible to merge two spreadsheets into one.
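Outside of Excel the merge is a short loop. The sketch below assumes that each manager's list is available as a dictionary from a name to a list of (manager, comment) entries; this layout, the function name and the sample comments are my own invention.

```python
def merge_trackers(*trackers):
    """Merge {name: [(manager, comment), ...]} dictionaries into one directory."""
    merged = {}
    for tracker in trackers:
        for name, entries in tracker.items():
            # Entries for the same name are collected under one key.
            merged.setdefault(name, []).extend(entries)
    return merged


manager1 = {
    "Person A": [("manager 1", "calm, knows the area")],
    "Person B": [("manager 1", "wants clear instructions")],
}
manager2 = {"Person A": [("manager 2", "argued with Person B")]}

directory = merge_trackers(manager1, manager2)
for manager, comment in directory["Person A"]:
    print(manager, comment)
```

The sketch relies on the names being spelled identically in both lists; if they are not, the entries stay separate, just as they would in a manually merged spreadsheet.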

How important is E-Mail at the workplace?

On the normal Internet e-mail plays a minor role. It was introduced a long time ago, before the World Wide Web was started in the 1990s, and was later replaced by dynamic websites, mostly PHP-generated ones like blogs, wikis and online forums. Some online shops use e-mail today as a fallback if the normal PHP-generated website doesn’t show the order status, and some marketing companies use e-mail to send advertisements, but in summary we can say that on the public Internet e-mail is dead.

But let us take a look into a subfield of the Internet which is not visible to search engines: the intranets of companies and the e-mail messages between individuals. The surprising fact is that the situation here is the opposite. Since the 1990s the number of e-mails sent per day has been growing, and the trend is positive. That means future communication will be done with more e-mails than ever. In the area of intranets and individual communication some possible replacements for e-mail were invented, notably live chats and social networks, but until now they have not been able to replace e-mail.

What we can observe is that in the public, Google-indexed Internet, which consists of wikis, blogs, PDF papers and online forums, e-mail is dead. That means hardly anybody is using it, and a normal dynamic PHP website is the better technical answer. In the hidden corporate intranet and for individual communication, by contrast, e-mail is the quasi-standard and has replaced all other technologies.

Understanding the difference only on the basis of Internet technology doesn’t make much sense, but let us give it a try. A Web 2.0 website on the public Internet is typically generated by an Apache web server which uses the PHP scripting language together with a MySQL database. Often the system runs on Linux. In contrast, many e-mail systems work with groupware servers, often Microsoft Exchange. The problem is that this description doesn’t explain the linguistic aspect of the language used, so it makes no sense to give more details.

A much better approach to narrowing down the phenomenon of e-mail is the Enron corpus, which is the most important example of publicly available e-mail communication. The first fact is that the Enron corpus is different from the information which is available on the normal public Internet. That means the language is different. E-mail is a medium for individual and group communication, while the public Internet is used for formal communication about topics.

On the public Internet hierarchy and groups are not important. That means Wikipedia doesn’t have any kind of boss, and online forums don’t work with group thinking. Instead the information speaks for itself. It is searchable with full-text web search and can be cited later. This type of information is not suitable for everything; it is only the communication of the public Internet. In corporate intranets the workflow is different. Here hierarchy is very important, the content isn’t searchable in full text, and if somebody cites a previous post he usually has the goal of making the social hierarchy clear.

The linguistic difference isn’t made clear by the research papers available today. In recent publications the difference is called “social networks vs. online communities”. According to this definition, the public Internet with wikis, online forums and blogs is an online community. That means there is no social hierarchy; instead the information speaks for itself. The e-mail communication in a company, in contrast, can be called a social network, which is mainly the interaction of the same people who are working on shared goals. That means the typical e-mail is sent with the aim of producing a decision, for example a decision about who wants to attend a meeting.

Which communication style is better? The answer is that both have a different purpose. It is not possible to mix them. From a technical perspective both kinds of communication run over the same Internet technology, which is TCP/IP. That means for a router there is no difference between e-mail traffic and port 80 traffic. But from a linguistic perspective there is a massive difference.

Why is e-mail important?

Companies use e-mail as a communication tool. They have recognized that it has a lot of advantages. E-mail is first of all individual communication between people, and right now it is the best tool for it. The more important question is why companies need individual communication. The answer is based on the definition of management. Management means letting people work in groups, and this is only possible with communication. E-mail is the standard form of groupware.

The next question is: why do companies need group work? The answer is that companies are manual cooperation. They are based on manual labor which cannot be executed by robots. Sure, every company has the aim of increasing productivity and replacing people with machines, but especially for new industries in the service sector and in information management this is not possible. The average company is based on humans who communicate about a topic and make decisions together. E-mail is the key to understanding the inner workings of companies.

What is unclear right now is how to increase e-mail productivity. Some management experts of the past have identified e-mail as an enemy of productivity, because the average employee is said to get 100 e-mails every day and to need 20 hours a week to answer them all. But so far no better-working alternative to e-mail has been found. The ongoing experiments with activity streams, corporate wikis and corporate social networks have shown that e-mail cannot easily be replaced by something better.

Increasing e-mail productivity does not depend on technical aspects like a faster e-mail server or a better client than Outlook. On the technical side all these tools work great. The bottleneck is the linguistics of the e-mail content itself, that means what people write in their e-mails and how often they send one. This topic isn’t researched very well in the literature. Some papers have analyzed the Enron corpus with statistical pattern search algorithms, but they have not produced any useful advice for improving communication. That means the situation is unexplored.

In my opinion the future is that, apart from the Enron corpus, e-mail corpora from other companies will be released for scientific research. Right now we have the problem that researchers don’t know what e-mails are sent in a company, so they are not able to research the linguistic aspect. That might look unfamiliar, because on the public Internet everything is known. It is possible to enter any keyword into the Google search engine and read the messages. In the case of e-mail no such search engine is available. This has to do with the inner working of e-mail. E-mail is by definition individual communication. It is not written for the public but for other individuals and for groups. That means an e-mail is secret by default, while a scientific paper is open by default.

Social aspects

From a technical perspective e-mail is trivial. Programming an e-mail client in C++ is a bit advanced but not really unexplored. The same is true for e-mail servers and the infrastructure to send the messages. After some initial problems in the 1990s, when the first business-ready e-mail software was created, the software improved very fast, and today’s programs are mature. What is unclear are the social aspects: which e-mails should be sent and which not, what somebody should put in the header, and why e-mail is so important. In the introduction I have tried to answer some of these questions. The main idea behind e-mail is to enable group work in a company. That means it is possible to define some key factors which are the same for every e-mail.

It is not acceptable to send an e-mail to a super-large group, for example to the whole world. From a technical perspective it might work, and this is called spam, but from a social perspective it makes no sense. E-mail was invented for individual communication, that means it is some kind of time-delayed chat over the internet. The freedom of the user is surprisingly great. He can decide who should receive the e-mail and he can decide what is written in it. The open question is how to match these opportunities to the social role somebody plays in a company. A boss, for example, uses a different kind of communication than an ordinary worker.

Let us investigate how e-mail experts are using the medium. An e-mail expert is somebody who has long experience with the medium, uses it very often and is able to reach his individual goals. It is fair to call the managers in a company experts. A manager is somebody who acts successfully in the company, that means he is able to communicate with all the people in a productive way.

The key to understanding e-mail is comedy. Comedy has the job of making social aspects visible. A good way to understand e-mail is to make a comedy sketch about the e-mail behavior of a manager. This would make the situation more explicit.


There are some trends visible. First, e-mail is preferred by most companies for internal communication. E-mail is cheaper and faster than paper-based letters, which is important for international companies. At the same time, the difference between individual communication through e-mail and media-like information on the public internet is larger than ever. That means today’s e-mails in the intranet are more personalized than in the past, and public internet information in blogs and wikis is more abstract and topic-centric than ever. It is likely that this trend will go on. That means the gap between individual communication and mass education will grow quickly.

Let us make the gap between individual e-mails and the public internet more obvious. The transition can be tested with the “reply all” button in Outlook. Technically it increases the number of receivers. Somebody who is not familiar with e-mail may argue that this isn’t a big thing, because information should be free, and on the public internet everybody can read the information anyway. But let us make an experiment. What would happen if every message were replied to all? Right, the result is chaos. The problem is that information filtering in e-mail works differently from information filtering on the public internet. On the public internet, Google is the information filter: somebody enters a keyword and gets only the websites he is interested in. E-mail works differently. The information filter is other people, that means it is controlled with the “To” field and is managed differently in each e-mail. And that is the main difference between online communities (= public internet) and social networks (= e-mail communication).

E-Mail at the university

An interesting subfield of e-mail communication is the case of sending e-mails in a university. That means individual communication not in the intranet of a company but in the intranet of a university. First, the kind of language is different from public internet information. It remains workflow communication with the aim of working in a group. At the same time, a university is a place of higher education which is grouped around academic texts. This mix allows a new insight into e-mail.

I want to explain the difference in detail. An academic paper which is published on arXiv is topic-oriented. That means there is no hierarchy or social structure; instead the information speaks for itself. In contrast, e-mail communication between students and a professor follows group thinking and a social hierarchy. This is similar to e-mail communication in a normal company.

The Enron corpus – first impressions

While surfing the internet I came across the Enron Corpus, a corporate e-mail archive of about 400 MB. In the literature, the corpus is some kind of standard example; it was the first and is until now the most impressive publication of a company’s internal e-mail communication. First it is important to make clear what is not in the .gz file. The Enron corpus doesn’t contain scientific information of the kind which is available through Google Scholar and arXiv. That means there are no academic papers or similarly valuable information in it. Instead all the e-mails are colloquial, workflow-centric communication. The English language in the e-mails was used for professional interaction in a company, for managing time, resources, tasks and hierarchies.

It is a bit hard to analyze the data in detail. As far as I can see from different papers on Google Scholar, many researchers have done so in the past. Usually they have used automatic parsing and machine learning. But perhaps the better approach is simply to read the information manually? I’m not sure. For now I have downloaded the file itself, which is 423 MB in size. After unpacking, the directory structure takes 1.4 GB. It contains less information than expected. As far as I can see from a first overview, the archive contains 150 folders, each of which belongs to one person and contains their Outlook content, that means the received and the sent e-mails. Sometimes the attachments are also in the corpus. We are not talking about thousands of Enron employees but only 150 of them. In theory it is possible to analyze the communication entirely by hand. This is what I’d like to do in the next week or so. I will read through the e-mails, note down the list of persons, and try to memorize all the names. Whether this gives new insight is unclear, but the Enron Corpus seems the ideal starting point for such an analysis.


A short look into the Enron Corpus showed that the complexity is higher than expected. The number of people in the dataset is higher than 150, because the employees have sent e-mails to further employees whose mailboxes are not part of the corpus. That means the 150 people are only a subset of a much larger communication flow.
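How large the surrounding communication flow is can be estimated with a few lines of Python. The following is only a minimal sketch: it assumes that the messages are available as raw text with the usual From/To/Cc/Bcc headers and that the addresses of the mailbox owners are known. The function names and the example addresses are invented for illustration and do not come from the corpus.

```python
from email import message_from_string
from email.utils import getaddresses


def correspondents(raw_messages):
    """Collect every distinct address seen in From, To, Cc or Bcc headers."""
    seen = set()
    for raw in raw_messages:
        msg = message_from_string(raw)
        fields = []
        for name in ("From", "To", "Cc", "Bcc"):
            fields.extend(msg.get_all(name, []))
        # getaddresses splits comma-separated lists into (name, address) pairs.
        seen.update(addr.lower() for _, addr in getaddresses(fields) if addr)
    return seen


def outside_corpus(raw_messages, mailbox_owners):
    """Addresses that occur in the messages but own no mailbox folder."""
    owners = {addr.lower() for addr in mailbox_owners}
    return correspondents(raw_messages) - owners
```

Everything returned by outside_corpus belongs to a person who takes part in the communication but whose own mailbox is missing, which is exactly the group that makes the dataset larger than its 150 folders.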

The most interesting question right now is the text flow itself. As far as I can see from the e-mails, there are no formal documents like in an academic paper, a blog, a wiki or a newspaper. Instead the communication is person-centric. That means Larry has sent an e-mail to Becky because he wants only Becky to read the information and nobody else. This is the biggest difference to normal texts which are posted to the internet. If somebody posts to an online forum or to his weblog, he wants the whole world to read the information. In the Enron Corpus the intention is to support the workflow of the company. That means it has to do with persons who have roles in the company and with scheduling their work time.

I don’t know what the correct term in linguistics is for separating these two groups of text. But in the context of social networks a distinction is made between an online community and a social network. An online community is topic-centric; Stack Overflow, for example, is an online community. People post content there for an anonymous audience. They are not interested in getting information from a certain person, but discuss a technical problem with strangers. The names are shown by the Stack Overflow forum software, but they don’t matter. Social networks work differently. They are person-centric, that means it is a communication between friends. The Enron Corpus can be called a social network but not an online community. It is not a discussion about energy trading in a simulated environment; the idea is to talk with people in the same company and not with the whole world. It is some kind of intra-communication which is different from public communication in the media.

Browsing through the messages is surprisingly easy. A simple “cat *|less” is all that is needed. If this command is executed in the inbox folder of a person, all the messages are shown as one large file which can be scrolled with the cursor keys. What is very interesting about the communication style is that each paragraph starts with at least one first name, and sometimes more names are given. A typical example could be: “Maik has said … and Lucy has argued …”. As I mentioned above, it is not about a certain subject, it is about what a person thinks.
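For readers who prefer a script over the pager, here is a minimal sketch in Python that produces a compact overview of one folder. It assumes that every file in the folder holds exactly one plain-text message in the usual header/body format, which is my assumption about the unpacked layout. The function names are made up for illustration.

```python
from email import message_from_string
from pathlib import Path


def iter_messages(folder):
    """Yield one parsed message per regular file in `folder`, in name order."""
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            # errors="replace" keeps the sketch running on odd encodings.
            text = path.read_text(encoding="utf-8", errors="replace")
            yield message_from_string(text)


def summarize(folder):
    """Return (sender, recipients, subject) tuples for a quick overview."""
    rows = []
    for msg in iter_messages(folder):
        rows.append((msg.get("From", ""), msg.get("To", ""), msg.get("Subject", "")))
    return rows
```

Calling summarize on the inbox folder of one person gives a list which can be printed line by line. Unlike the pager it hides the message bodies, so it is only useful for seeing who writes to whom, not for reading the language style.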

The next important question is: is it possible to improve the technology? That means to use something which works better than e-mail, for example a wiki or a blog? IMHO the answer is no. A blog or a wiki is a communication system which is subject-oriented. Everybody can read the messages; it is equal to a mass medium. What the Enron corpus is about is person-centric communication like in a social network. In theory it is possible to replace the Outlook software by a web frontend to manage the e-mails, but the result would be the same. It is not an improvement, and perhaps some of the e-mails were already sent through a web frontend.

I think the most important difference to a weblog is that in the Enron Corpus the number of receivers is limited. Sometimes the number of receivers is 1, sometimes 10 and sometimes 40. But in all cases the number is restricted. I would suggest that a direct result of this person-centric text distribution is the language style, which is always about persons (“Maik has said … and Lucy has argued …”).
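The numbers 1, 10 and 40 are my impression from reading. They could be checked with a short script which counts the receivers of each message. The following Python code is a sketch under the assumption that a receiver is any distinct address in the To, Cc or Bcc header. The function names are hypothetical.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses


def count_receivers(raw_message):
    """Count distinct addresses in the To, Cc and Bcc headers of one message."""
    msg = message_from_string(raw_message)
    fields = []
    for name in ("To", "Cc", "Bcc"):
        fields.extend(msg.get_all(name, []))
    # Lower-casing avoids counting the same address twice.
    addresses = {addr.lower() for _, addr in getaddresses(fields) if addr}
    return len(addresses)


def receiver_histogram(raw_messages):
    """Map 'number of receivers' to 'number of messages with that many'."""
    return Counter(count_receivers(raw) for raw in raw_messages)
```

A histogram with most of its weight on small numbers would support the idea of a restricted audience. A weblog has no comparable header at all, because its audience is not listed anywhere.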

The open question right now is: what is the difference between the Enron social network and communication in an online community? I would guess the difference has to do with persons. In an online community the persons are not important. They act in an abstract Gutenberg galaxy in which somebody adds information to a topic. For example, blogger A can post a text about robotics and blogger B writes a comment. The names of the persons, their social status and what they are doing aren’t important. In contrast, the communication at Enron is always grouped around people. It is not strangers who are sending e-mails to each other but persons who have known each other for 1 week, for 1 month or for 10 years. Perhaps a simple example helps.

In the Enron e-mails it happens very often that somebody wants to go on vacation or talks about the last vacation. On Stack Overflow, in comparison, there is no such vacation topic. I’ve never seen a user on Stack Overflow asking other users about vacation. The reason is that this is his private sphere, in which nobody is interested. In the Enron e-mails, in contrast, it is quite normal to talk about vacation because it is part of the business. If somebody isn’t at his workplace he can’t answer e-mails and the other employees have to take over his work. I wouldn’t call this an Enron-specific attitude; it is common for any company which is driven by humans. The more general question is: what is the difference between the communication in a company and on Stack Overflow?

In a previous paragraph I’ve called this the difference between a social network and an online community. Here is the summary:

Online community: is talking to the public, that means 7 billion people can read the information

Social network: is talking to 1 person, 10 persons or at most 100 persons. Not to everybody.

Online community: is topic-centric, that means it is about robotics, computing and software engineering

Social network: is person-centric, that means it is about what Lucy has said to Maik and what Ann thinks about Jim.

The interesting aspect is that social networks are not about private topics by default. There are many online communities out there which talk about private topics, for example an online forum about food, travel or clothes. And in contrast, most of the e-mails in the Enron dataset are about corporate needs. The people are not talking about food and travel, but about the last meeting and shared goals for Enron’s future. The difference is more on the linguistic level:

social network: not more than 100 receivers of a message, person centric

online community: receiver is the world, topic centric


In some publications, a corporate wiki is recommended as an alternative to sending e-mails in the office. But what does that mean for the Enron Corpus? Is it possible to convert the e-mails into a wiki? No, it is not possible, because a wiki has, similar to a newspaper and an academic paper, the goal to inform the public, that means everybody. A wiki usually contains topic-centric information, for example about how to use a piece of software correctly. This kind of information is different from the Enron corpus, where the aim is to express workflow knowledge which is person-centric. A wiki isn’t personalized to a certain receiver or a certain group; it is created for the complete company.

Sure, company-internal wikis, blogs and forums are great. But they can’t replace e-mail. The only thing that can replace e-mail in theory is a so-called enterprise social network. Sometimes it is called the advanced form of e-mail, because all the users have a photo of themselves online. The communication style in an enterprise social network is the same. A message is sent only to a small number of people, not to everybody, and it contains person-centric information, for example “Maik has said … and Lucy has argued …”. The difference between an enterprise social network and e-mail is very small. A raw dump of a social network will result in the same language style as a dump of the e-mail traffic. In most cases the difference has to do with the filter options. E-mail is by default not very effective at suppressing information, while in a social network it is possible to block certain users and to search the archive for older postings.

Why companies don’t like teamwork but employees do

From the perspective of a company, teamwork is equal to occupying more resources for the same work. If the aim is to build a house, the cost-effective idea is to let only one worker build the house. He gets the order to complete the project in 1 month, so he must work very hard to fulfill the goal. The costs are low, because the worker only gets the average salary. From the perspective of the worker the situation is the other way around. He gets a superhuman task, and his salary is low compared to the project. What the worker would like is more resources for the same task. He gets 100 helping hands which build the house together with him. Each of them gets money for the job, and together the workload is lower.

Now we switch back to the company perspective. If not one worker but 100 are doing the same job, the house is built on time but the costs for the workers are higher. This results in a loss of productivity. So it is a bit funny to hear, but successful companies are not interested in teamwork. They try to limit the amount of resources and to convince the workers to do the same job with a smaller workforce. But what will happen with the house-building project if not 100 workers are doing the job but 1000 are available over the period of 1 month? For the workers it will be a great time. They will complete the task and have enough time for telling jokes and having a party. From the perspective of productivity the situation is worse. Every worker who is occupied by the project increases the costs. The problem is that the new workers are welcomed by the old ones because they help to reduce the load. For the workers it doesn’t matter how high the costs are; it is not their job to calculate the profit. They are only building the house, the calculation is the task of the management.
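The calculation of the management can be written down in a few lines. This is only a toy model: it assumes that every worker gets the same monthly salary and that the output is exactly one house regardless of team size. The salary of 3000 is an invented number which is only there to make the example concrete.

```python
def labor_cost(workers, monthly_salary, months=1):
    """Total wages paid for the project: every worker is paid for every month."""
    return workers * monthly_salary * months


def houses_per_worker_month(workers, months=1, houses=1):
    """Productivity in this toy model: finished houses per paid worker-month."""
    return houses / (workers * months)


SALARY = 3000  # hypothetical average monthly salary, not a real figure

for team in (1, 100, 1000):
    cost = labor_cost(team, SALARY)
    productivity = houses_per_worker_month(team)
    print(f"{team:>4} workers: cost {cost:>9}, houses per worker-month {productivity}")
```

The result is still one house in all three cases, but the wage bill grows linearly with the team while the productivity per worker falls by the same factor. This is the reason why the management and the workers judge the same team so differently.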

Why am I telling this story? Because teamwork is not supported by everyone. Workers like group work because it reduces their stress, while the management doesn’t like teams for the same reason. What the management wants are highly productive workers, which is equal to a low amount of resources. If fewer people are doing the same job, the productivity is much higher.

Let us observe the workforce and their strategy for enlarging the team. First, most workers are team players. That means they love to work with other people, and the bigger the team is, the more they like it. Every new worker from outside who enlarges the circle helps the old workers to stay calm, because they can work less. So it is in their natural interest to maintain a healthy atmosphere which welcomes newbies and prevents any kind of anti-social behavior. The people don’t want their team to become less powerful or to consist of fewer staff. They want it to grow. A bigger team is equal to more power, more resources and less work for the individual, who can hide behind the team. That is the reason why communication at the workplace works out of the box. There is no need to explain the advantages of teamwork to the workers; they know the benefit for themselves. Teamwork is equal to working less. The individual can work a bit slower without risking his job. He knows that more people are equal to more resources, which means the team can do more complex things in a short amount of time. The limit in team building is not social conflicts but the accounting department. If they notice that 50 of 100 workers have nothing to do, they could decide that the team is too big. Then they want to reduce the team, they want to lighten the group.