Last week, I wrote an article and used the phrase “turning data into information”. I stopped and realized that I knew intuitively what this phrase meant, but I started wondering: what actually distinguishes data from information?
After a quick Google search, I realized that the term ‘information’ is defined and used differently across disciplines such as computer science, linguistics, biology, history, or theology. After some more research I found a communications-engineering-heavy definition of information, which helped me to see the difference between data and information.
Please be aware that this definition of information was created by a creationist to argue that there is no evolution. As critics have pointed out, transferring this communications-engineering definition of information to biology and evolution is an attribution error. I also struggle to follow the reasoning that there is no evolution (based on this definition of information). However, this is not a blog about beliefs, and I found the definition helpful for understanding the difference between data and information.
Basically, there are five levels, which I want to briefly introduce.
The statistics level of information covers the descriptive part based on quantitative measures. For example, it answers questions about the count and composition of characters, the frequency of character combinations and so on. It relies heavily on the mathematical theory of sets.
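To make the statistics level a bit more tangible, here is a minimal Python sketch that counts characters and two-character combinations in a message. The function name `character_statistics` and the sample message are my own assumptions for illustration; nothing here looks at what the message means.

```python
from collections import Counter


def character_statistics(message: str) -> dict:
    """Purely quantitative view of a message: no structure, no meaning."""
    # How often does each single character occur?
    char_counts = Counter(message)
    # How often does each pair of neighbouring characters occur?
    pair_counts = Counter(message[i:i + 2] for i in range(len(message) - 1))
    return {
        "length": len(message),
        "distinct_characters": len(char_counts),
        "char_counts": char_counts,
        "pair_counts": pair_counts,
    }


stats = character_statistics("banana")
print(stats["length"], stats["distinct_characters"])
print(stats["char_counts"]["a"], stats["pair_counts"]["an"])
```

The same counts would come out for any string with the same characters in the same order, whether or not it is a real word, which is exactly why this level alone says nothing about meaning.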
The syntax of information summarizes all structural features. This level does not look at content or meaning, but focuses on the sign systems that are used to encode the information. It also includes formal and informal rules about possible combinations of characters and strings (of characters). Additionally, it covers morphology, phonetics and vocabulary. If all of these aspects are present, we call it a language. Even though there are a lot of different languages, all of them have sign systems and agreed-on rules.
Only through language is it possible to transmit and store information. In fact, the sender and receiver of information (humans, animals and technical systems) both need to know the syntax if they want to understand the information.
The semantics of information describes the meaning. Meaning is basically the invariant part of the information. Even if the statistics and syntax of the information change, the meaning stays the same. For example, whether you encode information in a natural language, a blueprint or a programming language, the meaning stays the same.
The pragmatism of information focuses on the fact that information transfer always includes the sender’s intention to produce some result in the receiver. The sender might state this intention openly, but it might also be implicit within the message.
Apobetics describes the result or purpose of the information itself. Every piece of information carries a purpose or objective of the sender. For example, the sentence ‘buy our great new product for only $10!’ requires the receiver to go to the store (action part, pragmatism), while the intended purpose is that the receiver of the information buys the product.
With this definition, I think the differences between data and information become more evident. Data and information have both statistics and syntax in common. Both can be statistically analysed, and there must be a syntax so that the receiver and sender of data and information can understand each other. Even semantics is common to data and information. Data can be represented in a lot of different formats; the data stay the same. For instance, a height deviation during the paving process might be shown as a number on a screen, but also as a blinking arrow. It is still the same data.
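As a small illustration of “different format, same data”, the following Python sketch encodes one reading in two ways and decodes both back to the same value. The field names, the machine ID `paver-01` and the semicolon-separated text format are assumptions I made up for this example.

```python
import json

# Hypothetical reading from a paver (names and values are made up)
reading = {"machine_id": "paver-01", "height_deviation_mm": 3.5}

# Representation 1: JSON
as_json = json.dumps(reading)

# Representation 2: a simple semicolon-separated text line
as_text = f"{reading['machine_id']};{reading['height_deviation_mm']}"


def decode_text(line: str) -> dict:
    """Turn the semicolon-separated line back into the reading."""
    machine_id, deviation = line.split(";")
    return {"machine_id": machine_id, "height_deviation_mm": float(deviation)}


# Statistics and syntax differ between the two strings ...
print(as_json != as_text)
# ... but both decode to the very same reading.
print(json.loads(as_json) == decode_text(as_text) == reading)
```

Both comparisons should print `True`: the two strings look different, yet they carry the same data.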
However, once we get to the aspects of pragmatism and apobetics, it becomes evident how data are turned into information. For example, current engine data from a ship somewhere on the ocean most likely do not trigger any kind of action from you as the receiver (the pragmatism aspect). You also do not care about the purpose of the message. However, if you get engine data from your own machine on a jobsite, they turn into information. If you see unusual data, you take action to figure out what is wrong with the engine by remotely connecting to the machine or calling the operator (the pragmatism aspect). The overarching purpose of the information is to avoid unexpected downtime.
This might all sound rather theoretical, but there is a good takeaway from this reflection on data and information for me. If you use a control system for a construction machine or a telematics system, which can provide heaps of data, make sure that only those data are shown and used on which you need to act and which help you to serve a higher purpose. Then you have turned data into information. Otherwise it is just data clutter.
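Here is a rough Python sketch of that idea: out of a stream of readings, keep only those from your own machines that fall outside an acceptable range and therefore call for action. The signal names, the limits and the machine IDs are purely hypothetical, and a real telematics system will of course have its own data model.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    machine_id: str
    signal: str
    value: float


# Hypothetical acceptable ranges per signal (made-up numbers)
LIMITS = {
    "coolant_temp_c": (70.0, 105.0),
    "oil_pressure_bar": (1.5, 6.0),
}


def actionable(readings, own_machines):
    """Keep only readings that require action from this receiver."""
    result = []
    for r in readings:
        if r.machine_id not in own_machines:
            continue  # not my machine: no action, no purpose for me
        limits = LIMITS.get(r.signal)
        if limits is None:
            continue  # no rule for this signal: it stays plain data
        low, high = limits
        if not (low <= r.value <= high):
            result.append(r)  # unusual value: someone has to act
    return result


readings = [
    Reading("paver-01", "coolant_temp_c", 112.0),   # own machine, too hot
    Reading("paver-01", "oil_pressure_bar", 3.2),   # own machine, normal
    Reading("ship-77", "coolant_temp_c", 120.0),    # someone else's ship
]
alerts = actionable(readings, own_machines={"paver-01"})
print(alerts)
```

In this sketch only the overheated coolant reading of `paver-01` is kept. The ship’s reading is just as unusual, but for this receiver it remains data.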
What do you think? Did I miss important aspects which turn data into information? I am keen to hear from you.