
The Trivium of Measurement Theory

Statistics and data analysis usually assume that all values are real numbers (or vectors of real numbers) or can easily be reduced to them. But in non-parametric and non-numeric statistics, as well as in econometrics, it matters a great deal which scale the data were measured on, because the scale determines which operations and methods can be applied to them.

The problem with the definitions of scales is that they are constructed by mathematicians with strict formalism, which makes them incomprehensible to most people. For example, in Pfanzagl's classic book, scales are defined as follows:


[Image in the original: the formal definition of a scale from Pfanzagl's book]
Here the abbreviations in the definition stand for a system with relations (a relational system) and a numerical system with relations, the same notions that are used in algebra and in the theory of normal forms of relational databases. If this is simple and clear to you, you can stop reading. For everyone else, I will describe the scales in plain terms and explain why understanding this material matters in practice.

Scale of names (nominal scale). It is used to describe features that can be compared only for equivalence (equal or not equal). Such scales measure, for example, musical tastes, parts of speech, and political views. It is important to know that no operation other than checking for a match can be performed on such scales: rap fans are simply not the same as Justin Bieber fans, and this scale cannot tell you which of them is cooler. Numbers here can only be used as labels to classify objects.

Grouping and classification operations are also allowed on this scale; moreover, most classifications are created specifically for such scales.
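A minimal Python sketch of what this means in practice, assuming made-up survey answers (the data and both numeric codings are hypothetical): counting, grouping and taking the mode are meaningful for nominal data, while the mean of arbitrary numeric codes changes as soon as the labels are renumbered.

```python
from collections import Counter
from statistics import mean

# Hypothetical survey answers on a nominal scale (musical tastes).
answers = ["rap", "pop", "rap", "jazz", "pop", "rap"]

# Allowed: equality checks, counting, grouping, the mode.
counts = Counter(answers)
mode_label, mode_count = counts.most_common(1)[0]

# Two equally valid ways to code the same labels with numbers.
coding_a = {"rap": 1, "pop": 2, "jazz": 3}
coding_b = {"rap": 3, "pop": 1, "jazz": 2}

mean_a = mean(coding_a[x] for x in answers)
mean_b = mean(coding_b[x] for x in answers)

print(mode_label, mode_count)  # the mode does not depend on the coding
print(mean_a, mean_b)          # the "mean" does, so it carries no information
```

The mode is the same under either coding, which is why it is a legitimate summary here and the mean is not.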

Order scale, or rank scale (ordinal scale). This scale has all the properties of the nominal scale, plus an order relation. For example, we cannot say who is cooler, a fireman or a taxi driver (nominal scale), but we can definitely say that a major is cooler than an ensign (rank scale).

For this scale it is very important to understand that the numbers are used only in comparison operations; they cannot be added or averaged (a general plus a private does not equal two lieutenants). Here is one more example. Everyone loves jokes like: “After Vasya moved from Russia to India, the average IQ of both countries increased,” meaning that the average IQ in Russia is higher than in India, and Vasya is not as smart as the average Russian. But the concept of "average IQ" is incorrect, since IQ is measured on a rank scale and was originally designed so that its values are normally distributed in the population, and in no case can we say that the difference between IQ 141 and 142 is the same as the difference between IQ 120 and 121. So joke correctly: “After Vasya moved from Russia to India, the average intelligence of both countries increased.”
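To illustrate why averaging rank-scale values is unsafe, here is a small Python sketch with hypothetical ratings. Any strictly increasing renumbering of the categories is an equally valid coding of an ordinal scale; the comparison of two groups by their means can flip under such a renumbering, while the comparison by medians cannot.

```python
from statistics import mean, median

# Hypothetical ordinal ratings (1 = worst ... 5 = best) for two groups.
group_a = [1, 1, 5]
group_b = [2, 2, 2]

# A strictly increasing recoding: it keeps the order of the categories,
# so it is an equally valid numeric labelling of the same ordinal scale.
recode = {1: 1, 2: 3, 3: 4, 4: 5, 5: 6}
group_a2 = [recode[x] for x in group_a]
group_b2 = [recode[x] for x in group_b]

# Comparison by means flips under the recoding...
print(mean(group_a) > mean(group_b))    # True
print(mean(group_a2) > mean(group_b2))  # False

# ...while comparison by medians does not.
print(median(group_a) < median(group_b))    # True
print(median(group_a2) < median(group_b2))  # True
```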

Difference scale, or interval scale. Dates and temperatures in Celsius and Fahrenheit are measured on such scales. These scales have no natural origin, although some people will argue at length that counting from the birth of Christ or from January 1, 1970 is quite natural.

Most Big Data presentations begin with the story about a pregnant schoolgirl. Testers have their own tall tale, about airplanes. In short: an American plane crashed in Israel near the Dead Sea because its system divided by zero as soon as the plane's altitude above sea level became negative. I have heard many versions of this tale: in some the plane flipped upside down, in others stealth aircraft plunged into the sea itself. The tale is not very plausible once you realize that it makes no sense to divide by a value taken from an interval scale, which is what altitude above sea level is. Really, try to find a formula with the Fahrenheit temperature or the latitude of a location in the denominator.

For measurements on such scales you can compute the arithmetic mean and carry out correlation and regression analysis, but you cannot compute the harmonic mean or the geometric mean.
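One way to see the difference is to check whether a statistic gives the same answer regardless of the unit. The sketch below uses hypothetical temperature readings and the standard Celsius-to-Fahrenheit conversion: the arithmetic mean survives the change of unit, the geometric mean does not. It assumes Python 3.8 or later for statistics.geometric_mean.

```python
from statistics import mean, geometric_mean

def c_to_f(c):
    """Celsius to Fahrenheit: multiplication by a constant plus a constant."""
    return c * 9 / 5 + 32

def f_to_c(f):
    """Fahrenheit back to Celsius."""
    return (f - 32) * 5 / 9

temps_c = [10.0, 20.0, 40.0]  # hypothetical readings
temps_f = [c_to_f(c) for c in temps_c]

# Arithmetic mean: the same answer whichever unit it is computed in.
mean_via_f = f_to_c(mean(temps_f))
print(mean(temps_c), mean_via_f)

# Geometric mean: the answer depends on the unit, so it is not meaningful.
gm_c = geometric_mean(temps_c)
gm_via_f = f_to_c(geometric_mean(temps_f))
print(gm_c, gm_via_f)
```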

Scale of relations (ratio scale). Such a scale has a natural origin. Pardon the pragmatism, but everything that is measured in money falls on this scale. If a date is on an interval scale, then age is on a ratio scale. It is sometimes said that this scale has all the properties of the interval scale, but there is a nuance: while linear transformations (multiplication by a constant plus a constant) are admissible for an interval scale, only similarity transformations (multiplication by a constant) are admissible here. Most methods of statistical analysis assume that the values are on such a scale, so before you feed numbers to an analysis package, make sure that a natural origin exists; otherwise many of the statistical characteristics will be uninformative.
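The difference between the two kinds of admissible transformation can be checked directly. In this Python sketch (the prices and temperatures are made-up values), the statement "twice as much" survives a change of unit on a ratio scale but not on an interval scale.

```python
def ratio(a, b):
    """How many times a is larger than b."""
    return a / b

def c_to_f(c):
    """Celsius to Fahrenheit: multiplication by a constant plus a constant."""
    return c * 9 / 5 + 32

# Ratio scale: prices in dollars and the same prices in cents (x -> 100 * x).
price_a, price_b = 30.0, 15.0
print(ratio(price_a, price_b), ratio(100 * price_a, 100 * price_b))  # 2.0 2.0

# Interval scale: the same two numbers read as Celsius temperatures.
t_a, t_b = 30.0, 15.0
print(ratio(t_a, t_b), ratio(c_to_f(t_a), c_to_f(t_b)))  # the ratios differ
```

So "30 dollars is twice 15 dollars" is a meaningful statement, while "30 degrees is twice as warm as 15 degrees" depends on the thermometer.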

These four scales are generally accepted today; however, when the theory of non-numeric statistics was just emerging, many researchers introduced their own classifications. Here, for example, is a page from Tyurin's unpublished book:



The approach of “inventing” your own scales can be productive in many projects. However, it is more important to check the operations performed on the data and to write the corresponding tests before the values are obtained. And remember that just checking the units of measurement (as some programming languages do) is not enough: time and age are measured in the same units.
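As one possible illustration of such a check, Python's standard datetime module already separates the two scales by type, even though both can be expressed in days: a date behaves like an interval-scale value and a timedelta (an age) like a ratio-scale value. The dates below are hypothetical.

```python
from datetime import date, timedelta

birth = date(1990, 5, 17)  # hypothetical dates
today = date(2020, 5, 17)

# date - date gives a duration: the age, a ratio-scale value.
age = today - birth
print(isinstance(age, timedelta))

# Durations can be divided by each other and added back to a date.
print(age / timedelta(days=365))
print(birth + age == today)

# Adding two dates has no meaning, and the type system rejects it.
rejected = False
try:
    birth + today
except TypeError as exc:
    rejected = True
    print("rejected:", exc)
```

A unit check alone would accept birth + today, since both are "days"; it is the distinction between the two types that catches the error.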

Source: https://habr.com/ru/post/246983/

