Sometimes it is worth using the data to build a basic mathematical model of a phenomenon, so that events can be simulated when it is not possible to run numerous field experiments covering different factors, situations or contexts. After all, feeding the data into a neural network and seeing what happens is not the only option.
Introduction and formulation of the problem
The example below is a fairly playful toy example of building such a model. It does not address data validation, the representativeness of the sample on which the model is built, or validation of the model itself on similar samples. When such models are built in earnest, these questions come to the fore, but much has already been written about them. Here it is simply assumed by fiat that everything is representative and verified (which is not the case), so as not to distract from the essence of the process. That is, the sample may well be representative, but one would still have to work out of which general population. So, what is this actually about?
A survey was created on VKontakte that combined two questions, each with three answer choices. By your subjective feeling, how many snobs are there around you?
Many
Average
Few
How much do you like social interaction and activity of any kind?
I'm always up for it, just give me the chance
Everything is fine on average
Social interactions annoy me.
Naturally, the survey offered every combination of the two sets of options. I wanted to check the banal hypothesis that people who think there are many snobs around are not exactly eager to go out into society. Or something like that. Here is what came out.
Baseline data (survey results)
Men (Table 1)
                Social   Indifferent   Annoyed
Many snobs      6        5             6
Medium snobs    9        10            7
Few snobs       8        22            11
Women (Table 2)
                Social   Indifferent   Annoyed
Many snobs      4        3             4
Medium snobs    17       18            5
Few snobs       10       12            10
Hypothesis and qualitative description
To begin with, note that there are large deviations from a uniform distribution among men who see few snobs and among women who see a medium number. Among men who see few snobs, indifference is strongly pronounced, while among women who see a medium number there is very little annoyance.
It is not immediately clear why this is so or what it means. It is therefore worth looking at the breakdown by number of snobs, in percent:
Table 3
          Many snobs   Medium snobs   Few snobs
Men       20%          31%            49%
Women     13%          48%            39%
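The percentages in Table 3 follow directly from the row totals of Tables 1 and 2. A minimal sketch of that arithmetic (the names `MEN`, `WOMEN` and `snob_shares` are illustrative, not from the original post):

```python
# Survey counts from Tables 1 and 2.
# Rows: many / medium / few snobs; columns: social / indifferent / annoyed.
MEN = [[6, 5, 6], [9, 10, 7], [8, 22, 11]]
WOMEN = [[4, 3, 4], [17, 18, 5], [10, 12, 10]]


def snob_shares(table):
    """Return the share of respondents in each snob row, in whole percent."""
    total = sum(sum(row) for row in table)
    return [round(100 * sum(row) / total) for row in table]


print("Men:  ", snob_shares(MEN))
print("Women:", snob_shares(WOMEN))
```

With 84 men and 83 women in the sample, this gives 20/31/49 for men and 13/48/39 for women.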
The first anomaly in this table: it was assumed that, within this sample, the theoretical distribution of snobs would be about the same for both sexes. In practice it differs significantly between them. It can therefore be assumed that the table reflects not some real distribution of snobs (no such distribution exists in nature), but gender-specific subjective perceptions, within the sample, of how many snobs are around.
Moving on. For both women and men, the maximum accounts for almost half of the sample. These maxima coincide exactly with the strongly anomalous (non-uniform) results in the survey. It is also clear that the differences in counts are large enough that such pronounced anomalies cannot be explained by the small number of people in the samples.
What can be assumed about this?
It can be assumed that those who chose the most popular answer knew, somewhere deep inside, that they were choosing the most popular answer (for their gender). So "deep" that this knowledge showed up in their voting with an "anomalous" distribution rather than a uniform one with respect to the overall two-dimensional distribution. Everyone who does "not belong" to the majority voted uniformly. That is, a person senses internally whether or not they belong to the majority. And a person who belongs to the majority has certain characteristic "properties" in how they assess society, which minorities do not have. In this case:
Men who are like most men (on the question of snobs) are indifferent to social activity (abnormally many voted for indifference).
Women who are like most women (on the question of snobs) tend not to be irritable (abnormally few voted for annoyance).
An "understanding" of one's own "strangeness" makes the choice of particular social properties random.
Naturally, all these speculations concern only the survey sample, whatever it may be. If desired, all of this can of course be checked more formally, but again, some other time and in another journal (preferably one with a good impact factor).
Model
This is all fine, but I also want to build a model that takes all of this into account. Here is how it can be written down in symbols, translating the natural understanding of what is happening into the language of random number generators.
The degree of subjective perception of the number of snobs around, for each person, is naturally modeled by a random variable S with a lognormal distribution.
For men
For women
The model assumes that the degree of philanthropy S ranges from 0 to 100 (because of the distribution it can go beyond that, but not by much). Accordingly, the values are interpreted as:
S in [0, 33]: many snobs
S in (33, 66]: medium number of snobs
S > 66: few snobs
The distribution parameters are chosen so as to reproduce the results in Table 3.
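The post does not state which lognormal parameters were used. One way to choose them, shown here as a sketch under that caveat, is to require that the shares of S below 33 and below 66 match the "many" and "many plus medium" percentages from Table 3. Since ln S is normal, the two conditions give two linear equations for mu and sigma. The function names below are illustrative.

```python
import math
from statistics import NormalDist


def fit_lognormal(p_many, p_many_or_medium, lo=33.0, hi=66.0):
    """Find (mu, sigma) of ln S so that P(S <= lo) and P(S <= hi) hit the targets."""
    z_lo = NormalDist().inv_cdf(p_many)            # standard normal quantiles
    z_hi = NormalDist().inv_cdf(p_many_or_medium)
    # ln(lo) = mu + sigma * z_lo  and  ln(hi) = mu + sigma * z_hi
    sigma = (math.log(hi) - math.log(lo)) / (z_hi - z_lo)
    mu = math.log(lo) - sigma * z_lo
    return mu, sigma


def lognormal_cdf(x, mu, sigma):
    """P(S <= x) for a lognormal S with parameters mu, sigma of ln S."""
    return NormalDist(mu, sigma).cdf(math.log(x))


# Targets taken from Table 3 (men: 20% / 31% / 49%, women: 13% / 48% / 39%).
mu_m, sigma_m = fit_lognormal(0.20, 0.20 + 0.31)
mu_w, sigma_w = fit_lognormal(0.13, 0.13 + 0.48)
print(f"men:   mu={mu_m:.2f}, sigma={sigma_m:.2f}")
print(f"women: mu={mu_w:.2f}, sigma={sigma_w:.2f}")
```

Two target shares pin down the two parameters exactly, so the third share (few snobs) follows automatically as the remainder.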
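For men, the probability p that he is aware of himself as an "average man" is computed as follows:
If S > 66, then p = 1. Otherwise, p = 1 - k·|S_avg - S| (if p < 0, p is set to 0),
where S_avg is the average and k is a coefficient showing, for the men in the sample, the degree of their feeling that they are the same as everyone else.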
As you can see, the formula simply links the probability of perceiving oneself as "average" to the estimate of the number of snobs around, and keeps the probability from exceeding 1. That's all. Nothing cleverer. If a person is inclined to think that there are few snobs, he perceives himself as average with p = 1. Otherwise, subtract the specific value of S from the average, take the absolute value, multiply it by a coefficient, and subtract the result from one. Initially an exponential was used to compute the probability, but it turned out even better without it.
Next, if the person turns out to be "average" (which happens with probability p), R is drawn from a normal distribution; otherwise R is drawn uniformly using rand(), where rand() is a uniform random value from 0 to 1. The normal distribution here simulates the anomaly for a "man of the majority", and the uniform one simulates a random choice for a person not from the majority. That is, if a person is "average", he has the "anomalous" distribution; if he is not "average", the distribution is uniform.
Next, we interpret R :
R in [0, 33]: social
R in (33, 66]: indifferent
R > 66: annoyed
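The men's model described above can be sketched as a small simulation. All numeric parameters here are assumptions for illustration only, since the post does not give its fitted values: `MU` and `SIGMA` are lognormal parameters chosen to roughly match the men's row of Table 3, `S_REF` and `K` stand in for the "average" and the coefficient, and `R_MEAN` and `R_SD` are hypothetical parameters of the "anomalous" normal distribution.

```python
import random

MU, SIGMA = 4.17, 0.80      # assumed lognormal parameters for S (men)
S_REF, K = 66.0, 0.03       # hypothetical "average" and coefficient
R_MEAN, R_SD = 52.0, 23.0   # hypothetical parameters of the "anomalous" normal


def level(x):
    """Map a score to a row/column index: 0 for <= 33, 1 for <= 66, 2 above."""
    return 0 if x <= 33 else 1 if x <= 66 else 2


def p_average(s):
    """Probability that a man sees himself as an "average man"."""
    if s > 66:
        return 1.0
    return max(0.0, 1.0 - K * abs(S_REF - s))


def simulate_men(n, seed=0):
    """Return a 3x3 table of shares: rows by snob level, columns by attitude."""
    rng = random.Random(seed)
    counts = [[0] * 3 for _ in range(3)]
    for _ in range(n):
        s = rng.lognormvariate(MU, SIGMA)
        if rng.random() < p_average(s):
            r = rng.gauss(R_MEAN, R_SD)   # "man of the majority": anomalous choice
        else:
            r = 100 * rng.random()        # not from the majority: uniform choice
        counts[level(s)][level(r)] += 1
    return [[c / n for c in row] for row in counts]


for name, row in zip(("many", "medium", "few"), simulate_men(20000, seed=1)):
    print(name, [f"{100 * share:.0f}%" for share in row])
```

Rows are indexed many / medium / few snobs and columns social / indifferent / annoyed, matching the tables. With these illustrative parameters the "few snobs" row is dominated by the indifferent column, which is the qualitative anomaly the model is meant to reproduce; matching the survey percentages closely would require tuning the coefficients.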
For women, the probability p that she is aware of herself as an "average woman" is computed, with a coefficient showing, for the women in the sample, the degree of their feeling of being the same as everyone else. If she does not perceive herself as "average", R is chosen at random; otherwise, R is uniformly distributed between 0 and 66, and the other options are skipped. This simulates the abnormally small number of women annoyed by social activity. R is interpreted in the same way as for men.
As you can see, the verbal description and understanding of what is happening has, in a sense, been "stretched" over a simple mathematical framework. One only needs a little imagination about what this or that distribution can simulate, and that is all. What remains is to choose the coefficients so that the result resembles the original experiment.
Table 5 (men; real percentages from the survey in brackets)
                Social       Indifferent   Annoyed
Many snobs      6% (7%)      6% (6%)       6% (7%)
Medium snobs    11% (11%)    12% (12%)     11% (8%)
Few snobs       10% (10%)    28% (26%)     10% (13%)
Women: modeled percentages
Table 6 (real percentages from the survey in brackets)
                Social       Indifferent   Annoyed
Many snobs      5% (5%)      4% (4%)       5% (5%)
Medium snobs    22% (20%)    19% (22%)     10% (6%)
Few snobs       12% (12%)    12% (14%)     11% (12%)
Fitting error (total percentage mismatch): 22 percentage points (out of a maximum of 400), which is nice. The mean error between the model's percentages and the real values is 1.22%, with a standard deviation of 1.16%. The chi-square test passes, but only just. If the residual cells are pooled, it passes comfortably. Overall, tolerable. There is in fact a systematic error in that the "anomalous" distribution for women is not modeled well, but something can be done about that.
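The total and mean mismatch can be recomputed from the rounded values in Tables 5 and 6. A minimal sketch (variable names are illustrative; because it works from rounded table entries, it only reproduces the total and the mean):

```python
# (modeled, real) percentages from Tables 5 and 6, read row by row.
MEN = [(6, 7), (6, 6), (6, 7),
       (11, 11), (12, 12), (11, 8),
       (10, 10), (28, 26), (10, 13)]
WOMEN = [(5, 5), (4, 4), (5, 5),
         (22, 20), (19, 22), (10, 6),
         (12, 12), (12, 14), (11, 12)]

# Absolute mismatch per cell, in percentage points.
errors = [abs(model - real) for model, real in MEN + WOMEN]
total_error = sum(errors)
mean_error = total_error / len(errors)

print(f"total mismatch: {total_error} points over {len(errors)} cells")
print(f"mean mismatch:  {mean_error:.2f}")
```

The men's table contributes 10 points and the women's 12, giving the total of 22 and a mean of about 1.22 over the 18 cells.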
Conclusion
In this straightforward way one can model rather complex and puzzling human behavior quite well, without resorting to neural networks and other black-box methods. All parameters have a fairly simple and intuitive meaning, which can be worked with when fitting this model to some other sample. The difference in the results can then be interpreted easily and pleasantly. For example, one can look at the shift in the estimated anomalies, or at the difference in how strongly people in the sample feel that they are the same as everyone else.
Each of the parameters can easily be made dependent on something else to simulate deeper connections. In general, the tool seems extremely useful to me. It may not reveal any great depths of understanding of the phenomenon, but it will definitely make you look more closely at the data and at what may lie behind it.