
The "Schastmatrinstva" public and a small statistical study of it

Introduction (January 2018)


Sometimes people take on tasks they cannot handle on their own. I am no exception.

There is an interesting VK group, #Schastmatrinstva ( https://vk.com/zaiki_luzhaiki ). It is one of the most captivating sources of harsh realism. If you want to become disillusioned with family, children, husbands and everything else, that is the place to go. An existential crisis is guaranteed, if only because about 15 posts a day appear there and they are written by real people. And, of course, the public is attractive in many ways.

At some point my wife, who works as a perinatal psychologist, and I became interested in studying what goes on in this public. For example, we could apply ordinary statistical methods to its content and see whether anything interesting turned up. I especially wanted to reach some loud conclusion: say, that the public helps people, or that it breeds hatred in people, or something else equally striking.

As a result, the amount of material studied grew.
The number of intermediate conclusions grew.
The number of graphs and tables grew.
But the understanding of how to evaluate it all did not.

The intermediate conclusions carried my imagination off toward elaborate constructions with little basis, but for the most part one conclusion suggested itself. Everything is very interesting and exciting, but quite static: an endless cycle of identical, recurring problems that the participants always evaluate in the same way. A kind of endless samsara in which nothing really changes. The waves roll in and the waves roll out, leaving no trace.

All that remained was to sum up and write something beautiful about it. And that is where everything died, for half a year. The task proved too heavy. I could not do it, and neither could other people.

But something has been done and it ought to be shown, so here it is. It is not exactly objective and unbiased. Much of what goes on in this public repels me, and it shows. But you can always look only at the graphs and tables and draw your own conclusions.

Briefly, what is in the text:



(The text contains a number of swear words, but purely for scientific reasons, in the course of studying how often they are used.)

Introduction (August 2017)


The #Schastmatrinstva group ( https://vk.com/zaiki_luzhaiki ) is an extremely interesting phenomenon of the social network era. The posting frequency is huge: 13-17 posts per day on average. At the same time there is no advertising and there are no reposts to distract from the essence. Only authentic content. The concept of the group is anonymous publication with comments disabled. The authors of the posts are moms worn out by various circumstances of motherhood. On the whole, the group has quite reasonable rules for such a community and its content.

With all this, the rather strongly ideological administration allows itself to comment on posts, to insert links to its programmatic literature, such as the book “Men who hate women, and women who love them”, and to link to the personal blog of the group's main creator. The moms themselves also periodically try to talk to each other by inserting links to earlier posts into their own. For a while the administration fought this by appending notes to such posts in the spirit of “From the administration: you understand that this is at your own risk, anyone can answer you. Be careful.” Then it gave up. All in all, the process was quite lively.

It would be even more interesting to follow the husbands' reactions to all this. But there is no separate response group for this community, so there are no statistics. Rumor has it that the husbands were thoroughly enraged, especially by their affectionate nicknames in the group, "nitaka" and "my gutschina." Unfortunately, this has not been verified.

Some of these processes (administration interventions, the use of characteristic words, communication between moms, the dynamics of negativity, etc.) I will try to examine here, rather superficially, through numbers and simple mathematical models.

I cannot say that something extraordinary and exciting turned up everywhere, but certain moments are extremely expressive.

Posts were collected from the creation of the community until August 25, 2017.

The number of words in the post


I wanted to check: what if people got tired of writing over all this time? What if everything became terser and duller? But no. Nothing changes.



The average word count stays about the same throughout. Although, if you ignore the outliers in the middle, you can tentatively assume that people are becoming a bit more verbose. Just a little. Apparently reading this very group gives mothers additional turns of phrase for describing their misfortune.
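For readers who want to reproduce this check, here is a minimal Python sketch of the monthly average. It is not the code used for the article: the input format (a list of `(date, text)` pairs), the sample posts and the name `mean_words_per_month` are assumptions made for illustration, and words are simply split on whitespace.

```python
from collections import defaultdict
from datetime import date


def mean_words_per_month(posts):
    """Return {(year, month): average number of words per post}."""
    totals = defaultdict(lambda: [0, 0])  # month -> [word total, post total]
    for day, text in posts:
        key = (day.year, day.month)
        totals[key][0] += len(text.split())  # naive whitespace tokenization
        totals[key][1] += 1
    return {key: words / count for key, (words, count) in sorted(totals.items())}


# Hypothetical sample data: (publication date, post text)
sample_posts = [
    (date(2017, 1, 3), "I am so tired of all this"),
    (date(2017, 1, 20), "No sleep again"),
    (date(2017, 2, 1), "He went to work and I stayed home"),
]
print(mean_words_per_month(sample_posts))
```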

Number of posts per month


Here is our question: how active has the group been all this time? Are there more posts? Fewer? We did the simplest thing and counted the number of posts per month over the group's entire existence (the red trend is an approximation by a 6th-order polynomial; do not ask why the 6th):



If, looking at the picture, we assume that June and July 2016 saw a rather unusual dip in activity, then a fairly obvious seasonality emerges in the flow of posts from dissatisfied moms.

Moms voice their discontent most actively in summer and least actively in winter.
There can be many explanations. For example:

  1. In winter there is not much you can do about it anyway, whereas in summer it feels as if all of life is passing you by while you sit with your child.
  2. Winter is bad in itself, so there is no pressing need to rationalize it through the problems of motherhood.
  3. Fewer moms (???) give birth in winter (???), and a fairly large share of the discontent is tied to childbirth and what follows it. Here is something about the frequency of births by month.

Choose the explanation you like.
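The monthly counting and the polynomial trend described in this section can be sketched as follows. This is an illustration rather than the original code: `posts_per_month` and `polyfit` are hypothetical helper names, the monthly counts are made up, and the fit is an ordinary least-squares fit solved through the normal equations so that the example needs nothing beyond the standard library.

```python
from collections import Counter
from datetime import date


def posts_per_month(dates):
    """Count posts for each (year, month), in chronological order."""
    counts = Counter((d.year, d.month) for d in dates)
    return sorted(counts.items())


def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients c such that y ~ c[0] + c[1]*x + ... + c[degree]*x**degree.
    """
    n = degree + 1
    # Augmented matrix of the normal equations (A^T A) c = A^T y.
    rows = [
        [sum(x ** (i + j) for x in xs) for j in range(n)]
        + [sum(y * x ** i for x, y in zip(xs, ys))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(n):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] / rows[i][i] for i in range(n)]


# Hypothetical post dates, only to show the counting step.
sample_dates = [date(2017, 1, 5), date(2017, 1, 9), date(2017, 2, 2)]
print(posts_per_month(sample_dates))

# Made-up monthly counts, only to show the trend step.
counts = [310, 335, 380, 420, 455, 470, 460, 430, 390, 350, 320, 300]
xs = [2 * i / (len(counts) - 1) - 1 for i in range(len(counts))]  # scale to [-1, 1]
coeffs = polyfit(xs, counts, degree=6)
trend = [sum(c * x ** k for k, c in enumerate(coeffs)) for x in xs]
```

The month index is rescaled to [-1, 1] before fitting, which keeps the high powers of x from causing numerical trouble in a 6th-order fit.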

Number of likes per month


Looking at the average number of likes per month is fairly pointless: the number of people in the group was constantly growing, so clearly something similar should happen to the likes. But let us look anyway.



Since I could not get my grubby hands on the group's official statistics, I can only assume that the number of users in the group changed in roughly this way, and that the number of likes simply depends on the number of users. But I will try a more cunning indicator.

I believe that the number of posts per month, Ni, is a good indicator of activity. If we now divide the average number of likes, Li, by Ni, we get a tricky indicator along the lines of “what share of the average number of likes is produced by one post this month”. That is, a kind of estimate of the "generating capacity" of posts to produce likes.
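As a small illustration of this indicator, the sketch below computes Ni, Li and the ratio Li / Ni for each month. The input format (pairs of publication date and like count), the sample numbers and the function name `like_generating_index` are assumptions, not taken from the original analysis.

```python
from collections import defaultdict
from datetime import date


def like_generating_index(posts):
    """Return {(year, month): (Ni, Li, Li / Ni)}.

    Ni is the number of posts in the month and Li is the average
    number of likes per post in that month.
    """
    likes_by_month = defaultdict(list)
    for day, likes in posts:
        likes_by_month[(day.year, day.month)].append(likes)
    index = {}
    for month, likes in sorted(likes_by_month.items()):
        n_i = len(likes)
        l_i = sum(likes) / n_i
        index[month] = (n_i, l_i, l_i / n_i)
    return index


# Hypothetical sample data: (publication date, number of likes)
sample = [
    (date(2017, 1, 5), 100),
    (date(2017, 1, 9), 200),
    (date(2017, 2, 1), 300),
]
print(like_generating_index(sample))
```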



And here an interesting thing appears. We see a seasonality that is the inverse of the seasonality of posts. Obviously so, because the number of posts is in the denominator. What does this tell us? Either that moms may write fewer posts in winter but read and like other people's posts no less actively than in summer, or that moms have little to do with it and most of the likes come from people who do not write to the group. The latter seems to me the most realistic explanation.

The number of posts per month does not work as an indicator of like activity. And this is quite an interesting conclusion for such a group: the hype is not created by the people who create the group's content.

Activity by day of the week


We assumed, reasonably enough, that the number of likes is a good indicator of the number of people in the group. Looking at the graph of likes, one can assume that the number of users stabilized somewhat in the first half of 2017. Therefore, activity by day of the week was computed for the first half of 2017, as the group's stable period. 0 is Monday, 6 is Sunday.
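The weekday numbering used here (0 for Monday, 6 for Sunday) matches what Python's `date.weekday()` returns, so the count can be sketched in a few lines. The function name `posts_by_weekday` and the sample dates are hypothetical; the date bounds follow the "first half of 2017" period named in the text.

```python
from collections import Counter
from datetime import date


def posts_by_weekday(dates, start, end):
    """Count posts per weekday (index 0 = Monday ... 6 = Sunday) within [start, end]."""
    counts = Counter(d.weekday() for d in dates if start <= d <= end)
    return [counts.get(day, 0) for day in range(7)]


# Hypothetical post dates; the last one falls outside the stable period.
sample_dates = [
    date(2017, 1, 2),  # Monday
    date(2017, 1, 9),  # Monday
    date(2017, 1, 8),  # Sunday
    date(2017, 7, 3),  # ignored: after June 30
]
print(posts_by_weekday(sample_dates, date(2017, 1, 1), date(2017, 6, 30)))
```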



Comments are almost superfluous, although one can assume that on Sundays the admins do not bother to publish posts and put most of them out on Monday.

An alternative explanation is that all hell breaks loose on the weekend, when everyone is at home, the husband makes demands, the child makes demands and there is no end in sight. With the husband around, of course, such posts do not get written. So as soon as one leaves for work in the morning and the other for kindergarten or school, moms sit down to write their essay for the public: “how I spent the weekend.”

Administration Intervention


With grubby hands, of course, the first thing one wants to look for is who messed up, who broke the rules (because they can) or did something else. The main protagonist here is, of course, the administration, which butts in with its judgments and advice on how to live correctly while not allowing others to do the same.

The administrators kindly marked their remarks in posts with prefixes such as “from adm:” or “from Demakova:”. But not all of them were "inadequate." Some were purely informational, like the one mentioned in the introduction: you must not, do not write, be careful ...

So I filtered out the informational messages and kept only the advice to unhappy authors on how to live, which is arrogant because it cannot be discussed. The result is this interesting graph:



It is immediately obvious that someone wanted to play god but quickly got fed up with it. Within six months the urge to socialize had faded somewhat. True, the last few months show some revival. Apparently the summer rise in activity captures them too.
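A rough sketch of how such a filter could look is given below. The two markers come from the text (in their translated form); everything else is an assumption: the article does not say how informational notices were separated from advice, so the keyword list `INFO_HINTS`, the function name and the sample posts are purely hypothetical.

```python
import re
from collections import Counter
from datetime import date

# Markers quoted in the text; the real posts are in Russian.
ADMIN_MARKER = re.compile(r"from\s+(?:adm|Demakova)\s*:", re.IGNORECASE)
# Hypothetical phrases that identify purely informational notices.
INFO_HINTS = ("at your own risk", "be careful")


def admin_advice_per_month(posts):
    """Count posts per month that carry a non-informational admin remark."""
    counts = Counter()
    for day, text in posts:
        match = ADMIN_MARKER.search(text)
        if not match:
            continue  # no admin remark in this post
        remark = text[match.end():].lower()
        if any(hint in remark for hint in INFO_HINTS):
            continue  # informational notice, not advice
        counts[(day.year, day.month)] += 1
    return dict(counts)


sample_posts = [
    (date(2016, 3, 1), "My husband never helps. From adm: leave him and live for yourself."),
    (date(2016, 3, 2), "Answering an earlier post. From adm: this is at your own risk, be careful."),
    (date(2016, 4, 1), "I am tired."),
    (date(2016, 4, 7), "He shouted at me again. from Demakova: this is abuse, read the book."),
]
print(admin_advice_per_month(sample_posts))
```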

Moms communicating around the rules


Moms are no less eager than the administrators to break something and write something extra to get around the rules. To do this they, again kindly, insert at the beginning of a post a link to the post being answered. Which makes it all the easier for me to count ... Right?



Interest in communication is awakened and fed by the arrival of new users. When no new users arrive, it apparently becomes uninteresting to give the same responses to very similar complaints. Thus the group's most stable period is marked by a rather sharp drop in the number of replies.

True, there is another possibility: the admins now delete replies more strictly.

Word frequency


Depicting the dynamics of word popularity (frequency) in posts is a great pain. So I will leave only 2017 here, although priorities have shifted somewhat since 2015. Naturally, all words are represented by their “roots” in order to merge the different forms of one word into one (the various inflected forms of “child”, for example).

It is worth mentioning that “child” is not just the word “child”. It also covers words like children, son, daughter, etc. “Husband” also covers “nitaka”, “devout”, etc. “Time” includes “year”, “day”, “hour”, “week”, etc. If they are not merged, these word forms with the same meaning fill the entire table of popular words.
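The merging step can be illustrated with a short sketch. It is not the author's code: `toy_stem` is a deliberately crude stand-in for a real stemmer (for Russian text one would presumably use something like the Snowball stemmer that ships with nltk), and the `CONCEPTS` mapping is a hypothetical, English-language excerpt of the groupings listed above.

```python
import re
from collections import Counter

# Hypothetical mapping from stems to the merged "concept" they count toward.
CONCEPTS = {
    "child": "child", "children": "child", "son": "child", "daughter": "child",
    "year": "time", "day": "time", "hour": "time", "week": "time",
}


def toy_stem(word):
    """Crude stand-in for a real stemmer: strip a trailing plural 's'."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def concept_counts(texts):
    """Count stems across texts, merging synonyms into one concept."""
    counts = Counter()
    for text in texts:
        for token in re.findall(r"[a-z]+", text.lower()):
            stem = toy_stem(token)
            counts[CONCEPTS.get(stem, stem)] += 1
    return counts


sample_texts = [
    "My son cries all day",
    "The children and my daughter woke up twice in one week",
]
counts = concept_counts(sample_texts)
print(counts.most_common(3))
```

Without the mapping, "son", "children" and "daughter" would each occupy their own row, which is exactly the clutter the merging is meant to avoid.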

The most popular words are at the top; popularity decreases downward.

| (2017, 1) | (2017, 2) | (2017, 3) | (2017, 4) | (2017, 5) | (2017, 6) | (2017, 7) | (2017, 8) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| child | child | child | child | child | child | child | child |
| time | time | time | time | time | time | time | time |
| husband | husband | husband | husband | husband | husband | husband | husband |
| mom | is simple | is simple | is simple | is simple | mom | mom | is simple |
| is simple | house | mom | mom | house | is simple | is simple | mom |
| could | mom | want to | house | mom | one | want to | genus |
| house | one | one | could | one | could | house | house |
| want to | could | den | den | of works | house | one | one |
| den | of works | house | one | want to | want to | life | den |
| talk | talk | of works | of works | talk | talk | of works | life |


Interestingly, in the group's early stages “husband” did not carry as much weight as it has since 2016 and did not make the top three. Apparently the general, somewhat misandric discourse shaped by the creators raised the importance of men as a cause of the troubles of motherhood (it is hard to imagine that husbands really became much worse over the past 2 years).

In general, the mothers' main problem topics are fairly obvious: lack of time, lack of opportunities, lack of help from the husband, unfulfilled desires, problems with work, with the home, and who said what to whom.

Tag frequency


One important indicator of the group's content is the hashtags used. They show which topics are current in a given period. Next to each hashtag is the number of times it was mentioned. Hashtags used fewer than 5 times are not shown.

(2017, 4)
(2017, 5)
(2017, 6)
(2017, 7)
(2017, 8)
Homewarding - 52.00
Homewarding - 54.00
Homewarding - 78.00
Home Matters - 81.00
Good morning - 60.00
happy maturity - 7.00
shastyamaterinstva - 7.00
Good morning to be - 11.00
child rejuvenation - 31.00
child rejuvenation - 58.00
shastyamaterinstva - 5.00
Happiness - 7.00
Happiness - 6.00
Happy party - 9.00
Nitakoy - 6.00
Good to be married - 7.00
to be a daughter - 5.00


In principle, until the summer of 2017 hashtags were not widely used, apart from the group-name hashtag in its various forms. In the summer of 2017 the topic of “rejuvenating childbirth” became popular. The hashtag "nitaco" did not catch on.
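A hashtag table of this kind can be built roughly as follows. The cutoff of 5 mentions comes from the text; the input format (pairs of `(year, month)` and post text), the sample data and the name `hashtag_table` are assumptions made for this sketch.

```python
import re
from collections import Counter, defaultdict

HASHTAG = re.compile(r"#\w+")  # \w also matches Cyrillic letters in Python 3


def hashtag_table(posts, min_count=5):
    """Return {month: [(hashtag, count), ...]} keeping tags used at least min_count times."""
    per_month = defaultdict(Counter)
    for month, text in posts:
        per_month[month].update(tag.lower() for tag in HASHTAG.findall(text))
    return {
        month: [(tag, n) for tag, n in counts.most_common() if n >= min_count]
        for month, counts in per_month.items()
    }


# Hypothetical sample: five posts with one tag, one post with a rare tag.
sample_posts = [((2017, 8), "Again the same story #Nitakoy")] * 5
sample_posts.append(((2017, 8), "Something else #rare"))
print(hashtag_table(sample_posts))
```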

TF-IDF


The most frequent words usually carry no specific subject matter. It is clear that since the group is about motherhood, it talks about moms, husbands, children and the like. But it would be interesting to know what specifically worried people in different periods of the group's existence. For this I use the TF-IDF criterion for ranking words, here in a variation where the IDF is calculated over windows of 6 monthly periods.

I will not explain what it is in detail, but roughly it picks out the most important things that worried people in a given period beyond the general line of the whole public: words that are very frequent in this month and practically absent in the previous 6 months.
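For those who do want to see the idea spelled out, here is one possible reading of "TF-IDF with a 6-month window", written as a sketch. The exact formula used for the article is not given, so the choices below are assumptions: each month is treated as one document, TF is the word's share of that month's tokens, and IDF is the logarithm of the window size divided by the number of months in the window (the current month plus up to 6 preceding ones) that contain the word.

```python
import math
from collections import Counter


def windowed_tfidf(month_tokens, window=6):
    """Rank words per month by TF-IDF with a sliding IDF window.

    month_tokens: list of token lists, one per month, in chronological order.
    Returns one list per month of (word, score) pairs, best first.
    """
    month_sets = [set(tokens) for tokens in month_tokens]
    ranked = []
    for i, tokens in enumerate(month_tokens):
        docs = month_sets[max(0, i - window): i + 1]  # current month included
        scores = {}
        for word, count in Counter(tokens).items():
            df = sum(1 for doc in docs if word in doc)  # at least 1
            scores[word] = (count / len(tokens)) * math.log(len(docs) / df)
        ranked.append(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
    return ranked


# Hypothetical, already stemmed tokens for two consecutive months.
months = [
    ["child", "husband", "christmas", "christmas"],
    ["child", "husband", "march"],
]
result = windowed_tfidf(months)
print(result[1][0])
```

Words present in every month of the window get a score of zero, so the everyday vocabulary of the group drops out and only the newly appearing words remain at the top.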

| (2017, 1) | (2017, 2) | (2017, 3) | (2017, 4) | (2017, 5) | (2017, 6) | (2017, 7) | (2017, 8) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| christmas | globally | March | vybeshiva | nitac | chaos | childbirth | childbirth |
| is dead | Samoyed | posed | ukat | sat down | strangle | rejuvenated | smoked |
| product | old | rent | knocked down | bolt | medicament | fagot | chesslov |
| hanging | silent | knocked down | novopass | brought | sarcasm | fire | episode |
| howl | zakid | stuffed | torpl | bacter | umet | thirty | scoliosis |
| crush | lived | diplomat | though | will call | vyaza | banged | fount |
| after drinking | candy wrapper | boiled over | prank | comfort | upas | suffered | remote |
| flat | parent | feminine | will come | climbed | hostess | flush | thick |
| duty | mood | huyn | wipe out | five years | hospitalized | pulse | hyperhidrosis |
| Bibik | intimate | leave | here you are | will go away | push | crawled | hell |


It should be noted that “rejuvenating childbirth” has an extremely high TF-IDF, about 40, compared with other first-place words. That is roughly 10 times the typical first-place value of about 3-4. Only the word “flashmob”, along with a few other words, reached a comparable value in the spring of 2016:


I'm afraid to even imagine what it was.

Bigrams


The most frequent pairs of words.

| (2017, 4) | (2017, 5) | (2017, 6) | (2017, 7) | (2017, 8) |
| --- | --- | --- | --- | --- |
| feel like | everyday | everyday | everyday | after childbirth |
| everyday | me just | all day | after childbirth | everyday |
| Eat me | after birth | even | feel like | all day |
| all day | Eat me | feel like | it was necessary to | feel like |
| guilt | feel like | it was necessary to | all day | after birth |
| me just | may be | of my life | Eat me | it was necessary to |
| after childbirth | it was necessary to | after birth | after birth | thank God |
| it could be | all day | all day | recent times | right after |
| even | moment when | order to | most | it could be |
| all day | after that, | all this | all day | in a month |


One senses that the routine of what is happening and the feeling of missed opportunities clearly bring no joy. However, this conclusion is banal, as is the fact that right after childbirth some kind of mess always begins.

Purely out of sporting interest, it should be noted that the frequent bigrams are closely tied to the equally frequent theme of time in the texts. There are far fewer stable pairs about childbirth and even fewer about husbands.
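Counting frequent word pairs takes only a few lines. The sketch below is an illustration, not the code behind the table: `ngram_counts` is a hypothetical name, the tokenizer is a plain regular expression, and n-grams are counted within each post so that pairs never span two posts. Setting `n=3` gives triples in the same way.

```python
import re
from collections import Counter


def ngram_counts(texts, n=2):
    """Count n-grams (tuples of n consecutive words) within each text."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"\w+", text.lower())
        # zip over n shifted copies of the token list yields consecutive n-grams
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


sample_texts = ["All day every day", "all day long"]
bigrams = ngram_counts(sample_texts, n=2)
trigrams = ngram_counts(sample_texts, n=3)
print(bigrams.most_common(1))
```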

Augmented Bigrams


By themselves, the bigrams do not reveal much emotion or context. So for each of the most popular bigrams we tried to find the words that most often occur close to it (the top 5 words).

| Bigram | Words that often appear near the bigram |
| --- | --- |
| feel like | [(mater, 10), (women, 7), (husband, 6), (could, 6), (terrible, 6)] |
| everyday | [(one, 21), (child, 17), (affairs, 14), (husband, 14), (each, 11)] |
| all day | [(husband, 8), (game, 6), (child, 6), (mn, 5), (ho, 5)] |
| me just | [(forces, 10), (could, 4), (reb, 3), (lyubl, 3), (duma, 3)] |
| after childbirth | [(first, 14), (year, 14), (pregnant, 13), (month, 11), (right away, 10)] |
| it was necessary to | [(Duma, 7), (Children, 5), (Affairs, 5), (Speaking, 5), (Mat, 5)] |
| all day | [(home, 10), (husband, 10), (mouth, 8), (night, 8), (child, 8)] |
| after birth | [(child, 28), (son, 11), (month, 10), (reb, 9), (nka, 9)] |
| even | [(game, 6), (bud, 5), (husband, 5), (evening, 4), (child, 4)] |
| Eat me | [(very, 6), (could, 6), (son, 6), (husband, 5), (one, 5)] |


The number next to each word stem in the second column shows how many times in 2017 that word occurred less than 4 words away from the bigram in the first column.
How can this be interpreted?

For example, the most common problem is that “every day” the mother is “alone”, as can be seen from the second row. And after the "first" childbirth something happens "right away."

However, the abundance of "most frequent words", which are typical of any text in this public, is confusing. To fix this a little, we exclude the most popular words from the search for nearby words. This lets us see which words are specific to these bigrams rather than to the public as a whole.

| Bigram | Words that often appear near the bigram |
| --- | --- |
| feel like | [(Mater, 10), (women, 7), (terrible, 6), (happy, 6), (last, 6)] |
| everyday | [(every, 11), (simple, 11), (cheat, 10), (mouth, 9), (hate, 9)] |
| all day | [(game, 6), (mno, 5), (cartoon, 5), (descent, 4), (hands, 4)] |
| me just | [(forces, 10), (lyubl, 3), (duma, 3), (killing, 3), (zna, 3)] |
| after childbirth | [(first, 14), (pregnant, 13), (at once, 10), (hair, 9), (became, 9)] |
| it was necessary to | [(duma, 7), (dialect, 5), (mate, 5), (simple, 4), (neighbor, 4)] |
| all day | [(mouth, 8), (mornings, 7), (move, 7), (slept, 5), (yelling, 5)] |
| after birth | [(nca, 9), (younger, 9), (right, 5), (simple, 4), (early, 4)] |
| even | [(game, 6), (bud, 5), (evening, 4), (equal, 4), (sleeps, 4)] |
| Eat me | [(sem, 4), (simple, 4), (sign, 3), (girlfriends, 3), (feelings, 3)] |
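Both tables in this section can be produced by one small routine, sketched here under stated assumptions. "Less than 4 words from the bigram" is read as at most 3 tokens before its first word or after its second word; the `exclude` set plays the role of the filtered list of the public's most popular words. The function name, the sample sentence and the exclusion list are hypothetical.

```python
from collections import Counter


def words_near_bigram(token_lists, bigram, max_distance=3, exclude=frozenset(), top=5):
    """Return the `top` words most often found within max_distance tokens of the bigram."""
    first, second = bigram
    counts = Counter()
    for tokens in token_lists:
        for i in range(len(tokens) - 1):
            if tokens[i] == first and tokens[i + 1] == second:
                left = tokens[max(0, i - max_distance): i]
                right = tokens[i + 2: i + 2 + max_distance]
                counts.update(w for w in left + right if w not in exclude)
    return counts.most_common(top)


# Hypothetical tokenized post.
post = "i am alone with the child every day and my husband is at work".split()
common_words = {"the", "and", "my", "with"}  # stand-in for the public's top words
unfiltered = words_near_bigram([post], ("every", "day"))
filtered = words_near_bigram([post], ("every", "day"), exclude=common_words)
print(filtered)
```

Calling the function without `exclude` corresponds to the first table, and passing the list of popular words corresponds to the second.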


Trigrams


The most frequent triples of words.

(2017 4)
(2017 5)
(2017 6)
(2017 7)
(2017 8)
guilt before
in few days
instead of
love my son
immediately after delivery
strong enough to
instead of
also mother is to blame
instead of
after the first birth
be strong enough
every time when
after giving birth
every time when
the biggest mistake
need to be enough
day after birth
only when
after the second birth
I can afford
mother Mother Mother
fuss. fuss. fuss.
me most


As we can see, August was marked by posts about childbirth, but over the entire period from mid-2015 the main trigram topics were:


The author and nitakoy patriarchal mimocrocodile


Of particular interest is the use of certain words characteristic of the group and its discourse.

Feminist discourse had a rather strong influence on the group because of the administration's ideology. The dynamics of feminist coinages in posts is therefore interesting. The most commonly used one is the artificial feminitive “authorica” (a feminine form of “author”) applied to the mothers who write.



Interestingly, use of this word declined somewhat in early 2017. Perhaps this is because at that time the administration was not interfering much in the life of the group, and it is the administration that uses this word most often in its comments.

The word "patriarchal" is used less often, but it does occur.



In general, everything hints that interest in this ideology peaked in mid-2016, around the very “flash mob” that was so often mentioned at that time.

But there are other characteristic words taken from different contexts. For example, the word "mimocrocodile". For those who do not know, it means, for example, a commenter who turned up in the public with a very important and useful opinion. More generally, someone who was passing by and said something, and would have done better to keep walking.



The onset and peak of this word's use coincide with the peak of replies to moms' posts in the group. The word clearly arose from dissatisfaction with the results of such commenting. Later, replies to posts became fewer and the word stopped being used so actively.

And finally, the designation of the husband as "nitakoy."



The most beautiful graph. It shows how a meme takes root in a group: its use becomes ubiquitous and the number of mentions of “nitakoy” starts to grow exponentially.
In general, it is worth noting that feminist coinages are used much less often and take root worse than the group's own subject-specific expressions.

Dynamics of negativity in the group


The question arises: how does the group affect its own authors? How much do they change? Does the group breed anger and intolerance in those who write, growing with the number of posts? Or, on the contrary, the realization that a great many people have similar problems?

We decided to check. We collected "bad" words and compiled two lists. Here is an abridged version of the second:

crap, fucking, dick, fag, dick, ohotorny, nikher, fuckin, pussy, chobl, shit, rubbed, fuck, fuck, dick, nahr, fuck, fuck, blah, fuck, fuck, pzdts

Then we looked at how the average number of these bad words per post varies by month.
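A sketch of this measurement is shown below. It is not the original code: `BAD_STEMS` is a tiny illustrative subset of the list above, a word is counted as "bad" when it starts with one of the stems (an assumption about how the matching was done), and the function names and sample posts are hypothetical.

```python
import re
from collections import defaultdict
from datetime import date

BAD_STEMS = ("crap", "shit")  # tiny illustrative subset of the real list


def count_bad_words(text, stems=BAD_STEMS):
    """Count tokens that start with any of the given stems."""
    tokens = re.findall(r"\w+", text.lower())
    return sum(1 for token in tokens if token.startswith(stems))


def mean_bad_words_per_month(posts, stems=BAD_STEMS):
    """Return {(year, month): average number of bad words per post}."""
    per_month = defaultdict(list)
    for day, text in posts:
        per_month[(day.year, day.month)].append(count_bad_words(text, stems))
    return {month: sum(c) / len(c) for month, c in sorted(per_month.items())}


sample_posts = [
    (date(2017, 1, 3), "This crap again, shitty day"),
    (date(2017, 1, 9), "Slept well"),
    (date(2017, 2, 2), "Crappy weather"),
]
print(mean_bad_words_per_month(sample_posts))
```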



In general, the amount of swearing appears to decrease over time, though not with certainty. Perhaps this is the administration's policy. But maybe not, because the administration does not mind husbands, children and relatives being cursed out. Maybe the group simply makes us all a little kinder. Or everyone is just tired.

And how do readers rate all this? Is a post with swearing more attractive? We chose the last 6 months (02.2017-2017) as the most stable period in the group's history. For it, we calculated the average number of likes depending on the number of bad words in the post.



On average, the correlation is not very convincing given the spread of the ratings. So we can safely assume that swearing like a cobbler will hardly earn you more likes.
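The grouping behind such a graph can be sketched like this. The input is assumed to be a list of `(number of bad words, number of likes)` pairs, one per post; the function name and the numbers are made up. Reporting the standard deviation next to each mean is what makes it visible that small differences between the means sit well inside the spread.

```python
from collections import defaultdict
from statistics import mean, pstdev


def likes_by_bad_word_count(posts):
    """Return {bad word count: (mean likes, standard deviation of likes)}."""
    groups = defaultdict(list)
    for bad_words, likes in posts:
        groups[bad_words].append(likes)
    return {k: (mean(v), pstdev(v)) for k, v in sorted(groups.items())}


# Hypothetical sample: (bad words in the post, likes received)
sample = [(0, 280), (0, 300), (1, 310), (1, 270), (3, 295)]
print(likes_by_bad_word_count(sample))
```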

The most liked words


The question remains: which words lead to a post being rated positively? We seem to have shown that swearing does not help much. So the experiment should be run as follows.

We took the posts from the last 6 months. For every word occurring in these posts, we recorded how many likes the post received. After going through all the posts, each word had accumulated a sample of like counts. The sample was averaged if it was large enough.

This singled out the words that occurred ONLY in posts that usually collected far more likes than average:

go, discharge, give birth, man, says, should, man, years, child, cook, childhood, fuck, new, our, money, your

The average “number of likes” for these words ranges from 370 to 440, against an overall average of 290.

Least successful words


If the most successful words can be found, so can the words that "guaranteed" a lack of likes, where the average number of likes "per word" was far below the overall average.

fever, scary, tearing, hysterical, relive, refuses, cough, tantrums, face

The likes “for such words” range from 214 to 230, against an overall average of 290.

Words resulting in the smallest standard deviation in the estimates


Besides the words with the best and worst ratings, one can also find words for which the ratings of the posts containing them were always very similar. Such words "guarantee", as it were, that people's assessment of the post will not vary much. These are the words that pin down the score most firmly, whether it is negative or positive.

she, screaming, wild, chest, only, little, suddenly, alone, her, mom, together, wanted.

The standard deviation for these words ranges from 73 to 88, against an overall value of 190.
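All three word lists (most liked, least liked, most stable) rest on the same per-word sample of like counts, so one sketch covers them. The assumptions are labeled in the code: the minimum sample size `min_posts` is hypothetical (the text only says the sample had to be "large enough"), a word is counted once per post, and the names and numbers are invented for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def word_like_stats(posts, min_posts=2):
    """Return {word: (mean likes, std of likes, number of posts)}.

    posts: iterable of (tokens, likes). Words seen in fewer than
    min_posts posts are dropped (hypothetical threshold).
    """
    samples = defaultdict(list)
    for tokens, likes in posts:
        for word in set(tokens):  # each post contributes once per word
            samples[word].append(likes)
    return {
        word: (mean(v), pstdev(v), len(v))
        for word, v in samples.items()
        if len(v) >= min_posts
    }


# Hypothetical sample: (stemmed tokens of a post, likes received)
sample = [
    (["husband", "money"], 400),
    (["husband", "years"], 440),
    (["cough", "fever"], 220),
    (["cough", "face"], 230),
]
stats = word_like_stats(sample)
most_liked = max(stats, key=lambda w: stats[w][0])
most_stable = min(stats, key=lambda w: stats[w][1])
print(most_liked, most_stable)
```

Sorting by the mean gives the most and least liked words, and sorting by the standard deviation gives the words whose posts are rated most uniformly.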

The concept of the perfect post


It remains to work out which plot causes the greatest and the least resonance. With the perfectly underrated post everything is fairly simple. Its plot can be traced quite clearly from the set of "underrated" words.

My child is sick. Temperature 39.8, cough. He refuses to eat, throws tantrums, throws things and is terribly angry. I break down and get hysterical too. I walk around the house with a sour face all the time. How do I survive all this?

Naturally, such a super-underrated post containing all the "bad" words could be given more detail and made more lifelike, but my job is simply to convey a scenario that evokes no compassion in others.

An interesting aspect of this scenario is that it is underrated because there is no enemy image. The child is sick and hysterical. Mom cannot cope either. All this is logical and understandable, if unpleasant. There is no one here to heap abuse on. In short, there is nothing to pity and nothing to sympathize with.

With the set of good words everything is somewhat harder. No ideal picture emerges, except that there should be a husband, childbirth, discharge from the hospital, money and years, preferably wasted ones. But one can try.

Right after discharge, on the very same day, the man says that he is not going to do anything. Cleaning our apartment, giving birth and cooking should be done by women. And when it comes to earning money, that is none of his business either. A man, what can you say. I spent years of my life on this monster, and I am supposed to give as many again to the child? Bastard nitakoys: “go on, fuck you all.”

A clearly defined antagonist in the form of a husband may well guarantee you quite a lot of likes. At the same time, it is obvious that practically anyone can play the antagonist: a doctor at the hospital, for example, or the grandparents.

Summary / Conclusion


The huge number of separate measurements does not allow me (at least) to write a beautiful, juicy conclusion with a global verdict about life.

So here is a list of a few tentative micro-conclusions:


That, in fact, is all. There are methodological flaws in all of this. For example, there is no proper comparison of the public's specific vocabulary with an external (or baseline) vocabulary. Some slightly deeper and more fun questions involving neural networks and post generation were also left out. Again, there are no code examples. But that would have bloated the text even more, and most likely everyone can count words in Python and use nltk themselves (besides, I am not the best Pythonista role model to be showing off code).

If you have your own insights and interesting ideas about all this, I am always ready to listen.

Source: https://habr.com/ru/post/346554/

