
Analysis of SMS texts of Megafon users


We analyzed 862 unique SMS texts for word frequency, distributed them by city, and grouped them by recipient. The texts were taken from messages that were publicly available for a short time through a popular search service.

To prevent recipients from being identified, we deleted phone numbers, texts containing passwords, and other information that could harm senders or recipients.
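Phone-number removal of this kind is usually done with a regular expression. The sketch below is only an illustration, not the authors' procedure, which the article does not describe. The pattern `PHONE_RE` and the function `redact_phones` are hypothetical. They cover only common formats of Russian mobile numbers: `+7` or `8` followed by ten digits, optionally split by spaces, hyphens, or brackets.

```python
import re

# Hypothetical pattern: "+7" or "8", then 3-3-2-2 digit groups that may be
# separated by spaces, hyphens or brackets, e.g. "+7 926 123-45-67",
# "8 (926) 123 45 67" or "89261234567". The lookarounds stop it from
# matching inside a longer run of digits.
PHONE_RE = re.compile(
    r"(?<!\d)(?:\+7|8)[\s\-(]*\d{3}[\s\-)]*\d{3}[\s\-]*\d{2}[\s\-]*\d{2}(?!\d)"
)


def redact_phones(text: str, placeholder: str = "[phone]") -> str:
    """Replace every substring that looks like a Russian mobile number."""
    return PHONE_RE.sub(placeholder, text)


print(redact_phones("Call me at 8 (926) 123 45 67 tonight"))
# prints: Call me at [phone] tonight
```

A production filter would need more formats (landlines, foreign numbers) and a manual review pass.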
We are interested in this project solely for research and promotional purposes.

Some of the results are presented below.


Quantitative parameters


The number of unique messages in the database: 862

Top 5 regions of SMS recipients:
  1. Moscow - 399
  2. St. Petersburg - 60
  3. Samara region - 40
  4. Orenburg region - 31
  5. Republic of Bashkortostan - 28
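A top-5 list like this is a plain frequency count over the recipients' regions. Here is a minimal sketch, assuming each message has already been tagged with a region; the article does not say how regions were determined. The function `top_regions` and the `(region, text)` record shape are hypothetical.

```python
from collections import Counter
from typing import Iterable


def top_regions(records: Iterable[tuple[str, str]], k: int = 5) -> list[tuple[str, int]]:
    """Return the k most frequent regions from (region, text) records."""
    counts = Counter(region for region, _text in records)
    return counts.most_common(k)


sample = [
    ("Moscow", "msg 1"),
    ("Moscow", "msg 2"),
    ("St. Petersburg", "msg 3"),
]
print(top_regions(sample, k=2))
# prints: [('Moscow', 2), ('St. Petersburg', 1)]
```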

Sending period: July 7, 2011 to July 16, 2011
Total number of words (including prepositions): 23,581
The number of non-repeating words: 5,559
Average number of words per SMS: 27.3, of which non-repeating: 6.4
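Counts like these can be reproduced with simple tokenization. Here is a minimal sketch, assuming words are lower-cased runs of Unicode word characters; the article does not describe its tokenizer. The names `tokenize`, `word_stats` and `top_words` are hypothetical.

The reported averages match the totals divided by the number of messages (23,581 / 862 ≈ 27.36 and 5,559 / 862 ≈ 6.45). The second figure therefore looks like the overall vocabulary size per message rather than a per-message count. That reading is ours and is not stated in the original, so the sketch computes both interpretations.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of Unicode letters, digits or underscores.
WORD_RE = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    return WORD_RE.findall(text.lower())


def word_stats(messages: list[str]) -> dict[str, float]:
    """Corpus-level counts similar to those listed above (messages must be non-empty)."""
    tokenized = [tokenize(m) for m in messages]
    n = len(messages)
    total = sum(len(t) for t in tokenized)
    vocabulary = {w for t in tokenized for w in t}
    return {
        "messages": n,
        "total_words": total,
        "unique_words": len(vocabulary),
        "avg_words_per_sms": total / n,
        # corpus vocabulary divided by message count (one reading of "6.4")
        "vocabulary_per_sms": len(vocabulary) / n,
        # mean number of distinct words inside a single message (the other reading)
        "avg_distinct_in_sms": sum(len(set(t)) for t in tokenized) / n,
    }


def top_words(messages: list[str], k: int = 10) -> list[tuple[str, int]]:
    """The k most frequent words across all messages."""
    return Counter(w for m in messages for w in tokenize(m)).most_common(k)
```

For example, for `["I love you", "Love you too!"]` there are 6 words in total and 4 distinct ones, so `vocabulary_per_sms` is 2.0, while `avg_distinct_in_sms` is 3.0.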

Limitations


The SMS database has certain limitations and should not be considered a representative source.

Limitations: text messages sent via web forms are longer on average and differ in audience and, accordingly, in content. Web forms are most often used in the following cases:
Yandex search results listed more than 8,000 entries, but fewer than 1,000 of them were available for viewing. The search results did not include all messages sent through the service, only those indexed by the search robot.

Nevertheless, the source is of interest for studying modern written language because the texts are original and private.

Some observations


The positive attitude of most users is pleasing: they love each other, kiss, miss and wait for each other, and ask each other to write, talk, and call more. They congratulate each other on the Day of Family, Love and Fidelity, on birthdays, and on weddings, and they arrive more often than they leave.

On the other hand, “love” often appears next to the particle “not”, and the texts contain words and threats that would not get past family filters.
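One way to measure how often “love” comes with “not” is to check the token right before each “love” word. This sketch is not the article's method and rests on several assumptions:

- Russian words are matched by the hypothetical stem `люб`, as in “люблю” and “любовь”.
- The negation is the particle `не`.
- Only the immediately preceding token counts, so “не очень люблю” is missed.

The function name `negated_share` is hypothetical.

```python
import re


def negated_share(messages: list[str], stem: str = "люб", negation: str = "не") -> float:
    """Fraction of words starting with `stem` that directly follow `negation`.

    Stem matching is a crude heuristic: "люб" also catches "любой" ("any").
    Returns 0.0 when no word starts with `stem`.
    """
    total = negated = 0
    for message in messages:
        tokens = re.findall(r"\w+", message.lower())
        for i, token in enumerate(tokens):
            if token.startswith(stem):
                total += 1
                if i > 0 and tokens[i - 1] == negation:
                    negated += 1
    return negated / total if total else 0.0
```

For `["Я тебя люблю", "Не люблю ждать", "Любовь!"]` one of the three “love” words is negated, so the share is 1/3.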

For most users, home ranks above work, but money comes up far more often. Good, joy, and happiness win out over evil and problems. Users write about their mother more often than about their wife and children. They prefer today and tomorrow and rarely recall what happened yesterday.

Most messages overflow with emotion, to the point where exclamation marks and emoticons nearly rival letters in number. The texts are not particularly literate. Errors have become the norm, and few messages are written without them.
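The claim about exclamation marks and emoticons can be checked per message as a ratio of those symbols to letters. This is a minimal sketch with hypothetical names. `EMOTICON_RE` is a deliberately small pattern covering `:)`, `;-)`, `:D`, `:P` and the bracket-only smileys such as `)))` that are common in Russian SMS. The function is `emotion_ratio`.

```python
import re

# Hypothetical emoticon pattern: ":)", ":-(", ";)))", ":D", ":P", or a run
# of two or more bare brackets such as "))" or "(((".
EMOTICON_RE = re.compile(r"[:;]-?(?:\)+|\(+|[DPp])|\){2,}|\({2,}")


def emotion_ratio(text: str) -> float:
    """(exclamation marks + emoticons) per letter; 0.0 if there are no letters."""
    letters = sum(ch.isalpha() for ch in text)
    exclamations = text.count("!")
    emoticons = len(EMOTICON_RE.findall(text))
    return (exclamations + emoticons) / letters if letters else 0.0
```

A ratio close to 1 would mean the symbols really do catch up with the letters. One limitation: the `D` in `:D` is still counted as a letter.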

Summary


Surprisingly, "good" comments outnumbered "evil" ones, although at first glance it seemed the other way around.

Source: https://habr.com/ru/post/124464/

