
We analyzed 862 unique SMS texts for word frequency, sorted them by city, and grouped them by recipient. The source material was texts that were briefly publicly available through a popular search service.
To prevent the recipient of the message from being identified, we deleted phone numbers, texts containing passwords, and other information that could damage senders or recipients.
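A cleanup step of this kind might look roughly like the sketch below. The actual tooling is not described, so the phone-number pattern, the keyword list, and the name `sanitize` are all assumptions made for illustration.

```python
import re

# Hypothetical pattern: an optional "+", then a run of 10 or more digits,
# spaces, dashes, or parentheses that starts and ends with a digit.
PHONE_RE = re.compile(r"(?<!\d)\+?\d[\d\s\-()]{8,}\d(?!\d)")

# Hypothetical keyword list; a real filter would need a longer, reviewed list.
SENSITIVE_WORDS = ("password", "passcode")


def sanitize(messages):
    """Drop messages that mention sensitive keywords and strip phone numbers."""
    cleaned = []
    for message in messages:
        lowered = message.lower()
        if any(word in lowered for word in SENSITIVE_WORDS):
            continue  # skip the whole message, as with texts containing passwords
        without_phone = PHONE_RE.sub("", message)
        # Collapse the whitespace left behind by the removed number.
        cleaned.append(" ".join(without_phone.split()))
    return cleaned


sample = [
    "Call me +7 (912) 345-67-89 tonight",
    "Your password is qwerty",
    "Miss you",
]
cleaned = sanitize(sample)
```

A regex like this will miss unusual number formats, so a manual review pass would still be needed before publishing anything.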
We are pursuing this project solely for research and promotional purposes.
Some results are below the cut.
Quantitative parameters
The number of unique messages in the database: 862
Regions of SMS recipients (top 5):
- Moscow - 399
- St. Petersburg - 60
- Samara region - 40
- Orenburg region - 31
- Republic of Bashkortostan - 28
Sending period: July 7, 2011 to July 16, 2011
Total number of words (including prepositions): 23,581
Number of non-repeating words: 5,559
Average number of words per SMS: 27.3, of which 6.4 are non-repeating
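Figures like these can be reproduced with a simple counter. The sketch below is an illustration, not the tooling used for the study: the tokenizer (a lowercase `\w+` regex) and the names `tokenize` and `corpus_stats` are assumptions. The reported 6.4 appears to be the corpus vocabulary divided by the number of messages (5,559 / 862 ≈ 6.4), so that is what the sketch computes.

```python
import re
from collections import Counter

WORD_RE = re.compile(r"\w+")


def tokenize(text):
    """Split a message into lowercase word tokens (assumed tokenizer)."""
    return WORD_RE.findall(text.lower())


def corpus_stats(messages):
    """Count total words, distinct words, and per-message averages."""
    counts = Counter()
    for message in messages:
        counts.update(tokenize(message))
    total = sum(counts.values())
    unique = len(counts)
    n = len(messages)
    return {
        "messages": n,
        "total_words": total,
        "unique_words": unique,
        "avg_words_per_message": total / n if n else 0.0,
        # Vocabulary size divided by message count, as in the figures above.
        "avg_unique_per_message": unique / n if n else 0.0,
        "top_words": counts.most_common(5),
    }


sample = ["I love you, call me", "Call me tomorrow", "love you"]
stats = corpus_stats(sample)
```

Different tokenization rules (for example, how emoticons or hyphenated words are treated) would shift the totals, so the exact numbers depend on that choice.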
Limitations
The SMS database has certain limitations and should not be treated as a representative source:
- only SMS sent via web forms were included in the sample;
- the sampling technique is not random and is based on Yandex algorithms;
- the total number of SMS sent during the period is unknown.
Text messages sent via web forms differ in average length (more characters), in audience and, accordingly, in content. Web forms are most often used in the following cases:
- to save money;
- the cell phone is missing or locked;
- to preserve anonymity;
- other: out of habit, because a long text takes too long to type on a phone, or because a web form is more convenient.
Yandex search results listed more than 8,000 entries, but fewer than 1,000 were available for viewing. The search results did not include all messages sent through the service, only those ranked by the search robot.
Nevertheless, the source is of interest for the study of modern writing because the texts are original and private.
Some observations
The positive attitude of most users is pleasing: they love each other, kiss, miss and wait, and ask each other to write, speak and call more. They congratulate each other on the Day of Love, Family and Loyalty, on birthdays and weddings, and they come more often than they leave.
On the other hand, “love” often appears next to the particle “not”, and the texts contain words and threats that would not pass a family filter.
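One way to measure how often a word sits next to a negating particle is to check the token directly before it. This is a minimal sketch under that assumption; the function name `negated_count` is hypothetical, and the English sample stands in for the original Russian texts, where the particle also precedes the verb.

```python
import re


def negated_count(messages, word, particle):
    """Return (occurrences of word, occurrences directly preceded by particle)."""
    total = 0
    negated = 0
    for message in messages:
        tokens = re.findall(r"\w+", message.lower())
        for i, token in enumerate(tokens):
            if token == word:
                total += 1
                if i > 0 and tokens[i - 1] == particle:
                    negated += 1
    return total, negated


sample = ["I do not love you", "love you", "I love you, not him"]
total, negated = negated_count(sample, "love", "not")
```

On this sample the word occurs three times and is negated once; the third message contains "not" but not directly before "love", so it is not counted.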
For most users home ranks above work, but money is written about much more often.
Good, joy and happiness prevail over evil and problems. Users write about their mother more often than about their wife and children, and they prefer today and tomorrow, recalling less often what happened yesterday.
Most messages are so full of emotion that the number of exclamation marks and emoticons approaches the number of letters. The texts are not notable for their literacy: errors are becoming the norm, and messages written without mistakes are rare.
Summary
Surprisingly, the number of "good" messages exceeded the number of "evil" ones, although at first glance everything looked the other way around.