The rise of the Internet and search engines has opened new opportunities for studying human behavior. Every time users type a query into the search box, they share a grain of information about their lives and interests.
By aggregating the queries of millions of people, we can identify trends, dynamics, and interdependencies that arise as the interests of particular groups change.
Such studies give scientists room to work and help make search engines more effective. For example, by analyzing how user queries depend on one another within a demographic segment, you can improve search suggestions. This technology is already used in the search suggestions of Poisk@Mail.Ru.
Search engine developers devote a lot of attention to analyzing search query logs. Often, however, such studies are limited to user queries within a single search session (usually lasting up to 30 minutes), or they focus on query frequency and aggregate popularity data. In both cases, researchers typically look at the information over a short time span. But human life is multifaceted, and people's interests change over time. This is why retrospective, long-term analysis of search query logs opens up truly rich research opportunities.
We examined the anonymized search logs of several million users over the past year and concluded that search queries:
• have a long-term effect;
• change over time;
• affect each other;
• are interdependent;
• differ by gender and age.
Studies of the relative dependence between queries are especially revealing, and graphs illustrate them well.

The horizontal axis is time. Day zero is the day on which the initial query “a” was made; we measure the correlation of query “b” relative to it. To the left are the 250 days preceding the query, to the right are the days after it.
The vertical axis is “user interest”. It shows the relative likelihood that query “b” is made by the average user who searched for the initial query “a” on day zero. Interest may be reduced (less than one) or increased (greater than one). A value of one is the “average temperature”, which usually means there is no relationship between the queries.
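To make the two axes concrete, here is a minimal sketch of how such a curve could be computed. It is not our production code. The log format (tuples of user, day, query), the function name `relative_interest`, the choice of the first occurrence of “a” as day zero, and the baseline (the share of user-days in the whole log that contain “b”) are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta


def relative_interest(log, query_a, query_b, max_offset=250):
    """Return {day_offset: ratio} for query_b around day zero of query_a.

    log is an iterable of (user_id, day, query) tuples; day is a datetime.date.
    Assumption: day zero is the first day a user issued query_a.
    A ratio of 1.0 means "same as the average user on an average day".
    """
    days_b = defaultdict(set)  # user -> days on which the user searched query_b
    zero_day = {}              # user -> first day the user searched query_a
    users = set()
    first_day = last_day = None

    for user, day, query in log:
        users.add(user)
        first_day = day if first_day is None else min(first_day, day)
        last_day = day if last_day is None else max(last_day, day)
        if query == query_b:
            days_b[user].add(day)
        if query == query_a and (user not in zero_day or day < zero_day[user]):
            zero_day[user] = day

    if not zero_day:
        return {}

    # Baseline (assumed definition): chance that an arbitrary user searches
    # query_b on an arbitrary day covered by the log.
    span_days = (last_day - first_day).days + 1
    user_days_with_b = sum(len(days) for days in days_b.values())
    baseline = user_days_with_b / (len(users) * span_days)
    if baseline == 0:
        return {}

    curve = {}
    for offset in range(-max_offset, max_offset + 1):
        shift = timedelta(days=offset)
        hits = sum(
            1 for user, d0 in zero_day.items()
            if d0 + shift in days_b.get(user, ())
        )
        # Share of query_a users who searched query_b at this offset,
        # divided by the baseline share.
        curve[offset] = (hits / len(zero_day)) / baseline
    return curve
```

Calling `relative_interest(log, "wedding", "dress")` returns one value per day offset from -250 to 250. Note that this sketch ignores edge effects: offsets that fall outside the period covered by the log are counted as "no query", which pulls the ends of the curve down.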
For example, we see that there is almost no correlation between the queries “news” and “wedding”:

Between “wedding” and “dress”, on the other hand, the correlation is high. Users usually look for a dress either a few days before or immediately after they search for “wedding”:

In addition, you can analyze how different interests relate to one another, for example “dresses” (orange graph) and “rest” (blue graph) relative to “wedding” (day zero):

The graph shows that people are interested in rest, on average, 70 days before and 150-200 days after they think about the wedding, while interest in dresses arises at almost the same time as interest in the wedding.
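Statements like "70 days before" or "150-200 days after" can also be read off a curve automatically. The sketch below is one possible way to do it, not the method we used: it takes a curve as a mapping from day offset to relative interest and returns the ranges of consecutive days on which interest is at or above a threshold. The function name `elevated_windows` and the threshold of 1.5 are hypothetical.

```python
def elevated_windows(curve, threshold=1.5):
    """Return a list of (start_offset, end_offset) ranges of consecutive
    days on which the relative interest is at or above the threshold.

    curve maps an integer day offset to a relative interest value.
    The default threshold of 1.5 is an arbitrary illustrative choice.
    """
    windows = []
    start = None  # first offset of the window currently being built
    prev = None   # offset visited in the previous iteration

    for offset in sorted(curve):
        above = curve[offset] >= threshold
        contiguous = prev is not None and offset == prev + 1
        # Close the open window if this day is below the threshold
        # or if there is a gap in the offsets.
        if start is not None and not (above and contiguous):
            windows.append((start, prev))
            start = None
        if above and start is None:
            start = offset
        prev = offset

    if start is not None:
        windows.append((start, prev))
    return windows
```

On a real curve it is probably worth smoothing the values first (for example with a moving average), since daily counts are noisy and a single-day spike would otherwise be reported as a window of its own.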
Today we briefly covered the method, its applicability, and how the graphs are interpreted. In the long run, analysis of long-term user logs will help improve the relevance of search results. Based on this data, the system will be able to better understand what problem the user is trying to solve when entering short, one-word queries, and will offer answers suited to that problem.
In the following posts, we will publish examples of some interesting research conducted by the Poisk@Mail.Ru search team.
We hope you found this interesting.
Recommended reading on the topic:
Learning about the World through Long-Term Query Logs, Matthew Richardson, Microsoft Research
Thank you for your attention!
Search Team@Mail.Ru