A reader's problems. Is paradise somewhere near?

Each of us has a set of sites on the web that we regularly visit for new and interesting information. I tried to start using an RSS aggregator many times and gave up on the idea every time. Having patiently sorted all the feeds into folders, I soon realized that I was not getting what I expected. If I wanted to read about .NET, the articles were mixed with others unrelated to the topic. Sometimes I just wanted to read about information technology in general, but alas, there was no such section in my catalog. After a while I forgot about the reader.

With the advent of social networks, hope arose that the full power of the social graph would help solve the problem of finding interesting material. But if you have no particular need to make "virtual friends", you end up on the edge of this graph, and information does not reach you, or reaches you very late. Standing away from the place where “life is in full swing”, you hear only echoes of loud phrases. You try to come closer, but then you are flooded with a pile of unnecessary information, and the hope of getting what you want is lost. People need to share photos and chat with friends. They post photos of dogs not to please their loved ones, but to collect a like for the treasury of their social capital. This creates the deceptive impression that only a tiny percentage of people today strive for new experience and new knowledge.

Consider the usual social graph. If node A publishes content that is interesting to node B, then B will receive it only if there is a chain of nodes connecting A and B in which every node has "approved" the publication.
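The delivery rule just described can be sketched as a breadth-first search in which only the author and the "approving" nodes pass content on. This is a minimal illustration, not the mechanism of any particular network; the function name `reaches`, the `friends` dictionary and the node names are assumptions made up for the example.

```python
from collections import deque

def reaches(graph, approved, source, target):
    """Return True if content published by `source` can reach `target`.

    graph:    dict mapping each node to the set of its neighbours (hypothetical data)
    approved: set of nodes that "approved" (passed on) the publication
    """
    # The author always passes on their own publication.
    forwarding = approved | {source}
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return True  # a forwarding node is directly connected to the target
            if neighbour in forwarding and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False

# A - C - D - B: the content must pass through both C and D to get to B.
friends = {
    "A": {"C"},
    "C": {"A", "D"},
    "D": {"C", "B"},
    "B": {"D"},
}
print(reaches(friends, {"C", "D"}, "A", "B"))  # True
print(reaches(friends, {"C"}, "A", "B"))       # False
```

In the second call D never approved the publication, so the chain breaks and B, sitting on the edge of the graph, sees nothing.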

To get more information, you need more social connections. But if there are too many, maintaining them becomes expensive. That is why some have turned their attention to interest graphs.

Hope of paradise

Across the web, the “model of arbitrarily typed objects connected by arbitrarily typed links” has been discussed, along with attempts to understand what can be built on it. On Habr, the idea of an interest graph, a special case of this model, was discussed. The gist was that every interest (tag) is unique, so the entire relevant audience concentrates around it. The next problem was how to order everything related to a specific tag. The authors of those publications outlined the principles by which interaction would be built under the proposed ideology. Recently many have taken steps toward this concept, including the major social networks. I have been watching the development of services implementing this idea with genuine interest and, like many others, have my own view of the problem, which I want to share.

The general wishes for such networks can be summed up in the following points:

Such networks will never have drunk photos or, more precisely, they will, but only those who want to see them will;
Such networks will be used to discuss useful things and solve pressing problems;
What matters is what you write, not how many subscribers you have;
We, not our social connections, will decide what appears in our feed;
Any point of view has the right to exist. Anyone can accept it or reject it, and the technology should give a person the choice of which way to go;
There is no spam: a “self-moderated” community very quickly pushes spammers outside the bounds of query relevance.

The image of an “ideal social network” is a “window” into a multidimensional information space. This “window” is positioned so as to give a person the slice of information that meets their current needs. In this context the needs split into two information streams: updates from the circles with which the person is in constant contact, and updates in areas that interest the user. New social connections form through local circles of communication. People see each other's activity in the context of overlapping interests. Private data plays a smaller role. What matters is who you are, not how old your dog is or where you are at the moment.

Is paradise here?

The steps the major social networks have taken in this direction are:

Hashtags;
Interest pages;
Groups

In fairness, hashtags do implement similar functionality, but to quote a comment on one of the articles: “if I want to read what people write about isomorphically-palliative dissonance, which hashtag should I search for on Twitter?” Many people feel that tags are useful and powerful, but that something is missing. The main problem with tags is that a single meaning can be written as many different tags (including in other languages) and, conversely, that identically written tags can carry several meanings.

Groups deserve separate mention. Groups can be projected onto interests: a group is essentially an interest with activity attached to it, a prototype of a group of “like-minded people”. And if you are not comfortable in a global interest group, you can "go deeper". But every group has a critical mass of participants. Once it is exceeded, the large group “collapses” and/or smaller groups on the same subject form. Many recall Usenet newsgroups and FidoNet: “When there were few people there, it was interesting; when the number of people grew, the best ones left.”

The problem with groups is that they have hard boundaries. You are either a member of a group or you are not. A publication belongs to a group and does not leave its boundaries. Ideally, a publication should relate to each topic / interest / group with its own weight. A person is interested in different things to different degrees; besides, there may be many interests, and it is tedious to check a separate group for each one. An interest-based approach can generate a single feed for the user.

It should also be noted that groups are connected and related to one another. This prompts some to study the phenomenon and use it in their projects. Such relationships fit well into the model of a graph of interests and like-minded people, in which groups of like-minded people exist within a single interest and the connections between them run through "borderline" (shared) users. Groups only “approximate” such connections, whereas interests with like-minded groups represent them more naturally. Another problem with groups is that their members know nothing about other communities and do not want to hear about them. They perceive a new community as an encroachment on their territory, which hinders the exchange of valuable information.

From the above we can conclude that groups do not meet expectations here. Whenever something serious is needed, it usually ends in harsh moderation and totalitarian measures by the group administrators: bans on joining and on publishing. Interests, by contrast, create end-to-end connections. An interest (marker, tag) naturally makes it possible to narrow a search across unrelated hierarchies.

We will build a new paradise

We will not go deep into the theory. In brief, imagine a graph in which the vertices are people and interests, and an edge records the fact that a user pays attention to a particular topic (interest).

This approach is implemented in many services inspired by this model. A post in the system is tied to a certain set of topics and spreads along the links to all the “fans” of those interests. This is what ensures "delivery" of content across the network.
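To make the model concrete, here is a minimal sketch of such a graph with people and interests as vertices and "follows" edges between them. It is an illustration only, not the implementation of any existing service; the class `InterestGraph`, its methods and the user names are hypothetical.

```python
from collections import defaultdict

class InterestGraph:
    """Bipartite graph: users on one side, interests on the other."""

    def __init__(self):
        # interest -> set of users who follow it (the "fans" of that interest)
        self.followers = defaultdict(set)

    def follow(self, user, interest):
        """Add an edge recording that `user` pays attention to `interest`."""
        self.followers[interest].add(user)

    def deliver(self, author, interests):
        """Return the users who receive a post tied to the given interests."""
        recipients = set()
        for interest in interests:
            recipients |= self.followers.get(interest, set())
        recipients.discard(author)  # the author does not need their own post
        return sorted(recipients)

graph = InterestGraph()
graph.follow("alice", ".NET Framework")
graph.follow("bob", ".NET Framework")
graph.follow("bob", "Photography")
graph.follow("carol", "Photography")

print(graph.deliver("alice", [".NET Framework"]))                 # ['bob']
print(graph.deliver("alice", [".NET Framework", "Photography"]))  # ['bob', 'carol']
```

Note that alice and carol have no social connection at all: the post reaches carol purely through the interest it is tied to.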

Several important questions arise in such a system:

How should the interest base be maintained?
How should content be ranked within a single interest?
Publications tagged with irrelevant interests.
Publications with an incomplete set of interests.

Interest base

You can hand the base over to the users. As practice shows, this leads to chaos: the base overflows with duplicates and meaningless topics.
You can fix the catalog and have the service administrators fill it as needed. This brings the problem of initial content and makes it hard to add new topics and, consequently, to develop the resource.
The best option seems to be to build on an existing catalog. Wikipedia immediately comes to mind. It is a well-structured knowledge base that is ideally suited to the role of a topic catalog. Each of its articles is a topic to which a post can be attached.
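A fixed catalog also gives a handle on the tag problem mentioned earlier: many spellings for one meaning, and one spelling for many meanings. The sketch below maps free-form tags to canonical topic titles. The `ALIASES` table is hand-written and purely illustrative; how a real service would build it (for example, from Wikipedia's redirect and disambiguation pages) is an assumption and is not shown here.

```python
# Hypothetical alias table: normalized tag -> candidate canonical topics.
ALIASES = {
    ".net": [".NET Framework"],
    "dotnet": [".NET Framework"],
    "дотнет": [".NET Framework"],  # the same meaning written in another language
    "python": ["Python (programming language)", "Python (genus)"],  # ambiguous tag
}

def resolve(tag):
    """Return the canonical topics a free-form tag may refer to."""
    key = tag.strip().lstrip("#").casefold()
    return ALIASES.get(key, [])

print(resolve("#DotNet"))   # ['.NET Framework']
print(resolve("дотнет"))    # ['.NET Framework']
print(resolve("Python"))    # ['Python (programming language)', 'Python (genus)']
print(resolve("#whatever")) # []
```

A tag that resolves to several topics has to be disambiguated by the author or by the surrounding text; a tag that resolves to none is a candidate for a new catalog entry.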

Ranking content within one interest

Content ranking is a rather complex topic on which plenty has been written. If we recall groups in social networks, the emergence of groups on similar or identical subjects can be seen as a kind of attempt to rank content within a single interest.

People can perceive the same topic differently. The question of relevance is rather complicated: different people want to see different content under the same interest. Suppose I choose the “.NET Framework” topic. Personally, I am not interested in entry-level articles on how to write Hello World. I would like to filter those out and get results that are at least somewhat interesting to me. Since there are many sources and their content varies in value, I would like a tool that helps find the really interesting publications.

Deciding by majority (the proverbial “average temperature across the hospital”) is a bad option, since each of us has different requirements for content quality and a different outlook on life. Recommendation algorithms and personalized results are therefore a better fit here, and a good deal has been written about them. I will say right away that I support this approach: I believe in mathematics and big data. On the other hand, there is skepticism about these technologies. In particular, there is the concept of the "filter bubble", which has gained considerable support.

This is how Wikipedia describes the filter bubble. The concept, developed by Eli Pariser, is a phenomenon in which websites use algorithms that selectively guess what information the user would like to see, based on the user's location, past clicks, mouse movements and search history. As a result, websites show only information that is consistent with the user's past views. This is similar to a phenomenon in which people and organizations seek out information that initially seems correct but turns out to be useless or nearly useless, and avoid information that they perceive as incorrect and insignificant but that turns out to be useful.

Pariser, in his book The Filter Bubble, warns that a potential drawback of filtered search is that it “closes us off from new ideas, subjects and important information” and “creates the impression that our narrow self-interest is all that exists”. This potentially harms both the individual and society as a whole. Freedom of choice is very important: a person should find information and judge its usefulness for themselves.

I really liked a comment on one of the articles about personalization: “That way we would never have met our significant others. We would be living with hackers just like ourselves.” And yet US parole boards use a special computer algorithm that, based on 50–100 factors, recommends what to do with a particular prisoner. In my opinion, the problem is not personalization itself but the lack of controls over this tool (other than turning it on or off).

It is reasonable to assume that in such a network, information of interest will reach the user faster, and that the user will be more likely to receive content that is high-quality in their own judgment.
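One possible shape for a control finer than an on/off switch is a single user-adjustable weight that blends the "majority" score with the personal one. This is only a sketch of the idea under stated assumptions: the `rank` function, the `personalization` parameter and all scores below are invented for illustration, and how the two scores would actually be computed is left open.

```python
def rank(posts, personalization=0.5):
    """Order post titles by a blend of global and personal scores.

    posts:           list of (title, global_score, personal_score), scores in [0, 1]
    personalization: 0.0 = pure majority opinion, 1.0 = fully personalized
    """
    if not 0.0 <= personalization <= 1.0:
        raise ValueError("personalization must be between 0 and 1")

    def blended(post):
        _, global_score, personal_score = post
        return (1 - personalization) * global_score + personalization * personal_score

    return [title for title, _, _ in sorted(posts, key=blended, reverse=True)]

# Hypothetical scores for one reader of the ".NET Framework" topic.
posts = [
    ("Hello World in C#", 0.9, 0.1),
    ("CLR garbage collector internals", 0.4, 0.9),
    ("Gardening tips", 0.6, 0.2),
]

print(rank(posts, personalization=0.0))
# ['Hello World in C#', 'Gardening tips', 'CLR garbage collector internals']
print(rank(posts, personalization=1.0))
# ['CLR garbage collector internals', 'Gardening tips', 'Hello World in C#']
```

Sliding the weight toward 0 is a way to step outside the bubble on purpose without giving up personalization altogether.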

Irrelevant publication interests

Some users, seeking to maximize the reach of their publications, attach an excessive set of irrelevant topics. In doing so they destroy the very essence of the project. We need an effective tool to keep the service from turning into a garbage dump.
As Habr's practice shows, there have to be people who “collect” and organize information. Ideally, though, I would like to leave this task to algorithms.

Incompletely specified publication interests

The main task in implementing the "interest graph" concept is to ensure that any piece of information is tied to all the interests it relates to, and that these interests carry different weights relative to one another. These weights should then make delivery more accurate for those who are really interested. Automatic content categorization can help here.
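Weighted interests can be illustrated with a simple dot product between a publication's interest weights and a reader's interest weights. This is a minimal sketch, not a proposed algorithm: the function `feed_score` and all the weights are made up, and producing the publication weights (for example, by automatic categorization) is outside the example.

```python
def feed_score(post_weights, user_weights):
    """Score a post for a user as the dot product of their interest weights.

    post_weights: interest -> how strongly the post relates to it
    user_weights: interest -> how strongly the user cares about it
    """
    return sum(
        weight * user_weights.get(interest, 0.0)
        for interest, weight in post_weights.items()
    )

# Hypothetical weights for one post and two readers.
post = {".NET Framework": 0.8, "Garbage collection": 0.5, "Hello World": 0.1}
alice = {".NET Framework": 1.0, "Garbage collection": 0.6}
bob = {"Hello World": 1.0}

print(feed_score(post, alice) > feed_score(post, bob))  # True
```

Because a loosely related interest contributes only a small weight, attaching it to a post buys very little extra reach, which also blunts the incentive to pile on irrelevant topics.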

Epilogue

Each of the tasks above deserves a separate discussion. They are being solved, with varying degrees of success, by various projects, including ones mentioned on Habr.
In the future I would like to describe my own solution and share thoughts on the difficulties of implementing each of these tasks, both technical and algorithmic.
Thank you for taking the time to read this post.


Source: https://habr.com/ru/post/201230/
