This article will describe only the general algorithm on the example of Facebook. However, the same approach can be used everywhere.
Task
Based on existing content (for the past 30 days) on this Facebook page, determine which entries will potentially be more popular.
Theory
The first, and perhaps the key question that I encountered is
“What is the popularity of a post and how to calculate it?” .
Usually, by the popularity of recording in the social network mean the number of likes under it. But our case is not ordinary. If you simply rely on the number of likes, then we will not be able to determine a potentially popular post before it actually becomes one.
')
Sometimes use the ratio of likes / time of publication. Thus, it is possible to calculate the increase in likes per second. And where the increase is greater, then there will be our result. In part, this approach is correct, but only in part.
The fact is that the increase in likes is not linear. And the longer the post is published, the less it likes. And we need to take into account this decline in activity.
The formula for the calculation is as follows:
R = likes / (time^β)
Where β is our damping factor. It can be calculated by the formula:
β = 1 / τ
Where τ is the time during which the growth of likes has decreased by a factor of
e .
Implementation
We will need:
- NodeJs, which will upload hourly information about posts for the last 30 days
- Database in which the status of records of the previous cycle will be stored
The algorithm is as follows:
- Download entries from Facebook. For each entry we need:
- ID
- Posting time
- Number of likes
- Increase likes per hour (this parameter will be calculated and added by ourselves)
- Find and load from the database the state of each record that was saved during the previous cycle (an hour ago)
- We calculate the decay rate individually for each record:
- Comparing the number of likes of the current and previous cycle, we calculate the growth of likes in the current hour (ΔL)
- Now that we have a growth in likes for the current (ΔL) and the previous interval (ΔpL) we can calculate the drop in activity for the current record. This can be done by the formula:
βp = 1 / ( (ΔpL - (ΔpL/e)) / ((ΔpL-ΔL)/time) )
Where time is the number of seconds between requests.
- Calculate the average rate of decline (β) for the entire page.
- We calculate the rating for each post using the formula
R = likes / (time^β)
- We calculate the average rating for the entire page.
- We select records whose individual rating exceeds the average value for a page by 2 (or more) times.
- We save all the records in the database for the next iteration. Do not forget to add a field with the value ΔL
Using this approach, we can determine those records that are most likely to cause more interest in the future. In the calculations, we take into account both the date of publication of the recording and the activity of the audience, which allows us to obtain the most accurate results.