The number of blog followers. The number of posts a user has published. The number of upvotes and downvotes on a comment. The number of paid orders for a product. Have you ever had to count something like that? Then I bet your counts have drifted out of sync now and then. Even VKontakte has been caught out by this.
I don't know about you, but in my life counters come right after cache invalidation and naming things on the list of hard problems. I won't claim I've solved them once and for all. I just want to share with the community the approach I arrived at while working on Habr, Daru~dar, Dirty, Tripster and other projects. I hope it saves someone time and nerves.
I'll start with the two most common wrong approaches to counters.
Incrementing/decrementing the counter in every place where a change can occur (creating, editing, publishing or unpublishing a post, deletion by a moderator, changes made by an admin, etc.).
There are also various combinations of these approaches (for example, incrementing in the right places and fully recalculating in the background once a day). Why are these approaches wrong? The short answer: I tried them, and they didn't work for me.
The method described in this article is surely not the only one. But I arrived at two important principles, and, IMHO, they apply to every "correct" method:
A given counter must be updated in exactly one place.
What follows is an attempt to explain how I arrived at them, step by step, using the example of a post counter whose requirements keep getting more complex. The examples are written in Python-flavored pseudocode.
The simplest case: we need a counter of all created posts.
    @on('create_post')
    def update_posts_counter_on_post_create(post):
        posts_counter.update(+1)

    @on('delete_post')
    def update_posts_counter_on_post_delete(post):
        posts_counter.update(-1)
Now let's introduce the notion of a "draft", so that a user can save an unfinished post and finish it later, as on Habr. We change the counter so that it counts only published posts rather than all of them.
    @on('create_post')
    def update_posts_counter_on_post_create(post):
        if post.is_published:
            posts_counter.update(+1)

    @on('delete_post')
    def update_posts_counter_on_post_delete(post):
        if post.is_published:
            posts_counter.update(-1)

    @on('change_post')
    def update_posts_counter_on_post_change(post_old, post_new):
        if post_old.is_published != post_new.is_published:
            # the post was published or unpublished,
            # so adjust the counter accordingly
            if post_new.is_published:
                posts_counter.update(+1)
            else:
                posts_counter.update(-1)
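A runnable sketch of the draft-aware handlers, assuming a hypothetical frozen `Post` dataclass and a module-level integer in place of `posts_counter`. It records the counter after every event, which shows that saving a draft and editing an already published post leave the count untouched.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Post:
    # Hypothetical minimal post: only the fields this step needs.
    title: str
    is_published: bool = False

posts_counter = 0  # stands in for the persistent counter

def update_posts_counter(delta):
    global posts_counter
    posts_counter += delta

def on_post_create(post):
    if post.is_published:
        update_posts_counter(+1)

def on_post_delete(post):
    if post.is_published:
        update_posts_counter(-1)

def on_post_change(post_old, post_new):
    # Only a flip of is_published matters for this counter.
    if post_old.is_published != post_new.is_published:
        update_posts_counter(+1 if post_new.is_published else -1)

trace = []
draft = Post('Counters')
on_post_create(draft)                          # a draft is not counted
trace.append(posts_counter)
published = replace(draft, is_published=True)
on_post_change(draft, published)               # publishing: +1
trace.append(posts_counter)
edited = replace(published, title='Counters, revisited')
on_post_change(published, edited)              # a plain edit: no change
trace.append(posts_counter)
on_post_delete(edited)                         # deleting a published post: -1
trace.append(posts_counter)
print(trace)  # [0, 1, 1, 0]
```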
Next, we realize that deleting a post from the database with no way to restore it is a bad idea. Instead, we add an is_deleted flag. Deleted posts, of course, should not be counted either.
    @on('create_post')
    def update_posts_counter_on_post_create(post):
        if post.is_published and not post.is_deleted:
            update_posts_counter(+1)

    @on('delete_post')
    def update_posts_counter_on_post_delete(post):
        if post.is_published and not post.is_deleted:
            update_posts_counter(-1)

    @on('change_post')
    def update_posts_counter_on_post_change(post_old, post_new):
        is_published_changed = post_old.is_published != post_new.is_published
        is_deleted_changed = post_old.is_deleted != post_new.is_deleted

        # published / unpublished (matters only if the post is not deleted)
        if is_published_changed and not is_deleted_changed and not post_new.is_deleted:
            if post_new.is_published:
                update_posts_counter(+1)
            else:
                update_posts_counter(-1)

        # deleted / restored (matters only if the post is published)
        if is_deleted_changed and not is_published_changed and post_new.is_published:
            if post_new.is_deleted:
                update_posts_counter(-1)
            else:
                update_posts_counter(+1)

        # both flags changed at once: not handled
        if is_published_changed and is_deleted_changed:
            pass
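The case where both flags change in a single save is where a flag-by-flag handler quietly loses a count. The sketch below assumes a hypothetical `Post` dataclass and a hypothetical `recount` helper that plays the role of ground truth. A published post is unpublished and deleted in one save; the handler does nothing, so the counter stays at 1 while the true count is 0.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Post:
    # Hypothetical minimal post model.
    is_published: bool
    is_deleted: bool = False

counter = 0  # stands in for the persistent counter

def update_posts_counter(delta):
    global counter
    counter += delta

def on_post_change(post_old, post_new):
    # Flag-by-flag handler in the style of the step above.
    is_published_changed = post_old.is_published != post_new.is_published
    is_deleted_changed = post_old.is_deleted != post_new.is_deleted
    if is_published_changed and not is_deleted_changed and not post_new.is_deleted:
        update_posts_counter(+1 if post_new.is_published else -1)
    if is_deleted_changed and not is_published_changed and post_new.is_published:
        update_posts_counter(-1 if post_new.is_deleted else +1)
    # Both flags changed in the same save: nothing happens.

def recount(posts):
    # Ground truth: count posts that are published and not deleted.
    return sum(1 for p in posts if p.is_published and not p.is_deleted)

old = Post(is_published=True)
counter = recount([old])  # start in sync: 1
new = replace(old, is_published=False, is_deleted=True)
on_post_change(old, new)
print(counter, recount([new]))  # 1 0
```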
The code is already pretty tangled... Nevertheless, we now add multiblogging to the project.
Each post gets a blog_id field, and we want every blog to have its own post counter (counting, of course, only published and non-deleted posts). We also have to support moving a post from one blog to another. We can forget about the total post counter.
    @on('create_post')
    def update_posts_counter_on_post_create(post):
        if post.is_published and not post.is_deleted:
            update_blog_post_counter(post.blog_id, +1)

    @on('delete_post')
    def update_posts_counter_on_post_delete(post):
        if post.is_published and not post.is_deleted:
            update_blog_post_counter(post.blog_id, -1)

    @on('change_post')
    def update_posts_counter_on_post_change(post_old, post_new):
        # the post stayed in the same blog
        if post_old.blog_id == post_new.blog_id:
            is_published_changed = post_old.is_published != post_new.is_published
            is_deleted_changed = post_old.is_deleted != post_new.is_deleted

            # published / unpublished
            if is_published_changed and not is_deleted_changed and not post_new.is_deleted:
                if post_new.is_published:
                    update_blog_post_counter(post_new.blog_id, +1)
                else:
                    update_blog_post_counter(post_new.blog_id, -1)

            # deleted / restored
            if is_deleted_changed and not is_published_changed and post_new.is_published:
                if post_new.is_deleted:
                    update_blog_post_counter(post_new.blog_id, -1)
                else:
                    update_blog_post_counter(post_new.blog_id, +1)

        # the post moved to another blog
        else:
            if post_old.is_published and not post_old.is_deleted:
                update_blog_post_counter(post_old.blog_id, -1)
            if post_new.is_published and not post_new.is_deleted:
                update_blog_post_counter(post_new.blog_id, +1)
Wonderful. I mean, disgusting! I don't even want to think about a counter that tracks not just the number of posts per blog, but the number of posts per blog for each user: [user_id, blog_id] → post_count. And we did need those, for example, to show statistics in a user's profile...
But look at the code for moving a post from one blog to another. Surprisingly, it turned out simpler and shorter. What's more, it looks a lot like the create/delete code! In fact, that is exactly what happens: the post is removed from the old blog and created in the new one. Can we apply the same principle when the blog stays the same? Yes.
    @on('create_post')
    def update_posts_counter_on_post_create(post):
        if post.is_published and not post.is_deleted:
            update_blog_post_counter(post.blog_id, +1)

    @on('delete_post')
    def update_posts_counter_on_post_delete(post):
        if post.is_published and not post.is_deleted:
            update_blog_post_counter(post.blog_id, -1)

    @on('change_post')
    def update_posts_counter_on_post_change(post_old, post_new):
        if post_old.is_published and not post_old.is_deleted:
            update_blog_post_counter(post_old.blog_id, -1)
        if post_new.is_published and not post_new.is_deleted:
            update_blog_post_counter(post_new.blog_id, +1)
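A runnable sketch of this "remove the old version, add the new one" handler, assuming a hypothetical `Post` dataclass and an in-memory `blog_counters` dict. It logs every write in `calls`: moving a post yields exactly the expected pair of writes, but a plain edit in the same blog also produces a -1/+1 pair.

```python
from collections import defaultdict
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Post:
    # Hypothetical minimal post model.
    blog_id: int
    is_published: bool = True
    is_deleted: bool = False

blog_counters = defaultdict(int)  # in-memory stand-in for per-blog counters
calls = []                        # every write that would reach storage

def update_blog_post_counter(blog_id, delta):
    calls.append((blog_id, delta))
    blog_counters[blog_id] += delta

def on_post_change(post_old, post_new):
    # A change is treated as "remove the old version, add the new version".
    if post_old.is_published and not post_old.is_deleted:
        update_blog_post_counter(post_old.blog_id, -1)
    if post_new.is_published and not post_new.is_deleted:
        update_blog_post_counter(post_new.blog_id, +1)

post = Post(blog_id=1)
blog_counters[1] = 1                 # the post is already counted in blog 1
moved = replace(post, blog_id=2)
on_post_change(post, moved)          # move: blog 1 gets -1, blog 2 gets +1
on_post_change(moved, moved)         # plain edit: -1 then +1 on blog 2
print(dict(blog_counters))  # {1: 0, 2: 1}
print(calls)                # [(1, -1), (2, 1), (2, -1), (2, 1)]
```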
The only drawback is that every time a post is saved, the counter is updated twice, and most of the time for nothing. So let's first compute the counter increment and then update the counter only if needed.
    from collections import defaultdict

    @on('create_post')
    def update_posts_counter_on_post_create(post):
        if post.is_published and not post.is_deleted:
            update_blog_post_counter(post.blog_id, +1)

    @on('delete_post')
    def update_posts_counter_on_post_delete(post):
        if post.is_published and not post.is_deleted:
            update_blog_post_counter(post.blog_id, -1)

    @on('change_post')
    def update_posts_counter_on_post_change(post_old, post_new):
        # sum up the net change per blog first
        increments = defaultdict(int)
        if post_old.is_published and not post_old.is_deleted:
            increments[post_old.blog_id] -= 1
        if post_new.is_published and not post_new.is_deleted:
            increments[post_new.blog_id] += 1

        # then write only the counters that actually change
        for blog_id, increment in increments.items():
            if increment:
                update_blog_post_counter(blog_id, increment)
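A runnable sketch of the increments version, again with a hypothetical `Post` dataclass and a `calls` list standing in for storage writes. A plain edit now produces no write at all, a move still produces one write per affected blog, and unpublishing produces a single -1.

```python
from collections import defaultdict
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Post:
    # Hypothetical minimal post model.
    blog_id: int
    is_published: bool = True
    is_deleted: bool = False

calls = []  # writes that would reach storage

def update_blog_post_counter(blog_id, delta):
    calls.append((blog_id, delta))

def on_post_change(post_old, post_new):
    # Accumulate the net delta per blog first, then write only non-zero ones.
    increments = defaultdict(int)
    if post_old.is_published and not post_old.is_deleted:
        increments[post_old.blog_id] -= 1
    if post_new.is_published and not post_new.is_deleted:
        increments[post_new.blog_id] += 1
    for blog_id, increment in increments.items():
        if increment:
            update_blog_post_counter(blog_id, increment)

post = Post(blog_id=1)
on_post_change(post, post)                               # plain edit: nothing written
on_post_change(post, replace(post, blog_id=2))           # move between blogs
on_post_change(post, replace(post, is_published=False))  # unpublish
print(calls)  # [(1, -1), (2, 1), (1, -1)]
```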
That's much better. Now let's get rid of the duplicated post.is_published and not post.is_deleted condition by introducing a counter_value function. Let it return 1 for a post that should be counted and 0 for one that is deleted or not published.
    counter_value = lambda post: int(post.is_published and not post.is_deleted)

    @on('create_post')
    def update_posts_counter_on_post_create(post):
        if counter_value(post):
            update_blog_post_counter(post.blog_id, +1)

    @on('delete_post')
    def update_posts_counter_on_post_delete(post):
        if counter_value(post):
            update_blog_post_counter(post.blog_id, -1)

    @on('change_post')
    def update_posts_counter_on_post_change(post_old, post_new):
        increments = defaultdict(int)
        increments[post_old.blog_id] -= counter_value(post_old)
        increments[post_new.blog_id] += counter_value(post_new)

        for blog_id, increment in increments.items():
            if increment:
                update_blog_post_counter(blog_id, increment)
Now we are ready to merge the create/change/delete events into one. On creation we simply pass None as post_old, and on deletion we pass None as post_new.
    @on('change_post')
    def update_posts_counter_on_post_change(post_old=None, post_new=None):
        counter_value = lambda post: int(post.is_published and not post.is_deleted)

        increments = defaultdict(int)
        if post_old:
            increments[post_old.blog_id] -= counter_value(post_old)
        if post_new:
            increments[post_new.blog_id] += counter_value(post_new)

        for blog_id, increment in increments.items():
            if increment:
                update_blog_post_counter(blog_id, increment)
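A runnable sketch of the single merged handler, assuming a hypothetical `Post` dataclass and an in-memory `blog_counters` dict. Creation, publication and hard deletion all go through the same function; only the `None` argument tells them apart.

```python
from collections import defaultdict
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Post:
    # Hypothetical minimal post model.
    blog_id: int
    is_published: bool = False
    is_deleted: bool = False

blog_counters = defaultdict(int)  # in-memory stand-in for per-blog counters

def update_blog_post_counter(blog_id, delta):
    blog_counters[blog_id] += delta

def counter_value(post):
    # 1 if the post is counted (published and not deleted), otherwise 0.
    return int(post.is_published and not post.is_deleted)

def on_post_change(post_old=None, post_new=None):
    # post_old is None on creation, post_new is None on (hard) deletion.
    increments = defaultdict(int)
    if post_old is not None:
        increments[post_old.blog_id] -= counter_value(post_old)
    if post_new is not None:
        increments[post_new.blog_id] += counter_value(post_new)
    for blog_id, increment in increments.items():
        if increment:
            update_blog_post_counter(blog_id, increment)

trace = []
draft = Post(blog_id=1)
on_post_change(post_new=draft)               # created as a draft
trace.append(blog_counters[1])
published = replace(draft, is_published=True)
on_post_change(draft, published)             # published
trace.append(blog_counters[1])
on_post_change(post_old=published)           # removed from the database
trace.append(blog_counters[1])
print(trace)  # [0, 1, 0]
```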
Great! Now back to counting blog posts per user. It turns out to be quite simple now.
    @on('change_post')
    def update_posts_counter_on_post_change(post_old=None, post_new=None):
        counter_value = lambda post: int(post.is_published and not post.is_deleted)

        increments = defaultdict(int)
        if post_old:
            increments[post_old.user_id, post_old.blog_id] -= counter_value(post_old)
        if post_new:
            increments[post_new.user_id, post_new.blog_id] += counter_value(post_new)

        for (user_id, blog_id), increment in increments.items():
            if increment:
                update_user_blog_post_counter(user_id, blog_id, increment)
Note that this code also handles a change of the post's author, should that ever be needed. Accounting for other parameters is just as easy: simply add them to the increments key.
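To see the author change in action, here is a sketch under the same kind of assumptions (hypothetical `Post` dataclass, in-memory dict keyed by `(user_id, blog_id)`). Reassigning a published post from user 10 to user 20 moves one unit between the two keys without any special-case code.

```python
from collections import defaultdict
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Post:
    # Hypothetical minimal post model.
    user_id: int
    blog_id: int
    is_published: bool = True
    is_deleted: bool = False

user_blog_counters = defaultdict(int)  # keyed by (user_id, blog_id)

def update_user_blog_post_counter(user_id, blog_id, delta):
    user_blog_counters[user_id, blog_id] += delta

def counter_value(post):
    return int(post.is_published and not post.is_deleted)

def on_post_change(post_old=None, post_new=None):
    increments = defaultdict(int)
    if post_old is not None:
        increments[post_old.user_id, post_old.blog_id] -= counter_value(post_old)
    if post_new is not None:
        increments[post_new.user_id, post_new.blog_id] += counter_value(post_new)
    for (user_id, blog_id), increment in increments.items():
        if increment:
            update_user_blog_post_counter(user_id, blog_id, increment)

post = Post(user_id=10, blog_id=1)
on_post_change(post_new=post)                    # user 10, blog 1 -> 1
on_post_change(post, replace(post, user_id=20))  # author changed to user 20
print(dict(user_blog_counters))  # {(10, 1): 0, (20, 1): 1}
```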
Moving on. Our serious multiblogging platform has probably acquired post ratings by now. Suppose we want to count not just the number of posts but their total rating for each user in each blog, in order to display "top authors". We change counter_value so that instead of 1/0 it returns the post's rating if the post is published and not deleted, and 0 otherwise.
    @on('change_post')
    def update_posts_counter_on_post_change(post_old=None, post_new=None):
        counter_value = lambda post: post.rating if (post.is_published and not post.is_deleted) else 0

        increments = defaultdict(int)
        if post_old:
            increments[post_old.user_id, post_old.blog_id] -= counter_value(post_old)
        if post_new:
            increments[post_new.user_id, post_new.blog_id] += counter_value(post_new)

        for (user_id, blog_id), increment in increments.items():
            if increment:
                update_user_blog_post_counter(user_id, blog_id, increment)
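With ratings, an ordinary vote is just another change event. In this sketch (hypothetical `Post` with a `rating` field, writes collected in a list instead of storage), raising a post's rating from 5 to 7 yields a single +2 write, and deleting the post subtracts its full current rating.

```python
from collections import defaultdict
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Post:
    # Hypothetical post model with a rating.
    user_id: int
    blog_id: int
    rating: int = 0
    is_published: bool = True
    is_deleted: bool = False

writes = []  # (key, delta) pairs that would go to storage

def update_user_blog_post_counter(user_id, blog_id, delta):
    writes.append(((user_id, blog_id), delta))

def counter_value(post):
    # A post contributes its rating only while published and not deleted.
    return post.rating if (post.is_published and not post.is_deleted) else 0

def on_post_change(post_old=None, post_new=None):
    increments = defaultdict(int)
    if post_old is not None:
        increments[post_old.user_id, post_old.blog_id] -= counter_value(post_old)
    if post_new is not None:
        increments[post_new.user_id, post_new.blog_id] += counter_value(post_new)
    for (user_id, blog_id), increment in increments.items():
        if increment:
            update_user_blog_post_counter(user_id, blog_id, increment)

post = Post(user_id=10, blog_id=1, rating=5)
on_post_change(post_new=post)                               # published with rating 5
upvoted = replace(post, rating=7)
on_post_change(post, upvoted)                               # rating 5 -> 7
on_post_change(upvoted, replace(upvoted, is_deleted=True))  # deleted
print(writes)  # [((10, 1), 5), ((10, 1), 2), ((10, 1), -7)]
```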
To summarize, here is an abstract formula for a universal counter:
    from collections import defaultdict

    @on('change_obj')
    def update_some_counter(obj_old=None, obj_new=None):
        counter_key = lambda obj: ...
        counter_value = lambda obj: ...

        increments = defaultdict(int)
        if obj_old:
            increments[counter_key(obj_old)] -= counter_value(obj_old)
        if obj_new:
            increments[counter_key(obj_new)] += counter_value(obj_new)

        for key, increment in increments.items():
            if increment:
                update_counter(key, increment)
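One way to make the formula reusable is to build handlers from its three parts. `make_counter_handler` below is a hypothetical helper, not an existing library API. The example applies it to positive and negative votes on a comment (one of the counters from the introduction), with votes assumed to be plain dicts: a user upvotes a comment and then switches to a downvote.

```python
from collections import defaultdict

def make_counter_handler(counter_key, counter_value, update_counter):
    """Build an (obj_old, obj_new) handler from the three parts of the formula."""
    def handler(obj_old=None, obj_new=None):
        increments = defaultdict(int)
        if obj_old is not None:
            increments[counter_key(obj_old)] -= counter_value(obj_old)
        if obj_new is not None:
            increments[counter_key(obj_new)] += counter_value(obj_new)
        for key, increment in increments.items():
            if increment:
                update_counter(key, increment)
    return handler

# Hypothetical example: positive and negative votes per comment.
# A vote is a dict like {'comment_id': 7, 'value': 1} (or -1).
totals = defaultdict(int)

def store_increment(key, delta):
    totals[key] += delta

on_vote_change = make_counter_handler(
    counter_key=lambda vote: (vote['comment_id'], 'up' if vote['value'] > 0 else 'down'),
    counter_value=lambda vote: 1,
    update_counter=store_increment,
)

upvote = {'comment_id': 7, 'value': 1}
downvote = {'comment_id': 7, 'value': -1}
on_vote_change(obj_new=upvote)    # the user upvotes the comment
on_vote_change(upvote, downvote)  # ...then switches to a downvote
print(dict(totals))  # {(7, 'up'): 0, (7, 'down'): 1}
```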
Not without a fly in the ointment, though! The formula above is ideal, but take it out of the spherical vacuum into harsh reality and your counters can still drift. This happens for two reasons:
In practice, intercepting every possible way an object can change is no easy task. If you use an ORM that provides create/update/delete signals, and you have even managed to hand-roll something that preserves the object's old state, a single raw query or a bulk update by condition will still ruin everything. If you write, say, Postgres triggers that track changes and send them straight to PGQ, then... well, give it a try :)
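The bulk-update failure mode is easy to reproduce in miniature. `PostStore` below is a hypothetical toy storage, not a real ORM: `save()` calls the change handler with the old and new state, while `bulk_update()` mimics an `UPDATE ... WHERE` that changes rows without any per-object event. After a mass deletion, the counter still reads 3 while only one post actually qualifies.

```python
from collections import defaultdict
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Post:
    # Hypothetical minimal post model.
    id: int
    blog_id: int
    is_published: bool = True
    is_deleted: bool = False

def counter_value(post):
    return int(post.is_published and not post.is_deleted)

class PostStore:
    """Hypothetical toy storage: save() fires the change hook, bulk_update() does not."""
    def __init__(self, on_change):
        self.rows = {}
        self.on_change = on_change

    def save(self, post):
        old = self.rows.get(post.id)  # None for a brand-new post
        self.rows[post.id] = post
        self.on_change(old, post)

    def bulk_update(self, predicate, **changes):
        # Like a raw UPDATE ... WHERE: rows change, no per-object hook fires.
        for post_id, post in list(self.rows.items()):
            if predicate(post):
                self.rows[post_id] = replace(post, **changes)

blog_counters = defaultdict(int)

def on_post_change(post_old=None, post_new=None):
    increments = defaultdict(int)
    if post_old is not None:
        increments[post_old.blog_id] -= counter_value(post_old)
    if post_new is not None:
        increments[post_new.blog_id] += counter_value(post_new)
    for blog_id, increment in increments.items():
        if increment:
            blog_counters[blog_id] += increment

store = PostStore(on_post_change)
for post_id in (1, 2, 3):
    store.save(Post(id=post_id, blog_id=1))
store.bulk_update(lambda p: p.id != 1, is_deleted=True)  # e.g. a mass deletion

actual = sum(counter_value(p) for p in store.rows.values() if p.blog_id == 1)
print(blog_counters[1], actual)  # 3 1
```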
Ask questions. Criticize. Tell us how you handle the counters.
Source: https://habr.com/ru/post/311436/