
What it costs to clean sessions out of the datastore with the Mapper API

Anyone who has enabled session support in their GAE application knows that sessions are, first, stored in the datastore and, second, never automatically removed from it. You have to get rid of stale sessions yourself.
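
(For context: session support is switched on by a single flag in appengine-web.xml, and once it's on, every session becomes an _ah_SESSION entity in the datastore. A minimal reminder of what that looks like:)

 <appengine-web-app xmlns="http://appspot.com/ns/1.0">
     <sessions-enabled>true</sessions-enabled>
 </appengine-web-app>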

For a while I paid no attention to the stale sessions, and over a year and a half about half a million of them piled up. Recently the amount of data stored in the datastore exceeded the free quota, and since sessions accounted for 99% of it, I decided to clear out the rot.


Of course, it would have been a sin not to use the recently released Mapper API for this. I hacked together a simple Mapper. To start with, I decided just to count, without deleting:

 import org.apache.hadoop.io.NullWritable;

 import com.google.appengine.api.datastore.Entity;
 import com.google.appengine.api.datastore.Key;
 import com.google.appengine.tools.mapreduce.AppEngineMapper;

 public class SessionCleanupMapper extends AppEngineMapper<Key, Entity, NullWritable, NullWritable> {
     @Override
     public void map(Key key, Entity value, Context context) {
         // _ah_SESSION entities keep their expiry time, in milliseconds
         // since the epoch, in the "_expires" property
         Object expiresProperty = value.getProperty("_expires");
         if (expiresProperty instanceof Long) {
             long expiresTimestamp = ((Long) expiresProperty).longValue();
             if (expiresTimestamp < System.currentTimeMillis()) {
                 context.getCounter("Session", "expired").increment(1);
                 // uncomment to actually delete instead of just counting:
                 // DatastoreMutationPool mutationPool = this.getAppEngineContext(context).getMutationPool();
                 // mutationPool.delete(value.getKey());
             }
         }
     }
 }
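
(For those who haven't played with the Mapper API: the job is registered in a mapreduce.xml and launched from the /mapreduce/ status page. A rough sketch, from memory, of what mine looked like; the property names follow the library's examples of that era, and com.example is a placeholder for the real package:)

 <configurations>
     <configuration name="Session cleanup">
         <property>
             <name>mapreduce.map.class</name>
             <value>com.example.SessionCleanupMapper</value>
         </property>
         <property>
             <name>mapreduce.inputformat.class</name>
             <value>com.google.appengine.tools.mapreduce.DatastoreInputFormat</value>
         </property>
         <property>
             <name>mapreduce.mapper.inputformat.datastoreinputformat.entitykind</name>
             <value>_ah_SESSION</value>
         </property>
     </configuration>
 </configurations>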


And launched it. The gears started turning, and GAE, working with all four hands, set about churning through my sessions. A few hours later I discovered the site was down: quota exceeded. I looked at the console and saw that the CPU quota was exhausted (8.5 CPU hours). I was surprised. I raised the quota, and the next day I ran the mapreduce again, this time with the entity-deleting lines uncommented.

Hooray. In just 2.5 hours of wall-clock time, cloud megatechnology coped with the task, consuming 22 CPU hours in total.

It made me think. Something is wrong in the clouds. I haven't tried it, but for some reason I believe that even MySQL would cope with

 DELETE FROM _ah_SESSION WHERE _expires < NOW() 

in a few minutes. Even with a million rows, and even with two million. But that is so old-fashioned: everything on one machine, no scalability, no redundancy, and so on ...
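
(Pedantic aside: _expires holds milliseconds since the epoch, which is what the mapper compares against System.currentTimeMillis(), so the literal MySQL equivalent would be something like

 DELETE FROM _ah_SESSION WHERE _expires < UNIX_TIMESTAMP(NOW()) * 1000 

which changes nothing about the point.)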

Update: to delete sessions from the datastore it is not necessary to write a mapreduce; you can also hit the magic servlet, or iterate over them manually with a cursor, as sketched below. But nobody has written a servlet for counting how many of them go stale each day (something like SELECT COUNT(*) FROM _ah_SESSION GROUP BY _expires / (86400 * 1000)), so you would most likely have to run a mapreduce anyway, losing to a primitive DBMS by about the same margin.
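
(The magic servlet, for those who haven't met it, is the built-in /_ah/sessioncleanup?clear handler, which, if memory serves, deletes a hundred expired sessions per call. The manual-iteration route looks roughly like the sketch below: keys-only queries over _ah_SESSION via the low-level datastore API, deleting in batches of 500. SessionSweeper and sweepOnce are made-up names; a real version would run from a servlet or a task queue in chunks small enough to fit the request deadline.)

 import java.util.ArrayList;
 import java.util.List;

 import com.google.appengine.api.datastore.Cursor;
 import com.google.appengine.api.datastore.DatastoreService;
 import com.google.appengine.api.datastore.DatastoreServiceFactory;
 import com.google.appengine.api.datastore.Entity;
 import com.google.appengine.api.datastore.FetchOptions;
 import com.google.appengine.api.datastore.Key;
 import com.google.appengine.api.datastore.Query;
 import com.google.appengine.api.datastore.QueryResultList;

 public class SessionSweeper {
     public static void sweepOnce() {
         DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
         Cursor cursor = null;
         while (true) {
             // keys-only: we only need keys in order to delete
             Query q = new Query("_ah_SESSION").setKeysOnly();
             q.addFilter("_expires", Query.FilterOperator.LESS_THAN,
                     System.currentTimeMillis());
             FetchOptions options = FetchOptions.Builder.withLimit(500);
             if (cursor != null) {
                 options = options.startCursor(cursor);
             }
             QueryResultList<Entity> batch = ds.prepare(q).asQueryResultList(options);
             if (batch.isEmpty()) {
                 break;
             }
             List<Key> keys = new ArrayList<Key>();
             for (Entity e : batch) {
                 keys.add(e.getKey());
             }
             ds.delete(keys);
             cursor = batch.getCursor();
         }
     }
 }

As for the per-day tally, it could piggyback on the mapper above as one counter per day, something like context.getCounter("expired", Long.toString(expiresTimestamp / (86400L * 1000))).increment(1); but that is precisely the mapreduce run lamented above.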

Source: https://habr.com/ru/post/104714/

