
Google, where is my GMail space going? Do you know exactly how labels work in GMail?


I began to notice that of the 15 gigabytes of free space provided by Google, my mail already takes up almost 12 gigabytes. And this trend does not please me.
On the other hand, I use Thunderbird with full synchronization as an email client, that is, all letters are downloaded. Yet the Thunderbird folder with all the letters and indexes takes only 3 gigabytes. By all logic its size should not just roughly match the space occupied on GMail but be larger, since Thunderbird does not compress letters but stores them as is, and also builds indexes to speed up search.
The problem is plain to see! Let's get to the bottom of it.

I started by opening the “All Mail” label (yes, in the case of GMail it is correct to say label, not folder; the details are here) and saw that I have a little more than 500 thousand messages. The situation was complicated by the fact that I have about 100 labels! Labels in GMail show up as ordinary folders in Thunderbird. I could not find a quick way to get the total number of letters in Thunderbird, but looking ahead, I will say that it holds about 200 thousand of them. This explains why less disk space is used.
But the question remains: what are these 300 thousand messages in GMail that are not visible in Thunderbird but take up space on GMail?

An inquisitive mind + the desire not to sleep at night + the desire to try Go on a real task led me to the decision to take the Go compiler, study the GMail API and see what is under the hood of GMail.
Very briefly about my impressions of Go
Only the laziest have not written about error handling in Go. It was the only thing that really caught my attention.
As for the rest:
  • I started writing the very next evening
  • Just another language
  • If life forces me to, I will write in Go
  • For me, C/C++, Python, Java (and PHP too) are likewise languages for their own niches
  • I guess I'm just omnivorous.

But this article is not about Go.

As I noted above, I have about a hundred labels. A letter usually has one label. I wanted to find out how many letters carry each label and how much space they occupy in total.
I did not find a way to see label sizes (the volume of letters marked with a given label) in the GMail web interface.
I rolled up my sleeves, installed the Go compiler, spun up a MongoDB container in Docker (yes, I am that kind of pervert! But this is my pet project and I use whatever I want, especially for educational purposes) and started hacking away.
From here on I will refer to this project of mine.
I fetch all my labels from GMail with Users.labels: list and add them to the database:
GMailMessagesSize -importLabels -mongoConnectionString 10.211.55.5
Imported labels: 112

I fetch the IDs of all messages in the mailbox with Users.messages: list:
 GMailMessagesSize -mongoConnectionString 10.211.55.5 -importMessages
 Processed 100 messages
 Processed 200 messages
 Processed 300 messages
 .......
 Processed 523100 messages
 Processed 523115 messages

Of course, this is not fast, but I could not find a way to parallelize it (the API does not allow it).
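A likely reason the listing resists parallelization is that Users.messages: list returns results page by page, and each response carries the nextPageToken required to request the following page. Below is a minimal sketch of that loop. The page type and the fetchPage function are hypothetical stand-ins for one API call, not code from the project.

```go
package main

// page is one page of results: a batch of message IDs plus the token
// that must be sent to obtain the following page ("" on the last page).
type page struct {
	IDs           []string
	NextPageToken string
}

// fetchPage is a hypothetical stand-in for a single Users.messages: list call.
type fetchPage func(pageToken string) (page, error)

// listAllIDs walks the pages strictly one after another, because every
// request needs the token returned by the previous response.
func listAllIDs(fetch fetchPage) ([]string, error) {
	var ids []string
	token := ""
	for {
		p, err := fetch(token)
		if err != nil {
			return ids, err
		}
		ids = append(ids, p.IDs...)
		if p.NextPageToken == "" {
			return ids, nil
		}
		token = p.NextPageToken
	}
}
```

Since the token for page N+1 is unknown until page N has arrived, the calls form a chain and extra goroutines would have nothing to do here.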
So far we only have a list of message IDs, but for each message we need its labels and size. There is the Users.messages: get method for this. It is not fast either, even though the request names exactly the fields I am interested in (internalDate, labelIds, sizeEstimate).
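To illustrate what "naming the fields" means at the HTTP level, here is a small sketch that builds the REST URL for Users.messages: get with the standard fields partial-response parameter. The function name messageGetURL is made up for this example; a real request also needs an OAuth 2.0 access token, which is left out here.

```go
package main

import (
	"net/url"
	"strings"
)

// messageGetURL builds the REST URL of a Users.messages: get call that asks
// the server to return only the listed fields of the message resource.
func messageGetURL(messageID string, fields ...string) string {
	q := url.Values{}
	q.Set("fields", strings.Join(fields, ","))
	return "https://gmail.googleapis.com/gmail/v1/users/me/messages/" +
		url.PathEscape(messageID) + "?" + q.Encode()
}
```

Calling messageGetURL("136b83b1ff739dec", "internalDate", "labelIds", "sizeEstimate") yields a URL whose query string is fields=internalDate%2ClabelIds%2CsizeEstimate (the commas are percent-encoded). Limiting the fields shrinks each response, but it is still one round trip per message, which is why the step stays slow.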
I did not find an implementation of Batching Requests.
But I am writing in Go, and it would be a sin not to use goroutines! No sooner said than done. We pull the information in several goroutines (as many as we like, but I set a limit of 50). If the Internet connection is fast and the computer is not slow, we quickly run into Google's request rate limit. The program can be stopped and resumed, or you can simply wait it out: when the limit is hit, the goroutines sleep for 5 seconds and then go on tormenting Google. Yes, the sleep time could be increased on each hit, for example doubled, with an upper bound. But in this case a plain 5 seconds is a perfectly adequate solution.
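The scheme described above (a fixed number of goroutines, a pause when the rate limit is hit) can be sketched roughly as follows. This is not the project's code: errRateLimited, nextDelay and processAll are hypothetical names, fetch stands in for one Users.messages: get call, and the sketch uses the doubling pause with an upper bound mentioned above rather than the fixed 5 seconds.

```go
package main

import (
	"errors"
	"sync"
	"time"
)

// errRateLimited is a hypothetical error that fetch returns when the
// server reports that the request rate limit has been exceeded.
var errRateLimited = errors.New("rate limit exceeded")

// nextDelay doubles the pause after each rate-limit hit, up to a ceiling.
func nextDelay(current, ceiling time.Duration) time.Duration {
	if current*2 > ceiling {
		return ceiling
	}
	return current * 2
}

// processAll runs fetch for every ID using at most `workers` goroutines.
// A rate-limited call is retried after a pause that grows from base to
// ceiling; any other error counts the message as failed.
func processAll(ids []string, workers int, base, ceiling time.Duration,
	fetch func(id string) error) (failed int) {

	jobs := make(chan string)
	var wg sync.WaitGroup
	var mu sync.Mutex // protects failed

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for id := range jobs {
				delay := base
				err := fetch(id)
				for errors.Is(err, errRateLimited) {
					time.Sleep(delay)
					delay = nextDelay(delay, ceiling)
					err = fetch(id)
				}
				if err != nil {
					mu.Lock()
					failed++
					mu.Unlock()
				}
			}
		}()
	}
	for _, id := range ids {
		jobs <- id
	}
	close(jobs)
	wg.Wait()
	return failed
}
```

A call such as processAll(ids, 20, 5*time.Second, time.Minute, fetch) would correspond to 20 workers that start with a 5 second pause. The unbuffered jobs channel keeps the number of in-flight requests equal to the number of workers.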
Processing all of my 500 thousand letters took, I think, about 3 hours. All in all, a reasonable time.
 GMailMessagesSize -mongoConnectionString 10.211.55.5 -processMessages -procNum 20
 ............................Procecced 100 messages
 ............................Procecced 200 messages
 ............................Procecced 300 messages
 ....
 ............................Processed 523100 messages
 ............................Processed 523115 messages

Dots are not the only thing printed. When the rate limit is hit, an S (sleep) appears instead of a dot, and if the message has already been deleted, NF (NotFound).
As a result of all the suffering described above, MongoDB now holds a collection of labels and a collection of messages:
 {
     "SizeEstimate" : NumberLong(63422),
     "_id" : ObjectId("5677188d2afd90a80e5e06f2"),
     "id" : "136b83b1ff739dec",
     "internaldate" : ISODate("2012-04-15T22:47:51.000+0000"),
     "labelids" : [ "CATEGORY_PROMOTIONS" ],
     "processed" : true
 }

Now all the data is at hand and the analysis can begin.
First I decided to export the labels, the number of messages under each and their total size to CSV.
 GMailMessagesSize -mongoConnectionString 10.211.55.5 -showSizes
 LabelId;Label name;Messages size;Messages count
 Label_11;Archives;21279;4
 Label_12;Archives/2012;18684;3
 CATEGORY_FORUMS;CATEGORY_FORUMS;519396295;30038
 CATEGORY_PERSONAL;CATEGORY_PERSONAL;5040188875;268116
 CATEGORY_PROMOTIONS;CATEGORY_PROMOTIONS;2990655727;36508
 CATEGORY_SOCIAL;CATEGORY_SOCIAL;205976374;6553
 CATEGORY_UPDATES;CATEGORY_UPDATES;2769764066;180729
 CHAT;CHAT;0;0
 DRAFT;DRAFT;82817;6
 IMPORTANT;IMPORTANT;6600492209;159268
 INBOX;INBOX;40306538;334
 UNREAD;UNREAD;479586429;11678
 .....
 Label_97;INBOX/Coursera;6021524;151
 Label_77;INBOX/;1077571;28
 Label_63;INBOX/!!!;6195999;12
 Label_67;INBOX/  ;1693366;11

This is CSV, which was convenient for me to open in Excel and study (sort and filter).
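A report of this kind can be produced by a simple aggregation over the stored messages. The sketch below is an assumption about how such a report might be computed, not the project's actual code: the message and labelTotal types and both function names are invented for the example, and the label name column is omitted for brevity.

```go
package main

import (
	"encoding/csv"
	"io"
	"sort"
	"strconv"
)

// message mirrors the two stored fields that matter for the report.
type message struct {
	LabelIDs     []string
	SizeEstimate int64
}

// labelTotal accumulates the size and count of messages carrying one label.
type labelTotal struct {
	Size  int64
	Count int
}

// totalsByLabel adds every message to the totals of each of its labels,
// so a message with several labels is counted once per label.
func totalsByLabel(msgs []message) map[string]labelTotal {
	totals := make(map[string]labelTotal)
	for _, m := range msgs {
		for _, id := range m.LabelIDs {
			t := totals[id]
			t.Size += m.SizeEstimate
			t.Count++
			totals[id] = t
		}
	}
	return totals
}

// writeReport writes the totals as semicolon-separated rows sorted by label ID.
func writeReport(w io.Writer, totals map[string]labelTotal) error {
	ids := make([]string, 0, len(totals))
	for id := range totals {
		ids = append(ids, id)
	}
	sort.Strings(ids)

	cw := csv.NewWriter(w)
	cw.Comma = ';'
	if err := cw.Write([]string{"LabelId", "Messages size", "Messages count"}); err != nil {
		return err
	}
	for _, id := range ids {
		t := totals[id]
		row := []string{id, strconv.FormatInt(t.Size, 10), strconv.Itoa(t.Count)}
		if err := cw.Write(row); err != nil {
			return err
		}
	}
	cw.Flush()
	return cw.Error()
}
```

With this way of counting, the rows are not meant to add up to the mailbox size: a letter labeled both CATEGORY_UPDATES and IMPORTANT contributes its size to both rows.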

At this stage I started thinking hard. What are these 6 gigs of supposedly important messages (with the IMPORTANT label)? What are these 11678 unread messages (with the UNREAD label)? I have (as I thought) read all my messages! Even if you enter label:unread in the GMail search bar, it shows only 106 unread messages! What is going on?

Googling led me to forums where others wondered why messages deleted in Thunderbird are not deleted in GMail. There are many different cases. I will tell you about the one that is, in my opinion, the saddest.
At this point, those who use GMail exclusively in the browser may regret having started reading this article. BUT!!! You may also be reading mail from a mobile device, and you may have a non-GMail client there. In that case you may have the same problem as me!

I will not keep you in suspense and will tell you what is going on.
Watch closely. The sequence of events is as follows:
  1. A letter arrives at GMail
  2. The letter is assigned the labels INBOX, UNREAD and (this is the important part) possibly some additional label, for example CATEGORY_PROMOTIONS
  3. In the mail client, you opened the letter. The UNREAD label is removed.
  4. In the mail client, you deleted the letter
  5. Drum roll: the INBOX label is removed. And... that is all, nothing more
  6. The message still has the CATEGORY_PROMOTIONS label.

Messages labeled CATEGORY_PROMOTIONS are displayed if you type category:promotions in the search bar. Do you do that often?
In short, the letters are simply not deleted! I delete them, and they remain on GMail.
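The sequence of events can be replayed on a toy model in which a message is nothing more than a set of label IDs. This is only an illustration of the reasoning; labelSet and its methods are invented for the example and do not call GMail.

```go
package main

import "sort"

// labelSet is the set of label IDs attached to one message.
type labelSet map[string]bool

// markRead models opening the letter: the UNREAD label is removed.
func (s labelSet) markRead() { delete(s, "UNREAD") }

// archive models what the "deleted" letter actually undergoes here:
// only the INBOX label is removed, nothing else changes.
func (s labelSet) archive() { delete(s, "INBOX") }

// ids returns the remaining label IDs in sorted order.
func (s labelSet) ids() []string {
	out := make([]string, 0, len(s))
	for id := range s {
		out = append(out, id)
	}
	sort.Strings(out)
	return out
}
```

Starting from labelSet{"INBOX": true, "UNREAD": true, "CATEGORY_PROMOTIONS": true} and calling markRead and then archive leaves a single label, CATEGORY_PROMOTIONS. The message no longer appears in the inbox, yet it still exists and still takes up space.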
Now is the time to recall the archiving of letters. And it seems that this is exactly the case!
This happens when deletion in Thunderbird is configured as “Mark for deletion” followed by “Compact”, whereas the option that should be selected is moving the letter to the Trash. With the first setting, all that actually happens is ARCHIVING!
Total: letters go to the archive. From GMail's point of view, the archive is letters that have no visible labels and are not in the Trash.
On the one hand, nothing terrible: the letters can always be found through search.
But what if I do not want that? What should I do now?
How do you find and delete all messages from the archive? Here is a good answer. But I did not dare to delete everything at once.
By the way, I did not find a way in the search bar to show messages that have only one specific label. That is, suppose I decided to delete all messages that have the CATEGORY_PROMOTIONS label and no other. I definitely do not need these promotional letters in the archive. By the way, how many of them are there?
 GMailMessagesSize -mongoConnectionString 10.211.55.5 -showSizes -l CATEGORY_PROMOTIONS -onlyThisLabel
 LabelId;Label name;Messages size;Messages count
 CATEGORY_PROMOTIONS;CATEGORY_PROMOTIONS;1197364170;14618

Over a gigabyte of them has accumulated.
-onlyThisLabel is an important option: it finds only those messages that carry this label and no other.
 GMailMessagesSize -mongoConnectionString 10.211.55.5 -showSizes -l CATEGORY_PROMOTIONS -l IMPORTANT -onlyThisLabel
 LabelId;Label name;Messages size;Messages count
 CATEGORY_PROMOTIONS;CATEGORY_PROMOTIONS;1197364170;14618

Yes, I have another half a gigabyte of "important advertising" messages :) Note that this is on top of the gigabyte of unimportant advertising.
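The "these labels and no others" check can be expressed as a comparison of two sets. The sketch below assumes that several -l flags combined with -onlyThisLabel mean "exactly this set of labels", which matches the description above but is not taken from the project's source; hasExactlyLabels is a hypothetical name.

```go
package main

// hasExactlyLabels reports whether a message carries every wanted label and
// no others. Order does not matter; duplicates in either list are ignored.
func hasExactlyLabels(messageLabels, wanted []string) bool {
	want := make(map[string]bool, len(wanted))
	for _, id := range wanted {
		want[id] = true
	}
	seen := make(map[string]bool, len(messageLabels))
	for _, id := range messageLabels {
		if !want[id] {
			return false // an extra label disqualifies the message
		}
		seen[id] = true
	}
	// Every label on the message is wanted; now make sure none is missing.
	return len(seen) == len(want)
}
```

Under this reading, a letter labeled CATEGORY_PROMOTIONS and IMPORTANT is not counted by the single-label query, which is why the two results add up instead of overlapping.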
My hands immediately itched to delete it all!
 GMailMessagesSize -mongoConnectionString 10.211.55.5 -deleteMessages -l CATEGORY_PROMOTIONS -l IMPORTANT -onlyThisLabel -procNum 10 

In fact, the letters are not deleted but moved to the Trash. After 30 days there they are removed for good, or you can go and empty it manually.
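The GMail API has a separate call for exactly this behavior: Users.messages: trash moves a message to the Trash, while Users.messages: delete removes it permanently. Whether the project uses this call is an assumption on my part. The sketch below only builds the HTTP request and does not send it; newTrashRequest is a made-up name and accessToken is a placeholder for a real OAuth 2.0 token.

```go
package main

import (
	"net/http"
	"net/url"
)

// newTrashRequest builds the REST call for Users.messages: trash, which moves
// a message to the Trash instead of deleting it permanently.
// accessToken is a placeholder for an OAuth 2.0 token allowed to modify mail.
func newTrashRequest(messageID, accessToken string) (*http.Request, error) {
	endpoint := "https://gmail.googleapis.com/gmail/v1/users/me/messages/" +
		url.PathEscape(messageID) + "/trash"
	req, err := http.NewRequest(http.MethodPost, endpoint, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+accessToken)
	return req, nil
}
```

Going through the Trash keeps the operation reversible for 30 days, which is a sensible default for a tool that deletes thousands of letters at once.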

TOTAL: If you delete messages not through the GMail web interface but through a third-party client (possibly a mobile one), there is a chance that your messages are not deleted but archived. For some this is even a good thing. For others it means the mailbox simply swells indecently.
And it is not even about the 2 bucks a month; one could buy 100 gigs and more. I just wanted to get to the bottom of the issue.

ATTENTION!!! I wrote this project for myself. This is my first Go program. I take no responsibility for the safety of your letters! But if you do not use the -deleteMessages option, nothing will happen to your mailbox.
What do you need to do to make the application work?
  • Use the wizard here to create or select a project in the Google Developers Console and automatically turn on the API. Click Continue, then Go to credentials.
  • At the top of the page, select the OAuth consent screen tab. Select an Email address, enter a Product name if not already set, and click the Save button.
  • Select the Credentials tab, click the Add credentials button and select OAuth 2.0 client ID.
  • Select the application type Other, enter the name "Gmail API Quickstart", and click the Create button.
  • Click OK to dismiss the resulting dialog.
  • Click the (Download JSON) button.
  • Move this file to your working directory and rename it client_secret.json.

Source: https://habr.com/ru/post/273701/

