
Sending important messages from VKontakte by email

Good day!
Surely many of you have run into the situation where important information (news, announcements, and so on) is published on VKontakte. But first, it is not always possible to get there (and sitting on VKontakte during working hours is generally frowned upon), and second, you have to get the information by polling, that is, by constantly refreshing the group page or something similar.
From this a wonderful thought was born: it would be convenient if important notifications arrived by email. You can check your mail at work, and there is no need to keep frantically pressing F5. As it turned out, Python makes this task easy to handle.

Attempt #1: VK API


To start with, I tried to play fair and use the VK API. I even managed to find a couple of libraries online that could log in and call API methods. Unfortunately, none of them suited me, so in a couple of hours I reinvented the wheel and wrote my own. So far so good, but then I ran into an unpleasant problem: with the current version of the API it is not possible to get messages from a group wall (or I did not find out how to do it, which is also quite likely). That left only one option: parsing the VK pages myself. On the one hand, this is not entirely legitimate; on the other hand, this approach gives access to any information that I can see directly in the browser.

Attempt #2: parsing the page directly!


Login to the site

First of all, I tried fetching the login page with httplib and urllib. Everything looked fine, except that it turned out I would have to write a lot of ugly code and even handle cookies myself... That saddened me a great deal, so I started looking for a replacement. And I found the wonderful mechanize library, which did all the boring work of creating connections, handling sessions and cookies, and so on for me.
So, with the help of mechanize we open the main page of VKontakte and log in:
    import cookielib
    import mechanize

    # EMAIL and PASSWORD are your VKontakte credentials, defined elsewhere

    def initVK():
        # Browser
        br = mechanize.Browser()

        # Cookie jar
        cj = cookielib.LWPCookieJar()
        br.set_cookiejar(cj)

        # Browser options
        br.set_handle_equiv(True)
        br.set_handle_gzip(True)
        br.set_handle_redirect(True)
        br.set_handle_referer(True)
        br.set_handle_robots(False)

        # Follow refresh 0, but do not hang on refresh > 0
        br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

        # A little cheating: pretend to be a regular browser
        br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

        # The first form on the main page is the login form
        br.open('http://vkontakte.ru')
        br.select_form(nr=0)
        br.form['email'] = EMAIL
        br.form['pass'] = PASSWORD
        br.submit()
        return br

By way of explanation: on vkontakte.ru the first form on the page is the login form. We fill it in with mechanize and voilà, we are logged in to the site!
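
For comparison, the cookie handling that mechanize takes over can be sketched with the standard library alone. This is only an illustration, written for Python 3 (where cookielib became http.cookiejar), while the code in this article targets Python 2; the function name make_session is made up for the example.

```python
import http.cookiejar
import urllib.request


def make_session(user_agent="Mozilla/5.0"):
    """Return an (opener, cookie_jar) pair that keeps cookies between requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    # Same idea as br.addheaders in the mechanize version.
    opener.addheaders = [("User-agent", user_agent)]
    return opener, jar


opener, jar = make_session()
# opener.open(url) would now store and resend cookies automatically.
# Finding and submitting the login form is not covered by this sketch.
```

Even with the cookies taken care of, locating the login form and posting it would still have to be written by hand, which is the part mechanize makes short.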

Getting important messages from the wall

The following code will allow us to get a group page:
    def getGroupHTML(br):
        br.open('http://vkontakte.ru/OUR_GROUP')
        html = br.response().read()
        return html

Now we parse the received HTML to find the messages we need.
For this we need the HTMLParser library. Let's create our own parser class that inherits from HTMLParser.
For simplicity, we will look for messages that start with a certain pattern (in my script I used '@ year2007').
    import re
    from HTMLParser import HTMLParser

    class MyHTMLParser(HTMLParser):
        def __init__(self):
            HTMLParser.__init__(self)
            self.recording = False   # True while inside a wall_post_text div
            self.export_tag = False  # True while inside a wall_text div
            self.message = unicode('')

        def handle_starttag(self, tag, attrs):
            if tag == 'div':
                for name, value in attrs:
                    if name == 'class' and value == 'wall_text':
                        self.export_tag = True
                    if name == 'class' and value == 'wall_post_text':
                        self.recording = True

        def handle_endtag(self, tag):
            if tag == 'div':
                if self.recording:
                    self.recording = False
                    year = re.compile(PATTERN)
                    # Keep only messages that start with the pattern
                    if year.match(self.message):
                        message_queue.append(year.sub('', self.message).strip())
                    # Start the next message from scratch
                    self.message = unicode('')
                if self.export_tag:
                    self.export_tag = False

        def handle_data(self, data):
            if self.recording:
                self.message += unicode(data, 'CP1251')

All text arrives in CP1251, so we convert it to unicode. The div blocks with the class wall_text are the posts, and wall_post_text holds the message text itself.
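
To see the idea work on a small input, here is a self-contained sketch of the same approach in Python 3 (html.parser instead of the Python 2 HTMLParser module). It is not the script from this article: the class name WallParser, the sample markup and the '@year2007' marker are made up for illustration, and a depth counter is added so that nested div tags do not end a message early. In Python 3 the page bytes would have to be decoded from CP1251 before being fed to the parser.

```python
import re
from html.parser import HTMLParser

PATTERN = r"@year2007"  # hypothetical marker for "important" posts


class WallParser(HTMLParser):
    """Collect the text of <div class="wall_post_text"> blocks that match PATTERN."""

    def __init__(self, pattern=PATTERN):
        super().__init__()
        self.pattern = re.compile(pattern)
        self.depth = 0       # > 0 while inside a wall_post_text div
        self.chunks = []     # text pieces of the current post
        self.messages = []   # matching posts, in page order

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1  # nested div inside a post
        elif dict(attrs).get("class") == "wall_post_text":
            self.depth = 1

    def handle_endtag(self, tag):
        if tag != "div" or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            text = "".join(self.chunks).strip()
            self.chunks = []
            if self.pattern.match(text):
                self.messages.append(self.pattern.sub("", text).strip())

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


sample = (
    '<div class="wall_text">'
    '<div class="wall_post_text">@year2007 Exam moved to <b>Friday</b></div>'
    '</div>'
    '<div class="wall_text">'
    '<div class="wall_post_text">Just chatting</div>'
    '</div>'
)
parser = WallParser()
parser.feed(sample)
print(parser.messages)  # ['Exam moved to Friday']
```

Text inside inline tags such as b is kept because handle_data fires for it as well; only the tags themselves are dropped.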

Now the code for receiving messages can be wrapped in an infinite loop. To avoid sending the same messages by email on every pass, you can keep a message queue or try to parse the post time. For simplicity, let's use a queue.
It is also worth noting that the text may contain other tags, for example links. They can be processed as well, by cutting out the pesky away.php. But these are details.
    import codecs
    import time

    import mymail

    message_queue = []

    # Restore the last message sent on the previous run, if there is one
    try:
        f = codecs.open('/tmp/vk-last-message', 'r', encoding='utf-8')
        last_message = f.read()
        f.close()
        if len(last_message.strip()) == 0:
            last_message = PATTERN
    except (IOError, UnicodeDecodeError):
        last_message = PATTERN

    browser = initVK()

    while True:
        #print "Getting vk.com pages"
        html = getGroupHTML(browser)
        p = MyHTMLParser()
        p.feed(html)
        #print message_queue

        # Send everything that is newer than the last message already sent
        msgSent = 0
        for msg in message_queue:
            if msg == last_message:
                break
            #messageForSend = processMsg(msg)
            print msg
            mymail.sendMessage(msg)
            msgSent += 1

        # Remember the newest message so that it is not sent again
        if len(message_queue) > 0 and msgSent > 0 and len(last_message.strip()) > 0:
            last_message = message_queue[0]
            f = codecs.open('/tmp/vk-last-message', 'w', encoding='utf-8')
            f.write(last_message)
            f.close()
            #print "last message: " + last_message

        message_queue = []
        #print "Sleeping..."
        time.sleep(60)
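
The deduplication step of this loop can be isolated into a small function, which makes its behavior easier to see. The sketch below is an illustration only (Python 3 syntax): the name select_new is made up, and it assumes, as the loop does, that the parser returns posts newest first.

```python
def select_new(messages, last_seen):
    """Return the messages that come before last_seen in a newest-first list."""
    fresh = []
    for msg in messages:
        if msg == last_seen:
            break
        fresh.append(msg)
    return fresh


page = ["post C", "post B", "post A"]   # newest first, as on the wall
print(select_new(page, "post B"))       # ['post C']
print(select_new(page, "post C"))       # []
print(select_new(page, "unknown"))      # ['post C', 'post B', 'post A']
```

The last call shows the weak spot of this scheme: if the remembered message is deleted or scrolls off the page, everything currently visible is sent again.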

Sending a message by email

Now let's reveal the mystery of the mymail module.
    import smtplib
    from email.mime.multipart import MIMEMultipart
    from email.mime.text import MIMEText

    def sendMessage(text):
        if len(text) == 0:
            print "Empty message"
            return

        fromaddr = FROM_ADDR
        toaddrs = LIST_OF_RECEPIENTS  # expected to be a list of addresses
        #text = 'test message'

        msg = MIMEMultipart('alternative')
        msg['Subject'] = "year2007@vkontakte"
        msg['From'] = fromaddr
        # A header value must be a string, so the list is comma-joined
        msg['To'] = ', '.join(toaddrs)
        mime_text = MIMEText(text, 'plain', 'utf-8')
        msg.attach(mime_text)

        # Credentials (if needed)
        username = USER
        password = PASSWD

        # The actual mail send; replace the placeholder with your server and port
        server = smtplib.SMTP('SMTP_SERVER:SMTP_PORT')
        server.starttls()
        server.login(username, password)
        server.sendmail(fromaddr, toaddrs, msg.as_string())
        server.quit()

This is the simplest possible code for sending messages. I used the Yandex SMTP server, smtp.yandex.ru:587. The list of recipients can be read from a config file, or you can hard-code a single address, as I did.
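
Building the message can be tried separately from sending it, which is handy for checking headers and encoding without touching an SMTP server. The following is a sketch in Python 3, not the mymail module itself: the helper name build_message and the example.com addresses are placeholders invented for the example.

```python
from email.mime.text import MIMEText


def build_message(text, sender, recipients, subject="year2007@vkontakte"):
    """Build a UTF-8 plain-text message; nothing is sent here."""
    msg = MIMEText(text, "plain", "utf-8")
    msg["Subject"] = subject
    msg["From"] = sender
    # A header value must be a string, so several addresses are comma-joined.
    msg["To"] = ", ".join(recipients)
    return msg


msg = build_message(
    "Экзамен перенесён",
    "bot@example.com",                     # placeholder addresses
    ["a@example.com", "b@example.com"],
)
print(msg["To"])  # a@example.com, b@example.com
```

When the message is actually sent, sendmail still receives the recipients as a list, while the To header carries the joined string.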

What we got in the end


As a result we have:
  1. important messages arriving by email, so there is no need to go to VKontakte and refresh the page yourself
  2. experience in parsing pages
  3. satisfaction and pride

It is worth saying that this code is only a foundation on which a lot can be built. For example, GUI lovers can add a login and password dialog. You can also filter messages by author name (for example, forward all messages posted on behalf of the group), process tags inside messages, and so on.
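
As one example of processing tags inside messages, external links on the wall go through the away.php redirect mentioned earlier. The sketch below (Python 3, standard library only) assumes that the real address is carried in a query parameter named to; both that parameter name and the helper name unwrap_away are assumptions that should be checked against the actual page markup.

```python
from urllib.parse import urlparse, parse_qs


def unwrap_away(url):
    """Return the target of an away.php redirect link, or url unchanged."""
    parsed = urlparse(url)
    if parsed.path.endswith("/away.php"):
        target = parse_qs(parsed.query).get("to")  # assumed parameter name
        if target:
            return target[0]  # parse_qs has already percent-decoded it
    return url


print(unwrap_away("http://vkontakte.ru/away.php?to=http%3A%2F%2Fexample.com%2Fa%3Fb%3D1"))
print(unwrap_away("http://vkontakte.ru/OUR_GROUP"))
```

The first call prints the decoded target address, the second returns the link untouched because it is not a redirect.
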
Experience shows that there is no point in setting the page refresh rate too high: all messages will be received anyway, and excessive activity can be punished with a temporary ban.

That's all. Thank you for your attention!

UPD. The post has been moved from the VKontakte blog to the Python blog.

Source: https://habr.com/ru/post/129224/

