
How I wrote a parser for an electronic school diary

A year ago I started writing bots for everyone's beloved Telegram. In Python, of course. Recently my son started school, which, as it turned out, uses an electronic diary called MRKO. As you might have guessed, my very first thought was to make a bot (for personal use, for now) that could send grades, homework, and teachers' comments to Telegram. If you're interested, read on.



Writing the parser


First, of course, we need a parser for the diary itself. For those unfamiliar with it, the login flow works like this: the student or parent signs in to the mos.ru portal, and from there is taken to the diary itself at mrko.mos.ru. You might ask: why not go straight to mrko.mos.ru? The problem is that the server then responds with this message:


Entry for parents and students is made only from the website of the Moscow Government Services Portal.

This is the main snag. Clearly, we want to make as few requests as possible so that the bot responds quickly.


By inspecting the outgoing traffic with a sniffer, I found that a GET request to https://mrko.mos.ru/dnevnik/services/index.php?login=&password=__MD5 comes first; it sets the necessary cookies, after which you can go to https://mrko.mos.ru/dnevnik/services/dnevnik.php?r=1&first=1 . I started by importing my favorite Python HTTP library, Requests, and creating a basic session:


```python
import requests

def diary():
    session = requests.Session()
    headers = {'Referer': 'https://www.mos.ru/pgu/ru/application/dogm/journal/'}
    auth_url = "https://mrko.mos.ru/dnevnik/services/index.php"
    # login and password_md5 are placeholders for your own credentials
    auth_req = session.get(auth_url,
                           headers=headers,
                           params={"login": login, "password": password_md5},
                           allow_redirects=False)
```
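Note that the password parameter carries an MD5 hash rather than plain text. A minimal sketch of producing it with the standard library (the password value here is a made-up placeholder):

```python
import hashlib

# Placeholder; substitute your real portal password
password = "example-password"
# hexdigest() yields the 32-character hex string to pass as the password parameter
password_md5 = hashlib.md5(password.encode("utf-8")).hexdigest()
```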

Note the Referer header. As it turned out later, it must be set, otherwise the diary refuses to let us in, deciding that we came directly. We have to disguise ourselves as if we came from mos.ru. Now the main request for the diary:


```python
    main_req = session.get("https://mrko.mos.ru/dnevnik/services/dnevnik.php?r=1&first=1")
```
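It's worth verifying that the login actually worked before parsing anything. A crude, hypothetical check, assuming the refusal page contains the error message quoted above (the marker string here is illustrative; the text on the real page is in Russian):

```python
# Hypothetical helper: detect the diary's refusal page.
# LOGIN_ERROR is an illustrative marker, not the literal page text.
LOGIN_ERROR = "Entry for parents and students is made only from the website"

def logged_in(html: str) -> bool:
    """Return True if the response does not look like the refusal page."""
    return LOGIN_ERROR not in html
```

In the script above you would call `logged_in(main_req.text)` before handing the HTML to the parser.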

Parsing the data


Fine, we are in the diary. Now for the hardest part: parsing the data. For this I decided to use BeautifulSoup, because I had dealt with it before.


```python
from bs4 import BeautifulSoup

parsed_html = BeautifulSoup(main_req.content, "lxml")
```

After a little digging through the diary's DOM tree with the Chrome Developer Tools, I found the div with the information I needed.


```python
columns = parsed_html.body.find_all('div', 'b-diary-week__column')
final_ans = []
```

Now we have a list with the data for each day of the diary, from Monday to Saturday, and an empty list for the final result. Naturally, we traverse it with a for loop:


```python
for column in columns:
```

Inside each column I again located the elements with the information I needed, namely: the day of the week, the date, the homework, the grades, and the comments on the lessons. It came out like this.


```python
    date_number = column.find("span", "b-diary-date").text
    date_word = column.find("div", "b-diary-week-head__title").find_all("span")[0].text
```

Now we record the date and iterate over each "cell" of the table.


```python
    lessons_table = column.find("div", "b-diary-lessons_table")
    all_lists = lessons_table.find_all("div", "b-dl-table__list")
    for lesson in all_lists:
        lesson_columns = lesson.find_all("div", "b-dl-td_column")
        lesson_number = lesson_columns[0].span.text
        lesson_name = lesson_columns[1].span.text
        # skip empty rows that have no lesson name
        if lesson_name == "":
            pass
        else:
            lesson_dz = lesson_columns[2].find("div", "b-dl-td-hw-section").span.text
            lesson_mark = lesson_columns[3].span.text[0:1]
            lesson_comment = lesson_columns[4].find("div", "b-dl-td-hw-comments").span.text
            final_ans.append(
                "<b>{0}. {1}</b>. Homework:\n"
                "<i>{2}</i>\n"
                "Grade: <i>{3}</i>\n\n".format(lesson_number, lesson_name,
                                               lesson_dz, lesson_mark))
    final_ans.append("\n-------------------\n\n")
```
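The loop above fills final_ans with HTML-formatted fragments (Telegram renders <b> and <i> tags when a message is sent with HTML parse mode). Joining them into one message is then trivial; the sample entries below are made up purely for illustration:

```python
# Made-up sample entries in the same shape the parsing loop produces
final_ans = [
    "<b>1. Mathematics</b>. Homework:\n<i>p. 42, ex. 3-5</i>\nGrade: <i>5</i>\n\n",
    "\n-------------------\n\n",
]

# One string, ready to pass to the bot's send-message call
message = "".join(final_ans)
```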

As a result, we have a parser whose output looks something like this:




Well, that's all. Thanks for reading; I hope this saves you some time. In the next article I will cover integrating this parser with a Telegram bot.



Source: https://habr.com/ru/post/323856/

