
How I found out my visa wasn't ready: a message in Slack

This post is timely for the May holidays. Six weeks ago I applied for a visa to Ireland, and my departure was scheduled for April 30. The embassy website publishes lists of visa decisions on Mondays and Thursdays. So there I was on Sunday, April 28, still without a decision on my visa. My plans for Monday depended on whether my application appeared in the new report: if it did not, I would have to go to the embassy and sort things out; if it did, I would need to chase the visa application center. Sitting on the page and refreshing it all day on Monday seemed like a dull way to spend time, so I wrote a Python script.





Disclaimer: I'm not a programmer, but I can program. That means I can't write elegant and efficient code, but I can make this contraption do what I need.



1. Checking the page for a new report



So what needs to be done:



  1. Parse this page.
  2. Find the new report among the others by its date (in my case, by searching for the string 24 April).
  3. Get the link to this report.


The function code looks like this:



    import re

    import requests
    from bs4 import BeautifulSoup

    def check_report(url, div_class, date):
        # Download the embassy page and parse its HTML
        embassy_page = requests.get(url)
        page_text = BeautifulSoup(embassy_page.text, 'lxml')
        # Collect every <div> block that holds a link to a report
        tags = page_text.find_all('div', {"class": div_class})
        text = ''
        report_url = ''
        for tag in tags:
            # Look for the date of the new report in the block's text
            if re.search(date, tag.get_text()):
                text = 'New report published'
                report_url = 'https://www.dfa.ie' + tag.find('a').attrs['href']
        return text, report_url


Now let's look at what this code does in more detail.



First, we use the requests library to download the page. Then we use another library, BeautifulSoup, to turn this wild page markup into a neater, more readable form.



Before using BeautifulSoup:



 '<!DOCTYPE html>\r\n\r\n<html lang="en">\r\n <head>\r\n <META http-equiv="X-UA-Compatible" content="IE=edge">\r\n <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\t\r\n\t<meta name="viewport" content="initial-scale=1">\r\n \r\n \r\n \r\n \r\n \r\n <!-- Static Meta data -->\r\n \r\n\t<!-- <meta name="DC.Creator" content="Department of Foreign Affairs" />\r\n\t<meta name="DC.Publisher" content="Department of Foreign Affairs" /> \r\n\t<meta name="DC.Format" content="text/xhtml" /> \r\n\t<meta name="DC.Copyright" content="All material (c) copyright 2012 Department of Foreign Affairs" /> \r\n\t<meta name="DC.Source" content="Department of Foreign Affairs" /> \r\n\t<meta name="DC.Language" content="en" />\r\n\t<meta name="DC.Author" content="Department of Foreign Affairs" /> -->\r\n\r\n\r\n<meta name="author" content="Department of Foreign Affairs">\r\n<meta name="google-site-verification" content="HHtulupgM8GXpd9YYDjoXUb6MiU7_mGTkHixUrVPFYQ" />\r\n \r\n\t<title>Weekly Decision Report - Department of Foreign Affairs and Trade</title>\r\n <link href="https://stackpath.bootstrapcdn.com/font-awesome/4.7.0/css/font-awesome.min.css" rel="stylesheet" /> \r\n <link rel="stylesheet" type="text/css" media="screen" href="/media/dfa-2017/style-assets/css/font-defs.css" />\t<!-- 2017 font-defs.css --> \r\n\t<link rel="stylesheet" type="text/css" media="screen" href="/media/dfa-2017/style-assets/css/style.css" />\t<!-- 2017 style.css -->\r\n <link rel="stylesheet" type="text/css" media="print" href="/media/dfa-2017/style-assets/css/print.css" />\t<!-- 2017 print.css -->\r\n 


After:



    <!DOCTYPE html>
    <html lang="en">
     <head>
      <meta content="IE=edge" http-equiv="X-UA-Compatible"/>
      <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
      <meta content="initial-scale=1" name="viewport"/>
      <!-- Static Meta data -->
      <!-- <meta name="DC.Creator" content="Department of Foreign Affairs" />
      <meta name="DC.Publisher" content="Department of Foreign Affairs" />
      <meta name="DC.Format" content="text/xhtml" />
      <meta name="DC.Copyright" content="All material (c) copyright 2012 Department of Foreign Affairs" />
      <meta name="DC.Source" content="Department of Foreign Affairs" />
      <meta name="DC.Language" content="en" />
      <meta name="DC.Author" content="Department of Foreign Affairs" /> -->
      <meta content="Department of Foreign Affairs" name="author"/>
      <meta content="HHtulupgM8GXpd9YYDjoXUb6MiU7_mGTkHixUrVPFYQ" name="google-site-verification"/>
      <title>Weekly Decision Report - Department of Foreign Affairs and Trade</title>
      <link href="https://stackpath.bootstrapcdn.com/font-awesome/4.7.0/css/font-awesome.min.css" rel="stylesheet"/>
      <link href="/media/dfa-2017/style-assets/css/font-defs.css" media="screen" rel="stylesheet" type="text/css"/>
      <!-- 2017 font-defs.css -->
      <link href="/media/dfa-2017/style-assets/css/style.css" media="screen" rel="stylesheet" type="text/css"/>
      <!-- 2017 style.css -->


This is something we can actually work with. In my case, we look for a specific div class that wraps the links to the reports. It can be found in the page source: <div class="gen-content-landing__block">. The code collects all such tags.
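For readers without BeautifulSoup installed, the "collect the report blocks by class" step can be sketched with the standard library's html.parser.HTMLParser instead. This is a swapped-in technique, not the author's code: the class name ReportBlockParser is hypothetical, and the sketch assumes each report block is a <div> whose class attribute includes gen-content-landing__block.

```python
from html.parser import HTMLParser


class ReportBlockParser(HTMLParser):
    """Collect [text, href] pairs from <div> blocks that carry a given class."""

    def __init__(self, div_class):
        super().__init__()
        self.div_class = div_class
        self.depth = 0    # how many <div>s deep we are inside a matching block
        self.blocks = []  # one [text, href] entry per matching block

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'div':
            if self.depth:
                # A nested <div> inside a block we are already collecting
                self.depth += 1
            elif self.div_class in (attrs.get('class') or '').split():
                # Start of a new matching block
                self.depth = 1
                self.blocks.append(['', None])
        elif tag == 'a' and self.depth and self.blocks[-1][1] is None:
            # Keep only the first link of the block, like tag.find('a')
            self.blocks[-1][1] = attrs.get('href')

    def handle_endtag(self, tag):
        if tag == 'div' and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Accumulate the visible text of the current matching block
        if self.depth:
            self.blocks[-1][0] += data
```

Feeding it the page HTML (for example, the downloaded page text) with parser.feed(...) and then calling parser.close() leaves one [text, href] pair per matching block in parser.blocks, which is all the next step needs.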



Next, we go through the collected tags and look for the one that contains the date of the new report, 24 April. If it is found, we extract its link and set the text to say that a new report has been published.
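The matching step can be sketched as a small pure function, which makes it testable without hitting the embassy site. It assumes the blocks have already been reduced to (text, href) pairs; the name find_report_link is hypothetical. Two small changes relative to the original: the date is escaped and wrapped in word boundaries, so that searching for 4 April does not match 14 April, and urllib.parse.urljoin builds the absolute URL instead of string concatenation.

```python
import re
from urllib.parse import urljoin

BASE_URL = 'https://www.dfa.ie'


def find_report_link(blocks, date, base_url=BASE_URL):
    """Return (message, absolute_url) for the first block mentioning `date`.

    `blocks` is a list of (text, href) pairs, one per report <div>.
    Returns ('', '') when no block matches, like check_report.
    """
    # Word boundaries stop '4 April' from matching inside '14 April'
    pattern = re.compile(r'\b' + re.escape(date) + r'\b')
    for text, href in blocks:
        if href and pattern.search(text):
            # urljoin handles both relative ('/media/...') and absolute hrefs
            return 'New report published', urljoin(base_url, href)
    return '', ''
```

Returning the same ('', '') pair as check_report when nothing matches keeps the calling code unchanged.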



2. Searching for the visa ID in the new report



Here is what needs to be done now:



  1. Download the new report.
  2. Parse the PDF file.
  3. Find my ID in it.
  4. Find the status that corresponds to it.


The function code looks like this:



    from pathlib import Path

    import PyPDF2
    import requests

    def check_visa(report_url, filename, visa_id, text):
        # Download the PDF report and save it locally
        pdf = requests.get(report_url)
        Path(filename).write_bytes(pdf.content)
        reader = PyPDF2.PdfReader(filename, strict=False)
        for page in reader.pages:
            # Split the extracted page text into tokens
            tokens = page.extract_text().split()
            if visa_id in tokens:
                # In this report, the token after the ID is the decision
                visa_index = tokens.index(visa_id)
                status = tokens[visa_index + 1]
                text = text + '\t' + visa_id + '\t' + status
        return text


Using the requests library again, we download the report and save it locally. Then we read the file with the PyPDF2 library, iterate over its pages and look for visa_id in the list of tokens. The PDF is laid out so that the token right after visa_id is the decision itself: Approved or Refused. Finally, we append the ID and its status to the existing text.
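The "next token after the ID" lookup can be isolated into a function that works on plain text, so it can be checked without a PDF. This is a sketch: the name find_status is hypothetical, and it assumes the extracted text really places the decision immediately after the ID, as the author observed for this report. Splitting on any whitespace, rather than only on newlines, keeps it working whether the ID and the status share a line or not.

```python
def find_status(page_text, visa_id):
    """Return the token that follows `visa_id` in `page_text`, or None."""
    # Any whitespace separates tokens, so line layout does not matter
    tokens = page_text.split()
    if visa_id not in tokens:
        return None
    i = tokens.index(visa_id)
    # Assumes the report layout puts the decision right after the ID;
    # guard against the ID being the very last token on the page
    return tokens[i + 1] if i + 1 < len(tokens) else None
```

Because it needs an exact token match, a partial ID (for example, the first few digits) will not produce a false hit.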



3. Sending the status to Slack



Good. Say the script has found my ID; now it needs to notify me somewhere. Our company uses Slack as its messenger, so I thought it would be convenient to get the notification there.



At this link you can set up your webhook. There you choose the channel or recipient the messages will be posted to (you may need to be a workspace administrator for this step). You will also get a unique webhook URL to use in the code.



    import json

    import requests

    def send_to_slack(webhook_url, text):
        # Slack incoming webhooks expect a JSON body with a "text" field
        post = {"text": text}
        json_data = json.dumps(post)
        req = requests.post(webhook_url, data=json_data.encode('ascii'),
                            headers={'Content-Type': 'application/json'})
        return req.status_code


Using the same requests library, we send a POST request with the text to the webhook URL.
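If you would rather avoid the requests dependency, a similar POST can be made with the standard library's urllib.request. In this sketch (both function names are hypothetical) building the request is separated from sending it, so the payload and headers can be inspected without network access. The JSON body with a text field is the same one send_to_slack posts.

```python
import json
import urllib.request


def build_slack_request(webhook_url, text):
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    body = json.dumps({"text": text}).encode('utf-8')
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


def send_to_slack_stdlib(webhook_url, text, timeout=10):
    """Send the message and return the HTTP status code."""
    req = build_slack_request(webhook_url, text)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Unlike requests, urlopen raises urllib.error.HTTPError for 4xx and 5xx responses instead of returning the status code, so a caller that wants the code in those cases has to catch the exception.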



4. Putting the functions together



The rest of the code looks like this:



    url = 'https://www.dfa.ie/irish-embassy/russia/visas/weekly-decision-report/'
    div_class = 'gen-content-landing__block'
    date = '24 April'
    filename = 'weekly_report.pdf'
    visa_id = '38644112'
    webhook_url = 'https://hooks.slack.com/services/...'

    text, report_url = check_report(url, div_class, date)
    if text != '':
        # A new report is out: look for the visa ID and notify Slack
        text = check_visa(report_url, filename, visa_id, text)
        print(send_to_slack(webhook_url, text))


We assign values to all the necessary variables and then call the functions.
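When this runs repeatedly (for example from cron, as in the next step), it will post the same message to Slack on every run once the report is out. One way to avoid that, not part of the original script, is to remember the report URL that already triggered a notification in a small state file. The helper names and the state file name are hypothetical.

```python
from pathlib import Path


def already_notified(state_file, report_url):
    """True if `report_url` is the one recorded in `state_file`."""
    path = Path(state_file)
    return path.exists() and path.read_text().strip() == report_url


def mark_notified(state_file, report_url):
    """Record `report_url` so later runs can skip the notification."""
    Path(state_file).write_text(report_url)
```

In the main block, the check would go next to if text != '' (skip when already_notified returns True), and mark_notified would be called after the Slack message is sent. Under cron the working directory is usually the user's home directory, so absolute paths for the state file (and for weekly_report.pdf) are the safer choice.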



5. Running on a schedule



OK, the script is written, but if I had to run it by hand every time, I wouldn't be much better off than sitting and refreshing the page. So I opened the crontab:



 $ crontab -e 


I added this line to it:



 */10 * * * * python3 /home/ubuntu/embassy.py >/dev/null 2>&1 


I decided it was enough for cron to run the script every 10 minutes on an Ubuntu server; the >/dev/null 2>&1 part discards the script's output.
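To make the schedule concrete: the first of the five cron fields is the minute (0 to 59), and */10 means every 10th value starting from the lowest, so the job fires at minutes 0, 10, 20, 30, 40 and 50 of every hour. The expander below is a hypothetical helper that illustrates this for the common numeric field forms. It is not a full cron parser: it ignores names such as MON and rejects forms like 5/10, whose meaning differs between cron implementations.

```python
def expand_cron_field(field, lo, hi):
    """Expand one numeric cron field into the sorted values it matches.

    Supports '*', '*/N', 'A', 'A-B', 'A-B/N' and comma-separated lists.
    """
    values = set()
    for part in field.split(','):
        rng, _, step_text = part.partition('/')
        step = int(step_text) if step_text else 1
        if rng == '*':
            start, end = lo, hi
        elif '-' in rng:
            start, end = (int(x) for x in rng.split('-'))
        elif step_text:
            # 'A/N' is interpreted differently by different crons; refuse it
            raise ValueError(f'ambiguous cron field part: {part!r}')
        else:
            start = end = int(rng)
        if step < 1 or not (lo <= start <= end <= hi):
            raise ValueError(f'out of range: {part!r}')
        values.update(range(start, end + 1, step))
    return sorted(values)
```

For the line above, expand_cron_field('*/10', 0, 59) gives the six minutes listed, and the four remaining * fields match every hour, day, month and weekday.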



6. Conclusion



At 11:50 I got a message that a new report had been published, but my visa was not in it... After that I went to the embassy, took it by storm (for several weeks my letters and calls had gone unanswered), and eventually got my passport back with the visa.



In general, programming skills matter in the modern world even if you are not a programmer. They let you automate some of your routine tasks and make life a little more convenient. In fact, this could even be turned into a separate service where a person simply enters their ID and e-mail address and gets an e-mail when the visa is ready.




Source: https://habr.com/ru/post/450060/


