This post is relevant for the May holidays. Six weeks ago I applied for a visa to Ireland. My departure is scheduled for April 30th. The embassy has a website where it publishes lists of visa decisions, on Mondays and Thursdays. And here I am on Sunday, April 28th, with no decision on my visa yet. What I do on Monday depends on whether my application shows up in the new report. If it doesn't, I will have to go to the embassy and sort things out. If it does, I need to chase the visa application center. Sitting on the page and refreshing it all Monday seemed like a dull pastime, so I wrote a script in Python.
Disclaimer: I'm not a programmer, but I can program. This means I can't write elegant and efficient code, but I can make this contraption do what I need.
So, what needs to be done first: check whether the embassy page already links to a report with the new date (in my case, 24 April). The function code looks like this:
import re
import requests
from bs4 import BeautifulSoup

def check_report(url, div_class, date):
    # Download the embassy page and parse it
    embassy_page = requests.get(url)
    page_text = BeautifulSoup(embassy_page.text, 'lxml')
    # Collect every block that holds a link to a report
    tags = page_text.findAll('div', {"class": div_class})
    text = ''
    report_url = ''
    for tag in tags:
        # Look for the report date in the text of the block
        tag_soup = BeautifulSoup(tag.text, 'lxml')
        report = tag_soup(text=re.compile(date))
        if len(report) > 0:
            text = 'New report published'
            report_url = 'https://www.dfa.ie' + tag.find('a').attrs['href']
    return text, report_url
Now, in more detail, what happens in this code.
First, we use the requests library to download the page. Then we use another library, BeautifulSoup, which turns this wild page markup into something cleaner and more comfortable to work with.
Before using BeautifulSoup:
'<!DOCTYPE html>\r\n\r\n<html lang="en">\r\n <head>\r\n <META http-equiv="X-UA-Compatible" content="IE=edge">\r\n <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\t\r\n\t<meta name="viewport" content="initial-scale=1">\r\n \r\n \r\n \r\n \r\n \r\n <!-- Static Meta data -->\r\n \r\n\t<!-- <meta name="DC.Creator" content="Department of Foreign Affairs" />\r\n\t<meta name="DC.Publisher" content="Department of Foreign Affairs" /> \r\n\t<meta name="DC.Format" content="text/xhtml" /> \r\n\t<meta name="DC.Copyright" content="All material (c) copyright 2012 Department of Foreign Affairs" /> \r\n\t<meta name="DC.Source" content="Department of Foreign Affairs" /> \r\n\t<meta name="DC.Language" content="en" />\r\n\t<meta name="DC.Author" content="Department of Foreign Affairs" /> -->\r\n\r\n\r\n<meta name="author" content="Department of Foreign Affairs">\r\n<meta name="google-site-verification" content="HHtulupgM8GXpd9YYDjoXUb6MiU7_mGTkHixUrVPFYQ" />\r\n \r\n\t<title>Weekly Decision Report - Department of Foreign Affairs and Trade</title>\r\n <link href="https://stackpath.bootstrapcdn.com/font-awesome/4.7.0/css/font-awesome.min.css" rel="stylesheet" /> \r\n <link rel="stylesheet" type="text/css" media="screen" href="/media/dfa-2017/style-assets/css/font-defs.css" />\t<!-- 2017 font-defs.css --> \r\n\t<link rel="stylesheet" type="text/css" media="screen" href="/media/dfa-2017/style-assets/css/style.css" />\t<!-- 2017 style.css -->\r\n <link rel="stylesheet" type="text/css" media="print" href="/media/dfa-2017/style-assets/css/print.css" />\t<!-- 2017 print.css -->\r\n
After:
<!DOCTYPE html> <html lang="en"> <head> <meta content="IE=edge" http-equiv="X-UA-Compatible"/> <meta content="text/html; charset=utf-8" http-equiv="Content-Type"/> <meta content="initial-scale=1" name="viewport"/> <!-- Static Meta data --> <!-- <meta name="DC.Creator" content="Department of Foreign Affairs" /> <meta name="DC.Publisher" content="Department of Foreign Affairs" /> <meta name="DC.Format" content="text/xhtml" /> <meta name="DC.Copyright" content="All material (c) copyright 2012 Department of Foreign Affairs" /> <meta name="DC.Source" content="Department of Foreign Affairs" /> <meta name="DC.Language" content="en" /> <meta name="DC.Author" content="Department of Foreign Affairs" /> --> <meta content="Department of Foreign Affairs" name="author"/> <meta content="HHtulupgM8GXpd9YYDjoXUb6MiU7_mGTkHixUrVPFYQ" name="google-site-verification"/> <title>Weekly Decision Report - Department of Foreign Affairs and Trade</title> <link href="https://stackpath.bootstrapcdn.com/font-awesome/4.7.0/css/font-awesome.min.css" rel="stylesheet"/> <link href="/media/dfa-2017/style-assets/css/font-defs.css" media="screen" rel="stylesheet" type="text/css"/> <!-- 2017 font-defs.css --> <link href="/media/dfa-2017/style-assets/css/style.css" media="screen" rel="stylesheet" type="text/css"/> <!-- 2017 style.css -->
This is something you can actually live and work with. In my case, we look for a specific div class that is used for the links to reports. We can see it in the page source: <div class="gen-content-landing__block">. The code collects all such tags.
Next, we go over the collected tags and look for the one that contains the date of the new report: 24 April. If one is found, we pull the link out of it and set the text saying that a new report has been published.
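To try the search logic without hitting the embassy site, here is a minimal sketch that runs the same idea on a static snippet. It swaps BeautifulSoup for the standard library's html.parser so it has no dependencies; the names ReportLinkFinder and find_report, the sample markup, and the PDF paths in it are all hypothetical, made up for illustration only.

```python
import re
from html.parser import HTMLParser

class ReportLinkFinder(HTMLParser):
    """Collects the text and first link of every <div> with a given class."""
    def __init__(self, div_class):
        super().__init__()
        self.div_class = div_class
        self.depth = 0    # nesting depth inside a matching div
        self.blocks = []  # one {"text", "href"} dict per matching div

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.depth > 0:
                self.depth += 1
            elif self.div_class in (attrs.get("class") or "").split():
                self.depth = 1
                self.blocks.append({"text": "", "href": None})
        elif tag == "a" and self.depth > 0 and self.blocks[-1]["href"] is None:
            self.blocks[-1]["href"] = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.blocks[-1]["text"] += data

def find_report(html, div_class, date):
    """Return the full URL of the first block mentioning date, or ''."""
    finder = ReportLinkFinder(div_class)
    finder.feed(html)
    for block in finder.blocks:
        if block["href"] and re.search(date, block["text"]):
            return "https://www.dfa.ie" + block["href"]
    return ""

# Hypothetical markup; the real page and its link paths will differ
SAMPLE = """
<div class="gen-content-landing__block"><a href="/media/report-17-april.pdf">Decisions 17 April</a></div>
<div class="gen-content-landing__block"><a href="/media/report-24-april.pdf">Decisions 24 April</a></div>
"""

print(find_report(SAMPLE, "gen-content-landing__block", "24 April"))
```

Unlike check_report, this sketch stops at the first matching block, which is enough to see how the class filter and the date search work together.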
What needs to be done next: download the report and look for my visa ID in it. The function code looks like this:
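from pathlib import Path
import PyPDF2
import requests

def check_visa(report_url, filename, visa_id, text):
    # Download the PDF report and save it locally
    pdf = requests.get(report_url)
    file_path = Path(filename)
    file_path.write_bytes(pdf.content)
    # PdfFileReader is the reader class in PyPDF2 releases before 3.0
    with open(filename, 'rb') as pdfFileObj:
        pdfReader = PyPDF2.PdfFileReader(pdfFileObj, strict=False)
        for pageNum in range(0, pdfReader.numPages):
            # Split the text of the page into tokens
            page = str(pdfReader.getPage(pageNum).extractText().encode('utf-8')).split('\\n')
            if visa_id in page:
                # The token right after the ID is the decision status
                visa_index = page.index(visa_id)
                status = page[visa_index + 1]
                text = text + '\t' + visa_id + '\t' + status
    return text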
With the help of the requests library again, we request the report and save it locally. Using the PyPDF2 library, we read the file. Then we iterate over its pages and look for visa_id in the list of tokens. The markup of this PDF file is such that the token right after visa_id is the decision status: Approved or Refused. Then we append the ID and status to the existing text.
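The token lookup is easy to check in isolation. Below is a small sketch on a hand-written list of tokens; the helper name find_status and the neighbouring IDs are hypothetical, and the layout simply assumes, as described above, that the status is the token right after the ID.

```python
def find_status(tokens, visa_id):
    """Return the token that follows visa_id, or None if the ID is absent or last."""
    if visa_id in tokens:
        i = tokens.index(visa_id)
        if i + 1 < len(tokens):
            return tokens[i + 1]
    return None

# Hypothetical page text shaped like the report: an ID, then its status
page_text = "38644110\nRefused\n38644112\nApproved\n38644115\nApproved"
tokens = page_text.split("\n")

status = find_status(tokens, "38644112")
print(status)
```

The extra bounds check guards against an ID that happens to be the last token on a page, a case where page[visa_index + 1] would raise an IndexError.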
Good. Say the script has found my ID; now it needs to notify me somewhere. We use Slack as the messenger at my company, so I figured it would be convenient to get the notification there.
Slack lets you set up an incoming webhook for this. There you choose the channel or recipient the message will be posted to (for this step you may need to be a workspace administrator). You will also get a unique webhook URL that can be used in the code.
import json
import requests

def send_to_slack(webhook_url, text):
    # Slack expects a JSON body with a "text" field
    post = {"text": "{0}".format(text)}
    json_data = json.dumps(post)
    req = requests.post(webhook_url,
                        data=json_data.encode('ascii'),
                        headers={'Content-Type': 'application/json'})
    return req.status_code
Using the same requests library, we make a POST request to the webhook address with text as the content.
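To see what actually goes over the wire, here is a sketch that only builds the request body and does not send anything. The helper name build_slack_payload and the sample message are hypothetical; the body has the same {"text": ...} shape as in send_to_slack.

```python
import json

def build_slack_payload(text):
    """Serialize a message into the JSON bytes posted to the webhook."""
    body = json.dumps({"text": text})
    # json.dumps escapes non-ASCII characters by default,
    # so encoding the result as ASCII is safe
    return body.encode("ascii")

payload = build_slack_payload("New report published\t38644112\tApproved")
headers = {"Content-Type": "application/json"}
print(payload)
```

Building the body separately makes it possible to inspect the message before pointing the script at a real webhook URL.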
The rest of the code looks like this:
url = 'https://www.dfa.ie/irish-embassy/russia/visas/weekly-decision-report/'
div_class = 'gen-content-landing__block'
date = '24 April'
filename = 'weekly_report.pdf'
visa_id = '38644112'
webhook_url = 'https://hooks.slack.com/services/...'

text, report_url = check_report(url, div_class, date)
if text != '':
    text = check_visa(report_url, filename, visa_id, text)
    print(send_to_slack(webhook_url, text))
We assign values to all the necessary variables and then run the functions defined above.
OK, I wrote the script, but if I had to launch it by hand every time, I would not be far from where I started, sitting and refreshing the page. So I opened the crontab:
$ crontab -e
and added a line to it:
*/10 * * * * python3 /home/ubuntu/embassy.py >/dev/null 2>&1
I decided it was enough for cron to run this script every 10 minutes on an Ubuntu server.
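For readers unfamiliar with cron syntax, the first field of that line is the minute, and */10 means every tenth minute of the hour, starting from zero. A tiny sketch of how that step expands (the function name minutes_for_step is hypothetical):

```python
def minutes_for_step(step):
    """Minutes of the hour matched by a cron minute field of the form */step."""
    return [m for m in range(60) if m % step == 0]

fire_minutes = minutes_for_step(10)
print(fire_minutes)
```

So the script runs at :00, :10, :20, :30, :40 and :50 of every hour, and a new report is noticed at most ten minutes after it is published.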
At 11:50 I received a message that a new report had appeared, but my visa was not in it... After that, I went to the embassy. I took it by storm (my letters and calls had gone unanswered for several weeks) and eventually got my passport with the visa.
In general, programming skills are important in the modern world, even if you are not a programmer. They let you automate some of your routine tasks, which makes your world a little more convenient. In fact, this could even be turned into a separate service, where a person simply enters their ID and e-mail and receives a message by mail when the visa is ready.
Source: https://habr.com/ru/post/450060/