📜 ⬆️ ⬇️

A simple family budget tracker with AWS SES, Lambda and DynamoDB (and Route53)

How to control the family budget?


image


I have always had difficulty accurately following the budget, especially now that all purchases are made on a credit card. The reason is simple - there is no bundle of money in front of your eyes, which is gradually shrinking, and at some point you realize that there is nothing more to spend. If the majority of purchases are paid by credit card, then the only way to find out how much money is left or how much is spent is to go to the Internet bank or mobile bank, or use financial aggregators, for example, Mint, which also need to go and check the balance. This is possible, but it requires discipline, and when not only you pay with the same card, it is difficult to establish it.


I thought that I would be satisfied with the option if I would receive a notification every day about how much money I still had this month. That is, I would set a budget for the month, and something would count my expenses and send a report on the state of the budget every day.


The most obvious option is to use the bank's API or go to its Internet bank programmatically using some kind of headless browser. Unfortunately, access to the API of my bank is paid, and going to the Internet bank is problematic due to two-factor authentication. However, there is another option. Almost all banks today send alerts to each transaction, informing when, how much and where the transaction took place. It is the information that is needed to maintain the budget. It remains to figure out how to handle it.


My bank can send alerts to your mobile phone and email. The option with a mobile phone was not considered due to the complexity of processing SMS messages. The e-mail version looks very tempting, software processing of e-mails could have been done decades ago. But now in my home only the laptop is not always turned on, which means we will automate the budget somewhere in the cloud, for example, AWS.


What do we need in AWS?


There are many services in AWS, but we need only three: to receive and send emails — SES, to process them — Lambda, and to store the DynamoDB result. Plus, a couple of auxiliary for the bundle - SNS, Kinesis, CloudWatch. This is not the only way to handle messages: instead of Lambda, you can use EC2, instead of DynamoDB, you can store data in RDS (MySQL, PostgreSQL, Oracle, ...), and you can even write a simple script on your small server on Perl and BerkleyDB.


What does the whole treatment look like in general? A transaction letter arrives, we record the date, amount and place of payment in the database, and once a day we send a letter with the balance for the given month. The whole architecture is a bit more complicated and looks like this:



  1. A letter arrives at SES.
  2. SES sends an email to the SNS topic.
  3. The ProcessCharge Lambda function is triggered by the arrival of a letter via SNS, parses the letter and writes transaction data to the DynamoDB table Transactions.
  4. The Lambda function UpdateSummary works as a trigger after writing to the Transactions table and updates the current state of the budget in the Summary table.

Consider these steps in more detail.


Receiving a letter


Simple Email Service, also known as SES, is a service for receiving and sending emails. When you receive a letter, you can specify which action should be performed: save the letter in S3, call the Lambda function, send the letter to the SNS and others. To receive letters, you need to bind your domain, namely, specify the SES server in the MX domain record. I didn’t have my own domain at that time, and I decided that it was a good reason to register it using another AWS service Route 53. I also acted in the same place in Route 53.


When a domain is bound to SES, verification is required. To do this, SES asks to add some records to the DNS zone (MX and TXT), and then checks their presence. If the domain is hosted in Route 53, then all this is done automatically. When the domain is verified, you can proceed to setting up rules for receiving mail. My only rule is very simple: all emails coming to the ccalert @ address of our domain should be sent to the SNS topic ccalerts:


aws> ses describe-receipt-rule --rule-set-name "ccalerts" --rule-name "ccalert" { "Rule": { "Name": "ccalert", "Recipients": [ "ccalert@=censored=” ], "Enabled": true, "ScanEnabled": true, "Actions": [ { "SNSAction": { "TopicArn": "arn:aws:sns:us-west-2:=censored=:ccalerts", "Encoding": "UTF-8" } } ], "TlsPolicy": "Optional" } } 

Letter processing


When a new letter is published to the SNS topic, the Lambda function ProcessCharge is called. She needs to do two actions - parse the letter and save the data to the database.


 from __future__ import print_function import json import re import uuid from datetime import datetime import boto3 def lambda_handler(event, context): message = json.loads(event['Records'][0]['Sns']['Message']) print("Processing email {}".format(message['mail'])) content = message['content'] trn = parse_content(content) if trn is not None: print("Transaction: %s" % trn) process_transaction(trn) 

The parse_content () method is responsible for parsing:


 def parse_content(content): content = content.replace("=\r\n", "") match = re.search(r'A charge of \(\$USD\) (\d+\.\d+) at (.+?) has been authorized on (\d+/\d+/\d+ \d+:\d+:\d+ \S{2} \S+?)\.', content, re.M) if match: print("Matched %s" % match.group(0)) date = match.group(3) # replace time zone with hour offset because Python can't parse it date = date.replace("EDT", "-0400") date = date.replace("EST", "-0500") dt = datetime.strptime(date, "%m/%d/%Y %I:%M:%S %p %z") return {'billed': match.group(1), 'merchant': match.group(2), 'datetime': dt.isoformat()} else: print("Didn't match") return None 

In it, we remove unnecessary characters and use a regular expression to check whether the letter contains information about the transaction, and if so, break it into parts. The text sought is as follows:


A charge of ($ USD) 100.00 at Amazon.com has been authorized on 07/19/2017 1:55:52 PM EDT.

Unfortunately, the standard Python library knows few time zones, and EDT (Eastern Daylight Time) is not among them. Therefore, we replace the EDT with the numerical designation -0400, and do the same for the main time zone, EST. After that, we can parse the date and time of the transaction, and convert it to the standard ISO 8601 format supported by DynamoDB.


The method returns a hash table with the transaction amount, store name and date with time. This data is passed to the process_transaction method:


 def process_transaction(trn): ddb = boto3.client('dynamodb') trn_id = uuid.uuid4().hex ddb.put_item( TableName='Transactions', Item={ 'id': {'S': trn_id}, 'datetime': {'S': trn['datetime']}, 'merchant': {'S': trn['merchant']}, 'billed': {'N': trn['billed']} }) 

In it, we store data in the Transactions table, generating a unique transaction identifier.



Budget update


I would like to dwell here in more detail, namely on the moment how the budget is monitored. We define for ourselves several values:



At any time we want to know all these values. This can be done in two ways:


  1. Every time you need to know the state of the budget, transactions are summarized to get total, then available = (budget - total).
  2. Each time a new transaction is recorded, the total is updated. When you need to know the state of the budget, is available = (budget - total).

Both approaches have pros and cons, and the choice is highly dependent on the requirements and limitations of the system. The first approach is good because it does not denormalize the data, keeping the amount of transactions separately. On the other hand, with him the amount must be calculated with each request. For my volumes this will not be a problem, but in my case I have a limitation caused by DynamoDB. To calculate the sum of N transactions, you need to read N records, which means to spend N read capacity units. Obviously, this is not a very scalable solution, which will cause difficulties (or high cost) even with several dozen transactions.


When using the second approach, the total is updated after each transaction and is always relevant, which avoids the summation of all transactions. I found this approach more rational in my case. Implement it, again, in different ways:


  1. Update total after each transaction is recorded in the same ProcessCharge Lambda function.
  2. Update total in a trigger after adding a new item to the Transactions table.

Updating in a trigger is more practical, including in terms of multithreading, so I created the Lambda function UpdateSummary:


 from __future__ import print_function from datetime import datetime import boto3 def lambda_handler(event, context): for record in event['Records']: if record['eventName'] != 'INSERT': print("Unsupported event {}".format(record)) return trn = record['dynamodb']['NewImage'] print(trn) process_transaction(trn) 

We are only interested in the event of adding elements to the table, all others are ignored.


 def process_transaction(trn): period = get_period(trn) if period is None: return billed = trn['billed']['N'] # update total for current period update_total(period, billed) print("Transaction processed") 

In process_transaction (), we calculate the period, in the form year-month, to which the transaction relates, and call the update method total.


 def get_period(trn): try: # python cannot parse -04:00, it needs -0400 dt = trn['datetime']['S'].replace("-04:00", "-0400") dt = dt.replace("-05:00", "-0500") dt = dt.replace("-07:00", "-0700") dt = datetime.strptime(dt, "%Y-%m-%dT%H:%M:%S%z") return dt.strftime("%Y-%m") except ValueError as err: print("Cannot parse date {}: {}".format(trn['datetime']['S'], err)) return None 

This code is very far from perfect, and the interesting feature of Python played the role that it cannot parse the date / time with the time zone in the -HH: MM format, which conforms to the ISO 8601 standard, and which Python itself generated (code above , in the parse_content ()) method. Therefore, I simply replace the time zones I need with the -HHMM format that it understands. It was possible to use a third-party library and make it more beautiful, I will leave it for the future. Perhaps my bad knowledge of Python still affects me - this project is my first development experience on it.


Total update:


 def update_total(period, billed): ddb = boto3.client('dynamodb') response = load_summary(ddb, period) print("Summary: {}".format(response)) if 'Item' not in response: create_summary(ddb, period, billed) else: total = response['Item']['total']['N'] update_summary(ddb, period, total, billed) 

In this method, we load the summary for the current period using the load_summary () method, the total in which we need to update. If the bulletin does not already exist, we create it in the create_summary () method, if it exists, update to update_summary ().


 def load_summary(ddb, period): print("Loading summary for period {}".format(period)) return ddb.get_item( TableName = 'Summary', Key = { 'period': {'S': period} }, ConsistentRead = True ) 

Since the update of the report can be made from several streams, we use consistent read, which is more expensive, but guarantees that we will get the last recorded value.


 def create_summary(ddb, period, total): print("Creating summary for period {} with total {}".format(period, total)) ddb.put_item( TableName = 'Summary', Item = { 'period': {'S': period}, 'total': {'N': total}, 'budget': {'N': "0"} }, ConditionExpression = 'attribute_not_exists(period)' ) 

When creating a new bulletin, for the same reason that it is possible to record from multiple streams, conditional notation is used, ConditionExpression = 'attribute_not_exists (period)' , which will save the new summary only if it does not exist. Thus, if someone managed to create a summary in the interval, when we tried to load it into load_summary () and it was not there, and when we tried to create it in create_summary (), our call to put_item () will end with the exception and the entire Lambda function will be restarted.


 def update_summary(ddb, period, total, billed): print("Updating summary for period {} with total {} for billed {}".format(period, total, billed)) ddb.update_item( TableName = 'Summary', Key = { 'period': {'S': period} }, UpdateExpression = 'SET #total = #total + :billed', ConditionExpression = '#total = :total', ExpressionAttributeValues = { ':billed': {'N': billed}, ':total': {'N': total} }, # total is a reserved word so we create an alias #total to use it in expression ExpressionAttributeNames = { '#total': 'total' } ) 

Updating the total value in a summary is done inside DynamoDB:


UpdateExpression = 'SET #total = #total +: billed'

Most likely, this is enough for a safe update, but I decided to act conservatively and added the condition that the record should occur only if the report was not updated in another stream, and it still contains the value that we have:


ConditionExpression = '#total =: total',

Since total is a keyword for DynamoDB, to use it in DynamoDB expressions, you need to create a synonym:


ExpressionAttributeNames = {
'#total': 'total'
}

This completes the transaction processing and budget update process:


periodbudgettotal
2017-071000500

Sending budget notifications


The last part of the system - notification of the state of the budget. As I wrote at the very beginning, it is enough for me to receive a notification once a day, which I implemented. However, nothing prevents to notify after each transaction, or after some threshold values ​​of expenses / balance. The architecture of sending an e-mail notification is quite simple and looks like this:



  1. The CloudWatch Timer timer is triggered once a day and calls the DailyNotification Lambda function.
  2. DailyNotification loads data from the DynamoDB Summary table and calls SES to send an email.

 from __future__ import print_function from datetime import date import boto3 def lambda_handler(event, context): ddb = boto3.client('dynamodb') current_date = date.today() print("Preparing daily notification for {}".format(current_date.isoformat())) period = current_date.strftime("%Y-%m") response = load_summary(ddb, period) print("Summary: {}".format(response)) if 'Item' not in response: print("No summary available for period {}".format(period)) return summary = response['Item'] total = summary['total']['N'] budget = summary['budget']['N'] send_email(total, budget) def load_summary(ddb, period): print("Loading summary for period {}".format(period)) return ddb.get_item( TableName = 'Summary', Key = { 'period': {'S': period} }, ConsistentRead = True ) 

First we try to load a summary for the current period, and if it’s not, then we’ll finish. If there is - we prepare and send the letter:


 def send_email(total, budget): sender = "Our Budget <ccalert@==censored==>" recipients = [“==censored==“] charset = "UTF-8" available = float(budget) - float(total) today = date.today().strftime("%Y-%m-%d") message = ''' As of {0}, available funds are ${1:.2f}. This month budget is ${2:.2f}, spendings so far totals ${3:.2f}. More details coming soon!''' subject = "How are we doing?" textbody = message.format(today, float(available), float(budget), float(total)) print("Sending email: {}".format(textbody)) client = boto3.client('ses', region_name = 'us-west-2') try: response = client.send_email( Destination = { 'ToAddresses': recipients }, Message = { 'Body': { 'Text': { 'Charset': charset, 'Data': textbody, }, }, 'Subject': { 'Charset': charset, 'Data': subject, }, }, Source = sender, ) # Display an error if something goes wrong. except Exception as e: print("Couldn't send email: {}".format(e)) else: print("Email sent!") 

Total


That's all. Now, after each transaction, the incoming letter is processed and the budget is updated, and a letter is sent once a day with a budget status notification. I still have plans to add functionality, for example, the classification of expenses by category, and the inclusion of a list of recent transactions in the notification, if you get something worthwhile - I will share in another article. If there are any questions, comments or edits - wait in the comments.


')

Source: https://habr.com/ru/post/334146/


All Articles