We are constantly working with Google BigQuery: importing data about users, their orders, and advertising costs from various sources so that we can combine them with each other. What does this give us? For example, if you have an online store and a customer places an order by phone and then logs into the site, Google BigQuery lets you link all of their actions retroactively. You can track the client’s entire journey through the marketing funnel, from the first visit to the site to a purchase in a brick-and-mortar store, and evaluate advertising campaigns based on such offline sales.
In this collection, we have six Python scripts to automate data import into Google BigQuery using Google Cloud Functions: FTP, FTPS, HTTP(S), Intercom, MySQL, and SFTP. The principle of operation is the same: the Cloud Function is called with an HTTP POST request, receives data from the source, and loads it into a Google BigQuery table. If the table already exists in the selected dataset, it will be overwritten.
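To make the overwrite behavior concrete, here is a minimal sketch (not the exact code of these modules) of what the load step inside such a Cloud Function can look like with the google-cloud-bigquery client; the helper name and its arguments are illustrative assumptions:

import io
from google.cloud import bigquery

def load_file_to_bq(file_bytes, project_id, dataset_id, table_id,
                    source_format="NEWLINE_DELIMITED_JSON", location="US"):
    # Illustrative helper: load a downloaded file into BigQuery,
    # replacing the destination table if it already exists.
    client = bigquery.Client(project=project_id)
    job_config = bigquery.LoadJobConfig(
        source_format=source_format,         # NEWLINE_DELIMITED_JSON or CSV
        autodetect=True,                     # infer the schema from the file
        write_disposition="WRITE_TRUNCATE",  # overwrite the existing table
    )
    job = client.load_table_from_file(
        io.BytesIO(file_bytes),
        f"{project_id}.{dataset_id}.{table_id}",
        location=location,
        job_config=job_config,
    )
    job.result()  # wait for the load job to complete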
Primary requirements
- A project in the Google Cloud Platform with billing activated.
- Editing access (the BigQuery Data Editor role) and job execution access (the BigQuery Job User role) for the Cloud Function's service account in the BigQuery project where the table will be loaded;
- An HTTP client to perform the POST requests that call the Cloud Function (a minimal Python sketch follows this list).
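Any HTTP client will do. As a minimal sketch using only the Python standard library (the trigger URL and payload values below are placeholders; the exact payload structure for each module is shown in the usage examples further down):

import json
from urllib import request

# Placeholder trigger URL: substitute your region, project ID, and function name.
trigger_url = "https://REGION-PROJECT_ID.cloudfunctions.net/ftp/"

payload = {
    "ftp": {"user": "user_name", "psswd": "password",
            "path_to_file": "ftp://server_host/path/to/file/"},
    "bq": {"project_id": "my_bq_project", "dataset_id": "my_bq_dataset",
           "table_id": "my_bq_table", "source_format": "CSV",
           "delimiter": ",", "location": "US"},
}

req = request.Request(
    trigger_url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with request.urlopen(req) as response:
    print(response.status, response.read().decode())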
Setup steps
- Go to the Google Cloud Platform Console and log in with your Google account, or sign up if you don’t have an account yet.
- Go to the project with activated billing or create a new billing account for the project.
- Go to the Cloud Functions section and click "Create Function". Please note that there is a charge for using the Cloud functions.
- Fill in the following fields:
Name: for example, ftp-bq-integration or any other suitable name;
Allocated memory: 2 GB or less, depending on the size of the file being processed;
Trigger: HTTP;
Source code: Built-in editor;
Runtime: Python 3.X.
- Copy the contents of the main.py file into the main.py tab of the built-in editor.
- Copy the contents of the requirements.txt file into the requirements.txt tab of the built-in editor.
- Specify ftp / ftps / https and so on as the function to execute, depending on the module you are using.
- In the advanced settings, increase the timeout from 60 seconds to 540 seconds, or less, depending on the size of the file being processed.
- Complete the creation of the Cloud function by clicking on the "Create" button.
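The name you specify as the function to execute corresponds to the entry point defined in main.py. As an illustration only (the actual code of each module lives in its main.py), an HTTP-triggered entry point for the FTP module has roughly this shape; the body below is a placeholder, not the real implementation:

from flask import Request  # Cloud Functions passes a Flask request object to the entry point

def ftp(request: Request) -> str:
    # The entry point name ("ftp") is what you enter as the function to execute.
    payload = request.get_json()   # JSON body of the POST request
    ftp_config = payload["ftp"]    # connection settings for the source server
    bq_config = payload["bq"]      # destination dataset and table settings
    # ... download the file from the source and load it into BigQuery here ...
    return "OK"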
This module is designed to transfer files from FTP (FTPS, SFTP) servers to Google BigQuery using the Google Cloud function. The solution allows you to automatically download data to Google BigQuery from a file that is regularly updated on an FTP server.
The file that needs to be obtained from the corresponding server can have any suitable extension (.json, .txt, .csv), but it must be in one of the following formats: JSON (newline-delimited) or comma-separated values (CSV).
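For clarity, "newline-delimited JSON" means one JSON object per line rather than a single JSON array. A purely illustrative pair of equivalent files (the field names and values are made up):

JSON (newline-delimited):
{"order_id": 1, "total": 100.0}
{"order_id": 2, "total": 250.5}

Comma-separated values (CSV):
order_id,total
1,100.0
2,250.5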
Usage example
import json
from httplib2 import Http

trigger_url = "https://REGION-PROJECT_ID.cloudfunctions.net/ftp/"
headers = {"Content-Type": "application/json"}
payload = {
    "ftp": {
        "user": "ftp.user_name",
        "psswd": "ftp.password",
        "path_to_file": "ftp://server_host/path/to/file/"
    },
    "bq": {
        "project_id": "my_bq_project",
        "dataset_id": "my_bq_dataset",
        "table_id": "my_bq_table",
        "delimiter": ",",
        "source_format": "NEWLINE_DELIMITED_JSON",
        "location": "US"
    }
}
# Send the payload as JSON to match the Content-Type header.
Http().request(trigger_url, "POST", json.dumps(payload), headers=headers)
Module for transferring files from HTTPS servers to Google BigQuery.
Usage example
import json
from httplib2 import Http

trigger_url = "https://REGION-PROJECT_ID.cloudfunctions.net/https/"
headers = {"Content-Type": "application/json"}
payload = {
    "https": {
        "path_to_file": "https://server_host/path/to/file/",
        "user": "https.user_name",
        "psswd": "https.password"
    },
    "bq": {
        "project_id": "my_bq_project",
        "dataset_id": "my_bq_dataset",
        "table_id": "my_bq_table",
        "delimiter": ",",
        "source_format": "CSV",
        "location": "US"
    }
}
# Send the payload as JSON to match the Content-Type header.
Http().request(trigger_url, "POST", json.dumps(payload), headers=headers)
Module to automate data transfer from Intercom to Google BigQuery using the Google Cloud Function. Currently, the module can import the following entities from Intercom: users, companies, contacts, admins, conversations, teams, tags, and segments. Note that the module does not support custom attributes.
Usage example
import json
from httplib2 import Http

trigger_url = "https://REGION-PROJECT_ID.cloudfunctions.net/intercom/"
headers = {"Content-Type": "application/json"}
payload = {
    "intercom": {
        "accessToken": "INTERCOM ACCESS TOKEN",
        "entities": [
            "users", "companies", "contacts", "admins",
            "conversations", "teams", "tags", "segments"
        ]
    },
    "bq": {
        "project_id": "YOUR GCP PROJECT",
        "dataset_id": "YOUR DATASET NAME",
        "location": "US"
    }
}
# Send the payload as JSON to match the Content-Type header.
Http().request(trigger_url, "POST", json.dumps(payload), headers=headers)
Used to transfer data from MySQL servers to Google BigQuery using the Google Cloud Function. This solution lets you automatically load data into Google BigQuery from tables that are regularly updated on a MySQL server.
Usage example
import json
from httplib2 import Http

trigger_url = "https://REGION-PROJECT_ID.cloudfunctions.net/mysql/"
headers = {"Content-Type": "application/json"}
payload = {
    "mysql": {
        "user": "mysql.user",
        "psswd": "mysql.password",
        "host": "host_name",
        "port": 3306,
        "database": "database_name",
        "table_id": "table_name",
        "query": "SELECT * FROM table_name"
    },
    "bq": {
        "project_id": "my_bq_project",
        "dataset_id": "my_bq_dataset",
        "table_id": "my_bq_table"
    }
}
# Send the payload as JSON to match the Content-Type header.
Http().request(trigger_url, "POST", json.dumps(payload), headers=headers)
More detailed documentation for each module can be found in the readme files in each section.
This is just the beginning: we are now working on scripts for Bitrix and amoCRM, because we see they are the most popular among our clients. Share the methods you use to merge data and tell us which integrations you are missing.