
Using AWS Lambda to create an archive of specific files with AWS S3




AWS Lambda is a compute service that runs your code in response to events and automatically manages the underlying compute resources. An overview of the concept, how it works, pricing and so on is already available on Habr ( habrahabr.ru/company/epam_systems/blog/245949 ); here I will show a practical example of using the service.



So, as the name implies, we will use AWS Lambda to create an archive of files stored on AWS S3. Let's go!





Creating a new function in the AWS console



At the time of writing, AWS Lambda is in the "preview" stage, so if you are using it for the first time you will need to submit a request and wait a few days for access.



To create a new function in the AWS console, click the Create a Lambda function button, which opens a form for configuring the new function's parameters.

First, we are prompted for the name and description of the new function.



Then we provide its code.

You can either type the code directly into the editor or upload a specially prepared zip archive. The first option only suits code without additional dependencies, and the second did not work through the web console at the time of writing. So at this stage we create the function with the code from the editor, leaving the proposed example unchanged; later we will upload the code we actually need, with all its dependencies, programmatically.



The role name determines what access rights the function will have to various AWS resources. I will not dwell on this here; suffice it to say that the rights offered by default when creating a new role grant access to AWS S3 and are therefore sufficient for this example.
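For reference, the execution role needs roughly the following S3 permissions. This is only a sketch: the bucket names come from the example event below, and the policy proposed by the console when creating a new role already covers this.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::from-bucket/*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject"],
      "Resource": "arn:aws:s3:::to-bucket/*"
    }
  ]
}
```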



You must also specify the amount of memory to allocate and the execution timeout.

The amount of memory allocated affects the price of each invocation (more memory, higher price), but the CPU share allotted to the function is tied to it as well. Since building an archive is CPU-bound, we choose the maximum available memory: the higher price is fully offset by the shorter processing time.
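To see why, here is a rough cost model, assuming Lambda's published price of $0.00001667 per GB-second (check current pricing): if doubling the memory halves a CPU-bound run time, the bill stays the same.

```javascript
// Lambda bills in GB-seconds: memory (in GB) times duration.
// Price per GB-second is an assumption taken from AWS's published pricing.
var PRICE_PER_GB_SECOND = 0.00001667;

function costUSD(memoryMB, durationSeconds) {
  return (memoryMB / 1024) * durationSeconds * PRICE_PER_GB_SECOND;
}

// 512 MB for 20 s and 1024 MB for 10 s are both 10 GB-seconds,
// so they cost exactly the same:
console.log(costUSD(512, 20) === costUSD(1024, 10)); // true
```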



After completing the form, click Create Lambda function and leave the AWS console to move on to writing the function itself.



The function code, its packaging, and uploading it to AWS



To solve our problem, we will use several third-party libraries, as well as the grunt-aws-lambda library for conveniently developing, packaging, and uploading the finished function.



We create package.json as follows:

 {
   "name": "zip-s3",
   "description": "AWS Lambda Function",
   "version": "0.0.1",
   "private": "true",
   "devDependencies": {
     "aws-sdk": "^2.1.4",
     "grunt": "^0.4.5",
     "grunt-aws-lambda": "^0.3.0"
   },
   "dependencies": {
     "promise": "^6.0.1",
     "s3-upload-stream": "^1.0.7",
     "archiver": "^0.13.1"
   },
   "bundledDependencies": [
     "promise",
     "s3-upload-stream",
     "archiver"
   ]
 }


and install dependencies:

 npm install 


The bundledDependencies array in package.json lists the dependencies that will be packaged together with our function on upload.



After that, create the index.js file in which the function code will be located.

First, let's look at the code of a function that does nothing:

 exports.handler = function (data, context) {
     context.done(null, '');
 };


A call to context.done signals that the function has finished; at that point AWS Lambda stops execution, records the time used, and so on.
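The first argument to context.done signals an error, while null means success. Calling the handler locally with a stubbed context makes the contract visible. This is a sketch with a hypothetical validation check; the stub only records what the handler reported.

```javascript
// Handler following the AWS Lambda (Node.js) convention: the first
// argument to context.done is an error, null on success.
var handler = function (data, context) {
  if (!data || !data.bucket) {
    context.done(new Error('bucket is required'), null); // signal failure
    return;
  }
  context.done(null, 'ok'); // signal success
};

// Stub context that just records what the handler reported:
var result;
handler({ bucket: 'from-bucket' }, {
  done: function (err, msg) { result = err ? 'failed' : msg; }
});
console.log(result); // prints 'ok'
```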



The data object contains the parameters passed to the function. In our case it will have the following structure:

 {
   bucket: 'from-bucket',
   keys: ['/123/test.txt', '/456/test2.txt'],
   outputBucket: 'to-bucket',
   outputKey: 'result.zip'
 }


Let's start writing the actual function code.

We connect the necessary libraries:

 var AWS = require('aws-sdk');
 var Promise = require('promise');
 var s3Stream = require('s3-upload-stream')(new AWS.S3());
 var archiver = require('archiver');
 var s3 = new AWS.S3();


Next, create the objects that will archive the files and stream the resulting archive up to AWS S3:

 var archive = archiver('zip');
 var upload = s3Stream.upload({
     "Bucket": data.outputBucket,
     "Key": data.outputKey
 });
 archive.pipe(upload);


Create a promise that calls context.done once the result has finished uploading:

 var allDonePromise = new Promise(function(resolveAllDone) {
     upload.on('uploaded', function (details) {
         resolveAllDone();
     });
 });

 allDonePromise.then(function() {
     context.done(null, '');
 });


Fetch the files at the specified keys and add them to the archive. Once all files have been downloaded, finalize the archive:

 var getObjectPromises = [];

 for (var i in data.keys) {
     (function(itemKey) {
         itemKey = decodeURIComponent(itemKey).replace(/\+/g, ' ');
         var getPromise = new Promise(function(resolveGet) {
             s3.getObject({ Bucket: data.bucket, Key: itemKey }, function(err, fileData) {
                 if (err) {
                     console.log(itemKey, err, err.stack);
                     resolveGet();
                 } else {
                     var itemName = itemKey.substr(itemKey.lastIndexOf('/'));
                     archive.append(fileData.Body, { name: itemName });
                     resolveGet();
                 }
             });
         });
         getObjectPromises.push(getPromise);
     })(data.keys[i]);
 }

 Promise.all(getObjectPromises).then(function() {
     archive.finalize();
 });
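Note that the keys arrive URL-encoded (they are encoded on the caller's side below), and the archive entry name is taken from the last path segment. The decoding and slicing can be checked in isolation; note that the entry name keeps its leading slash, exactly as in the handler.

```javascript
// Keys arrive URL-encoded; '+' stands for a space, which
// decodeURIComponent does not translate, hence the extra replace.
function decodeKey(key) {
  return decodeURIComponent(key).replace(/\+/g, ' ');
}

// The archive entry name is everything from the last '/' onward
// (including the slash itself, as in the handler above).
function entryName(key) {
  return key.substr(key.lastIndexOf('/'));
}

console.log(decodeKey('%2F123%2Fmy+file.txt')); // '/123/my file.txt'
console.log(entryName('/123/my file.txt'));     // '/my file.txt'
```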


The complete function code
 var AWS = require('aws-sdk');
 var Promise = require('promise');
 var s3Stream = require('s3-upload-stream')(new AWS.S3());
 var archiver = require('archiver');
 var s3 = new AWS.S3();

 exports.handler = function (data, context) {
     var archive = archiver('zip');
     var upload = s3Stream.upload({
         "Bucket": data.outputBucket,
         "Key": data.outputKey
     });
     archive.pipe(upload);

     var allDonePromise = new Promise(function(resolveAllDone) {
         upload.on('uploaded', function (details) {
             resolveAllDone();
         });
     });

     allDonePromise.then(function() {
         context.done(null, '');
     });

     var getObjectPromises = [];

     for (var i in data.keys) {
         (function(itemKey) {
             itemKey = decodeURIComponent(itemKey).replace(/\+/g, ' ');
             var getPromise = new Promise(function(resolveGet) {
                 s3.getObject({ Bucket: data.bucket, Key: itemKey }, function(err, fileData) {
                     if (err) {
                         console.log(itemKey, err, err.stack);
                         resolveGet();
                     } else {
                         var itemName = itemKey.substr(itemKey.lastIndexOf('/'));
                         archive.append(fileData.Body, { name: itemName });
                         resolveGet();
                     }
                 });
             });
             getObjectPromises.push(getPromise);
         })(data.keys[i]);
     }

     Promise.all(getObjectPromises).then(function() {
         archive.finalize();
     });
 };




To package the function and upload it to AWS, create Gruntfile.js with the following content:

Gruntfile.js
 module.exports = function(grunt) {
     grunt.initConfig({
         lambda_invoke: {
             default: {}
         },
         lambda_package: {
             default: {}
         },
         lambda_deploy: {
             default: {
                 function: 'zip-s3'
             }
         }
     });

     grunt.loadNpmTasks('grunt-aws-lambda');
 };
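Before deploying, the function can also be run locally: the lambda_invoke task configured above reads a test event from an event.json file in the project root (assuming the task's defaults; the file name is configurable). An event matching our structure might look like:

```json
{
  "bucket": "from-bucket",
  "keys": ["/123/test.txt", "/456/test2.txt"],
  "outputBucket": "to-bucket",
  "outputKey": "result.zip"
}
```

Then run `grunt lambda_invoke` to execute the handler on your machine with your local AWS credentials.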


And the ~/.aws/credentials file with AWS access keys:

 [default]
 aws_access_key_id = ...
 aws_secret_access_key = ...


We package our function and upload it to AWS Lambda:

 grunt lambda_package lambda_deploy 


Calling the created function from our application



We will call the function from a Java application.

To do this, prepare the data:

 JSONObject requestData = new JSONObject();
 requestData.put("bucket", "from-bucket");
 requestData.put("outputBucket", "to-bucket");
 requestData.put("outputKey", "result.zip");

 JSONArray keys = new JSONArray();
 keys.put(URLEncoder.encode("/123/1.txt", "UTF-8"));
 keys.put(URLEncoder.encode("/456/2.txt", "UTF-8"));
 requestData.put("keys", keys);


And directly call the function:

 AWSCredentials myCredentials = new BasicAWSCredentials(accessKeyID, secretKey);
 AWSLambdaClient awsLambda = new AWSLambdaClient(myCredentials);

 InvokeAsyncRequest req = new InvokeAsyncRequest();
 req.setFunctionName("zip-s3");
 req.setInvokeArgs(requestData.toString());

 InvokeAsyncResult res = awsLambda.invokeAsync(req);


The function executes asynchronously: we immediately receive confirmation that AWS Lambda has accepted the invocation request, while the execution itself takes some time.

Source: https://habr.com/ru/post/247661/


