Gluing Services Together With Lambda

Recently I've been running some experiments with AWS Lambda, a service that runs your functions in one of several supported runtimes, charging only for the time your function spends running.

Another excellent feature of this service is its event-driven integrations: a function can be triggered by many types of event, including several built into other AWS services, and can even be connected directly to the web with API Gateway.

The AWS service events you can use to trigger a Lambda function include:

  • Creating / Deleting objects in S3
  • Creating / Updating records in DynamoDB
  • Pulling events from a Kinesis Stream
  • IoT Events
  • Cognito Sync Events
  • CloudWatch Log Events
  • Scheduled Events
  • SNS Notifications

Using these event types, Lambda can implement all kinds of glue logic to connect services and events together. This is especially useful when working within a service-oriented architecture.
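As a sketch of what such glue can look like (assuming a Python function; the topic ARN and names are invented for illustration), here's a function that announces new S3 objects on an SNS topic:

import boto3

sns = boto3.client("sns")

def lambda_handler(event, context):
    # An S3 'ObjectCreated' event can carry several records
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        sns.publish(
            # Topic ARN is illustrative only
            TopicArn="arn:aws:sns:us-east-1:123456789012:new-objects",
            Message="New object: s3://{}/{}".format(bucket, key),
        )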

While the documentation provides great coverage of how Lambda works, how it is billed, and how it integrates with other AWS services, it offers only a few basic examples, so some experimentation will definitely be required.

Top Tips for Lambda Usage

I've been working with AWS Lambda for about a week now, and so far I've noticed the following patterns emerge as my top tips for creating your functions.

1. Always build your own IAM policies for Lambda Functions

Lambda provides several 'one-click' policies that grant a function access to other AWS services. As is often the case with such 'one-click' setups, these policies are far too permissive for general use. Instead, create a role policy for your Lambda function that provides only the access that particular function needs to perform its actions.

For example, if you have a function which reads image files from an S3 bucket, extracts the EXIF data and stores it in a DynamoDB table, then the following rules should be applied to the AWS policy:

  • Allow GetObject access to the S3 bucket the function will be triggered by.
  • Allow PutItem access to the DynamoDB table the function will write the EXIF data to.

Thus the policy document for such a role would look like the following:

{
  "Version" : "2012-10-17",
  "Statement" : [
    {
      "Sid" : "AllowGetObjectForImageBucket",
      "Effect" : "Allow",
      "Action" : [
        "s3:GetObject"
      ],
      "Resource" : "arn:aws:s3:::images-bucket/*"
    },
    {
      "Sid" : "AllowPutItemAccessForExifTable",
      "Effect" : "Allow",
      "Action" : [
        "dynamodb:PutItem"
      ],
      "Resource" : "arn:aws:dynamodb:REGION:accountID:table/exifDataTable"
    }
  ]
}

Note: this is an illustrative guide only; substitute your own region, account ID, bucket and table names before using it.

With this policy in place, the function cannot read objects from other S3 buckets, write to other DynamoDB tables, or update and delete existing items in the table. The function can only perform the AWS actions it actually requires.

This is a best practice for all IAM policies, especially those used with resource roles.

2. Decouple your Lambda function handler and logic

In the documentation examples, all the function logic is implemented in the Lambda handler itself. This is a design pattern I personally disagree with, unless the function is trivial in nature.

It is better to implement the Lambda handler separately from the actual logic of what it will do, passing the Lambda event and context from the handler into the logic function. This allows the logic to be tested outside the Lambda environment.
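A minimal sketch of what this can look like in Python (the file and function names here are my own, purely illustrative):

# logic.py - plain Python, testable without any Lambda environment
def collect_keys(event, context=None):
    # The real work: pull the object keys out of an S3 event
    return [r["s3"]["object"]["key"] for r in event.get("Records", [])]

# handler.py - the thin Lambda entry point, which just delegates
from logic import collect_keys

def lambda_handler(event, context):
    return collect_keys(event, context)

# A local unit test needs no Lambda at all:
# >>> collect_keys({"Records": [{"s3": {"object": {"key": "a.jpg"}}}]})
# ['a.jpg']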

It's also a good idea to keep startup code to a minimum: Lambda is billed in intervals of 100ms, so startup code impacts both the time your function takes to run and the cost of running it.
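In Python, for instance, that means creating expensive resources at module level, where they are initialised once per container and reused across warm invocations, rather than inside the handler (the table name and event fields below are made up):

import boto3

# Module-level code runs once per container start, not on every
# invocation, so this setup cost is not paid on every billed call.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("exifDataTable")  # illustrative name

def lambda_handler(event, context):
    # Only the per-request work happens inside the handler
    table.put_item(Item={"objectKey": event["key"], "exif": event["exif"]})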

Pushing Posts with Jekyll & Git

So I've made some changes here today, the bulk of which is switching over from Joomla to Jekyll.

With this change of systems comes a change in how things are deployed. Jekyll is a static site generator. It builds a site of entirely static files from dynamic sources.
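The whole build boils down to one command, run against the source tree:

jekyll build    # reads the sources and writes the finished static site to _site/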

There are a couple of ways to make this process easier when running a site built using Jekyll, most notably GitHub Pages.

However, I wanted something a bit different, and something which I could run against my own SSL certificates. Given these requirements I've chosen to continue to run the site on my own nginx-based web server.

Deployment

Pushing an update to the site is a pretty simple process; the entire workflow is controlled by git.

Once I make some changes, such as this post, I commit them to git and push them to GitHub.

I then also trigger a push of the site independently using a git remote set up on the server:

git push site   
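That remote was added once up front, pointing at a bare repository on the server (the user and path here are placeholders):

git remote add site deploy@myserver:/var/repos/site.git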

That push sends the changes to my server, which has a hook set up for the post-receive event. In that hook script, I perform the following actions:

  1. Clone the site source from GitHub to a temporary folder.
  2. Build the site within that temporary folder using jekyll build.
  3. If the build succeeds:
     3.1 Replace the existing site with the new one
     3.2 Delete the temporary folder
  4. If the build fails, the temporary folder is kept so I can investigate the cause.

All this is handled by a simple shell script of about 20 lines (the paths and repo URL below are placeholders for my real ones):

echo "jekyll_build: cloning $GIT_REPO to $TMP_CLONE"
git clone $GIT_REPO $TMP_CLONE

echo "jekyll_build: building site from $TMP_CLONE to $PUBLIC_WWW"
"$JEKYLL_BIN" build -s "$TMP_CLONE" -d "$PUBLIC_WWW"

echo "jekyll_build: cleaning up - deleting $TMP_CLONE"
rm -rf $TMP_CLONE

echo "jekyll_build: DONE!"
exit
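For reference, this script lives as the file hooks/post-receive inside the bare repository on the server; git runs it automatically after every push, provided it's executable:

chmod +x hooks/post-receive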

On Jekyll

I've moved this here blog thing from Joomla to Jekyll.

I've not (yet) moved the old content over to this new site, as I'm currently unsure whether anything I had written was really worth preserving, or whether it would be better to start anew.

The reasons for moving to Jekyll are simple and mostly obvious:

No requirement for a server-side scripting language or database

This should be pretty clear, but the main reason is that working with PHP and the popular PHP CMSes on a daily basis teaches you to be wary of them. This deployment mechanism means that the site itself cannot be hacked directly or used as an attack vector.

Of course, nothing is truly "hack-proof", so precautions still need to be taken, but it removes the vulnerabilities that a CMS like WordPress would introduce.

Native Markdown Editing

Most CMSes are not designed for people like me, who use Markdown as their de-facto standard for formatting prose. Many use an HTML WYSIWYG editor, which is great for most users, but ends up making editing less efficient and the output less elegant. It also means that the only format the content can be delivered in is HTML.

No dedicated editing application

Using Jekyll and a git-based deployment process means that deploying changes to the site is simple and easy, and I can do it from anywhere thanks to GitHub's online editor. I only need to be logged into my GitHub account to make changes or write a new post.

Currently, I'm using a git hook to rebuild the site and publish the changes; this is triggered by a git push to my server.

This script clones the repo from GitHub to a temporary directory, builds it into the public directory, then deletes the temporary copy.

My next post will probably be about this deployment mechanism.

Warp Speed

Finally, this item returns to my first point: the lack of server-side programming means the site can be delivered at break-neck speed. Even without any kind of CDN or HTTP accelerator like Varnish, a well-tuned nginx configuration and the absence of server-side scripting mean that the all-important TTFB is much lower.

I hope that these items above and the transition to Jekyll will give me cause to write better, more frequent posts here.