James's Ramblings

Switching from AWS Amplify to GitLab and CloudFlare

Created: October 30, 2020

Introduction

I initially published this (Jekyll) website using AWS Amplify. The process was very easy and I would still recommend it. You can read about it here: Yet Another Blog Creation Article.

However, I was left slightly worried about the bandwidth costs of AWS CloudFront (AWS’s content delivery network). CloudFront charges for bandwidth, so if I ever had a massive traffic spike, I could end up with a huge bill. CloudFlare, on the other hand, does not charge for bandwidth usage and offers some additional worthwhile features for free.

As Amplify builds the Jekyll website using a CI/CD pipeline and stores the result in S3 (behind the scenes), I also needed to replace the CI/CD component. I chose GitLab for version control and for the CI/CD pipeline. GitHub also offers CI/CD, including 2,000 free minutes per month, but it’s less mature than GitLab’s offering.

I discovered some pros and cons to the change along the way. Let’s talk about them first, in case you are thinking of following along.

Pros

  • CloudFlare does not charge for bandwidth, making it cheaper than CloudFront and protecting you from large surprise bills.

  • There is no longer any need for a Route 53 hosted zone, saving 50 cents per month. CloudFlare provides this functionality for free.

  • Free and automatic (after setup) minification of HTML, CSS, and JavaScript. This makes your pages load faster and saves bandwidth. There are other ways to set up minification, but time is valuable and this is a good way to save it.

  • Three free firewall rules. With AWS WAF, that would cost at least $9/month plus bandwidth charges, at the time of writing.

  • CloudFlare can obfuscate email addresses. I got rid of the ugly captcha-like image used to hide my email address, in favour of nice clean plain text.

Cons

  • It takes a bit more effort to set up.

  • Pretty Jekyll URLs (i.e. without .html extensions) were not working. I suspect there is some way to fix this though.

  • There is no way of having SSL between the S3 bucket and CloudFlare (to my knowledge) without using CloudFront as well.

    • We can (and should) create IP restrictions so only CloudFlare can access the bucket; however, that does not help us if one of the hops between AWS and CloudFlare is compromised.

    • Hence, we shouldn’t use this method for anything of importance (but a personal blog, or similar, is fine). Think: what are the repercussions if this gets compromised?

  • If we use CloudFront between S3 and CloudFlare, we cannot have IP restrictions without also having AWS WAF in front of CloudFront.

    • We need the IP restrictions as we don’t want people to bypass CloudFlare.

    • WAF was about $6 plus a small bandwidth charge per GB; not worth it for this particular use case. However, if this is a professional project and/or sensitive data is involved, this is definitely the way to go*.

* If you do go this route, turn the caching down to minimum on CloudFront. Having two caches is unnecessary and will just cause problems. SSL might also be a bit finicky (although I saw some blog posts that say it is possible).

Now for how…

Step 1: Set up your GitLab repos

Step 2: Create an S3 static website bucket and assign it permissions

  • Create an S3 bucket and turn static website mode on (a CLI sketch of this whole step is included at the end of this list).
    • I’m not going to go through the basics of making a bucket and turning on static website mode. I would suggest looking at the AWS docs if you need more info.
    • Make sure to make the bucket completely public to start with. Don’t put anything in it until we have secured it though.
    • Remember, the bucket name should be the address of your website. For example, lochhead.me.
  • Go into bucket policy and paste the following, replacing YOUR_BUCKET with the name of your bucket:
    {
      "Version": "2012-10-17",
      "Statement": [
          {
              "Sid": "PublicReadGetObject",
              "Effect": "Allow",
              "Principal": "*",
              "Action": "s3:GetObject",
              "Resource": "arn:aws:s3:::YOUR_BUCKET/*",
              "Condition": {
                  "IpAddress": {
                      "aws:SourceIp": [
                          "2400:cb00::/32",
                          "2606:4700::/32",
                          "2803:f800::/32",
                          "2405:b500::/32",
                          "2405:8100::/32",
                          "2a06:98c0::/29",
                          "2c0f:f248::/32",
                          "173.245.48.0/20",
                          "103.21.244.0/22",
                          "103.22.200.0/22",
                          "103.31.4.0/22",
                          "141.101.64.0/18",
                          "108.162.192.0/18",
                          "190.93.240.0/20",
                          "188.114.96.0/20",
                          "197.234.240.0/22",
                          "198.41.128.0/17",
                          "162.158.0.0/15",
                          "104.16.0.0/12",
                          "172.64.0.0/13",
                          "131.0.72.0/22"
                      ]
                  }
              }
          }
      ]
    }
    
  • These are CloudFlare’s IPv4 and IPv6 addresses at the time of writing.

  • CloudFlare publish their IPv4 addresses here and IPv6 addresses here.

  • Add your own IP address in, using the same format, so you can test that the bucket is functioning. Remember, a /32 prefix means a single IPv4 address.

  • Put an index.html (or whatever you choose to call it) in the S3 bucket and test that the website works.

  • Here are the relevant Amazon docs.
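
  • If you prefer the command line, here is a rough sketch of the same steps using the AWS CLI. It assumes the policy above has been saved locally as policy.json; YOUR_BUCKET and YOUR_REGION are placeholders for your own values:

      # Create the bucket (the name must match your domain); for us-east-1,
      # omit the --create-bucket-configuration option
      aws s3api create-bucket --bucket YOUR_BUCKET --region YOUR_REGION \
          --create-bucket-configuration LocationConstraint=YOUR_REGION

      # Turn on static website hosting with index.html as the index document
      aws s3 website s3://YOUR_BUCKET/ --index-document index.html

      # Apply the bucket policy; if it is rejected, you may need to relax the
      # bucket's Block Public Access settings first
      aws s3api put-bucket-policy --bucket YOUR_BUCKET --policy file://policy.json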

Step 3: Create an IAM user and a least privilege policy to allow access to the S3 bucket

  • Start with the user; it should only have “Programmatic access”, NOT console access (a CLI sketch of this whole step is included at the end of this list).
    • Don’t attach any policies; we will leave it blank for now.
    • Save the access key ID and secret securely; we will use them later.
  • Next, go into Policies within IAM.

    • Create a new custom policy. I like to prefix my custom policy names with underscores so they appear at the top of the list.

    • Activate JSON mode and paste in the following, replacing YOUR_BUCKET with the name of your bucket:

      {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Sid": "VisualEditor1",
                  "Effect": "Allow",
                  "Action": [
                      "s3:ListBucket"
                  ],
                  "Resource": "arn:aws:s3:::YOUR_BUCKET"
              },
              {
                  "Sid": "VisualEditor0",
                  "Effect": "Allow",
                  "Action": [
                      "s3:PutObject",
                      "s3:GetObject",
                      "s3:DeleteObject"
                  ],
                  "Resource": "arn:aws:s3:::YOUR_BUCKET/*"
              }
          ]
      }
      
  • I have seen this done incorrectly many times. If you give the user too many privileges, you give an attacker more opportunity to move laterally in your environment if they were to compromise the account. This is the correct approach: it grants only the privileges that are actually needed. Please see these Amazon docs.

  • Attach the policy to the user.
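
  • For reference, a rough CLI sketch of this step (the user and policy names are just examples, and policy.json is the document above saved locally):

      # Create the deploy user (programmatic access only; no console password is created)
      aws iam create-user --user-name gitlab-deploy

      # Create the access key; save the AccessKeyId and SecretAccessKey from the output
      aws iam create-access-key --user-name gitlab-deploy

      # Create the least-privilege policy and attach it to the user
      aws iam create-policy --policy-name _gitlab-s3-deploy --policy-document file://policy.json
      aws iam attach-user-policy --user-name gitlab-deploy \
          --policy-arn arn:aws:iam::YOUR_ACCOUNT_ID:policy/_gitlab-s3-deploy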

Step 4: Add the access key ID, secret, region and bucket variables to your GitLab repo

  • In your repo, go to “Settings” > “CI / CD” > “Variables”.

  • Every variable should probably be Protected and Masked. This is more of an issue for multi-user repos; however, it’s still good practice, as you might add someone later and forget you have accessible credentials stored. Read the GitLab docs for more info.

  • We want to add AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY here - the two values we saved earlier in step 3.

  • Additionally, I added AWS_DEFAULT_REGION and the corresponding code for the region I use. Here’s a list of region codes.

  • I also stored the name of my S3 bucket in an S3_BUCKET variable.

  • The latter two aren’t strictly necessary (we could define them in the CI/CD file), but extra security at no extra cost is a no-brainer. If you would rather script this step, there is an API sketch below.
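
  • If you prefer to script this, variables can also be created through the GitLab API. This is a rough sketch; GITLAB_TOKEN (a personal access token with api scope) and PROJECT_ID are placeholders for your own values:

      # Create a protected, masked CI/CD variable on the project
      curl --request POST \
           --header "PRIVATE-TOKEN: ${GITLAB_TOKEN}" \
           --form "key=AWS_ACCESS_KEY_ID" \
           --form "value=YOUR_ACCESS_KEY_ID" \
           --form "protected=true" \
           --form "masked=true" \
           "https://gitlab.com/api/v4/projects/${PROJECT_ID}/variables"

      # Repeat for AWS_SECRET_ACCESS_KEY, AWS_DEFAULT_REGION and S3_BUCKET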

Step 5: The CI/CD pipeline

  • In my .gitlab-ci.yml, I defined a simple CI/CD pipeline (see below).

    • In the build stage, we use an official Jekyll docker image to build the website. Then we grab the _site output, as an artifact, as that is what we want to deploy.

    • We then use GitLab’s AWS Docker image to recursively remove the current contents of our S3 bucket, then add the contents of _site.

  • Defining the same Jekyll version that you are using locally is a good idea (if you are using it locally). Remember to check that the version you specify exists as a tag of the Jekyll image on DockerHub.
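
  • For example, a quick way to check the tag locally before pushing (assuming you have Docker installed):

      # Pull the image tag used in the pipeline below; an error means the tag does not exist
      docker pull docker.io/jekyll/jekyll:4.1.0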

stages:
  - build
  - deploy

variables:
  JEKYLL_VERSION: 4.1.0

buildJekyll:
  stage: build
  image: docker.io/jekyll/jekyll:${JEKYLL_VERSION}
  script:
    - jekyll b --future
  artifacts:
    paths:
      - _site/

deploySite:
  stage: deploy
  image: registry.gitlab.com/gitlab-org/cloud-deploy/aws-base:latest
  script:
    - aws s3 rm s3://${S3_BUCKET} --recursive
    - aws s3 cp _site s3://${S3_BUCKET}/ --recursive

  • Doing a git push -u origin master should now start the build process and put the result in the S3 bucket, which can then be accessed using a web browser (a quick verification sketch follows).
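
  • To verify the deployment, you can list the bucket contents and request the index page from the S3 website endpoint. A rough sketch (the endpoint hostname varies by region, and some regions use s3-website.YOUR_REGION rather than s3-website-YOUR_REGION):

      # List what the pipeline deployed (run with the deploy user's credentials)
      aws s3 ls s3://YOUR_BUCKET/ --recursive

      # Request the index page directly; this only succeeds from an IP allowed by the bucket policy
      curl -I http://YOUR_BUCKET.s3-website-YOUR_REGION.amazonaws.com/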

Step 6: CloudFlare

Step 7: Configuring a second bucket and redirects