NOTE: this post may be subject to edits as I learn more and add more to the blog.

Towards the end of last year, I decided to set up this blog to document and motivate my research progress. Since it turned into a more involved task than I originally anticipated, I thought I’d better write something about the process here in order to potentially aid other budding blogeurs. Let me preface this post by stating at the outset that I am by no means a web expert (be that front-end, back-end or no-end), but when I have an itch to scratch, I do like to hack my way through it, sometimes tearing what remains of my hair out and sometimes learning some useful things along the way. This particular venture might best be considered as having nurtured some insidious combination of the two. So without further ado, allow me to present what I have uncovered amidst the fresh-plucked remnants of my follicular outgrowth.

Static Site Generators

Static site generators are web development toolkits that allow you to generate very nice looking web content via easy-to-use formats such as Markdown without having to muck around with databases or other similar dynamic web development horrors (caveat: this does not mean you get to avoid web horrors in their entirety). Since I had initially intended to host my blog with Github Pages, which uses Jekyll as its default static site generator, I stuck with Jekyll when I eventually migrated to Gitlab Pages (more on that later). There may well be better options for setting up a research blog though. For instance, Christopher Olah’s machine learning blog uses Hakyll, which is a Haskell library for static site generation. A Python-based static site generator named Pelican is also available, which could potentially simplify the process of creating blog posts from Jupyter notebooks.

Jekyll

Since I have been using Jekyll and will be referencing it throughout this post, I’ll just briefly describe it here. Jekyll is based on Ruby so its workflow involves installing gems when you want to add fun new features to your site. As is proudly touted on the Jekyll homepage, it is possible to get a new Jekyll site up and running in seconds:

~ $ gem install jekyll bundler
~ $ jekyll new my-awesome-site
~ $ cd my-awesome-site
~/my-awesome-site $ bundle exec jekyll serve
# => Now browse to http://localhost:4000

That last bundle exec jekyll serve line is particularly useful: you will find yourself using it all the time to test the current version of your site, or just leaving it running in a terminal while you work, since Jekyll regenerates the site on the fly for most changes (changes to _config.yml are a notable exception and need a restart).

Themes

There is an abundance of themes available for these site generators (e.g. see here for Jekyll themes), and it is helpful to just pick one early on, fork its repo, and use it to kickstart your blog development process. I used the al-folio theme (see here for a demo), which in turn was based on the *folio theme, and hacked it up for my own purposes.

Free Hosting with Gitlab Pages

When I first decided to put together a blog, I intended to use Github Pages for the task, but I wanted two things that it did not offer at the time of writing: custom Jekyll plugins, and a custom domain secured end-to-end with SSL/TLS. (You can use Cloudflare for SSL, but it only secures the connection between users and the Cloudflare network, not between Cloudflare and the hosting service, i.e. Github; see here for a discussion.) So I decided to go with Gitlab Pages instead. There are several other nice advantages to using Gitlab Pages over Github Pages, e.g. your choice of static site generator and customizable build processes. Importantly, there are also ways and means of getting around the disadvantages, e.g. slow build times (see the section on continuous integration below). Some of the main differences between them are summarised here and here.

In order to get started with Gitlab Pages, I recommend following this guide from their documentation.

Continuous Integration and Gitlab Runners

Ok, so let’s get our cards on the table here: upon first reading these terms, I was as confused as you probably are. They are buzzwords that feel so abstracted away from the reality of what they might be doing that it is not at all obvious what their purpose is. Let me try to break it down for you as I understand it.

Continuous Integration

Continuous integration (CI), according to Wikipedia, refers to “the practice of merging all developer working copies to a shared mainline several times a day.” However, for our purposes working on Gitlab Pages, what this effectively means is that every time you push a commit to the Gitlab server, it will rebuild your project. Why? Well, I’ll leave more detailed explanations to the Wikipedia page, Gitlab’s own explanatory effort, or the more procedurally cognisant, but suffice to say that this is a good way of making sure that everything is always working. The most important thing to note is that this process will build and deploy your site. How? In brief, this involves creating a .gitlab-ci.yml file that tells Gitlab how to build your project.

The .gitlab-ci.yml File

This is what my current one looks like:

image: ruby:2.3  # Use Ruby Docker image

cache:  # Add Bundler cache to 'vendor' directory
  paths:
    - vendor/

before_script:  
  # Fix locale settings to stop invalid byte sequence in US-ASCII Jekyll error.
  - apt-get update >/dev/null
  - apt-get install -y locales >/dev/null
  - echo "en_US UTF-8" > /etc/locale.gen
  - locale-gen en_US.UTF-8
  - export LANG=en_US.UTF-8
  - export LANGUAGE=en_US:en
  - export LC_ALL=en_US.UTF-8
  # Install Gems to 'vendor' directory
  - bundle install --path vendor

test:
  stage: test
  script:  # Generate test site(s) into 'test' directory
  - bundle exec jekyll build -d test
  artifacts:  # Save a zipped version for download
    paths:
    - test
  except:  # Execute for all branches except master
  - master

pages:
  stage: deploy
  script:  # Generate public site and deploy
  - JEKYLL_ENV=production bundle exec jekyll build -d public # JEKYLL_ENV used for Google Analytics
  # Use this when creating a new letsencrypt cert,
  # since jekyll adds .html to the file and letsencrypt
  # does not expect a .html extension
  - cp ./public/.well-known/acme-challenge/XXXX.html ./public/.well-known/acme-challenge/XXXX
  artifacts:  # Save a zipped version for download
    paths:
    - public
  only:  # Only deploy the master branch
  - master

There are a few things happening here. Much of the construction of this file was inspired by this Gitlab documentation on .gitlab-ci.yml files, as well as this guide on using Bundler with Jekyll. Bundler is a gem manager for Ruby projects that takes care of gem installation amongst other things. Using Bundler effectively involves setting up a Gemfile that lists the gems that need to be installed for your Ruby project to function. My Gemfile looks like this:

source 'https://rubygems.org'

# Jekyll
gem 'jekyll'

# Added these to get al-folio working
gem 'jekyll-paginate'
gem 'jemoji'
gem 'jekyll-scholar'
gem 'pygments.rb'

# Needed for converting Gravatar to favicons
gem 'rmagick'

We’ll discuss the usefulness of Bundler and the contents of the Gemfile in more detail as we go along, but let’s get back to the .gitlab-ci.yml file for now and try to get an overview of what’s going on there in each of its sections:

The image Specification

This part is really simple - it just tells Gitlab which Ruby Docker image to load. For those who don’t know, Docker is the software containerization tool taking the world by storm that allows you to package your software in a “container”, i.e. a complete filesystem that contains all of the necessary bits and pieces needed for it to run, so that the software can always be run on any machine in the same environment. A nice overview can be found here. In this instance, we’re asking Gitlab Pages to use a Ruby 2.3 Docker image so that we can run Jekyll, which is Ruby-based.

The cache Specification

According to the Gitlab documentation, “cache is used to specify a list of files and directories which should be cached between builds. You can only use paths that are within the project workspace.” So here, we tell Gitlab to keep the contents of the vendor directory between builds. Why? Well, as we shall see, we will be instructing Bundler to install gems into the vendor directory, so by caching that directory between builds, the gems do not have to be downloaded and installed from scratch every time, which speeds up the build process. Neat.

The before_script Specification

This section dictates what should be done before any build jobs are executed. I currently do two things here. Firstly, I fix some locale settings to avoid a problem caused by UTF-8 characters in the author names of some of my publications. Secondly, and perhaps more importantly for most use-cases, I tell Bundler to install Ruby gems to the vendor directory using the command bundle install --path vendor. As was explained in the previous section, this is an attempt to speed up the build process.
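If you want to confirm that the locale fix has actually taken effect before Jekyll runs, a check along the following lines might help. This is only a sketch under a couple of assumptions: the function names effective_ctype and is_utf8_locale are made up for illustration, and it only inspects the environment variables rather than asking the system which locales have been generated.

```shell
# Hypothetical helpers (not part of Jekyll, Bundler or Gitlab).

# Print the locale name that governs character encoding.
# Precedence: LC_ALL overrides LC_CTYPE, which overrides LANG;
# an empty value counts as unset.
effective_ctype() {
    printf '%s\n' "${LC_ALL:-${LC_CTYPE:-${LANG:-C}}}"
}

# Succeed if that locale name advertises a UTF-8 encoding.
is_utf8_locale() {
    case "$(effective_ctype)" in
        *.UTF-8*|*.utf8*|*.UTF8*|*.utf-8*) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible use at the end of before_script:
#   is_utf8_locale || echo "warning: locale is $(effective_ctype)" >&2
```

Without the three export lines from my before_script, the check would report a plain C or POSIX locale, which is the situation in which Ruby falls back to US-ASCII and trips over accented author names.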

The test Job

This is the first build job definition. It instructs Gitlab to build the site in all branches except for master, place the results in test directories in each branch, and zip up the results for download. This really comes into its own when incorporated into a workflow where you use different branches for writing drafts of blog posts and so on, before merging them into the master branch for deployment. More on this later.

The pages Job

This is where the real magic happens: it is where we deploy the master branch of the project as a Gitlab Pages public site! According to the Gitlab Pages documentation, in order to make use of Gitlab Pages, the following three conditions must be satisfied:

  1. A special job named pages must be defined
  2. Any static content which will be served by GitLab Pages must be placed under a public/ directory
  3. artifacts with a path to the public/ directory must be defined

Since this job carries the name pages, the first condition is already satisfied. The instruction bundle exec jekyll build -d public tells Bundler/Jekyll to build the site in the public/ directory, so that satisfies the second requirement (the command is accompanied by some Google Analytics specifications, but more on that later). The artifacts setup is pretty much the same as in the test job case, and satisfies the third requirement.

That’s it! Once this file has been specified in the Jekyll project root directory and everything is committed and pushed to the Gitlab server, Gitlab will launch “runners” to build the project and deploy the site. And it is these Gitlab Runners that are the subject of the next section.
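Before pushing, it can be handy to rehearse locally what the two jobs will do. The following is a rough sketch rather than anything Gitlab provides: the function names are hypothetical, and treating the presence of index.html as the sign of a servable site is my own assumption, not a documented rule.

```shell
# Hypothetical helpers that mimic the test and pages jobs locally.

# Map a branch name to the output directory of the matching job:
# the pages job (master only) writes to public/, the test job to test/.
target_dir_for_branch() {
    case "$1" in
        master) printf 'public\n' ;;
        *)      printf 'test\n' ;;
    esac
}

# Succeed if DIR exists and contains an index.html (assumed sanity check).
site_is_deployable() {
    [ -d "$1" ] && [ -f "$1/index.html" ]
}

# Build the current branch the way the CI would, then sanity-check the result.
local_ci() {
    branch=$(git rev-parse --abbrev-ref HEAD) || return 1
    dir=$(target_dir_for_branch "$branch")
    if [ "$dir" = public ]; then
        JEKYLL_ENV=production bundle exec jekyll build -d "$dir" || return 1
    else
        bundle exec jekyll build -d "$dir" || return 1
    fi
    site_is_deployable "$dir"
}
```

Running local_ci from the project root on a draft branch should leave a test/ directory behind, just as the test job would, while on master it fills public/ in the same way as the pages job.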

Gitlab Runners

When Gitlab builds your project during continuous integration, it needs machines to run the builds on. That’s where Gitlab Runners come in. A Gitlab Runner is a program that picks up build jobs and executes them; it can run on Gitlab’s own servers, on some other server(s) linked to a Gitlab instance, or even on your own laptop or other machine. Runners are categorised as either shared runners or specific runners.

Shared Runners

For most use-cases, these are going to be Gitlab’s own servers, which can be slow at times depending on their workload because they’re used to build the jobs of Gitlab’s other users as well. If you do not have a specific runner set up, then Gitlab will default to using its own shared runners. Good to be able to fall back on, but maybe not an ideal solution.

Specific Runners

Setting up a specific runner, e.g. on your own PC, allows you to dedicate your own resources to your own builds. No more waiting for shared runners on remote servers to queue your project! Caveat: you’ll still need a decent internet connection for speed, because the runner seems to like to ping the Gitlab server constantly during the build. This could of course be avoided if you had your own Gitlab instance running on a separate server, but we’re not dealing with that in this guide, so let’s not worry about it.

Here, I will discuss how to install a Docker specific runner on Ubuntu, but the instructions for other systems/methods are readily available. There are a few different options for executors, which provide different ways and means of building a project in the runner. There are good reasons, security among them, for using the Docker executor, so we will go along with that. Here is how to go about the runner installation (more detailed instructions can be found here):

# Install Docker
~ $ curl -sSL https://get.docker.com/ | sh
# Add Gitlab's official repo to sources
~ $ curl -L https://packages.gitlab.com/install/repositories/runner/gitlab-ci-multi-runner/script.deb.sh | sudo bash
# Install Gitlab Continuous Integration Multi Runner
~ $ sudo apt-get install gitlab-ci-multi-runner

After that, the specific runner needs to be registered in order to run builds for your project. After entering the sudo gitlab-ci-multi-runner register command as shown below, when prompted for a token, you need to go to the Settings -> Runners section of your Gitlab project page to retrieve the registration token provided in Step 3 under “How to setup a specific Runner for a new project”.

~ $ sudo gitlab-ci-multi-runner register

Please enter the gitlab-ci coordinator URL (e.g. https://gitlab.com )
https://gitlab.com
Please enter the gitlab-ci token for this runner
xxx
Please enter the gitlab-ci description for this runner
my-runner
INFO[0034] fcf5c619 Registering runner... succeeded
Please enter the executor: shell, docker, docker-ssh, ssh?
docker
Please enter the Docker image (eg. ruby:2.1):
ruby:2.1
INFO[0037] Runner registered successfully. Feel free to start it, but if it's
running already the config should be automatically reloaded!

After this, the next time you push a commit to Gitlab’s remote servers, your specific runner should pick up on the build request and build your project locally. Note again: this might still involve heavy network traffic between Gitlab’s servers and your machine, but it might just be a faster build overall.

Using a Custom Domain

You don’t necessarily need to have a custom domain for your site (Gitlab Pages will provide you with a nice URL along the lines of https://barryridge.gitlab.io by default), but a custom domain (e.g. barog.net) is a nice thing to have for various reasons, so I will try to explain how to set one up here. First of all, you will need to choose and register your domain name with a domain name registrar, and for that I would recommend namecheap.com, but there are many other options available.

The Gitlab Pages documentation here and here explains how to set things up from the Gitlab side. This involves going to Settings -> Pages -> New Domain under your project dashboard and adding your domain there. The documentation also calls for an A record pointing to 104.208.235.32 (the Gitlab Pages address at the time of writing; check the documentation for the current one) and a CNAME record pointing to username.gitlab.io. These records live with your registrar, so you will still need to adjust your DNS settings on Namecheap so that your domain name points to Gitlab’s servers. David Ensinger provides a nice guide for setting the DNS for Github Pages on Namecheap, and the procedure for Gitlab Pages is not much different. Here is what my setup looks like:

Namecheap Setup

Again, as described in the Gitlab documentation (see here and here), you will need to set an A record (typically for the bare domain) to point to 104.208.235.32 and a CNAME record (typically for www) to point to username.gitlab.io.
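Once the records have had time to propagate, you can check them from the command line. Here is a minimal sketch that assumes the dig tool is installed; dns_record_is is a name I made up, and the DIG variable is only there so that a different resolver command can be swapped in.

```shell
# Resolver command; override DIG to substitute another tool (assumption: dig).
DIG=${DIG:-dig}

# Hypothetical helper: succeed if NAME has a record of TYPE equal to EXPECTED.
# dig prints CNAME targets with a trailing dot, so strip it before comparing.
dns_record_is() {
    name=$1 type=$2 expected=$3
    "$DIG" +short "$name" "$type" | sed 's/\.$//' | grep -Fqx -- "$expected"
}

# Possible use, with the values from this post:
#   dns_record_is barog.net A 104.208.235.32 && echo "A record OK"
#   dns_record_is www.barog.net CNAME username.gitlab.io && echo "CNAME OK"
```

If either check fails straight after changing the settings, it may simply be that the old records are still cached somewhere, so it is worth waiting a while before assuming that the Namecheap setup is wrong.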

Securing Your Site with SSL/TLS and Let’s Encrypt

You may notice when setting up your custom domain that there is a section in your Gitlab Pages project dashboard under Settings -> Pages -> New Domain where you can add an SSL/TLS certificate and its key. But where do you get a certificate? That’s where Let’s Encrypt, the free, automated, and open certificate authority, comes in. Long story short, it allows you to generate your own security certificates for free so that you can have that warm and reassuring HTTPS next to your domain name.

I largely followed this excellent guide to get this going, but I did run into some tricky issues that I will try to help you with here. The first thing that you will need to do is install Let’s Encrypt on your local machine:

~ $ git clone https://github.com/letsencrypt/letsencrypt
~ $ cd letsencrypt

Then you’ll want to use the letsencrypt-auto tool to generate a certificate for your site. If your site has multiple names, pass each one with its own -d option so that the certificate covers all of them, e.g.

./letsencrypt-auto certonly -a manual -d barog.net -d www.barog.net

This will bring up an interface in the terminal (I think it was a blue screen) that asks you to accept your IP being logged, and that outputs something like this after you do:

Make sure your web server displays the following content at
http://YOURDOMAIN.org/.well-known/acme-challenge/XXXX
before continuing:

YYYY

#
# output omitted
#

Press ENTER to continue

Here, I have replaced a generated filename with XXXX and a generated token with YYYY. You should keep this interface open WITHOUT pressing enter, and proceed to set up a Jekyll page called letsencrypt-setup.html in your project root directory containing the following:

---
layout: null
permalink: /.well-known/acme-challenge/XXXX
---

YYYY

This will cause your Jekyll site to generate a file called XXXX.html in the public/.well-known/acme-challenge directory when deployed, served at http://YOURDOMAIN.org/.well-known/acme-challenge/XXXX.html. The problem is, the letsencrypt-auto tool will look for the YYYY token at the URL http://YOURDOMAIN.org/.well-known/acme-challenge/XXXX without the .html extension. To fix this, we add a shell copy instruction to .gitlab-ci.yml as follows:

pages:
  stage: deploy
  script:  # Generate public site and deploy
  - JEKYLL_ENV=production bundle exec jekyll build -d public # JEKYLL_ENV used for Google Analytics
  # Use this when creating a new letsencrypt cert,
  # since jekyll adds .html to the file and letsencrypt
  # does not expect a .html extension
  - cp ./public/.well-known/acme-challenge/XXXX.html ./public/.well-known/acme-challenge/XXXX
  artifacts:  # Save a zipped version for download
    paths:
    - public
  only:  # Only deploy the master branch
  - master
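As an alternative to hard-coding the filename in that cp line, the copy could be done for whatever challenge files happen to be present. This is just a sketch: copy_without_html_ext is a hypothetical name, and it assumes that every .html file in the acme-challenge directory is a challenge page.

```shell
# Hypothetical helper: for each FILE.html in DIR, create an extensionless
# copy FILE alongside it, so both URLs serve the same content.
copy_without_html_ext() {
    dir=$1
    for f in "$dir"/*.html; do
        [ -e "$f" ] || continue    # the glob matched nothing
        cp -- "$f" "${f%.html}"
    done
}

# Possible use after the jekyll build step:
#   copy_without_html_ext ./public/.well-known/acme-challenge
```

With something like this in place, the CI file would not need editing each time a new certificate is requested; only the permalink and token in the Jekyll page would change.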

Remember: be sure to substitute XXXX and YYYY in the above with the actual strings generated by letsencrypt-auto! Once you’ve pushed the code to the Gitlab servers, you should then be able to test it as follows:

~ $ curl http://YOURDOMAIN.org/.well-known/acme-challenge/XXXX
YYYY

If the string YYYY is returned successfully, then you can return to the letsencrypt-auto tool terminal interface (that you should still have open!) and hit ENTER as instructed. The tool will then check the link just as you did, and if the token is returned, it should congratulate you on successfully generating your certificate. You are then free to copy it over to your Gitlab Pages custom domain settings page. First, print the certificate chain and its private key so that you can copy them:

~ $ sudo cat /etc/letsencrypt/live/YOURDOMAIN.org/fullchain.pem
~ $ sudo cat /etc/letsencrypt/live/YOURDOMAIN.org/privkey.pem

Then you need to navigate to Settings -> Pages in Gitlab Pages, remove the old custom domain names, and add them again, this time pasting the certificate and its key into the corresponding fields. I would highly recommend referring to the Gitlab tutorial on securing your page with TLS and Let’s Encrypt for more details on all of this if you get stuck.

Adding Google Analytics

To set up Google Analytics, I followed this tutorial for Jekyll. I will not repeat the details here, other than mentioning that an important aspect is that you need to set the JEKYLL_ENV=production environment variable ahead of the bundle exec command in your .gitlab-ci.yml file as follows:

JEKYLL_ENV=production bundle exec jekyll build

If you refer to my .gitlab-ci.yml file above, you’ll see that I have included this line.
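In case the VAR=value prefix syntax is unfamiliar, the following sketch shows what it does. It uses a tiny stand-in for Jekyll (the child variable is my own invention) that prints the environment name a child process would see, falling back to "development", which is Jekyll’s default when JEKYLL_ENV is not set.

```shell
# Stand-in for "bundle exec jekyll build": print the environment name that
# a child process sees, falling back to Jekyll's default of "development".
child='printf "%s\n" "${JEKYLL_ENV:-development}"'

unset JEKYLL_ENV
sh -c "$child"                          # prints: development
JEKYLL_ENV=production sh -c "$child"    # prints: production
sh -c "$child"                          # prints: development
```

The prefix only applies to the one command it is attached to, which is why it has to sit on the same line as bundle exec jekyll build in the pages job.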

A Drafting and Publishing Workflow

A solid blog post drafting and publishing workflow might then look something like the following:

# Create and checkout a new branch for a new post
~ $ git checkout -b my-fancy-new-post

# Create a new blog post - don't forget to change the times/dates
# (Jekyll expects post filenames of the form year-month-day-title)
~ $ cat > _posts/yyyy-mm-dd-my-fancy-new-post.md <<'EOF'
---
layout: post
title: My Fancy New Post
date: yyyy-mm-dd hh:mm:ss+0100
description: An exercise in posting fancily.
comments: true
---

This is my fancy new post!
EOF

# Test locally by running the following bundle command
# and navigating to http://localhost:4000/blog/yyyy/my-fancy-new-post
# (this might differ depending on your theme)
~ $ bundle exec jekyll serve -d public

# Add, commit and push the draft
# (-u sets the upstream for the new branch; the remote is assumed to be origin)
~ $ git add _posts/yyyy-mm-dd-my-fancy-new-post.md
~ $ git commit -a -m 'My fancy new commit message.'
~ $ git push -u origin my-fancy-new-post

# Check the Gitlab build pipeline to see if the build was successful
# or wait for Gitlab to e-mail you.

# Finish the draft
~ $ echo 'All hail!' >> _posts/yyyy-mm-dd-my-fancy-new-post.md

# Commit and push the changes
~ $ git commit -a -m 'Finished my fancy new post.'
~ $ git push

# Check the Gitlab build pipeline to see if the build was successful
# or wait for Gitlab to e-mail you.

# Merge with master
~ $ git checkout master
~ $ git merge my-fancy-new-post
~ $ git push

# Check the Gitlab build pipeline to see if the build was successful
# or wait for Gitlab to e-mail you.

# Your post should now be publicly available!
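Since getting the date into the filename and front matter by hand is easy to fumble, the post creation step could be wrapped up in a small helper. What follows is a sketch, not something Jekyll ships: slugify and new_post are hypothetical names, the front matter is a cut-down version of the one above, and titles containing characters that are special in YAML (a colon, say) would need quoting.

```shell
# Hypothetical helper: turn a title into a lowercase, hyphen-separated slug.
slugify() {
    printf '%s\n' "$1" \
        | tr '[:upper:]' '[:lower:]' \
        | sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//'
}

# Hypothetical helper: create a dated post skeleton and print its path.
# Usage: new_post "Title" [posts_dir]   (posts_dir defaults to _posts)
new_post() {
    title=$1
    posts_dir=${2:-_posts}
    day=$(date +%Y-%m-%d)
    stamp=$(date '+%Y-%m-%d %H:%M:%S %z')
    file="$posts_dir/$day-$(slugify "$title").md"
    mkdir -p "$posts_dir"
    cat > "$file" <<EOF
---
layout: post
title: $title
date: $stamp
---

EOF
    printf '%s\n' "$file"
}

# Possible use from the project root:
#   new_post "My Fancy New Post"
```

The helper prints the path of the file it created, so it can be handed straight to an editor or to git add.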

Coming Soon!

As you have probably already figured out by now, setting up one of these blog thingies can be a deceptively complex process. I could go on and on writing about what I’ve had to do to get to this point, but I wanted to get something out there (like, you know, an actual blog post!), so I’ve decided to stop here for now. I would still like to write about some other things in relation to this journey at some point in the future though, so I will leave some placeholders here to give you a taste of what is, hopefully, to come.

Leveraging Bower, npm and Grunt for Package Management

The package managers Bower and npm, as well as the automation tool Grunt, are extremely useful doohickeys to have in your toolkit. Torsten Scholak has written an excellent post on how to best make use of them with Jekyll and Github Pages over on his Meticulous Disorder blog. There really isn’t too much difference when employing them on Gitlab Pages.

Creating Blog Posts from Jupyter Notebooks

I haven’t actually tried this out yet, but there is a nice post available here detailing how it might be achieved. This could be a very nice addition to the workflow of writing a research blog, so I’m hoping for good things here.

Be seeing you!