NOTE: this post may be subject to edits as I learn more and add more to the blog.
Towards the end of last year, I decided to set up this blog to document and motivate my research progress, and since it turned into a more involved task than I originally anticipated, I thought I’d better write something about the process here in order to potentially aid other budding blogeurs. Let me preface this post by stating at the outset that I am by no means a web expert - be that front-end, back-end or no-end - but when I have an itch to scratch, I do like to hack my way through it, sometimes tearing what remains of my hair out and sometimes learning some useful things along the way. This particular venture might best be considered as having nurtured some insidious combination of the two. So without further ado, allow me to present what I have uncovered amidst the fresh-plucked remnants of my follicular outgrowth.
- Static Site Generators
- Free Hosting with Gitlab Pages
- Using a Custom Domain
- Securing Your Site with SSL/TLS and Let’s Encrypt
- Adding Google Analytics
- A Drafting and Publishing Workflow
- Coming Soon!
Static Site Generators
Static site generators are web development toolkits that allow you to generate very nice looking web content via easy-to-use formats such as Markdown without having to muck around with databases or other similar dynamic web development horrors (caveat: this does not mean you get to avoid web horrors in their entirety). Since I had initially intended to host my blog with Github Pages and Github Pages use Jekyll as their default static site generator, I stuck with that when I eventually migrated to Gitlab Pages (more on that later). There may be other better options for setting up a research blog though. For instance, Christopher Olah’s machine learning blog uses Hakyll, which is a Haskell library for static site generation. A Python-based static site generator named Pelican is also available, which could potentially simplify the process of creating blog posts from Jupyter notebooks.
Since I have been using Jekyll and will be referencing it throughout this post, I’ll just briefly describe it here. Jekyll is based on Ruby so its workflow involves installing gems when you want to add fun new features to your site. As is proudly touted on the Jekyll homepage, it is possible to get a new Jekyll site up and running in seconds:
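For reference, the quick-start steps advertised on the Jekyll homepage can be wrapped up in a small shell helper along the following lines. This is only a sketch: the function name new_jekyll_site is my own invention, and it assumes that Ruby and RubyGems are already installed.

```shell
# Create and serve a brand new Jekyll site.
# $1: directory name for the new site (e.g. my-awesome-site)
new_jekyll_site() {
  gem install bundler jekyll &&   # install Bundler and Jekyll as gems
  jekyll new "$1" &&              # scaffold a new site in directory $1
  cd "$1" &&
  bundle exec jekyll serve        # build and serve at http://localhost:4000
}

# Usage (blocks while the local server is running):
#   new_jekyll_site my-awesome-site
```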
The bundle exec jekyll serve command is particularly useful,
and you will find yourself using it all the time in order to test
the current version of your site, or just leaving it running
in a terminal for the same purpose, since Jekyll will pick up most of your changes automatically.
There is an abundance of themes available for these site generators (e.g. see here for Jekyll themes), and it is helpful to just pick one early on, fork its repo, and use it to kickstart your blog development process. I used the al-folio theme (see here for a demo), which in turn was based on the *folio theme, and hacked it up for my own purposes.
Free Hosting with Gitlab Pages
When I first decided to put together a blog, I had intended to use Github Pages for the task. However, I wanted two things that Github Pages does not currently offer: custom Jekyll plugins, and a custom domain secured end-to-end with SSL/TLS (you can use Cloudflare for SSL, but it only secures the connection between users and the Cloudflare network, not between Cloudflare and the hosting service, i.e. Github - see here for a discussion). So I decided to go with Gitlab Pages instead. There are several other nice advantages to using Gitlab Pages over Github Pages, e.g. your choice of static site generators and customizable build processes, and, importantly, there are ways and means of getting around the disadvantages, e.g. slow build times (see the section on continuous integration below). Some of the main differences between them are summarised here and here.
In order to get started with Gitlab Pages, I recommend following this guide from their documentation.
Continuous Integration and Gitlab Runners
Ok, so let’s get our cards on the table here - upon first reading these terms, I was as confused as you probably are. They are buzzwords that feel so abstracted away from the reality of what they might be doing that it is not at all obvious what their purpose is. Let me try to break it down for you as I understand it.
Continuous integration (CI) is, according to Wikipedia,
“the practice of merging all developer working copies to a shared mainline several times a day.”
However, for our purposes working on Gitlab Pages, what this effectively means is
that every time you push a commit to the Gitlab server, it will rebuild your project.
Why? Well, I’ll leave more detailed explanations to the Wikipedia page,
Gitlab’s own explanatory effort, or
the more procedurally cognisant, but suffice to say that this is a good way of making sure
that everything is always working.
The most important thing to note is that this process will build and deploy your site.
How? In brief, this involves creating a
.gitlab-ci.yml file that
tells Gitlab Pages how to build your project.
This is what my current one looks like:
There are a few things happening here. Much of the construction of
this file was inspired by
this Gitlab documentation on .gitlab-ci.yml files,
as well as this guide on using Bundler with Jekyll.
Bundler is a gem manager for Ruby projects that takes
care of gem installation amongst other things.
Using Bundler effectively involves setting up a
Gemfile that lists
the gems that need to be installed for your Ruby project to function.
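As a rough illustration, a bare-bones Gemfile for a Jekyll site can be created as follows. This is a minimal sketch: listing only the jekyll gem is an assumption made for illustration, and a real site will usually list theme and plugin gems as well.

```shell
# Write a bare-bones Gemfile into the current directory.
# The gem list is illustrative (an assumption), not a complete one.
cat > Gemfile <<'EOF'
source 'https://rubygems.org'

# The static site generator itself
gem 'jekyll'
EOF

# Bundler would then install everything listed in the Gemfile with:
#   bundle install
```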
My Gemfile looks like this:
We’ll discuss the usefulness of Bundler and the contents
of the Gemfile in more detail as we go along,
but let’s get back to the
.gitlab-ci.yml file for now and
try to get an overview of what’s going on in each of its sections.
The image entry is really simple - it just tells Gitlab which Ruby Docker image to load. For those who don’t know, Docker is the software containerization tool taking the world by storm that allows you to package your software in a “container”, i.e. a complete filesystem that contains all of the necessary bits and pieces needed for it to run, so that the software can always be run on any machine in the same environment. A nice overview can be found here. In this instance, we’re asking Gitlab Pages to use a Ruby 2.3 Docker image so that we can run Jekyll, which is Ruby-based.
According to the Gitlab documentation,
“cache is used to specify a list of files and directories which should be cached between builds.
You can only use paths that are within the project workspace.”
So here, we tell Gitlab to keep the contents of the
vendor directory between builds. Why?
Well, as we shall see, we will be instructing Bundler to install gems into the
vendor directory, so by caching that directory between builds,
we can speed up the build process. Neat.
The before_script section dictates what should be done before any build jobs are executed.
I currently do two things here.
Firstly, I fix some locale settings to avoid a problem caused by UTF-8 characters
in the author names of some of my publications.
Secondly, and perhaps more importantly for most use-cases,
I tell Bundler to install Ruby gems to the
vendor directory using the command
bundle install --path vendor.
As was explained in the previous section, this is an attempt to speed up the build process.
The test job is the first build job definition.
It instructs Gitlab to build the site in all branches except for
master, place the results in
test directories in each branch, and zip up the results as artifacts.
This really comes into its own when incorporated into a workflow where
you use different branches for writing drafts of blog posts and so on,
before merging them into the
master branch for deployment.
More on this later.
The pages job is where the real magic happens - it is where we deploy the
master branch of the project as a Gitlab Pages public site!
According to the Gitlab Pages documentation,
in order to make use of Gitlab Pages, the following three conditions must be satisfied:
- A special job named pages must be defined
- Any static content which will be served by GitLab Pages must be placed under a public/ directory
- artifacts with a path to the public/ directory must be defined
Since this job carries the name
pages, the first condition is already satisfied.
The command bundle exec jekyll build -d public tells Bundler/Jekyll to
build the site in the
public/ directory, so that satisfies the second requirement
(the command is accompanied by some Google Analytics specifications, but more on that later).
The artifacts setup is pretty much the same as in the
test job case, and
satisfies the third requirement.
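Putting the pieces described above together, a .gitlab-ci.yml along these lines could be written out as shown below. Treat it as a minimal sketch reconstructed from the description rather than a verbatim copy of my file: the stage names, the only/except rules and the test output directory are assumptions, and the locale fix is left out.

```shell
# Write a minimal .gitlab-ci.yml into the project root (illustrative sketch).
cat > .gitlab-ci.yml <<'EOF'
# Docker image used for every job
image: ruby:2.3

# Keep installed gems between builds to speed things up
cache:
  paths:
    - vendor/

# Runs before each job: install gems into the cached vendor directory
before_script:
  - bundle install --path vendor

# Build (but do not deploy) every branch except master
test:
  stage: test
  script:
    - bundle exec jekyll build -d test
  artifacts:
    paths:
      - test
  except:
    - master

# Build and deploy master as the Gitlab Pages site
pages:
  stage: deploy
  script:
    - JEKYLL_ENV=production bundle exec jekyll build -d public
  artifacts:
    paths:
      - public
  only:
    - master
EOF
```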
That’s it! Once this file has been specified in the Jekyll project root directory and everything is committed and pushed to the Gitlab server, Gitlab will launch “runners” to build the project and deploy the site. And it is these Gitlab Runners that are the subject of the next section.
When Gitlab builds your project during continuous integration, it needs machines to run the builds on. That’s where Gitlab Runners come in. Gitlab Runners are the worker processes that carry out those builds, and they can run on either Gitlab’s own servers, some other server(s) linked to a Gitlab instance, or even your own laptop or other machine. These are categorised as either shared runners or specific runners.
For most use-cases, shared runners are going to be Gitlab’s own servers, which can be slow at times depending on their workload because they’re used to build the jobs of Gitlab’s other users as well. If you do not have a specific runner set up, then Gitlab will default to using its own shared runners. Good to be able to fall back on, but maybe not an ideal solution.
Setting up a specific runner, e.g. on your own PC, allows you to dedicate your own resources to your own builds. No more waiting for shared runners on remote servers to queue your project! Caveat: you’ll still need a decent internet connection for speed, because the runner seems to like to ping the Gitlab server constantly during the build. This could of course be avoided if you had your own Gitlab instance running on a separate server, but we’re not dealing with that in this guide, so let’s not worry about it.
Here, I will discuss how to install a Docker specific runner on Ubuntu, but the instructions for other systems/methods are readily available. There are a few different options for executors that provide different ways and means of building a project in the runner. There are good security reasons, amongst other reasons, for using the Docker executor, so we will go along with that. Here is how to go about the runner installation (more detailed instructions can be found here):
After that, the specific runner needs to be registered in order to run builds
for your project. After entering the
sudo gitlab-ci-multi-runner register
command as shown below, when prompted for a token, you need
to go to the
Settings -> Runners section of your Gitlab project page
to retrieve the registration token provided in Step 3 under
“How to setup a specific Runner for a new project”.
After this, the next time you push a commit to Gitlab’s remote servers, your specific runner should pick up on the build request and build your project locally. Note again: this might still involve heavy network traffic between Gitlab’s servers and your machine, but it might just be a faster build overall.
Using a Custom Domain
You don’t necessarily need to have a custom domain for your site -
Gitlab Pages will provide you with a nice URL along the lines of
https://barryridge.gitlab.io by default - but a custom domain
(e.g. barog.net) is a nice thing to have
for various reasons, so I will try to explain how to set one up here.
First of all, you will need to choose and register your domain name with
a domain name registrar, and for that I would recommend
namecheap.com, but there are many other options.
The Gitlab Pages documentation here
explains how to set things up from the Gitlab side.
This involves going to
Settings -> Pages -> New Domain under your
project dashboard and adding your domain, backed by either an
A record pointing to the Gitlab Pages server’s IP address or a
CNAME record pointing to your gitlab.io address.
But you will still need to adjust your DNS settings on Namecheap so that your domain name
points to Gitlab’s servers.
David Ensinger provides a nice guide
for setting the DNS for Github Pages on Namecheap, but the procedure
for Gitlab Pages is not much different.
Here is what my setup looks like:
Securing Your Site with SSL/TLS and Let’s Encrypt
You may notice when setting up your custom domain that
there is a section in your Gitlab Pages project dashboard under
Settings -> Pages -> New Domain where you
can add an SSL/TLS certificate and its key.
But where do you get a certificate?
That’s where Let’s Encrypt,
the free, automated, and open certificate authority, comes in.
Long story short, it allows you to generate your own security
certificates for free so that you can have that warm and
fuzzy HTTPS next to your domain name.
I largely followed this excellent guide to get this going, but I did run into some tricky issues that I will try to help you with here. The first thing that you will need to do is install Let’s Encrypt on your local machine:
Then you’ll want to use the
letsencrypt-auto tool to generate
a certificate for your site, or multiple certificates if your site
has multiple names, e.g.
This will bring up an interface in the terminal (I think it was a blue screen) that asks you to accept your IP being logged, and that outputs something like this after you do:
Here, I have replaced a generated filename with XXXX and a
generated token with YYYY.
You should keep this interface open WITHOUT pressing enter,
and proceed to set up a Jekyll page
in your project root directory containing the following:
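A challenge page of this kind might be created as in the sketch below. XXXX and YYYY are the placeholders used throughout this section, the page filename letsencrypt-challenge.html is hypothetical, and the .html suffix on the permalink is an assumption based on the Gitlab tutorial rather than something you must copy exactly.

```shell
# Write a Jekyll page whose only output is the challenge token.
# XXXX = generated filename, YYYY = generated token (both placeholders).
# The page filename itself is a hypothetical choice.
cat > letsencrypt-challenge.html <<'EOF'
---
layout: null
permalink: /.well-known/acme-challenge/XXXX.html
---

YYYY
EOF
```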
This will cause your Jekyll site to generate a file in the
public/.well-known/acme-challenge directory when deployed.
The problem is, the
letsencrypt-auto tool will look for the challenge file
at a different URL from the one at which it is served.
To fix this, we add a shell copy instruction to
.gitlab-ci.yml as follows:
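The copy step itself can be sketched as a small shell function. It assumes that the mismatch is the .html extension, i.e. that Jekyll generates XXXX.html while the tool requests XXXX without an extension; the function name publish_acme_challenge is hypothetical.

```shell
# Copy the generated challenge page to an extension-less path so that it is
# also served at .../acme-challenge/XXXX (assumption: the tool requests the
# file without the .html suffix).
# $1: site output directory (e.g. public)
# $2: challenge filename without extension (the XXXX placeholder)
publish_acme_challenge() {
  challenge_dir="$1/.well-known/acme-challenge"
  cp "$challenge_dir/$2.html" "$challenge_dir/$2"
}

# In the pages job this would run right after the jekyll build command:
#   publish_acme_challenge public XXXX
```

In the .gitlab-ci.yml file itself this amounts to a single cp line placed after the build command in the pages job.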
Remember: be sure to substitute XXXX and YYYY in the above with the
actual strings generated by letsencrypt-auto.
Once you’ve pushed the code to the Gitlab servers, you should then
be able to test it as follows:
If the string YYYY is returned successfully, then you can return to the
letsencrypt-auto tool terminal interface (that you should still have open!)
and press ENTER as instructed.
The tool will then check the link just like you did to see if it returns the string,
and if it does, it should congratulate you on successfully generating your
certificate and you’re free to copy it over to your Gitlab Pages custom domain settings page.
First you need to copy the certificate(s) with the following command:
Then you need to navigate to
Settings -> Pages in Gitlab Pages,
remove the old custom domain names, and add new ones where
you paste in the certificate(s) where necessary.
I would highly recommend referring to
the Gitlab tutorial on securing your page with tls and letsencrypt
for more details on all of this if you get stuck.
Adding Google Analytics
To set up Google Analytics,
I followed this tutorial
for Jekyll. I will not repeat the details here, other than
mentioning that an important aspect is that you need to set the
JEKYLL_ENV=production environment variable ahead of the
bundle exec command in your
.gitlab-ci.yml file as follows:
If you refer to my .gitlab-ci.yml file above, you’ll see that I have included this line.
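If the VAR=value prefix syntax is unfamiliar, the following sketch shows what it does, using a plain sh child process as a stand-in for Jekyll (which treats the environment as development when JEKYLL_ENV is not set).

```shell
# A variable assignment written in front of a command is passed to that
# command only; it does not linger in the surrounding shell.
JEKYLL_ENV=production sh -c 'echo "environment: ${JEKYLL_ENV:-development}"'

# Without the prefix, the child process sees no JEKYLL_ENV (unless it was
# exported earlier in the session), so the fallback value is used instead.
sh -c 'echo "environment: ${JEKYLL_ENV:-development}"'
```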
A Drafting and Publishing Workflow
A solid blog post drafting and publishing workflow might then look something like the following:
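As a rough sketch of the branch-based approach described earlier, the commands below walk through drafting a post on its own branch and then merging it into master. To keep the example self-contained, a local bare repository stands in for the Gitlab remote; the branch name, file names and commit messages are all hypothetical.

```shell
# Self-contained demo: a local bare repository stands in for the Gitlab remote.
demo_dir=$(mktemp -d)
git init -q --bare "$demo_dir/remote.git"
git init -q "$demo_dir/blog"
cd "$demo_dir/blog"
git symbolic-ref HEAD refs/heads/master   # make sure the main branch is called master
git config user.name "Demo Author"
git config user.email "demo@example.com"
git remote add origin "$demo_dir/remote.git"

# Initial site content on master.
echo "title: My Blog" > _config.yml
git add _config.yml
git commit -q -m "Initial site"
git push -q origin master

# 1. Draft the post on its own branch. On Gitlab, pushing this branch would
#    only run the test job, so nothing is deployed yet.
git checkout -q -b draft-new-post
mkdir -p _posts
echo "Draft text" > _posts/2017-01-01-new-post.md
git add _posts
git commit -q -m "Draft new post"
git push -q origin draft-new-post

# 2. When the draft is ready, merge it into master and push. On Gitlab, this
#    is the push that would run the pages job and deploy the site.
git checkout -q master
git merge -q draft-new-post
git push -q origin master
```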
Coming Soon!
As you have probably already figured out by now, setting up one of these blog thingies can be a deceptively complex process. I could go on and on writing about what I’ve had to do to get to this point, but I wanted to get something out there (like, you know, an actual blog post!), so I’ve decided to stop here for now. I would still like to write about some other things in relation to this journey at some point in the future though, so I will leave some placeholders here to give you a taste of what is, hopefully, to come.
Leveraging Bower, npm and Grunt for Package Management
The package managers Bower and npm, as well as the automation tool Grunt, are extremely useful doohickeys to have in your toolkit. Torsten Scholak has written an excellent post on how to best make use of them with Jekyll and Github Pages over on his Meticulous Disorder blog. There really isn’t too much difference when employing them on Gitlab Pages.
Creating Blog Posts from Jupyter Notebooks
I haven’t actually tried this out yet, but there is a nice post available here detailing how it might be achieved. This could be a very nice addition to the workflow of writing a research blog, so I’m hoping for good things here.