2020-06-21

Website creation with Pandoc

2021-05-06: Updated paragraph about hosting.

2021-05-08: Added usage of citations and open graph protocol.

2024-04-06: Added an exemplary workflow for markdown to html conversion.

As you can see, web design is not one of my hobbies.

Also, this site is not responsive. While responsive sites that adjust to devices with smaller screens can be done with pandoc, I didn’t want to spend the time on this.

1 Exemplary project workflow

If you want to see an example of how this website is generated, you can find the code on GitHub: pandoc-md2html-example

2 Why create a website in 2020

I miss the days when random people would put stuff on their own website instead of Instagram. The web design was pretty bad most of the time, but that didn’t matter because it felt more personal than everybody using the same service.

Also, I wonder who is still reading stuff on the internet. Sometimes I get the feeling that everybody is just watching videos and that a lot of content is presented as a video even if the exact same information and structure was first created as text, e.g. a tutorial for some software module.

3 Markdown to html with pandoc

I wanted to spend as little time as possible on styling the website. This is why I use pandoc: I first write every page of this website in markdown (.md) and then convert it to a browser-readable format (.html).

The styling is then handled by a template file I designed once which can always be reused for new pages.

The table of contents of each page is generated automatically from that page’s markdown file. The navigation bar, in contrast, was written only once, in html, inside the template file, which pandoc reads and applies to every resulting page.

The section a page belongs to (e.g. Home, Projects, …) can be specified in the yaml data at the beginning of the markdown document:

---
# standard attributes like title and date here at the top.

# custom attributes specific for my template file:
# determines whether the page should show a table of contents
toc: true
# select which part of the navigation bar should be active
navbar_proj: true
---

These variables are optional and live in the yaml header at the beginning of a markdown document. I use them to change the .css classes assigned to an element, which changes the look of a page. Pandoc templates support this with conditionals: a string (optional text) wrapped as shown here is only added to the html document when the variable var is set:

${if(var)} optional text ${endif}

All of the buttons in the navigation bar belong to the class nav_item but only some of them also have the class nav_item_active.

<div class="navbar">
  <div href="home.html" class="nav_item ${if(navbar_home)}nav_item_active${endif}">
    <a href="home.html">Home</a>
  </div>
  <!-- more navigation bar items here -->
</div>

Variables that are not defined in the document’s header and not given by pandoc evaluate to false instead of giving an error.
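
To try the conditional in isolation, the following shell sketch writes a tiny template and a page whose header sets only navbar_proj, then renders it. The file names (template.html, page.md), the temporary directory and the two-item navigation bar are assumptions made for this illustration and are not the files of this website. The pandoc call is skipped if pandoc is not installed.

```shell
#!/usr/bin/env bash
# Minimal sketch: file names and the temp directory are made up for this demo.
workdir="$(mktemp -d)"

# Quoted heredoc ('EOF') so the shell leaves ${if(...)} and $body$ untouched.
cat > "$workdir/template.html" <<'EOF'
<div class="navbar">
  <div class="nav_item ${if(navbar_home)}nav_item_active${endif}">Home</div>
  <div class="nav_item ${if(navbar_proj)}nav_item_active${endif}">Projects</div>
</div>
$body$
EOF

# Only navbar_proj is set; navbar_home stays undefined and counts as false.
cat > "$workdir/page.md" <<'EOF'
---
title: Demo page
navbar_proj: true
---

Some content.
EOF

if command -v pandoc >/dev/null 2>&1; then
  pandoc "$workdir/page.md" --from=markdown --to=html \
    --template="$workdir/template.html" --output="$workdir/page.html" \
    && echo "wrote $workdir/page.html" \
    || echo "pandoc call failed" >&2
else
  echo "pandoc not found, skipping conversion" >&2
fi
```

In the resulting page.html, only the Projects item should carry the additional nav_item_active class, because navbar_home was never defined.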

3.2 Table of Contents

The table of contents can be auto-generated by pandoc from the heading structure of the markdown input file. This was relatively straightforward. The only change I made was in the page’s style sheet, where I hid the dot (i.e. the list marker) that pandoc would otherwise place in front of each entry, but only inside the table of contents:

.table_of_contents li::marker {
    color: transparent;
}

3.3 Citations

Some articles use external resources that I want to link at the bottom of a page and pandoc already comes with support for bibliography processing.

Note: Using citations requires pandoc version 2.11 or newer. You can check your version with pandoc --version.

First, you need to provide a bibliography. This can either be a separate file, or, as I’ve done it, can be part of the yaml section at the beginning of a markdown file. You can include an attribute titled references under which you list all of the entries of your bibliography. A short example is given in the code section below.

# [ ... Omitting the beginning of the config for readability  ... ]

references:
- id: VaswaniShazeer2017
  author:
  - family: Vaswani
    given: A.
  - family: Shazeer
    given: N.
  issued: 2017
  title: 'Attention Is All You Need'
  URL: http://arxiv.org/abs/1706.03762

link-citations: true

The link-citations setting is optional. If it is set to true, each citation in the text becomes a clickable hyperlink to its entry in the list of references.

Pandoc needs a specific flag to know that it should process the bibliography as well. Just add the --citeproc option when calling pandoc.

pandoc <remaining args stay untouched> --citeproc
# e.g.
pandoc document.md --from=markdown --to=html --citeproc

The list of references will be added at the bottom of the document. If you’re using an HTML template (--template=my_template.html), the list of references is added as the last element in the $body$ variable. This is why you might want to end your markdown file (document.md) with a heading for references, e.g.:

... Contents of the document up here ...

# References

< pandoc will add the list of references here, at the end of the rendered document >

The only thing that’s left to do is to actually cite your bibliography in the markdown file. This is easily done by writing an @ sign directly followed by the id you gave to your reference.

Writing [@VaswaniShazeer2017] in your markdown document will result in (Vaswani and Shazeer 2017) in the rendered output. Notice that the authors are mentioned in the citation as well. If you only want to use the publication year, writing [-@VaswaniShazeer2017] will result in (2017).

With link-citations: true, all citations are clickable links that scroll to the bottom of the page, where the full list of references is located. If you want to suppress this behavior, set link-citations: false in the markdown document’s yaml header or omit the setting, since false is pandoc’s default.

More info on citation formatting is available in pandoc’s manual.
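
Putting the pieces of this section together, a complete round trip could look like the shell sketch below. The file names, the temporary directory and the sentences in the body are assumptions for illustration; the reference entry is the one from the yaml example above. The conversion is skipped if pandoc is not installed, and --citeproc needs pandoc 2.11 or newer.

```shell
#!/usr/bin/env bash
# Minimal sketch: writes a markdown file with an inline bibliography and
# converts it with --citeproc. File names are made up for this demo.
workdir="$(mktemp -d)"

cat > "$workdir/document.md" <<'EOF'
---
title: Citation demo
references:
- id: VaswaniShazeer2017
  author:
  - family: Vaswani
    given: A.
  - family: Shazeer
    given: N.
  issued: 2017
  title: 'Attention Is All You Need'
  URL: http://arxiv.org/abs/1706.03762
link-citations: true
---

An example citation with authors: [@VaswaniShazeer2017].
The year-only form: [-@VaswaniShazeer2017].

# References
EOF

if command -v pandoc >/dev/null 2>&1; then
  pandoc "$workdir/document.md" --from=markdown --to=html --citeproc \
    --output="$workdir/document.html" \
    && echo "wrote $workdir/document.html" \
    || echo "pandoc call failed (is it older than 2.11?)" >&2
else
  echo "pandoc not found, skipping conversion" >&2
fi
```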

3.4 Open Graph Protocol

The open graph protocol defines a few standards that enable services like Slack, WhatsApp or Facebook to include a preview section when a link is shared in a chat. This section usually consists of the title and an image, which is more visually appealing than just sharing a link.

Chat preview of a shared link. The link was shared along with some text and the preview was generated by the server of the chat provider.

To enable the chat provider to generate such previews, your website’s html header needs to include certain properties.

<html prefix="og: https://ogp.me/ns#">
<head>
  <meta property="og:title" content="$pagetitle$" />
  <meta property="og:type" content="website" />
  <meta property="og:url" content="https://montebaur.tech/$document_path$" />
  <meta property="og:image" content="https://montebaur.tech/media/m_text_header.png" />

  <!-- ... -->

The first addition (<html prefix...) tells the server that’s accessing this file in which format the meta information is specified. As stated earlier, we’re using the open graph protocol here.

The four meta tags inside the document’s head are used to determine which information is used in the preview.

I’ve included these additions to my page’s html source inside the html template file I’m giving to pandoc. This is why there are variables ($varName$) in the listing above.

When the file is processed by pandoc, $pagetitle$ will be replaced by the title of the article and $document_path$ will be replaced by the path to that html file relative to the website’s root folder.

The latter is not a standard pandoc variable. I pass it to pandoc as an argument, and pandoc then replaces it in the template:

pandoc --variable=document_path:"projects/keypoint_detection.html"

Of course, I’m not running the pandoc conversions by hand. I’m using a bash script in which the call to pandoc includes this argument:

pandoc --variable=document_path:"$filepath/$filename" <other args>
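
How such a script could derive document_path from a file’s location is sketched below. This is not the actual build script of this website: the directory names src and public, the template name my_template.html and the two helper functions are hypothetical. The sketch only prints the pandoc call (a dry run) instead of executing it.

```shell
#!/usr/bin/env bash
# Sketch of a build helper. src_root, out_root and my_template.html are
# hypothetical names, not the actual layout of this website.

# Map a markdown path below src_root to the site-relative html path,
# e.g. src/projects/foo.md -> projects/foo.html
document_path() {
  local src_root="$1" md_file="$2"
  local rel="${md_file#"$src_root"/}"   # strip the source root prefix
  printf '%s\n' "${rel%.md}.html"       # swap the extension
}

# Print the pandoc call for one file instead of running it (dry run).
pandoc_command() {
  local src_root="$1" out_root="$2" md_file="$3"
  local doc_path
  doc_path="$(document_path "$src_root" "$md_file")"
  printf 'pandoc %s --from=markdown --to=html --template=my_template.html --variable=document_path:%s --output=%s\n' \
    "$md_file" "$doc_path" "$out_root/$doc_path"
}

pandoc_command src public src/projects/keypoint_detection.md
```

A full build script would loop over all markdown files below the source folder and run the command instead of printing it.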

An additional step that’s on my TODO list is generating distinct preview images for each article. I’m already using preview images in the project list, but I’ve not yet used them as link previews as well.

4 Hosting

The website is currently hosted on Uberspace but was hosted on AWS until April 2021. See the last section for details.

4.1 Hosting with Uberspace

Uberspace is a hosting provider based in Germany. You can rent an account on a server which you can then use to do whatever you like. At the moment, I’m only hosting the website, but you could also set up your own git server, or a lot of other stuff which is described in their setup guides.

I became aware of their services because they are hosting the website of youtube-dl, a popular command line tool that allows downloading videos from YouTube and other pages that don’t use DRM protection on their videos. While some actors in the music industry tried to take down youtube-dl in the past, Uberspace didn’t back down, left youtube-dl’s website online and was faced with a lawsuit in Germany [github issue, netzpolitik article].

4.2 Hosting on AWS

This website was previously hosted on AWS, where I rented an EC2 instance, which was a small Linux server in my case. The setup process mostly aligns with the steps shown in this video by Luke Smith.

Make sure to configure nginx and certbot as root, which is not the default user when logging in on AWS Ubuntu instances. Also pay attention to the way ssh access is configured: Luke accidentally edits the wrong file in the video. If you use AWS, however, it will be preconfigured anyway.

References

A list of exemplary references that were used to explain how citations in pandoc work.

Vaswani, A., and N. Shazeer. 2017. “Attention Is All You Need.” http://arxiv.org/abs/1706.03762.