Linking articles to their version control system to effectively track changes

This website has been a super fun side-project, but there's a lot of room for improvement. For example, even though its source code is publicly available, it's not quite visible. I intend to finally fix that, and this post is going to cover that experience in detail.

We're going to tackle blog posts, a category of articles where it's pivotal to remain transparent by tracking and divulging any alterations we might make to them over time.

As this website is source-controlled, the most passive and efficient way to solve this transparency problem is to use my version control system, which in my case is git… the "stupid content tracker".

There's no denying that git (or any other VCS) is more effective at tracking changes than you or me. This blog post is a proposition (as well as a guide) for writers to stop writing changelogs by hand, and simply let our version control systems take care of this task for us.

I'm going to be building on top of ox-html, an org-mode library, because my website is built upon the tools provided by the Emacs ecosystem. Before we dive into the technicalities, you should familiarize yourself with the export and publishing processes that a file goes through when the user triggers them.

1. First Implementation

We'll need to implement a format string, which ox-html recognizes as a placeholder for something. Take the %a and %e format strings for example, which expand into the author's name and email, respectively.

But how do we define our own format strings?

A tiny bit of research landed me on format-spec, which lets us create as many of them as we want!
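To get a feel for it, here's a minimal sketch of format-spec on its own, outside of Org. The demo-expand-metadata function, the template, and the sample values are placeholders I made up for illustration; nothing here is tied to ox-html yet.

```elisp
(require 'format-spec)

(defun demo-expand-metadata (date author)
  "Expand a small template, replacing %d with DATE and %a with AUTHOR.
This is a hypothetical helper, not part of ox-html."
  (format-spec "%d by %a."
               `((?d . ,date)
                 (?a . ,author))))

(demo-expand-metadata "2022-05-18" "Jane Doe")
;; "2022-05-18 by Jane Doe."
```

Each entry of the alist pairs a character with the string that replaces it, so adding a format string of our own boils down to adding one more pair.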

How does ox-html utilize this function?

If we take a peek at its source code, we find a few references to format-spec – but the one that's most interesting to us is org-html-format-spec, which as of Org Mode v9.5.5 looks like this:

(defun org-html-format-spec (info)
  "Return format specification for preamble and postamble.
INFO is a plist used as a communication channel."
  (let ((timestamp-format (plist-get info :html-metadata-timestamp-format)))
    `((?t . ,(org-export-data (plist-get info :title) info))
      (?s . ,(org-export-data (plist-get info :subtitle) info))
      (?d . ,(org-export-data (org-export-get-date info timestamp-format)
                              info))
      (?T . ,(format-time-string timestamp-format))
      (?a . ,(org-export-data (plist-get info :author) info))
      (?e . ,(mapconcat
              (lambda (e) (format "<a href=\"mailto:%s\">%s</a>" e e))
              (split-string (plist-get info :email)  ",+ *")
              ", "))
      (?c . ,(plist-get info :creator))
      (?C . ,(let ((file (plist-get info :input-file)))
               (format-time-string timestamp-format
                                   (and file (file-attribute-modification-time
                                              (file-attributes file))))))
      (?v . ,(or (plist-get info :html-validation-link) "")))))

Whatever format string we specify within this alist is going to be recognized by the library, handed over to org-html--build-pre/postamble, and later expanded to the value of its associated S-expression.
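The backquote and the commas are what make this work: every comma-prefixed form is evaluated at the moment the alist is built, so the alist ends up holding plain strings. Here's a small sketch of that idea, where demo-format-spec and the keys of its plist are stand-ins of my own and not the real export machinery:

```elisp
(defun demo-format-spec (info)
  "Return a tiny format specification built from the plist INFO.
A hypothetical, stripped-down cousin of `org-html-format-spec'."
  `((?t . ,(upcase (plist-get info :title)))
    (?a . ,(plist-get info :author))))

(demo-format-spec '(:title "hello" :author "Jane Doe"))
;; ((?t . "HELLO") (?a . "Jane Doe"))
```

Note that the values are already computed strings by the time the alist is returned, not forms waiting to be evaluated.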

Before we do anything we might regret, it's important that we inspect how GitHub, in my case, references source files. Let's dissect the source URL of my first ever blog post, "Building a website with Emacs".

Full form: https://github.com/grtcdr/grtcdr.tn/blob/main/posts/2022-05-18.org.

  • https://github.com, we all know GitHub.
  • grtcdr/grtcdr.tn is the repository's identifier.
  • blob to denote source code or data in general.
  • main refers to the main branch.
  • posts is the directory that contains, well… my posts.
  • 2022-05-18.org is the filename we're after.

That's all we need: if someone wants to inspect the changes I've made to any given post, they can just view the file's history.
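If you'd rather see the dissection done by Emacs, the built-in url-parse library can split the URL for us. This is only a sketch to confirm the breakdown above; demo-split-source-url is a name I made up.

```elisp
(require 'url-parse)

(defun demo-split-source-url (url)
  "Return the path components of URL as a list of strings."
  (split-string (url-filename (url-generic-parse-url url)) "/" t))

(demo-split-source-url
 "https://github.com/grtcdr/grtcdr.tn/blob/main/posts/2022-05-18.org")
;; ("grtcdr" "grtcdr.tn" "blob" "main" "posts" "2022-05-18.org")
```

Everything up to and including "posts" is the same for every post, which is why only the last component has to be computed at export time.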

Okay, let's start hacking!

The objective is to redefine org-html-format-spec. The file orchestrate.el, as its name might suggest, orchestrates and defines the structure of the various components required to build the website; it does this by providing the website specification at the time of document transpilation.

The first step is to define the source URL prefix.

(defvar blog-post-source-url-prefix
  "https://github.com/grtcdr/grtcdr.tn/blob/main/posts")

The second is to create a function which will determine the current buffer's base file name, i.e. just the file name of the active blog post (to be exported).

(defun blog-post-file-name ()
  "Return the file name of the current buffer, without its directory."
  (concat (file-name-base (buffer-file-name)) ".org"))
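Since that function depends on the current buffer, here's a sketch of the same logic that takes the file name as an argument, which makes it easy to try out in isolation. The demo-post-file-name function and the path are hypothetical.

```elisp
(defun demo-post-file-name (file)
  "Return FILE without its directory, keeping the .org extension."
  (concat (file-name-base file) ".org"))

(demo-post-file-name "/home/me/site/posts/2022-05-18.org")
;; "2022-05-18.org"

;; For a file that already ends in .org, the built-in
;; `file-name-nondirectory' gives the same answer in one call.
(file-name-nondirectory "/home/me/site/posts/2022-05-18.org")
;; "2022-05-18.org"
```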

The last step is to combine these two into a single function that returns the whole URL that anyone can visit.

(defun blog-post-source-url ()
  (format "%s/%s"
          blog-post-source-url-prefix
          (blog-post-file-name)))

Just to make sure we're on the right track, I'm going to take this function for a test drive. Let's visit a random blog post inside of Emacs, e.g. "Extending project.el with to-do functionality" [2022-10-08.org]:

  • Hit M-:, type (blog-post-source-url) and hit Return.

Neat, I get back https://github.com/grtcdr/grtcdr.tn/blob/main/posts/2022-10-08.org. Just what I'm after!

We'll need to embed this within some HTML; format to the rescue!

(format "<a href=\"%s\">Source</a>" (blog-post-source-url))
;; <a href="https://github.com/grtcdr/grtcdr.tn/blob/main/posts/2022-10-08.org">Source</a>

Let's add that bit of code to our redefined org-html-format-spec and associate it with our new %S format string:

(defun org-html-format-spec (info)
  "Return format specification for preamble and postamble.
INFO is a plist used as a communication channel."
  (let ((timestamp-format (plist-get info :html-metadata-timestamp-format)))
    `((?t . ,(org-export-data (plist-get info :title) info))
      (?s . ,(org-export-data (plist-get info :subtitle) info))
      (?S . ,(format "<a href=\"%s\">Source</a>" (blog-post-source-url))) ; <--  right here!
      (?d . ,(org-export-data (org-export-get-date info timestamp-format)
                              info))
      (?T . ,(format-time-string timestamp-format))
      (?a . ,(org-export-data (plist-get info :author) info))
      (?e . ,(mapconcat
              (lambda (e) (format "<a href=\"mailto:%s\">%s</a>" e e))
              (split-string (plist-get info :email)  ",+ *")
              ", "))
      (?c . ,(plist-get info :creator))
      (?C . ,(let ((file (plist-get info :input-file)))
               (format-time-string timestamp-format
                                   (and file (file-attribute-modification-time
                                              (file-attributes file))))))
      (?v . ,(or (plist-get info :html-validation-link) "")))))
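Redefining a library function wholesale means our copy can drift from upstream whenever Org changes it. A different technique, which is not what this website does, is to leave the original alone and append our entry with a :filter-return advice. The sketch below uses a stand-in function, demo-original-spec, so that it runs without Org loaded; with the real thing, I'd assume the advice goes on org-html-format-spec instead. The URL is a placeholder.

```elisp
(defun demo-original-spec (_info)
  "Stand-in for `org-html-format-spec', returning a fixed alist."
  '((?t . "Title") (?a . "Author")))

(defun demo-add-source-spec (spec)
  "Return SPEC with an extra ?S entry prepended."
  (cons (cons ?S "<a href=\"https://example.org/post.org\">Source</a>")
        spec))

;; The advice receives the original return value and may replace it.
(advice-add 'demo-original-spec :filter-return #'demo-add-source-spec)

(alist-get ?S (demo-original-spec nil))
;; "<a href=\"https://example.org/post.org\">Source</a>"
```

The existing entries stay untouched, so %t and %a keep working exactly as before.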

And let's add the format string to our HTML preamble snippet:

<ul class="navigation">
  <div>
    <li><a href="/index.html">Home</a></li>
    <li><a href="/contact.html">Contact</a></li>
    <li><a href="/data/resume.pdf">Résumé</a></li>
  </div>
</ul>

<p class="metadata">%d by %a. (%S)</p> <!-- lookie here! -->
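For context, this is roughly how a preamble containing format strings can be handed to ox-publish. Treat it as a sketch: the project name and both directories are made up, and the metadata line is inlined here as a string for brevity.

```elisp
(setq org-publish-project-alist
      '(("posts"                                  ; hypothetical project name
         :base-directory "posts/"                 ; hypothetical paths
         :publishing-directory "public/posts/"
         :publishing-function org-html-publish-to-html
         ;; A string preamble is expanded with the format specification,
         ;; so %d, %a and our new %S are all replaced at export time.
         :html-preamble "<p class=\"metadata\">%d by %a. (%S)</p>")))
```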

Hurray! That works!… until it doesn't.

2. Second Implementation

Not only do I host a blog on this website, but also numerous documentation files which contain my system's configuration files (dotfiles). It doesn't help that they're hosted on an entirely different forge, i.e. SourceHut.

We have to somehow address this situation. We need to make this solution more modular, so that it can support these two different forges and use cases.

So let's start by storing these forges in a property list.

(defvar forges
  '(:github "github.com" :sourcehut "git.sr.ht")
  "Property list mapping git forges to their respective domain.")

We'll write a function that will incrementally construct the prefix URL of any - yes, any - resource, once it matches it against one of our predefined forges.

(defun build-forge-prefix-url (forge slug type)
  "Construct the standard URL of a given FORGE.
SLUG identifies the repository and TYPE selects the kind of
information to access.

FORGE is a property from the `forges' variable.

SLUG is a string combining your username and the name of your
repository, e.g. \"octopus/website\".

TYPE is either the symbol `log' or the symbol `tree'."
  (cond ((equal forge :github)
         (format "https://%s/%s/%s/"
                 (plist-get forges :github)
                 slug
                 (cond ((eq type 'log) "commits/main")
                       ((eq type 'tree) "blob/main")
                       (t (error "Invalid type")))))
        ((equal forge :sourcehut)
         (format "https://%s/%s/%s/"
                 (plist-get forges :sourcehut)
                 (concat "~" slug)
                 (cond ((eq type 'log) "log/main/item")
                       ((eq type 'tree) "tree/main/item")
                       (t (error "Invalid type")))))
        ;; Fail loudly instead of silently returning nil.
        (t (error "Unknown forge: %s" forge))))

Let's run a few examples to understand how it works.

  1. Return the URL pointing to the history of changes of a resource hosted on GitHub.

       (build-forge-prefix-url :github "grtcdr/grtcdr.tn" 'log)
       ;; https://github.com/grtcdr/grtcdr.tn/commits/main/ 
    
  2. Return the URL pointing to the source code of a resource hosted on SourceHut.

       (build-forge-prefix-url :sourcehut "grtcdr/dotfiles" 'tree)
       ;; https://git.sr.ht/~grtcdr/dotfiles/tree/main/item/ 
    

Wonderful, the function covers both of our forges! Let's move on.

Do you remember blog-post-source-url? Well, that'll break if the resource lives within a submodule. So we'll need to make that more modular.

Alright, what can we do to obtain the slug of a resource, whilst taking into account this new setting?… We can make use of vc, a built-in library and interface dedicated entirely to version control systems.

We can use vc-root-dir… but for some reason that won't work when we publish the project indirectly, i.e. through a Makefile; however, we can make do with vc-find-root. This function requires that we specify the buffer's file name, as well as a "witness", i.e. the name of a file or directory whose presence marks the project root, e.g. .git.
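One detail worth checking is how the witness gets matched. As far as I can tell, vc-find-root hands it to locate-dominating-file, which treats a string as a literal file name rather than a regular expression. The sketch below builds a throwaway directory to verify that; demo-find-root-in-temp-repo and the file names inside it are made up for the experiment.

```elisp
(require 'vc-hooks)

(defun demo-find-root-in-temp-repo (witness)
  "Return non-nil if WITNESS leads `vc-find-root' to a throwaway root.
The temporary directory contains a \".git\" directory and one post."
  (let* ((root (file-name-as-directory (make-temp-file "demo-repo" t)))
         (file (expand-file-name "posts/example.org" root)))
    (unwind-protect
        (progn
          (make-directory (expand-file-name ".git" root))
          (make-directory (file-name-directory file))
          (write-region "" nil file nil 'silent)
          (let ((found (vc-find-root file witness)))
            (and found (equal (expand-file-name found) root))))
      ;; Clean up whether or not the lookup succeeded.
      (delete-directory root t))))

(demo-find-root-in-temp-repo ".git")                ; literal name
(demo-find-root-in-temp-repo (regexp-quote ".git")) ; regexp used as a name
```

The first call should find the root, while the second looks for a file literally named "\.git" and comes back empty-handed, so it's safer to pass one plain name per lookup.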

Here's what I came up with:

(defun get-resource-slug ()
  "Return the path of the current file relative to its repository root.
The result is meant to be appended to the value returned by
`build-forge-prefix-url'."
  (let* ((file (buffer-file-name))
         ;; The witness is a literal file name, not a regexp, so each
         ;; VCS marker is tried in turn.
         (root (or (vc-find-root file ".git")
                   (vc-find-root file ".hg")
                   (project-root (project-current)))))
    (string-remove-prefix
     (expand-file-name root) file)))
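The last step of that function is plain string surgery, so it's easy to try on its own. Here's a sketch with made-up paths; demo-slug is a hypothetical helper, and file-relative-name is shown as a built-in alternative that yields the same result for a file located under the root.

```elisp
(require 'subr-x)

(defun demo-slug (file root)
  "Return FILE with the leading ROOT directory stripped."
  (string-remove-prefix (expand-file-name root) file))

(demo-slug "/home/me/site/posts/2022-05-18.org" "/home/me/site/")
;; "posts/2022-05-18.org"

(file-relative-name "/home/me/site/posts/2022-05-18.org" "/home/me/site/")
;; "posts/2022-05-18.org"
```

Appending that slug to the prefix returned by build-forge-prefix-url gives the full URL of the resource.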

We're done with the new implementation; we should interact with the new functions the same way we did with the older ones. For example, have a look at the format strings used in this website:

  • This format string expands to a link to the source code of a blog post hosted on GitHub.

      (?w . ,(format
              "<a href=\"%s\">source</a>"
              (concat
               (build-forge-prefix-url :github "grtcdr/grtcdr.tn" 'tree)
               (get-resource-slug))))
    
  • This one expands to a link to the list of revisions of a blog post hosted on GitHub.

      (?x . ,(format
              "<a href=\"%s\">history</a>"
              (concat
               (build-forge-prefix-url :github "grtcdr/grtcdr.tn" 'log)
               (get-resource-slug))))
    
  • While this one expands to a link to the source code of a documentation file hosted on SourceHut.

      (?y . ,(format
              "<a href=\"%s\">source</a>"
              (concat
               (build-forge-prefix-url :sourcehut "grtcdr/dotfiles" 'tree)
               (get-resource-slug))))
    
  • And this one expands to a link to the list of revisions of a documentation file hosted on SourceHut.

      (?z . ,(format
              "<a href=\"%s\">history</a>"
              (concat
               (build-forge-prefix-url :sourcehut "grtcdr/dotfiles" 'log)
               (get-resource-slug))))
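These four entries differ only in their URL and label, so the anchor markup could live in one small helper. This is just a sketch: demo-html-link is a name I made up, and the README.org path in the example is a placeholder.

```elisp
(defun demo-html-link (url label)
  "Return an HTML anchor pointing at URL with LABEL as its text."
  (format "<a href=\"%s\">%s</a>" url label))

(demo-html-link
 "https://git.sr.ht/~grtcdr/dotfiles/log/main/item/README.org"
 "history")
;; "<a href=\"https://git.sr.ht/~grtcdr/dotfiles/log/main/item/README.org\">history</a>"
```

Keeping the quoting of the href attribute in a single place also makes it harder to forget.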
    

3. Conclusion

We did it… We hacked together a set of functions and scratched the itch for transparency. I didn't expect this task to be so trivial, and can I be honest with you? I've been postponing working on this feature for so long. I just didn't know where to look or where to begin.

But in the end, I learned a few things:

  • Elisp is not as scary as it looks.
  • Org Mode is well designed and documented, as is the rest of Emacs.
  • I'm starting to profit from choosing ox-publish as a static site builder.