Linking articles to their version control system to effectively track changes
This website has been a super fun side-project, but there's a lot of room for improvement. For example, even though its source code is publicly available, it's not quite visible. I intend to finally fix that, and this post is going to cover that experience in detail.
We're going to tackle blog posts, a category of articles where it's pivotal to remain transparent by tracking and divulging any alterations we might make to them over time.
As this website is source-controlled, the most passive and efficient way to solve this transparency problem is to use my version control system, which in my case is git… the "stupid content tracker".
There's no denying that git (or any other VCS) happens to be more effective at tracking changes than you, or me. This blog post is a proposition (as well as a guide) for writers to cease the manual writing of changelogs, and simply let our version control systems take care of this task for us.
I'm going to be building on top of ox-html, an org-mode library, because my website is built upon the tools provided by the Emacs ecosystem. Before we dive into the technicalities, you should familiarize yourself with the export and publishing processes that a file goes through when instructed by the user.
1. First Implementation
We'll need to implement a format string, which is recognized by ox-html as a placeholder for something. We can take the %a and %e format strings for example, which are translated/expanded respectively into the author's name and email.
But how do we define our own format strings?
A tiny bit of research landed me on format-spec, which lets us create them, however many we want!
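To get a feel for it, here is a minimal sketch of format-spec on its own, outside of ox-html. The template and the values in the alist are made up for illustration, and the my/ names are hypothetical; only the %a and %e characters mirror the ones ox-html uses.

```elisp
(require 'format-spec)

;; Each entry of the alist maps a character to the string that replaces
;; the matching %-sequence in the template.
(defvar my/demo-spec
  '((?a . "Jane Doe")
    (?e . "jane@example.org"))
  "Hypothetical specification used only for this illustration.")

(defun my/demo-expand (template)
  "Expand TEMPLATE using `my/demo-spec'."
  (format-spec template my/demo-spec))

(my/demo-expand "Written by %a <%e>")
;; => "Written by Jane Doe <jane@example.org>"
```

Adding a new format string is therefore just a matter of adding another entry to the alist.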
How does ox-html utilize this function?
If we take a peek at its source code, we find a few references to format-spec, but the one that's most interesting to us is org-html-format-spec, which as of Org Mode v9.5.5 looks like this:
    (defun org-html-format-spec (info)
      "Return format specification for preamble and postamble.
    INFO is a plist used as a communication channel."
      (let ((timestamp-format (plist-get info :html-metadata-timestamp-format)))
        `((?t . ,(org-export-data (plist-get info :title) info))
          (?s . ,(org-export-data (plist-get info :subtitle) info))
          (?d . ,(org-export-data (org-export-get-date info timestamp-format)
                                  info))
          (?T . ,(format-time-string timestamp-format))
          (?a . ,(org-export-data (plist-get info :author) info))
          (?e . ,(mapconcat
                  (lambda (e) (format "<a href=\"mailto:%s\">%s</a>" e e))
                  (split-string (plist-get info :email) ",+ *")
                  ", "))
          (?c . ,(plist-get info :creator))
          (?C . ,(let ((file (plist-get info :input-file)))
                   (format-time-string timestamp-format
                                       (and file (file-attribute-modification-time
                                                  (file-attributes file))))))
          (?v . ,(or (plist-get info :html-validation-link) "")))))
Whatever format string we specify within this alist is going to be recognized by the library, handed over to org-html--build-pre/postamble, and later expanded to the value of its associated S-expression.
Before we do anything we might regret, it's important that we inspect how GitHub, in my case, references source files. Let's dissect the source URL of my first ever blog post, "Building a website with Emacs".
Full form: https://github.com/grtcdr/grtcdr.tn/blob/main/posts/2022-05-18.org
- https://github.com, we all know GitHub.
- grtcdr/grtcdr.tn is the repository's identifier.
- blob to denote source code or data in general.
- main refers to the main branch.
- posts is the directory that contains, well… my posts.
- 2022-05-18.org is the filename we're after.
That's all we need: if someone wants to inspect the changes that I made to any given post, they can just view the file's history.
Okay, let's start hacking!
The objective is to redefine org-html-format-spec; orchestrate.el, as its name might suggest, will orchestrate and define the structure of the various components required to build the website. It does this by providing the website specification at the time of document transpilation.
The first step is to define the source URL prefix.
    (defvar blog-post-source-url-prefix
      "https://github.com/grtcdr/grtcdr.tn/blob/main/posts"
      "URL prefix under which the source files of blog posts live.")
The second is to create a function which will determine the current buffer's base file name, i.e. just the file name of the active blog post (to be exported).
    (defun blog-post-file-name ()
      "Return the file name of the current buffer, without its directory."
      (concat (file-name-base (buffer-file-name)) ".org"))
The last step is to combine these two into a single function that returns the whole URL that anyone can visit.
    (defun blog-post-source-url ()
      "Return the URL of the current blog post's source file."
      (format "%s/%s" blog-post-source-url-prefix (blog-post-file-name)))
Just to make sure we're on the right track, I'm going to take this function for a test drive. Let's visit a random blog post inside of Emacs, e.g. "Extending project.el with to-do functionality" [2022-08-08.org]:
- Hit M-:, type (blog-post-source-url) and hit Return.
Neat, I get back https://github.com/grtcdr/grtcdr.tn/blob/main/posts/2022-10-08.org. Just what I'm after!
We'll need to embed this within some HTML; format to the rescue! Note the escaped quotes around %s, which make sure the href attribute is quoted in the output.
    (format "<a href=\"%s\">Source</a>" (blog-post-source-url))
    ;; <a href="https://github.com/grtcdr/grtcdr.tn/blob/main/posts/2022-10-08.org">Source</a>
Let's add that bit of code to our redefined org-html-format-spec; we'll associate this expression with our new %S format string:
    (defun org-html-format-spec (info)
      "Return format specification for preamble and postamble.
    INFO is a plist used as a communication channel."
      (let ((timestamp-format (plist-get info :html-metadata-timestamp-format)))
        `((?t . ,(org-export-data (plist-get info :title) info))
          (?s . ,(org-export-data (plist-get info :subtitle) info))
          (?S . ,(format "<a href=\"%s\">Source</a>" (blog-post-source-url))) ; <-- right here!
          (?d . ,(org-export-data (org-export-get-date info timestamp-format)
                                  info))
          (?T . ,(format-time-string timestamp-format))
          (?a . ,(org-export-data (plist-get info :author) info))
          (?e . ,(mapconcat
                  (lambda (e) (format "<a href=\"mailto:%s\">%s</a>" e e))
                  (split-string (plist-get info :email) ",+ *")
                  ", "))
          (?c . ,(plist-get info :creator))
          (?C . ,(let ((file (plist-get info :input-file)))
                   (format-time-string timestamp-format
                                       (and file (file-attribute-modification-time
                                                  (file-attributes file))))))
          (?v . ,(or (plist-get info :html-validation-link) "")))))
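If copying the whole function feels heavy-handed, one alternative is to leave org-html-format-spec alone and wrap it with advice. This is only a sketch under a few assumptions: the my/ names are hypothetical, the URL prefix is the one from this post, and the file name is read from the :input-file property of INFO (the same property the original function consults for %C) rather than from the current buffer.

```elisp
(defvar my/source-url-prefix
  "https://github.com/grtcdr/grtcdr.tn/blob/main/posts"
  "Hypothetical URL prefix for the source files of blog posts.")

(defun my/source-link (file)
  "Return an HTML link to the source of FILE under `my/source-url-prefix'."
  (format "<a href=\"%s/%s\">Source</a>"
          my/source-url-prefix
          (file-name-nondirectory file)))

(defun my/org-html-format-spec-with-source (orig-fn info)
  "Call ORIG-FN with INFO and prepend a %S entry to its result.
Intended as :around advice for `org-html-format-spec'."
  (let ((file (plist-get info :input-file)))
    ;; Fall back to an empty string when the export has no input file.
    (cons (cons ?S (if file (my/source-link file) ""))
          (funcall orig-fn info))))

(advice-add 'org-html-format-spec :around
            #'my/org-html-format-spec-with-source)
```

The upside is that any change Org Mode makes to the built-in entries is picked up automatically; the downside is that advice is less visible than a plain redefinition.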
And let's add the format string to our HTML preamble snippet:
    <ul class="navigation">
      <div>
        <li><a href="/index.html">Home</a></li>
        <li><a href="/contact.html">Contact</a></li>
        <li><a href="/data/resume.pdf">Résumé</a></li>
      </div>
    </ul>
    <p class="metadata">%d by %a. (%S)</p> <!-- lookie here! -->
Hurray! That works!… until it doesn't.
2. Second Implementation
Not only do I host a blog on this website, but also numerous documentation files, which contain my system's configuration files (dotfiles), and it doesn't help that they're hosted on an entirely different forge, i.e. SourceHut.
We have to somehow address this situation. We need to make this solution more modular, so that it can support these two different forges and use cases.
So let's start by storing these forges in a property list.
    (defvar forges
      '(:github "github.com"
        :sourcehut "git.sr.ht")
      "Property list mapping git forges to their respective domain.")
We'll write a function that will incrementally construct the prefix URL of any - yes, any - resource, once it matches it against one of our predefined forges.
    (defun build-forge-prefix-url (forge slug type)
      "Construct the standard URL of a given FORGE by specifying
    the repository SLUG and the TYPE of information to access.

    FORGE is a property from the `forges' variable.

    SLUG is a string and the combination of your username and the
    name of your repository, e.g. \"octopus/website\".

    TYPE can take a value of `log' or `tree'."
      (cond ((equal forge :github)
             (format "https://%s/%s/%s/"
                     (plist-get forges :github)
                     slug
                     (cond ((eq type 'log) "commits/main")
                           ((eq type 'tree) "blob/main")
                           (t (error "Invalid type.")))))
            ((equal forge :sourcehut)
             (format "https://%s/%s/%s/"
                     (plist-get forges :sourcehut)
                     (concat "~" slug)
                     (cond ((eq type 'log) "log/main/item")
                           ((eq type 'tree) "tree/main/item")
                           (t (error "Invalid type.")))))))
Let's run a few examples to understand how it works.
Return the URL pointing to the history of changes of a resource hosted on GitHub.
    (build-forge-prefix-url :github "grtcdr/grtcdr.tn" 'log)
    ;; https://github.com/grtcdr/grtcdr.tn/commits/main/
Return the URL pointing to the source code of a resource hosted on SourceHut.
    (build-forge-prefix-url :sourcehut "grtcdr/dotfiles" 'tree)
    ;; https://git.sr.ht/~grtcdr/dotfiles/tree/main/item/
Wonderful, the function covers both of the forges we care about! Let's move on.
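One last aside before moving on: the nested cond could also be expressed as data. The following is a sketch of that idea rather than the code this website uses: my/forge-url-templates and my/forge-prefix-url are hypothetical names, and the templates simply restate the URL shapes produced by the function above.

```elisp
(defvar my/forge-url-templates
  '((:github    . ((tree . "https://github.com/%s/blob/main/")
                   (log  . "https://github.com/%s/commits/main/")))
    (:sourcehut . ((tree . "https://git.sr.ht/~%s/tree/main/item/")
                   (log  . "https://git.sr.ht/~%s/log/main/item/"))))
  "Hypothetical alist mapping a forge and a type to a URL template.
Each template contains a single %s that receives the repository slug.")

(defun my/forge-prefix-url (forge slug type)
  "Return the prefix URL for SLUG on FORGE for the given TYPE.
Signal an error when FORGE or TYPE has no template."
  (let ((template (alist-get type (alist-get forge my/forge-url-templates))))
    (unless template
      (error "Unknown forge %S or type %S" forge type))
    (format template slug)))

(my/forge-prefix-url :github "grtcdr/grtcdr.tn" 'log)
;; => "https://github.com/grtcdr/grtcdr.tn/commits/main/"
```

With this shape, supporting another forge means adding an entry to the alist instead of another branch to the function.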
Do you remember blog-post-source-url? Well, that'll break if the resource lives within a submodule. So we'll need to make that more modular.
Alright, what can we do to obtain the slug of a resource, whilst taking into account this new setting?… We can make use of vc, a built-in library and interface dedicated entirely to version control systems.
We can use vc-root-dir… but for some reason that won't work when we publish the project indirectly, i.e. through a Makefile; however, we can make do with vc-find-root. This function requires that we specify the buffer filename, as well as a "witness", i.e. the name of a file or directory whose presence marks the project root, e.g. .git.
Here's what I came up with:
    (require 'subr-x) ; for `string-remove-prefix' on older Emacs versions

    (defun get-resource-slug ()
      "Determine the path of a resource relative to the value returned
    by `build-forge-prefix-url'."
      (let* ((buffer (buffer-file-name))
             ;; The witness is a literal file name, not a regexp,
             ;; so each candidate is tried in turn.
             (root (or (vc-find-root buffer ".git")
                       (vc-find-root buffer ".hg")
                       (project-root (project-current)))))
        (string-remove-prefix (expand-file-name root) buffer)))
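To see how the witness lookup behaves, here is a small self-contained experiment. It assumes nothing about your setup: it builds a throwaway directory tree (the forge-demo prefix, the my/demo-slug name and the file names are invented), drops an empty .git directory at its top, and asks vc-find-root for the root of a file nested inside it.

```elisp
(require 'subr-x) ; for `string-remove-prefix' on older Emacs versions

(defun my/demo-slug ()
  "Create a temporary project and return a nested file's path
relative to the project root. The project is deleted afterwards."
  (let* ((root (file-name-as-directory (make-temp-file "forge-demo" t)))
         (post (expand-file-name "posts/2022-05-18.org" root)))
    (unwind-protect
        (progn
          ;; An empty .git directory is enough to act as the witness.
          (make-directory (expand-file-name ".git" root))
          (make-directory (file-name-directory post))
          (write-region "" nil post nil 'silent)
          (string-remove-prefix
           (expand-file-name (vc-find-root post ".git"))
           post))
      (delete-directory root t))))

(my/demo-slug)
;; => "posts/2022-05-18.org"
```

The returned string is exactly the kind of slug that gets appended to the forge prefix URL.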
We're done with the new implementation; we should interact with the new functions the same way we did with the older ones. For example, have a look at the format strings used in this website:
This format string expands to a link to the source code of a blog post hosted on GitHub.
    (?w . ,(format "<a href=\"%s\">source</a>"
                   (concat (build-forge-prefix-url :github "grtcdr/grtcdr.tn" 'tree)
                           (get-resource-slug))))
This one expands to a link to the list of revisions of a blog post hosted on GitHub.
    (?x . ,(format "<a href=\"%s\">history</a>"
                   (concat (build-forge-prefix-url :github "grtcdr/grtcdr.tn" 'log)
                           (get-resource-slug))))
While this one expands to a link to the source code of a documentation file hosted on SourceHut.
    (?y . ,(format "<a href=\"%s\">source</a>"
                   (concat (build-forge-prefix-url :sourcehut "grtcdr/dotfiles" 'tree)
                           (get-resource-slug))))
And this one expands to a link to the list of revisions of a documentation file hosted on SourceHut.
    (?z . ,(format "<a href=\"%s\">history</a>"
                   (concat (build-forge-prefix-url :sourcehut "grtcdr/dotfiles" 'log)
                           (get-resource-slug))))
3. Conclusion
We did it… We hacked together a set of functions and scratched the itch for transparency. I didn't expect this task to be so trivial, and can I be honest with you? I've been postponing working on this feature for so long. I just didn't know where to look or where to begin.
But in the end, I learned a few things:
- Elisp is not as scary as it looks.
- Org Mode is well designed and documented, as is the rest of Emacs.
- I'm starting to reap the benefits of choosing ox-publish as a static site builder.