Post-processing Soupault with Python
Needless to say, I’m at this point still very new to Soupault.app, so there may be a native way to do this, but after skimming the very long documentation, the GitHub blog blueprint, and a personal site that uses a pretty bog-standard Soupault content model, I didn’t have any leads.
So, what follows might be the worst way to do this, but I really wanted chronological nav links on every blog post.
The section index model Soupault uses always seemed kinda weird to
me, especially for a blog, where new pages are being added all the time.
Sure, there’s stuff like a “last 10 posts” view, or hashtag plugins to
make navigating content easier, but that feels like beating around the
bush. If you have hundreds of blog entries, that’s a very long
index.html! Surely there are performance implications to
loading that whole thing? (If there aren’t, don’t come for me, I’m a
humble back-end dev). And performance aside, it’s annoying to navigate;
it’s a lot of “click to view blog post, read, click back, scroll,
click to view next blog post.” I’m slightly irritated just thinking
about it. And if you’re on a mobile browser that often forgets your
position in a long page? Rage inducing. Being able to move from post to
post without having to go back to a central “hub” or index, and without
excessive scrolling sounds a lot more pleasant. Of course, to someone
with my background, limit and offset sound even better, but also
impossible for a static site.
So, after a little thinking, an arrangement like
<< newest | < newer | older > | oldest >>
seemed a good enough solution to this perhaps made-up problem, and also
doable. Here’s how I do it. (My solution is implemented in
Python, but you can use anything.)
The process
First, we have Soupault build the site, because we’ll be using the
metadata it extracts to modify the pages it produces. At bare minimum,
we need url, date, and nav_path. date is the only one of these you have
to configure yourself. I also use title for the title attribute in the
link. After Soupault finishes, we walk through index.json and build a
dict (or what other languages might call a map) where the key is the
section, and the value is a list of objects, with each object holding
the metadata for one page. The section key is made by just joining
everything in the nav_path array with a slash to form something like
a/b/c. Since I have Soupault sorting by date already, what we have by
the end of this step is essentially a collection of metadata grouped by
section, then sorted by date (descending). If your index.json is not
already sorted by date, you have some extra sorting to do. It will be
obvious why we did this in the next step.
Metadata dict code:
import json

# PageMeta is a small record type defined elsewhere in the script.
sections = {}
with open('index.json') as index_file:
    index = json.load(index_file)
for page in index:
    url = page['url']
    title = page['title']
    date = page['date']
    # e.g. ['a', 'b', 'c'] becomes the section key 'a/b/c'
    section = '/'.join(page['nav_path'])
    if section not in sections:
        sections[section] = []
    sections[section].append(PageMeta(url=url, title=title, date=date))
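If your index.json is not already sorted, the extra sorting could look something like the sketch below. It is only an illustration: the PageMeta definition here is my guess at a minimal shape (the real one isn't shown above), group_by_section is a made-up helper name, and I'm assuming the dates are ISO 8601 strings like 2023-03-01, which sort correctly as plain text.

```python
from collections import defaultdict
from typing import NamedTuple


class PageMeta(NamedTuple):
    # Hypothetical record type; an assumption, not the original definition.
    url: str
    title: str
    date: str


def group_by_section(index):
    """Group index entries by joined nav_path, newest first in each section."""
    sections = defaultdict(list)
    for page in index:
        section = '/'.join(page['nav_path'])
        sections[section].append(
            PageMeta(url=page['url'], title=page['title'], date=page['date'])
        )
    for metas in sections.values():
        # Assumes ISO 8601 date strings, which order correctly as text.
        metas.sort(key=lambda meta: meta.date, reverse=True)
    return dict(sections)


# Made-up sample entries shaped like the fields used in this post.
sample_index = [
    {'url': '/blog/a/', 'title': 'A', 'date': '2023-01-05', 'nav_path': ['blog']},
    {'url': '/blog/b/', 'title': 'B', 'date': '2023-03-01', 'nav_path': ['blog']},
    {'url': '/notes/x/', 'title': 'X', 'date': '2022-12-31', 'nav_path': ['notes', 'misc']},
]
sections = group_by_section(sample_index)
```

With the sample data, the blog section ends up with /blog/b/ ahead of /blog/a/, which is the newest-first order the next step relies on.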
Now, we walk through the collection we just made, section by section,
and page by page, while looking behind (i - 1) and ahead (i + 1) to get
the chronologically newer and older page metadata respectively. Since
we’re in a date-sorted list, index 0 is the newest page in this section,
and len - 1 is the oldest. That’s really the meat of this whole thing:
using url to create our hyperlinks, and inserting them into each HTML
document. url is also used to find the actual HTML artifact: just split
it on the /, prepend the build directory, and Bob’s your uncle.
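For the "split it on the slash, prepend the build directory" part, here is a rough sketch. The function name artifact_path and the build directory name build are placeholders I picked for illustration, and it assumes clean URLs where /blog/my-post/ maps to blog/my-post/index.html; adjust if your site is set up differently.

```python
from pathlib import Path


def artifact_path(url, build_dir='build'):
    """Map a page URL such as '/blog/my-post/' to its HTML file on disk."""
    # Splitting on '/' leaves empty strings for leading/trailing slashes.
    parts = [part for part in url.split('/') if part]
    if parts and parts[-1].endswith('.html'):
        # The URL already names a file, so use it directly.
        return Path(build_dir, *parts)
    # Assumed clean-URL layout: each page lives in its own directory.
    return Path(build_dir, *parts, 'index.html')
```

Returning a Path rather than a joined string keeps the separator handling out of your hands, which matters if the script ever runs on Windows.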
Look-ahead, look-behind link-building loop:
import html

for _, metas in sections.items():
    # A section with a single page has nothing to link to.
    if len(metas) > 1:
        for i in range(len(metas)):
            on_newest = i == 0
            on_oldest = i == len(metas) - 1
            # Start with plain (escaped) labels; upgrade them to links below.
            newest = html.escape('<< newest')
            newer = html.escape('< newer')
            older = html.escape('older >')
            oldest = html.escape('oldest >>')
            if not on_newest:
                newest = make_anchor(metas[0], newest)
                newer = make_anchor(metas[i - 1], newer)
            if not on_oldest:
                older = make_anchor(metas[i + 1], older)
                oldest = make_anchor(metas[-1], oldest)

            nav_path = get_parent_path(metas[i].url) + ['index.html']
            nav_block = f'<p class="post-nav">\n' + \
                f' |\n'.join([newest, newer, older, oldest]) + \
                '\n</p>'
            # insert nav_block into HTML artifact
            link(metas[i].url, nav_block)
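The loop leans on make_anchor, which isn't shown. One plausible shape for it is sketched below; this is a guess rather than the actual helper, and the PageMeta definition is likewise an assumption. It takes the label already escaped, as the loop does, and escapes the URL and title itself because they end up inside attribute values.

```python
import html
from typing import NamedTuple


class PageMeta(NamedTuple):
    # Hypothetical record type; an assumption, not the original definition.
    url: str
    title: str
    date: str


def make_anchor(meta, label):
    """Wrap an already-escaped label in a link to the given page."""
    href = html.escape(meta.url, quote=True)
    title = html.escape(meta.title, quote=True)
    return f'<a href="{href}" title="{title}">{label}</a>'


post = PageMeta(url='/blog/b/', title='Tom & Jerry', date='2023-03-01')
anchor = make_anchor(post, html.escape('< newer'))
# anchor == '<a href="/blog/b/" title="Tom &amp; Jerry">&lt; newer</a>'
```

On the newest page the newest and newer labels stay as plain escaped text, and likewise for older and oldest on the oldest page, so the nav block always shows all four entries.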
All that’s left is to rewrite the HTML. I just duplicate the original, open the dupe in read mode and the original in write mode, and write the content of the dupe line by line back into the original file. When the script encounters the tag that comes immediately after the position where I want the links, it first writes out the link block before it continues copying. Very simple in retrospect; it just took me quite a while to figure out what I wanted, and how to work with Soupault in harmony, neither of us stepping on the other’s toes. The result is this post-processor.
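A minimal sketch of that rewrite step, under a few assumptions of mine: the function name insert_before_marker, the .dupe suffix, and the marker string are all invented for illustration, and the marker is matched as a plain substring of a line rather than by parsing the HTML.

```python
import shutil
import tempfile
from pathlib import Path


def insert_before_marker(html_path, nav_block, marker):
    """Rewrite html_path with nav_block placed before the first marker line."""
    html_path = Path(html_path)
    dupe_path = html_path.with_name(html_path.name + '.dupe')
    shutil.copyfile(html_path, dupe_path)
    inserted = False
    with open(dupe_path, encoding='utf-8') as dupe, \
            open(html_path, 'w', encoding='utf-8') as original:
        for line in dupe:
            # Write the links once, just ahead of the first matching line.
            if not inserted and marker in line:
                original.write(nav_block + '\n')
                inserted = True
            original.write(line)
    dupe_path.unlink()  # the duplicate is only needed during the rewrite
    return inserted


# Tiny demonstration on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp, 'index.html')
    page.write_text('<main>\n<p>Hi</p>\n</main>\n<footer></footer>\n',
                    encoding='utf-8')
    was_inserted = insert_before_marker(
        page, '<p class="post-nav">nav</p>', '<footer')
    result = page.read_text(encoding='utf-8')
```

The return value says whether the marker was found at all, which is handy for spotting pages whose template differs from what you expected.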