For a while now, I’ve been advocating for www in domains. I also believe in using HTTPS. And fairly recently, I’ve come to see the importance of having dates in your URLs.

The challenge comes when you try to make changes like these to your site’s permalink structure. Assuming you use your post’s full URL as the guid in your RSS (and if you don’t, you’re doing it wrong[1]), scrapers will incorrectly think that every item in your XML feed is new.

A specific example should make this clear. Let’s say you decide to go from HTTP to HTTPS. You update your site URL in your site’s core configuration, and this update propagates across the board, including your RSS. Your feed’s guid entries go from this:

<guid isPermaLink="true">http://example.com/2017/02/22/some-post/</guid>

To this:

<guid isPermaLink="true">https://example.com/2017/02/22/some-post/</guid>

Since the guid is the unique identifier for RSS scrapers,[2] you can see how a scraper is going to incorrectly think that you’ve published something new when in fact you’ve simply tweaked the URL of something previously published. If you’ve integrated something like dlvr.it to auto-tweet your new posts, you’re going to be sending out duplicate tweets as well. It’s a real debacle.
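To see why changing nothing but the scheme is enough to trigger this, here’s a minimal Python sketch of how a guid-based scraper might decide what counts as new. The `find_new_items` helper and the `seen` set are hypothetical, made up for illustration; real feed readers and services like dlvr.it may well track items differently.

```python
def find_new_items(feed_guids, seen_guids):
    """Return the guids the scraper has not seen before, in feed order."""
    return [guid for guid in feed_guids if guid not in seen_guids]


# What the (hypothetical) scraper remembered from its last poll,
# before the switch to HTTPS.
seen = {"http://example.com/2017/02/22/some-post/"}

# The very same post after the site URL changed to HTTPS.
feed = ["https://example.com/2017/02/22/some-post/"]

# The guid strings differ, so the old post is reported as new.
new_items = find_new_items(feed, seen)
print(new_items)
```

The comparison is a plain string match, so as far as a scraper like this is concerned the HTTP and HTTPS versions are two unrelated items.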

Here’s how you fix it. You update your RSS so that any old items just don’t show up. Inside the main for loop of my feed.xml, which iterates over the most recent posts, here’s what that looks like in Jekyll:

{% capture posttime %}{{post.date | date: '%s' | minus: 1487640396 }}{% endcapture %}
{% if posttime contains '-' %}
  {% continue %}
{% endif %}

Swap the “minus” value for the cutoff timestamp you need; it’s a hardcoded Unix timestamp. Any post dated before the cutoff produces a negative difference, and the minus sign in the captured string is what the contains check catches. In my experience the cutoff is usually the current time at the moment I’m making the change, so I just head over to unixtimestamp.com and grab it from there.

Once you do this, you’re all set. You can now change your URL structure without worrying about spamming your readers with a bunch of preexisting content. If you like keeping your code clean, you can revert the commit that added this conditional once you’ve published enough new posts that the older ones no longer appear in your XML feed.
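If you’d rather not visit a website for the number, here’s a rough Python sketch that prints the current Unix timestamp and mirrors the cutoff check from the Liquid snippet above. The `unix_timestamp` and `is_hidden` helpers are names I’ve made up for illustration; only the 1487640396 value comes from the snippet itself.

```python
from datetime import datetime, timezone

# The cutoff used in the Liquid snippet (2017-02-21 01:26:36 UTC).
CUTOFF = 1487640396


def unix_timestamp(moment):
    """Convert a timezone-aware datetime to whole seconds since the epoch."""
    return int(moment.timestamp())


def is_hidden(post_date, cutoff=CUTOFF):
    """Mirror the Liquid check: a negative difference means the post is skipped."""
    return unix_timestamp(post_date) - cutoff < 0


# A fresh cutoff for "right now", ready to paste into the template.
print(unix_timestamp(datetime.now(timezone.utc)))

# One post from before the cutoff and one from after it.
old_post = datetime(2017, 2, 20, tzinfo=timezone.utc)
new_post = datetime(2017, 2, 22, tzinfo=timezone.utc)
print(is_hidden(old_post), is_hidden(new_post))  # prints: True False
```

A post dated exactly at the cutoff gives a difference of zero, which has no minus sign, so it stays in the feed in both the Liquid version and this sketch.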

I wish I’d known about this “hack” years ago, because I’ve had clients, friends, and acquaintances who needed it. I’ve seen this problem plague 8-figure businesses.

You can add this to your growing list of things that Jekyll does way better than WordPress, by the way. Editing your RSS feed in Jekyll is a breeze.

  1. To quote again from Harvard’s Berkman Center for Internet & Society, the entity responsible for the official RSS spec:

    In all cases, it’s recommended that you provide the guid, and if possible make it a permalink.

    I’m a firm believer in following specs unless there’s a very good reason not to. ↩︎

  2. That’s assuming, of course, that the guid is present; otherwise, scrapers look for other means of uniqueness. But you can be very sure that in such situations, these scrapers mumble and grumble about how the feed they’re inspecting doesn’t adhere to the proper syntax specification. ↩︎