title: Add Archivarix archives to Hugo
date: 2022-11-06T14:27:04+01:00
draft: true

I want to add all my old articles to the Hugo posts list page.

Let's write some code.

  • I can use the Archivarix sitemap as source
  • Or I can use the SQLite database as source
  • I want to add all the canonical pages to the list
  • Sorted by reverse date of publication
  • With the title

First, I discovered that Hugo lets project files override theme files: if a file exists at /themes/<THEME>/static/js/jquery.min.js, you can override it with a file at /static/js/jquery.min.js. So I don't think I need a custom theme, and I can remove it.
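To make the override rule concrete, here is a small Python sketch of the lookup order as I understand it. It is a simplified model, not Hugo's actual implementation: it assumes a single theme and ignores modules and multiple static directories. The function name `resolve_static` and the theme name used in the usage note are hypothetical.

```python
from pathlib import Path
from typing import Optional


def resolve_static(site_root: Path, theme: str, rel_path: str) -> Optional[Path]:
    """Return the file that wins for rel_path, or None if it exists nowhere.

    Simplified model (assumption): the project's static/ directory is
    checked first, then the theme's static/ directory.
    """
    candidates = [
        site_root / "static" / rel_path,
        site_root / "themes" / theme / "static" / rel_path,
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

For example, `resolve_static(Path("."), "mytheme", "js/jquery.min.js")` returns the copy under static/ when both copies exist, and falls back to the theme's copy otherwise.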

Proof of concept with a sitemap

  1. First, I changed index.php and added a sitemap path to enable sitemap generation in the Archivarix loader.

  2. Generate a sitemap: wget http://localhost:8080/sitemap.xml

  3. Then I discovered that the sitemap specification has no title field, so it's a dead end.

  4. Place sitemap.xml in /data/legacyblog/sitemap.xml

  5. Let's prototype the change in our Hugo theme, in layouts/_default/list.html

    <!-- Hugo loads and parses data/legacyblog/sitemap.xml -->
    {{ range $.Site.Data.legacyblog.sitemap.url }}
    <li>
      <h2>
        <a href="{{ .loc }}">
          <svg
            class="bookmark"
            aria-hidden="true"
            viewBox="0 0 40 50"
            focusable="false"
          >
            <use href="#bookmark"></use>
          </svg>
          {{ .loc }}
        </a>
      </h2>
    </li>
    {{ end }}

I will not use this solution because we can't get the titles with it.
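To check what a sitemap entry can actually carry, here is a minimal Python sketch that lists the child elements found inside the <url> entries. The sample XML is made up for illustration (including its lastmod value), and `url_fields` is a hypothetical helper name. The sitemap protocol only defines loc, lastmod, changefreq and priority for a URL entry, so no title can show up here.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Made-up sample, shaped like a standard sitemap.xml.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://localhost:8080/en/</loc>
    <lastmod>2021-12-28</lastmod>
  </url>
</urlset>
"""


def url_fields(sitemap_xml: str) -> set:
    """Collect the child tag names used inside <url> entries."""
    root = ET.fromstring(sitemap_xml)
    fields = set()
    for url in root.iter(SITEMAP_NS + "url"):
        for child in url:
            # Strip the namespace prefix to keep only the local tag name.
            fields.add(child.tag.replace(SITEMAP_NS, ""))
    return fields
```

Running `url_fields` on the real sitemap.xml should confirm that the only usable fields are the URL and, at best, a modification date.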

Proof of concept with a web crawler CSV file

In another life, I developed a small web crawler (spider) that lists all the URLs and robots metadata for a given website.

  1. git clone
  2. npm install
  3. node console.js http://localhost:8080 --noindex --nofollow --progress, which creates a file called localhost_urls.csv
"url","statusCode","metas.title","metas.robots","metas.canonical","metas.lang","parent.url"
"http://localhost:8080/",200,"HugoPoi  Internet, Hardware et Bidouille","max-image-preview:large",,"fr-FR",
"http://localhost:8080/v2/",200,"HugoPoi Blog",,"http://localhost:1313/v2/","en","http://localhost:8080/"
"http://localhost:8080/en/",200,"How to decrypt flows_cred.json from NodeRED data ?  HugoPoi","max-image-preview:large","http://localhost:8080/en/2021/12/28/how-to-decrypt-flows_cred-json-from-nodered-data/","en-US","http://localhost:8080/"
  4. Then we put this file outside of the data directory, as mentioned in the Hugo documentation
  5. Modify the template to use the CSV parsing function
    <!-- Loop against csv lines -->
    {{ range $i,$line := getCSV "," "./localhost_urls.csv" }}
    <!-- Fill variables with columns -->
    {{ $url := index $line 0 }}
    {{ $title := index $line 2 }}
    <!-- Skip csv head line and replytocom wordpress urls -->
    {{ if and (ne $i 0) (eq (len (findRE `replytocom` $url 1)) 0)}}
    <li>
      <h2>
        <a href="{{ $url }}">
          <svg
            class="bookmark"
            aria-hidden="true"
            viewBox="0 0 40 50"
            focusable="false"
          >
            <use href="#bookmark"></use>
          </svg>
          {{ $title }}
        </a>
      </h2>
    </li>
    {{ end }}
    {{ end }}

This solution is promising // TODO IMAGE
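The template above does not yet cover two items from my wish list: canonical pages only, and reverse date order. Here is a Python sketch of a pre-filter that could run on the CSV before Hugo reads it. It rests on assumptions that are mine, not facts about the crawler: a page counts as canonical when metas.canonical is empty or equal to its own URL, and the publication date is guessed from a /YYYY/MM/DD/ segment in the permalink, as in the sample rows. The names `published_on` and `legacy_posts` are hypothetical.

```python
import csv
import io
import re
from datetime import date

DATE_IN_URL = re.compile(r"/(\d{4})/(\d{2})/(\d{2})/")


def published_on(url):
    """Guess the publication date from a /YYYY/MM/DD/ permalink, else None."""
    match = DATE_IN_URL.search(url)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        return None  # digits that do not form a real calendar date


def legacy_posts(csv_text):
    """Return (date, url, title) tuples for canonical dated pages, newest first."""
    posts = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        url = row["url"]
        canonical = row["metas.canonical"]
        if row["statusCode"] != "200" or "replytocom" in url:
            continue
        if canonical and canonical != url:
            continue  # another URL is declared as the canonical one
        day = published_on(url)
        if day is None:
            continue  # no date in the permalink, so not treated as a post
        posts.append((day, url, row["metas.title"]))
    return sorted(posts, reverse=True)
```

Calling `legacy_posts(open("localhost_urls.csv", encoding="utf-8").read())` would give the filtered, sorted rows, which could then be written back to a smaller CSV for the template to loop over.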