---
title: "Add Archivarix archives to Hugo"
date: 2022-11-06T14:27:04+01:00
draft: true
---

I want to add all my old articles to the Hugo posts list page.

Let's write some code.

* I can use the Archivarix sitemap as the source
* Or I can use the Archivarix SQLite database as the source
* I want to add all the canonical pages to the list
* Sorted by publication date, newest first
* With their titles
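The requirements above boil down to a small filter-and-sort step. Here is a hedged Python sketch. It assumes each page already has a known publication date (where that date comes from is not settled yet). It also treats a page as canonical when its canonical URL is empty or points to itself. The `Page` type, `posts_for_list` and the sample data are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Page:
    url: str
    canonical: str  # "" when the page declares no canonical URL
    title: str
    published: date  # assumption: the publication date is known for every page


def posts_for_list(pages: list[Page]) -> list[Page]:
    """Keep pages that are their own canonical version, sorted newest first."""
    canonical_only = [p for p in pages if p.canonical in ("", p.url)]
    return sorted(canonical_only, key=lambda p: p.published, reverse=True)


# Hypothetical pages, for illustration only.
SAMPLE_PAGES = [
    Page("http://localhost:8080/older/", "", "Older post", date(2019, 5, 1)),
    Page("http://localhost:8080/newer/", "", "Newer post", date(2021, 12, 28)),
    # A WordPress comment-reply duplicate whose canonical points elsewhere.
    Page("http://localhost:8080/newer/?replytocom=1", "http://localhost:8080/newer/",
         "Newer post", date(2021, 12, 28)),
]

for page in posts_for_list(SAMPLE_PAGES):
    print(page.published, page.title, page.url)
```

The `replytocom` duplicate is dropped because its canonical URL points at another page.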

First, I discovered that Hugo lets project files override theme files: if
you have a file in `/themes/<THEME>/static/js/jquery.min.js`, you can
override it with a file in `/static/js/jquery.min.js`. So I think I don't
need a custom theme after all; let's remove it.
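Hugo overlays the project's directories on top of the theme's, so a file at the same relative path in the project wins. The Python sketch below is a simplified model of that lookup, not Hugo's implementation. It assumes only two layers exist: the project `static/` and `themes/<theme>/static/`. The theme name `mytheme` and the helper names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_static(project_root: Path, theme: str, rel_path: str) -> Optional[Path]:
    """Return the file that wins for rel_path in a simplified two-layer model.

    Assumption: only the project's static/ and the theme's static/ are
    considered, and the project layer is checked first.
    """
    candidates = [
        project_root / "static" / rel_path,                     # project layer
        project_root / "themes" / theme / "static" / rel_path,  # theme layer
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


def demo() -> tuple[str, str]:
    """Resolve js/jquery.min.js before and after adding a project-level copy."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        theme_copy = root / "themes" / "mytheme" / "static" / "js" / "jquery.min.js"
        theme_copy.parent.mkdir(parents=True)
        theme_copy.write_text("theme copy")
        before = resolve_static(root, "mytheme", "js/jquery.min.js").read_text()

        project_copy = root / "static" / "js" / "jquery.min.js"
        project_copy.parent.mkdir(parents=True)
        project_copy.write_text("project copy")
        after = resolve_static(root, "mytheme", "js/jquery.min.js").read_text()
    return before, after


print(demo())  # ('theme copy', 'project copy')
```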


## Proof of concept with a sitemap

1. First, I edit `index.php` and set a sitemap path to enable sitemap
generation in the Archivarix loader.

1. Generate the sitemap: `wget http://localhost:8080/sitemap.xml`

1. Then I discovered that the sitemap specification has no title field,
so it's a dead end.

1. Place `sitemap.xml` in `/data/legacyblog/sitemap.xml`.
1. Prototype the change in our Hugo theme, in `layouts/_default/list.html`:

   ```html
   {{/* Load data/legacyblog/sitemap.xml and loop over its <url> entries */}}
   {{ range $.Site.Data.legacyblog.sitemap.url }}
   <li>
     <h2>
       <a href="{{ .loc }}">
         <svg
           class="bookmark"
           aria-hidden="true"
           viewBox="0 0 40 50"
           focusable="false"
         >
           <use href="#bookmark"></use>
         </svg>
         {{ .loc }}
       </a>
     </h2>
   </li>
   {{ end }}
   ```

I won't use this solution, because the sitemap can't provide titles.
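The core sitemaps.org protocol only defines `<loc>`, `<lastmod>`, `<changefreq>` and `<priority>` inside each `<url>`, which is why there is no title to display. The Python sketch below parses a one-entry sitemap with the standard library and lists which fields are actually available. The URL comes from the crawl output shown later in this post, and the `<lastmod>` value is illustrative only.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical one-entry sitemap; the <lastmod> value is illustrative only.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://localhost:8080/en/2021/12/28/how-to-decrypt-flows_cred-json-from-nodered-data/</loc>
    <lastmod>2021-12-28</lastmod>
  </url>
</urlset>"""


def sitemap_entries(xml_text: str) -> list[dict[str, str]]:
    """Return each <url> entry as a dict keyed by child element name."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        # rpartition strips the "{namespace}" prefix from tags like "{...}loc".
        entries.append({child.tag.rpartition("}")[2]: (child.text or "").strip() for child in url})
    return entries


for entry in sitemap_entries(SAMPLE_SITEMAP):
    print(sorted(entry))  # ['lastmod', 'loc']: no title anywhere
```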

## Proof of concept with a web crawl CSV file

In another life, I developed a small web crawler (or spider) that lists
all the URLs and robots metadata of a given website.

1. `git clone `
1. `npm install`
1. `node console.js http://localhost:8080 --noindex --nofollow --progress` creates a file called `localhost_urls.csv`:

   ```csv
   "url","statusCode","metas.title","metas.robots","metas.canonical","metas.lang","parent.url"
   "http://localhost:8080/",200,"HugoPoi – Internet, Hardware et Bidouille","max-image-preview:large",,"fr-FR",
   "http://localhost:8080/v2/",200,"HugoPoi Blog",,"http://localhost:1313/v2/","en","http://localhost:8080/"
   "http://localhost:8080/en/",200,"How to decrypt flows_cred.json from NodeRED data ? – HugoPoi","max-image-preview:large","http://localhost:8080/en/2021/12/28/how-to-decrypt-flows_cred-json-from-nodered-data/","en-US","http://localhost:8080/"
   ```
1. Then I put this file outside the `data` directory, as mentioned in the
Hugo documentation.
1. Modify the template to parse it with the `getCSV` function:
   ```html
   <!-- Loop over the CSV lines -->
   {{ range $i, $line := getCSV "," "./localhost_urls.csv" }}
   <!-- Fill variables from the url and metas.title columns -->
   {{ $url := index $line 0 }}
   {{ $title := index $line 2 }}
   <!-- Skip the CSV header line and WordPress replytocom URLs -->
   {{ if and (ne $i 0) (not (in $url "replytocom")) }}
   <li>
     <h2>
       <a href="{{ $url }}">
         <svg
           class="bookmark"
           aria-hidden="true"
           viewBox="0 0 40 50"
           focusable="false"
         >
           <use href="#bookmark"></use>
         </svg>
         {{ $title }}
       </a>
     </h2>
   </li>
   {{ end }}
   {{ end }}
   ```

  This solution looks promising.
  // TODO IMAGE
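The template's filtering can be double-checked outside Hugo before adding more logic to it. This Python sketch mirrors the template: it skips the header row, drops URLs containing `replytocom`, and reads the URL and title from columns 0 and 2. The embedded sample reuses the first two data rows of the crawler output above plus one hypothetical `replytocom` row. `list_entries` is an illustrative name, not part of the crawler.

```python
import csv
import io

# First data rows of localhost_urls.csv, plus one hypothetical replytocom row
# added only to exercise the filter.
SAMPLE_CSV = '''"url","statusCode","metas.title","metas.robots","metas.canonical","metas.lang","parent.url"
"http://localhost:8080/",200,"HugoPoi – Internet, Hardware et Bidouille","max-image-preview:large",,"fr-FR",
"http://localhost:8080/v2/",200,"HugoPoi Blog",,"http://localhost:1313/v2/","en","http://localhost:8080/"
"http://localhost:8080/en/?replytocom=1",200,"HugoPoi",,,"en-US","http://localhost:8080/en/"
'''


def list_entries(csv_text: str) -> list[tuple[str, str]]:
    """Return (url, title) pairs, filtered the same way as the template."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row, like `ne $i 0`
    entries = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        url, title = row[0], row[2]  # same columns as `index $line 0` / `index $line 2`
        if "replytocom" in url:
            continue  # WordPress comment-reply duplicates
        entries.append((url, title))
    return entries


for url, title in list_entries(SAMPLE_CSV):
    print(title, "->", url)
```

Running it should print the two real rows and skip the `replytocom` one. Pointing `list_entries` at the full `localhost_urls.csv` would show which entries the template would render.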