---
title: "Add Archivarix archives to Hugo"
date: 2022-11-06T14:27:04+01:00
draft: true
---

I want to add all my old articles to the Hugo posts list page. Let's write some code.

* I can use the Archivarix sitemap as a source
* Or I can use the SQLite database as a source
* I want to add all the canonical pages to the list
* Sorted by reverse date of publication
* With the title

First, I discovered that Hugo lets project files override theme files: if the theme ships a file at `/themes//static/js/jquery.min.js`, you can override it with a file at `/static/js/jquery.min.js`. So I don't think I need a custom theme; let's remove that.

## Proof of concept with a sitemap

1. First I change `index.php` and add a sitemap path to enable sitemap generation in the Archivarix loader.
1. Generate a sitemap: `wget http://localhost:8080/sitemap.xml`
1. Then I discover that the sitemap specification has no title field, so it's a dead end.
1. Place `sitemap.xml` in `/data/legacyblog/sitemap.xml`
1. Let's PoC the change in our Hugo theme in `layouts/_default/list.html`

```html
{{/* Hugo loads and parses the data file; each entry exposes its <loc> value */}}
{{ range $.Site.Data.legacyblog.sitemap.url }}
<li>{{ .loc }}</li>
{{ end }}
```

I will not use this solution because we can't get the titles with it.

## Proof of concept with a web crawl CSV file

In another life, I developed a little web crawler (or spider) that can list all the URLs and robots metadata for a given website.

1. `git clone `
1. `npm install`
1. `node console.js http://localhost:8080 --noindex --nofollow --progress` will create a file called `localhost_urls.csv`

```csv
"url","statusCode","metas.title","metas.robots","metas.canonical","metas.lang","parent.url"
"http://localhost:8080/",200,"HugoPoi – Internet, Hardware et Bidouille","max-image-preview:large",,"fr-FR",
"http://localhost:8080/v2/",200,"HugoPoi Blog",,"http://localhost:1313/v2/","en","http://localhost:8080/"
"http://localhost:8080/en/",200,"How to decrypt flows_cred.json from NodeRED data ? – HugoPoi","max-image-preview:large","http://localhost:8080/en/2021/12/28/how-to-decrypt-flows_cred-json-from-nodered-data/","en-US","http://localhost:8080/"
```

1. Then we put this file outside of the data directory, as mentioned in the Hugo documentation
1. Modify the template to use the CSV parsing function

```html
{{ range $i, $line := getCSV "," "./localhost_urls.csv" }}
{{ $url := index $line 0 }}
{{ $title := index $line 2 }}
{{/* Skip the header row (index 0) and comment-reply URLs containing "replytocom" */}}
{{ if and (ne $i 0) (eq (len (findRE `replytocom` $url 1)) 0) }}
<li>{{ $title }}</li>
{{ end }}
{{ end }}
```

This solution is promising.

// TODO IMAGE
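
To get closer to the "canonical pages only" goal, the CSV template could also look at the `statusCode` and `metas.canonical` columns. The following is a rough sketch, not a tested solution: the column positions are assumed from the CSV header shown above, and the idea of using the canonical URL (falling back to the crawled URL) as a de-duplication key is my assumption. It does not sort by date, since the crawl file has no publication date column.

```html
{{/* Sketch only: column indexes are assumed from the CSV header above. */}}
{{ $rows := getCSV "," "./localhost_urls.csv" }}
{{/* Scratch pad used as a "seen" set to avoid listing a page twice */}}
{{ $seen := newScratch }}
<ul>
{{ range after 1 $rows }}{{/* "after 1" skips the header row */}}
  {{ $url := index . 0 }}
  {{ $status := index . 1 }}
  {{ $title := index . 2 }}
  {{ $canonical := index . 4 }}
  {{/* Fall back to the crawled URL when the page declares no canonical */}}
  {{ $key := default $url $canonical }}
  {{ if and (eq $status "200") (not (findRE `replytocom` $url 1)) (not ($seen.Get $key)) }}
    {{ $seen.Set $key true }}
    <li><a href="{{ $key }}">{{ $title }}</a></li>
  {{ end }}
{{ end }}
</ul>
```

Note that in the sample rows the canonical URLs point at `localhost` hosts, so the links would probably need rewriting to the final site URL before this is usable.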