---
title: "Add Archivarix archives to Hugo"
date: 2022-11-06T14:27:04+01:00
draft: true
---
I want to add all my old articles to the Hugo posts list page.
Let's write some code.
* I can use the Archivarix sitemap as source
* Or I can use the SQLite database as source
* I want to add all the canonical pages to the list
* Sorted by publication date, newest first
* With the title

First, I discovered that Hugo lets project files override theme files: if
the theme ships a file at `/themes/<THEME>/static/js/jquery.min.js`, you
can override it with a file at `/static/js/jquery.min.js`. So I don't think
I need a custom theme; let's remove it.
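
To make the override rule concrete, here is a minimal Python sketch of the lookup order: the project's `static/` directory is checked before the theme's. The function name `resolve_static` and the theme name used in the usage note are hypothetical; this only illustrates the precedence and is not how Hugo is implemented.

```python
from pathlib import Path
from typing import Optional


def resolve_static(site_root: Path, theme: str, rel_path: str) -> Optional[Path]:
    """Return the file that wins for rel_path (hypothetical helper).

    The project's own static/ directory is checked first, then the theme's.
    """
    candidates = [
        site_root / "static" / rel_path,
        site_root / "themes" / theme / "static" / rel_path,
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    # Neither the project nor the theme provides this file.
    return None
```

For example, `resolve_static(Path("."), "mytheme", "js/jquery.min.js")` returns the project copy when it exists and falls back to the theme copy otherwise.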
## Proof of concept with a sitemap
1. First, I changed `index.php` to add a sitemap path, which enables
sitemap generation in the Archivarix loader.
1. Download the generated sitemap: `wget http://localhost:8080/sitemap.xml`
1. Then I discovered that the sitemap specification has no title field, so
it's a dead end.
1. Place `sitemap.xml` in `/data/legacyblog/sitemap.xml`
1. Let's PoC the change in our Hugo theme, in `layouts/_default/list.html`
```html
<!-- Hugo loads and parses the data file for us -->
{{ range $.Site.Data.legacyblog.sitemap.url }}
  <li>
    <h2>
      <a href="{{ .loc }}">
        <svg
          class="bookmark"
          aria-hidden="true"
          viewBox="0 0 40 50"
          focusable="false"
        >
          <use href="#bookmark"></use>
        </svg>
        {{ .loc }}
      </a>
    </h2>
  </li>
{{ end }}
```
I won't use this solution, because it can't give us the page titles.
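
To see why the sitemap is a dead end, here is a small Python sketch that lists the fields each `<url>` entry carries. The sample XML is made up for illustration (including the `lastmod` value); the sitemap protocol only defines `loc`, `lastmod`, `changefreq` and `priority`, so there is nowhere for a title to live.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical sample, not the real Archivarix output.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://localhost:8080/</loc>
    <lastmod>2021-12-28</lastmod>
  </url>
  <url>
    <loc>http://localhost:8080/en/</loc>
  </url>
</urlset>
"""


def list_url_fields(xml_text):
    """Return one dict per <url> entry, mapping child tag name to its text."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall(f"{{{SITEMAP_NS}}}url"):
        # Strip the "{namespace}" prefix ElementTree puts on each tag.
        fields = {
            child.tag.split("}", 1)[-1]: (child.text or "").strip()
            for child in url
        }
        entries.append(fields)
    return entries


if __name__ == "__main__":
    for entry in list_url_fields(SAMPLE):
        print(entry)
```

Every entry has a `loc`, sometimes a `lastmod`, and never a title, so the list page could only show raw URLs.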
## Proof of concept with webcrawl csv file
In another life, I developed a little web crawler (or spider) that lists
all the URLs and robots metadata for a given website.
1. `git clone `
1. `npm install`
1. `node console.js http://localhost:8080 --noindex --nofollow --progress` will create a file called `localhost_urls.csv`
```csv
"url","statusCode","metas.title","metas.robots","metas.canonical","metas.lang","parent.url"
"http://localhost:8080/",200,"HugoPoi Internet, Hardware et Bidouille","max-image-preview:large",,"fr-FR",
"http://localhost:8080/v2/",200,"HugoPoi Blog",,"http://localhost:1313/v2/","en","http://localhost:8080/"
"http://localhost:8080/en/",200,"How to decrypt flows_cred.json from NodeRED data ? HugoPoi","max-image-preview:large","http://localhost:8080/en/2021/12/28/how-to-decrypt-flows_cred-json-from-nodered-data/","en-US","http://localhost:8080/"
```
1. Then we put this file outside of the `data` directory, as mentioned in
the Hugo documentation
1. Modify the template to use the CSV parsing function
```html
<!-- Loop over the CSV lines -->
{{ range $i, $line := getCSV "," "./localhost_urls.csv" }}
  <!-- Fill variables with columns -->
  {{ $url := index $line 0 }}
  {{ $title := index $line 2 }}
  <!-- Skip the CSV header line and WordPress replytocom URLs -->
  {{ if and (ne $i 0) (eq (len (findRE `replytocom` $url 1)) 0) }}
    <li>
      <h2>
        <a href="{{ $url }}">
          <svg
            class="bookmark"
            aria-hidden="true"
            viewBox="0 0 40 50"
            focusable="false"
          >
            <use href="#bookmark"></use>
          </svg>
          {{ $title }}
        </a>
      </h2>
    </li>
  {{ end }}
{{ end }}
```
This solution is promising.
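
To sanity-check the filter outside of Hugo, the same logic can be sketched in Python: skip the header line, drop the `replytocom` URLs, and keep the URL and title columns. The first two data rows come from the crawl output above; the `replytocom` row and the function name `legacy_links` are made up for illustration.

```python
import csv
import io

SAMPLE_CSV = '''"url","statusCode","metas.title","metas.robots","metas.canonical","metas.lang","parent.url"
"http://localhost:8080/",200,"HugoPoi Internet, Hardware et Bidouille","max-image-preview:large",,"fr-FR",
"http://localhost:8080/v2/",200,"HugoPoi Blog",,"http://localhost:1313/v2/","en","http://localhost:8080/"
"http://localhost:8080/en/?replytocom=42",200,"Hypothetical comment reply",,,"en-US","http://localhost:8080/en/"
'''


def legacy_links(csv_text):
    """Return (url, title) pairs, mirroring the filter used in the template."""
    # DictReader consumes the header line, like the (ne $i 0) test above.
    reader = csv.DictReader(io.StringIO(csv_text))
    links = []
    for row in reader:
        # Same idea as the findRE check: ignore WordPress replytocom URLs.
        if "replytocom" in row["url"]:
            continue
        links.append((row["url"], row["metas.title"]))
    return links


if __name__ == "__main__":
    for url, title in legacy_links(SAMPLE_CSV):
        print(f"{title} -> {url}")
```

Running it against the real `localhost_urls.csv` (read with `open(...).read()`) is a quick way to see which rows the template will list before rebuilding the site.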
// TODO IMAGE