diff --git a/v2/content/post/add-archivarix-archives-to-hugo/index.md b/v2/content/post/add-archivarix-archives-to-hugo/index.md
new file mode 100644
index 0000000..91bae81
--- /dev/null
+++ b/v2/content/post/add-archivarix-archives-to-hugo/index.md
@@ -0,0 +1,107 @@
+---
+title: "Add Archivarix archives to Hugo"
+date: 2022-11-06T14:27:04+01:00
+draft: true
+---
+
+I want to add all my old articles to the Hugo posts list page.
+
+Let's write some code.
+
+* I can use the Archivarix sitemap as source
+* Or I can use the SQLite database as source
+* I want to add all the canonical pages to the list
+* Sorted by reverse date of publication
+* With the title
+
+First, I discover that GoHugo handles file overrides: if a theme has a file
+in `/themes//static/js/jquery.min.js`, you can override it with a
+file in `/static/js/jquery.min.js`. So I think I don't need a custom
+theme; let's remove it.
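+
+As a sketch of that lookup order (`<themename>` is a placeholder, since the
+actual theme directory isn't named here), the project-level file is the one
+Hugo serves:
+
+```text
+themes/<themename>/static/js/jquery.min.js   # shipped by the theme
+static/js/jquery.min.js                      # project-level override, wins
+```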
+
+## Proof of concept with a sitemap
+
+1. First I change `index.php` and add a sitemap path to enable sitemap
+   generation in the Archivarix loader.
+
+1. Generate a sitemap: `wget http://localhost:8080/sitemap.xml`
+
+1. Then I discover that the sitemap specification has no title field, so
+   this is a dead end (see the sample entry after this list).
+
+1. Place `sitemap.xml` in `/data/legacyblog/sitemap.xml`
+1. Let's PoC the change in our Hugo theme in `layouts/_default/list.html`
+
+    ```html
+    {{/* Load the data file and parse it */}}
+    {{ range $.Site.Data.legacyblog.sitemap.url }}
+    <li>
+        <a href="{{ .loc }}">
+            {{ .loc }}
+        </a>
+    </li>
+    {{ end }}
+    ```
+
+I will not use this solution because we can't get the title with it.
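+
+For reference, a sitemap `<url>` entry only carries a location, a last
+modification date, a change frequency and a priority (the values below are
+illustrative), so there is no page title to work with:
+
+```xml
+<url>
+  <loc>http://localhost:8080/2021/12/28/some-old-article/</loc>
+  <lastmod>2021-12-28</lastmod>
+  <changefreq>monthly</changefreq>
+  <priority>0.8</priority>
+</url>
+```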
+
+## Proof of concept with a web crawl CSV file
+
+In another life, I developed a little web crawler, or spider, that can list
+all the URLs and robots metadata for a given website.
+
+1. `git clone `
+1. `npm install`
+1. `node console.js http://localhost:8080 --noindex --nofollow --progress` will create a file called `localhost_urls.csv`
+
+    ```csv
+    "url","statusCode","metas.title","metas.robots","metas.canonical","metas.lang","parent.url"
+    "http://localhost:8080/",200,"HugoPoi – Internet, Hardware et Bidouille","max-image-preview:large",,"fr-FR",
+    "http://localhost:8080/v2/",200,"HugoPoi Blog",,"http://localhost:1313/v2/","en","http://localhost:8080/"
+    "http://localhost:8080/en/",200,"How to decrypt flows_cred.json from NodeRED data ? – HugoPoi","max-image-preview:large","http://localhost:8080/en/2021/12/28/how-to-decrypt-flows_cred-json-from-nodered-data/","en-US","http://localhost:8080/"
+    ```
+1. Then we put this file outside of the data directory, as mentioned in the
+   Hugo documentation (the `data` directory doesn't handle CSV files).
+1. Modify the template with the CSV parsing function:
+
+    ```html
+    {{ range $i, $line := getCSV "," "./localhost_urls.csv" }}
+      {{ $url := index $line 0 }}
+      {{ $title := index $line 2 }}
+      {{ if and (ne $i 0) (eq (len (findRE `replytocom` $url 1)) 0) }}
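+      {{/* The guard above skips the CSV header row (index 0) and drops the
+           duplicate ?replytocom= comment links, presumably left over from
+           the old WordPress site. */}}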
+      <li>
+          <a href="{{ $url }}">
+              {{ $title }}
+          </a>
+      </li>
+      {{ end }}
+    {{ end }}
+    ```
+
+    This solution is promising.
+    // TODO IMAGE
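+
+    The goal list also asks for sorting by reverse date of publication. The
+    crawl CSV has no date column yet; assuming the crawler later exports one
+    (a hypothetical 8th column, index 7), a sketch with Hugo's `dict`,
+    `append` and `sort` functions could order the entries:
+
+    ```html
+    {{ $legacy := slice }}
+    {{ range $i, $line := getCSV "," "./localhost_urls.csv" }}
+      {{ if ne $i 0 }}
+        {{/* index 7 is the assumed date column, not present in the CSV above */}}
+        {{ $entry := dict "url" (index $line 0) "title" (index $line 2) "date" (index $line 7) }}
+        {{ $legacy = $legacy | append $entry }}
+      {{ end }}
+    {{ end }}
+    {{ range sort $legacy "date" "desc" }}
+    <li><a href="{{ .url }}">{{ .title }}</a></li>
+    {{ end }}
+    ```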
+
diff --git a/v2/content/post/how-this-blog-is-made/index.md b/v2/content/post/how-this-blog-is-made/index.md
index 40e3247..d904b4a 100644
--- a/v2/content/post/how-this-blog-is-made/index.md
+++ b/v2/content/post/how-this-blog-is-made/index.md
@@ -2,7 +2,7 @@
 title: "How this blog is made, Archivarix, some PHP and Hugo"
 date: 2022-12-03T17:17:00+01:00
 toc: true
-tags: ["youpi"]
+tags: ["this blog", "PHP", "gohugo"]
 ---
 
 ## The legacy of my blog, recover with Archivarix