
HugoPoi 567cc42c9c feat(post): new post about add archivarix archives to hugo

2022-12-05 20:01:43 +01:00


The legacy of my blog, recovered with Archivarix

For me this year was like a rollercoaster, and I forgot that my blog was hosted on a very old Online.net Dedibox server (Online.net is now called Scaleway). This server was being decommissioned, and I missed the three emails announcing the end of my services. Online.net then decided it was a good idea to also delete the backup spaces attached to these machines. To sum up, I lost my blog and its recent backups. But I wanted to keep it, and at least serve the existing content that search engines and other websites linked to. I looked on the Wayback Machine and my blog was in it. I found a cool all-in-one service called Archivarix that restores an entire website from the Wayback Machine; it cost around 10€.

I got back a 300 MB zip archive with a lot of content; some images are missing, but all the articles were there.

Running the Archivarix Loader

The Archivarix Loader is a single PHP file that uses an SQLite database holding all your URLs, while the content itself is stored as files in www/.content.EZtzwPjb/binary/. Each time an HTTP request is processed, the script looks up the matching URL in the database and serves the content linked to it. This mini CMS is licensed under the GPL, and I put a copy here.
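As an illustration of that request flow, here is a minimal Python sketch (the real loader is PHP). The urls table, its three columns and the sample rows are hypothetical simplifications, not the actual Archivarix schema. The script answers an exact URL match with the stored file and its MIME type, and answers everything else with a 404.

```python
import sqlite3
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical, simplified schema: the real Archivarix database stores more columns.
SCHEMA = "CREATE TABLE urls (request_uri TEXT PRIMARY KEY, filename TEXT, mimetype TEXT)"


def open_demo_db() -> sqlite3.Connection:
    """Build an in-memory database with two archived URLs (sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO urls VALUES (?, ?, ?)",
        [
            ("/", "index.html", "text/html"),
            ("/feed/", "feed.xml", "application/rss+xml"),
        ],
    )
    return conn


def lookup(conn: sqlite3.Connection, request_uri: str) -> Optional[Tuple[str, str]]:
    """Return (filename, mimetype) for an exact URL match, or None."""
    return conn.execute(
        "SELECT filename, mimetype FROM urls WHERE request_uri = ?",
        (request_uri,),
    ).fetchone()


def serve(conn: sqlite3.Connection, content_dir: Path, request_uri: str) -> Tuple[int, str, bytes]:
    """Database lookup first, then read the stored file from the content directory."""
    row = lookup(conn, request_uri)
    if row is None:
        return 404, "text/plain", b"Not found"
    filename, mimetype = row
    return 200, mimetype, (content_dir / filename).read_bytes()
```

Because the content lives in plain files and only the index sits in SQLite, a hit costs one indexed query plus one file read.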

With docker

You need PHP with the SQLite extension, and the PHP Docker image already includes it. I wrote a small docker-compose file to run Archivarix.

It's as simple as running docker compose up.

{{< figureCupper img="Screenshot 2022-11-20 at 18-54-58 HugoPoi – Internet Hardware et Bidouille.png" caption="First run of my old website" command="Fill" options="1024x500 Top" >}}

With Yunohost

I mostly self-host with the Yunohost project. The next steps show how to easily add a small PHP app to your Yunohost instance.

Install the My Webapp application from the Yunohost admin panel.

{{< figureCupper img="Screenshot from 2022-11-21 18-40-13.png" caption="TODO" command="Fit" options="1024x500" >}}
Fill in the setup form.

{{< figureCupper img="Screenshot 2022-11-21 at 18-41-13 Install my_webapp _ Catalog YunoHost Admin.png" caption="TODO" command="Fill" options="1024x500" >}}
You now have an empty app in /var/www/my_webapp/www/.

{{< figureCupper img="Screenshot from 2022-11-21 18-42-30.png" caption="TODO" command="Fit" options="1024x500" >}}
Next, copy your files. I use rsync with the Yunohost admin account:

```shell
# -rlgoD is -a without -p/-t (perms and mtimes not kept); --checksum compares contents instead of size+mtime
rsync -rlgoD --checksum --verbose www/ admin@home.hugopoi.net:/var/www/my_webapp/www/
```

Then you might need to run chmod 664 /var/www/my_webapp/www/.content.*/structure.*, because Archivarix requires write access to its SQLite files.

Modding Archivarix Loader

Fixing missing versioned WordPress files

The homepage looked good, but some WordPress CSS and JavaScript assets were missing. WordPress appends a ?ver= query parameter to these asset URLs, and the requested version was not always the one that had been archived.

{{< figureCupper img="Screenshot 2022-11-20 at 18-55-57 Linky opendata my ass – HugoPoi.png" caption="First run of my old website, with broken CSS" command="Fill" options="1024x500 Top" >}}

{{< figureCupper img="Screenshot-Firefix-debugger-404-ver-wordpress-archivarix.png" caption="Files missing in Archivarix because of the WordPress ?ver= query parameter" command="Fit" options="1024x500" >}}

So I wrote a small function that loads any available version of a given URL and picks the most recent one.
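My function lives in the PHP loader. The Python sketch below only restates the idea under some assumptions. It models the archive as a hypothetical list of (URL, capture timestamp) pairs and serves an exact match when one exists. Otherwise it falls back to the newest capture of the same path with any query string, such as another ?ver= value. The function names, the timestamp format and the example URLs are made up for illustration.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlsplit, urlunsplit

# Hypothetical archive index: (archived URL, capture timestamp as YYYYMMDDhhmmss).
Entry = Tuple[str, str]


def strip_query(url: str) -> str:
    """Drop the query string (e.g. ?ver=6.0) and fragment, keeping scheme, host and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def find_best_version(entries: List[Entry], requested: str) -> Optional[str]:
    """Exact match first; otherwise the most recent capture of the same path with any query."""
    for url, _ in entries:
        if url == requested:
            return url
    base = strip_query(requested)
    candidates = [(ts, url) for url, ts in entries if strip_query(url) == base]
    if not candidates:
        return None
    # Fixed-width YYYYMMDDhhmmss strings sort chronologically, so max() picks the newest.
    return max(candidates)[1]
```

Comparing timestamps as strings only works because they are fixed-width; with another format you would parse them before comparing.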

Cleaning existing pages

Once my backed-up blog was running, I wanted to modify some content:

Replace the Twitter widget
Replace the hosting provider widget
Add a legacy warning pointing visitors to the new blog

Archivarix has an ARCHIVARIX_INCLUDE_CUSTOM option that relies on regular expressions to replace content, but I needed a more precise approach. I used the PHP XML extension, which has a built-in DOM parser that can also parse HTML pages.
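The same DOM-based approach can be sketched with Python's standard xml.etree.ElementTree instead of PHP's parser. This illustration rests on several assumptions. ElementTree only accepts well-formed, namespace-free markup, so real WordPress pages would need an HTML-tolerant parser. The widget id, the legacy-warning class and the notice text are also hypothetical. The sketch removes elements by id and inserts a notice at the top of <body>, which is harder to get right with regular expressions on nested markup.

```python
import xml.etree.ElementTree as ET
from typing import Iterable


def clean_page(xhtml: str, widget_ids: Iterable[str], notice_text: str) -> str:
    """Remove widgets by id and prepend a legacy notice to <body>.

    Assumes well-formed markup without an XML namespace.
    """
    root = ET.fromstring(xhtml)
    ids = set(widget_ids)
    # ElementTree has no parent pointers: collect (parent, child) pairs first,
    # then remove them, so the tree is not mutated while being iterated.
    to_remove = [
        (parent, child)
        for parent in root.iter()
        for child in parent
        if child.get("id") in ids
    ]
    for parent, child in to_remove:
        parent.remove(child)  # note: this also drops any text following the child (its tail)
    body = root.find("body")
    if body is not None:
        notice = ET.Element("div", {"class": "legacy-warning"})
        notice.text = notice_text
        body.insert(0, notice)
    return ET.tostring(root, encoding="unicode")
```

For example, clean_page(page, ["twitter"], "Archived blog") drops the element whose id is twitter and adds the notice as the first child of <body>.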

The new blog with Hugo

I wanted plain Markdown that generates static HTML, which is why I chose Hugo for my new blog.

First, I created a new directory called v2 next to the legacy blog
Then I ran hugo new site v2, which generates the required folder tree
I chose the Cupper theme

{{< figureCupper img="Screenshot 2022-12-03 at 15-30-20 Cupper.png" caption="Cupper Theme for Hugo" command="Fit" options="1024x500" >}}
I added the Cupper theme with git submodule add https://github.com/zwbetz-gh/cupper-hugo-theme.git themes/cupper-hugo-theme
I added some custom CSS
I added some settings in config.toml
I also upgraded prism.js, used for code highlighting, in the theme itself
I added a favicon and a logo
Then I built the site with hugo
I deployed it with rsync to Yunohost

I added a CORS header to the nginx config of my Gitea instance. Pages served from the Hugo dev server or from the blog can then fetch code from Gitea and display it inside <pre> HTML tags:

```nginx
rewrite ^/gitea$ /gitea/ permanent;
location /gitea/ {
    proxy_pass                  http://localhost:6000/;
    proxy_set_header            Host $host;
    proxy_buffering off;
    client_max_body_size        200M;
    proxy_set_header X-Real-IP $remote_addr;

    # Include SSOWAT user panel.
    include conf.d/yunohost_panel.conf.inc;

    add_header 'Vary' 'Origin';

    # Add CORS header for loading code in pre html tags
    if ($http_origin ~* "^(http://localhost:1313|https://blog.hugopoi.net)$") {
        add_header Access-Control-Allow-Origin "$http_origin";
    }
}
```
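To check which origins receive the header, the regular expression from that if block can be mirrored in Python. This models only the matching logic, not nginx itself, and the function name is hypothetical. Port 1313 is the default port of hugo server, which is why the local development origin is listed.

```python
import re
from typing import Optional

# Same pattern as the nginx rule; `~*` in nginx means a case-insensitive match.
ALLOWED_ORIGIN = re.compile(
    r"^(http://localhost:1313|https://blog.hugopoi.net)$", re.IGNORECASE
)


def allow_origin_header(origin: Optional[str]) -> Optional[str]:
    """Value for Access-Control-Allow-Origin, or None when the header is not sent."""
    if origin and ALLOWED_ORIGIN.search(origin):
        # Echo the caller's origin instead of "*", so only the listed sites may read responses.
        return origin
    return None
```

Echoing the request's origin limits cross-origin reads to the two listed sites. The dots in the pattern are unescaped, so they match any character. Writing blog\.hugopoi\.net would be stricter, in nginx as well.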

Moooore to come!
