title: How this blog is made, Archivarix, some PHP and Hugo
date: 2022-11-06T15:07:16+01:00
draft: true
toc: true
tags: youpi

The legacy of my blog, recovered with Archivarix

This year was a rollercoaster for me, and I forgot that my blog was hosted on a very old Dedibox server at Online.net, now called Scaleway. This server was being decommissioned and I missed the 3 emails announcing the end of my services. Online.net then decided it was a good idea to also delete the backup spaces attached to these machines. To sum up, I lost my blog and its recent backups. But I wanted to keep it, or at least keep serving the existing content that was linked from search engines and other websites. I looked on the Wayback Machine and my blog was in it. I found a neat all-in-one service called Archivarix that restores an entire website from the Wayback Machine; the cost was around 10€.

I recovered a 300MB zip archive with a lot of content. Some images are missing, but all the articles were there.

Running the Archivarix Loader

Archivarix Loader is a single PHP file backed by a SQLite database that holds all your URLs; the content itself is stored as files in www/.content.EZtzwPjb/binary/. Each time an HTTP request is processed, the script looks in the database for a matching URL and serves the content linked to it. This mini CMS is licensed under the GPL, and I put a copy here.
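
To make the lookup step concrete, here is a minimal sketch of the idea, not the actual loader code. The table name `structure`, its columns, the example URL and the helper `findArchivedUrl` are all assumptions made for illustration; the real schema may differ.

```php
<?php
// Hypothetical schema and data: the real loader's tables and columns may differ.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE structure (url TEXT, mimetype TEXT, filename TEXT)');
$db->exec("INSERT INTO structure VALUES ('http://blog.example/hello/', 'text/html', 'a1b2c3.html')");

/** Return the stored row for $url, or null when nothing was archived for it. */
function findArchivedUrl(PDO $db, string $url): ?array
{
    $stmt = $db->prepare('SELECT url, mimetype, filename FROM structure WHERE url = :url LIMIT 1');
    $stmt->execute([':url' => $url]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    return $row === false ? null : $row;
}

$hit = findArchivedUrl($db, 'http://blog.example/hello/');
$miss = findArchivedUrl($db, 'http://blog.example/nope/');
// On a hit, a loader would send the stored MIME type and stream the matching
// file from the binary/ folder; on a miss, it would answer with a 404.
```

The point is only that serving a page is one indexed lookup followed by a file read, which is why a single PHP file is enough.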

With docker

You need PHP with the SQLite extension; the PHP Docker image already includes it. I wrote a small docker-compose file for running Archivarix.

It's as simple as running docker compose up.

{{< figureCupper img="Screenshot 2022-11-20 at 18-54-58 HugoPoi Internet Hardware et Bidouille.png" caption="First run of my old website" command="Fill" options="1024x500 Top" >}}

With Yunohost

I mainly self-host with the YunoHost project. The next steps show how to easily add a small PHP app to your YunoHost instance.

  1. Install the application My Webapp from the YunoHost admin panel

    {{< figureCupper img="Screenshot from 2022-11-21 18-40-13.png" caption="TODO" command="Fit" options="1024x500" >}}

  2. Fill the setup form

    {{< figureCupper img="Screenshot 2022-11-21 at 18-41-13 Install my_webapp _ Catalog YunoHost Admin.png" caption="TODO" command="Fill" options="1024x500" >}}

  3. You have an empty app inside /var/www/my_webapp/www/

    {{< figureCupper img="Screenshot from 2022-11-21 18-42-30.png" caption="TODO" command="Fit" options="1024x500" >}}

  4. Copy your files over; I use rsync with the YunoHost admin account

    rsync -rlgoD --checksum --verbose www/ admin@home.hugopoi.net:/var/www/my_webapp/www/

  5. Then you might need to run chmod 664 /var/www/my_webapp/www/.content.*/structure.*, because Archivarix requires write access to the SQLite files.

Modding Archivarix Loader

Fixing missing WordPress versioned files

The homepage looked good, but some WordPress CSS and JavaScript assets were missing. WordPress appends a ?ver= query parameter to asset URLs, so a request only matches when that exact version was archived.

{{< figureCupper img="Screenshot 2022-11-20 at 18-55-57 Linky opendata my ass HugoPoi.png" caption="First run of my old website, with some broken CSS" command="Fill" options="1024x500 Top" >}}

{{< figureCupper img="Screenshot-Firefix-debugger-404-ver-wordpress-archivarix.png" caption="Missing files because of the Wordpress ?ver= query params with Archivarix" command="Fit" options="1024x500" >}}

So I coded a small function that loads any available version of a given URL and picks the most recent one.
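
A fallback of that kind could look roughly like the sketch below. It is an illustration under assumptions, not the code running on the blog: the `structure` table, the `filetime` column used as capture timestamp, the example URLs and the helper `findAnyVersion` are hypothetical, and "most recent" is interpreted here as the latest capture.

```php
<?php
// Hypothetical schema and data; `filetime` stands for the capture timestamp.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE structure (url TEXT, filename TEXT, filetime TEXT)');
$insert = $db->prepare('INSERT INTO structure VALUES (?, ?, ?)');
$insert->execute(['http://blog.example/wp-includes/style.css?ver=4.1', 'old.css', '20150101000000']);
$insert->execute(['http://blog.example/wp-includes/style.css?ver=4.7', 'new.css', '20170101000000']);
$insert->execute(['http://blog.example/wp-includes/style.css.map?ver=4.9', 'map.css', '20180101000000']);

/** Find the newest archived copy of $url, whatever its ?ver= value is. */
function findAnyVersion(PDO $db, string $url): ?array
{
    // Drop the query string, then look for any stored URL starting with "<path>?ver=".
    $base = explode('?', $url, 2)[0];
    $prefix = $base . '?ver=';
    // instr(...) = 1 is a "starts with" test that treats % and _ literally, unlike LIKE.
    $stmt = $db->prepare(
        'SELECT url, filename, filetime FROM structure
         WHERE instr(url, :prefix) = 1
         ORDER BY filetime DESC LIMIT 1'
    );
    $stmt->execute([':prefix' => $prefix]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    return $row === false ? null : $row;
}

// The browser asks for a version that was never archived.
$asset = findAnyVersion($db, 'http://blog.example/wp-includes/style.css?ver=5.0');
```

In a loader, this would only run after the exact URL lookup has failed, so pages that were archived with the right version are unaffected.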

Cleaning existing pages

After successfully running my backed-up blog, I wanted to modify some content.

  • Replace the Twitter widget
  • Replace the hoster widget
  • Add a legacy warning that redirects visitors to the new blog

Archivarix has an ARCHIVARIX_INCLUDE_CUSTOM option that relies on regular expressions to replace content, but I needed a more precise approach. I used the PHP XML extension, which has a built-in DOM parser that can parse HTML pages.
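
As a rough idea of what DOM-based cleaning looks like, here is a small sketch using `DOMDocument` and `DOMXPath`. The element id `twitter-widget`, the `legacy-warning` class, the banner text and the function `cleanLegacyPage` are made up for the example; the real pages use their own markup.

```php
<?php
/** Remove a widget by id and prepend a legacy banner. Ids, class and text are hypothetical. */
function cleanLegacyPage(string $html, string $newBlogUrl): string
{
    $doc = new DOMDocument();
    // Archived HTML is rarely valid, so collect parser warnings instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    // Copy the matches into an array first, then remove them from the tree.
    $xpath = new DOMXPath($doc);
    foreach (iterator_to_array($xpath->query('//*[@id="twitter-widget"]')) as $node) {
        $node->parentNode->removeChild($node);
    }

    // Insert the warning as the first child of <body>.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body !== null) {
        $banner = $doc->createElement('div');
        $banner->setAttribute('class', 'legacy-warning');
        $banner->appendChild($doc->createTextNode('This is an archived page. The new blog is at '));
        $link = $doc->createElement('a', 'this address');
        $link->setAttribute('href', $newBlogUrl);
        $banner->appendChild($link);
        $body->insertBefore($banner, $body->firstChild);
    }
    return $doc->saveHTML();
}

$cleaned = cleanLegacyPage(
    '<html><body><div id="twitter-widget">tweets</div><p>Hello</p></body></html>',
    'https://blog.example/'
);
```

Compared with a regular expression, the XPath query targets one element regardless of attribute order or whitespace, which is the precision the regex approach lacks.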

The new blog with Hugo

Go Hugo!