---
title: "How this blog is made, Archivarix, some PHP and Hugo"
date: 2022-11-06T15:07:16+01:00
draft: true
toc: true
tags: ["youpi"]
---

## The legacy of my blog, recovered with Archivarix

For me this year was a rollercoaster, and I forgot that my blog was hosted on a very old Online.net Dedibox server, now called Scaleway. This server was being decommissioned, and I missed the 3 emails announcing the end of my services. Online.net then decided that it was a good idea to also delete the backup spaces attached to these machines. To sum up, I lost my blog and the recent backups.

But I wanted to keep it, and at least serve the existing content that was linked from search engines and other websites. I looked on the Wayback Machine and my blog was in it. I found a cool all-in-one service called Archivarix that restores an entire website from the Wayback Machine; it cost around 10€. I got back a 300MB zip archive with a lot of content: some images are missing, but all the articles were there.

## Running the Archivarix Loader

The Archivarix loader is a single PHP file backed by a SQLite database that holds all your URLs, while the content itself is stored as files in `www/.content.EZtzwPjb/binary/`. Each time an HTTP request is processed, the script looks up a matching URL in the database and serves the content linked to it. This mini CMS is licensed under the GPL, and I put a copy [here](https://home.hugopoi.net/gitea/hugopoi/blog.hugopoi.net/src/branch/master/www/index.php).

### With Docker

You need PHP with the SQLite extension; the PHP Docker image already includes it. I wrote a small docker-compose file to run Archivarix, so it is as simple as running `docker compose up`.

* [docker-compose.yml](https://home.hugopoi.net/gitea/hugopoi/blog.hugopoi.net/src/branch/master/docker-compose.yml)

* [nginx.conf](https://home.hugopoi.net/gitea/hugopoi/blog.hugopoi.net/src/branch/master/nginx.conf)
  


 {{< figureCupper
img="Screenshot 2022-11-20 at 18-54-58 HugoPoi – Internet Hardware et Bidouille.png"
caption="First run of my old website"
command="Fill"
options="1024x500 Top" >}}
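
Whichever way you run it, each request goes through the lookup described above: find the URL in SQLite, then send the matching file from the content folder. Here is a minimal PHP sketch of that idea, not the loader's actual code. The `structure` table, its `url`, `filename` and `mimetype` columns, and both function names are assumptions made for illustration; the real `index.php` does more than this.

```php
<?php
// Hypothetical sketch of an Archivarix-style lookup, NOT the real index.php.
// Assumed schema: structure(url TEXT, filename TEXT, mimetype TEXT).

// Return the stored row for an exact URL, or null when it is not archived.
function lookupStoredResponse(PDO $db, string $url): ?array
{
    $stmt = $db->prepare('SELECT filename, mimetype FROM structure WHERE url = :url LIMIT 1');
    $stmt->execute([':url' => $url]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    return $row === false ? null : $row;
}

// Send the archived file for $url from $contentDir (for example the
// `.content.*/binary/` folder), or answer 404 when nothing matches.
function serveFromArchive(PDO $db, string $contentDir, string $url): bool
{
    $row = lookupStoredResponse($db, $url);
    // basename() keeps a stored filename from escaping $contentDir
    $path = $row === null ? null : rtrim($contentDir, '/') . '/' . basename($row['filename']);
    if ($path === null || !is_file($path)) {
        http_response_code(404);
        return false;
    }
    header('Content-Type: ' . $row['mimetype']);
    readfile($path);
    return true;
}
```

In a front controller, `$url` would be rebuilt from the request (host plus request URI) and the PDO handle opened on the loader's SQLite file; an exact-match lookup like this is also why the WordPress `?ver=` URLs described further down can miss.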


### With YunoHost

Most of my self-hosting runs on the YunoHost project; the next steps show
how to easily add a small PHP app to your YunoHost instance.

1. Install the `My Webapp` application from the YunoHost admin panel.

    {{< figureCupper
    img="Screenshot from 2022-11-21 18-40-13.png"
    caption="Installing My Webapp from the YunoHost admin panel"
    command="Fit"
    options="1024x500" >}}

1. Fill in the setup form.

    {{< figureCupper
    img="Screenshot 2022-11-21 at 18-41-13 Install my_webapp _ Catalog YunoHost Admin.png"
    caption="The My Webapp install form"
    command="Fill"
    options="1024x500" >}}

1. You now have an empty app in `/var/www/my_webapp/www/`.

    {{< figureCupper
    img="Screenshot from 2022-11-21 18-42-30.png"
    caption="The freshly installed, empty app"
    command="Fit"
    options="1024x500" >}}

1. Copy your files into it; I use rsync with the YunoHost `admin` account:

    `rsync -rlgoD --checksum --verbose www/ admin@home.hugopoi.net:/var/www/my_webapp/www/`

1. Then you might need to run `chmod 664 /var/www/my_webapp/www/.content.*/structure.*`, because Archivarix requires write access to its SQLite files.
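
If the app still fails after the `chmod`, a quick way to see what the PHP user cannot write is a check like the hypothetical one below. It is not part of Archivarix or YunoHost and only assumes the `.content.*/structure.*` layout shown above. It also checks the parent folders, because SQLite normally creates a journal file next to the database while writing, so a writable file inside a read-only folder can still fail. Run it as the same system user PHP uses for the app.

```php
<?php
// Hypothetical helper, not part of Archivarix: list the loader's SQLite
// files under $wwwDir, using the `.content.*/structure.*` layout.
function findStructureFiles(string $wwwDir): array
{
    $files = glob(rtrim($wwwDir, '/') . '/.content.*/structure.*');
    return $files === false ? [] : $files;
}

// Return every file or parent folder the current PHP user cannot write.
// Folders matter too: SQLite writes its journal next to the database file.
function findUnwritablePaths(array $files): array
{
    $problems = [];
    foreach ($files as $file) {
        if (!is_writable($file)) {
            $problems[] = $file;
        }
        $dir = dirname($file);
        if (!is_writable($dir) && !in_array($dir, $problems, true)) {
            $problems[] = $dir;
        }
    }
    return $problems;
}

// Example run against the My Webapp folder used above.
foreach (findUnwritablePaths(findStructureFiles('/var/www/my_webapp/www')) as $path) {
    echo "not writable: $path\n";
}
```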


## Modding Archivarix Loader

### Fixing missing WordPress versioned assets

The homepage looked good, but some WordPress CSS and JavaScript assets were missing. WordPress appends a `?ver=` query parameter to these asset URLs, and the archive did not contain every requested version.

 {{< figureCupper
img="Screenshot 2022-11-20 at 18-55-57 Linky opendata my ass – HugoPoi.png"
caption="First run of my old website, with some broken CSS"
command="Fill"
options="1024x500 Top" >}}

{{< figureCupper
img="Screenshot-Firefix-debugger-404-ver-wordpress-archivarix.png"
caption="Missing files because of the WordPress `?ver=` query parameter with Archivarix"
command="Fit"
options="1024x500" >}}

So I wrote a little function that loads any other version available
for a given URL, and takes the most recent one.

* [`getOtherWordpressVersionUrls` function in www/index.php](https://home.hugopoi.net/gitea/hugopoi/blog.hugopoi.net/src/commit/2a154a6eea510e08b2608fd55f6729056c363b25/www/index.php#L295-L305)
  


* [The call in www/index.php](https://home.hugopoi.net/gitea/hugopoi/blog.hugopoi.net/src/commit/2a154a6eea510e08b2608fd55f6729056c363b25/www/index.php#L609-L614)
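
Here is a minimal PHP sketch of that fallback, not the code from the linked commit: given the URL the page asked for, it looks for stored URLs that differ only by their `ver` parameter. The function names and the `$knownUrls` list (which would come from the loader's SQLite database) are hypothetical, and "most recent" is assumed to mean the highest `ver` according to `version_compare()`; the real function may rank candidates differently.

```php
<?php
// Hypothetical sketch, not the author's getOtherWordpressVersionUrls().

// Rebuild $url without its `ver` query parameter, keeping other parameters.
function stripVerParam(string $url): string
{
    $parts = parse_url($url) ?: [];
    $query = [];
    parse_str($parts['query'] ?? '', $query);
    unset($query['ver']);
    $base = ($parts['scheme'] ?? 'http') . '://' . ($parts['host'] ?? '')
        . (isset($parts['port']) ? ':' . $parts['port'] : '')
        . ($parts['path'] ?? '/');
    return $query === [] ? $base : $base . '?' . http_build_query($query);
}

// Read the `ver` query parameter, or '' when there is none.
function verOf(string $url): string
{
    $query = [];
    parse_str((string) parse_url($url, PHP_URL_QUERY), $query);
    $ver = $query['ver'] ?? '';
    return is_string($ver) ? $ver : '';
}

// Among $knownUrls, pick the same asset with the highest `ver`, or null.
function findLatestVersionUrl(string $requested, array $knownUrls): ?string
{
    $target = stripVerParam($requested);
    $best = null;
    foreach ($knownUrls as $candidate) {
        if (stripVerParam($candidate) !== $target) {
            continue; // a different asset
        }
        if ($best === null || version_compare(verOf($candidate), verOf($best), '>')) {
            $best = $candidate;
        }
    }
    return $best;
}
```

In a loader, a helper like this would only run after the exact URL lookup misses, and the chosen URL would then be served in place of the requested one.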
  



### Cleaning existing pages

Once my restored blog was running, I wanted to modify some
content:

* Replace the twitter widget
* Replace the hoster widget
* Add a legacy warning redirecting visitors to the new blog

Archivarix has an `ARCHIVARIX_INCLUDE_CUSTOM` option relying
on regular expressions to replace content, but I needed a more precise approach. I used the DOM parser from the PHP XML extension, which can also
parse HTML pages.

* [The simple config to add, replace, or delete HTML parts](https://home.hugopoi.net/gitea/hugopoi/blog.hugopoi.net/src/commit/cd94d82c1a3dad22b026c9c26311b366f76dcd54/www/index.php#L97-L133)
  

* [The clever mod](https://home.hugopoi.net/gitea/hugopoi/blog.hugopoi.net/commit/598d28551d071774007172782541e1d140b8a3c1#diff-eb630ac88267e24589fd94de0826721dff38beb4)
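
The idea behind those links can be sketched in a few lines of PHP with `DOMDocument` and `DOMXPath`: each rule selects nodes with an XPath expression and deletes them, replaces them, or prepends a snippet inside them. This is a hypothetical sketch, not the code from those commits; the rule format (`action`, `xpath`, `html`), the function names and the example selectors are assumptions, and inserted snippets must be well-formed markup because they go through `DOMDocumentFragment::appendXML()`.

```php
<?php
// Hypothetical sketch of DOM-based page mods; not the author's actual code.

// Build a fragment from a small snippet; appendXML() requires well-formed markup.
function fragmentFromHtml(DOMDocument $doc, string $markup): DOMDocumentFragment
{
    $fragment = $doc->createDocumentFragment();
    if (!$fragment->appendXML($markup)) {
        throw new InvalidArgumentException("Snippet is not well-formed: $markup");
    }
    return $fragment;
}

// Apply rules shaped like ['action' => 'delete'|'replace'|'prepend',
// 'xpath' => '...', 'html' => '...'] and return the modified page.
function applyHtmlMods(string $html, array $rules): string
{
    $doc = new DOMDocument();
    // Real-world pages trigger libxml warnings (HTML5 tags, stray markup):
    // collect them quietly instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html); // assumes the page declares its charset in a <meta> tag
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    foreach ($rules as $rule) {
        $nodes = $xpath->query($rule['xpath']);
        if ($nodes === false) {
            continue; // invalid XPath expression
        }
        // Copy the matches to an array before changing the tree.
        foreach (iterator_to_array($nodes) as $node) {
            $parent = $node->parentNode;
            if ($parent === null) {
                continue;
            }
            switch ($rule['action']) {
                case 'delete':
                    $parent->removeChild($node);
                    break;
                case 'replace':
                    $parent->insertBefore(fragmentFromHtml($doc, $rule['html']), $node);
                    $parent->removeChild($node);
                    break;
                case 'prepend':
                    $node->insertBefore(fragmentFromHtml($doc, $rule['html']), $node->firstChild);
                    break;
                default:
                    throw new InvalidArgumentException("Unknown action: {$rule['action']}");
            }
        }
    }
    return (string) $doc->saveHTML();
}
```

Hooked into a loader, a function like this would run on each HTML response just before it is sent, with rules such as deleting the Twitter widget and prepending the legacy warning to `<body>`.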
  






## The new blog with Hugo

Go Hugo!