Web pages are ephemeral—existing on someone else’s computer, and under someone else’s control. Information you rely on and need may endure for decades only to disappear overnight when you need it most. With Archivy you can easily save webpages as Markdown, then organize and edit them on your own system. Yours for eternity. Here’s how.
Why Would You Want to Build Your Own Archive?
Almost all the world’s information is available online: Wikipedia is the largest encyclopedia ever created, and MakeUseOf.com hosts excellent technical articles which show you how to do cool and interesting things. If you like an article, it’s easy enough to bookmark it in your browser to visit later, and if you have a connected account with Google or another service, you can access your bookmarks on any device.
But web pages disappear, sites reorganize their linking structures, and often pages are updated to reflect the latest news, technology, and data. You may bookmark a set of instructions for a particular software version, only to return months later and discover that the steps have changed to suit the latest version. If you want to be able to rely on and return to the information you find online, it’s best to keep your own copy offline.
What Is Archivy?
Archivy is one of several offline archiving solutions which you can run on your Raspberry Pi. Some, such as ArchiveBox, will scrape websites and save the output in a variety of formats, including HTML, PDF, and screenshots.
Archivy is a personal archive based around a tree structure of Markdown documents. You can create branching folders, and when you add a bookmark, Archivy scrapes the webpage and converts the text to Markdown for you. It also turns the headings into a clickable table of contents and, in some cases, automatically downloads the images and stores them on your Pi.
You can edit the Markdown, add notes and tags to make the archive work for you, and even add standalone notes of your own thoughts and musings. It’s more than a web archive: it’s a personal archive you can access from anywhere.
How to Install Archivy on Your Raspberry Pi
Archivy is a Python app designed to be accessed through a browser, so before you start, you will need to set your Raspberry Pi up as a web server. If you don’t already have Python and pip installed on your Raspberry Pi, install them now.
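If you are not sure whether they are installed, a quick check like the one below can tell you. This is a minimal sketch: the helper name have is made up for illustration, and it assumes the tools are exposed as python3 and pip3, as they usually are on Raspberry Pi OS.

```shell
# Return success if the named command is available on PATH.
have() { command -v "$1" >/dev/null 2>&1; }

# Report the status of each tool the install step relies on.
for tool in python3 pip3; do
    if have "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: missing"
    fi
done

# On Raspberry Pi OS, missing tools can typically be installed with:
#   sudo apt install python3 python3-pip
```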
While Archivy can use Elasticsearch to help you search and manage your archive, it also works well with ripgrep. Install ripgrep with:
sudo apt install ripgrep
Now you can install Archivy with:
pip install archivy
Create a new directory where Archivy will store its data:
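The command for this step is missing here. A minimal sketch, assuming a hypothetical directory called archivy-data in your home folder (the name is purely illustrative, and any location your user can write to should do):

```shell
# Hypothetical location for Archivy's data; change it to suit your setup.
DATA_DIR="$HOME/archivy-data"
mkdir -p "$DATA_DIR"

# Print the full path, which the setup wizard asks for later.
echo "$DATA_DIR"
```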
Now to configure your system and create an admin user.
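The wizard command itself is not shown at this point. Assuming Archivy's documented command-line interface, where archivy init launches the interactive setup, a cautious version that first checks the command is on your PATH might look like this:

```shell
# Assumption: `archivy` is the command installed by `pip install archivy`.
if command -v archivy >/dev/null 2>&1; then
    archivy init
else
    # User-level pip installs commonly land in ~/.local/bin.
    echo "archivy not found on PATH; try adding ~/.local/bin to PATH" >&2
fi
```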
…will start the wizard
The wizard will ask you for the full path of your data directory, and whether you want to be able to use search. Type “ripgrep” at the prompt when asked what type you want to use. When asked if you want to create an admin user, enter “y”.
You can start Archivy running with:
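The start command is missing at this point. Archivy's command-line interface documents archivy run for launching the server, so the sketch below assumes that command and checks it is available before calling it:

```shell
# Assumption: `archivy run` starts the web server and stays in the
# foreground until you stop it with Ctrl + C.
if command -v archivy >/dev/null 2>&1; then
    archivy run
else
    echo "archivy not found on PATH; check the pip install step" >&2
fi
```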
Archivy runs on port 5000, and you can access it by entering:
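The address is missing here, but it is made up of your Pi's local IP address and the port. If you don't know the IP, a sketch like this one, run on the Pi itself, should print the address to use. The function name archivy_url is invented for illustration, and hostname -I is assumed to be available, as it is on Raspberry Pi OS.

```shell
# Build the address of the Archivy web interface from an IP and a port.
archivy_url() {
    printf 'http://%s:%s\n' "$1" "${2:-5000}"
}

# `hostname -I` lists this machine's IP addresses; take the first one,
# and fall back to a placeholder if the command is unavailable.
PI_IP="$(hostname -I 2>/dev/null | awk '{print $1}' || true)"
archivy_url "${PI_IP:-your-pi-ip-address}"
```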
…into a browser on your local network.
If you want to access your Archivy archive from outside your house, create a new Apache configuration file:
cd /etc/apache2/sites-available
sudo nano archivy.conf
In this new file, enter:
ProxyPass / http://127.0.0.1:5000/
ProxyPassReverse / http://127.0.0.1:5000/
Save and exit with Ctrl + O then Ctrl + X. Then restart Apache with:
sudo service apache2 restart
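For the proxy to take effect, Apache also needs its proxy modules and the new site enabled, which the steps above don't show. The sketch below writes a fuller version of the file to a scratch location so you can review it first. The VirtualHost wrapper, the ProxyPreserveHost line, and the domain archive.example.com are assumptions for illustration, so substitute your own domain or subdomain.

```shell
# Scratch copy of the site config; review it before copying it into place.
CONF_FILE="${CONF_FILE:-/tmp/archivy.conf}"
SERVER_NAME="${SERVER_NAME:-archive.example.com}"  # hypothetical domain

cat > "$CONF_FILE" <<EOF
<VirtualHost *:80>
    ServerName $SERVER_NAME
    ProxyPreserveHost On
    ProxyPass / http://127.0.0.1:5000/
    ProxyPassReverse / http://127.0.0.1:5000/
</VirtualHost>
EOF
echo "Wrote $CONF_FILE"

# Then, on the Pi (Debian-style Apache layout assumed):
#   sudo cp /tmp/archivy.conf /etc/apache2/sites-available/archivy.conf
#   sudo a2enmod proxy proxy_http
#   sudo a2ensite archivy.conf
#   sudo service apache2 restart
```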
Obtain a new security certificate from Let’s Encrypt using Certbot.
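The exact command is not shown here. Assuming Certbot and its Apache plugin are installed, Certbot's documented --apache option lets it read your Apache sites and offer them for selection, so the step might look like this:

```shell
# Assumption: Certbot and its Apache plugin are installed, for example with
#   sudo apt install certbot python3-certbot-apache
if command -v certbot >/dev/null 2>&1; then
    sudo certbot --apache
else
    echo "certbot not found on PATH; install it first" >&2
fi
```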
Certbot will present you with a list and ask you to select which site you want a security certificate for. Enter the appropriate number and hit Return, and Certbot will check that everything is in order and create a certificate and key file on your system. Choose “redirect” when asked, then restart Apache once again.
Now when you visit your domain or subdomain, Archivy will be served over an encrypted connection.
Use Archivy to Archive the Internet and Your Ideas
Log into Archivy with the admin username and password, and you’ll see there’s only one folder: root. You can create a new sub-folder by typing a name into the field next to Create sub directory, then clicking the button. Subdirectories are nested, and you can carry on as deep as you like. A tree diagram is generated on the left of the screen to help you navigate the structure.
To add a webpage to your archive, click on the New Bookmark button. You’ll be asked for the URL, and to specify tags. You don’t have to add tags, but it helps for navigation. When you’re ready, hit Save, and Archivy will scrape the page and generate a formatted Markdown document, complete with tags and ToC.
You can change the layout of the document by clicking the edit button and using standard Markdown formatting to tailor it precisely. You can add extra tags by wrapping your new tag in “#” characters, one on each side, anywhere within the document. If you click on any of the tags, you will see a list of other archived articles with the same tag. To add a file or note of your own, click New Note and enter the Markdown directly.
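Because the archive lives on your own system, you can also search it from the command line. This is a minimal sketch, assuming your notes are stored as .md files under a hypothetical ~/archivy-data directory and that inline tags take the form #tag#. The function name notes_with_tag is made up for illustration.

```shell
# Hypothetical location; use the data directory you gave the setup wizard.
DATA_DIR="${DATA_DIR:-$HOME/archivy-data}"

# List the Markdown files that contain the inline tag #<name>#.
notes_with_tag() {
    # -l prints matching file names only; -F treats the pattern literally.
    find "$DATA_DIR" -name '*.md' -exec grep -lF -- "#$1#" {} +
}

# Example: notes_with_tag raspberry-pi
```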
Archivy is still a work in progress, so you can expect new features to be added in the future, and as it’s an open source project, you can even contribute to the code yourself.
Use Your Raspberry Pi for More!
The Raspberry Pi is an extraordinarily versatile machine, and performs extremely well as a server. The Raspberry Pi 4 in particular can handle an exceptional workload, and is able to run dozens of sites and services at the same time. Whatever your interests, from cooking to coding, archiving to audiobooks, there’s a self-hosted solution which will run on your Raspberry Pi.