Creating a WARC archive for a WordPress Blog

I have been working on a few new concepts for maxbronsema.com but wanted to create a copy of the site, mostly the blog, before implementing these new ideas. My blog has been around for a long time and I don’t like the idea of deleting a lot of my writing. One day my family may want to read it. In my daily work I have some experience creating site copies, but I had not tried it with a blog. For the longest time as a web developer I have been OK with my work being ephemeral: a site or project is built and then it is a living entity, changing every day and occasionally being rebuilt entirely with no trace of the prior work remaining. I am not sure I am comfortable with that concept any longer, as more and more of the web I grew up with, and the parts I created, are lost.

The primary goal was to create a copy in the WARC format. WARC stands for Web ARChive, a file format that stores the HTTP requests and responses from a crawl in a single file. For a long time it has been straightforward to create a WARC file but cumbersome to view one. With the command-line utility wget, creating a copy of a site takes a single command.

$ wget --mirror --convert-links --adjust-extension --page-requisites --no-parent --warc-file="name-of-file-you-wish-to-create" https://yoururl.com
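If it helps to see what each flag is doing, here is a rough sketch of the same command wrapped in a small shell function, one flag per line. The function name archive_site and the DRY_RUN switch are my own inventions for illustration, not part of wget; the flags themselves are the ones from the command above.

```shell
# archive_site URL NAME
# Hypothetical helper: runs the wget command from this post.
# Set DRY_RUN=1 to print the command instead of running it.
archive_site() (
  url=$1
  name=$2
  set --                               # start with an empty argument list
  set -- "$@" --mirror                 # recurse through the whole site
  set -- "$@" --convert-links          # rewrite links so the local copy browses offline
  set -- "$@" --adjust-extension       # add .html/.css to saved files that lack it
  set -- "$@" --page-requisites        # also fetch images, stylesheets, and scripts
  set -- "$@" --no-parent              # never climb above the starting directory
  set -- "$@" --warc-file="$name"      # additionally write NAME.warc.gz
  set -- "$@" "$url"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "wget $*"
  else
    wget "$@"
  fi
)

# Example (prints the command without touching the network):
#   DRY_RUN=1 archive_site https://yoururl.com my-blog
```

Note that wget compresses WARC output by default, so the file that lands on disk is named with a .warc.gz suffix.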

The awesome Archive Team has a nice write-up on using wget for just this purpose and, importantly, on why the WARC format is worth using.

Once you have generated a WARC of your site, you can store it in your favorite backup destination, plus a second location in case that copy gets destroyed. You can view it at any time using the amazing ReplayWeb.page service, which is part of the great Webrecorder project.
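Before copying the archive anywhere, it may be worth checking that the file is intact and recording a checksum to compare the backups against later. This is only a sketch: verify_warc is a made-up helper name, and it assumes gzip and the GNU coreutils sha256sum are installed (on macOS, shasum -a 256 would be the rough equivalent).

```shell
# verify_warc FILE
# Hypothetical helper: sanity-check a compressed WARC and save its checksum.
verify_warc() {
  # gzip -t reads the whole file and fails if it is truncated or corrupt.
  if ! gzip -t "$1"; then
    echo "corrupt: $1" >&2
    return 1
  fi
  # Keep the checksum next to the archive so every copy can be checked against it.
  sha256sum "$1" > "$1.sha256"
  echo "ok: $1"
}

# Example:
#   verify_warc my-blog.warc.gz
#   sha256sum -c my-blog.warc.gz.sha256   # re-check a copy later
```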

Go forth and create personal archives of your writing and work.
