As a scientist, it is crucial to keep growing and learning as you progress in your field. Despite a lackluster background in computer science, I have come to rely more and more on computational skills in the lab. Slowly but surely, the need for a stronger foundation in tech has come to light, and I feel woefully behind the times.
Computers are everywhere, whether you like it or not, and artificial intelligence, robotics, and coding are slowly taking over the world. The writing’s on the wall: DeepMind beat the Go champion, Google put Boston Dynamics up for sale, Darktrace’s machine-learning algorithms are stopping ransomware, and multiple countries are beginning to seriously discuss a universal basic income for their citizens. These are red flags. There are only four broad types of work, and until DeepMind showed it could actually learn and teach itself a task by beating the Go champion, we at least did not have to fear the onset of automation in two of the four.
Things…things are’a changin’…
However ominous that may sound, we as scientists have to be adaptable. Besides, credit where it’s due: technology keeps exceeding our expectations and handing us amazing things at an ever-quickening pace. With that in mind, this post outlines backing up and/or accessing whole websites offline. The internet is beautiful for many reasons, one of which is that among the billions of people on it are talented, brilliant human beings who create great things. If you know where to look you can find them, though sometimes you have to deserve it.
So the question is: can we save a whole website for offline access? I’m not talking about a single page here; no, we want the whole thing. Can we rip it off the nets? Is it possible?
The answer is yes.
Not only can you access it, but with the right tools you can access most of its features too. Look no further; I’ll give you all you need in this quick tutorial.
Tools for downloading a whole website
- HTTrack – This program copies the contents of an entire site. It can even grab the pieces needed to make active code content work offline. A sample command line is sketched after this list.
- Wget – This program is a classic command-line tool developed for this specific task. It is often included with Unix/Linux systems and is available for Windows as well (version 1.13.4 as of this writing). Despite being a free utility, it packs a lot of bang for the buck, allowing non-interactive downloads of files from the Web. It supports HTTP, HTTPS, and FTP, as well as retrieval through HTTP proxies.
Note: Wget requires a bit more comfort at the command line. A good example run is the following command:
wget -r --no-parent http://site.com/songs/
You might also try --mirror instead of -r, and you might want to include -L/--relative so as not to follow links to other servers. If you only get an index page, try adding -r. Here are some of the options explained:
- -p – get all images, etc. needed to display the HTML page
- --mirror – turns on recursion and time-stamping, sets infinite recursion depth, and keeps FTP directory listings
- --html-extension – save HTML documents with .html extensions
- --convert-links – make links in downloaded HTML point to local files
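Putting several of those options together, a run along these lines should leave you with a browsable local copy. This is just a sketch reusing the placeholder URL from the example above, so swap in the site you actually want:

# mirror the songs/ section, grab page requisites, and rewrite links for offline browsing
wget --mirror -p --html-extension --convert-links --no-parent http://site.com/songs/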
There are additional details available as well; see the Wget Manual and its examples. I’ll leave more links in the references below.
- ServerFault – A Q&A community for server administrators; a good place to find server-side approaches to backing up whole websites.
- Internet Download Manager – This tool has a Site Grabber utility with a lot of options, some of which let you completely download any website you want.
- ItSucks – Already a winner in my book, this program is a Java web spider (web crawler) with the ability to download (and resume) files. It can also be customized with regular expressions and download templates, and it offers both a GUI and a console interface.
- WebSnake – A powerful offline browser for the Windows platform. Worked like a charm when I tried it out.
- WebZip – Another good program worth noting.
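As promised above, HTTrack can also be driven from the command line. This is a minimal sketch; example.com, the output folder, and the filter are placeholders you would swap for your own target:

# copy the site into ./example-mirror, staying within the example.com domain
httrack "http://example.com/" -O "./example-mirror" "+*.example.com/*" -v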

A screenshot of the WebZip program doing its thing (4/8/16)
Remember my site if you ever need access to things when you’re off the grid; you never know when you might need my page! As always, let me know if any links die, and if this helps you at all, feel free to comment below.
References
- http://linuxreviews.org/quicktips/wget/ – Wget help #1
- http://www.krazyworks.com/?p=591 – Wget help #2
- http://brew.sh/ – Homebrew installs the stuff you need that Apple didn’t bother giving you, at least as far as they are concerned. I don’t have any Apple products at the moment to mess with, but I will try to set up an OS X machine this summer to see what they are all about. The Homebrew formulae are all written in Ruby; a sample install is sketched below.
- StackExchange – Recent conversation started on Stack Exchange concerning a couple of these programs and this exact issue.
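If you do end up on a Mac, something like the following should pull down the two command-line tools covered above, assuming the wget and httrack formulae are still in Homebrew’s default tap:

# install the command-line downloaders via Homebrew
brew install wget
brew install httrack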