Imagine the situation: you create a website. You hire a webmaster or build it yourself, investing a lot of money and personal time. You upload your creation to a hosting provider and lovingly fill it with content, never thinking that you should save a copy of the site so the data is not lost.
Then one unhappy day you visit your site and it does not work. You start investigating and discover, to your horror, that the data center burned down or the hosting server crashed. Or perhaps a virus destroyed the data. Losing the information on a site is comparable to losing the information on your computer. So how do you keep a copy of a site?
Let us start with a definition. Website archiving is the process of saving the current version of a page or site in an archive so it can be worked with later. Specialized software exists for this purpose. The largest organization in this field is the Internet Archive, which we will discuss below.
For a private archive, you can use offline browsers, tools created specifically for viewing sites without a network connection. They can make local copies of individual web pages or entire sites. These include, for example:
- HTTrack, a cross-platform offline browser whose interface is translated into 29 languages; it can resume interrupted downloads and update an existing mirror of a site.
- Offline Explorer, a shareware program that can download not only individual files or pages but entire sites over FTP, HTTP, HTTPS, RTSP, MMS, and BitTorrent.
- Free Download Manager, a download manager that integrates with all major browsers, has a built-in FTP client, supports the BitTorrent protocol, can create torrent files, and can intercept links from the clipboard.
- Teleport Pro, a closed-source Windows program that can download entire sites.
- Wget, a free non-interactive console program for downloading files and sites. It supports HTTP, HTTPS, and FTP and can also work through an HTTP proxy server. Well suited to Linux.
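To illustrate the core of what these tools do, here is a minimal Python sketch that downloads a single page and saves it locally (the URL and output filename in the usage comment are placeholders). Real offline browsers such as HTTrack or Wget additionally follow the links on each page to mirror a whole site.

```python
import urllib.request
from pathlib import Path

def save_page(url: str, out_file: str) -> int:
    """Download one page and write its raw bytes to a local file.
    Returns the number of bytes written."""
    with urllib.request.urlopen(url) as response:  # fetch the page
        data = response.read()
    Path(out_file).write_bytes(data)
    return len(data)

# Usage (placeholder URL and filename):
# save_page("https://example.com/", "index.html")
```

The saved file can then be opened in any browser, just like the mirrors produced by the tools above.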
Creating a backup on the hosting
You can set up site backups with your hosting provider. To do this, go to the admin panel and find the backup section. Every host has its own control panel, so it is hard to say exactly where that section will be; if you cannot find it, write to technical support.
If your site runs on a CMS such as WordPress, you can save a copy of its database by installing the WP-DB-Backup plugin (en.wordpress.org/plugins/wp-db-backup/) or a similar one. Once the plugin is configured correctly, you will receive a backup every day or every week, as you prefer.
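If you prefer a do-it-yourself approach to scheduled backups, the site's files can be packed into a dated archive with a short script. The sketch below uses only the Python standard library; the directory paths are placeholders you would replace with your own.

```python
import shutil
from datetime import date
from pathlib import Path

def backup_site(site_dir: str, backup_dir: str) -> str:
    """Pack the site directory into a zip archive named after
    today's date and return the path to the created file."""
    Path(backup_dir).mkdir(parents=True, exist_ok=True)
    archive_base = Path(backup_dir) / f"site-{date.today():%Y-%m-%d}"
    # make_archive appends ".zip" to the base name itself
    return shutil.make_archive(str(archive_base), "zip", site_dir)

# Usage (placeholder paths):
# backup_site("/var/www/mysite", "/home/me/backups")
```

Run from cron or a task scheduler, this gives the same daily or weekly cadence as a backup plugin.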
How to save a copy of the site on a computer
You can save the site to your computer using an FTP client. If you use FileZilla, create a backup folder on your computer (any name will do). Connect to the server through the FTP client and simply drag the site's files into that folder to make a full backup of the site.
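The same drag-and-drop FTP backup can be scripted with Python's standard ftplib module. This is a sketch, not a production tool: it fetches every entry in one remote directory and does not recurse into subfolders; the host, credentials, and paths in the usage comment are placeholders.

```python
from ftplib import FTP
from pathlib import Path

def download_dir(ftp, remote_dir: str, local_dir: str) -> list:
    """Download all files from one remote FTP directory into
    local_dir and return the list of names fetched. `ftp` is any
    object with cwd/nlst/retrbinary (e.g. an ftplib.FTP instance)."""
    ftp.cwd(remote_dir)
    Path(local_dir).mkdir(parents=True, exist_ok=True)
    fetched = []
    for name in ftp.nlst():  # list the remote directory
        with open(Path(local_dir) / name, "wb") as fh:
            ftp.retrbinary(f"RETR {name}", fh.write)
        fetched.append(name)
    return fetched

# Usage (placeholder host and credentials):
# with FTP("ftp.example.com") as ftp:
#     ftp.login("user", "password")
#     download_dir(ftp, "/public_html", "backup")
```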
In addition, you can use the Site2ZIP service or the WinHTTrack WebSite Copier program to download a site. How do you view the saved copy? Open the folder where the site was saved and click the index.html file.
In San Francisco in 1996, Brewster Kahle founded the non-profit organization Internet Archive. It collects copies of web pages, audio and video recordings, graphic files, and programs. The collected material is stored for the long term, and its databases are freely accessible to everyone.
If you are wondering how to open a saved copy of a site, go to archive.org/web/ and enter the address of the site or page in the search field. At the end of 2012, the Internet Archive held 10 petabytes of data, that is, 10,000 terabytes, and by mid-2016 it had accumulated 502 billion copies of web pages.
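The Wayback Machine also exposes a small JSON API at archive.org/wayback/available for checking whether a snapshot of a given URL exists. The sketch below builds the query string and extracts the closest snapshot's address from the JSON reply; the example reply in the comments is illustrative, not a real response.

```python
import urllib.parse

API = "https://archive.org/wayback/available"

def availability_query(url: str) -> str:
    """Build the Wayback Machine availability API query for a URL."""
    return API + "?" + urllib.parse.urlencode({"url": url})

def closest_snapshot(reply: dict):
    """Extract the closest snapshot URL from the API's JSON reply,
    or return None if the page has never been archived."""
    snap = reply.get("archived_snapshots", {}).get("closest")
    return snap["url"] if snap else None
```

To actually query the service, fetch `availability_query(...)` with urllib.request.urlopen and pass the parsed JSON (`json.load(response)`) to `closest_snapshot`.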
Search Engine Caching
A "saved copy" of a site in Google is nothing more than a cache of the site's pages made by the search engine. Any user can open a cached copy of a page at any time. Storing these copies on search engine servers consumes significant resources and money, but the effort pays off, since we keep coming back to search engines. However, this method only works for existing sites or ones deleted recently; if a site disappeared long ago, the search engine erases the cached data.
Specialized Search Engine
Besides manually searching for cached pages in Google or Yandex, you can use the specialized search engine cachedview.com. It has an analogue, cachedpages.com.
If you want to keep a copy of a site or an individual page, you can do so yourself, free of charge, at archive.is. It also offers a global search over all versions that users have ever saved.
Creating a web archive in national libraries
Today, national libraries face the task of creating archives of Internet documents that form part of humanity's scientific, cultural, and historical heritage. This turns out to be very difficult.
Studies have shown that the number of documents on the Web grows exponentially, and the average document lives from one to four months. The most convenient accounting unit for a web archive is the website. Building the collection means creating a copy, or "mirror," of a site; since the information on it changes over time, the library must re-mirror the same site at regular intervals.
For example, Sweden has 60,000 websites, 20 times the number of its traditional print media outlets. Copies of printed documents occupy 1.7 km of shelving in the Swedish national library each year; printed out, a web archive would take 25 km of shelves. Its archive currently contains 138 million files with a total weight of 4.5 gigabytes.
The internet grows every day. Many companies and sites take care to keep copies of web pages in their archives, but you should not rely on them alone. Make timely backups and you will never lose your site.