
Backing up your site with git and a makefile

Converting a site into a set of static web pages lets you reduce server load or even use free hosting, and also improves the reliability, speed and security of the site. In this article I will show how to do this with the familiar tools git and Makefile. Another advantage of this approach is version control over the content of the web pages.


The article describes how to produce static versions of web pages for the server to serve and how to put them into a repository for version control and backup. Static and media files can be stored separately and archived by other means (static files are usually kept in the repository with the site's program code). The method also works for pages with Unicode names (for example, on Cyrillic domains). A working Makefile is given at the end.


The author uses a django/uwsgi/nginx stack on a virtual dedicated server running GNU/Linux, but the content of the article is almost independent of the specific technologies.


Downloading the pages


We will download the site's pages with the standard wget program, saving each site into a separate directory (whose name does not have to match the site's domain name).


Inside each directory the pages are downloaded recursively with the wget -r option (it is assumed that every page can be reached by following links from the main one). By default recursion goes five levels deep, but this can be changed with the -l option.
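For instance, a minimal sketch (the domain and depth here are placeholders, not values from the article) that limits recursion to three levels:

# download example.com recursively, following links at most three levels deep
wget -r -l 3 example.com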


If media and static files are stored separately from the text pages, the corresponding directories are excluded with the -X option.


The complete command looks like this:


mkdir primer
cd primer
wget -r -nH -X ,static --restrict-file-names=nocontrol пример.рф

-nH means --no-host-directories. By default wget -r example.com puts everything into an example.com/ directory; this option suppresses the creation of a directory named after the host.


The --restrict-file-names option controls which characters from URLs are escaped when local file names are created. The value nocontrol disables this escaping and is essential for saving pages with Cyrillic links: without it the pages are saved to files with slightly altered names, and it is not obvious how to serve them afterwards. Unfortunately for Windows users, --restrict-file-names=nocontrol does not work there; this is a known problem.


Adding to git


A new repository is created with the git init command. By default it is created inside the current directory, in the .git folder, but we want the server to see only the files that correspond to the names of the site's public pages. The full command that creates a bare repository in the ../.git-primer folder therefore looks like this:


 git init --bare ../.git-primer 

To keep using this non-standard repository, git has to be given the --git-dir and --work-tree options:


 git --git-dir=../.git-primer --work-tree=. add . 

Writing the Makefile


Let's start by declaring our projects:


SITES := example primer

all : $(SITES)

.PHONY : $(SITES)

The SITES variable contains the names of our projects. The default goal is the first target, all, so a single make command is enough to run everything. All targets in SITES are phony (.PHONY): the recipe for each of them is executed regardless of whether a directory with that name exists or when it was last changed.
A basic introduction to make can be found elsewhere; the main reference is the GNU make manual, info make (available in the original and in translation).
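With this layout, usage looks as follows (a sketch; the commands are run from the directory containing the Makefile):

$ make          # archive every site listed in SITES
$ make primer   # archive just one of them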


The rule for each of the projects is as follows:


$(SITES) :
    if [[ -d .git-$@ ]]; \
    then \
        $(get-data); \
        $(mgit) add . && \
        if [[ -n "`$(mgit) status --porcelain`" ]]; then \
            $(mgit) commit -m "Update $@."; \
        fi \
    else \
        $(init-git); \
    fi

This rule is essentially a single shell command.
$@ is an automatic variable containing the name of the current target (for example, primer).
First we check whether the .git-primer directory exists. If it does, we go into the project directory, download the pages and add them to git.
If the content of the pages has not changed, git has nothing to add, but in that case git commit would fail and stop the Makefile. We therefore first call git status with the --porcelain option, which is intended for use in scripts: if the output of git status --porcelain is non-empty, it is safe to commit.
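The same check can be tried by hand in the shell; a minimal sketch using the primer repository from above:

cd primer
git --git-dir=../.git-primer --work-tree=. add .
# commit only if `git status --porcelain` prints something, i.e. the content actually changed
if [[ -n "$(git --git-dir=../.git-primer --work-tree=. status --porcelain)" ]]; then
    git --git-dir=../.git-primer --work-tree=. commit -m "Update primer."
fi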


get-data, mgit and init-git are canned recipes defined in the Makefile. For example, mgit is a git call that specifies the directory with the repository files and the working tree:


define mgit =
    git --git-dir=../.git-$@ --work-tree=.
endef

Canned recipes are useful when the same sequence of commands is needed in several recipes. They may span several lines, each of which is automatically prefixed with a tab (more precisely, with the .RECIPEPREFIX character) when the sequence is used in a recipe; in our example the indentation is only there to make the Makefile easier to read.
When a recipe is executed, each line of the canned sequence is interpreted as a separate recipe line, so in particular it can use the automatic variables of the target being built.
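A minimal illustration of the mechanism (the target names below are made up): one canned recipe is reused by two phony targets, and $@ expands to whichever target is being built:

define say-target =
    @echo "building $@"
    @echo "finished $@"
endef

.PHONY : foo bar
foo bar :
    $(say-target)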


The full makefile looks like this:


SITES := primer example
SERVERHOST := example
# пример.рф in punycode
SERVERHOSTNAME := xn--e1afmkfd.xn--p1ai
SERVERPATH := ~/archive

all : $(SITES)

.PHONY : $(SITES)

# target-specific variables
primer : DOMAIN := пример.рф
primer : EXCLUDEDIRS := ,static
example : DOMAIN := example.com

ifeq ($(SERVERHOSTNAME),$(shell hostname))
# Server
define mgit =
    git --git-dir=../.git-$@ --work-tree=.
endef

define init-git =
    mkdir -p $@ && \
    $(get-data) && \
    git init --bare ../.git-$@ && \
    $(mgit) add . && \
    $(mgit) commit -m "Initial commit of $@."
endef

define get-data =
    cd $@ && \
    wget -r -nH -X $(EXCLUDEDIRS) --restrict-file-names=nocontrol $(DOMAIN)
endef

else
# Workstation
define init-git =
    git clone $(SERVERHOST):$(SERVERPATH)/.git-$@ $@
endef
endif

$(SITES) :
ifeq ($(SERVERHOSTNAME),$(shell hostname))
# Server
    if [[ -d .git-$@ ]]; \
    then \
        $(get-data); \
        $(mgit) add . && \
        if [[ -n "`$(mgit) status --porcelain`" ]]; then \
            $(mgit) commit -m "Update $@."; \
        fi \
    else \
        $(init-git); \
    fi
else
# Workstation
    if [[ -d $@/.git ]]; \
    then \
        cd $@ && git pull; \
    else \
        $(init-git); \
    fi
endif

The fourth block of the Makefile contains target-specific variables: each target can be given its own value of a variable. These values are also passed on to the target's prerequisites and to the canned recipes it uses, so we can be sure that each site's recipe runs with the correct site name and directory.
For each project we can pass the directories that should not be archived via the EXCLUDEDIRS variable, or leave it empty. In the same way you can change the name of the server used for archiving from the workstation (SERVERHOST) and the path on the server to the directory holding the site archives (SERVERPATH). For simplicity, in this example all sites live on the same server and are archived into a single directory.
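Adding another project therefore only takes a couple of lines of target-specific values; in this hypothetical sketch every name is made up for illustration:

SITES := primer example blog

blog : DOMAIN := blog.example.org
blog : EXCLUDEDIRS := /media,/static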
Since each line of a recipe (including lines coming from canned recipes) is executed in a separate shell, we join commands with the && operator and escape the line endings with \ so that a change of directory stays in effect for the commands that follow.
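A small sketch of the difference (pwd is used here only to show which directory the commands end up in):

broken :
    cd primer
    pwd            # a new shell: prints the top directory, the cd above is already forgotten

works :
    cd primer && \
    pwd            # same shell: prints .../primer, the cd is still in effect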


Next comes a Makefile conditional directive: using the shell hostname command we check whether make is running on the server or on the local machine. Lines belonging to the branch that is not taken are ignored by make entirely.


Difference between local and server repositories

The local computer serves primarily to store the data, so we only copy data from the server to it (git pull), and for convenient local work with git (viewing logs or file versions) we use the default repository layout (an ordinary repository in the .git folder).
In both cases a single make command is enough. For automatic copying you can use the cron scheduler, and SSH keys are generated so that the server password does not have to be entered every time.
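A rough sketch of both pieces; the key type, user name and schedule below are assumptions rather than values from the article:

# one-time setup on the workstation ("example" is the SERVERHOST alias from the Makefile)
ssh-keygen -t ed25519
ssh-copy-id user@example

# crontab -e: run the backup every night at 03:00
0 3 * * * cd ~/archive && make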


For convenient work on the server, you can create a git alias that picks up the repository matching the current site directory:


 alias mgit="git --work-tree=. --git-dir=../.git-${PWD##*/}" 

The parameter expansion ${PWD##*/} yields the name of the current directory without its path and is part of the POSIX standard, so it can be used in any POSIX-compliant shell.
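A quick illustration in an interactive shell (the path is just an example):

$ cd /home/user/archive/primer
$ echo "${PWD##*/}"
primer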


Conditional directives can also be used inside recipes; the only restriction is that a conditional cannot begin in one file and end in another.


Server

After running make, the archive directory looks like this:


$ ls -a
.  ..  .git-example  .git-primer  Makefile  example  primer
$ ls -a primer
.  ..  index.html  -  -

The nginx configuration file for пример.рф may look like this:


server {
    server_name xn--e1afmkfd.xn--p1ai;
    charset utf-8;

    location = / {
        root /home/user/archive/primer;
        try_files /index.html =404;
    }

    location / {
        root /home/user/archive/primer;
        default_type "text/html";
        try_files $uri =404;
    }

    location = /index.html {
        return 404;
    }
}

The first location corresponds to the main page of the site пример.рф, which wget saves as the file index.html. If that file is not found, a 404 error is returned.


For all other URIs, nginx looks in the primer directory for a file whose name matches the URI; if none is found, 404 is returned.


Finally, to avoid duplicated content, we explicitly return 404 for the direct link пример.рф/index.html. This configuration is described in a little more detail here.
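Once the site is being served, the behaviour can be spot-checked with curl; a sketch using the punycode host name from the configuration above:

$ curl -I http://xn--e1afmkfd.xn--p1ai/              # main page, served from index.html: 200
$ curl -I http://xn--e1afmkfd.xn--p1ai/index.html    # direct link to index.html: 404
$ curl -I http://xn--e1afmkfd.xn--p1ai/no-such-page  # anything wget did not save: 404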


Conclusion


Sites can be backed up with the standard tools wget, git and make. You can copy all pages of a site, or exclude media and various other files with whatever precision wget allows. Likewise, .gitignore lets you control which static pages are added to the backup repository and which are not. The Makefile makes it easy to manage different configurations for different projects. The complete Makefile example above covers both the client and the server in only about 60 lines.


It is assumed that site content is changed and added through the standard mechanisms, that is, the CMS or CMF is started for that purpose. If this happens rarely, they can be switched off afterwards, freeing system resources while the saved static pages continue to be served. A more complete automation example may deserve a separate article.
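For example, on a systemd-based server the dynamic part of the django/uwsgi stack might be stopped once the static snapshot is in place; the service name here is an assumption, not something given in the article:

# nginx keeps serving the static copy; only the application server is stopped
sudo systemctl stop uwsgi
# start it again when the content needs to be edited through the CMS
sudo systemctl start uwsgi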


The proposed method is intended primarily for small projects that are rarely updated, so questions of performance and security are barely touched on here. Since we told wget not to escape characters in URIs, if arbitrary users can add files to the site, their input must be escaped, or such additions forbidden, from the start.


Versioning the site's contents could also be done at the database level whenever pages change. But that requires version support in the CMF models, as well as tighter control over the database dump (a full copy after every page edit). With the proposed method, a small change in content adds only that change to the repository, and no full database copy is required. In addition, the generated static pages can be served directly or viewed in a browser (changes to the design and other program code of the site are copied as well).


Alternative backup programs are listed here. For storing and synchronising media files it is worth looking at git-annex. Separating the .git repository from the working tree is also used successfully to manage user configuration files (dotfiles). These days there are also hosting servers that support working with git repositories directly.



Source: https://habr.com/ru/post/425259/

