
Why and how we back up GitHub



Let me start by philosophizing a little about technology. Technology lets us focus on the result, on the ultimate goal, and gives us a sense of control. Here you are in a snow-white jacket on the bridge of your high-tech liner. Your liner is equipped with everything you need to deal with any problem. You are not afraid of waves, icebergs, or even drunken boatswains.

In fact, this lyrical introduction was inspired by a very specific story about a broken GitHub. At the dawn of one of our projects, we set up synchronization from an in-house repository to GitHub, and that solved a migration problem. Then everyone forgot about this crutch. The ancient evil fell asleep and patiently waited in the wings. One fine day, a new employee decided to tidy up that very in-house repository. And the most popular question among programmers that day was: "Hey, have you seen my 0022 branch? You know, the one with the bug fixes." Management was more relaxed than ever: git is a distributed system, and a copy of the code is stored on every developer's computer. Sort it out among yourselves somehow, and don't distract us from building our kernels and tuning network stacks.
And yet, why back up?

Indeed, you can imagine any number of possible problems, for example:



So, what to back up, and how

In short, our cunning plan of action:

  1. Get the list of the organization's repositories.
  2. Clone every repository in that list.
  3. Archive the clones.
  4. Upload the archive to AWS S3.
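The article shows code for steps 1 and 2 but not for steps 3 and 4. A minimal sketch of those two steps, assuming all clones sit in one local directory and that the third-party boto3 package (the AWS SDK for Python) is installed and configured with credentials. The function names, the timestamped archive name and the "github-backup/" key prefix are hypothetical choices, not taken from the original script.

```python
import os
import tarfile
import time


def archive_directory(directory, archive_dir):
    """Step 3 (hypothetical helper): pack `directory` into a timestamped .tar.gz."""
    base = os.path.basename(os.path.normpath(directory))
    stamp = time.strftime("%Y%m%d-%H%M%S")
    path = os.path.join(archive_dir, "%s-%s.tar.gz" % (base, stamp))
    with tarfile.open(path, "w:gz") as tar:
        # arcname keeps member paths relative to the backup root
        tar.add(directory, arcname=base)
    return path


def upload_to_s3(path, bucket, prefix="github-backup/"):
    """Step 4 (hypothetical helper): upload the archive; assumes boto3 is configured."""
    import boto3  # third-party; imported lazily so archiving works without it
    key = prefix + os.path.basename(path)
    boto3.client("s3").upload_file(path, bucket, key)
    return key
```

Keeping dated archives, rather than a single continuously synced copy, is what protects against the story above: a deletion that propagates to the clones does not reach archives made earlier. The archive directory should lie outside the backup directory, or the archive would try to include itself.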


A bit more detail, using github.com as the example


It makes sense to have a separate read-only user for the backup procedure. You also need to generate a token for that user (Settings -> Personal Access Tokens -> Generate new token).
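The GitHub functions in this article read their settings from an args object, but the argument parser itself is not shown. A hypothetical parser could look like the sketch below; the flag names and defaults are assumptions, chosen only to match the attributes the code uses (token, organization, directory, mirror, git).

```python
import argparse


def parse_args(argv=None):
    """Hypothetical command line for the backup script; flag names are assumptions."""
    parser = argparse.ArgumentParser(
        description="Back up all repositories of a GitHub organization")
    parser.add_argument("--token", required=True,
                        help="personal access token of the read-only backup user")
    parser.add_argument("--organization", required=True,
                        help="GitHub organization whose repositories are backed up")
    parser.add_argument("--directory", default="github-backup",
                        help="local directory for the clones (wiped on every run)")
    parser.add_argument("--mirror", action="store_true",
                        help="clone with --mirror to keep all branches and tags")
    parser.add_argument("--git", default="",
                        help="extra options passed to 'git clone'")
    return parser.parse_args(argv)
```

Because the value of --git itself starts with dashes, it has to be passed with an equals sign, for example --git=--quiet.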

First, using pygithub3, we get the list of repositories we are going to back up:
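```python
from pygithub3 import Github


def get_repos(args):
    # Authenticate with the backup user's personal access token
    config = {'token': args.token}
    gh = Github(**config)
    # list_by_org() returns a paginated result; .all() collects every page
    return gh.repos.list_by_org(args.organization).all()
```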
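If you would rather not depend on pygithub3, the same list can be fetched from the GitHub REST API endpoint GET /orgs/{org}/repos using only the standard library. A sketch under that assumption: list_org_repos and its fetch parameter are hypothetical names, and fetch is injectable only so the paging logic can be exercised without network access.

```python
import json
import urllib.request


def _fetch_json(url, token):
    """GET a GitHub API URL with token authentication and decode the JSON body."""
    request = urllib.request.Request(url, headers={
        "Authorization": "token %s" % token,
        "Accept": "application/vnd.github+json",
    })
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def list_org_repos(organization, token, fetch=_fetch_json, per_page=100):
    """Return the names of all repositories of an organization, page by page."""
    names = []
    page = 1
    while True:
        url = ("https://api.github.com/orgs/%s/repos?type=all&per_page=%d&page=%d"
               % (organization, per_page, page))
        batch = fetch(url, token)
        names.extend(repo["name"] for repo in batch)
        # A short (or empty) page means there is nothing left to fetch
        if len(batch) < per_page:
            return names
        page += 1
```

Unlike get_repos, this returns plain names rather than objects with a .name attribute, so a clone loop fed by it would use the name directly instead of repo.name.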


For cloning, we use the command-line git client:
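```python
import os
import shlex
import shutil
import subprocess


def clone_repo(repo_list, args):
    # Start from a clean backup directory on every run
    if os.path.isdir(args.directory):
        shutil.rmtree(args.directory)
    os.mkdir(args.directory)

    # Extra 'git clone' options; build a new list instead of mutating args.git,
    # so calling the function twice does not append --mirror twice
    git_args = shlex.split(args.git or "")
    if args.mirror:
        git_args.append("--mirror")

    for repo in repo_list:
        # The token acts as the username; x-oauth-basic is a placeholder password
        repo_url = "https://%s:x-oauth-basic@github.com/%s/%s.git" % (
            args.token, args.organization, repo.name)
        # An argument list (no shell) avoids quoting problems; check=True raises
        # on a failed clone so a broken backup is not archived silently
        subprocess.run(["git", "clone", *git_args, repo_url,
                        os.path.join(args.directory, repo.name)],
                       check=True)
```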

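Note the "--mirror" option: it creates a mirror of the remote repository, that is, a bare clone containing all of its branches, tags and other refs rather than a working copy of a single branch.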
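The clone function above deletes the backup directory and clones everything from scratch on each run. For a directory that already holds "--mirror" clones, an alternative is to fetch only what changed: "git remote update --prune", run inside a mirror, updates all of its refs and drops those deleted upstream. A sketch of that approach; mirror_command and refresh_or_clone are hypothetical helpers. Keep in mind that pruning also copies upstream branch deletions into the mirror, so dated archives are still what preserves deleted branches.

```python
import os
import subprocess


def mirror_command(repo_url, target):
    """Build the git command that creates or refreshes a mirror of repo_url in target."""
    if os.path.isdir(target):
        # Existing mirror: fetch all refs, removing those deleted on the remote
        return ["git", "-C", target, "remote", "update", "--prune"]
    # No mirror yet: make a bare copy with every branch and tag
    return ["git", "clone", "--mirror", repo_url, target]


def refresh_or_clone(repo_url, target):
    """Run the command from mirror_command; raises CalledProcessError on failure."""
    subprocess.run(mirror_command(repo_url, target), check=True)
```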

By the way, if you use bitbucket.org instead:


Get the list of repositories:
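```python
import base64
import json
import urllib.request


def _get_repositories(owner, username, password):
    # HTTP Basic authentication header built from the account credentials
    auth_value = base64.b64encode(
        ('%s:%s' % (username, password)).encode('utf-8')).decode('ascii')
    headers = {'Authorization': 'Basic %s' % auth_value}
    # role=member: repositories the user is a member of
    url = 'https://bitbucket.org/api/2.0/repositories/%s?role=member' % owner
    values = []
    # API 2.0 responses are paginated: each page has 'values' and, except
    # for the last page, a 'next' URL
    while url is not None:
        request = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(request) as response:
            data = json.loads(response.read().decode('utf-8'))
        values.extend(data['values'])
        url = data.get('next')
    return values
```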


Then clone each of them:
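```python
import os
import subprocess
import sys
from urllib.parse import quote


def _git_clone(username, password, directory, sub_dir_name, owner, slug, verbose=False):
    # Percent-encode credentials so characters such as '@' or ':' do not break the URL
    url = 'https://%s:%s@bitbucket.org/%s/%s.git' % (
        quote(username, safe=''), quote(password, safe=''), owner, slug)
    # cwd= instead of os.chdir() leaves the caller's working directory alone;
    # run() collects the output itself, avoiding the wait()-before-read() deadlock
    proc = subprocess.run(['git', 'clone', '--mirror', url, sub_dir_name],
                          cwd=directory, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, universal_newlines=True)
    # Print the repository name underlined with '=', followed by git's output
    sys.stdout.write('%s%s%s%s' % (sub_dir_name, os.linesep, '=' * len(sub_dir_name), os.linesep))
    sys.stdout.write('%s%s' % (proc.stdout, os.linesep))
    return proc.returncode
```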
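Both the GitHub and the Bitbucket clone functions put credentials into the clone URL, and git stores that URL as given in the clone's configuration, so the token or password ends up inside every backup archive. One way to avoid that is to rewrite the remote URL after cloning. A sketch; strip_credentials and forget_credentials are hypothetical helpers, and later updates of such a mirror would then need credentials supplied another way, for example through a git credential helper.

```python
import subprocess
from urllib.parse import urlsplit, urlunsplit


def strip_credentials(url):
    """Return url without its user:password@ part."""
    parts = urlsplit(url)
    # netloc is "user:password@host[:port]"; keep only what follows the last '@'
    host = parts.netloc.rpartition("@")[2]
    return urlunsplit(parts._replace(netloc=host))


def forget_credentials(repo_dir, url):
    """Point the 'origin' remote of an existing clone at the credential-free URL."""
    subprocess.run(["git", "-C", repo_dir, "remote", "set-url", "origin",
                    strip_credentials(url)], check=True)
```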

By the way, the slug is the URL-friendly name of your Bitbucket repository.
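The two Bitbucket functions can be tied together with a loop over the repository objects returned by the API, taking each one's "slug" field for the clone URL. A hypothetical driver, assuming one mirror per repository named "<slug>.git"; it receives the listing and cloning functions as parameters only so the loop can be tested on its own, and in practice they would be _get_repositories and _git_clone.

```python
def backup_bitbucket(owner, username, password, directory,
                     get_repositories, git_clone):
    """Mirror every repository of `owner` into `directory`; return the failed slugs.

    get_repositories and git_clone are expected to behave like
    _get_repositories and _git_clone.
    """
    failed = []
    for repo in get_repositories(owner, username, password):
        slug = repo["slug"]  # URL-friendly repository name
        # Hypothetical naming scheme: one bare mirror per repository, <slug>.git
        if git_clone(username, password, directory, slug + ".git", owner, slug) != 0:
            failed.append(slug)
    return failed
```

Collecting failures instead of stopping at the first one lets a single broken repository be reported without skipping the rest of the backup.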

A ready-made script for GitHub can be found here.

Source: https://habr.com/ru/post/273513/

