
RPM repository - do it yourself

So, let's begin.


When implementing a DevOps process in a company, one possible way to store build artifacts is an rpm repository. Essentially, it is just a web server that serves content organized in a particular way. There are, of course, commercial Maven repositories with plugins that support rpm, but we are not looking for easy ways, are we?




Task


Write a service that accepts ready-made rpm packages over HTTP, parses their metadata, lays the package files out according to the internal repository structure, and updates the repository metadata after processing each package. What came out of it is described under the cut.


Analysis


In my head the task almost instantly fell apart into several parts: the first is the receiving one, which should accept the rpm package over HTTP; the second is the processing one, which should process the received rpm package. And somewhere there should also be a web server that will serve the contents of the repository.


Receiving part


Since I have been familiar with Nginx for a long time, there was no real choice of web server to receive rpm packages and serve the repository contents - only Nginx. Taking that as a given, I found the right options in the documentation and wrote:


The part of the Nginx configuration that accepts files
location /upload {
    proxy_http_version 1.1;
    proxy_pass http://127.0.0.1:5000;
    proxy_pass_request_body off;
    proxy_set_header X-Package-Name $request_body_file;
    client_body_in_file_only on;
    client_body_temp_path /tmp/rpms;
    client_max_body_size 128m;
}

The result of this configuration is that when receiving a file, Nginx will save it to the specified directory and report the original name in a separate header.


To complete the picture, here is the second, tiny one:


The part of the configuration that distributes the contents of the repository
location /repo {
    alias /srv/repo/storage/;
    autoindex on;
}

So, we have the first part, which is able to receive files and give them away.


Processing part


The processing part is written in Python without any special frills and looks like


Like this:

#!/usr/bin/env python
import argparse
import collections
import pprint
import shutil
import subprocess
import threading
import os
import re

import yaml
from flask import Flask, request
from pyrpm import rpmdefs
from pyrpm.rpm import RPM

# (c) Sergey Pechenko, 2017
# License - GPL v2.0. Please keep this notice when reusing the code.


class LoggingMiddleware(object):
    # Logs every request and response passing through the WSGI app
    def __init__(self, app):
        self._app = app

    def __call__(self, environ, resp):
        errorlog = environ['wsgi.errors']
        pprint.pprint(('REQUEST', environ), stream=errorlog)

        def log_response(status, headers, *args):
            pprint.pprint(('RESPONSE', status, headers), stream=errorlog)
            return resp(status, headers, *args)

        return self._app(environ, log_response)


def parse_package_info(rpm):
    # Extracts the fields we need from the package metadata
    os_name_rel = rpm[rpmdefs.RPMTAG_RELEASE]
    os_data = re.search(r'^(\d+)\.(\w+)(\d+)$', os_name_rel)
    package = {
        'filename': "%s-%s-%s.%s.rpm" % (rpm[rpmdefs.RPMTAG_NAME],
                                         rpm[rpmdefs.RPMTAG_VERSION],
                                         rpm[rpmdefs.RPMTAG_RELEASE],
                                         rpm[rpmdefs.RPMTAG_ARCH]),
        'os_abbr': os_data.group(2),
        'os_release': os_data.group(3),
        'os_arch': rpm[rpmdefs.RPMTAG_ARCH]
    }
    return package


# The application itself
app = Flask(__name__)
settings = {}


# A simple health-check handler
@app.route('/')
def hello_world():
    return 'Hello from repo!'


# Handler for the upload URL
@app.route('/upload', methods=['PUT'])
def upload():
    # Default response values
    status = 503
    headers = []
    # The path of the file saved by Nginx arrives in a separate header
    curr_package = request.headers.get('X-Package-Name')
    rpm = RPM(open(curr_package, 'rb'))
    rpm_data = parse_package_info(rpm)
    try:
        new_req_queue_element = '%s/%s' % (rpm_data['os_release'], rpm_data['os_arch'])
        dest_dirname = '%s/%s/Packages' % (settings['repo']['top_dir'], new_req_queue_element)
        # Move the file into the repository tree
        shutil.move(curr_package, dest_dirname)
        src_filename = '%s/%s' % (dest_dirname, os.path.basename(curr_package))
        dest_filename = '%s/%s' % (dest_dirname, rpm_data['filename'])
        # Rename it to its proper package name
        shutil.move(src_filename, dest_filename)
        # Report success and where the package ended up
        response = 'OK - Accessible as %s' % dest_filename
        status = 200
        if new_req_queue_element not in req_queue:
            # Queue a metadata update for this release/arch subtree
            req_queue.append(new_req_queue_element)
            event_timeout.set()
            event_request.set()
    except BaseException as E:
        response = str(E)
    return response, status, headers


def update_func(evt_upd, evt_exit):
    # Worker that actually regenerates the repository metadata
    while not evt_exit.is_set():
        if evt_upd.wait():
            # An update has been requested
            curr_elem = req_queue.popleft()
            p = subprocess.Popen([settings['index_updater']['executable'],
                                  settings['index_updater']['cmdline'],
                                  '%s/%s' % (settings['repo']['top_dir'], curr_elem)],
                                 shell=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
            res_stdout, res_stderr = p.communicate(None)
            pprint.pprint(res_stdout)
            pprint.pprint(res_stderr)
            # Done, wait for the next request
            evt_upd.clear()
    return


def update_enable_func(evt_req, evt_tmout, evt_upd, evt_exit):
    while not evt_exit.is_set():
        # Wait for an upload
        evt_req.wait()
        # OK, got one.
        # Wait 30 seconds in case another upload arrives meanwhile...
        while evt_tmout.wait(30) and (not evt_exit.is_set()):
            evt_tmout.clear()
        if evt_exit.is_set():
            break
        evt_upd.set()
        evt_tmout.clear()
        evt_req.clear()
    return


def parse_command_line():
    # Command-line arguments
    parser = argparse.ArgumentParser(
        prog='repo_helper',
        description='This is a repository update helper',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog='This is an example of Nginx configuration:\n'
               'location /repo {\n'
               '    alias /srv/repo/storage/;\n'
               '    autoindex on;\n'
               '}\n'
               '\n'
               'location /upload {\n'
               '    client_body_in_file_only on;\n'
               '    client_body_temp_path /tmp/rpms;\n'
               '    client_max_body_size 128m;\n'
               '    proxy_http_version 1.1;\n'
               '    proxy_pass http://localhost:5000;\n'
               '    proxy_pass_request_body off;\n'
               '    proxy_set_header X-Package-Name $request_body_file;\n'
               '}\n')
    parser.add_argument('-c', '--conf', action='store', default='repo_helper.yml',
                        required=False, help='Name of the config file',
                        dest='configfile')
    return parser.parse_args()


def load_config(fn):
    with open(fn, 'r') as f:
        config = yaml.safe_load(f)
    return config


def load_hardcoded_defaults():
    # Fallback "factory settings"
    config = {
        'index_updater': {
            'executable': '/bin/createrepo',
            'cmdline': '--update'
        },
        'repo': {
            'top_dir': '/srv/repo/storage'
        },
        'server': {
            'address': '127.0.0.1',
            'port': '5000',
            'prefix_url': 'upload',
            'upload_header': ''
        },
        'log': {
            'name': 'syslog',
            'level': 'INFO'
        }
    }
    return config


if __name__ == '__main__':
    try:
        cli_args = parse_command_line()
        settings = load_config(cli_args.configfile)
    except BaseException as E:
        settings = load_hardcoded_defaults()
    req_queue = collections.deque()
    # Application-level specific stuff
    # "An upload has happened" event
    event_request = threading.Event()
    # "Restart the 30-second timer" event
    event_timeout = threading.Event()
    # "Run the metadata update" event
    event_update = threading.Event()
    # "Time to shut down" event
    event_exit = threading.Event()
    # Start with all events cleared
    event_request.clear()
    event_timeout.clear()
    event_update.clear()
    # The thread that regenerates the repository metadata
    update_thread = threading.Thread(name='update_worker', target=update_func,
                                     args=(event_update, event_exit))
    update_thread.daemon = True
    update_thread.start()
    # The thread that waits out the 30-second quiet period
    # before allowing the update to run
    delay_thread = threading.Thread(name='delay_worker', target=update_enable_func,
                                    args=(event_request, event_timeout, event_update, event_exit))
    delay_thread.daemon = True
    delay_thread.start()
    # Wrap the app in the logging middleware and start serving
    app.wsgi_app = LoggingMiddleware(app.wsgi_app)
    app.run(host=settings['server']['address'], port=int(settings['server']['port']))
    # Signal the worker threads to exit
    event_exit.set()
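One detail worth making explicit: parse_package_info relies on the RPM release tag having the form build-number, distro abbreviation, distro release, as in CentOS/RHEL packages. A quick check of the regex used above, with "1.el7" as an assumed sample tag:

```python
import re

# The same pattern the service uses to split a release tag;
# "1.el7" is a typical CentOS 7 release tag taken as an example.
os_data = re.search(r'^(\d+)\.(\w+)(\d+)$', '1.el7')
print(os_data.group(1), os_data.group(2), os_data.group(3))  # → 1 el 7
```

A release tag that does not match this pattern (say, a plain `1` with no distro suffix) would make `os_data` equal to `None` and crash the handler - one of the rough edges of the prototype.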

An important and, most likely, puzzling point at first sight: why are threads, events and a queue needed here?


They are needed to pass data between asynchronous processes. After all, the HTTP client is not obliged to wait for any permission before uploading a package, right? Exactly - it can start uploading at any moment. Accordingly, the main application thread has to inform the client about the success or failure of the upload and, if it succeeded, hand the data over through the queue to another thread, which reads the package metadata and then moves the package into the file system. Meanwhile, a separate thread keeps track of whether 30 seconds have passed since the last package was uploaded. If they have, the repository metadata is updated. If the time is not yet up and the next request has already arrived, the timer is reset and restarted. Thus, every uploaded package postpones the metadata update by another 30 seconds.
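The timer logic above boils down to a classic "debounce" pattern. Here is a stripped-down, self-contained sketch of it - the class and method names are made up for illustration, not taken from the service, which does the same thing with two worker threads and a 30-second window:

```python
import threading
import time

class DebouncedUpdater:
    """Calls `action` once a full quiet period has passed since the last notify()."""

    def __init__(self, action, quiet_period=30.0):
        self._action = action
        self._quiet = quiet_period
        self._event = threading.Event()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._worker, daemon=True)
        self._thread.start()

    def notify(self):
        # Called for every uploaded package; restarts the quiet-period timer
        self._event.set()

    def _worker(self):
        while True:
            self._event.wait()              # block until the first upload
            if self._stop.is_set():
                return
            # Keep restarting the timer while new uploads arrive in time
            while True:
                self._event.clear()
                if not self._event.wait(self._quiet):
                    break                   # quiet period elapsed, no new upload
                if self._stop.is_set():
                    return
            self._action()                  # e.g. run createrepo --update

    def stop(self):
        self._stop.set()
        self._event.set()
```

With a 30-second quiet period this reproduces the behaviour described above: two uploads ten seconds apart produce a single metadata update, roughly 30 seconds after the second one.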


How to use


First, install the Python packages from the list below (for example, save it as requirements.txt and run pip install -r requirements.txt):

appdirs==1.4.3
click==6.7
Flask==0.12.1
itsdangerous==0.24
Jinja2==2.9.6
MarkupSafe==1.0
packaging==16.8
pyparsing==2.2.0
pyrpm==0.3
PyYAML==3.12
six==1.10.0
uWSGI==2.0.15
Werkzeug==0.12.1


Unfortunately, I cannot guarantee that this is the minimal possible list - the pip freeze command simply takes the list of installed Python packages and mechanically dumps it into a file, without considering whether a particular package is actually used in the project or not.


Then you need to install the nginx and createrepo packages:


 yum install -y nginx createrepo 

The launch of the project looks like this:


 nohup python app.py & 

After everything is up and running, you can try to download the rpm package into the repository with this command:


 curl http://hostname.example.com/upload -T <packagename-1.0.rpm> 
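Once a package has been uploaded and the metadata regenerated, a client machine can consume the repository with an ordinary yum configuration. A hypothetical example - the baseurl path follows the release/arch layout the service creates, and the repo id and file name are made up:

```ini
# /etc/yum.repos.d/diy.repo - hypothetical client-side configuration
[diy-repo]
name=DIY rpm repository
baseurl=http://hostname.example.com/repo/7/x86_64
enabled=1
gpgcheck=0
```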

I understand that the described service is far from perfect and is a prototype rather than a full-fledged application, but, on the other hand, it can be easily supplemented / expanded.


For convenience, the code is posted on GitHub for anyone interested. Suggestions for extending the service, and better still pull requests, are warmly welcome!


I hope this prototype will be useful to someone. Thanks for attention!


PS

Well, for those who really need it - a small snippet to tame SELinux:


#!/bin/bash
semanage fcontext -a -t httpd_sys_rw_content_t "/srv/repo/storage(/.*)?"
restorecon -R -v /srv/repo/storage
setsebool -P httpd_can_network_connect 1


Source: https://habr.com/ru/post/337736/

