
Upstart
First of all, let me emphasize that I really like the Upstart code: it is very well documented and easy to navigate. In general, other projects (including mine) should learn from its example.
That said, I cannot say that I agree with the general approach taken in Upstart. But first, a bit more about Upstart itself.
Upstart does not share code with sysvinit; its functionality is a superset of it, and it provides a certain degree of compatibility with the well-known SysV init scripts. Its main feature is an event-based approach: starting and stopping of processes is tied to events happening in the system, where an event can be many different things, such as a network interface becoming available or a program being started.
Upstart serializes services via these events: if a syslog-started event is triggered, it is used as an indication to start D-Bus, since Syslog is now available to it. Then, when a dbus-started event is triggered, NetworkManager is started, since it can now use D-Bus, and so on and so on.
Some might say that this way the actual logical dependency tree that exists and is understood by administrators is simply translated and encoded into a set of events and action rules: every logical rule "a needs b" that the administrator or developer cares about becomes "start a when b starts" plus "stop a when b stops". In some way this certainly is a simplification: especially for the code in Upstart itself. However, I would argue that this simplification is actually detrimental. First of all, the logical dependencies do not go away: the person who writes Upstart files must now translate these logical dependencies manually into events and action rules (in fact, two rules for each dependency). So, instead of letting the machine figure out what to do based on the dependencies, the user has to manually translate the dependencies into a simple set of event/action rules. Also, because the dependency information is never encoded, it is not available at runtime, which means that an administrator who tries to figure out why something happened, for example why a was started when b was started, has no chance of finding out.
Moreover, the event logic turns all dependencies on their head. Instead of minimizing the amount of work (which a good init system should do, as pointed out at the beginning of this article), it actually maximizes the amount of work to perform. Or in other words, instead of having a clear goal and doing only the things that are really needed to reach that goal, it does one step, and after finishing it, it does all the steps that could possibly follow it.
Simply put: the fact that the user just started D-Bus is in no way an indication that NetworkManager should be started too (but this is what Upstart would actually do). It is the other way round: when the user asks for NetworkManager, that is a clear indication that D-Bus should be started too (which is certainly what most users would expect, right?).
A good init system should start only what is needed, and that either on demand, lazily deferred, or parallelized in advance. However, it should not start more than necessary; in particular, not everything that is installed and could make use of some service should actually be running.
Finally, I fail to see any real usefulness in the event logic. It appears to me that most of the events exposed in Upstart are actually not punctual in nature, but have duration: a service starts, is running, and stops. A device is plugged in, is available, and is unplugged. A mount point is in the process of being mounted, is fully mounted, and is being unmounted. A power plug is plugged in, the system runs on AC, and the power plug is pulled. Only a minority of the events an init system or process supervisor has to deal with are actually punctual; most of them are tuples of start, condition, and stop. Again, this information is not available in Upstart, because it focuses on singular events and ignores durable dependencies.
Now, I am aware that some of the problems I pointed out above have in some way been addressed by the latest changes to Upstart, in particular the condition-based syntax such as start on (local-filesystems and net-device-up IFACE=lo) in Upstart rule files. However, to me this seems more like an attempt to patch up a system whose core design is flawed.
Leaving all that aside, Upstart is a fine babysitter for daemons, even if some of its design decisions are questionable (see above), and it leaves many opportunities unused (also see above).
There are other init systems besides Upstart, sysvinit and launchd. Most of them offer little substantial beyond Upstart or sysvinit. The most interesting contender is Solaris SMF, which supports proper dependencies between services. However, in many ways it is overly complex and, let me say, a bit academic in its excessive use of XML and of new terminology for well-known things. It is also closely bound to Solaris-specific features such as the contract system.
Putting it all together
So, now is a good time for a second break, before I explain how a good PID 1 should behave, what most current systems get wrong, and where the real problem lies. So go and refill your coffee mug. It will be worth it.
You probably guessed it: what I proposed above as requirements and features of an ideal init system is actually available now, in a (still experimental) init system called systemd, which I hereby want to announce right here and now! And again, here is the code. And here is a quick rundown of its features and the rationale behind them.
systemd starts up and supervises the entire system (hence the name...). It implements all of the features pointed out above and a few more. It is built around the notion of units. Units have a name and a type. Since units usually load their configuration directly from the file system, unit names are actually file names. Example: the unit avahi.service reads a configuration file of the same name, and of course it is the unit encapsulating the Avahi daemon. There are several kinds of units:
- service: the most obvious kind of unit: daemons that can be started, stopped, restarted and reloaded. For compatibility with SysV we not only support our own configuration files for services, but can also read classic SysV init scripts; in particular we parse the LSB header if one is present. /etc/init.d is hence nothing more than just another source of configuration files.
- socket: this kind of unit encapsulates a socket in the file system or on the Internet. We currently support AF_INET, AF_INET6 and AF_UNIX sockets of the types stream, datagram and sequential packet. We also support classic FIFOs as transport. Each socket unit has a matching service unit, which is started as soon as the first connection comes in on the socket or FIFO. Example: nscd.socket starts nscd.service on an incoming connection.
- device: this kind of unit encapsulates a device in the Linux device tree. If a device is marked for this via udev rules, it will be exposed as a device unit in systemd. Properties set via udev can be used as a configuration source to establish dependencies for device units.
- mount: this kind of unit encapsulates a mount point in the file system. systemd monitors all mount points as they come and go, and can also be used to mount and unmount file systems. /etc/fstab is used as an additional configuration source for these mount points, much like SysV init scripts can be used as an additional configuration source for service units.
- automount: this kind of unit encapsulates an automount point in the file system. Each automount unit has a matching mount unit, which is started (i.e. mounted) as soon as the automount directory is accessed.
- target: this kind of unit is used for logical grouping of units: instead of actually doing anything itself, it simply references other units, so that they can be controlled together. Examples: multi-user.target, which plays the role of run-level 5 on a classic SysV system, or bluetooth.target, which is requested as soon as a Bluetooth dongle is plugged into the system and which simply pulls in the Bluetooth-related services that otherwise would not be running: bluetoothd and obexd, for example.
- snapshot: this kind of unit resembles a target in that it does nothing by itself and its only purpose is to reference other units. Snapshots can be used to save and roll back the state of all services and units of the init system. It is intended mainly for two use cases: to allow the user to temporarily enter a specific state such as an "emergency shell", terminating current services, with an easy way to return to the previous state, i.e. starting again all services that were temporarily stopped. It can also serve as an easy way to support system suspend: many services do not behave correctly across a suspend, and it is often a good idea to simply stop those services before suspend and start them again after resume.
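To make the unit concept more tangible, here is a minimal sketch of what a native service unit file such as avahi.service might look like. The directive names follow the .desktop-style syntax mentioned in the feature list below; the exact path and binary name are illustrative assumptions, not taken from the text above:

```ini
# /etc/systemd/system/avahi.service -- hypothetical sketch of a native unit file
[Unit]
Description=Avahi mDNS/DNS-SD daemon

[Service]
# the daemon to start and supervise; the path is an assumption for illustration
ExecStart=/usr/sbin/avahi-daemon
```

A matching avahi.socket unit could then declare the sockets on which this service is activated, following the socket unit type described above.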
All these units can have dependencies between each other (both positive and negative, i.e. "Requires" and "Conflicts"): a device can have a dependency on a service, meaning that as soon as the device becomes available, the corresponding service is started. Mount points get an implicit dependency on the device they are mounted from. Mount points also get implicit dependencies on the mount points that are their prefixes (for example, the mount point /home/lennart implicitly gets a dependency on the mount point /home), and so on.
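As a sketch of how such positive and negative dependencies might be expressed in a native unit file (the unit names here are hypothetical illustrations):

```ini
# Fragment of a hypothetical foo.service illustrating dependency directives
[Unit]
# positive dependency: bar.service is pulled in whenever foo.service starts
Requires=bar.service
# ordering: only start foo.service once bar.service has been started
After=bar.service
# negative dependency: starting foo.service stops rescue.target, and vice versa
Conflicts=rescue.target
```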
Here is a short list of features:
- For each process that is spawned, you can control: the environment, resource limits, working and root directories, umask, OOM killer adjustment, nice level, IO class and priority, CPU policy and priority, CPU affinity, user id, group id, supplementary group ids, readable/writable and inaccessible directories, shared/private/slave mount flags, capabilities and bounding set, security attributes, CPU scheduler settings, a private /tmp namespace, and cgroup membership for various subsystems. Also, you can easily connect stdin/stdout/stderr of services to syslog, /dev/kmsg or an arbitrary TTY. If connected to a TTY for input, systemd makes sure the process gets exclusive access to it, optionally waiting for access or enforcing it.
- Every process that is spawned gets its own cgroup (currently in the debug subsystem only, since that subsystem is not otherwise used and does little more than basic process grouping), and it is very easy to configure systemd to place services in cgroups that have been configured externally, for example via the libcgroups utilities.
- The native configuration files use a syntax that closely follows the well-known .desktop files. It is a simple syntax for which parsers already exist in many software frameworks. It also allows us to rely on existing i18n tools for service descriptions. Administrators and developers do not need to learn a new syntax.
- As mentioned, we maintain compatibility with SysV init scripts. We take advantage of the LSB and Red Hat chkconfig headers if they are present. If they are not, we try to make the best of the otherwise available information, such as the start priorities in /etc/rc.d. These init scripts are simply considered an additional source of configuration, which eases the migration path to systemd. Optionally, we can read classic PID files for services to identify the main process of a daemon. Note that we take the dependency information from LSB headers and translate it into native systemd dependencies. Side note: Upstart is unable to harvest and make use of this kind of information. Boot-up with Upstart on a system where LSB SysV scripts dominate will hence not be parallelized, while the same system running systemd will be. In fact, for Upstart all SysV scripts together make up one single job that is executed, whereas for systemd they are just another configuration source, and are all managed and controlled individually, much like any native systemd service.
- In a similar fashion, we read the existing /etc/fstab and consider it just another configuration source. Using the comment= fstab option we can even mark entries in /etc/fstab as systemd-controlled automount points.
- If the same unit is configured in multiple configuration sources (e.g. both /etc/systemd/system/avahi.service and /etc/init.d/avahi exist), then the native configuration always takes precedence and the legacy file is ignored, allowing a package to carry both a SysV init script and a systemd configuration file for a while.
- We support a simple templating/instance mechanism. Example: instead of maintaining six configuration files for six gettys, we maintain only one getty@.service file, which is instantiated to getty@tty2.service and so on. The interface part can even be inherited by dependency expressions, i.e. it is easy to encode that the service dhcpcd@eth0.service pulls in avahi-autoipd@eth0.service, while leaving the "eth0" part of the string wild-carded.
- For socket activation we support full compatibility with the traditional inetd modes, as well as a very simple native mode that tries to mimic launchd socket activation and is the recommended mode for new services. The inetd mode only allows passing a single socket to the started daemon, while the native mode supports passing arbitrary numbers of file descriptors. We also support a one-instance-per-connection mode, as well as a one-instance-for-all-connections mode. In the former mode, we name the cgroup the daemon will run in after the connection parameters, using the templating logic mentioned above. Example: sshd.socket might spawn a service sshd@192.168.0.1-4711-192.168.0.2-22.service with a cgroup named sshd@.service/192.168.0.1-4711-192.168.0.2-22 (i.e. the IP addresses and port numbers are used in the instance name; for AF_UNIX sockets we use the PID and user id of the connecting client). This gives administrators a nice way to identify the various instances of a service and to control their runtime individually. The native socket passing mode is very easy to implement in applications: if $LISTEN_FDS is set, it contains the number of sockets passed, and the daemon will find them sorted as listed in the .service file, starting from file descriptor 3 (a nicely written daemon could also use fstat() and getsockname() to distinguish the sockets in case more than one was passed). In addition, we set $LISTEN_PID to the PID of the daemon that is supposed to receive the file descriptors, because environment variables are normally inherited by child processes and could hence confuse processes further down the chain. Even though this socket passing logic is very easy to implement in daemons, we will provide a BSD-licensed reference implementation that shows how to do it. We have already ported a couple of existing daemons to this new scheme.
- To a certain extent we provide compatibility with /dev/initctl. This compatibility is in fact implemented with a FIFO-activated service which simply translates these legacy requests into D-Bus requests. Effectively this means the old shutdown, poweroff and similar commands from Upstart and sysvinit continue to work with systemd.
- We also provide utmp and wtmp compatibility, perhaps even a slightly better version than the existing utmp and wtmp handling.
- systemd supports several kinds of dependencies between units. After/Before can be used to fix the ordering of unit activation. Completely orthogonal to that are Requires and Wants, which express a positive requirement dependency, either mandatory or optional. Then there is Conflicts, which expresses a negative requirement dependency. Finally, there are three further, less used dependency types.
- systemd has a minimal transaction system. This means: if a unit is requested to start or stop, we add it and all of its dependencies to a temporary transaction. Then we verify that the transaction is consistent (i.e. the ordering of all units via After/Before is free of cycles). If it is not, systemd tries to fix it up by removing non-essential jobs from the transaction, which may break the loop. Also, systemd tries to suppress non-essential jobs in the transaction that would stop a running service. Non-essential jobs are those the original request did not directly include, but which were pulled in through dependencies such as Wants. Finally, we check whether any jobs of the transaction contradict jobs already queued, and optionally abort the transaction in that case. If everything worked out, the transaction is consistent and minimized in its impact, it is merged with all outstanding jobs and added to the run queue. Effectively this means that before executing a requested operation, we verify that it makes sense at all, fix it up if possible, and only give up if it really cannot work.
- We record the start/stop times, as well as the PID and exit code, of every process we spawn and supervise. This data can be used to cross-link services with their data in abrtd, auditd and syslog. Think of a user interface that highlights crashed daemons and lets you easily navigate to the respective interfaces for syslog, abrt or auditd, showing the data generated for this daemon on this specific run.
- We support re-execution of the init process itself at any time. The daemon's state is serialized before re-execution and deserialized afterwards. That way we provide a simple means to facilitate init system upgrades, as well as handover from an initrd daemon to the final system daemon. Open sockets and autofs mount points are properly serialized away, so that they stay connectible the whole time, in a way that clients will not even notice that the init system re-executed itself. Also, the fact that a big part of the service state is encoded anyway in the cgroup virtual file system would even allow us to resume execution without access to the serialized data. The re-execution code paths are in fact mostly the same as the configuration reloading code paths of the init system, which guarantees that re-execution (which is probably triggered rarely) gets similar testing to configuration reloading (which probably runs more often).
- With the goal of removing shell scripts from the boot process, we have recoded part of the basic system setup in C and moved it directly into systemd. Among other things, this includes mounting the API file systems (i.e. virtual file systems such as /proc, /sys and /dev) and setting the host name.
- The daemon's state is introspectable and controllable via D-Bus. This is not complete yet, but already quite extensive.
- Since we want to emphasize socket-based and bus-name-based activation, and therefore support dependencies between sockets and services, we also support several ways such services can signal their readiness: by forking and having the start process exit (i.e. the traditional daemonize() behaviour), as well as by watching the bus until a configured service name appears.
- There is an interactive mode that asks for confirmation each time a process is spawned by systemd. You can enable it by passing systemd.confirm_spawn=1 on the kernel command line.
- With the systemd.default= kernel command line parameter you can specify which unit systemd should start on boot-up. Normally you would specify something like multi-user.target here, but you can even specify a single service instead of a target. For example, out of the box we ship emergency.service, which is similar in usefulness to init=/bin/bash, but has the advantage of actually running the init system, hence offering the option to boot up the full system from the emergency shell.
- There is also a minimal user interface that allows you to start/stop/introspect services. It is far from a complete UI, but useful as a debugging tool. It is written in Vala (yay!) and goes by the name of systemadm.
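The templating/instance mechanism from the feature list above can be illustrated with a sketch of a template unit file. A template's name contains an "@", and the part after the "@" in an instance name (e.g. tty2 in getty@tty2.service) becomes available inside the file; treat the concrete binary path and the %I specifier usage here as illustrative assumptions rather than exact syntax:

```ini
# getty@.service -- hypothetical template; getty@tty2.service is one instance
[Unit]
Description=Getty on %I

[Service]
# %I is replaced by the instance name, e.g. "tty2"
ExecStart=/sbin/agetty %I
```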
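The $LISTEN_FDS/$LISTEN_PID handshake described in the feature list above is simple enough to sketch in a few lines. The following is a minimal, hypothetical re-implementation in Python (real daemons would use the BSD-licensed C reference implementation mentioned above; the function name and structure here are my own):

```python
import os

SD_LISTEN_FDS_START = 3  # passed descriptors begin at fd 3, per the text above

def listen_fds(env=None, pid=None):
    """Return the file descriptors passed in by the init system.

    Sketch of the $LISTEN_FDS/$LISTEN_PID protocol: $LISTEN_FDS holds the
    number of passed sockets, and $LISTEN_PID guards against the variables
    being inherited by processes further down the chain.
    """
    env = os.environ if env is None else env
    pid = os.getpid() if pid is None else pid
    # only act if the variables were meant for this very process
    if env.get("LISTEN_PID") != str(pid):
        return []
    try:
        count = int(env.get("LISTEN_FDS", ""))
    except ValueError:
        return []
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))

# Example: an environment as the init system would set it for PID 4711
env = {"LISTEN_PID": "4711", "LISTEN_FDS": "2"}
print(listen_fds(env=env, pid=4711))  # fds 3 and 4 were passed to us
print(listen_fds(env=env, pid=9999))  # wrong PID: the fds are not for us
```

A daemon receiving more than one descriptor would then use fstat() and getsockname() on each returned fd to tell the sockets apart, as the text notes.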
It should be noted that systemd uses many Linux-specific features and does not limit itself to POSIX. This unlocks a huge amount of functionality that a system designed for portability to other operating systems cannot provide.
To be continued…