Contents: the alarming growth rate of auth.log on hosts running neutron-plugin-openvswitch-agent, an analysis of the causes, and a way to fix it. Plus a little about how sudo, PAM and PAM sessions work.

What are we talking about? OpenStack is a platform for building clouds. Neutron is the name of its subsystem responsible for networking: fashionable, trendy, and considered more advanced and functional than the first attempt, called nova-networking. openvswitch-plugin is a neutron plugin that implements its functionality using Open vSwitch, a software switch that can do smart things such as GRE tunnels, bonding and port mirroring, applying iptables-style rules to a port inside the virtual switch, and so on.
neutron-plugin-openvswitch-agent is one of the components of this plugin. It runs on every host that has any real involvement in carrying virtual machine network traffic. In other words, these are all compute nodes (where the virtual machines run) and network nodes (which provide Internet access for the virtual machines). Only API servers and other service hosts are left out. Given that most of a cloud consists of compute and network nodes, it is fair to say, simplifying slightly, that neutron-plugin-openvswitch-agent is installed on all hosts. Logstash is a system for centralized logging, and Elasticsearch is a database for working with these logs.
To respond to software problems in time, all logs of all applications must be collected and analyzed by the monitoring system. We have already written more about this. However, even a good thing can be overdone. It quickly turned out that most of the data collected from the hosts consisted of pointless messages like these:
server sudo: neutron : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf ovs-vsctl --timeout=2 --format=json -- --columns=name,external_ids list Interface
server sudo: pam_unix(sudo:session): session opened for root by (uid=108)
server sudo: pam_unix(sudo:session): session closed for user root
server sudo: nova : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf chown 107 /var/lib/nova/instances/_base/421f0808ac5dd178bc574eff6abe8df949edc319
server sudo: pam_unix(sudo:session): session opened for root by (uid=107)
server sudo: pam_unix(sudo:session): session closed for user root
These messages were repeated every two (!) seconds, from every server running the agent. That works out to 730 GB of garbage per year from every hundred servers. The logs are collected in Elasticsearch, so that is, pardon me, terabytes in the database. Good luck running queries against that, so to speak.
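The 730 GB figure can be sanity-checked with simple arithmetic. The sketch below uses only the numbers given above (one sudo call every two seconds, one hundred servers, 730 GB per year) and works backwards to the average amount of log data per call. Treating 1 GB as 10^9 bytes is an assumption, and the function name is purely illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def implied_bytes_per_call(total_bytes_per_year, servers, interval_seconds, days=365):
    """Average log volume attributable to one sudo call.

    One call produces three log lines: the sudo command line plus the
    PAM "session opened" and "session closed" messages.
    """
    calls = servers * days * (SECONDS_PER_DAY / interval_seconds)
    return total_bytes_per_year / calls


if __name__ == "__main__":
    # Assumption: 1 GB = 10**9 bytes.
    size = implied_bytes_per_call(730e9, servers=100, interval_seconds=2)
    print(f"about {size:.0f} bytes of log data per sudo call")
```

Under these assumptions each call accounts for a few hundred bytes, which looks plausible for three syslog lines plus whatever metadata the log pipeline attaches.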
On the one hand, all this rubbish is begging to be switched off.
On the other hand, disabling auth.log completely, or rotating it aggressively on the host, is a bad decision: information about suspicious activity should be collected and stored as carefully as possible, preferably out of reach of a compromised host (should one appear).
Besides neutron-plugin-openvswitch-agent, nova-compute (the hypervisor manager in OpenStack) also showed up in these logs, though less often.
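To see which service accounts produce how much of this noise, the sudo command lines can be counted per calling user. The following is a minimal sketch, assuming the usual sudo syslog layout ("sudo: <user> : ... COMMAND=<path>"); the regular expression and the function name are illustrative and may need adjusting to the local log format.

```python
import re
from collections import Counter

# Matches the command line that sudo logs for a rootwrap call, for example:
# "server sudo: neutron : TTY=unknown ; ... ; COMMAND=/usr/bin/neutron-rootwrap ..."
ROOTWRAP_RE = re.compile(r"sudo:\s*(?P<user>\w+)\s*:.*COMMAND=\S*-rootwrap\b")


def count_rootwrap_calls(lines):
    """Return a Counter mapping each calling user to its number of rootwrap calls."""
    counts = Counter()
    for line in lines:
        match = ROOTWRAP_RE.search(line)
        if match:
            counts[match.group("user")] += 1
    return counts


if __name__ == "__main__":
    # Sample lines modelled on the messages quoted above.
    sample = [
        "server sudo: neutron : TTY=unknown ; PWD=/ ; USER=root ; "
        "COMMAND=/usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf ovs-vsctl show",
        "server sudo: pam_unix(sudo:session): session closed for user root",
        "server sudo: nova : TTY=unknown ; PWD=/ ; USER=root ; "
        "COMMAND=/usr/bin/nova-rootwrap /etc/nova/rootwrap.conf chown 107 /tmp/example",
    ]
    for user, calls in count_rootwrap_calls(sample).most_common():
        print(user, calls)
```

The PAM session lines are deliberately not counted here: each rootwrap call brings exactly two of them along, so the per-user call count is enough to estimate the total.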
PAM and its sessions
Most likely, many readers use sudo. And most likely, many of those who use sudo do not think much about what happens when it is launched.
When sudo is run, it can request authentication from the user. Most often this is a password, but on my laptop, for example, it is a fingerprint scanner (which is not used at login, because the login password decrypts the disk, but is great for secure sudo). So that every program does not have to implement its own code for handling passwords and the like, PAM was created: a user authentication system that lets a program verify that the user knows the password without writing an /etc/shadow parser and without supporting every other authentication method itself (LDAP, Kerberos, OpenID, OTP and whatever other crazy things get invented).
PAM sessions are a set of rituals that PAM performs when a user logs in. Among other things, the ritual includes writing an entry to auth.log.
Generally speaking, PAM sessions are not required for normal operation. Checking a user and creating a session for that user are separate operations (and closing the session is a third). All self-respecting programs use sessions. But in our case this causes some problems.
More information about PAM, its history and its principles of operation is on the IBM website; for the meticulous, there is the Linux-PAM System Administration Guide.
But why so often?
Let me remind you that the problem is that log entries from sudo and pam_session were appearing almost constantly: every two seconds.
We have already figured out what the pam_session messages are. It remains to find out why neutron (and, to a lesser extent, nova) needs to run something so often, and with sudo at that.
In the OpenStack model, components are not particularly eager to learn the subtleties of the underlying technologies. Instead of receiving change notifications from ovs-vswitchd, libvirtd and so on, and then puzzling over "what does this mean and should we react to it?", a much cruder and more effective approach is used: when the API server or some other subsystem wants to know the status of a service, it sends a request to that service through the message queue, and the service in turn requests the current state from the programs it uses.
As a result, the server gets a lot of information: first, that the service is still alive (heartbeats), and second, its current configuration, with no chance of having "missed a Very Important State Update".
For neutron to work properly, it needs to know the state of the ports. For openvswitch-plugin-agent, this means running ovs-vsctl show and similar commands to get the current status of all ports.
Unfortunately, this requires privileges.
To manage privileges, OpenStack uses its own wrapper, rootwrap, which checks the arguments of the command being run more precisely than plain sudo does. But rootwrap itself is written in Python (like everything in OpenStack), and the setuid bit cannot be set on interpreted scripts. As a simple and reasonable solution, rootwrap is invoked as
sudo rootwrap
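The idea of checking arguments more precisely than sudo does can be illustrated with a toy filter. This is not rootwrap's actual code or configuration format: the ALLOWED table, its patterns and the is_allowed function are hypothetical and only show the principle that a wrapper running as root refuses any command line that does not match an explicit pattern.

```python
import re

# Hypothetical allow-list: executable name -> pattern that the whole
# space-joined argument string must match.
ALLOWED = {
    # Read-only ovs-vsctl queries, optionally preceded by "--option" flags.
    "ovs-vsctl": re.compile(r"(--\S+ )*(-- )?(--\S+ )*(list|show)( \S+)*"),
    # chown to a numeric uid, only below the nova instances directory
    # and without ".." anywhere in the path.
    "chown": re.compile(r"\d+ /var/lib/nova/instances/(?!\S*\.\.)\S+"),
}


def is_allowed(argv):
    """Return True if the command line matches one of the allow-list patterns."""
    if not argv:
        return False
    pattern = ALLOWED.get(argv[0])
    if pattern is None:
        return False
    return pattern.fullmatch(" ".join(argv[1:])) is not None


if __name__ == "__main__":
    query = ["ovs-vsctl", "--timeout=2", "--format=json", "--",
             "--columns=name,external_ids", "list", "Interface"]
    print(is_allowed(query))
    print(is_allowed(["ovs-vsctl", "del-br", "br-int"]))
```

A sudoers rule alone would typically grant the whole executable; a filter of this kind is what lets the wrapper permit one invocation of a tool and reject another.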
So what do we do?
Our goal:
- Do not record the fact that the nova and neutron users run rootwrap via sudo.
- Keep all other information.
The first item boils down to two independent tasks: disabling logging by sudo itself, and disabling logging by pam_session.
First, a prerequisite: if you are using an old distribution, upgrade sudo to version 1.8.7 or newer (Ubuntu 12.04 ships 1.8.3; Ubuntu 14.04 is fine).
Then, for neutron, replace the following line in the /etc/sudoers.d/neutron_sudoers file:
- Defaults:neutron !requiretty
+ Defaults:neutron !requiretty, !syslog, !pam_session
For nova, the changes are similar, in the /etc/sudoers.d/nova_sudoers file.
A brief summary of sudoers syntax, for those who don't like the BNF in the man pages: the exclamation mark inverts the meaning, and Defaults:neutron applies the listed options to that specific user only.
- syslog enables logging to syslog; !syslog, respectively, disables it.
- pam_session tells sudo to create a new PAM session; !pam_session tells it not to. It was for pam_session that we needed a new version of sudo.
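If the change is rolled out by a script rather than by hand, it amounts to appending two flags to the existing Defaults line. The helper below is a hypothetical sketch (its name and its handling of edge cases are assumptions): it only transforms text and leaves writing the file to the caller. The result is still worth checking with visudo -c -f <file> before it is installed.

```python
def quiet_defaults(sudoers_text, user, flags=("!syslog", "!pam_session")):
    """Append the given flags to the 'Defaults:<user>' line.

    Flags that are already present are not added twice, so the function
    can safely be applied repeatedly. All other lines are left untouched.
    """
    prefix = f"Defaults:{user}"
    result = []
    for line in sudoers_text.splitlines():
        head, _, rest = line.partition(" ")
        if head == prefix:
            current = [item.strip() for item in rest.split(",") if item.strip()]
            missing = [flag for flag in flags if flag not in current]
            line = f"{prefix} {', '.join(current + missing)}"
        result.append(line)
    return "\n".join(result) + "\n"


if __name__ == "__main__":
    print(quiet_defaults("Defaults:neutron !requiretty\n", "neutron"), end="")
```

The same call with "nova" as the user covers the nova_sudoers file.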
Neutron polling frequency control
The obvious thought: why not lower the polling frequency? In practice, lowering the frequency means a longer delay between "made a change" and "the change was noticed". If two seconds seems too frequent, you can use, say, ten. But a full minute is already too much (responsiveness will suffer). At the same time, ten seconds will not save us much: instead of 43 thousand calls per day (per server), we will have almost 9 thousand.
The frequency is controlled by the neutron-server configuration variable report_interval (in seconds).
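The trade-off is easy to put into numbers. The sketch below simply divides the length of a day by the polling interval, assuming exactly one sudo call per interval per server; the function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def calls_per_day(interval_seconds):
    """Number of sudo calls per server per day at the given polling interval."""
    return SECONDS_PER_DAY // interval_seconds


if __name__ == "__main__":
    for interval in (2, 10, 60):
        print(f"{interval:>2} s interval: {calls_per_day(interval)} calls per day")
```

Even at a one-minute interval there are still well over a thousand calls per server per day, which is why tuning the interval alone does not remove the noise.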