How do system administrators work, what do they use day to day, and which utilities make their lives easier? We will try to answer these questions briefly and describe how our work is organized.
So, broadly speaking, a system administrator should be able to:
Install / update / remove software
Configure software
Plan work
Document
Monitor the state of IT systems
Diagnose and maintain IT systems
Back up software and data
There is plenty of software for all of this; we will try to describe the most essential tools.
Environment
Where does a system administrator's work begin? With the choice of a comfortable environment. Not a spacious, well-lit office and a comfortable chair (although that matters too), but the choice of an operating system. My colleagues work on Mac OS X, Windows and Ubuntu. After many trials and experiments with distributions and desktop shells (Debian, Kubuntu, Fedora with Gnome / Cinnamon), I settled on Ubuntu 12.04 (Unity).
I have simply grown used to Unity, and by now it seems to me a logical and quite comfortable desktop shell. The main requirements for an admin's system are stability and simplicity. I do not want to argue with anyone, but that is exactly how Ubuntu has proven itself for me.
I will not describe planning and documentation tools in much detail. For documenting, a plain text editor (Gedit) and LibreOffice are enough for me, plus Google Docs for sharing documents. We keep all the most useful notes and documents in a wiki. For time tracking and task management we use Redmine.
And of course there are the tools for communicating with colleagues and customers: usually e-mail, Skype and the phone.
For planning I use Google Calendar; any other tool will do just as well if you actually use it, even a paper diary.
One way or another, you have to spend a lot of time in the console, locally or on remote servers. That is why the most important things in the daily work of a *nix administrator are command line utilities.
Command line utilities
Most often, you have to work with text (configs, logs, manuals, etc.), so let's start with text editors and other tools for manipulating text and strings:
vi / vim - this editor is an indispensable tool for any administrator, since it can be found on almost any server (on virtually any Unix). If the first thing you do when it opens is bail out of it with Ctrl+Z, I highly recommend at least reading man vim to learn how to do it properly.
nano is the simplest editor, shipped by default on Debian / Ubuntu.
mcedit - for fans of mc and blue screens; not the worst option, quite simple and convenient.
cat - short for concatenate - originally a tool for joining files, but more often used to print the contents of a file, as well as to append lines to it.
tail - by default prints the last 10 lines of a file; can also follow a log in real time (with the -f flag).
wc - counts lines, words and bytes in a file.
head - like tail, prints 10 lines of a file, but, as the name suggests, from the beginning.
grep - an indispensable tool for filtering text output or simply finding the desired string in files.
sed - stream editor - very useful for automating routine tasks: replacing lines in files (editing configs), deleting lines, printing content matching a pattern, and so on.
awk / gawk - a very powerful utility (in fact, AWK is a whole programming language) that helps parse files and manipulate text output in almost any way you like.
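As a quick illustration of sed in the config-editing role mentioned above, here is a tiny self-contained sketch (the file and option names are made up for the example):

```shell
# Create a throwaway config file to edit (hypothetical option names).
cat > /tmp/demo.conf <<'EOF'
listen_port = 8080
debug = true
EOF

# Replace a value in place, as you would when editing a real config.
sed -i 's/^debug = true/debug = false/' /tmp/demo.conf

# Delete all comment lines (here none match, so the file is unchanged).
sed -i '/^#/d' /tmp/demo.conf

cat /tmp/demo.conf
# listen_port = 8080
# debug = false
```

The same substitutions work on a stream (`some_command | sed 's/old/new/'`) without touching any file.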
Information on all the listed utilities can be found in man. Speaking of man:
man - access to the reference pages - something no Unix administrator can work without. If you are a novice specialist, be sure to start by getting acquainted with man.
The list could go on for a long time; there really are a lot of utilities. But let's stop here for now and see what can be done with awk, wc, cat and grep after reading a little man.
Some practice
To show how these utilities are useful in practice, let's look at an example of parsing logs.
There is an Apache log file (access.log) in the following format:
1.2.3.4 - - [12/May/2014:03:08:55 +0400] "GET /shop/goods-list HTTP/1.1" 200 1691 "http://example.ru/shop/goods-list" "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0"
Let's write a simple script to calculate the percentage of successfully processed requests. The output looks something like this:
Passed: 93.62%
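The script itself was lost from this copy of the article, so here is a minimal reconstruction using cat, wc, grep and awk, consistent with the awk one-liners in the UPD at the end (field $9 of the combined log format is the HTTP status code). Note that grepping for ' 200 ' is a rough match (it could also hit a byte count of 200); the $9-based awk versions in the UPD are more robust. The sample log is synthetic so the snippet is self-contained:

```shell
# Synthetic access.log fragment; field 9 is the HTTP status code.
cat > /tmp/access.log <<'EOF'
1.2.3.4 - - [12/May/2014:03:08:55 +0400] "GET /a HTTP/1.1" 200 1691 "-" "-"
1.2.3.4 - - [12/May/2014:03:08:56 +0400] "GET /b HTTP/1.1" 404 209 "-" "-"
1.2.3.4 - - [12/May/2014:03:08:57 +0400] "GET /c HTTP/1.1" 200 512 "-" "-"
1.2.3.4 - - [12/May/2014:03:08:58 +0400] "GET /d HTTP/1.1" 200 717 "-" "-"
EOF

# Count total requests and successful (HTTP 200) ones, then print the ratio.
total=$(wc -l < /tmp/access.log)
ok=$(grep -c ' 200 ' /tmp/access.log)
awk -v ok="$ok" -v total="$total" \
    'BEGIN { printf "Passed: %3.2f%%\n", ok / total * 100 }'
# Passed: 75.00%
```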
And some more useful examples:
Calculate the amount of data transferred during the hour preceding the current one
cat access.log | grep "$(date '+%d/%b/%Y:%H' -d '-1 hour')" | awk '{ total += $10 } END { print total/1024/1024 FS"MB" }'
Calculate the total amount of data transferred
awk '{ total += $10 } END { print total/1024/1024 FS"MB" }' access.log
Calculate the average request processing time
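The one-liner for this example did not survive in this copy. The stock combined log format contains no timing field, so the sketch below assumes the request processing time in microseconds (Apache's %D) has been appended as the last field; this is an assumption for illustration, not necessarily the article's original setup:

```shell
# Synthetic log where the LAST field is request time in microseconds
# (an assumed %D column appended to the combined log format).
cat > /tmp/access_time.log <<'EOF'
1.2.3.4 - - [12/May/2014:03:08:55 +0400] "GET /a HTTP/1.1" 200 1691 120000
1.2.3.4 - - [12/May/2014:03:08:56 +0400] "GET /b HTTP/1.1" 200 209 80000
1.2.3.4 - - [12/May/2014:03:08:57 +0400] "GET /c HTTP/1.1" 200 512 100000
EOF

# Average the last field ($NF) across all lines (NR), converting µs to ms.
awk '{ total += $NF } END { printf "avg: %.1f ms\n", total / NR / 1000 }' /tmp/access_time.log
# avg: 100.0 ms
```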
Diagnosing system load
There are several utilities for understanding how busy the server is. First of all, you can estimate the load by the load average indicator; as has already been written on Habr, this is the average system load, expressed as the number of processes running or blocked waiting for resources, over the last 1, 5 and 15 minutes respectively.
The first thing that will help you assess the current server load and see which processes create it is top, or its friendlier variant htop:
top - displays Linux processes - and in addition shows uptime, load average, and the number of running tasks and threads.
CPU utilization (as percentages):
- us - time spent running user processes at normal priority (un-niced),
- sy - time spent running system (kernel) processes,
- ni - time spent running user processes with altered priority (niced),
- wa - time spent waiting for I/O (the disk subsystem),
- hi - time spent handling hardware interrupts,
- si - time spent handling software interrupts,
- st - time "stolen" by the hypervisor to run tasks of other virtual machines. It matters on virtual machines (VPS / VDS) and is useful for judging the honesty of your hoster: if this figure is large, your provider is brazenly overselling.
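If you only need the load average numbers, you do not even have to start top; on Linux they can be read straight from /proc (a small sketch):

```shell
# /proc/loadavg holds the 1-, 5- and 15-minute load averages, followed by
# runnable/total task counts and the last PID. No extra tools needed.
awk '{ printf "load: 1m=%s 5m=%s 15m=%s\n", $1, $2, $3 }' /proc/loadavg
```

The same numbers also appear in the output of uptime and in the header of top.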
top will also show how much memory is consumed overall and by what (processes, caches, buffers), as well as how much each process consumes:
- VIRT - how much memory the process uses in total, including code, data, shared libraries, pages swapped out to disk, and pages that were reserved but never used,
- RES - resident set size - the current amount of physical RAM (excluding swap) used by the process,
- SHR - shared memory size - the amount of memory that can potentially be shared with other processes; as a rule, not all of it is resident.
In addition to displaying resource utilization, top is also used to manipulate processes: you can send a signal to terminate a process or change its priority (nice).
htop - a more colorful version of top - has all the same capabilities, but a friendlier interface (it allows scrolling the process list both vertically and horizontally), and it can also invoke lsof, strace and ltrace for the selected process (we will cover those in more detail below).
To estimate the load on the disk I / O subsystem, you can use the iotop utility:
iotop - a top-like utility for monitoring disk load; it displays a process table with current disk I/O figures, such as:
- PRIO - process priority,
- DISK READ - read from disk Bytes / sec,
- DISK WRITE - write to disk Bytes / sec,
- SWAPIN - time (as a percentage) spent by the process on swapping,
- IO - time (as a percentage) spent by the process waiting for input / output.
Additionally, information about the total read / write bit rate of the disk subsystem is displayed.
vmstat - displays summary information about processes, memory, input / output, processor and disk activity. Unlike iotop, it does not require superuser privileges.
Diagnosing program crashes and troubleshooting
There are situations when a program or daemon does not work as you would expect, or fails for no obvious reason. When all the logs have been read (or there are no logs), all the configs have been checked and the documentation re-read (if it exists, of course), the system call tracing utilities strace and ltrace will help you, and lsof will show the list of open files:
strace - lets you intercept the system calls and signals of a process, either by launching the program under strace or by attaching to an already running process by its PID.
The output can be filtered, for example to show only open() or select() calls. If a call returns a non-negative value, everything is fine; if you see, for example:
open("/foo/bar", O_RDONLY) = -1 ENOENT (No such file or directory)
then you can tell that the application is trying to open a non-existent file. Sometimes it turns out that file permissions are set incorrectly, or, for example, a virtual machine does not start because the hypervisor cannot allocate the required amount of RAM to it. strace helps a lot in such situations.
If you are simply curious how a particular service works and which system calls it uses, strace is your friend. For example, if you want to know how Apache fundamentally differs from Nginx, just go and look.
ltrace - a utility for tracing library calls - is very similar to strace, but intercepts only calls to dynamic libraries.
ldd - run it on a binary and it will show which libraries the program uses. Useful if you need to move a program into a chroot.
lsof - lists open files; by default it displays everything. It can show the list for a specific process by PID (conveniently exposed in htop), for a user by UID, or, for example, which processes are using a particular file.
It can be useful in many cases. There is a very detailed man page with descriptions and examples.
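lsof is not always installed; on Linux you can peek at the same raw information for a single process via /proc, which is what lsof itself reads (a rough sketch):

```shell
# Each entry in /proc/<pid>/fd is a symlink to a file the process has open.
# Here we inspect the current shell's own file descriptors ($$ is its PID);
# lsof presents this same data in a friendlier, cross-process form.
ls -l /proc/$$/fd
```

At a minimum you will see descriptors 0, 1 and 2 (stdin, stdout, stderr) plus whatever files the shell itself has open.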
What's left to tell
In the next article we will talk about utilities for diagnosing equipment, networks, and monitoring and automation tools.
The topic is really broad and interesting, and we will be glad to see interesting examples from your own experience in the comments.
To be continued...

UPD: A more correct way to calculate the percentage of successfully processed requests, from sledopit:
awk '{sum+=1; if ($9==200) ok+=1} END {printf "Passed: %3.2f%%\n", ok / sum * 100}' access.log
And an even simpler variant, using the built-in variable NR instead of sum, from RumataEstora:
awk '$9 == 200 { s++ } END { print s / NR * 100; }' access.log