
This article is the first of three in which I want to share my view of managing large infrastructures with Puppet. This first part is an introduction to Hiera, Puppet's powerful tool for organizing hierarchical data. It is aimed at people who already know Puppet but are not yet familiar with Hiera. I will try to cover the basics of this tool and show how it simplifies the management of a large number of servers.
You probably know, or can imagine, that managing a large infrastructure with Puppet is not an easy task. Ten servers do not need Puppet at all; for fifty it is just right, and the code can be written however you like; but once you reach 500+ servers, you have to think seriously about optimizing your efforts. Unfortunately, Puppet does not seem to have been designed from the start as a solution for large infrastructures; at the very least, its support for hierarchy was initially very poor. Standard node definitions are completely inapplicable in large companies. Puppet Labs no longer recommends using node inheritance (or class inheritance) at all; instead, it is better to load hierarchical data from external sources such as Hiera and an External Node Classifier (ENC).
Although the concept of an ENC does not differ much from Hiera in principle, I don't really like specific ENC implementations such as Puppet Dashboard and Foreman. Let me explain why:
1) My infrastructure data lives somewhere in the application's database. How do I get it out of there if the application crashes? I don't know. I can guess, but I don't know for sure.
2) A powerful ENC, precisely because of its power, scales poorly and with difficulty. In contrast, Hiera stores all its data as text. Text data is very easy to synchronize via git and r10k between several Puppet masters, should the need arise. In general, text-based configuration is the UNIX way, however old-fashioned that may sound.
Again, I do not deny the potential of Puppet Dashboard and Foreman as monitoring and reporting tools. A nice web interface with graphs and pictures is necessary, but only as a means of viewing, not as a means of changing the configuration of your infrastructure. I also know that Foreman does a lot of things besides Puppet (Red Hat Satellite Server 6 and the Katello project, both based on Foreman, are vivid examples). Nevertheless, as the place to store the configuration of my entire infrastructure, I prefer Hiera.
What is Hiera? It is a Ruby library that ships with Puppet by default and helps you organize your data in Puppet better. Can you do without it? Yes. You can write all the numbers and parameters directly in manifests, but at a certain stage they will take on a truly intimidating form, and it will become harder and harder to remember where things are stored and what they are responsible for.
What do you gain from using Hiera? You start to separate the specific parameters of your infrastructure (user UIDs, SSH keys, DNS settings, various centralized files, etc.) from the Puppet code that actually applies them. As a result, if one day you need to find out the UID of some user on some server or group of servers, you will know exactly where that information is stored, and you will not have to flip frantically through all your manifests looking for the user and trying to predict what changing the UID "right here" will lead to. Of course, do not expect miracles from Hiera. In the end, it is just a way to store and organize your data.
But enough talk; let's get down to business. Hiera (from "hierarchy") operates on a hierarchy. I defined the following hierarchy in /etc/puppet/hiera.yaml:
 :hierarchy:
   - "%{::environment}/nodes/%{::fqdn}"
   - "%{::environment}/roles"
   - "%{::environment}/%{::environment}"
   - common
 :backends:
   - yaml
 :yaml:
   :datadir: '/etc/puppet/hiera'
Remember this hierarchy; I will use it extensively from here on.
For those who are not very familiar with Hiera, let me explain. We set the directory /etc/puppet/hiera as Hiera's data store. Files in this directory must have the .yaml extension and contain data in YAML format. Next, we define the names of the files that Hiera will expect to find in that directory. Since Hiera is called from Puppet code, it has access to the same variables as Puppet, including facts. Each node's environment is a built-in variable, available in Hiera as %{::environment}. The node's FQDN in Hiera predictably looks like %{::fqdn}. Thus, this hierarchy corresponds to the following file structure:
/etc/puppet/hiera/
|- common.yaml
|- production/
|   |- production.yaml
|   |- roles.yaml
|   |- nodes/
|       |- prod-node1.yaml
|       |- prod-node2.yaml
|- development/
|   |- development.yaml
|   |- roles.yaml
|   |- nodes/
|       |- dev-node1.yaml
|       |- dev-node2.yaml
The order of the levels in hiera.yaml (not in the file structure) matters. Hiera scans the levels from top to bottom, and what happens next depends on the lookup method you use in the Puppet manifest. There are three methods, which I will demonstrate by example. With the hierarchy from the hiera.yaml file above, let's create three files with the following contents:
/etc/puppet/hiera/common.yaml
 classes:
   - common_class1
   - common_class2
 roles:
   common_role1:
     key1: value1
     key2: value2
 common: common_value
/etc/puppet/hiera/production/production.yaml
 classes:
   - production_class1
   - production_class2
 roles:
   production_role1:
     key1: value1
     key2: value2
 production: production_value
/etc/puppet/hiera/production/nodes/testnode.yaml
 classes:
   - node_class1
   - node_class2
 roles:
   node_role1:
     key1: value1
     key2: value2
 node: node_value
Hiera can also be queried from the command line. In fact, the console is the easiest way to understand how it works. By default, the Hiera command-line tool keeps its config in /etc/hiera.yaml. Make this file a symbolic link to /etc/puppet/hiera.yaml. After that, we make a simple call:
 [root@testnode] 
Because this query provided no information about the environment or the FQDN, Hiera takes the data from the lowest level of the hierarchy, the common.yaml file. Array elements are shown in square brackets. Let's try supplying the environment:
 [root@testnode] 
The data from production.yaml is higher in the hierarchy, so it has higher priority and overrides the data from common.yaml. Similarly, data from testnode.yaml overrides data from production.yaml. However, if a key is absent from the higher levels, the data is naturally taken from a lower one:
 [root@testnode] 
In this case, strings are returned, not arrays, according to the above files.
This type of request is called a priority lookup. As you can see, it always returns the first value found in the hierarchy (the one with the highest priority) and then stops without examining the lower levels. In Puppet, it corresponds to the standard function hiera(). In our example, that would be the call hiera('classes'). Since Puppet always calls Hiera with the appropriate context, we do not need to specify anything extra in the query.
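To make the mechanics concrete, here is a minimal Python sketch of a priority lookup over the hierarchy from hiera.yaml. It is only a model of the behavior described above, not Hiera's actual implementation: the YAML files are replaced by an in-memory dictionary (trimmed to a few keys), the function names are my own, and I assume that an unknown variable interpolates to an empty string, so the resulting file is simply not found.

```python
import re

# The hierarchy from hiera.yaml, highest priority first.
HIERARCHY = [
    "%{::environment}/nodes/%{::fqdn}",
    "%{::environment}/roles",
    "%{::environment}/%{::environment}",
    "common",
]

# In-memory stand-ins for the three YAML files
# (paths relative to the datadir, without the .yaml extension).
DATA = {
    "common": {
        "classes": ["common_class1", "common_class2"],
        "common": "common_value",
    },
    "production/production": {
        "classes": ["production_class1", "production_class2"],
        "production": "production_value",
    },
    "production/nodes/testnode": {
        "classes": ["node_class1", "node_class2"],
        "node": "node_value",
    },
}


def candidate_sources(scope):
    """Interpolate %{::name} tokens; unknown variables become empty strings."""
    def expand(level):
        return re.sub(r"%\{::(\w+)\}", lambda m: scope.get(m.group(1), ""), level)
    return [expand(level) for level in HIERARCHY]


def priority_lookup(key, scope):
    """Return the first value found, walking the hierarchy top to bottom."""
    for source in candidate_sources(scope):
        data = DATA.get(source, {})  # a missing file is simply skipped
        if key in data:
            return data[key]
    return None


node_scope = {"environment": "production", "fqdn": "testnode"}
print(priority_lookup("classes", {}))
print(priority_lookup("classes", {"environment": "production"}))
print(priority_lookup("classes", node_scope))
print(priority_lookup("common", node_scope))
```

With no scope, only common.yaml matches; adding the environment and then the FQDN makes higher levels of the hierarchy resolvable, and the first match wins.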
The next type of request is the array merge. Let's look:
 [root@testnode] 
This type of query passes through all levels of the hierarchy and collects all the values found (strings and arrays) into one flat array. In Puppet, this request is made with hiera_array(). However, this type of request cannot collect hashes. If it encounters a hash along the way, it returns an error:
 [root@testnode] 
In the same situation, a priority lookup works fine and returns a hash (in curly brackets):
 [root@testnode] 
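The array merge described above can be modeled in a few lines of Python. This is a sketch under assumptions, not Hiera's code: the levels are given as already-resolved dictionaries for the node testnode (highest priority first), the function name is my own, and I assume that duplicate values are dropped from the result.

```python
# Data of the three files, resolved for the node "testnode",
# highest priority first.
LEVELS = [
    {   # production/nodes/testnode.yaml
        "classes": ["node_class1", "node_class2"],
        "roles": {"node_role1": {"key1": "value1", "key2": "value2"}},
        "node": "node_value",
    },
    {   # production/production.yaml
        "classes": ["production_class1", "production_class2"],
        "roles": {"production_role1": {"key1": "value1", "key2": "value2"}},
        "production": "production_value",
    },
    {   # common.yaml
        "classes": ["common_class1", "common_class2"],
        "roles": {"common_role1": {"key1": "value1", "key2": "value2"}},
        "common": "common_value",
    },
]


def array_merge(key, levels):
    """Collect strings and arrays from every level into one flat array."""
    merged = []
    for data in levels:
        if key not in data:
            continue
        value = data[key]
        if isinstance(value, dict):
            raise TypeError(f"array merge cannot handle the hash under {key!r}")
        items = value if isinstance(value, list) else [value]
        for item in items:
            if item not in merged:  # assumption: duplicates are dropped
                merged.append(item)
    return merged


print(array_merge("classes", LEVELS))
try:
    array_merge("roles", LEVELS)
except TypeError as error:
    print("error:", error)
```

Unlike the priority lookup, the loop never stops at the first match, which is why the result contains the classes from all three files.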
What if we need to collect hashes? Use the third query type, the hash merge:
 [root@testnode] 
This query, like the previous one, passes through all levels of the hierarchy and merges all the hashes into one large hash. As you might guess, if you try to use it to merge arrays or strings, it returns an error:
 [root@testnode] 
In Puppet, this request is made with hiera_hash(). What happens if the same hash has different sets of "key => value" pairs at different levels of the hierarchy? For example, what if the user test has UID = 100 at the common level and UID = 200 at the level of the node testnode? In this case, for each specific key the hash merge behaves like a priority lookup, that is, it returns the higher-priority value. You can read more about it here.
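A hash merge can be sketched in Python the same way. Again, this is a model under assumptions rather than Hiera's implementation: the merge is shallow (only the top-level keys of the hash are compared), the function name is my own, and the test_user key is a hypothetical addition that mirrors the UID example above.

```python
# Levels resolved for the node "testnode", highest priority first.
LEVELS = [
    {   # production/nodes/testnode.yaml
        "roles": {"node_role1": {"key1": "value1", "key2": "value2"}},
        "test_user": {"uid": 200},                      # hypothetical key
        "node": "node_value",
    },
    {   # production/production.yaml
        "roles": {"production_role1": {"key1": "value1", "key2": "value2"}},
    },
    {   # common.yaml
        "roles": {"common_role1": {"key1": "value1", "key2": "value2"}},
        "test_user": {"uid": 100, "shell": "/bin/sh"},  # hypothetical key
    },
]


def hash_merge(key, levels):
    """Merge hashes from every level; higher-priority keys win."""
    merged = {}
    # Walk from the lowest priority up, so later updates overwrite earlier ones.
    for data in reversed(levels):
        if key not in data:
            continue
        value = data[key]
        if not isinstance(value, dict):
            raise TypeError(f"hash merge cannot handle the non-hash under {key!r}")
        merged.update(value)
    return merged


print(sorted(hash_merge("roles", LEVELS)))
print(hash_merge("test_user", LEVELS))
```

For test_user, the uid comes from the node level and the shell from the common level, so each key is resolved as if by its own priority lookup.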
Okay, cool (or not), but why do we need all this? Puppet automatically (in 3.x versions you do not even need to configure anything for this) looks in Hiera for parameters it can use.
To begin with, a simple, slightly modified example from the Puppet site (by the way, the example there currently shows the outdated parameters ntp::autoupdate and ntp::enable; below I use their current names). We will torment the long-suffering puppetlabs-ntp module. Suppose we want to express the following ntp configuration in Puppet:
/etc/ntp.conf
 tinker panic 0
 restrict restrict default kod nomodify notrap nopeer noquery
 restrict restrict -6 default kod nomodify notrap nopeer noquery
 restrict restrict 127.0.0.1
 restrict restrict -6 ::1
 server 0.pool.ntp.org iburst burst
 server 1.pool.ntp.org iburst burst
 server 2.pool.ntp.org iburst burst
 server 3.pool.ntp.org iburst burst
 driftfile /var/lib/ntp/drift
 To do this, add the following lines to common.yaml in Hiera:
 classes:
   - ntp
 ntp::restrict:
   - restrict default kod nomodify notrap nopeer noquery
   - restrict -6 default kod nomodify notrap nopeer noquery
   - restrict 127.0.0.1
   - restrict -6 ::1
 ntp::service_ensure: running
 ntp::service_enable: true
 ntp::servers:
   - 0.pool.ntp.org iburst burst
   - 1.pool.ntp.org iburst burst
   - 2.pool.ntp.org iburst burst
   - 3.pool.ntp.org iburst burst
As you can see, this simply lists concrete values for the ntp class parameters, which will be passed to the class when it is declared. These parameters are declared in the ntp class header (the modules/ntp/manifests/init.pp file). When passing parameters to a class from Hiera this way, you must use fully qualified variable names so that Puppet loads them into the correct scope.
The only thing left to do is add one line to the main Puppet manifest of your environment (site.pp):
 hiera_include('classes') 
This line, for all its simplicity and brevity, does a lot of work behind the scenes. First, Puppet goes through all (!) levels of the Hiera hierarchy and loads all classes declared in all "classes:" sections. Then Puppet goes through the fully qualified variables in Hiera and loads them into the scope of the corresponding class. It is easy to guess that if you remove the ntp class from the classes list but forget to remove the variables of this class from the YAML file, Puppet will generate an error like "cannot find named class ntp". Without a loaded class, its variables lose all meaning.
I should note here that the word classes (like all the others) in Hiera's YAML files has no special or reserved meaning. Instead of classes you can use any other word, for example production_classes, my_classes, or my-%{::environment}. Yes, the last one is valid too; Puppet variables can also be used in the names of Hiera sections and hash keys. In hash values, as well as in string variables and arrays, variables cannot be used, and sometimes that's a pity!
Thus, we have effectively moved the ntp service parameters out of the Puppet manifest and into the Hiera hierarchy. Now, in accordance with the hierarchy described at the beginning of the article, these ntp parameters will be applied to absolutely all nodes in your infrastructure. If you want to override them at the higher environment level or for a specific server, you can easily do so by specifying the values you need at the appropriate hierarchy level.
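The automatic parameter lookup and the per-level override can be illustrated with one more Python model. It rests on assumptions: every class parameter is resolved by a priority lookup of the fully qualified key '<class>::<parameter>', falling back to the default from the class header. The node-level override, the CLASS_DEFAULTS values, and the function names are hypothetical and serve only as illustration.

```python
# Hiera data already resolved for one node, highest priority first.
LEVELS = [
    # Node level: hypothetical override, not from the article.
    {"ntp::servers": ["ntp1.internal.example iburst"]},
    # Environment level: nothing ntp-related here.
    {},
    # common.yaml, shortened.
    {
        "ntp::service_ensure": "running",
        "ntp::service_enable": True,
        "ntp::servers": [
            "0.pool.ntp.org iburst burst",
            "1.pool.ntp.org iburst burst",
        ],
    },
]

# Hypothetical stand-in for the parameter defaults in the class header.
CLASS_DEFAULTS = {
    "servers": [],
    "service_ensure": "stopped",
    "package_ensure": "present",
}


def priority_lookup(key, levels, default=None):
    """Return the first value found, or the default if no level has the key."""
    for data in levels:
        if key in data:
            return data[key]
    return default


def resolve_class_params(class_name, defaults, levels):
    """Look up '<class>::<param>' for every parameter of the class."""
    return {
        name: priority_lookup(f"{class_name}::{name}", levels, default)
        for name, default in defaults.items()
    }


node_params = resolve_class_params("ntp", CLASS_DEFAULTS, LEVELS)
other_params = resolve_class_params("ntp", CLASS_DEFAULTS, LEVELS[1:])
print(node_params["servers"])         # value from the node-level override
print(other_params["servers"])        # value from common.yaml
print(node_params["package_ensure"])  # not in Hiera, so the class default
```

The second call drops the node level, which models any other server in the same environment: it receives the values from common.yaml.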
In fact, this method of automatically importing data from Hiera into Puppet is not the only one.
The previous method has one major drawback: it is too automatic. While on simple configurations we can easily predict its behavior, with a large number of hosts it is not always possible to say with certainty what adding another class to the list of included ones will lead to. For example, you might use the puppetlabs-apache module to add a specific Apache configuration to some nodes. If you add the harmless-looking lines
 classes:
   - apache
to the production.yaml file, this will install, configure, and start Apache on all production hosts. Moreover, the apache module will wipe the entire existing Apache configuration that was set up before it.
Such is its fun default behavior! So a simple 'include apache' can sometimes be costly if you don't read the documentation.
So what should we do? List Apache in the YAML files only for the nodes we need? That does not feel very centralized...
To let us choose what we want to include and what we do not, Puppet provides the create_resources() function. Its use is nicely described here.
The function create_resources(resource, hash1, hash2) creates Puppet resources of the type resource, passing them hash1 and hash2. Hash1 maps each resource title to a hash of its parameters. Hash2 is optional; if it is specified, its keys and values are added to every resource from hash1 as defaults. If the same parameter is specified in both hash1 and hash2, hash1 takes precedence. The resource can be either one of the standard types (see the Puppet type reference) or a defined type declared by us or in a module. An example of a standard resource is user; an example of a defined type is apache::vhost from the apache module. Consider an example with Apache (here I will allow myself to copy a good example from the link above).
Suppose we want to move the following configuration of two Apache virtual hosts into Hiera:
 apache::vhost { 'foo.example.com':
   port          => '80',
   docroot       => '/var/www/foo.example.com',
   docroot_owner => 'foo',
   docroot_group => 'foo',
   options       => ['Indexes','FollowSymLinks','MultiViews'],
   proxy_pass    => [ { 'path' => '/a', 'url' => 'http://backend-a/' } ],
 }
 apache::vhost { 'bar.example.com':
   port    => '80',
   docroot => '/var/www/bar.example.com',
 }
In Hiera, it will look like this:
 apache::vhosts:
   foo.example.com:
     port: 80
     docroot: /var/www/foo.example.com
     docroot_owner: foo
     docroot_group: foo
     options:
       - Indexes
       - FollowSymLinks
       - MultiViews
     proxy_pass:
       - path: '/a'
         url: 'http://localhost:8080/a'
   bar.example.com:
     port: 80
     docroot: /var/www/bar.example.com
All that remains to be written in the Puppet manifest is:
 $myvhosts = hiera('apache::vhosts', {})
 create_resources('apache::vhost', $myvhosts)
Here, in the first line, we asked Hiera to load the entire configuration from the apache::vhosts section. The data was loaded as two hashes, 'foo.example.com' and 'bar.example.com' (to be absolutely precise, an anonymous hash consisting of two named hashes ended up in the $myvhosts variable). These hashes were then passed in turn to the apache::vhost resource, which causes Puppet to create the virtual hosts.
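The way create_resources() turns such a hash into resources can be modeled in Python as follows. This is a sketch of the semantics, not Puppet's implementation: each top-level key becomes a resource title, its value becomes the parameters, and the optional defaults fill in whatever the resource does not set itself. The vhost_defaults values are hypothetical and are not part of the Apache example.

```python
def create_resources(resource_type, resources, defaults=None):
    """Model: one declaration per title; explicit parameters override defaults."""
    defaults = defaults or {}
    return [
        (resource_type, title, {**defaults, **params})
        for title, params in resources.items()
    ]


# The apache::vhosts hash as loaded from Hiera (shortened).
vhosts = {
    "foo.example.com": {
        "port": 80,
        "docroot": "/var/www/foo.example.com",
        "docroot_owner": "foo",
    },
    "bar.example.com": {
        "port": 80,
        "docroot": "/var/www/bar.example.com",
    },
}

# Hypothetical defaults, playing the role of the optional third argument.
vhost_defaults = {"docroot_owner": "root", "docroot_group": "root"}

declared = create_resources("apache::vhost", vhosts, vhost_defaults)
for resource_type, title, params in declared:
    print(resource_type, title, params)
```

Here foo.example.com keeps its own docroot_owner, while bar.example.com picks up both values from the defaults.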
Another good example of moving data from manifests into Hiera is user management. If you write the following in Hiera:
 users:
   user1:
     ensure: present
     home: /home/user1
     shell: /bin/sh
     uid: 10001
     managehome: true
   user2:
     ensure: present
     home: /home/user2
     shell: /bin/sh
     uid: 10002
     groups:
       - secondary_group1
       - secondary_group2
   user3:
     ensure: present
     home: /home/user3
     shell: /bin/sh
     uid: 10003
     groups:
       - secondary_group3
       - secondary_group4
 and then write in site.pp:
 $node_users = hiera_hash('users')
 create_resources(user, $node_users, {})
then all of the above users will be created. Note that calling hiera_hash effectively gathers all the users declared in the users: section across your entire hierarchy. If conflicts arise somewhere (different UIDs for the same user in different files), Hiera will take the value from the higher hierarchy level. Which is logical.
Also, create_resources() together with defined types is one of the ways to implement iteration in Puppet, which originally lacks it (at least without the future parser; you are not crazy enough to use it yet, are you?). Both ways of iterating are well described here.
That is all for a start. I have covered the basics of using Hiera. Using the standard Puppet functions hiera(), hiera_array(), hiera_hash(), hiera_include() and create_resources(), you can, as you have probably guessed, come up with a lot of things.
In the next article I will try to describe managing server roles with Puppet and Hiera.