--private-network-ip
flag in GCP or by the --private-ip-address option in AWS when allocating compute resources.

for i in 0 1 2; do
  gcloud compute instances create controller-${i} \
    # ...
    --private-network-ip 10.240.0.1${i} \
    # ...
done
(controllers_gcp.sh)

for i in 0 1 2; do
  declare controller_id${i}=`aws ec2 run-instances \
    # ...
    --private-ip-address 10.240.0.1${i} \
    # ...
done
(controllers_aws.sh)

Each instance gets two IP addresses: a private one (controllers - 10.240.0.1${i}/24, workers - 10.240.0.2${i}/24) and a public one assigned by the cloud provider; we will come back to the public addresses later, when we get to NodePorts.
$ gcloud compute instances list
NAME          ZONE        MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP     STATUS
controller-0  us-west1-c  n1-standard-1               10.240.0.10  35.231.XXX.XXX  RUNNING
worker-1      us-west1-c  n1-standard-1               10.240.0.21  35.231.XX.XXX   RUNNING
...
$ aws ec2 describe-instances --query 'Reservations[].Instances[].[Tags[?Key==`Name`].Value[],PrivateIpAddress,PublicIpAddress]' --output text | sed '$!N;s/\n/ /'
10.240.0.10    34.228.XX.XXX   controller-0
10.240.0.21    34.173.XXX.XX   worker-1
...
ping
installed on the host). Each worker gets its own pod subnet: POD_CIDR=10.200.${i}.0/24 for worker-${i}.
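For reference, in Kubernetes The Hard Way this per-worker value is passed to the machine as instance metadata at creation time and read back on the node. A minimal sketch of that flow on GCP (the pod-cidr metadata key follows the tutorial; flag and key names may differ in other setups):

# Sketch: how the per-worker pod CIDR travels from instance creation to the node (GCP flavor).
for i in 0 1 2; do
  gcloud compute instances create worker-${i} \
    --metadata pod-cidr=10.200.${i}.0/24 \
    # ...
done

# Later, on worker-${i} itself, the value can be read back from the metadata server:
POD_CIDR=$(curl -s -H "Metadata-Flavor: Google" \
  http://metadata.google.internal/computeMetadata/v1/instance/attributes/pod-cidr)
echo ${POD_CIDR}   # 10.200.${i}.0/24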
“The CNI plugin is responsible for inserting a network interface into the container network namespace (for example, one end of a veth pair) and making any necessary changes on the host (for example, attaching the other end of the veth to a bridge). It should then assign an IP to the interface and set up the routes consistent with the “IP Address Management” section by invoking the appropriate IPAM plugin.” (from the Container Network Interface Specification)
“A namespace wraps a global system resource in an abstraction that makes it appear to the processes within the namespace that they have their own isolated instance of the global resource. Changes to the global resource are visible to other processes that are members of the namespace, but are invisible to other processes.” (from the namespaces man page)
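As a quick aside (not part of the original walkthrough), the namespaces a process belongs to can be inspected directly under /proc; the inode numbers in the symlink targets identify the namespaces, and we will match such inodes against lsns output later:

# Illustrative check on any Linux host: list the namespaces of the current shell.
# Each symlink target (e.g. net:[4026531993]) carries the namespace's inode number.
ls -l /proc/$$/ns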
Linux currently provides the following namespace types: Cgroup, IPC, Network, Mount, PID, User, and UTS. Network namespaces (CLONE_NEWNET) define the network resources available to a process: “Each network namespace has its own network devices, IP addresses, IP routing tables, /proc/net directory, port numbers, and so on” (from the article "Namespaces in operation"). “A virtual network (veth) device pair provides a pipe-like abstraction that can be used to create tunnels between network namespaces, and can be used to create a bridge to a physical network device in another namespace. When a namespace is freed, the veth devices that it contains are destroyed.” (from the network namespaces man page)
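To make the veth and bridge mechanics concrete, here is a minimal hand-made sketch of roughly what a bridge-type CNI plugin does. This is illustrative only: all names (demo0, veth-host, veth-pod, br-demo) and addresses are made up, and it is not the exact sequence the plugin runs.

# Illustrative sketch: connect a new network namespace to a bridge with a veth pair.
sudo ip netns add demo0                                    # create a network namespace
sudo ip link add veth-host type veth peer name veth-pod    # create a veth pair
sudo ip link set veth-pod netns demo0                      # move one end into the namespace
sudo ip link add br-demo type bridge                       # create a bridge in the root namespace
sudo ip link set veth-host master br-demo                  # attach the host end to the bridge
sudo ip addr add 10.250.0.1/24 dev br-demo                 # the bridge acts as the gateway
sudo ip link set br-demo up
sudo ip link set veth-host up
sudo ip netns exec demo0 ip addr add 10.250.0.2/24 dev veth-pod
sudo ip netns exec demo0 ip link set veth-pod up
sudo ip netns exec demo0 ip route add default via 10.250.0.1   # default route via the bridge IP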
“The CNI plugin is selected by passing the --network-plugin=cni command-line option to the kubelet. The kubelet reads a file from --cni-conf-dir (the default is /etc/cni/net.d) and uses the CNI configuration from that file to set up each pod's network.” (from Network Plugin Requirements)
The CNI plugin binaries themselves live in --cni-bin-dir (the default is /opt/cni/bin). The kubelet.service launch arguments include --network-plugin=cni:

[Service]
ExecStart=/usr/local/bin/kubelet \\
  --config=/var/lib/kubelet/kubelet-config.yaml \\
  --network-plugin=cni \\
  ...
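On a worker provisioned this way, a quick look at both directories shows what the kubelet will find (a simple check, output not reproduced here):

ubuntu@worker-0:~$ ls /etc/cni/net.d/     # CNI network configuration files
ubuntu@worker-0:~$ ls /opt/cni/bin/       # CNI plugin binaries (e.g. bridge, host-local, loopback)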
When a pod is created, the kubelet first starts the pause container, which “serves as the “parent container” for all of the pod's containers” (from the article “The Almighty Pause Container”). Kubernetes then executes the CNI plugin to attach the pause container to the network. All of the pod's containers use the network namespace (netns) of this pause container. The CNI bridge configuration used in Kubernetes The Hard Way looks like this:

{
  "cniVersion": "0.3.1",
  "name": "bridge",
  "type": "bridge",
  "bridge": "cnio0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "ranges": [
      [{"subnet": "${POD_CIDR}"}]
    ],
    "routes": [{"dst": "0.0.0.0/0"}]
  }
}
It instructs the bridge plugin to configure a Linux software bridge (L2) in the root namespace, named cnio0 (the default name is cni0), which acts as a gateway ("isGateway": true). For address assignment, the IPAM plugin specified in the config (ipam) is invoked. In this case the host-local type is used, “which stores the state locally on the host filesystem, therefore ensuring uniqueness of IP addresses on a single host” (from the host-local documentation). The IPAM plugin returns this information to the calling plugin (bridge), which then configures all the routes specified in the config ("routes": [{"dst": "0.0.0.0/0"}]). If gw is not specified, it is derived from the subnet. The default route in the pod's network namespace is also configured, pointing to the bridge (which itself is assigned the first IP address of the pod subnet).
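To see the bridge + host-local pair in action outside of Kubernetes, the plugin can be called by hand against a scratch namespace. This is a sketch only: the config file name 10-bridge.conf and the throwaway namespace are assumptions, and running it on a cluster node will consume an address from host-local's pool.

# Sketch: manually invoke the bridge plugin (CNI ADD) with the same config the kubelet uses.
# Assumptions: config at /etc/cni/net.d/10-bridge.conf, plugin binaries in /opt/cni/bin.
sudo ip netns add cni-test
sudo CNI_COMMAND=ADD \
     CNI_CONTAINERID=cni-test-id \
     CNI_NETNS=/var/run/netns/cni-test \
     CNI_IFNAME=eth0 \
     CNI_PATH=/opt/cni/bin \
     /opt/cni/bin/bridge < /etc/cni/net.d/10-bridge.conf
# The plugin prints a JSON result with the interface it created and the IP that the
# host-local IPAM plugin allocated from the pod subnet.
# (A matching CNI_COMMAND=DEL call releases the address again.)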
The config also enables masquerading ("ipMasq": true) of traffic originating from the pod network. We do not actually need NAT here, but that is the config used in Kubernetes The Hard Way. So, for the sake of completeness, I should mention that the bridge plugin's iptables entries are configured for this particular example: all packets leaving a pod whose destination is not within 224.0.0.0/4 will be NATed, which does not quite meet the requirement “all containers can communicate with any other containers without NAT”. Well, we will prove shortly why NAT is not needed. We have deployed nginx from here. To look at the network namespaces on worker-0, we use lsns with the -t option to select the desired namespace type (i.e. net):

ubuntu@worker-0:~$ sudo lsns -t net
        NS TYPE NPROCS   PID USER COMMAND
4026532089 net     113     1 root /sbin/init
4026532280 net       2  8046 root /pause
4026532352 net       4 16455 root /pause
4026532426 net       3 27255 root /pause
The same namespaces can be found under /var/run/netns; with the -i option to ls we can list their inode numbers:

ubuntu@worker-0:~$ ls -1i /var/run/netns
4026532352 cni-1d85bb0c-7c61-fd9f-2adc-f6e98f7a58af
4026532280 cni-7cec0838-f50c-416a-3b45-628a4237c55c
4026532426 cni-912bcc63-712d-1c84-89a7-9e10510808a0
The same list is produced by ip netns:

ubuntu@worker-0:~$ ip netns
cni-912bcc63-712d-1c84-89a7-9e10510808a0 (id: 2)
cni-1d85bb0c-7c61-fd9f-2adc-f6e98f7a58af (id: 1)
cni-7cec0838-f50c-416a-3b45-628a4237c55c (id: 0)
To list all the processes running in the network namespace cni-912bcc63-712d-1c84-89a7-9e10510808a0 (4026532426), you can run, for example, the following command:

ubuntu@worker-0:~$ sudo ls -l /proc/[1-9]*/ns/net | grep 4026532426 | cut -f3 -d"/" | xargs ps -p
  PID TTY STAT TIME COMMAND
27255 ?   Ss   0:00 /pause
27331 ?   Ss   0:00 nginx: master process nginx -g daemon off;
27355 ?   S    0:00 nginx: worker process
So besides pause, nginx is running in this pod. The pause container shares the net and ipc namespaces with all the other containers of the pod. Remember the PID of pause, 27255; we will come back to it. Now let's see what kubectl tells us about this pod:

$ kubectl get pods -o wide | grep nginx
nginx-65899c769f-wxdx6   1/1   Running   0   5d   10.200.0.4   worker-0
$ kubectl describe pods nginx-65899c769f-wxdx6
Name:          nginx-65899c769f-wxdx6
Namespace:     default
Node:          worker-0/10.240.0.20
Start Time:    Thu, 05 Jul 2018 14:20:06 -0400
Labels:        pod-template-hash=2145573259
               run=nginx
Annotations:   <none>
Status:        Running
IP:            10.200.0.4
Controlled By: ReplicaSet/nginx-65899c769f
Containers:
  nginx:
    Container ID: containerd://4c0bd2e2e5c0b17c637af83376879c38f2fb11852921b12413c54ba49d6983c7
    Image:        nginx
...
Here we see the name of the pod, nginx-65899c769f-wxdx6, and the ID of one of its containers (nginx), but not a word about pause so far. Let's dig deeper on the worker node to match up all the data. Remember that Kubernetes The Hard Way does not use Docker, so for container details we turn to the containerd console utility, ctr (see also the article “Integration of containerd with Kubernetes, replacing Docker, is ready for production” - translator's note):

ubuntu@worker-0:~$ sudo ctr namespaces ls
NAME   LABELS
k8s.io
Knowing the containerd namespace (k8s.io), we can get the nginx container ID:

ubuntu@worker-0:~$ sudo ctr -n k8s.io containers ls | grep nginx
4c0bd2e2e5c0b17c637af83376879c38f2fb11852921b12413c54ba49d6983c7   docker.io/library/nginx:latest   io.containerd.runtime.v1.linux
And pause too:

ubuntu@worker-0:~$ sudo ctr -n k8s.io containers ls | grep pause
0866803b612f2f55e7b6b83836bde09bd6530246239b7bde1e49c04c7038e43a   k8s.gcr.io/pause:3.1   io.containerd.runtime.v1.linux
21640aea0210b320fd637c22ff93b7e21473178de0073b05de83f3b116fc8834   k8s.gcr.io/pause:3.1   io.containerd.runtime.v1.linux
d19b1b1c92f7cc90764d4f385e8935d121bca66ba8982bae65baff1bc2841da6   k8s.gcr.io/pause:3.1   io.containerd.runtime.v1.linux
The nginx container ID ending in …983c7 matches what we got from kubectl. Let's see if we can figure out which pause container belongs to the nginx pod:

ubuntu@worker-0:~$ sudo ctr -n k8s.io task ls
TASK                                                               PID    STATUS
...
d19b1b1c92f7cc90764d4f385e8935d121bca66ba8982bae65baff1bc2841da6   27255  RUNNING
4c0bd2e2e5c0b17c637af83376879c38f2fb11852921b12413c54ba49d6983c7   27331  RUNNING
The container d19b1b1c… has PID 27255, the same one we saw running /pause in the network namespace cni-912bcc63-712d-1c84-89a7-9e10510808a0. Let's look at this container's info:

ubuntu@worker-0:~$ sudo ctr -n k8s.io containers info d19b1b1c92f7cc90764d4f385e8935d121bca66ba8982bae65baff1bc2841da6
{
    "ID": "d19b1b1c92f7cc90764d4f385e8935d121bca66ba8982bae65baff1bc2841da6",
    "Labels": {
        "io.cri-containerd.kind": "sandbox",
        "io.kubernetes.pod.name": "nginx-65899c769f-wxdx6",
        "io.kubernetes.pod.namespace": "default",
        "io.kubernetes.pod.uid": "0b35e956-8080-11e8-8aa9-0a12b8818382",
        "pod-template-hash": "2145573259",
        "run": "nginx"
    },
    "Image": "k8s.gcr.io/pause:3.1",
...
And the info for the nginx container:

ubuntu@worker-0:~$ sudo ctr -n k8s.io containers info 4c0bd2e2e5c0b17c637af83376879c38f2fb11852921b12413c54ba49d6983c7
{
    "ID": "4c0bd2e2e5c0b17c637af83376879c38f2fb11852921b12413c54ba49d6983c7",
    "Labels": {
        "io.cri-containerd.kind": "container",
        "io.kubernetes.container.name": "nginx",
        "io.kubernetes.pod.name": "nginx-65899c769f-wxdx6",
        "io.kubernetes.pod.namespace": "default",
        "io.kubernetes.pod.uid": "0b35e956-8080-11e8-8aa9-0a12b8818382"
    },
    "Image": "docker.io/library/nginx:latest",
...
To recap, we now know exactly which containers run in this pod (nginx-65899c769f-wxdx6) and in its network namespace (cni-912bcc63-712d-1c84-89a7-9e10510808a0): nginx (4c0bd2e2e5c0b17c637af83376879c38f2fb11852921b12413c54ba49d6983c7) and pause (d19b1b1c92f7cc90764d4f385e8935d121bca66ba8982bae65baff1bc2841da6). How is this pod (nginx-65899c769f-wxdx6) connected to the network? Let's use the previously obtained PID 27255 of pause to run commands in its network namespace (cni-912bcc63-712d-1c84-89a7-9e10510808a0):

ubuntu@worker-0:~$ sudo ip netns identify 27255
cni-912bcc63-712d-1c84-89a7-9e10510808a0
For this we use nsenter with the -t option, which sets the target PID, and -n without specifying a file, in order to enter the network namespace of the target process (27255). Here is what ip link show says:

ubuntu@worker-0:~$ sudo nsenter -t 27255 -n ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: eth0@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
    link/ether 0a:58:0a:c8:00:04 brd ff:ff:ff:ff:ff:ff link-netnsid 0
And ifconfig eth0:

ubuntu@worker-0:~$ sudo nsenter -t 27255 -n ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.200.0.4  netmask 255.255.255.0  broadcast 0.0.0.0
        inet6 fe80::2097:51ff:fe39:ec21  prefixlen 64  scopeid 0x20<link>
        ether 0a:58:0a:c8:00:04  txqueuelen 0  (Ethernet)
        RX packets 540  bytes 42247 (42.2 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 177  bytes 16530 (16.5 KB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
This confirms that the IP address we saw earlier via kubectl get pod is configured on the pod's eth0 interface. This interface is part of a veth pair, one end of which is in the pod and the other in the root namespace. To find out the interface at the other end, we use ethtool:

ubuntu@worker-0:~$ sudo ip netns exec cni-912bcc63-712d-1c84-89a7-9e10510808a0 ethtool -S eth0
NIC statistics:
     peer_ifindex: 7
So the ifindex of the peer is 7. Let's check that this ifindex lives in the root namespace. This can be done with ip link:

ubuntu@worker-0:~$ ip link | grep '^7:'
7: veth71f7d238@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master cnio0 state UP mode DEFAULT group default
ubuntu@worker-0:~$ sudo cat /sys/class/net/veth71f7d238/ifindex
7
Using brctl, let's see who else is connected to the Linux bridge:

ubuntu@worker-0:~$ brctl show cnio0
bridge name   bridge id           STP enabled   interfaces
cnio0         8000.0a580ac80001   no            veth71f7d238
                                                veth73f35410
                                                vethf273b35f
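On hosts where brctl is not installed, the same membership can be read with iproute2:

ubuntu@worker-0:~$ ip link show master cnio0   # lists the veth interfaces enslaved to the bridge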
Now let's check the routing table inside the pod's network namespace:

ubuntu@worker-0:~$ sudo ip netns exec cni-912bcc63-712d-1c84-89a7-9e10510808a0 ip route show
default via 10.200.0.1 dev eth0
10.200.0.0/24 dev eth0 proto kernel scope link src 10.200.0.4

The pod's default route points to the bridge (default via 10.200.0.1). Now let's look at the host routing table:

ubuntu@worker-0:~$ ip route list
default via 10.240.0.1 dev eth0 proto dhcp src 10.240.0.20 metric 100
10.200.0.0/24 dev cnio0 proto kernel scope link src 10.200.0.1
10.240.0.0/24 dev eth0 proto kernel scope link src 10.240.0.20
10.240.0.1 dev eth0 proto dhcp scope link src 10.240.0.20 metric 100
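Note that the host routing table above only knows about this node's own pod subnet (10.200.0.0/24 via cnio0). Reaching pods on other workers relies on routes in the cloud network; in Kubernetes The Hard Way they are created per worker, roughly like this (a sketch: the network name and exact flags follow the tutorial's GCP setup and may differ in yours):

# Sketch: a VPC route per worker so that pod CIDRs are reachable across nodes (GCP flavor).
for i in 0 1 2; do
  gcloud compute routes create kubernetes-route-10-200-${i}-0-24 \
    --network kubernetes-the-hard-way \
    --next-hop-address 10.240.0.2${i} \
    --destination-range 10.200.${i}.0/24
done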
With kubectl create -f busybox.yaml, let's create two identical busybox containers managed by a Replication Controller:

apiVersion: v1
kind: ReplicationController
metadata:
  name: busybox0
  labels:
    app: busybox0
spec:
  replicas: 2
  selector:
    app: busybox0
  template:
    metadata:
      name: busybox0
      labels:
        app: busybox0
    spec:
      containers:
      - image: busybox
        command:
        - sleep
        - "3600"
        imagePullPolicy: IfNotPresent
        name: busybox
      restartPolicy: Always

(busybox.yaml)

$ kubectl get pods -o wide
NAME             READY  STATUS   RESTARTS  AGE  IP           NODE
busybox0-g6pww   1/1    Running  0         4s   10.200.1.15  worker-1
busybox0-rw89s   1/1    Running  0         4s   10.200.0.21  worker-0
...
A ping from the pod on worker-0 to the pod on worker-1 works:

$ kubectl exec -it busybox0-rw89s -- ping -c 2 10.200.1.15
PING 10.200.1.15 (10.200.1.15): 56 data bytes
64 bytes from 10.200.1.15: seq=0 ttl=62 time=0.528 ms
64 bytes from 10.200.1.15: seq=1 ttl=62 time=0.440 ms

--- 10.200.1.15 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.440/0.484/0.528 ms
This traffic can be observed with tcpdump or conntrack:

ubuntu@worker-0:~$ sudo conntrack -L | grep 10.200.1.15
icmp 1 29 src=10.200.0.21 dst=10.200.1.15 type=8 code=0 id=1280 src=10.200.1.15 dst=10.240.0.20 type=0 code=0 id=1280 mark=0 use=1

The pod's source address (10.200.0.21) is translated to the node's IP address (10.240.0.20). On worker-1 we see the mirror image:

ubuntu@worker-1:~$ sudo conntrack -L | grep 10.200.1.15
icmp 1 28 src=10.240.0.20 dst=10.200.1.15 type=8 code=0 id=1280 src=10.200.1.15 dst=10.240.0.20 type=0 code=0 id=1280 mark=0 use=1
This NAT is performed by the iptables entries installed for the bridge plugin; the counters in the POSTROUTING chain confirm it:

ubuntu@worker-0:~$ sudo iptables -t nat -Z POSTROUTING -L -v
Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target                        prot opt in   out  source          destination
...
    5   324 CNI-be726a77f15ea47ff32947a3  all  --  any  any  10.200.0.0/24   anywhere   /* name: "bridge" id: "631cab5de5565cc432a3beca0e2aece0cef9285482b11f3eb0b46c134e457854" */
Zeroing chain `POSTROUTING'
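To see what that chain actually does, it can be dumped by name (taken from the output above); as described earlier, it is expected to masquerade traffic whose destination lies outside 224.0.0.0/4:

# Dump the per-bridge NAT chain created by the CNI plugin (chain name from the output above).
sudo iptables -t nat -S CNI-be726a77f15ea47ff32947a3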
"ipMasq": true
from the configuration of the CNI plug-in, you can see the following (this operation is performed exclusively for educational purposes - we do not recommend changing the config on a running cluster!): $ kubectl get pods -o wide NAME READY STATUS RESTARTS AGE IP NODE busybox0-2btxn 1/1 Running 0 16s 10.200.0.15 worker-0 busybox0-dhpx8 1/1 Running 0 16s 10.200.1.13 worker-1 ...
$ kubectl exec -it busybox0-2btxn -- ping -c 2 10.200.1.13 PING 10.200.1.6 (10.200.1.6): 56 data bytes 64 bytes from 10.200.1.6: seq=0 ttl=62 time=0.515 ms 64 bytes from 10.200.1.6: seq=1 ttl=62 time=0.427 ms --- 10.200.1.6 ping statistics --- 2 packets transmitted, 2 packets received, 0% packet loss round-trip min/avg/max = 0.427/0.471/0.515 ms
Ping still works, and this time without NAT: the source seen on worker-1 is the pod address itself:

ubuntu@worker-0:~$ sudo conntrack -L | grep 10.200.1.13
icmp 1 29 src=10.200.0.15 dst=10.200.1.13 type=8 code=0 id=1792 src=10.200.1.13 dst=10.200.0.15 type=0 code=0 id=1792 mark=0 use=1

ubuntu@worker-1:~$ sudo conntrack -L | grep 10.200.1.13
icmp 1 27 src=10.200.0.15 dst=10.200.1.13 type=8 code=0 id=1792 src=10.200.1.13 dst=10.200.0.15 type=0 code=0 id=1792 mark=0 use=1
You probably noticed, while re-creating the busybox containers, that the IP addresses allocated to the busybox pods were different each time. What if we wanted to make these containers reachable from other pods? We could take the current pod IP addresses, but they will change. For this reason, we need to configure a Service resource, which will proxy requests to a set of short-lived pods.

“A Service in Kubernetes is an abstraction which defines a logical set of Pods and a policy by which to access them.” (from the Kubernetes Services documentation)
The default service type is ClusterIP, which sets up an IP address from the cluster's CIDR block (i.e., reachable only from within the cluster). One such example is the DNS Cluster Add-on configured in Kubernetes The Hard Way.

# ...
apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/name: "KubeDNS"
spec:
  selector:
    k8s-app: kube-dns
  clusterIP: 10.32.0.10
  ports:
  - name: dns
    port: 53
    protocol: UDP
  - name: dns-tcp
    port: 53
    protocol: TCP
# ...

(kube-dns.yaml)
kubectl shows that the Service keeps track of the endpoints and translates to them:

$ kubectl -n kube-system describe services
...
Selector:     k8s-app=kube-dns
Type:         ClusterIP
IP:           10.32.0.10
Port:         dns  53/UDP
TargetPort:   53/UDP
Endpoints:    10.200.0.27:53
Port:         dns-tcp  53/TCP
TargetPort:   53/TCP
Endpoints:    10.200.0.27:53
...
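The ClusterIP translation is easy to observe from inside the cluster, for example by querying the DNS service address from one of the busybox pods created earlier (a sketch; the pod name will differ in your cluster, and busybox's nslookup is used here):

$ kubectl exec -it busybox0-rw89s -- nslookup kubernetes.default.svc.cluster.local 10.32.0.10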
This translation is implemented with iptables rules (maintained by kube-proxy on each node). Let's go through the rules created for this example; their full list can be seen with the iptables-save command. As soon as packets are generated by a process (OUTPUT) or arrive at a network interface (PREROUTING), they pass through the following iptables chains:

-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
-A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
The following targets match TCP packets sent to port 53 of 10.32.0.10 and rewrite the destination to the endpoint 10.200.0.27, port 53:

-A KUBE-SERVICES -d 10.32.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-SVC-ERIFXISQEP7F7OF4
-A KUBE-SVC-ERIFXISQEP7F7OF4 -m comment --comment "kube-system/kube-dns:dns-tcp" -j KUBE-SEP-32LPCMGYG6ODGN3H
-A KUBE-SEP-32LPCMGYG6ODGN3H -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp" -m tcp -j DNAT --to-destination 10.200.0.27:53
The same happens for UDP packets:

-A KUBE-SERVICES -d 10.32.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-SVC-TCOU7JCQXEZGVUNU
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m comment --comment "kube-system/kube-dns:dns" -j KUBE-SEP-LRUTK6XRXU43VLIG
-A KUBE-SEP-LRUTK6XRXU43VLIG -p udp -m comment --comment "kube-system/kube-dns:dns" -m udp -j DNAT --to-destination 10.200.0.27:53
There are other types of Services in Kubernetes. In particular, Kubernetes The Hard Way talks about NodePort - see Smoke Test: Services.

kubectl expose deployment nginx --port 80 --type NodePort

NodePort publishes a service on the IP address of each node, at a static port (called the NodePort). Such a service can be reached from outside the cluster. You can check the assigned port (in this case 31088) with kubectl:

$ kubectl describe services nginx
...
Type:        NodePort
IP:          10.32.0.53
Port:        <unset>  80/TCP
TargetPort:  80/TCP
NodePort:    <unset>  31088/TCP
Endpoints:   10.200.1.18:80
...
The pod is now reachable from the Internet at http://${EXTERNAL_IP}:31088/. Here EXTERNAL_IP is the public IP address of any worker instance; in this example I used the public IP address of worker-0. The request is received by the node with IP address 10.240.0.20 (the cloud provider handles the public NAT), although the service actually runs on another node (worker-1, as can be seen from the endpoint's IP address, 10.200.1.18):

ubuntu@worker-0:~$ sudo conntrack -L | grep 31088
tcp 6 86397 ESTABLISHED src=173.38.XXX.XXX dst=10.240.0.20 sport=30303 dport=31088 src=10.200.1.18 dst=10.240.0.20 sport=80 dport=30303 [ASSURED] mark=0 use=1

The packet is forwarded from worker-0 to worker-1, where it reaches nginx:

ubuntu@worker-1:~$ sudo conntrack -L | grep 80
tcp 6 86392 ESTABLISHED src=10.240.0.20 dst=10.200.1.18 sport=14802 dport=80 src=10.200.1.18 dst=10.240.0.20 sport=80 dport=14802 [ASSURED] mark=0 use=1
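For reference, the connection captured above can be reproduced with an ordinary HTTP request from outside the cluster, EXTERNAL_IP being the worker's public address as in the text:

$ curl -i http://${EXTERNAL_IP}:31088/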
The iptables rules that do this job are:

-A KUBE-NODEPORTS -p tcp -m comment --comment "default/nginx:" -m tcp --dport 31088 -j KUBE-SVC-4N57TFCL4MD7ZTDA
-A KUBE-SVC-4N57TFCL4MD7ZTDA -m comment --comment "default/nginx:" -j KUBE-SEP-UGTFMET44DQG7H7H
-A KUBE-SEP-UGTFMET44DQG7H7H -p tcp -m comment --comment "default/nginx:" -m tcp -j DNAT --to-destination 10.200.1.18:80
There is also another service type, LoadBalancer, which makes the service publicly available with the help of the cloud provider's load balancer, but the article has already turned out to be long enough.

Source: https://habr.com/ru/post/420813/