I have been using Google Cloud with Kubernetes Engine for two months. It did not take me a month to get everything into my head, but it took about as long again to sort out a number of troubles.
TL;DR: Google does a pretty good job, so AWS cannot relax. If you know AWS well, I would advise you to try Google Cloud. Perhaps out of muscle memory I would still be more comfortable on AWS, but having studied Google Cloud and Kubernetes I am now confident in most of my scenarios.
I am not an expert, so take my words with a bit of skepticism. Google Cloud and Kubernetes are topics I really want to talk about, but I cannot always find the right words, and I hope you still get the right idea about the proposed solutions.
The purpose of the article is to preserve some fragments and thoughts for further use. So keep in mind that this is not a walkthrough. At first I intended to write a guide, but then I realized that it was almost like writing a whole book, so not this time.
To succeed with something like Google Cloud and Kubernetes, you need enough experience. If you have never installed Linux From Scratch, if you have never tuned a server, if you do not enjoy fiddling with server-side components, do not attempt a real production deployment. Your safest bet is still Heroku.
You should be the kind of person who likes to tinker (as in my previous blogs).
I don't know everything, but I know enough. To begin with, I had to understand what I actually needed: it is important to state your needs before writing the first YAML file. Planning is crucial.
Here is what I needed: not a simple demo web application that is easy to deploy, but a solution I could run in production for the long term.
Most of the newbie trouble spots involve gcloud and kubectl. Let me remind you once again: this is not a walkthrough; I only annotate a few of the steps.
The first thing you need is a proper 12-factor app.
Whether it is Ruby on Rails, Django, Laravel or Node.js (whatever), it should be a stateless application that does not depend on writing anything to the local file system, one whose instances you can shut down and spin up independently. There should be no sessions in local memory or in local files (I prefer to avoid session affinity), and no file uploads to the local file system (if you need local storage, mount an external persistent volume); always prefer streaming binaries to a managed storage service.
You also need a proper asset pipeline that fingerprints assets for caching (whether you like it or not, Rails still has the best off-the-shelf solution in its Asset Pipeline).
Instrument the application: add New Relic RPM and Rollbar.
It is 2018: you do not want to deploy code with SQL injection (or any other injection), uncontrolled eval scattered around, or room for CSRF or XSS, etc. Go ahead, buy a license for Brakeman Pro and add it to your CI pipeline. I can wait…
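If it helps, here is a minimal sketch of such a CI step, using the open-source brakeman CLI as a stand-in for Brakeman Pro; the only option assumed is -z, which makes the scanner exit non-zero when warnings are found so the build fails:

# sketch of a CI step: install the scanner and fail the build on any warning
gem install brakeman
brakeman -z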
Since this is not a tutorial, I assume that you can register with Google Cloud and configure the project, your region and zone.
It took me some time to understand the basic structure of Google Cloud (both gcloud and kubectl are just clients talking to its APIs). Creating a cluster looks like this:

gcloud container clusters create my-web-production \
  --enable-cloud-logging \
  --enable-cloud-monitoring \
  --machine-type n1-standard-4 \
  --enable-autoupgrade \
  --enable-autoscaling --max-nodes=5 --min-nodes=2 \
  --num-nodes 2
As I already mentioned, creating the cluster gives you a default-pool with machine type n1-standard-4. Pick the CPU/RAM combination your application needs; the type I chose has 4 vCPUs and 15 GB of RAM.
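If you are not sure what is available, you can list the machine types for your zone first (us-west1-a here is just an example):

gcloud compute machine-types list --filter="zone:us-west1-a"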
By default a cluster starts with 3 nodes; I chose 2 to begin with and let it autoscale up to 5 (you can change this later, but make sure there is room for initial growth). You can keep adding node pools with differently sized nodes, say for Sidekiq workers doing heavy background processing. For that you create a separate node pool with a different machine type, for example:
gcloud container node-pools create large-pool \
  --cluster=my-web-production \
  --node-labels=pool=large \
  --machine-type=n1-highcpu-8 \
  --num-nodes 1
This second pool manages a single n1-highcpu-8 node, which has 8 vCPUs and 7.2 GB of RAM: more CPU, less memory. There is also a highmem category, with fewer vCPUs and much more memory. Again, you need to know what you want.
The important point here is --node-labels: it is how I map deployments to a particular node pool (in this case, choosing between the default pool and the large pool).
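For illustration, this is roughly how a deployment would later pin its pods to the large pool through that label (a sketch; the key and value come from the --node-labels flag above):

# inside a Deployment's pod spec: schedule only onto nodes labeled pool=large
spec:
  template:
    spec:
      nodeSelector:
        pool: large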
When you create a cluster, you must issue the following command to get your credentials:
gcloud container clusters get-credentials my-web-production
This command configures kubectl to talk to that cluster. If you have more than one cluster (my-web-production and my-web-staging), be careful to run get-credentials against the right one, otherwise you may end up running a staging deployment on the production cluster.
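Before deploying anything it is worth double-checking which cluster kubectl currently points at; these are stock kubectl commands, and the context name below is just an example of GKE's naming scheme:

kubectl config current-context          # which cluster am I about to touch?
kubectl config get-contexts             # list all configured clusters
# switch explicitly if needed
kubectl config use-context gke_my-project_us-west1-a_my-web-production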
Since this is error-prone, I changed my ZSH PROMPT so that I always see which cluster I am pointed at. I adapted it from zsh-kubectl-prompt:
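The snippet I ended up with is essentially the one from the zsh-kubectl-prompt README; the clone path is an assumption:

# ~/.zshrc: show the current kubectl context on the right side of the prompt
autoload -U colors; colors
source ~/path/to/zsh-kubectl-prompt/kubectl.zsh
RPROMPT='%{$fg[blue]%}($ZSH_KUBECTL_PROMPT)%{$reset_color%}'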
Since you will have many clusters in a large application, I highly recommend adding this PROMPT to your shell.
How do you deploy the app onto pods running on those nodes?
You must have a Dockerfile in the application's project repository to build a Docker image. Here is one example for a Ruby on Rails application:
FROM ruby:2.4.3
ENV RAILS_ENV production
ENV SECRET_KEY_BASE xpto
RUN curl -sL https://deb.nodesource.com/setup_8.x | bash -
RUN apt-get update && apt-get install -y nodejs postgresql-client cron htop vim
ADD Gemfile* /app/
WORKDIR /app
RUN gem update bundler --pre
RUN bundle install --without development test
RUN npm install
ADD . /app
RUN cp config/database.yml.prod.example config/database.yml && cp config/application.yml.example config/application.yml
RUN RAILS_GROUPS=assets bundle exec rake assets:precompile
In the Google Cloud web console you will find the "Container Registry", which is a private Docker registry. Add its remote URL to your local git configuration as follows:
git remote add gcloud https://source.developers.google.com/p/my-project/r/my-app
Now you can git push gcloud master. I recommend adding build triggers that tag the images. I add 2 triggers: one that tags the image latest and another that tags it with a random version number. You will need the latter later.
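If you prefer a config file over clicking through the console, a rough cloudbuild.yaml equivalent of those two tags could look like the sketch below; $COMMIT_SHA is a built-in Container Builder substitution, and the image names are assumptions:

steps:
- name: 'gcr.io/cloud-builders/docker'
  args: ['build',
         '-t', 'gcr.io/my-project/my-app:latest',
         '-t', 'gcr.io/my-project/my-app:$COMMIT_SHA',
         '.']
images:
- 'gcr.io/my-project/my-app:latest'
- 'gcr.io/my-project/my-app:$COMMIT_SHA'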
After adding the registry repository as a remote in your git configuration (git remote add) and pushing to it, it should start building a Docker image with the tags you configured in the triggers.
Make sure your Ruby on Rails application has nothing in its initializers that requires a database connection, because none is available at build time. Otherwise the Docker build will fail at assets:precompile when the task loads an initializer that accidentally touches a model and makes ActiveRecord::Base try to connect.
Also make sure that the Ruby version in the Dockerfile is the same as the one in the Gemfile, otherwise the build will fail as well.
Have you noticed the weird config/application.yml? It comes from figaro, which I recommend for keeping ENV variable handling simple. I do not like Rails secrets, and they are not very friendly to deployment systems now that Heroku has made ENV vars ubiquitous. Stick to ENV vars; Kubernetes will thank you for it.
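For reference, figaro's config/application.yml is just a flat YAML file of ENV vars; the keys and values below are placeholders:

# config/application.yml (figaro) - development defaults, overridden in production
REDIS_URL: "redis://localhost:6379"
SMTP_USERNAME: "dev-user"
SMTP_PASSWORD: "dev-password"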
This way you can override any environment variable from the Kubernetes deployment YAML. Time for an example; you can call it deploy/web.yml or whatever suits you. And, of course, check it into your source code repository.
kind: Deployment
apiVersion: apps/v1beta1
metadata:
  name: web
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
  minReadySeconds: 10
  replicas: 2
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - image: gcr.io/my-project/my-app:latest
        name: my-app
        imagePullPolicy: Always
        ports:
        - containerPort: 4001
        command: ["passenger", "start",
          "-p", "4001",
          "-e", "production",
          "--max-pool-size", "2",
          "--min-instances", "2",
          "--no-friendly-error-pages",
          "--max-request-queue-time", "10",
          "--max-request-time", "10",
          "--pool-idle-time", "0",
          "--memory-limit", "300"]
        env:
        - name: "RAILS_LOG_TO_STDOUT"
          value: "true"
        - name: "RAILS_ENV"
          value: "production"
        # ... obviously reduced the many ENV vars for brevity
        - name: "REDIS_URL"
          valueFrom:
            secretKeyRef:
              name: my-env
              key: REDIS_URL
        - name: "SMTP_USERNAME"
          valueFrom:
            secretKeyRef:
              name: my-env
              key: SMTP_USERNAME
        - name: "SMTP_PASSWORD"
          valueFrom:
            secretKeyRef:
              name: my-env
              key: SMTP_PASSWORD
        # ... this part below is mandatory for Cloud SQL
        - name: DB_HOST
          value: 127.0.0.1
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: cloudsql-db-credentials
              key: password
        - name: DB_USER
          valueFrom:
            secretKeyRef:
              name: cloudsql-db-credentials
              key: username
      - image: gcr.io/cloudsql-docker/gce-proxy:latest
        name: cloudsql-proxy
        command: ["/cloud_sql_proxy", "--dir=/cloudsql",
          "-instances=my-project:us-west1:my-db=tcp:5432",
          "-credential_file=/secrets/cloudsql/credentials.json"]
        volumeMounts:
        - name: cloudsql-instance-credentials
          mountPath: /secrets/cloudsql
          readOnly: true
        - name: ssl-certs
          mountPath: /etc/ssl/certs
        - name: cloudsql
          mountPath: /cloudsql
      volumes:
      - name: cloudsql-instance-credentials
        secret:
          secretName: cloudsql-instance-credentials
      - name: ssl-certs
        hostPath:
          path: /etc/ssl/certs
      - name: cloudsql
        emptyDir:
Let me walk through this example:
- kind and apiVersion are important; follow the documentation, as they may change. This object is called a Deployment. There used to be a Replication Controller (you will find it in old tutorials), but it is no longer used; the recommendation now is ReplicaSets, which Deployments manage for you.
- metadata:name names this deployment web. Note spec:template:metadata:labels, where I label each pod with app: web. You will need that label to select the pods later, in the Service section.
- spec:strategy is where we configure the rolling update.
- spec:replicas declares how many pods I want. You have to work out the machine type of the node pool yourself and then split the node's resources between your applications.
- spec:template:spec:containers:image points at the Docker image we pushed to the Container Registry earlier. Note how environment variables can come from secrets, for example:
. - name: "SMTP_USERNAME" valueFrom: secretKeyRef: name: my-env key: SMTP_USERNAME
This references a Kubernetes secret, which I called my-env. And this is how you create it:
kubectl create secret generic my-env \
  --from-literal=REDIS_URL=redis://foo.com:18821 \
  --from-literal=SMTP_USERNAME=foobar
Read the documentation: you can also load the values from files instead of declaring everything on the command line.
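For instance, kubectl can read key=value pairs from an env-style file; the file name and its contents here are just placeholders:

# .env.production contains lines such as REDIS_URL=redis://foo.com:18821
kubectl create secret generic my-env --from-env-file=.env.production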
As I said, I prefer a managed service for the database. You could run your own database container, but I do not recommend it, and the same goes for other data stores such as Redis or Mongo. If you come from AWS, keep in mind that Google Cloud SQL is the equivalent of RDS.
After creating a PostgreSQL instance, you cannot access it directly from the web application; that is what the second container in the deployment, the "CloudSQL Proxy", is for. To use it, first create a proxy user:
gcloud sql users create proxyuser host --instance=my-db --password=abcd1234
When you created the PostgreSQL instance you were also asked to download JSON credentials; save that file somewhere. I hope I do not need to remind you about strong passwords. Now create the additional secrets:
kubectl create secret generic cloudsql-instance-credentials \
  --from-file=credentials.json=/home/myself/downloads/my-db-12345.json
kubectl create secret generic cloudsql-db-credentials \
  --from-literal=username=proxyuser --from-literal=password=abcd1234
They are referenced in this part of the deployment:
- image: gcr.io/cloudsql-docker/gce-proxy:latest
  name: cloudsql-proxy
  command: ["/cloud_sql_proxy", "--dir=/cloudsql",
    "-instances=my-project:us-west1:my-db=tcp:5432",
    "-credential_file=/secrets/cloudsql/credentials.json"]
  volumeMounts:
  - name: cloudsql-instance-credentials
    mountPath: /secrets/cloudsql
    readOnly: true
  - name: ssl-certs
    mountPath: /etc/ssl/certs
  - name: cloudsql
    mountPath: /cloudsql
volumes:
- name: cloudsql-instance-credentials
  secret:
    secretName: cloudsql-instance-credentials
- name: ssl-certs
  hostPath:
    path: /etc/ssl/certs
- name: cloudsql
  emptyDir:
You must put the database instance name (my-db in our case) in the -instances argument.
By the way, gce-proxy:latest pointed to version 1.09 at the time, even though 1.11 was already out. The newer version gave me headaches with broken connections and growing request queue times, so I rolled back to 1.09 and everything worked again. Newer is not always better; in infrastructure it pays to stick with what is stable. It may have been fixed since then; you may want to read this thread on the subject.
You can also run the CloudSQL proxy as a separate deployment instead of embedding it in every pod, so that all pods connect to a single proxy; I recommend reading this article.
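A minimal sketch of that setup is below, under a few assumptions: the names are made up, the proxy binds to 0.0.0.0 so other pods can reach it, and your web pods would then set DB_HOST to the cloudsql-proxy service name instead of 127.0.0.1.

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: cloudsql-proxy
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: cloudsql-proxy
    spec:
      containers:
      - name: cloudsql-proxy
        image: gcr.io/cloudsql-docker/gce-proxy:1.09
        # bind on all interfaces so other pods can connect through the Service
        command: ["/cloud_sql_proxy", "--dir=/cloudsql",
          "-instances=my-project:us-west1:my-db=tcp:0.0.0.0:5432",
          "-credential_file=/secrets/cloudsql/credentials.json"]
        ports:
        - containerPort: 5432
        volumeMounts:
        - name: cloudsql-instance-credentials
          mountPath: /secrets/cloudsql
          readOnly: true
      volumes:
      - name: cloudsql-instance-credentials
        secret:
          secretName: cloudsql-instance-credentials
---
apiVersion: v1
kind: Service
metadata:
  name: cloudsql-proxy
spec:
  ports:
  - port: 5432
    targetPort: 5432
  selector:
    app: cloudsql-proxy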
So far nothing is exposed to the outside world, so you need to expose the pods through a NodePort Service. Let's create the file deploy/web-svc.yml.
apiVersion: v1
kind: Service
metadata:
  name: web-svc
spec:
  sessionAffinity: None
  ports:
  - port: 80
    targetPort: 4001
    protocol: TCP
  type: NodePort
  selector:
    app: web
This is why I emphasized the importance of spec:template:metadata:labels: we use it in spec:selector to select the right pods. Now we can deploy these 2 pods as follows:
kubectl create -f deploy/web.yml
kubectl create -f deploy/web-svc.yml
You can watch the pods being created with kubectl get pods --watch.
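If a pod does not come up, the usual first steps are to inspect its events and container logs; the pod name below is just a placeholder:

kubectl describe pod web-1234567890-abcde          # events, image pulls, probe failures
kubectl logs web-1234567890-abcde my-app            # logs of the application container
kubectl logs web-1234567890-abcde cloudsql-proxy    # logs of the proxy sidecar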
In many tutorials, pods are exposed through another Service of type LoadBalancer. I am not sure how well it behaves under load, whether it does SSL termination, and so on. I decided to go with an Ingress load balancer backed by the NGINX controller.
First of all, I created a separate node pool for it, for example:
gcloud container node-pools create web-load-balancer \
  --cluster=my-web-production \
  --node-labels=role=load-balancer \
  --machine-type=g1-small \
  --num-nodes 1 \
  --max-nodes 3 --min-nodes=1 \
  --enable-autoscaling
As with the large-pool example, take care to add --node-labels so the controller is installed there instead of in the default-pool. Next you need the instance name of that node, which you can find like this:
$ gcloud compute instances list
NAME                                              ZONE        MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP     STATUS
gke-my-web-production-default-pool-123-123        us-west1-a  n1-standard-4               10.128.0.1   123.123.123.12  RUNNING
gke-my-web-production-large-pool-123-123          us-west1-a  n1-highcpu-8                10.128.0.2   50.50.50.50     RUNNING
gke-my-web-production-web-load-balancer-123-123   us-west1-a  g1-small                    10.128.0.3   70.70.70.70     RUNNING
Save it:
export LB_INSTANCE_NAME=gke-my-web-production-web-load-balancer-123-123
You can manually reserve the external IP address and give it a name:
gcloud compute addresses create ip-web-production \
  --ip-version=IPV4 \
  --global
Suppose that it generated the reserved IP address "111.111.111.111". Save it:
export LB_ADDRESS_IP=$(gcloud compute addresses list | grep "ip-web-production" | awk '{print $3}')
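As an alternative to the grep/awk pipeline, gcloud can print just the address field directly; a sketch using its --format option:

export LB_ADDRESS_IP=$(gcloud compute addresses describe ip-web-production --global --format='value(address)')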
Bind the address to the load-balancer instance:
export LB_INSTANCE_NAT=$(gcloud compute instances describe $LB_INSTANCE_NAME | grep -A3 networkInterfaces: | tail -n1 | awk -F': ' '{print $2}')
gcloud compute instances delete-access-config $LB_INSTANCE_NAME \
  --access-config-name "$LB_INSTANCE_NAT"
gcloud compute instances add-access-config $LB_INSTANCE_NAME \
  --access-config-name "$LB_INSTANCE_NAT" --address $LB_ADDRESS_IP
Now let's add the rest of the Ingress configuration. It looks like a lot, but it mostly follows the same pattern. We start by defining another small web application, default-http-backend, which answers HTTP requests whenever the web containers are unavailable for any reason. Let's call it deploy/default-web.yml:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: default-http-backend
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: default-http-backend
    spec:
      terminationGracePeriodSeconds: 60
      containers:
      - name: default-http-backend
        # Any image is permissible as long as:
        # 1. It serves a 404 page at /
        # 2. It serves 200 on a /healthz endpoint
        image: gcr.io/google_containers/defaultbackend:1.0
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 30
          timeoutSeconds: 5
        ports:
        - containerPort: 8080
        resources:
          limits:
            cpu: 10m
            memory: 20Mi
          requests:
            cpu: 10m
            memory: 20Mi
No need to change anything; you are familiar with the deployment pattern by now. We need to expose it via NodePort as well, so let's add deploy/default-web-svc.yml:
kind: Service
apiVersion: v1
metadata:
  name: default-http-backend
spec:
  selector:
    app: default-http-backend
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
  type: NodePort
Again, nothing to change. The next 3 files are the important parts. First we create the NGINX load balancer; let's call it deploy/nginx.yml:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx-ingress-controller
spec:
  replicas: 1
  template:
    metadata:
      labels:
        k8s-app: nginx-ingress-lb
    spec:
      # hostNetwork makes it possible to use ipv6 and to preserve the source IP correctly regardless of docker configuration
      # however, it is not a hard dependency of the nginx-ingress-controller itself and it may cause issues if port 10254 already is taken on the host
      # that said, since hostPort is broken on CNI (https://github.com/kubernetes/kubernetes/issues/31307) we have to use hostNetwork where CNI is used
      hostNetwork: true
      terminationGracePeriodSeconds: 60
      nodeSelector:
        role: load-balancer
      containers:
      - args:
        - /nginx-ingress-controller
        - "--default-backend-service=$(POD_NAMESPACE)/default-http-backend"
        - "--default-ssl-certificate=$(POD_NAMESPACE)/cloudflare-secret"
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        image: "gcr.io/google_containers/nginx-ingress-controller:0.9.0-beta.5"
        imagePullPolicy: Always
        livenessProbe:
          httpGet:
            path: /healthz
            port: 10254
            scheme: HTTP
          initialDelaySeconds: 10
          timeoutSeconds: 5
        name: nginx-ingress-controller
        ports:
        - containerPort: 80
          name: http
          protocol: TCP
        - containerPort: 443
          name: https
          protocol: TCP
        volumeMounts:
        - mountPath: /etc/nginx-ssl/dhparam
          name: tls-dhparam-vol
      volumes:
      - name: tls-dhparam-vol
        secret:
          secretName: tls-dhparam
Notice that nodeSelector matches the node label we added when we created the new node pool.
You may want to tweak the labels and the number of replicas. Also note the volume I called tls-dhparam-vol mounted here: it holds Diffie-Hellman Ephemeral Parameters, which we generate like this:
sudo openssl dhparam -out ~/documents/dhparam.pem 2048
kubectl create secret generic tls-dhparam --from-file=/home/myself/documents/dhparam.pem
Note that I am using version 0.9.0-beta.5 of the controller image. It has worked well, with no problems so far, but keep an eye on the release notes for new versions and do your own testing.
Again, let's expose this Ingress controller through a LoadBalancer service; call it deploy/nginx-svc.yml:
apiVersion: v1
kind: Service
metadata:
  name: nginx-ingress
spec:
  type: LoadBalancer
  loadBalancerIP: 111.111.111.111
  ports:
  - name: http
    port: 80
    targetPort: http
  - name: https
    port: 443
    targetPort: https
  selector:
    k8s-app: nginx-ingress-lb
Remember the static external IP we reserved above and saved in the LB_ADDRESS_IP env var? This is what goes in the spec:loadBalancerIP field. It is also the IP address you will add as an "A record" in your DNS service (say, pointing www.my-app.com.br to it in CloudFlare).
Now we can create the Ingress configuration itself; create deploy/ingress.yml as follows:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.org/ssl-services: "web-svc"
    kubernetes.io/ingress.global-static-ip-name: ip-web-production
    ingress.kubernetes.io/ssl-redirect: "true"
    ingress.kubernetes.io/rewrite-target: /
spec:
  tls:
  - hosts:
    - www.my-app.com.br
    secretName: cloudflare-secret
  rules:
  - host: www.my-app.com.br
    http:
      paths:
      - path: /
        backend:
          serviceName: web-svc
          servicePort: 80
This connects the NodePort service we created for the web pods to the NGINX Ingress controller and adds SSL termination via spec:tls:secretName. How do you create that secret? First, you have to purchase an SSL certificate, through CloudFlare for example.
When you buy one, the provider gives you the secret files to download (keep them safe! A shared Dropbox folder is not safe!). Then add them to the cluster as follows:
kubectl create secret tls cloudflare-secret \
  --key ~/downloads/private.pem \
  --cert ~/downloads/fullchain.pem
Now that we have edited a lot of files, we can deploy the entire load balancer package:
kubectl create -f deploy/default-web.yml
kubectl create -f deploy/default-web-svc.yml
kubectl create -f deploy/nginx.yml
kubectl create -f deploy/nginx-svc.yml
kubectl create -f deploy/ingress.yml
This NGINX Ingress configuration is based on a blog post by Zihao Zhang. There are also examples in the kubernetes-incubator repository that are worth checking out.
If you did everything right, https://www.my-app.com.br should now load your web application. You can check the time to first byte (TTFB) through CloudFlare like this:
curl -vso /dev/null -w "Connect: %{time_connect} \n TTFB: %{time_starttransfer} \n Total time: %{time_total} \n" https://www.my-app.com.br
If TTFB is slow, you can hit the load balancer directly, bypassing CloudFlare, to see where the time goes:
curl --resolve www.my-app.com.br:443:111.111.111.111 https://www.my-app.com.br -svo /dev/null -k -w "Connect: %{time_connect} \n TTFB: %{time_starttransfer} \n Total time: %{time_total} \n"
TTFB should be around one second or less; anything more means there is a problem in the application. Check the node machine types, the number of application workers per pod, the CloudSQL proxy version, the NGINX controller version, and so on. It is a trial-and-error process. Services such as Loader.io or WebPageTest help you understand what is going on.
Now that everything is running, how do you perform the rolling update I mentioned at the beginning? First, git push to the Container Registry repository and wait for the Docker image to be built. Remember the trigger that tags the image with a random version number? Let's use that tag (you can find it in the Build History list in the Google Cloud Container Registry):
kubectl set image deployment web my-app=gcr.io/my-project/my-app:1238471234g123f534f543541gf5 --record
Use the same container name and image as declared in deploy/web.yml. Kubernetes then starts a rolling update, bringing up a new pod and shutting down an old one until all of them are updated, with no downtime for users.
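You can follow the rollout and, if something goes wrong, roll back to the previous revision with the standard kubectl commands:

kubectl rollout status deployment/web    # wait until the rollout finishes
kubectl rollout history deployment/web   # list recorded revisions (--record helps here)
kubectl rollout undo deployment/web      # go back to the previous revision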
Rolling updates should still be performed carefully. For example, if the deployment requires a database migration, you should schedule a maintenance window (meaning the middle of the night, when traffic is low). You can then run the migration command like this:
kubectl get pods # to get a pod name
kubectl exec -it my-web-12324-121312 /app/bin/rails db:migrate
# you can also bash into a pod like this, but remember that this is an ephemeral
# container, so files you edit and write there will disappear on the next restart:
kubectl exec -it my-web-12324-121312 bash
To redeploy everything from scratch, without a rolling update, do the following:
kubectl delete -f deploy/web.yml && kubectl apply -f deploy/web.yml
One item on my wish list was persistent mounted storage with automatic backups/snapshots. Google Cloud gives you half of that: you can create persistent disks and attach them to pods, but there is no automatic backup. There is, however, an API for taking snapshots manually.
Create a new SSD disk and format it:
gcloud compute disks create --size 500GB my-data --type pd-ssd
gcloud compute instances list
The last command lets you copy the instance name of a node; suppose it is gke-my-web-app-default-pool-123-123. We attach the my-data disk to it:
gcloud compute instances attach-disk gke-my-web-app-default-pool-123-123 --disk my-data --device-name my-data
gcloud compute ssh gke-my-web-app-default-pool-123-123
You are now SSHed into the node. Running sudo lsblk will show the 500 GB disk, probably as /dev/sdb. Careful: formatting it wipes everything on it!
sudo mkfs.ext4 -m 0 -F -E lazy_itable_init=0,lazy_journal_init=0,discard /dev/sdb
Exit the SSH session and detach the disk:
gcloud compute instances detach-disk gke-my-web-app-default-pool-123-123 --disk my-data
Now add the disk to the deployment YAML:
spec:
  containers:
  - image: ...
    name: my-app
    volumeMounts:
    - name: my-data
      mountPath: /data
      # readOnly: true
  # ...
  volumes:
  - name: my-data
    gcePersistentDisk:
      pdName: my-data
      fsType: ext4
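Before automating anything, you can take a snapshot of the disk by hand to check that everything works; a sketch, where the zone is an assumption:

gcloud compute disks snapshot my-data \
  --zone=us-west1-a \
  --snapshot-names=my-data-$(date +%Y%m%d)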
For automatic snapshots, add a Kubernetes CronJob, for example deploy/auto-snapshot.yml:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: auto-snapshot
spec:
  schedule: "0 4 * * *"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: auto-snapshot
            image: grugnog/google-cloud-auto-snapshot
            command: ["/opt/entrypoint.sh"]
            env:
            - name: "GOOGLE_CLOUD_PROJECT"
              value: "my-project"
            - name: "GOOGLE_APPLICATION_CREDENTIALS"
              value: "/credential/credential.json"
            volumeMounts:
            - mountPath: /credential
              name: editor-credential
          volumes:
          - name: editor-credential
            secret:
              secretName: editor-credential
, IAM & admin Google Cloud, JSON , , :
kubectl create secret generic editor-credential \
  --from-file=credential.json=/home/myself/download/my-project-1212121.json
Note the cron schedule: "0 4 * * *" means the job runs every day at 4 AM.
And that is roughly it. Once you are comfortable with the core Kubernetes objects, Deployment, Service, Ingress, ReplicaSet, DaemonSet and so on, the rest follows the same patterns.
Topics such as multi-region High Availability are beyond the scope of this post.
Original article: My Notes about a Production-grade Ruby on Rails Deployment on Google Cloud Kubernetes Engine
Source: https://habr.com/ru/post/348394/