In this tutorial, we will discover Kubernetes (often abbreviated as “K8s”) and the Google Kubernetes Engine (GKE). We will also discover some other/general aspects of the Google Cloud Platform (GCP) interface.

There are 3 main parts in the tutorial:

Instantiating a K8s cluster on GCP and deploying a basic web server on the cluster;
Working with a load balancer to distribute HTTP requests over several instances of a web server;
Leveraging the resource autoscaling features provided by K8s.

Through this tutorial, we raise some questions that you should try to answer
in order to get a better understanding of Google Cloud and Kubernetes.
Also, feel free to run extra tests to better observe how the system behaves.

You should also refer to the Kubernetes documentation and glossary to better understand the different concepts that will be used.

Before starting

To be able to execute the instructions presented in this tutorial, you first need to:

Activate your Google Cloud Education credits as described here
Have a machine where the required tools (Google Cloud SDK and Docker) are installed and configured

Having a configured machine

Regarding the second point, two main solutions exist:

Using the Google Cloud Shell (https://cloud.google.com/shell)
- Gives you access to an already configured machine through your web browser
- To get access to the Cloud Shell, simply click on “Activate Cloud Shell” on the top right of GCP console web page
- IMPORTANT: When you are using Cloud Shell, GCP creates a virtual machine to run this shell. Note that this virtual machine is distinct from the nodes of the GKE cluster that we will create in this lab session. Thus, regarding the network configuration, the Cloud Shell will be an external client from the point of view of your GKE cluster.
- The Cloud Shell is a free service for GCP users.
- Cloud Shell provisions 5 GB of persistent disk storage mounted as your $HOME directory on the Cloud Shell instance. All files you store in your home directory, including scripts and user configuration files like .bashrc and .vimrc, persist between sessions.
Using you own machine:
- To install and configure the Cloud SDK, please follow the instructions here (“Installing gcloud” and “Configuring gcloud” steps)
- To interact with a Kubernetes cluster, you will need the kubectl command line tool. It can be simply installed by running the following command in your terminal once gcloud is installed and configured: gcloud components install kubectl
- To install Docker (the Docker Engine), please follow the instructions available here
  - A short summary to install docker on a Ubuntu machine is available here

The description of the tutorial assumes that you use the cloud shell, but everything should also work if you use your own machine.

WARNING

!!! If you don’t delete the resources you allocate, they keep consuming credits on GCP even when you are disconnected. !!!

No matter how fast you progress with the lab, you should delete all the resources that you created before terminating.

Please see this section of the tutorial for instructions on resource deletion.

First Experiments with Kubernetes

In this first part, we are going to discover K8s and GKE through the deployment of a simple web server.

This part is strongly inspired from the following GCP tutorial: Deploying a containerized web application

Before starting, it is necessary to enable the compute and the container API in GCP. You can do that using the following command:

gcloud services enable compute.googleapis.com
gcloud services enable container.googleapis.com

Executing these commands might take some time. You can then use the following command to check that the API has been enabled:

gcloud services list

Description of the web server

The web server we are going to play with is a simple application that answers all HTTP requests with a “Hello World” message that includes the hostname of “machine” on which the server executes:

Hello, world!
Version: 1.0.0
Hostname: 9eb6da3d8c54

(The source code of this application is available here)

Testing the web server locally

Before deploying the web server on a K8s cluster in the cloud, we can run it locally to check that it works.

The Docker image of the web server is already published with the following identifier:

us-docker.pkg.dev/google-samples/containers/gke/hello-app:1.0

To run the container locally, use the following command:

docker run --rm -p 8080:8080 us-docker.pkg.dev/google-samples/containers/gke/hello-app:1.0

The command shows that the container listens on (TCP) port 8080. To make a request on port 8080, you can:

Use the command line: curl localhost:8080. (Notice that you can open multiple tabs in the Cloud shell GUI, similarly to the graphical interface of a local/desktop terminal emulator.)
Use your web browser if you use your own machine
Use the web preview button if you use the cloud shell

Then, run the following command (while the container is still running):

docker container ls

Based on the output of the above command, what exactly is the (default) hostname used for
the "machine" that hosts the web server?

Note: You can try to replace the default hostname with another one, by adding the following option in the above docker run command before the image name: --hostname [CHOSEN_HOSTNAME]

Finally, you can stop and destroy the container using:

docker rm -f [CONTAINER_ID]

Creating a K8s cluster

Configure the default zone

Before deploying a K8s cluster, you need to select the geographical zone in which you would like to work. The list of all zones is available here: https://cloud.google.com/compute/docs/regions-zones#available

To obtain an up-to-date list of available zones, run:

gcloud compute zones list

To check the default zone currently set for your project, run:

gcloud config get compute/zone

In the following, we will work in the europe-west6-a zone.

To configure the default zone for your project, run:

gcloud config set compute/zone europe-west6-a

Creating a GKE cluster

To create a GKE K8s cluster named hello-cluster using the default options, run:

gcloud container clusters create hello-cluster

It will take some time.

In the meantime, open the Kubernetes Engine page in the cloud console (can be found through the search tool). You will be able to observe that your new cluster appears and is in the process of being created.

Observing the newly created cluster

Once the cluster is created, start by spending some time on the Kubernetes Engine pages to observe the information provided about your cluster. For instance, you should be able to see:

The list of nodes composing your cluster
The monitoring of the resource consumption on each node

- Are there pods already running on the nodes of your cluster?
- If so, what do these pods correspond to?
- By the way, what are "pods"?

You can also observe the state of our cluster through the command line. To do so, we first need to ensure that we are connected to our GKE cluster:

gcloud container clusters get-credentials hello-cluster

To get the list of nodes running belonging to your cluster, run:

kubectl get nodes

To get details about a specific node:

kubectl describe nodes [NODE_ID]

To get all the running pods:

kubectl get pods --all-namespaces

What do we observe about the Namespace of existing pods?

Deploying our web server

As an introduction to this step, we quote the GCP tutorial:

Kubernetes represents applications as Pods, which are scalable units holding one or more containers. The Pod is the smallest deployable unit in Kubernetes. Usually, you deploy Pods as a set of replicas that can be scaled and distributed together across your cluster. One way to deploy a set of replicas is through a Kubernetes Deployment.

Basic Deployment

Creating a Deployment for our web server using the docker image is done as follows:

kubectl create deployment hello-app --image=us-docker.pkg.dev/google-samples/containers/gke/hello-app:1.0

At this point, a new pod should appear in the default namespace:

kubectl get pods

You can get a description of this pod:

kubectl describe pods [POD_ID]

You can also observe the new deployment in the Workloads tab of the Kubernetes Engine page in the cloud console.

- What is the IP address of the pod?
- In your opinion, from which machine(s) is the web server accessible?

Testing the access to the web server

To get a more accurate answer to the previous question, we are going to run some tests.

Try to access the web server from your local machine using its IP address:

curl [IP_ADDRESS]:8080

We will now connect to the node where the pod is deployed using SSH with the following command:

gcloud compute ssh [NODE_ID]

From this node, we can again try to access the web server using its IP address.

To get a better understanding, you can even try to connect to a different node and run the test again.

Finally, we will create another pod with an interactive session from which we will be able to run a curl command:

kubectl run curl-test --image=radial/busyboxplus:curl -i --tty --rm

What do you conclude?

Note that the --rm implies that the Pod will be deleted automatically when you disconnect. Still, if you need to delete a pod manually, here is the command to run:

kubectl delete pod [POD_ID]

Observing the activity of Pods

We can access the logs generated by a pod using the following command:

kubectl logs [POD_ID]

We can even stream the logs using the following command:

kubectl logs -f [POD_ID]

Open a new tab in your shell to send a request to the server using one of the previous methods and observe what happens in the logs.

Observing the deployment

Until now, we have focused on the pod that has been created to instantiate the web server. As we have mentioned earlier, this is actually a Deployment: a higher-level concept in Kubernetes, which can, among other things manage a set of replicas for a component. You can read more about Deployments here.

We can get information about our Deployment:

kubectl get deployments

To get a more detailed description run:

kubectl describe deployments [DEPLOYMENT_NAME]

Replication

Until now we have a single instance of our web server. To have a better fault tolerance, we may want to create several instances of our web server, as follows:

kubectl scale deployment hello-app --replicas=3

You can then observe the impact of this command on your Deployment and on your pods.

kubectl get pods

Log into one VM of the k8s cluster and try sending requests to at least two different replicas of the web server.

To get the IP address of each pod easily, you can run the command:

kubectl get pods -o yaml | grep podIP:

Beyond the fact that the web server is still not accessible from the outside of the cluster, 
what is the major limitation of this configuration in your opinion?

To manage multiple replicas, K8s has created a ReplicaSet. You can read more about Replicasets here.

To get information about the created ReplicaSet, you can run the following command:

kubectl get rs
kubectl describe rs

Using the following command, we can delete one pod from the ReplicaSet:

kubectl delete pod [PODID]

Observe what happened and explain.

Service

One major limitation of our Deployment is the fact that we can only send requests to each replica using its IP address. Thus, accessing the web server is neither transparent nor flexible.

To solve this problem, we are going to create a Service that will make the set of replicas composing our Deployment appear as a single service that can be accessed using a name.

To create a service for the 3 replicas of our web server, run:

kubectl expose deployment/hello-app --port 7000 --target-port 8080

The --port option defines the port on which the service will be accessible. The --target-port option defines the port of the pods to which the requests will be forwarded.

To see the newly created service, run:

kubectl get svc

To get detailed information about the created Service, run:

kubectl describe svc [SERVICE_NAME]

In the description of the Service:
- What does the IP field correspond to?
- What do the Endpoints correspond to?

To see how we can interact with the created service, finally, we will recreate a pod with an interactive session from which we will be able to run a curl command:

kubectl run curl-test --image=radial/busyboxplus:curl -i --tty --rm

- Try to send a request to the web servers using Endpoints. Explain.
- Try to send a request to the web servers using the service IP.
  - Which port should you use?
  - By which replica of the service is a request handled?
- Try to send a request to the web servers using the service name. Explain.

To send a request using the service name, you can use the following command:

curl [SERVICE_NAME]:[PORT_NUMBER]

Exposing a service on the Internet

The limit of our service for now is that it is not accessible outside of the K8s cluster. This is because the type of created service is ClusterIP. A basic description of the possible ServicesTypes is available here.

There are 3 solutions to make the service accessible from the outside:

Using a NodePort service type. This will make the service accessible from the outside through a port opened on each node of our K8s cluster. This will work if:
- At least one node of your K8s cluster has a public IP address
- A firewall rule is defined to allow TCP traffic to the opened port
Using a LoadBalancer service type. This will ask the cloud provider to create an external load balancer for your service. This external load balancer has a public IP address and will forward the traffic to the instances of your service using its own load-balancing rules.
Using an Ingress that can expose HTTP routes from outside the cluster to services

In the following, we are going to use an external load balancer to make our service accessible to the outside.

First, we start by deleting the service we created previously:

kubectl delete svc/hello-app

Then, we will recreate the service with the right options:

kubectl expose deployment/hello-app --port 7000 --target-port 8080 --type LoadBalancer

After running the command, you should be able to observe the created LoadBalancer service in the Services & Ingress tab of the Kubernetes Engine page in the cloud console.

We can observe the state of the created service:

kubectl get svc

And observe after a bit of time that it got assigned an external IP address.

From this point on, you should be able to access your service from another machine in GCP associated to the same project (for example Cloud Shell).

However, to make the external IP address and port reachale from any machine connected to the Internet, it is also necessary to perform an additional step to modify the GCP firewall rules.

To allow incoming traffic on machines in GCP on a specific [HOST_PORT], the following command should be used.

gcloud compute --project=[PROJECT_ID] firewall-rules create default-allow-[HOST_PORT] \
    --direction=INGRESS --priority=1000 --network=default --action=ALLOW \
    --rules=tcp:[HOST_PORT] --source-ranges=0.0.0.0/0

From this point on, you should be able to access your service from any machine connected to the Internet (unless there is a firewall perfoming filtering on the client-side of the network - note that wifi-campus, eduroam, and even probably the ensimag network, fall in this category so accessing the service will not work if your client machine uses one of these networks).

To delete the firewall rule (in order to forbid again the external incoming traffic), use the following command:

gcloud compute --project=[PROJECT_ID] firewall-rules delete default-allow-[HOST_PORT]

To go further

If you are interested in trying to make your service accessible through a NodePort, please try to follow this tutorial.

Resource limits and Autoscaling

The last point that we want to study is the autoscaling capabilities offered by K8s.

In our current configuration, 3 replicas of our web service are always running no matter the load. This is a waste of resources if there are almost no request to be processed.

We can observe the resource utilization of the pods by running:

kubectl top pod

You should observe that the amount of CPU consumed in the current configuration is 0 (or almost 0).

To be able to apply autoscaling to a deployment, we first need to define the limit in the amount of resources (in the following, we will focus on the CPU) that each pod can use.

To do so, we need to delete our deployment and create a new one. To delete the deployment, run:

kubectl delete deployment hello-app

Instead of running manually the kubectl commands to configure our deployments as we did until now, we will use a yaml config file that includes all the configurations information about our deployment.

To get this config file, run the following command:

git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples

This git repository includes several examples of applications that are used for GCP K8s tutorials. Among them, you can observe that our hello-app is present in the directory kubernetes-engine-samples/hello-app/

In the hello-app directory, there is a manifests/helloweb-deployment.yaml that defines a basic configuration for our web server deployment.

You can open this file using a command-line text editor, or using the text editor associated with the cloud shell (click on the Open Editor button). Some information about how to read this file is provided here.

Take some time to observe this file and answer the following questions:

- What is the name of the Deployment that will be created using this file?
- On which port are the created pods going to listen?

Resource request and limits

The following page provides information about resource requests and limits in k8s.

In the `yaml` file, we observe that a request `cpu: 200m` is defined. What does it imply?

Modify the provided yaml file as follows, to set the CPU request and limit to 1:

    spec:
      containers:
      - name: hello-app
        image: us-docker.pkg.dev/google-samples/containers/gke/hello-app:1.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: 1
          limits:
            cpu: 1

To run the Deployment described in the yaml file, run the following command:

kubectl apply -f kubernetes-engine-samples/hello-app/manifests/helloweb-deployment.yaml

Observe the created Deployment and Pod:

What do you observe regarding the Pod?

To allow the pod to be created, we need to set a lower CPU request. Modify the yaml file to set the CPU request and limit to 20m.

Delete the current Deployment and start a new one:

kubectl delete deployment helloweb
kubectl apply -f kubernetes-engine-samples/hello-app/manifests/helloweb-deployment.yaml

To be able to access the new deployment, we have to delete the old service and create a new one:

kubectl delete service hello-app
kubectl expose deployment/helloweb --port 7000 --target-port 8080 --type LoadBalancer

Injecting load

In the following, to be able to stress our service, we will use an additional pod that will infinitely send requests to the service.

To launch the load-generator pod, run:

kubectl run -i --tty load-generator --rm --image=radial/busyboxplus:curl --restart=Never -- /bin/sh -c "while true; do curl helloweb:7000 >/dev/null 2>&1; done"

After a few 10s of seconds, you should be able to observe that our single pod in our deployment uses up to its CPU resource limit:

kubectl top pod

It means that our service is overloaded and does not manage to process all the requests that are sent to it.

Stop the load-generator pod for now using Ctrl+C.

Autoscaling

To avoid creating a static large number of replicas to deal with load spikes (that would waste resources as we have seen before when the load is low), K8s can adapt the number of replicas to the load by setting an HorizontalPodAutoscaler

To create an HorizontalPodAutoscaler (HPA), run:

kubectl autoscale deployment helloweb --cpu-percent=60 --min=2 --max=5

What does the parameter --cpu-percent=60 correspond to?
What is the number of Pods after running this command and why?

Note that to monitor the state of the HPA, you can use the following command:

kubectl get hpa --watch

Finally, to observe how the HPA behaves, re-launch the load-generator pod.

Observe and explains what happens.

Note that it may take a bit of time before the HPA takes actions.

About the Kubernetes cluster (optional)

This section raises some questions that are related to the way Kubernetes works. Trying to answer these questions can be a good way to better understand the internals of Kubernetes.

This overview of the Kubernetes components may help you to address the questions below.

When listing all pods in a cluster (using the command kubectl get pods --all-namespaces), in the column READY, you should observe a number such as 1/1, 2/2 or 4/4 for each pod.

What does this number correspond to? (Exploring through the web 
interface the state of the cluster and the pods can help you 
answering this question)

After running the command kubectl get nodes to get the list of nodes composing your cluster, you can get detailed information about the status of a node by running:

kubectl describe node [NODE_ID]

Among the information we get through this command, we learn about the Capacity of the node and about the Allocatable resources.

Explain why the allocatable resources is less than the capacity.

An explanation might be found here

When listing all pods in a cluster, we observe that system pods are running. Among these pods, some correspond to kube-proxy.

What is the purpose of the kube-proxy pods?

To go further, you can try answering this question for the other services running in the cluster.

When observing the kube-proxy service, we can see that it has as many instances as the number of nodes in the cluster. For other services, we observe that it is not necessary the case. kube-dns is an example of such services.

Why the kube-dns service does not have one instance per node?

For more information regarding the role and configuration of Kubernetes DNS, see this page.

Finally, regarding the control plane components (described here) can you tell where they are deployed? The following commands and link may help you:

kubectl-cluster-info
kubectl-get-componentstatus
GKE documentation on “control plane security”

Cleaning

You are reaching the end of this lab. Several resources needs to be deleted before disconnecting from GCP.

To delete the HPA, run:

kubectl delete hpa helloweb

To delete the service, run:

kubectl delete service helloweb

To delete the deployment, run:

kubectl delete deployment helloweb

Finally, to delete the GKE cluster, run:

gcloud container clusters delete hello-cluster

Going further (optional)

If you have some time and are interested in digging deeper, you are encouraged to look at one or several of the following resources:

StatefulSets basics: a tutorial illustrating the concepts of StatefulSets and PersistentVolumes in Kubernetes. In brief, these concepts are aimed at dealing with the deployment and scaling of pods and the management of storage resources used in stateful applications.
- StatefulSets provide guarantees about the ordering and uniqueness of pods. In particular, each pod has a persistent identifier (i.e., which remains the same across pod rescheduling). An individual pod in a StatefulSet may fail but can be replaced with another one that will be associated with the same identifier.
- PersistentVolumes are storage resources in a cluster whose life cycle is independent of any individual pod that uses these volumes. Hence, the data stored on these volumes still exists after the shutdown of an application. StatefulSets are useful to define the association between pods and PersistentVolumes.
The documentation about the main Kubernetes concepts and particularly the following sections:
The “learn Kubernetes basics” tutorial.
The Kubernetes configuration best practices
The Kubernetes training workshops of Jérôme Petazzoni [web page] [github]. In particular:
- Getting Started With Kubernetes and Container Orchestration (2019) [slides] [video]
- Kubernetes Fundamentals (2021) [slides]
- Advanced Kubernetes (2021) [slides]

Discovering GCP and Kubernetes