Cloud Run for Anthos Tutorial

Introduction

In this article, I will demonstrate how to create and deploy a service with Cloud Run for Anthos. You can go through my previous article to understand the basics of Cloud Run for Anthos.

Pre-requisites

A Google Cloud project, preferably with Owner access.
A basic understanding of Cloud Run for Anthos.
A fair understanding of Kubernetes.

Setting up a GKE Cluster

Cloud Run for Anthos is used to run serverless workloads on a GKE cluster. When you create a GKE cluster, you can enable Cloud Run for Anthos as an add-on component. You can also enable Cloud Run for Anthos on an existing cluster.

The following command creates a GKE cluster with Cloud Run for Anthos enabled:

gcloud container clusters create cluster-1 \
--zone=asia-southeast1-a \
--addons=HttpLoadBalancing,CloudRun \
--machine-type=e2-standard-4 \
--num-nodes=4 \
--enable-stackdriver-kubernetes

This creates a cluster named cluster-1 in the asia-southeast1-a (Singapore) zone. It is recommended to set up a cluster with at least 4 nodes, each having 4 vCPUs, for Cloud Run for Anthos to operate properly. Once the cluster is created, you can set the default Cloud Run for Anthos properties for your cluster.

The following command sets the default values for cluster and cluster location:

gcloud config set run/cluster cluster-1
gcloud config set run/cluster_location asia-southeast1-a

Once you have set the cluster default values, you no longer need to specify them when running Cloud Run for Anthos commands.

Developing a Service

You will create a simple service named nodeapp that prints the message This is a simple service. You will write the service in Node.js and build it into a Docker container. The container image will be pushed to Google Container Registry (gcr.io).

Below is the Node.js code that represents our service:

'use strict';

const express = require('express');

const PORT = 9000;
const HOST = '0.0.0.0';

const app = express();
app.get('/', (req, res) => {
  res.send('This is a simple service \n\n');
});

app.listen(PORT, HOST);
console.log(`Running on http://${HOST}:${PORT}`);

The above code uses the express module to set up a small web server listening on port 9000. Save it as server.js, the file name that the Dockerfile's CMD instruction expects. Make sure the service is stateless and does not write anything to local persistent storage. You will also have to write a Dockerfile that is used to build the code into a container.
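The port is hard-coded to 9000 in the service code. As an optional variation, the sketch below reads the port from a PORT environment variable and falls back to 9000 when the variable is missing or invalid. Knative-based platforms generally pass the configured container port to the container as PORT, but treat that as an assumption to verify on your own cluster. The helper name resolvePort is hypothetical and not part of the original service.

```javascript
'use strict';

// Hypothetical helper: pick the listening port from an environment object.
// Falls back to 9000 (the port used in this article) when PORT is absent
// or is not a valid TCP port number.
function resolvePort(env, fallback = 9000) {
  const parsed = Number.parseInt(env.PORT, 10);
  const valid = Number.isInteger(parsed) && parsed >= 1 && parsed <= 65535;
  return valid ? parsed : fallback;
}

// In server.js this would replace the constant:
//   const PORT = resolvePort(process.env);
console.log(resolvePort({ PORT: '8080' })); // 8080
console.log(resolvePort({}));               // 9000
```

With this in place, the --port value passed at deploy time and the port the process listens on are less likely to drift apart.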

Below is the content of the Dockerfile:

FROM node:14

# Create app directory
WORKDIR /usr/src/app

# Install app dependencies
COPY package*.json ./

RUN npm install

# Bundle app source
COPY . .

CMD [ "node", "server.js" ]

You can build the above code using Cloud Build.

gcloud builds submit --tag=gcr.io/$GOOGLE_PROJECT/nodeapp

The above command builds the application code into a container image and pushes it to Container Registry (gcr.io).

Deploying the Service

In Cloud Run for Anthos (CRA), a service represents an application workload. The application must be backed by a container image already stored in a container registry. When you deploy a service in CRA, it creates a revision, which is an immutable artifact. A service can have more than one revision. There are several ways to deploy a service with CRA: you can use the Cloud Console, write a service resource YAML, or deploy directly from the command line. The command below deploys our service named nodeapp in the anthos namespace:

gcloud run deploy nodeapp --image gcr.io/anthos-book-322415/nodeapp --cluster cluster-1 --cluster-location asia-southeast1-a --port 9000

As you can see, deploying a service and running it as a workload is simple. All you need is the name of the service and the URL of the container image. The cluster name and location are not needed if you have already set the Cloud Run default cluster settings in your gcloud configuration. The port number is required only if the service listens on a port other than 8080. You do not have to deal with Kubernetes resources such as Deployments, Services or even Ingress; the CRA service abstracts the Kubernetes world away for you.

In this walkthrough the service is deployed to the anthos namespace, so make sure that namespace exists before deploying the service. To deploy to a different namespace, specify it with the --namespace option.

When you deploy a service, it is automatically assigned a host name under the example.com domain. That domain is only a placeholder: it is not mapped to a DNS record or a load balancer, so requests sent directly to it will fail. To invoke the service by domain name you have to configure a custom domain, which I will demonstrate in a separate use case. For now, you can use the external IP of the Istio-based ingress to invoke the service, as shown in a later section.

Viewing and Accessing the Service

You can view the deployed services using the following command:

gcloud run services list

Output:

For cluster [cluster-1] in [asia-southeast1-a]:
   SERVICE  NAMESPACE  URL                                LAST DEPLOYED BY   LAST DEPLOYED AT
✔  nodeapp  anthos     http://nodeapp.anthos.example.com  developer@xxx.com  2021-08-20T13:03:22Z

The above output displays the service name, namespace and service endpoint. The endpoint has the format http://<service>.<namespace>.example.com. As mentioned earlier, an endpoint in the example.com domain cannot be accessed directly because it is not mapped to a DNS record. Instead, you can call the service through the Istio ingress gateway. To do that, you need the external IP address of the Istio ingress service, named istio-ingress, in the gke-system namespace. This service is of type LoadBalancer and acts as the load balancer for external traffic. The following command lists the services in the gke-system namespace; note down the external IP of istio-ingress:

kubectl -n gke-system get svc

You can invoke the nodeapp service with the following command, replacing <ingress-external-ip> with the IP address you noted:

curl -H 'host:nodeapp.anthos.example.com' http://<ingress-external-ip>/

Output:

This is a simple service
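The curl call above works because the request goes to the ingress IP while the Host header carries the service's example.com name. The same idea can be sketched in Node.js using only the built-in http module. The helper names ingressRequestOptions and invokeService are hypothetical, and port 80 is assumed because the curl command uses a plain http:// URL.

```javascript
'use strict';

const http = require('http');

// Hypothetical helper: build request options that target the ingress IP
// while presenting the service's example.com host name in the Host header.
function ingressRequestOptions(ingressIp, service, namespace) {
  return {
    host: ingressIp,
    port: 80, // assumption: the ingress serves plain HTTP on port 80
    path: '/',
    method: 'GET',
    headers: { Host: `${service}.${namespace}.example.com` },
  };
}

// Sends the request and resolves with the response body as a string.
function invokeService(ingressIp, service, namespace) {
  return new Promise((resolve, reject) => {
    const req = http.request(
      ingressRequestOptions(ingressIp, service, namespace),
      (res) => {
        let body = '';
        res.setEncoding('utf8');
        res.on('data', (chunk) => { body += chunk; });
        res.on('end', () => resolve(body));
      }
    );
    req.on('error', reject);
    req.end();
  });
}

// Usage (replace the placeholder with the external IP you noted):
//   invokeService('<ingress-external-ip>', 'nodeapp', 'anthos').then(console.log);
```

Keeping the option builder separate from the network call makes the Host header logic easy to check without a running cluster.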

The service that you just invoked is ultimately backed by a Kubernetes pod. The pods stay active as long as the service revision is active. If the service is not invoked for more than 5 minutes, CRA automatically scales the pods down to zero.

Updating the Service

When you create or update a CRA service, it creates a revision, which is a versioned, immutable workload that you access.

Let’s look at the service YAML.

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: nodeapp
  namespace: anthos
spec:
  template:
    metadata:
      name: nodeapp-00003-bap
    spec:
      containerConcurrency: 0
      containers:
      - image: gcr.io/anthos-book-322415/nodeapp
        name: user-container
        ports:
        - containerPort: 9000
          protocol: TCP

The service revision name is nodeapp-00003-bap. Any update made under the template->spec section creates another revision of the service.

The command below lists the revisions for your service:

gcloud run revisions list

Output:

For cluster [cluster-1] in [asia-southeast1-a]:
   REVISION           ACTIVE  SERVICE  DEPLOYED                 DEPLOYED BY
✔  nodeapp-00003-bap          nodeapp  2021-08-20 13:03:16 UTC  developer@xxx.com

For now we have only one revision for our nodeapp service. The command below updates the service to add an environment variable, which creates a new revision:

gcloud run services update nodeapp --set-env-vars=ENV=dev

If you list the service revisions now, there will be two of them:

For cluster [cluster-1] in [asia-southeast1-a]:
   REVISION           ACTIVE  SERVICE  DEPLOYED                 DEPLOYED BY
✔  nodeapp-00004-voq  yes     nodeapp  2021-08-22 08:55:55 UTC  developer@xxx.com
✔  nodeapp-00003-bap          nodeapp  2021-08-20 13:03:16 UTC  developer@xxx.com

The latest one is the active revision, which carries our new environment variable. Service revisions allow you to roll back to a previous version or any known good version. Now when you observe the pods in the anthos namespace, you should see the pod associated with the latest service revision. The pod associated with the previous revision is terminated because it has received no requests for the last 1 minute. When you invoke the service, it is automatically routed to the latest revision.

Scaling the Service

With CRA, you can explicitly set the service scaling limits using the command line options --min-instances and --max-instances. If you are working with YAML, you can use the annotations autoscaling.knative.dev/minScale and autoscaling.knative.dev/maxScale instead.

Let’s understand these scaling options:

minScale

When there are no requests to the serving application, CRA by default scales the service pods down to zero. This may seem the obvious choice as far as autoscaling is concerned, but it introduces the problem of cold starts. When your service is scaled to zero, the next request has to wait while the pods are scaled from zero to however many are needed to handle it. The time it takes to scale from zero to n is the cold start latency. Because this latency can noticeably delay responses, it is recommended to set a minimum number of service replicas up front.

The following command can be used to set the minimum service replica:

gcloud run services update nodeapp --min-instances 2

The above command makes sure that at least 2 replicas of the service pod are always running.

maxScale

CRA dynamically scales your services according to the request load. There is no upper scaling limit by default. This may not be desirable, as a service could potentially use all the compute resources available on the nodes. To control resource usage, you can set a maximum scaling limit for your service.

The following command can be used to set the maximum number of replicas:

gcloud run services update nodeapp --max-instances 7

The above command makes sure that the CRA autoscaler scales your service pods to at most 7 replicas.
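For the YAML route mentioned at the start of this section, the two annotations go on the revision template's metadata, and their values must be strings. The sketch below builds such a manifest as a plain JavaScript object and prints it as JSON, using the same bounds as the commands above (2 and 7). The helper name serviceWithScaleBounds is hypothetical, and the manifest is trimmed to the fields relevant here, so treat it as an illustration and not a complete service definition.

```javascript
'use strict';

// Hypothetical helper: build a Knative Service manifest whose revision
// template carries the minScale/maxScale autoscaling annotations.
function serviceWithScaleBounds(name, namespace, image, minScale, maxScale) {
  if (minScale > maxScale) {
    throw new RangeError('minScale must not exceed maxScale');
  }
  return {
    apiVersion: 'serving.knative.dev/v1',
    kind: 'Service',
    metadata: { name, namespace },
    spec: {
      template: {
        metadata: {
          annotations: {
            // Annotation values must be strings, not numbers.
            'autoscaling.knative.dev/minScale': String(minScale),
            'autoscaling.knative.dev/maxScale': String(maxScale),
          },
        },
        spec: {
          containers: [{ image, ports: [{ containerPort: 9000 }] }],
        },
      },
    },
  };
}

const manifest = serviceWithScaleBounds(
  'nodeapp', 'anthos', 'gcr.io/anthos-book-322415/nodeapp', 2, 7
);
console.log(JSON.stringify(manifest, null, 2));
```

The printed JSON can be saved to a file and applied with kubectl, which accepts JSON manifests as well as YAML. Because the annotations sit under the template, changing them creates a new revision, just like any other template change.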

As you can see, it is important to observe your service's autoscaling behavior and set the minimum and maximum limits for your service replicas accordingly.