Spring Cloud Data Flow workshop on Kubernetes.
You can follow these instructions to repeat this guide on your laptop or on an EC2 instance (AMI: ami-03e97315b2269f290; region: us-west-2). Ignore this note if you're following the workshop inside the Strigo platform.
We will be using Minikube as the single-node Kubernetes cluster, with --vm-driver=docker as the driver. At least 4 CPUs and 10GB of RAM are required to run the lab exercises.
minikube start --vm-driver=docker --kubernetes-version v1.17.0 --memory=10240 --cpus=4
If you are repeating this in EC2, you will have to launch the ami-03e97315b2269f290 (region: us-west-2) AMI with the t2.xlarge instance type.
Kubernetes v1.17.x is used in the lab, but you can pick a version that is compatible with SCDF.
To walk through SCDF's streaming, batch, analytics, and observability features, the required components will be provisioned in the single-node Kubernetes cluster.
The lab relies on Spring Cloud Data Flow's Bitnami chart. Students will have to run the scripts/deploy-scdf.sh script that is available at sabbyanandan/SpringOne2020.
The ami-03e97315b2269f290 (region: us-west-2) AMI is prepared and ready to launch, with all of the required tools preinstalled (Docker, Minikube, Helm, kubectl, k9s, Java, and Maven).
To get started with the labs, you will first have to prepare the environment.
When you log in to Strigo with your access token, click the "My Lab" button in the left nav to prepare the lab VM.
After you log in to the VM, test the environment by running the scripts/test-env.sh script, as shown in the example below.
[ec2-user@ip ]$ cd SpringOne2020
[ec2-user@ip ]$ git pull
[ec2-user@ip SpringOne2020]$ sh scripts/test-env.sh
docker is ready
minikube is ready
helm is ready
kubectl is ready
k9s is ready
java is ready
mvn is ready
The first lab starts with setting up a single-node Kubernetes cluster using Minikube. Let's review the steps involved.
Start a single-node Minikube cluster with the minikube start --vm-driver=docker --kubernetes-version v1.17.0 --memory=10240 --cpus=4 command.
[ec2-user@ip-100 ~]$ minikube start --vm-driver=docker --kubernetes-version v1.17.0 --memory=10240 --cpus=4
* minikube v1.12.2 on Amazon 2 (xen/amd64)
* Using the docker driver based on user configuration
* Starting control plane node minikube in cluster minikube
* minikube 1.12.3 is available! Download it: https://github.com/kubernetes/minikube/releases/tag/v1.12.3
* To disable this notice, run: 'minikube config set WantUpdateNotification false'
* Creating docker container (CPUs=4, Memory=10240MB) ...
* Preparing Kubernetes v1.17.0 on Docker 19.03.8 ...
* Verifying Kubernetes components...
* Enabled addons: default-storageclass, storage-provisioner
* Done! kubectl is now configured to use "minikube"
We are using the Docker driver to install Kubernetes into the existing Docker daemon in the VM. This also simplifies the setup because we don't need nested virtualization to be enabled.
Verify that Minikube started successfully: minikube status.
[ec2-user@ip-100 ~]$ minikube status
minikube
type: Control Plane
host: Running
kubelet: Running
apiserver: Running
kubeconfig: Configured
Confirm that the Minikube container is running.
[ec2-user@ip-100 ~]$ docker ps
CONTAINER ID   IMAGE                                 COMMAND                  CREATED         STATUS         PORTS                                                                                                      NAMES
539c113531ae   gcr.io/k8s-minikube/kicbase:v0.0.11   "/usr/local/bin/entr…"   9 minutes ago   Up 9 minutes   127.0.0.1:32771->22/tcp, 127.0.0.1:32770->2376/tcp, 127.0.0.1:32769->5000/tcp, 127.0.0.1:32768->8443/tcp   minikube
This command takes roughly 2-3 minutes to complete. If you notice startup errors, the Docker daemon is likely still starting in the VM; re-run the same start command and the cluster will eventually start.
It is now time to deploy SCDF! To get started, you will run the scripts/deploy-scdf.sh script.
[ec2-user@ip-172-31-19-111 ~]$ sh SpringOne2020/scripts/deploy-scdf.sh
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "bitnami" chart repository
Update Complete. ⎈ Happy Helming!⎈
Error from server (NotFound): namespaces "monitoring" not found
A namespace called monitoring for prometheus should exist, creating it
A namespace called monitoring exists in the cluster
Error: release: not found
Install bitnami/prometheus-operator prometheus_release_name=prom prometheus_namespace=monitoring
....
....
....
This command will take ~5-6 minutes to finish. Behind the scenes, the script uses Bitnami's Prometheus operator and Grafana charts to provision the monitoring stack. With that foundation up and running, the script runs SCDF's Bitnami chart to deploy the Data Flow stack.
While all of this is starting, open a terminal session in a new tab and run the k9s command to verify the current deployment status.
[ec2-user@ip-172-31-19-111 ~]$ k9s
When the script completes successfully, you will see the following output.
[ec2-user@ip-172-31-19-111 ~]$ sh SpringOne2020/scripts/deploy-scdf.sh
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "bitnami" chart repository
...
...
...
### Stack succesfully deployed ###
Connect to Data Flow
$ helm status scdf
Grafana password
$ kubectl -n monitoring get secret graf-grafana-admin -o jsonpath={.data.GF_SECURITY_ADMIN_PASSWORD} | base64 --decode
Forward grafana
$ kubectl port-forward -n monitoring svc/graf-grafana 3000:3000
Likewise, you can verify which version of the Spring Cloud Data Flow Bitnami Helm chart is currently in use.
[ec2-user@ip-100 SpringOne2020]$ helm list
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
scdf default 1 2020-08-20 21:19:45.728756477 +0000 UTC deployed spring-cloud-dataflow-0.6.1 2.6.0
Before we start the next lab, though, let's verify that SCDF is up and running. We will apply port-forwarding rules for the following pods so that we can test the newly deployed components.
1. Press shift+f on the scdf-spring-cloud-dataflow-server-*** pod to open the port-forward window in k9s; change the "Address" to 0.0.0.0.
2. Press shift+f on the graf-grafana-b6bf96c9c-*** pod to open the port-forward window in k9s; change the "Address" to 0.0.0.0.
3. Press shift+f on the prometheus-prom-prometheus-operator-prometheus-** pod to open the port-forward window in k9s; change the "Address" to 0.0.0.0.
Now that the stack is ready and the port-forwarding rules are active, we can access SCDF's dashboard by clicking the "SCDF Dashboard" tab in the Strigo platform. If the page doesn't load, hit the "refresh page" button inside the tab.
Alternatively, you can access SCDF through its shell. To do that, switch to the terminal tab and run the following.
[ec2-user@ip-172-31-19-111 ~]$ cd scdf-shell
[ec2-user@ip-172-31-19-111 scdf-shell]$ java -jar spring-cloud-dataflow-shell-2.6.0.jar --dataflow.uri=http://localhost:8080
This command might take a few seconds to start. Hang tight. You will see the dataflow:> prompt.
Congrats! You have completed lab #1.
This lab will review how to build and deploy event-streaming applications using Spring Cloud Data Flow. Let's begin by reviewing the applications we will use in this lab.
To open VS Code, from the terminal window, you will have to start the code-server process inside the Strigo lab VM.
[ec2-user@ip-172-31-19-111 ]$ cd code
[ec2-user@ip-172-31-19-111 code]$ bin/code-server &
[1] 103335
[ec2-user@ip-172-31-19-111 code]$ info Using config file ~/.config/code-server/config.yaml
info Using user-data-dir ~/.local/share/code-server
info code-server 3.4.1 48f7c2724827e526eeaa6c2c151c520f48a61259
info HTTP server listening on http://0.0.0.0:1111
info - No authentication
info - Not serving HTTPS
The code-server process takes ~5-10 seconds to start. You can verify whether port 1111 is listening with the sudo lsof -i -P -n | grep 1111 command in the terminal.
When the code-server process is running, click the "IDE" tab in Strigo to open VS Code. Go to the /home/ec2-user/SpringOne2020 folder to review the applications we will be using for the labs.
If the window doesn't load, please click the "refresh page" button inside the "IDE" tab.
Now that we have had a look at the applications, let's build them.
[ec2-user@ip-172-31-19-111 ~]$ cd SpringOne2020
[ec2-user@ip-172-31-19-111 SpringOne2020]$ git pull
Already up to date.
[ec2-user@ip-172-31-19-111 SpringOne2020]$ ls -ltr
total 28
-rw-rw-r-- 1 ec2-user ec2-user 6608 Aug 12 20:17 mvnw.cmd
-rwxrwxr-x 1 ec2-user ec2-user 10070 Aug 12 20:17 mvnw
drwxrwxr-x 2 ec2-user ec2-user 47 Aug 14 20:16 scripts
-rw-rw-r-- 1 ec2-user ec2-user 3520 Aug 17 16:18 README.md
-rw-rw-r-- 1 ec2-user ec2-user 2609 Aug 17 19:02 pom.xml
drwxrwxr-x 5 ec2-user ec2-user 104 Aug 17 19:06 trucks
drwxrwxr-x 5 ec2-user ec2-user 104 Aug 17 19:06 brake-temperature
drwxrwxr-x 5 ec2-user ec2-user 104 Aug 17 19:06 brake-logs
drwxrwxr-x 5 ec2-user ec2-user 104 Aug 17 19:07 thumbinator
The build creates container images for these applications using the jib Maven plugin.
To configure your local environment to reuse the Docker daemon running inside the Minikube instance, run eval $(minikube docker-env) before building the code and generating the Docker images.
[ec2-user@ip-172-31-19-111 SpringOne2020]$ eval $(minikube docker-env)
Run the Maven build and generate the Docker images with the mvn clean install com.google.cloud.tools:jib-maven-plugin:dockerBuild -DskipTests command.
[ec2-user@ip-172-31-19-111 SpringOne2020]$ mvn clean install com.google.cloud.tools:jib-maven-plugin:dockerBuild -DskipTests
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO]
[INFO] labs [pom]
[INFO] trucks [jar]
[INFO] brake-temperature [jar]
[INFO] brake-logs [jar]
[INFO] thumbinator [jar]
...
...
[INFO] Container entrypoint set to [java, -cp, /app/resources:/app/classes:/app/libs/*, com.springone.trucks.TrucksApplication]
[INFO]
[INFO] Built image to Docker daemon as dev.local/trucks, dev.local/trucks:0.0.1-SNAPSHOT
[INFO] Executing tasks:
[INFO] [==============================] 100.0% complete
...
...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for labs 0.0.1-SNAPSHOT:
[INFO]
[INFO] labs ............................................... SUCCESS [ 6.388 s]
[INFO] trucks ............................................. SUCCESS [ 42.855 s]
[INFO] brake-temperature .................................. SUCCESS [ 10.615 s]
[INFO] brake-logs ......................................... SUCCESS [ 4.851 s]
[INFO] thumbinator ........................................ SUCCESS [ 5.697 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 01:12 min
[INFO] Finished at: 2020-08-19T18:36:29Z
[INFO] ------------------------------------------------------------------------
Verify the application images in the local registry under the dev.local repository. (The "50 years ago" timestamps are expected: jib pins the image creation time to the Unix epoch to keep builds reproducible.)
[ec2-user@ip-172-31-19-111 SpringOne2020]$ docker images | grep dev.local
dev.local/thumbinator 0.0.1-SNAPSHOT a2f62650b367 50 years ago 233MB
dev.local/thumbinator latest a2f62650b367 50 years ago 233MB
dev.local/brake-logs 0.0.1-SNAPSHOT e914bf6237ab 50 years ago 250MB
dev.local/brake-logs latest e914bf6237ab 50 years ago 250MB
dev.local/trucks 0.0.1-SNAPSHOT f1da7c4eb7aa 50 years ago 250MB
dev.local/trucks latest f1da7c4eb7aa 50 years ago 250MB
dev.local/brake-temperature 0.0.1-SNAPSHOT 61e84d293eb9 50 years ago 264MB
dev.local/brake-temperature latest 61e84d293eb9 50 years ago 264MB
If you haven't already prepared the environment, please review Lab-1-Prepare-Environment.
This lab also assumes that you have locally built the applications and that the container images are available in the Docker daemon running inside the single-node Minikube cluster.
Use case: Imagine there are hundreds of freight trucks on the road and you're interested in tracking the fleet's performance in real time. To narrow it down further, imagine you want to understand each truck's peak performance given the current load it is carrying. A vital factor to consider is the brake condition, since the brakes suffer significant wear and tear depending on the freight. It could even be dangerous if that goes unnoticed.
Given that background, we will deploy a streaming data pipeline in SCDF with three applications:
1. trucks: generates truck data at random intervals
2. brake-temperature: computes the moving average of a truck's brake temperature in 10-second intervals
3. brake-logs: logs the output in real time
All three applications rely on Spring Cloud Stream's Apache Kafka binder implementation. However, the real-time computation inside the brake-temperature application uses Spring Cloud Stream's Kafka Streams binder, as sketched below.
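To make that concrete, here is a minimal sketch of how a 10-second windowed average can be expressed with the Kafka Streams binder. This is an illustration under assumed types, not the workshop's actual code; in particular, the Double-valued input stream and the double[] accumulator are assumptions.
import java.time.Duration;
import java.util.function.Function;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.springframework.context.annotation.Bean;
import org.springframework.kafka.support.serializer.JsonSerde;

// Assumes the incoming records are keyed by truck ID and carry the
// brake temperature as the value.
@Bean
public Function<KStream<String, Double>, KStream<String, Double>> processBrakeTemperature() {
    return readings -> readings
            .groupByKey()
            .windowedBy(TimeWindows.of(Duration.ofSeconds(10)))
            .aggregate(
                    () -> new double[] { 0d, 0d },            // { totalValue, count }
                    (truckId, temperature, agg) -> {
                        agg[0] += temperature;
                        agg[1] += 1;
                        return agg;
                    },
                    Materialized.with(Serdes.String(), new JsonSerde<>(double[].class)))
            .toStream()
            // emit (truckId, average) pairs downstream
            .map((window, agg) -> KeyValue.pair(window.key(), agg[0] / agg[1]));
}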
The source code for trucks, brake-temperature, and brake-logs is under the SpringOne2020 directory. You can open the code in the IDE as described in the Explore-Application-Code section.
Given the lab's time constraints, we will only attempt to build and deploy the applications instead of extending or customizing their behavior. If you manage to finish the lab quickly, feel free to take a crack at any customizations and redeploy as you see fit.
Let's open SCDF and register the three new applications that are already available in the Docker registry.
The coordinates for the three applications are:
source.trucks=docker:dev.local/trucks:0.0.1-SNAPSHOT
processor.brake-temperature=docker:dev.local/brake-temperature:0.0.1-SNAPSHOT
sink.brake-log=docker:dev.local/brake-logs:0.0.1-SNAPSHOT
If you aren't able to build the applications locally, you can register the applications from Docker Hub.
source.trucks=docker:sabby/trucks:0.0.1-SNAPSHOT
processor.brake-temperature=docker:sabby/brake-temperature:0.0.1-SNAPSHOT
sink.brake-log=docker:sabby/brake-logs:0.0.1-SNAPSHOT
Navigate to the "Application(s)" section and select the "Bulk import application" option. In the right frame, copy and paste the three application coordinates to import the applications into SCDF's application registry.
Now that we have the applications registered, it is time to create and deploy a stream.
Click the "Streams" link from the left-navigation and open the "Create Stream(s)" page. Copy the following streaming DSL command into the DSL text area in the dashboard. Alternatively, you can drag +drop the apps in the canvas and interactively configure the desired properties.
truck-performance = trucks --spring.cloud.stream.function.bindings.generateTruck-out-0=output | brake-temperature --spring.cloud.stream.function.bindings.processBrakeTemperature-in-0=input --spring.cloud.stream.function.bindings.processBrakeTemperature-out-0=output | brake-log --spring.cloud.stream.function.bindings.log-in-0=input
In case you're wondering about the --spring.cloud.stream.function... in-line properties and what they mean, you can learn more about Spring Cloud Stream's function bindings and naming conventions in the reference guide. In short, Spring Cloud Stream derives binding names from the function bean name: <functionName>-out-<index> for outputs and <functionName>-in-<index> for inputs; the properties above map those bindings onto the output and input destinations that SCDF wires between the applications.
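For instance, the binding name generateTruck-out-0 implies that the trucks source exposes a Supplier bean named generateTruck. A minimal sketch of what such a Supplier could look like (the Truck type and its random() factory are hypothetical placeholders, not the workshop's actual code):
import java.util.function.Supplier;
import org.springframework.context.annotation.Bean;

// Spring Cloud Stream derives the output binding name from the bean name:
// generateTruck + "-out-" + 0  ->  generateTruck-out-0
@Bean
public Supplier<Truck> generateTruck() {
    // Truck and Truck.random() are placeholders for the real payload type
    return () -> Truck.random();
}
The in-line property --spring.cloud.stream.function.bindings.generateTruck-out-0=output then renames that functional binding to output, which is the channel name SCDF expects when it wires the pipe to the next application.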
Click "Create Stream(s)" button and deploy the truck-performance
stream from the list page.
SCDF is now in the process of programmatically creating the Kubernetes deployment manifests for the three applications and deploying them onto Kubernetes. You can switch to the "Terminal" tab and review the new pods in the k9s output.
It takes ~1-2 minutes for the applications to start and for the liveness/readiness probes to pass.
Let's review the results by tailing the logs of the truck-performance-brake-log-v1-*** pod. Alternatively, you can open the logs from SCDF's dashboard.
You will notice the computed moving average in the logs, in real time. Below is an example that includes the average brake temperature for the truck with the ID "JH4KA8170MC002642".
{
"average": 14.967257499694824,
"count": 3,
"end": 1597875620000,
"id": "JH4KA8170MC002642",
"start": 1597875610000,
"totalValue": 44.90177
}
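For reference, the brake-log sink producing these log lines can be as small as a single Consumer bean. Here's a minimal sketch (assumed, not the exact workshop code):
import java.util.function.Consumer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.context.annotation.Bean;

private static final Logger logger = LoggerFactory.getLogger("brake-logs");

// The bean name "log" yields the input binding log-in-0, which the
// stream definition maps to "input".
@Bean
public Consumer<String> log() {
    return payload -> logger.info(payload);
}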
Now that the real-time streaming data pipeline is running in Kubernetes, we will next review the steps to monitor the streaming applications using Prometheus and Grafana.
All the heavy lifting of configuring Prometheus and Grafana and preparing the applications for metrics scraping is already handled automatically by SCDF.
Click the Grafana Dashboard tab to navigate to the Grafana GUI. For the login, you need to find the admin password from the Kubernetes secrets. To retrieve and decode the password, run the following in the terminal window.
kubectl -n monitoring get secret graf-grafana-admin -o jsonpath={.data.GF_SECURITY_ADMIN_PASSWORD} | base64 --decode
Example:
[ec2-user@ip-172-31-30-179 ~]$ kubectl -n monitoring get secret graf-grafana-admin -o jsonpath={.data.GF_SECURITY_ADMIN_PASSWORD} | base64 --decode
MaHOlr9Vxv
In my case, I am using admin/MaHOlr9Vxv as the credentials to log in to the Grafana dashboard.
The user is admin, but the password is a randomly generated token that is different for every student. You will have to run the above kubectl command to retrieve it.
The SCDF-specific metrics dashboards are preloaded in the Grafana service running in your Kubernetes cluster. The first time you log in, go to the "Manage" section to find the dashboards.
Click the "Applications" dashboard to view the real-time performance of the streaming applications.
That's it! You have completed lab #2.
If you haven't already prepared the environment, please review Lab-1-Prepare-Environment.
This lab assumes that you have already built the applications and that the container images are available in the Docker daemon running inside the single-node Minikube cluster.
This lab aims to highlight how short-lived and ephemeral-style batch applications can be orchestrated, scheduled, and monitored using Spring Cloud Data Flow.
To demonstrate these features, we will build a Task application (thumbinator) that internally includes two batch jobs.
The first job includes three steps that simulate the extraction, transformation, and loading of data: an ETL job.
@Bean
public Job extractImage() {
// extract an image
}
@Bean
public Job transformImage() {
// create a thumbnail for the image
}
@Bean
public Job loadImage() {
// load the thumbnail to a different directory
}
The second job is going to query and print the result.
@Bean
public Job statusImage() {
// print the size of the original and the thumbnail images
}
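For the curious, here is a minimal sketch of how one of these jobs could be wired up with Spring Batch's builder factories. The step name and tasklet body are illustrative assumptions, not the thumbinator application's actual code.
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;

// Assumes a @Configuration class annotated with @EnableBatchProcessing.
@Autowired
private JobBuilderFactory jobBuilderFactory;

@Autowired
private StepBuilderFactory stepBuilderFactory;

@Bean
public Job extractImage() {
    // "extract-step" is an illustrative step name
    Step extract = stepBuilderFactory.get("extract-step")
            .tasklet((contribution, chunkContext) -> {
                // copy the source image into a working directory
                return RepeatStatus.FINISHED;
            })
            .build();
    return jobBuilderFactory.get("extractImage").start(extract).build();
}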
To keep things simple and make the lab easy to repeat, this workshop includes multiple jobs inside the same application. However, you could instead create a separate Task application for each job, which would allow you to evolve them independently with bug fixes and improvements and to deliver them continuously.
In your Strigo VM, you can follow the Explore-Application-Code steps to review the code of the thumbinator application.
Click the "SCDF Dashboard" tab in Strigo to launch SCDF's dashboard.
First, you will have to register the locally built application in SCDF's application registry.
The Docker coordinate for the thumbinator application is:
task.thumbinator=docker:dev.local/thumbinator:0.0.1-SNAPSHOT
If you aren't able to build the application locally, you can register it from Docker Hub.
task.thumbinator=docker:sabby/thumbinator:0.0.1-SNAPSHOT
Open "Tasks" -> "Create Task(s)"
Give a name to the task definition and create the task.
Let's launch the task. Click the "play" button from the task list page to manually launch the task with the default parameters and arguments.
Look for a newly launched task pod in the k9s terminal.
Verify the results by tailing the logs of the task pod running in Kubernetes. Switch to the k9s terminal and look for the pod with the name that you assigned to the task. You will find the results from both jobs in the log.
In SCDF, there's an out-of-the-box option to schedule tasks/batch jobs to launch at a recurring cadence. To do that, SCDF builds on the primitives of the cronjob spec in Kubernetes.
From the task list page, select the dropdown to choose "Schedule Task".
Give your schedule a name and schedule the task to launch every minute with the */1 * * * * cron expression. (The five fields are minute, hour, day of month, month, and day of week; */1 in the minute field fires once every minute.)
Switch back to the k9s terminal and verify that the scheduled pods are automatically launched once every minute.
Alternatively, you can also use the kubectl command to query the cronjob and job resources that are running in the Kubernetes cluster.
[ec2-user@ip-100 ~]$ kubectl get cronjob
NAME SCHEDULE SUSPEND ACTIVE LAST SCHEDULE AGE
sch-thumbnails */1 * * * * False 1 54s 4m27s
[ec2-user@ip-100 ~]$ kubectl get job
NAME COMPLETIONS DURATION AGE
sch-thumbnails-1597961580 1/1 39s 3m57s
sch-thumbnails-1597961640 1/1 49s 2m57s
sch-thumbnails-1597961700 1/1 40s 117s
sch-thumbnails-1597961760 0/1 57s 57s
Let's also verify the task execution details from SCDF's dashboard.
Similar to the monitoring steps discussed in the event-streaming lab, task monitoring with Prometheus and Grafana comes preloaded, and the metrics dashboard is ready to monitor the tasks running in the Kubernetes cluster.
Go to "Grafana Dashboard" -> "Tasks".
Kudos to you for completing lab-3!