
Apache Spark Kubernetes Operator

License: Apache-2.0 (LICENSE and LICENSE-binary)

Apache Spark K8s Operator


Apache Spark™ K8s Operator is a subproject of Apache Spark that aims to extend the K8s resource manager to manage Apache Spark applications via the Operator Pattern.

Install Helm Chart

Apache Spark provides a Helm Chart.

$ helm repo add spark-kubernetes-operator https://apache.github.io/spark-kubernetes-operator
$ helm repo update
$ helm install spark-kubernetes-operator spark-kubernetes-operator/spark-kubernetes-operator
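After installing the chart, a quick sanity check is that both custom resource definitions are registered. The sketch below assumes the chart installs the two CRDs that the Clean Up section later deletes; `check_spark_crds` is a hypothetical helper name, not something the project ships.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether the operator's CRDs are registered.
# The CRD names are the ones removed in the Clean Up section of this README.
check_spark_crds() {
  local crd missing=0
  for crd in sparkapplications.spark.apache.org sparkclusters.spark.apache.org; do
    if kubectl get crd "$crd" >/dev/null 2>&1; then
      echo "found: $crd"
    else
      echo "missing: $crd" >&2
      missing=1
    fi
  done
  # Non-zero exit status if any CRD is absent.
  return "$missing"
}
```

Running `check_spark_crds` right after `helm install` should print two `found:` lines; a `missing:` line usually means the chart did not install its CRDs.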

Building Spark K8s Operator

Spark K8s Operator is built using Gradle. To build, run:

$ ./gradlew build -x test

Running Tests

$ ./gradlew build

Build Docker Image

$ ./gradlew buildDockerImage

Install Helm Chart (from source)

$ ./gradlew spark-operator-api:relocateGeneratedCRD

$ helm install spark-kubernetes-operator --create-namespace -f build-tools/helm/spark-kubernetes-operator/values.yaml build-tools/helm/spark-kubernetes-operator/
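The build-from-source steps above can be chained into one function that stops at the first failing step. This is a sketch only: `install_from_source`, `GRADLEW`, and `CHART_DIR` are illustrative names introduced here, with defaults taken from the commands above. Like the first build command, it skips tests.

```shell
#!/usr/bin/env bash
# Illustrative variables (not defined by the project); defaults match this README.
GRADLEW="${GRADLEW:-./gradlew}"
CHART_DIR="${CHART_DIR:-build-tools/helm/spark-kubernetes-operator}"

# Hypothetical helper: build (without tests), build the image, copy the
# generated CRDs into the chart, then install the chart from the local path.
# The && chain stops at the first command that fails.
install_from_source() {
  "$GRADLEW" build -x test &&
    "$GRADLEW" buildDockerImage &&
    "$GRADLEW" spark-operator-api:relocateGeneratedCRD &&
    helm install spark-kubernetes-operator --create-namespace \
      -f "$CHART_DIR/values.yaml" "$CHART_DIR/"
}
```

Run it from the repository root so that the relative `./gradlew` and chart paths resolve.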

Run Spark Pi App

$ kubectl apply -f examples/pi.yaml

$ kubectl get sparkapp
NAME   CURRENT STATE      AGE
pi     ResourceReleased   4m10s

$ kubectl delete sparkapp/pi
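Instead of re-running `kubectl get sparkapp` by hand until the app reaches `ResourceReleased`, a small polling loop can wait for a given state. This sketch assumes the state is the second column of `kubectl get` output, as in the sample above; `wait_for_state` is a hypothetical helper, not part of the operator.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll `kubectl get <kind> <name>` until the CURRENT STATE
# column (2nd column) equals the wanted state, or give up after ~timeout seconds.
wait_for_state() {
  local kind="$1" name="$2" want="$3" timeout="${4:-300}" interval="${5:-5}"
  local waited=0 state=""
  while [ "$waited" -lt "$timeout" ]; do
    # --no-headers drops the NAME / CURRENT STATE / AGE header row.
    state="$(kubectl get "$kind" "$name" --no-headers 2>/dev/null | awk '{print $2}')"
    if [ "$state" = "$want" ]; then
      echo "$kind/$name is $want"
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "timed out waiting for $kind/$name (last state: ${state:-unknown})" >&2
  return 1
}
```

For example, `wait_for_state sparkapp pi ResourceReleased && kubectl delete sparkapp/pi`. The same helper should work for clusters, e.g. `wait_for_state sparkcluster prod RunningHealthy`.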

Run Spark Cluster

$ kubectl apply -f examples/prod-cluster-with-three-workers.yaml

$ kubectl get sparkcluster
NAME   CURRENT STATE    AGE
prod   RunningHealthy   10s

$ kubectl port-forward prod-master-0 6066 &

$ ./examples/submit-pi-to-prod.sh
{
  "action" : "CreateSubmissionResponse",
  "message" : "Driver successfully submitted as driver-20240821181327-0000",
  "serverSparkVersion" : "4.0.0-preview2",
  "submissionId" : "driver-20240821181327-0000",
  "success" : true
}
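To reuse the submission ID in a status query, it can be extracted from the JSON response. This sketch assumes the pretty-printed layout shown above (one `"key" : "value"` pair per line) and uses `sed` to avoid extra dependencies; `jq -r .submissionId` is sturdier if `jq` is installed. `json_field` and `submit_pi_and_get_id` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the string value of a top-level field from JSON on
# stdin. Assumes one "key" : "value" pair per line; non-string values are skipped.
json_field() {
  sed -n "s/.*\"$1\" *: *\"\([^\"]*\)\".*/\1/p"
}

# Hypothetical wrapper: submit with the example script, print only the ID.
submit_pi_and_get_id() {
  ./examples/submit-pi-to-prod.sh | json_field submissionId
}
```

With the response above, `submit_pi_and_get_id` would print `driver-20240821181327-0000`.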

$ curl http://localhost:6066/v1/submissions/status/driver-20240821181327-0000/
{
  "action" : "SubmissionStatusResponse",
  "driverState" : "FINISHED",
  "serverSparkVersion" : "4.0.0-preview2",
  "submissionId" : "driver-20240821181327-0000",
  "success" : true,
  "workerHostPort" : "10.1.5.188:42099",
  "workerId" : "worker-20240821181236-10.1.5.188-42099"
}
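Rather than repeating the status request by hand, the endpoint can be polled until the driver reaches a terminal state. This is a sketch that assumes the port-forward to `prod-master-0` on 6066 is still running and that the response keeps one field per line. `MASTER_URL`, `driver_state`, and `wait_for_driver` are illustrative names; the non-terminal states listed are those of Spark standalone's driver lifecycle.

```shell
#!/usr/bin/env bash
# Illustrative variable (not defined by the operator): the port-forwarded master.
MASTER_URL="${MASTER_URL:-http://localhost:6066}"

# Hypothetical helper: print the driverState field of the status response.
driver_state() {
  curl -s "$MASTER_URL/v1/submissions/status/$1/" |
    sed -n 's/.*"driverState" *: *"\([^"]*\)".*/\1/p'
}

# Hypothetical helper: retry while the driver is pending or running (or the
# request returned nothing); print the terminal state; succeed only on FINISHED.
wait_for_driver() {
  local id="$1" tries="${2:-60}" state=""
  while [ "$tries" -gt 0 ]; do
    state="$(driver_state "$id")"
    case "$state" in
      SUBMITTED | RUNNING | RELAUNCHING | UNKNOWN | "")
        sleep 5 ;;
      *)
        echo "$state"
        [ "$state" = "FINISHED" ] && return 0
        return 1 ;;
    esac
    tries=$((tries - 1))
  done
  echo "timed out waiting for $id (last state: ${state:-none})" >&2
  return 1
}
```

For the submission above, `wait_for_driver driver-20240821181327-0000` should print `FINISHED` and exit with status 0.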

$ kubectl delete sparkcluster prod
sparkcluster.spark.apache.org "prod" deleted

Run Spark Pi App on Apache YuniKorn scheduler

If you have not done so yet, follow the YuniKorn docs to install the latest version:

$ helm repo add yunikorn https://apache.github.io/yunikorn-release

$ helm repo update

$ helm install yunikorn yunikorn/yunikorn --namespace yunikorn --version 1.6.3 --create-namespace --set embedAdmissionController=false
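Before submitting the app, it can help to confirm that the YuniKorn release exists and its pods are ready. This sketch assumes the release name `yunikorn` and namespace `yunikorn` used in the install command above; `yunikorn_ready` is a hypothetical helper, and the 180-second default timeout is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that the Helm release exists, then block until
# every pod in the namespace reports the Ready condition (or the timeout hits).
yunikorn_ready() {
  local ns="${1:-yunikorn}" timeout="${2:-180s}"
  if ! helm status yunikorn --namespace "$ns" >/dev/null 2>&1; then
    echo "helm release 'yunikorn' not found in namespace $ns" >&2
    return 1
  fi
  kubectl wait --for=condition=Ready pod --all --namespace "$ns" --timeout="$timeout"
}
```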

Submit a Spark app to the YuniKorn-enabled cluster:

$ kubectl apply -f examples/pi-on-yunikorn.yaml

$ kubectl describe pod pi-on-yunikorn-0-driver
...
Events:
  Type    Reason             Age   From      Message
  ----    ------             ----  ----      -------
  Normal  Scheduling         14s   yunikorn  default/pi-on-yunikorn-0-driver is queued and waiting for allocation
  Normal  Scheduled          14s   yunikorn  Successfully assigned default/pi-on-yunikorn-0-driver to node docker-desktop
  Normal  PodBindSuccessful  14s   yunikorn  Pod default/pi-on-yunikorn-0-driver is successfully bound to node docker-desktop
  Normal  TaskCompleted      6s    yunikorn  Task default/pi-on-yunikorn-0-driver is completed
  Normal  Pulled             13s   kubelet   Container image "apache/spark:4.0.0-preview2" already present on machine
  Normal  Created            13s   kubelet   Created container spark-kubernetes-driver
  Normal  Started            13s   kubelet   Started container spark-kubernetes-driver
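The events above show `yunikorn` as the source of the scheduling events. A scriptable check is to read the pod's `spec.schedulerName`; this sketch assumes the driver pod's scheduler name is `yunikorn`, as the event source suggests. `scheduler_of` and `assert_yunikorn` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the scheduler a pod was assigned to.
scheduler_of() {
  kubectl get pod "$1" -o jsonpath='{.spec.schedulerName}'
}

# Hypothetical helper: succeed only if the pod (default: the example driver)
# names yunikorn as its scheduler.
assert_yunikorn() {
  local pod="${1:-pi-on-yunikorn-0-driver}" name
  name="$(scheduler_of "$pod")"
  if [ "$name" = "yunikorn" ]; then
    echo "$pod is scheduled by yunikorn"
  else
    echo "$pod uses scheduler '${name:-<none>}'" >&2
    return 1
  fi
}
```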

$ kubectl delete sparkapp pi-on-yunikorn
sparkapplication.spark.apache.org "pi-on-yunikorn" deleted

Try nightly build for testing

You can try the nightly version of spark-kubernetes-operator as follows.

$ helm install spark-kubernetes-operator \
https://nightlies.apache.org/spark/charts/spark-kubernetes-operator-0.2.0-SNAPSHOT.tgz

Clean Up

Check for existing Spark applications and clusters, and delete any that exist.

$ kubectl get sparkapp
No resources found in default namespace.

$ kubectl get sparkcluster
No resources found in default namespace.

Remove the Helm chart and CRDs.

$ helm uninstall spark-kubernetes-operator

$ kubectl delete crd sparkapplications.spark.apache.org

$ kubectl delete crd sparkclusters.spark.apache.org
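The clean-up steps above can be run as one function, in the same order: custom resources first while the operator is still running (useful if the resources carry finalizers the operator must clear), then the Helm release, then the CRDs. This is a sketch; `cleanup_spark_operator` is a hypothetical name, and `--all` only covers the current namespace.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove apps and clusters in the current namespace, then
# the operator release, then both CRDs. Stops at the first failing step.
cleanup_spark_operator() {
  kubectl delete sparkapp --all &&
    kubectl delete sparkcluster --all &&
    helm uninstall spark-kubernetes-operator &&
    kubectl delete crd sparkapplications.spark.apache.org sparkclusters.spark.apache.org
}
```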

If you installed a nightly build, remove the snapshot image.

$ docker rmi apache/spark-kubernetes-operator:main-snapshot

Contributing

Please review the Contribution to Spark guide for information on how to get started contributing to the project.