Spark

Name: flytekitplugins-spark
Version: 0.0.0+develop
Author: admin@flyte.org
Provides: flytekitplugins.spark
Requires: flytekit>=1.15.1, pyspark>=3.0.0, aiohttp, flyteidl>=1.11.0b1, pandas
Python: >=3.9
License: apache2
Source Code: https://github.com/flyteorg/flytekit/tree/master/plugins/flytekit-spark
  • Intended Audience :: Science/Research
  • Intended Audience :: Developers
  • License :: OSI Approved :: Apache Software License
  • Programming Language :: Python :: 3.9
  • Programming Language :: Python :: 3.10
  • Programming Language :: Python :: 3.11
  • Topic :: Scientific/Engineering
  • Topic :: Scientific/Engineering :: Artificial Intelligence
  • Topic :: Software Development
  • Topic :: Software Development :: Libraries
  • Topic :: Software Development :: Libraries :: Python Modules

Flyte can execute Spark jobs natively on a Kubernetes cluster and manages the virtual cluster's lifecycle, including spin-up and teardown. It uses the open-source Spark on K8s Operator and can be enabled without signing up for any service. This is like running a transient Spark cluster: a cluster spun up for a specific Spark job and torn down after the job completes.

To install the plugin, run the following command:

pip install flytekitplugins-spark

To configure Spark in the Flyte deployment's backend, follow Step 1 and Step 2.

Examples of running Spark jobs with the plugin can be found in the documentation.