Kubeflow MPI
Name: | flytekitplugins-kfmpi |
Version: | 0.0.0+develop |
Author: | admin@flyte.org |
Provides: | flytekitplugins.kfmpi |
Requires: | flytekit>=1.6.1,<2.0.0 |
Python: | >=3.9 |
License: | apache2 |
Source Code: | https://github.com/flyteorg/flytekit/tree/master/plugins/flytekit-kf-mpi |
- Intended Audience :: Science/Research
- Intended Audience :: Developers
- License :: OSI Approved :: Apache Software License
- Programming Language :: Python :: 3.9
- Programming Language :: Python :: 3.10
- Topic :: Scientific/Engineering
- Topic :: Scientific/Engineering :: Artificial Intelligence
- Topic :: Software Development
- Topic :: Software Development :: Libraries
- Topic :: Software Development :: Libraries :: Python Modules
This plugin uses the Kubeflow MPI Operator and provides a simplified interface for running distributed training jobs from Flyte tasks.
To install the plugin, run the following command:
pip install flytekitplugins-kfmpi
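After installing, it can help to confirm that the installed flytekit falls inside the range listed in the plugin metadata (flytekit>=1.6.1,<2.0.0). The following is a minimal standard-library sketch, not part of the plugin: parse_release and flytekit_supported are hypothetical helper names, and the comparison deliberately ignores pre-release and local version suffixes.

```python
import re
from importlib import metadata


def parse_release(version: str) -> tuple:
    """Return the leading numeric release as a 3-tuple, e.g. '1.6.2b1' -> (1, 6, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    parts = [int(p) for p in match.group().split(".")]
    # Pad short versions such as '2.0' so tuple comparison behaves as expected.
    parts += [0] * (3 - len(parts))
    return tuple(parts[:3])


def flytekit_supported(version: str) -> bool:
    # Range taken from the plugin metadata: flytekit>=1.6.1,<2.0.0
    return (1, 6, 1) <= parse_release(version) < (2, 0, 0)


try:
    installed = metadata.version("flytekit")
    print(installed, flytekit_supported(installed))
except metadata.PackageNotFoundError:
    print("flytekit is not installed")
```

For strict PEP 440 handling, a dedicated version-parsing library would be a better fit than this simplified tuple comparison.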
Code Example
MPI usage:
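from flytekit import Resources, task
from flytekitplugins.kfmpi import Launcher, MPIJob, Worker


@task(
    task_config=MPIJob(
        # One launcher pod starts the MPI run.
        launcher=Launcher(
            replicas=1,
        ),
        # Five worker pods, each with its own resource requests and limits.
        worker=Worker(
            replicas=5,
            requests=Resources(cpu="2", mem="2Gi"),
            limits=Resources(cpu="4", mem="2Gi"),
        ),
        # Number of slots per worker.
        slots=2,
    ),
    cache=True,
    requests=Resources(cpu="1"),
    cache_version="1",
)
def my_mpi_task(x: int, y: str) -> int:
    return x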
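To make the sizing in the MPI example concrete, here is a minimal standalone sketch of how the total number of MPI processes is typically derived from such a configuration. It assumes the process count equals worker replicas times slots, which is the usual meaning of slots per worker in the MPI Operator. WorkerSpec, MPIJobSpec and total_processes are hypothetical stand-ins used for illustration only; they are not the plugin's classes.

```python
from dataclasses import dataclass


@dataclass
class WorkerSpec:
    """Hypothetical stand-in for the plugin's Worker configuration."""
    replicas: int


@dataclass
class MPIJobSpec:
    """Hypothetical stand-in for the plugin's MPIJob configuration."""
    worker: WorkerSpec
    slots: int


def total_processes(job: MPIJobSpec) -> int:
    # Assumption: every worker replica hosts `slots` MPI processes.
    return job.worker.replicas * job.slots


job = MPIJobSpec(worker=WorkerSpec(replicas=5), slots=2)
print(total_processes(job))  # 10
```

Under that assumption, the example above (5 worker replicas, 2 slots) would run 10 MPI processes in total.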
Horovod usage: to override the command of a replica group, set command on that group (here, on the worker):
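from flytekit import Resources, task
from flytekitplugins.kfmpi import HorovodJob, Launcher, RestartPolicy, Worker


@task(
    task_config=HorovodJob(
        launcher=Launcher(
            replicas=1,
            requests=Resources(cpu="1"),
            limits=Resources(cpu="2"),
        ),
        worker=Worker(
            replicas=1,
            # Override the worker's default command; here it starts sshd.
            command=["/usr/sbin/sshd", "-De", "-f", "/home/jobuser/.sshd_config"],
            restart_policy=RestartPolicy.NEVER,
        ),
        slots=2,
        verbose=False,
        log_level="INFO",
    ),
)
def my_horovod_task():
    ...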
Upgrade MPI Plugin from V0 to V1
The MPI plugin has been updated from v0 to v1 to enable more configuration options. To migrate from v0 to v1, make the following changes:
- Update flytepropeller to v1.6.0
- Update flytekit version to v1.6.2
- Update your code from:
task_config=MPIJob(num_workers=10),
to
task_config=MPIJob(worker=Worker(replicas=10)),
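For codebases with many tasks, the code change above can be applied mechanically. The following is a rough sketch of a text-based rewrite using only the standard library; migrate_mpijob_call is a hypothetical helper, not something shipped with the plugin, and it only handles calls where num_workers is the sole argument to MPIJob. Anything else is left untouched for manual review.

```python
import re

# Matches the v0 form where num_workers is the only argument,
# e.g. MPIJob(num_workers=10) or MPIJob(num_workers=n, )
_V0_PATTERN = re.compile(r"MPIJob\(\s*num_workers\s*=\s*([^,)\s]+)\s*,?\s*\)")


def migrate_mpijob_call(source: str) -> str:
    """Rewrite MPIJob(num_workers=N) to MPIJob(worker=Worker(replicas=N))."""
    return _V0_PATTERN.sub(r"MPIJob(worker=Worker(replicas=\1))", source)


print(migrate_mpijob_call("task_config=MPIJob(num_workers=10),"))
```

Remember that migrated modules also need Worker imported from flytekitplugins.kfmpi alongside MPIJob.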