## Linux Foundation AI & Data: Flyte.org
The Workflow Automation Platform for Complex, Mission-Critical Data and ML Processes at Scale
### Haytham Abuelfutuh
### haytham@union.ai
---
## Why
* Flyte is used in production at Lyft, Spotify, Freenome and many others.
* Flyte is Battle-Tested and Truly Open-Source
* Intuitive Multi-Language SDK (Python, Java, Custom DSL)
* Start Locally, Scale Seamlessly
* Automated Lineage Tracking and Caching
* Controlled Extensibility
* Facilitate Collaboration
* Build Self-Service Data Platform
---
## Production Grade
Flyte promotes good engineering practices (e.g., code tracking, containerization, immutability) to provide robust and reproducible pipelines.
---
## Intuitive SDK
Regular Python with minimal overhead
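```python
from datetime import timedelta

import pandas
import pyspark.sql
import pyspark.sql.functions as F
from flytekit import Email, FixedRate, LaunchPlan, Resources, task, workflow
from flytekit.models.core.execution import WorkflowExecutionPhase
from flytekitplugins.spark import Spark

# A plain pandas task with CPU/memory limits.
@task(limits=Resources(cpu="2", mem="150Mi"))
def pay_multiplier(df: pandas.DataFrame, scalar: int) -> pandas.DataFrame:
    df["col"] = scalar * df["col"]
    return df

# A Spark task (flytekitplugins-spark) that is retried up to two times.
@task(task_config=Spark(spark_conf={"spark.driver.memory": "1000M"}), retries=2)
def total_spend(df: pyspark.sql.DataFrame) -> int:
    return df.agg(F.sum("col")).collect()[0][0]

# The workflow wires the two tasks together.
@workflow
def calculate_spend(emp_df: pandas.DataFrame) -> int:
    return total_spend(df=pay_multiplier(df=emp_df, scalar=2))

# Run every 10 minutes and send an email when an execution fails.
LaunchPlan.get_or_create(
    name="...",
    workflow=calculate_spend,
    schedule=FixedRate(duration=timedelta(minutes=10)),
    notifications=[
        Email(
            phases=[WorkflowExecutionPhase.FAILED],
            recipients_email=[...],
        )
    ],
)
```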
---
## Start Locally, Scale Seamlessly
* Run fully in pure Python environments.
* Package and deploy to sandbox environments when ready.
* Interact with remote environments from Python, Jupyter notebooks, the CLI, and open APIs.
---
## Lineage & Caching
* Flyte supports caching idempotent executions for faster reruns, both locally and remotely.
* Automatic tracking of producers and consumers of datasets.
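In flytekit, caching is switched on per task with `@task(cache=True, cache_version="1.0")`. The snippet below is not Flyte code: it is a minimal, stdlib-only sketch, built around a hypothetical `ToyCachingRunner`, of how a cache keyed on task name, cache version, and inputs can skip re-running a deterministic task while also recording which task produced which output from which inputs.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List, Tuple


class ToyCachingRunner:
    """Hypothetical illustration of caching plus lineage; not Flyte's implementation."""

    def __init__(self) -> None:
        self.cache: Dict[str, Any] = {}  # cache key -> stored output
        # Lineage records: (producing task, inputs it consumed, output it produced)
        self.lineage: List[Tuple[str, Dict[str, Any], Any]] = []
        self.executions = 0  # number of times a task body actually ran

    def run(self, fn: Callable[..., Any], cache_version: str, **inputs: Any) -> Any:
        # Key = task name + cache version + inputs (inputs must be JSON-serializable here).
        payload = json.dumps([fn.__name__, cache_version, inputs], sort_keys=True)
        key = hashlib.sha256(payload.encode()).hexdigest()
        if key not in self.cache:
            # Cache miss: run the task. Reusing the result later is only safe
            # because the task is assumed to be deterministic (idempotent).
            self.executions += 1
            self.cache[key] = fn(**inputs)
        output = self.cache[key]
        self.lineage.append((fn.__name__, dict(inputs), output))
        return output


def double(x: int) -> int:
    return 2 * x


runner = ToyCachingRunner()
first = runner.run(double, cache_version="1.0", x=21)   # executes
second = runner.run(double, cache_version="1.0", x=21)  # served from the cache
third = runner.run(double, cache_version="2.0", x=21)   # new version, executes again
print(first, second, third, runner.executions)  # 42 42 42 2
```

Bumping the cache version is what invalidates earlier results in this sketch, which is also why caching only makes sense for idempotent tasks.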
---
## Extensibility
* Flyte is extensible at every layer: the programming SDK, the backend, and the UI.
* Flytekit type transformers, pure SDK plugins, and data persistence plugins.
* Backend plugins are powerful and stateful, expose unified APIs across languages, and run as services!
* Easily build DSLs on top of the core Protobuf definitions and the Python and Java SDKs.
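Type transformers are the SDK-level extension point for teaching Flyte about new Python types; in flytekit they are `TypeTransformer` subclasses registered with `TypeEngine`. As a rough sketch of the pattern only (the `ToyTypeEngine` class, its methods, and the `Money` type are hypothetical, and the string wire format stands in for Flyte's Protobuf literals), a registry maps each Python type to an encode/decode pair:

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass
class Money:
    # Hypothetical user-defined type the engine does not know about by default.
    cents: int
    currency: str


class ToyTypeEngine:
    """Hypothetical registry: Python type -> (encode, decode) to a plain wire format."""

    def __init__(self) -> None:
        self._transformers: Dict[type, Tuple[Callable[[Any], str], Callable[[str], Any]]] = {}

    def register(self, py_type: type, encode: Callable[[Any], str], decode: Callable[[str], Any]) -> None:
        self._transformers[py_type] = (encode, decode)

    def to_wire(self, value: Any) -> str:
        encode, _ = self._transformers[type(value)]  # KeyError if no transformer is registered
        return encode(value)

    def from_wire(self, py_type: type, data: str) -> Any:
        _, decode = self._transformers[py_type]
        return decode(data)


def encode_money(m: Money) -> str:
    return f"{m.cents}:{m.currency}"


def decode_money(data: str) -> Money:
    cents, currency = data.split(":")
    return Money(int(cents), currency)


engine = ToyTypeEngine()
engine.register(Money, encode=encode_money, decode=decode_money)  # the "plugin" step
wire = engine.to_wire(Money(1250, "USD"))
print(wire)                           # 1250:USD
print(engine.from_wire(Money, wire))  # Money(cents=1250, currency='USD')
```

The plugin author supplies both directions of the conversion, so tasks can accept and return the new type without the core engine changing.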
---
## Collaboration
* Flyte entities (tasks, workflows) can be shared across projects and teams, making development easier, more centralized, and more maintainable.
---
## Infrastructure-friendly
* Kubernetes-native and cloud-agnostic.
* Out-of-the-box system- and user-level metrics.
* Integrates with native logging systems.
* Multiple deployment options: Helm, Kustomize, Terraform, RunX Opta, and manual guides for various cloud providers.
---
## Recap
Flyte aims to empower data scientists and ML engineers to write production-grade pipelines while keeping an easy-to-use prototyping environment.
Community: slack.flyte.org
Docs: docs.flyte.org
Me: haytham@union.ai