## Linux Foundation AI & Data: Flyte.org

The Workflow Automation Platform for Complex, Mission-Critical Data and ML Processes at Scale

### Haytham Abuelfutuh

### haytham@union.ai

---

## Why

* Flyte is used in production at Lyft, Spotify, Freenome and many others.
* Flyte is Battle-Tested and Truly Open-Source
* Intuitive Multi-Language SDK (Python, Java, Custom DSL)
* Start Locally, Scale Seamlessly
* Automated Lineage Tracking and Caching
* Controlled Extensibility
* Facilitate Collaboration
* Build a Self-Service Data Platform

---

## Production Grade

Flyte promotes good engineering practices (e.g., code tracking, containerization, immutability) to provide robust and reproducible pipelines.

---

## Intuitive SDK

Regular Python with minimal overhead

```python
from datetime import timedelta

import pandas
import pyspark.sql
import pyspark.sql.functions as F
from flytekit import Email, FixedRate, LaunchPlan, Resources, WorkflowExecutionPhase, task, workflow
from flytekitplugins.spark import Spark  # requires the flytekitplugins-spark package

# Runs in a container limited to 2 CPUs and 150Mi of memory.
@task(limits=Resources(cpu="2", mem="150Mi"))
def pay_multiplier(df: pandas.DataFrame, scalar: int) -> pandas.DataFrame:
    df["col"] = scalar * df["col"]
    return df

# Runs as a Spark job and is retried up to 2 times on failure.
@task(task_config=Spark(
    spark_conf={"spark.driver.memory": "1000M"}
), retries=2)
def total_spend(df: pyspark.sql.DataFrame) -> int:
    return df.agg(F.sum("col")).collect()[0][0]

@workflow
def calculate_spend(emp_df: pandas.DataFrame) -> int:
    return total_spend(df=pay_multiplier(df=emp_df, scalar=2))

# Launch every 10 minutes; email the recipients when an execution fails.
LaunchPlan.get_or_create(
    name="...",
    workflow=calculate_spend,
    schedule=FixedRate(duration=timedelta(minutes=10)),
    notifications=[
        Email(
            phases=[WorkflowExecutionPhase.FAILED],
            recipients_email=[...],
        )
    ],
)
```

---

## Start Locally, Scale Seamlessly

* Run entirely in pure Python environments.
* Package and deploy to sandbox environments when ready.
* Interact with remote environments from Python, Jupyter notebooks, the CLI and open APIs.

---

## Lineage & Caching

* Flyte caches the outputs of idempotent executions for faster reruns.
* Caching works both locally and remotely.
* Producers and consumers of datasets are tracked automatically.
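To make the caching and lineage idea concrete, here is a minimal, framework-free Python sketch. It is not Flyte's implementation or API: the `CachingRunner` class, its `run` method and its key layout are hypothetical. The sketch assumes, loosely following how Flyte describes task caching (`@task(cache=True, cache_version="...")`), that a cached output is reused only when the task name, a user-chosen cache version and the input values all match. Each call also records which task produced which output, a toy stand-in for lineage tracking.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List, Tuple


class CachingRunner:
    """Hypothetical runner (not Flyte's API): memoizes task outputs, records lineage."""

    def __init__(self) -> None:
        self.cache: Dict[str, Any] = {}
        # Each entry: (task name, sorted input names, cache key of the output).
        self.lineage: List[Tuple[str, List[str], str]] = []
        self.executions = 0  # counts real (non-cached) task runs

    @staticmethod
    def _key(name: str, version: str, inputs: Dict[str, Any]) -> str:
        # Deterministic key built from the task name, cache version and inputs.
        payload = json.dumps({"task": name, "version": version, "inputs": inputs}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def run(self, fn: Callable[..., Any], cache_version: str, **inputs: Any) -> Any:
        key = self._key(fn.__name__, cache_version, inputs)
        if key not in self.cache:
            # Cache miss: actually execute the task. Only safe for idempotent tasks.
            self.executions += 1
            self.cache[key] = fn(**inputs)
        self.lineage.append((fn.__name__, sorted(inputs), key))
        return self.cache[key]


def pay_multiplier(values: list, scalar: int) -> list:
    return [scalar * v for v in values]


runner = CachingRunner()
first = runner.run(pay_multiplier, cache_version="1.0", values=[1, 2, 3], scalar=2)
second = runner.run(pay_multiplier, cache_version="1.0", values=[1, 2, 3], scalar=2)  # cache hit
third = runner.run(pay_multiplier, cache_version="2.0", values=[1, 2, 3], scalar=2)  # new version: re-executes
print(first)              # → [2, 4, 6]
print(runner.executions)  # → 2
```

Bumping the cache version is how the sketch invalidates stale results after a task's logic changes; identical inputs under the same version never re-run the function.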
---

## Extensibility

* Flyte is extensible in every component: the programming SDK, the backend and the UI.
* FlyteKit type transformers, pure SDK plugins, data persistence plugins.
* Backend plugins are powerful, stateful extensions with unified APIs across languages. They run as services!
* Easily develop DSLs on top of the core Protobuf definitions and the Python and Java SDKs.

---

## Collaboration

* Flyte entities (tasks, workflows) can be shared across projects and teams for easier, centralized and maintainable development.

---

## Infrastructure-friendly

* K8s-native and cloud agnostic.
* Out-of-the-box system and user-level metrics.
* Integrates with native logging systems.
* Multiple deployment options: Helm, Kustomize, Terraform, runX Opta, and manual guides for various cloud providers.

---

## Recap

Flyte aims to empower data scientists and ML engineers to write production-grade pipelines while keeping an easy-to-use prototyping environment.

Community: slack.flyte.org

Docs: docs.flyte.org

Me: haytham@union.ai