RUN pip install apache-airflow
RUN pip install google-cloud-orchestration-airflow==1.9.1
RUN pip install apache-airflow-providers-google
RUN pip install jsonpickle
pip install git+https://github.com/flyteorg/flytekit.git@873c60e505bb3492b5e3fa09840836b4d6775f4b#subdirectory=plugins/flytekit-airflow
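To verify the plugin installed correctly, a quick import check (assuming the plugin exposes the flytekitplugins.airflow namespace that flytekit plugins conventionally use):
python -c "import flytekitplugins.airflow"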
Use default credentials to connect to Google Cloud
export AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT='google-cloud-platform://'
export GOOGLE_APPLICATION_CREDENTIALS=/Users/kevin/.config/gcloud/application_default_credentials.json
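If the application default credentials file referenced above does not exist yet, gcloud can generate it at that path:
gcloud auth application-default login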
from datetime import timedelta
from airflow.utils import trigger_rule
from flytekit import workflow
from airflow.providers.google.cloud.operators.dataproc import DataprocCreateClusterOperator, DataprocDeleteClusterOperator, DataprocSubmitSparkJobOperator
@workflow
def wf():
    # Spin up a Dataproc cluster on GCP.
    create_cluster = DataprocCreateClusterOperator(
        task_id="create_dataproc_cluster",
        image_version="2.0.27-debian10",
        storage_bucket="opta-gcp-dogfood-gcp",
        master_machine_type="n1-highmem-32",
        master_disk_size=1024,
        num_workers=2,
        worker_machine_type="n1-highmem-64",
        worker_disk_size=1024,
        region="us-west1",
        cluster_name="flyte-dataproc",
        project_id="dogfood-gcp-dataplane",
    )
    # Submit a Spark job to the cluster.
    run_spark = DataprocSubmitSparkJobOperator(
        job_name="spark_pi",
        task_id="run_spark",
        dataproc_jars=["file:///usr/lib/spark/examples/jars/spark-examples.jar"],
        main_class="org.apache.spark.examples.JavaWordCount",
        arguments=["gs://opta-gcp-dogfood-gcp/spark/file.txt"],
        cluster_name="flyte-dataproc",
        region="us-west1",
        project_id="dogfood-gcp-dataplane",
    )
    # Tear the cluster down even if the Spark job fails (ALL_DONE trigger rule).
    delete_cluster = DataprocDeleteClusterOperator(
        task_id="delete_dataproc_cluster",
        project_id="dogfood-gcp-dataplane",
        cluster_name="flyte-dataproc",
        region="us-west1",
        retries=3,
        retry_delay=timedelta(minutes=5),
        email_on_failure=True,
        trigger_rule=trigger_rule.TriggerRule.ALL_DONE,
    )
    # Chain the operators so they run in order inside the Flyte workflow.
    create_cluster >> run_spark >> delete_cluster

if __name__ == '__main__':
    wf()
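One way to execute this workflow, assuming the code above is saved as dataproc_wf.py (a hypothetical file name), is with the pyflyte CLI:
pyflyte run dataproc_wf.py wf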
from airflow.sensors.filesystem import FileSensor
from flytekit import task, workflow, ImageSpec

# Build the task image with Airflow and the flytekit-airflow plugin installed.
airflow_plugin = "git+https://github.com/flyteorg/flytekit.git@487438ab59147879eded897674593a1eaee1c78b#subdirectory=plugins/flytekit-airflow"
image_spec = ImageSpec(
    base_image="pingsutw/flytekit:v1",
    packages=["apache-airflow", airflow_plugin],
    apt_packages=["git"],
    registry="pingsutw",
)

@task(container_image=image_spec)
def t1():
    print("flyte")

@workflow
def wf():
    # Wait for /tmp/1234 to appear before running t1.
    sensor = FileSensor(task_id="id", filepath="/tmp/1234")
    sensor >> t1()

if __name__ == '__main__':
    wf()
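To run this example on a Flyte cluster, where ImageSpec builds and pushes the container image, something like the following should work (file_sensor.py is a hypothetical file name):
pyflyte run --remote file_sensor.py wf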