# PMLE MLOps: TFX vs KFP vs Vertex AI Pipelines (Work in progress)

Sections to cover from PMLE: TBD, mostly a complement to MLOps.
### The following table depicts the different MLOps ways to create a pipeline; look at the difference between orchestrators and runners and don't get overwhelmed by the overlapping functionality:
==**NOTE**== scroll the table to the right to see all the columns!
| Framework | Description | Functionality | Runner | Orchestrator | Use Case | Mode |
|-------------------------|-----------------------------------------------------------------------------|----------------------------------------------------------------|--------------------------|----------------------------|-------------------------------------------------------------|-----------------------|
| **TensorFlow Extended** | End-to-end platform for deploying production ML pipelines | Data ingestion, validation, transformation, training, serving | TFX | Airflow, Kubeflow, Beam | Production ML pipelines, TFX-specific | Batch, Streaming |
| **Vertex AI Pipelines** | Managed service for ML pipelines on Google Cloud | Data preprocessing, training, evaluation, deployment | Kubeflow Pipelines | Google Cloud Vertex AI | Scalable and managed ML pipelines on Google Cloud | Batch, Real-time |
| **Kubeflow** | Open-source ML toolkit for Kubernetes | Model training, serving, pipelines, Jupyter notebooks | Argo Workflows | Kubernetes | End-to-end ML workflows on Kubernetes | Batch, Streaming |
| **Apache Airflow** | Open-source platform to programmatically author, schedule, and monitor workflows | Task scheduling, execution, and monitoring | CeleryExecutor, LocalExecutor | Airflow DAGs | ETL, Data Engineering, general-purpose workflows | Batch |
| **Apache Beam** | Unified model for defining both batch and streaming data-parallel processing pipelines | Data processing, transformation, windowing, aggregation | DirectRunner, DataflowRunner | Dataflow, Flink, Spark | Unified batch and stream processing | Batch, Streaming |
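To make the Vertex AI / Kubeflow rows concrete, here is a minimal sketch (assuming the KFP v2 SDK and the `google-cloud-aiplatform` client are installed; `PROJECT_ID`, `REGION` and the bucket are placeholders, not real values) of defining a pipeline, compiling it, and handing it to the managed orchestrator:
```python
# minimal sketch: KFP v2 pipeline compiled locally, then run on Vertex AI Pipelines
from kfp import dsl, compiler
from google.cloud import aiplatform

@dsl.component
def say_hello(name: str) -> str:
    # a lightweight component; on the backend it runs in its own container
    return f"hello, {name}"

@dsl.pipeline(name="pmle-demo-pipeline")
def pipeline(name: str = "PMLE"):
    say_hello(name=name)

# compile the pipeline into a portable spec file
compiler.Compiler().compile(pipeline, package_path="pipeline.json")

# submit the compiled spec to the managed orchestrator (Vertex AI)
aiplatform.init(project="PROJECT_ID", location="REGION",
                staging_bucket="gs://MY_BUCKET")
job = aiplatform.PipelineJob(display_name="pmle-demo",
                             template_path="pipeline.json")
job.run()  # blocks until the pipeline run finishes
```
Note how the compile step and the submit step roughly map to the Runner and Orchestrator columns in the table above.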
From the framework perspective (Python):
TensorFlow > Google's AI/ML platform/framework; Keras is its high-level API.

TensorFlow Extended (TFX) > E2E pipeline framework based on TensorFlow (MLOps); see the sketch right after this list.
components: ExampleGen, StatisticsGen, SchemaGen, ExampleValidator, Transform, Trainer, Tuner, Evaluator, InfraValidator, Pusher

libraries: TensorFlow Data Validation (TFDV), TensorFlow Transform (TFT), TensorFlow Model Analysis (TFMA), TensorFlow Serving
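A minimal sketch of how those components snap together (assuming the `tfx` package is installed; `DATA_ROOT` and `PIPELINE_ROOT` are hypothetical paths). The same `Pipeline` object can be handed to other runners (Airflow, Kubeflow, Beam) instead of the local one:
```python
# minimal sketch: a two-component TFX pipeline run locally
from tfx import v1 as tfx

# ingest CSV files and emit tf.Example records
example_gen = tfx.components.CsvExampleGen(input_base="DATA_ROOT")
# compute dataset statistics from the ingested examples
statistics_gen = tfx.components.StatisticsGen(
    examples=example_gen.outputs["examples"])

pipeline = tfx.dsl.Pipeline(
    pipeline_name="pmle-tfx-demo",
    pipeline_root="PIPELINE_ROOT",
    components=[example_gen, statistics_gen],
)

# swap LocalDagRunner for a Kubeflow/Vertex runner to change orchestrators
tfx.orchestration.LocalDagRunner().run(pipeline)
```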

From the infrastructure perspective:
Docker > OS-level virtualization; in reality a container is a process in a namespace that is allowed to write to a specific piece of virtual disk (of the Docker node, aka a VM, but not the one you create when you invoke a GCE instance; those are more custom and support more than one container, so the ratio is not 1 VM : 1 container).
Dockerfile > a file with NO extension that serves as a manifest of the "what" for the creation of a container image.
NOTE: let's say you are in Colab. Colab is an ultra-custom container image running Jupyter, so when you run shell commands it feels like you are in a Linux OS. It happens that the Colab instance (container) has a Docker server and a Docker client (CLI) on it, so when you have the following code in a Dockerfile:
```dockerfile=
# sample Dockerfile manifest
# base image (python:3.9 already ships with pip)
FROM python:3.9
# copy the build context into the image
COPY . /opt/app
# subsequent instructions run from this directory
WORKDIR /opt/app
# redundant with this base image, but harmless
RUN apt-get update && apt-get install -y python3-pip
RUN pip install --trusted-host pypi.python.org -r requirements.txt
```
then in a Colab cell you run this (docker CLI [cheatsheet](https://raw.githubusercontent.com/sangam14/dockercheatsheets/master/dockercheatsheet8.png)):
```shell=
!docker build --tag=$REGION-docker.pkg.dev/$PROJECT_ID/$REPOSITORY/$IMAGE .
```
the client calls the server API and uses the manifest to produce the container image as output.
(optional)
you can run
```shell
!docker images
```
to see the artifact created locally in Colab.
Then the image lives locally in your Docker server's image store; in order to use it from, let's say, Vertex or TFX or KFP you need to push it to a container registry service so it can be accessed through an API connection (your server could serve it directly, but that is impractical).
```shell=
!docker push $REGION-docker.pkg.dev/$PROJECT_ID/$REPOSITORY/$IMAGE
```
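Once the image is in the registry, a managed service can pull it. A minimal sketch (assuming `google-cloud-aiplatform` is installed; the display name, machine type, and placeholder URIs are illustrative choices, not requirements) of wiring the pushed image into a Vertex AI custom training job:
```python
# minimal sketch: run the pushed container image as a Vertex AI custom training job
from google.cloud import aiplatform

aiplatform.init(project="PROJECT_ID", location="REGION",
                staging_bucket="gs://STAGING_BUCKET")

job = aiplatform.CustomContainerTrainingJob(
    display_name="train-from-pushed-image",
    # the image URI you built and pushed above
    container_uri="REGION-docker.pkg.dev/PROJECT_ID/REPOSITORY/IMAGE",
)
# Vertex pulls the image from Artifact Registry and runs it on managed VMs
job.run(replica_count=1, machine_type="n1-standard-4")
```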
NOTE: just before continuing, the other way to create a container is by using <mark>**_Cloud Build_**</mark>, which is the _**CI/CD**_ (build, test, and deploy) set of tools composed into a fully managed service. You can use the Dockerfile or not (not, if you prefer to do it with __Buildpacks__, such a marvelous thing); check this [link](https://cloud.google.com/build/docs/building/build-containers) for more details on how that is possible and to see samples. Basically you create a YAML manifest to build and deploy the image, aka the container, in one shot: run tests, push to the registry, pull into <K8s, Cloud Run, and so on>. For the simple case, a single command like `gcloud builds submit --tag=$REGION-docker.pkg.dev/$PROJECT_ID/$REPOSITORY/$IMAGE .` does the build and the push for you.
==Cloud Run== > CaaS or Container as a Service from GCP, aka managing and running containers at scale (it is not K8s itself; it comes from Knative and runs on top of it).
==Artifact Registry== > in a nutshell, where you can store your artifacts (not just images), but mainly used to serve images.
<mark>GKE</mark> aka [K8s](http://slides.eightypercent.net/platform-platform/index.html#p1) (Kubernetes), Google made > container orchestrator. It does more than orchestration of containers (grouped into pods), but the idea of an orchestrator connects with pipelines in MLOps, so keep that. K8s is distributed platform software that runs on top of GCE instances and manages pods, forming a pool of resources for them. Think in terms of the control plane, aka the master node (a GCE VM with a designated role, running processes that serve to manage the pods), and the data plane, aka the worker nodes (GCE VMs with kubelet and other processes on them), in a cluster.
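To see the "orchestrator" side from code, a tiny sketch using the official `kubernetes` Python client (assuming you already have a kubeconfig for a GKE cluster, e.g. from `gcloud container clusters get-credentials`): it just asks the control plane to list the pods running on the workers.
```python
# minimal sketch: talk to the K8s control plane and list running pods
from kubernetes import client, config

config.load_kube_config()  # reads your local kubeconfig (e.g. for a GKE cluster)
v1 = client.CoreV1Api()

for pod in v1.list_pod_for_all_namespaces().items:
    print(pod.metadata.namespace, pod.metadata.name, pod.status.phase)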
So far we have a bunch of tools to create containers and to "orchestrate" them, but you may wonder: wait a minute, I'm knowledgeable in doing fancy Python programming and modeling with neural networks, why should I care about this? In short, to set a base ground I prepared this chart to give a whole idea of why these basics matter in distributed training.
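As one concrete taste (a minimal sketch, assuming TensorFlow is installed): distribution strategies are the framework-side hook, and in multi-worker mode each worker typically runs as one container that the orchestrator schedules.
```python
# minimal sketch: distribution strategies; single-host here, but in multi-worker
# mode each worker runs in its own container and finds its peers via the
# TF_CONFIG env var that the orchestrator injects
import tensorflow as tf

# mirrors variables across the GPUs of ONE machine (falls back to CPU)
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    # everything created in this scope is replicated/synced by the strategy
    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(8,))])
    model.compile(optimizer="adam", loss="mse")

# the multi-worker variant is tf.distribute.MultiWorkerMirroredStrategy(),
# which reads TF_CONFIG to learn the cluster layout
```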
## This is an attempt to glue it all together and answer the question: what is what, and where do you run what?

[MLOps PMLE overview Download](https://drive.google.com/file/d/1h3d5R1km0c8aj_SGgcoPMhPhpo9MwXtr/view?usp=sharing)
==feedback is a gift!==