
Submarine Roadmap

Submarine SDK Overhaul

Local cache (0.8.0) [P1]

  • Download S3/HDFS data to the local file system before training
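The download-before-training step above can be sketched as a stdlib-only download-on-miss cache; the `fetch` helper and cache layout are illustrative assumptions, not the actual Submarine implementation, and the `download` callback would in practice be an fsspec-based transfer.

```python
import os

def fetch(remote_path: str, download, cache_dir: str = "/tmp/submarine-cache") -> str:
    """Return a local copy of remote_path, downloading only on a cache miss."""
    # derive a flat, filesystem-safe cache key from the remote URL
    key = remote_path.replace("://", "_").replace("/", "_")
    local = os.path.join(cache_dir, key)
    if not os.path.exists(local):      # cache miss: fetch once
        os.makedirs(cache_dir, exist_ok=True)
        download(remote_path, local)   # e.g. an fsspec fs.get(...) call
    return local                       # cache hit: reuse the local copy
```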

Dataset API (0.9.0) [P1]

  • We can leverage fsspec:

```python
import fsspec

class DataStore:
    """Sketch of the Dataset API surface."""

    def exists(self, path: str) -> bool: ...

    def get_random_remote_path(self) -> str: ...

    def download(self, remote_path: str, local_path: str, recursive: bool = False):
        # fsspec.filesystem() expects a protocol name (e.g. "s3"), not a full URL
        fs = fsspec.filesystem(fsspec.utils.get_protocol(remote_path))
        fs.get(remote_path, local_path, recursive=recursive)

    def upload(self, local_path: str, remote_path: str, recursive: bool = False):
        fs = fsspec.filesystem(fsspec.utils.get_protocol(remote_path))
        # note: upload must use put() (local -> remote), not get()
        fs.put(local_path, remote_path, recursive=recursive)

submarine.data.upload("/tmp/file", "s3://bucket/key")
```

Training API (TBD) [0.9.0] P2

  • There are some benefits to implementing a Submarine base trial class:
    1. Automatically save the model when training is done.
    2. Automatically log metrics and parameters (learning rate, batch size, model metrics).
    3. Automatically load a checkpoint if one exists.
    4. Get the TF/PyTorch config internally.
```python
from typing import Union

import pandas as pd
import tensorflow as tf

from submarine import keras

class CIFARTrial(keras.TFKerasTrial):
    def build_model(self):
        model = build_model(
            layer1_dropout=self.context.get_hparam("layer1_dropout"),
            layer2_dropout=self.context.get_hparam("layer2_dropout"),
            layer3_dropout=self.context.get_hparam("layer3_dropout"),
        )
        ...
        return model

    def training_data_loader(self) -> Union[tf.data.Dataset, pd.DataFrame]: ...

    def validation_data_loader(self) -> Union[tf.data.Dataset, pd.DataFrame]: ...
```
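To make the `get_hparam` calls above concrete, here is a toy stand-in for the trial context; `TrialContext` is a hypothetical name for illustration, not the actual Submarine class.

```python
class TrialContext:
    """Hypothetical stand-in for the context a base trial class would inject."""

    def __init__(self, hparams: dict):
        self._hparams = hparams

    def get_hparam(self, name: str):
        # the real context would also handle defaults and searcher-chosen values
        return self._hparams[name]

ctx = TrialContext({"layer1_dropout": 0.2, "layer2_dropout": 0.3, "layer3_dropout": 0.4})
ctx.get_hparam("layer2_dropout")  # 0.3
```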

Reference

Model serving

  • Model quality monitoring [0.8.0] P3
  • A/B testing [0.9.0] P3
  • Serverless (auto scale model endpoints) [0.9.0] P3
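The A/B-testing item could start as simple weighted traffic splitting between model endpoints; the endpoint names and the 90/10 split below are purely illustrative assumptions.

```python
import random

def pick_variant(weights=None, rng=random.random):
    """Route one request to a model variant according to traffic weights."""
    weights = weights or {"model-a": 0.9, "model-b": 0.1}  # illustrative split
    r = rng()
    cumulative = 0.0
    for name, weight in weights.items():
        cumulative += weight
        if r < cumulative:
            return name
    return name  # guard against floating-point rounding at the tail
```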

Submarine Experiment

  • XGBoost training [0.8.0] P0
  • Model checkpoint (Recover experiment) [0.8.0] P0

Submarine operator

  • Replace Traefik with Istio
  • Submarine-operator v3 [0.8.0] P0

Submarine workbench

  • Angular -> React [0.8.0] P2
  • Web socket [0.8.0] P2

Submarine CLI

  • Use cli to start Submarine [0.9.0] P3
  • Start Submarine in k3s [0.9.0] P3

Environment Overhaul [0.8.0] P0

Currently, creating an environment requires users to set a Docker image and a Conda YAML file. However, users can't set an arbitrary image; the image must be apache/submarine:jupyter-notebook

From the users' perspective, they only care about:

  1. Python version
  2. Python packages (TensorFlow, pandas)
  3. CPU/GPU
  4. CUDA version

Solution:

  1. On the workbench, users will combine the configuration options above to create an environment.
  2. Provide some environments (images) that cover most users' needs.
    • python-3.8-tensorflow-cpu, python-3.9-pytorch-gpu
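The mapping from the four user-facing fields onto the prebuilt image names could be sketched like this; `EnvironmentSpec` and `resolve_image` are hypothetical names, not the actual Submarine API.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class EnvironmentSpec:
    python_version: str                 # 1. Python version, e.g. "3.8"
    packages: List[str]                 # 2. Python packages, e.g. ["tensorflow", "pandas"]
    device: str = "cpu"                 # 3. "cpu" or "gpu"
    cuda_version: Optional[str] = None  # 4. only relevant when device == "gpu"

def resolve_image(spec: EnvironmentSpec) -> str:
    """Pick one of the prebuilt images covering most users' needs."""
    framework = "pytorch" if any("torch" in p for p in spec.packages) else "tensorflow"
    return f"python-{spec.python_version}-{framework}-{spec.device}"

resolve_image(EnvironmentSpec("3.8", ["tensorflow", "pandas"]))  # python-3.8-tensorflow-cpu
```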

Link: https://github.com/apache/submarine/issues/892 https://github.com/apache/submarine/issues/895
Issue: https://issues.apache.org/jira/projects/SUBMARINE/issues/SUBMARINE-1213

Notebook [0.8.0] P1

  • Stop notebook server if idle for a long time
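The idle-stop check could be as simple as comparing a last-activity timestamp against a timeout; the one-hour threshold and the function name below are assumptions, not the actual implementation in the linked commit.

```python
from datetime import datetime, timedelta, timezone

IDLE_TIMEOUT = timedelta(hours=1)  # assumed threshold; would be configurable

def should_stop(last_activity: datetime, now: datetime,
                timeout: timedelta = IDLE_TIMEOUT) -> bool:
    """True when the notebook server has been idle longer than the timeout."""
    return now - last_activity > timeout

now = datetime(2022, 1, 1, 12, 0, tzinfo=timezone.utc)
should_stop(now - timedelta(hours=2), now)  # True
```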

Link: https://github.com/apache/submarine/issues/853
Issue: https://issues.apache.org/jira/projects/SUBMARINE/issues/SUBMARINE-1167
Solution: https://github.com/shangyuantech/submarine/commit/35db197e20ac7604bed224f2c3d066d46115c824

Refactor

  • Remove duplicate code in experiment [0.8.0] P4
  • Remove legacy code [0.8.0] P4

UI/UX [0.9.0] P4

  • (Workbench) Remove Data Dict, Department, Workspace, Interpreter?
  • Running an experiment without building a Docker image. The entire flow will be:
    1. Create an environment or use predefined environment
    2. Create a notebook, start developing the model
    3. Create an experiment
      • Choose a notebook
      • Mount the code to the experiment pods

Metrics

  • Experiment CPU/memory usage [0.8.0] P0
  • Model CPU, memory, disk, and network I/O [0.9.0] P0

Example

  • Model serving [0.8.0] P0
  • Tracking example [0.8.0] P0

Workflow Orchestrator [0.8.0] P4

Issue: https://issues.apache.org/jira/projects/SUBMARINE/issues/SUBMARINE-1264

Docs [0.8.0] P3

Bug Bash

  • Fix E2E flaky test [0.8.0] P2
  • Improve test coverage (SonarCloud) [0.8.0] P2
tags: Submarine Roadmap