# Tensorleap Guide
Tensorleap’s platform offers unique tools for debugging, observability, and explainability in deep-learning model development.
To enable these deep analyses, the platform tracks each sample, feature, and layer, and collects a large number of indicators.
To begin integration, you need to export the model, along with a dataset and a script that reads the dataset.
This guide describes how to convert a model defined in PyTorch or TensorFlow into a Tensorleap-compatible file format, and how to write the script that reads and preprocesses data from your dataset.
## Model Import
A deep learning model consists of multiple components:
* Model architecture - the layers and their connections
* Weights values - the state of the model after training
* A set of loss functions, an optimizer and a set of metrics
TensorFlow and PyTorch models can be saved to a serialization format for trained models, which stores the model's weights along with details of its architecture and how it was trained.
The saved model can then be used in Tensorleap independently of the code that created it.
Tensorleap reads this serialization file, loads it, and displays it in the platform.
For your convenience, a few one-line code references are provided below.
### Tensorflow 2 (Keras) - Save Model
The following command generates a folder with the serialized model data, containing the model architecture and weights.
```python=
model.save('path/to/location')
```
### Tensorflow 2 (Keras) - H5 format
Keras also supports saving the model's architecture and weights in a single HDF5 file. This is essentially a lightweight alternative to the "Save Model" option described above.
```python=
model.save("my_h5_model.h5")
```
More info can be found here: https://www.tensorflow.org/guide/keras/save_and_serialize
### PyTorch - ONNX Format
Tensorleap supports PyTorch and requires the model to first be exported to an .onnx file format in order to read it.
```python=
import torch
import torchvision

dummy_input = torch.randn(10, 3, 224, 224, device="cuda")
model = torchvision.models.alexnet(pretrained=True).cuda()
input_names = ["actual_input_1"] + ["learned_%d" % i for i in range(16)]
output_names = ["output1"]
torch.onnx.export(model, dummy_input, "alexnet.onnx", verbose=True,
                  input_names=input_names, output_names=output_names)
```
More info can be found here:
https://pytorch.org/docs/stable/onnx.html
### Leap Save Model
To manage adapting and saving a model prior to its import, we recommend using the `leap_save_model` function. The code for this function can be found in `model.py` within the provided repository.
The function saves the model to a specified path and can be employed to adapt models for use within the Tensorleap platform.
This function is also used by Tensorleap's `leap-cli` for importing models.
Sample code:
```python=
from pathlib import Path

from src.model import MyModel

def leap_save_model(target_file_path: Path):
    my_model = MyModel()
    my_model.save(target_file_path)
```
## Dataset Integration
Dataset preprocessing scripts are used by Tensorleap to encode data for the network.
The script includes a preprocessing function that prepares the data for fetching into the neural network, an input encoder function for each model input that reads a sample and prepares it for the network, and a ground truth encoder function correlated with each output.
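The overall shape of such a script can be sketched as follows. This is a minimal illustration, not Tensorleap's actual API: the `SubsetResponse` here is a plain-dataclass stand-in for the real class from `src.etl.datasetintegration.datasetbinder`, and the sample data is synthetic.

```python
from dataclasses import dataclass
from typing import Any, List

@dataclass
class SubsetResponse:  # stand-in for Tensorleap's SubsetResponse container
    length: int
    data: Any

def preprocessing() -> List[SubsetResponse]:
    # called once; prepares and returns the train and validation subsets
    train = [{"feature": i, "target": i % 2} for i in range(8)]
    val = [{"feature": i, "target": i % 2} for i in range(4)]
    return [SubsetResponse(length=len(train), data=train),
            SubsetResponse(length=len(val), data=val)]

def input_encoder(idx, subset_response):
    # one encoder per model input, fetching a single sample by index
    return subset_response.data[idx]["feature"]

def ground_truth_encoder(idx, subset_response):
    # one encoder per model output
    return subset_response.data[idx]["target"]
```

The concrete versions of these three function types are described in the sections below.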
### Preprocessing Function
The `preprocessing` function is called once, just before the training/evaluation process. It prepares the training data and validation data (`train` and `val` in the sample code below).
In the sample code below, the function reads a `TFRecord` file for each subset, parses each record, and finally returns a list containing the `train` and `val` subsets.
```python=
import tensorflow as tf

from src.etl.datasetintegration.datasetbinder import SubsetResponse

def extract_fn(tfrecord):
    # Extract features using the keys set during creation
    features = {
        'image_fpath': tf.io.FixedLenFeature([], tf.string),
        'target': tf.io.FixedLenFeature([], tf.int64)
    }
    # Extract the data record
    sample = tf.io.parse_single_example(tfrecord, features)
    return sample

def preprocessing():
    # arrange the data
    train_path = "Tensorleap/train.tfrecord"
    validation_path = "Tensorleap/validation.tfrecord"
    train_dataset = tf.data.TFRecordDataset([train_path]).map(extract_fn)
    validation_dataset = tf.data.TFRecordDataset([validation_path]).map(extract_fn)
    train_dataset = list(train_dataset.as_numpy_iterator())
    validation_dataset = list(validation_dataset.as_numpy_iterator())
    train = SubsetResponse(length=len(train_dataset), data=train_dataset)
    val = SubsetResponse(length=len(validation_dataset), data=validation_dataset)
    return [train, val]
```
### Batch Generation Functions
During the training or evaluation process, the samples are fetched to the neural network in batches.
This section describes functions that are called during the batch generation process, for every sample within the batch.
As an example, a training set of 10K samples would result in 10K calls to each function per epoch. Consequently, it is recommended to avoid long-running operations in these functions.
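Because the encoders run once per sample per epoch, any expensive work (decoding, building lookup tables, downloads) is better done once during preprocessing and cached on the subset data. The following is a minimal sketch of this pattern using plain Python stand-ins, not Tensorleap's actual classes; `expensive_decode` is a hypothetical placeholder for slow per-sample work.

```python
def expensive_decode(raw):
    # placeholder for slow per-sample work (I/O, decoding, ...)
    return raw * 2

def preprocessing_cached():
    raw_samples = list(range(1000))
    # do the slow work once, up front, and store the result
    return [expensive_decode(r) for r in raw_samples]

def fast_input_encoder(idx, data):
    # the per-sample call stays a cheap lookup
    return data[idx]

data = preprocessing_cached()
sample = fast_input_encoder(10, data)
```

This keeps each of the 10K-per-epoch encoder calls to a constant-time lookup instead of repeating the decode work.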
#### Input Encoder Function(s)
The input encoder functions receive the subset data (`train` / `val` according to the state) as an argument, as well as `idx`, the index of the sample.
For each model input, there should be an encoding function that extracts and generates the input data per one sample.
In order to facilitate tracking and analysis, Tensorleap requires samples to be fetched by index.
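Index-based fetching implies that each encoder should be deterministic: the same `idx` must always yield the same sample, or Tensorleap's per-sample tracking breaks. A toy illustration, using a plain list in place of the subset data (the names here are illustrative, not Tensorleap's API):

```python
# synthetic subset: five records with a file path and a target label
samples = [{"image_fpath": f"img_{i}.png", "target": i % 3} for i in range(5)]

def target_encoder(idx, data):
    # deterministic: the same idx always returns the same record
    return data[idx]["target"]

first = target_encoder(2, samples)
second = target_encoder(2, samples)
```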
Sample code:
```python=
from imageio import imread  # any image-reading function can be used here

def image_input_encoder(idx, subset_response):
    image_fpath = subset_response.data[idx]["image_fpath"]
    img = imread(image_fpath)
    return img
```
#### Ground Truth Encoder Function(s)
Similar to the input encoder functions, there are also ground truth encoder functions correlated to each output of the neural network.
Sample code:
```python=
def ground_truth_encoder(idx, subset_response):
    return subset_response.data[idx]["target"]
```
#### Metadata Encoder Function(s)
Optionally, you can also add metadata functions. These functions return additional data about each sample, which enables querying by each value and detecting related correlations.
Sample code:
```python=
def color_metadata(idx, subset_response):
    color = subset_response.data[idx]["color"]  # black, blue, brown, gray, green, orange, pink, red, violet, white, yellow
    return color

def shape_metadata(idx, subset_response):
    shape = subset_response.data[idx]["shape"]  # long, round, rectangular, square
    return shape
```
### Test
To test the code, the following script uses the functions above as they will be used within the Tensorleap platform.
The script reads the preprocessed data, fetches a sample from the training set and a sample from the validation set, and finally prints the two sample inputs along with their ground truths.
Note - the script is presented here for clarification purposes only, and is not required by Tensorleap.
```python=
train_data, validation_data = preprocessing()
fetch_idx = 0  # or any other index

# test the training set
input_feature_1 = image_input_encoder(fetch_idx, train_data)
ground_truth_1 = ground_truth_encoder(fetch_idx, train_data)

# print the training sample
print(input_feature_1)
print(ground_truth_1)

# test the validation set
input_feature_2 = image_input_encoder(fetch_idx, validation_data)
ground_truth_2 = ground_truth_encoder(fetch_idx, validation_data)

# print the validation sample
print(input_feature_2)
print(ground_truth_2)
```
### Leap Binding Functions
The `leap` object is an instance pre-set globally by Tensorleap's engine.
Its purpose is to represent the dataset by registering all of the functions above.
In the following sample code, we describe how the attributes and functions are bound:
```python
from src.contract.common.enums import DatasetInputType, DatasetOutputType, DatasetMetadataType
leap.set_subset(ratio=1, function=preprocessing, name='ImageClassificationSubset')
leap.set_input(function=image_input_encoder, subset='ImageClassificationSubset', input_type=DatasetInputType.Image, name='Image')
leap.set_ground_truth(function=ground_truth_encoder, subset='ImageClassificationSubset',
ground_truth_type=DatasetOutputType.Classes,
name='ground_truth', labels=['Dog', 'Cat', 'Mouse'], masked_input=None)
leap.set_metadata(function=color_metadata, subset='ImageClassificationSubset', metadata_type=DatasetMetadataType.string,
name='color')
leap.set_metadata(function=shape_metadata, subset='ImageClassificationSubset', metadata_type=DatasetMetadataType.string,
name='shape')
```
### Additional Example
Below is another example of dataset integration.
Sample code:
```python=
from typing import List

import numpy as np
import tensorflow as tf
from keras.datasets import mnist

from src.contract.common.enums import DatasetInputType, DatasetOutputType, DatasetMetadataType
from src.etl.datasetintegration.datasetbinder import SubsetResponse

def subset_subset0() -> List[SubsetResponse]:
    (trainX, trainy), (testX, testy) = mnist.load_data()
    return [SubsetResponse(length=10000, data={"data": trainX, "label": trainy}),
            SubsetResponse(length=10000, data={"data": testX, "label": testy})]

def input_image(idx, samples):
    img = samples.data["data"][idx]
    return np.array(img)[..., np.newaxis]

def ground_truth_num(idx, samples):
    label = samples.data["label"][idx]
    return np.eye(10)[label]

def metadata_label(idx, samples):
    label = samples.data["label"][idx]
    return str(label)

leap.set_subset(1.0, subset_subset0, 'subset0')
leap.set_input(input_image, 'subset0', DatasetInputType.Image, 'image')
leap.set_ground_truth(ground_truth_num, 'subset0', DatasetOutputType.Classes, 'num',
                      labels=['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'], masked_input=None)
leap.set_metadata(metadata_label, 'subset0', DatasetMetadataType.string, 'label')
```