# Tensorleap Guide

Tensorleap's platform offers unique tools for debugging, observability, and explainability during the development of deep-learning models. To enable these deep analyses, the platform tracks each sample, feature, and layer, and collects many indicators. To begin the integration, the model needs to be exported, along with a dataset and a script that reads the dataset. This guide describes how to convert a model defined in PyTorch or TensorFlow into a Tensorleap-compatible file format, and how to write the script that reads and preprocesses data from your dataset.

## Model Export

A deep-learning model consists of multiple components:

* The layers in the model and their connections (the model's architecture)
* The weight values - the state of the model after training
* A set of loss functions, an optimizer, and a set of metrics

TensorFlow and PyTorch models can be saved to a serialization format for trained models, which stores the model's weights along with details of its architecture and how it was trained. The saved model can then be used in Tensorleap independently of the code that created it. Tensorleap reads this serialization file, loads it, and displays it in the platform. For your convenience, below are a few code one-liners for each framework.

### TensorFlow 2 (Keras) - Save Model

The following command generates a folder with the serialized model data, containing the model's architecture and weights.

```python=
model.save('path/to/location')
```

### TensorFlow 2 (Keras) - H5 format

Keras also supports saving the model's architecture and weights in a single HDF5 file. This is essentially a lightweight alternative to the "Save Model" option described above.

```python=
model.save("my_h5_model.h5")
```

More info can be found here: https://www.tensorflow.org/guide/keras/save_and_serialize

### PyTorch - ONNX format

Tensorleap supports PyTorch, but requires the model to first be exported to the `.onnx` file format in order to read it.
```python=
import torch

# `model` (the trained PyTorch model) and `dummy_input` (an example input
# tensor with the shape the model expects) must be defined beforehand
input_names = ["actual_input_1"] + ["learned_%d" % i for i in range(16)]
output_names = ["output1"]

torch.onnx.export(model, dummy_input, "alexnet.onnx", verbose=True,
                  input_names=input_names, output_names=output_names)
```

More info can be found here: https://pytorch.org/docs/stable/onnx.html

## Dataset Integration

Tensorleap uses a dataset preprocessing script to encode data for the network. The script includes a preprocessing function that prepares the data for fetching into the neural network, an encoding function for each model input that reads a sample and prepares it for the network, and a ground truth encoding function correlated with each model output.

### Preprocessing function

The `preprocessing` function is called once, just before the training/evaluation process. It prepares the training and validation data (`train_data` and `val_data` in the sample code below). In the sample code below, the function reads a `TFRecord` file for each of the training and validation sets, parses every record, and returns the two datasets as lists of samples.
```python=
import tensorflow as tf


def extract_fn(tfrecord):
    # Extract features using the keys set during creation
    features = {
        'image_fpath': tf.io.FixedLenFeature([], tf.string),
        'target': tf.io.FixedLenFeature([], tf.int64)
    }
    # Extract the data record
    sample = tf.io.parse_single_example(tfrecord, features)
    return sample


def preprocessing():
    # arrange the data
    train_path = "Tensorleap/train.tfrecord"
    validation_path = "Tensorleap/validation.tfrecord"
    train_dataset = tf.data.TFRecordDataset([train_path]).map(extract_fn)
    validation_dataset = tf.data.TFRecordDataset([validation_path]).map(extract_fn)
    train_dataset = list(train_dataset.as_numpy_iterator())
    validation_dataset = list(validation_dataset.as_numpy_iterator())
    return train_dataset, validation_dataset
```

### Batch generation functions

During training or evaluation, samples are fetched to the neural network in batches. This section describes functions that are called during batch generation, once for every sample within the batch. For example, a training set of 10K samples results in 10K calls to each function per epoch. Consequently, it is recommended to avoid long-running operations in these functions.

#### Input encoder function(s)

Each input encoder function receives the data (`train_data` / `validation_data`, according to the state) as an argument, as well as `idx`, the index of the sample. For each model input, there should be an encoding function that extracts and generates the input data for a single sample. To facilitate tracking and analysis, Tensorleap requires samples to be fetched by index.

Sample code:

```python=
from skimage.io import imread  # any image-reading library works here


def image_input_encoder(idx, data):
    image_fpath = data[idx]["image_fpath"]
    img = imread(image_fpath)
    return img
```

#### Ground truth encoder function(s)

Similar to the input encoder functions, there are ground truth encoder functions correlated with each output of the neural network.
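Depending on the loss attached to an output, the ground truth encoder may need to transform the raw label; with a categorical cross-entropy loss, for instance, an integer class label is typically one-hot encoded. A minimal sketch, assuming a hypothetical `num_classes` parameter and the `target` field from the preprocessing example:

```python=
def one_hot_ground_truth_encoder(idx, data, num_classes=10):
    # num_classes is a hypothetical parameter; set it to the number of
    # categories in your dataset
    target = int(data[idx]["target"])
    one_hot = [0.0] * num_classes
    one_hot[target] = 1.0
    return one_hot
```

With a sparse loss, returning the integer label directly is sufficient.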
Sample code:

```python=
def ground_truth_encoder(idx, data):
    return data[idx]["target"]
```

### Test

To test the code, the following script uses the functions above as they will be used within the Tensorleap platform. The script reads the preprocessed data, fetches a sample from the training set and a sample from the validation set, and prints the two sample inputs along with their ground truths.

Note - the script is presented here for clarification purposes only, and is not required by Tensorleap.

```python=
train_data, validation_data = preprocessing()
fetch_idx = 0  # or any other index

# test the training set
input_feature_1 = image_input_encoder(fetch_idx, train_data)
ground_truth_1 = ground_truth_encoder(fetch_idx, train_data)

# print the training sample
print(input_feature_1)
print(ground_truth_1)

# test the validation set
input_feature_1 = image_input_encoder(fetch_idx, validation_data)
ground_truth_1 = ground_truth_encoder(fetch_idx, validation_data)

# print the validation sample
print(input_feature_1)
print(ground_truth_1)
```

{%hackmd theme-dark %}
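Since the encoders are called with plain Python lists and an index, they can also be sanity-checked without real files. A minimal sketch, using hypothetical stand-in encoders over an in-memory list shaped like the output of `preprocessing`:

```python=
# Toy stand-in for the list of samples returned by `preprocessing`;
# the field names match the TFRecord features used above
fake_data = [
    {"image_fpath": "img_0.png", "target": 0},
    {"image_fpath": "img_1.png", "target": 1},
]


def toy_input_encoder(idx, data):
    # Stands in for image_input_encoder without touching the filesystem
    return data[idx]["image_fpath"]


def toy_ground_truth_encoder(idx, data):
    return data[idx]["target"]


# Assemble (input, ground truth) pairs the way a batch would be built:
# one call to each encoder per sample index
batch = [(toy_input_encoder(i, fake_data), toy_ground_truth_encoder(i, fake_data))
         for i in range(len(fake_data))]
print(batch)
```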