Introduction

In this document, I describe the research I did while trying to integrate MMSegmentation into our framework. I have structured it so that anyone reviewing this task is aware of the limitations and difficulties and does not have to retrace my path.

MMSegmentation

The MMSegmentation library uses Python dictionaries as configs and provides a way to train any of its pre-built models on custom datasets. Usually, you do it in the following way (a sketch of steps 2 and 3 follows the list):

  1. Firstly, you download the necessary configs and checkpoints
  2. Secondly, you adjust the config to your needs by varying certain parameters
  3. Finally, you launch the training, which is fully controlled by your config
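
To make this concrete, here is a minimal sketch of steps 2 and 3 using the Config API from mmcv; the config path and the fields I override are illustrative assumptions, not a verified recipe:

    # A minimal sketch, assuming mmcv-full (pre-2.0) is installed; the config
    # path and field values are illustrative assumptions.
    from mmcv import Config

    # Step 1: a config and checkpoint downloaded from the model zoo.
    cfg = Config.fromfile('configs/pspnet/pspnet_r50-d8_512x1024_40k_cityscapes.py')

    # Step 2: adjust the config by overriding certain parameters.
    cfg.data.train.data_root = 'data/my_dataset'   # hypothetical dataset root
    cfg.model.decode_head.num_classes = 4          # match your label set
    cfg.work_dir = './work_dirs/my_experiment'

    # Step 3: training is fully driven by this config, e.g. by saving it and
    # running `python tools/train.py <config>` from the mmsegmentation repo.
    cfg.dump('my_experiment_config.py')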

All steps are easy to follow in the code here. However, in this tutorial they set up a new dataset and mention that the 'load_annotation' method needs to be overridden, yet they never do this in the code, so I am confused about what they meant and how it actually works. Overall, this notebook gives a good overview of the structure of the framework.
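
For reference, the pattern those tutorials follow is to subclass CustomDataset and register it; the parent class already implements suffix-based annotation loading, which may be why no override appears in the code. A hedged sketch of that pattern (the class name, classes, palette, and suffixes are my assumptions):

    # Sketch of the dataset-registration pattern from the MMSegmentation
    # tutorials (0.x API); names and values are assumptions for illustration.
    from mmseg.datasets.builder import DATASETS
    from mmseg.datasets.custom import CustomDataset

    @DATASETS.register_module()
    class MyTutorialDataset(CustomDataset):
        CLASSES = ('background', 'foreground')
        PALETTE = [[0, 0, 0], [255, 255, 255]]

        def __init__(self, **kwargs):
            # CustomDataset.load_annotations already scans img_dir/ann_dir by
            # suffix, so overriding it is only needed for non-standard layouts.
            super().__init__(img_suffix='.jpg', seg_map_suffix='.png', **kwargs)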

innofw

Our framework also uses configuration as the core concept to control machine learning tasks. However, our configuration files are much closer to the actual code and are written in YAML. Also, we have a way to point to the particular class we want to use, while MMSegmentation relies on a Registry.
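
To illustrate the difference, here is a hedged sketch of the Registry side; the model config is trimmed for illustration and is not a complete, trainable definition:

    # In mmsegmentation (0.x API), the string in `type` is looked up in a
    # Registry, so configs never reference import paths directly.
    from mmseg.models import build_segmentor

    model_cfg = dict(
        type='EncoderDecoder',   # resolved via the SEGMENTORS registry
        backbone=dict(type='ResNetV1c', depth=50),
        decode_head=dict(
            type='PSPHead', in_channels=2048, channels=512, num_classes=19),
    )
    model = build_segmentor(model_cfg)

    # By contrast, an innofw-style YAML config points at the class itself,
    # e.g. something like `_target_: innofw.models.SegmentationModule`
    # (illustrative; the exact key and path depend on our framework).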

Having given an overview of both ideas, I proceed to the limitations of integrating them.

Limitations

Both frameworks are standalone solutions for various machine learning tasks, and their size and totally different approaches make it difficult to get them to work together.

Package Manager

It is not an easy task to install and use this library. I have set up an environment several times, and only a few of those attempts were successful. Overall, to install and use the library, the following steps should be taken:

  1. There are two ways to install mmcv-full:

    • The torch and CUDA versions have to be determined first. Using this information, you retrieve the proper link for pip from here. So, if you change the version of either torch or CUDA, the link has to be changed as well (the sketch after this section shows how the link could be derived programmatically).
    • Another way is to install mim and then install mmcv-full using it (pip install openmim, then mim install mmcv-full). The mim package is used anyway, but the major drawback is that I see no way to automate a mim-based installation the way it is done with poetry or pip.
  2. Then, we need to install mmsegmentation. The preferable way to do it is using pip:

     pip install mmsegmentation

     Another way is to do it from source (refer to this).

In conclusion, installation is not pretty and requires manual work (either retrieving the proper link for pip, or using mim).
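
As a possible mitigation for the link problem, the find-links URL can be derived from the installed torch build. Below is a minimal sketch; the URL pattern follows the mmcv installation docs at the time of writing, so treat it as an assumption to verify:

    # Sketch: derive the pip find-links URL for mmcv-full from the installed
    # torch build. The URL pattern should be checked against the current
    # mmcv installation docs before relying on it.
    import torch

    torch_version = torch.__version__.split('+')[0]   # e.g. '1.12.1'
    cuda_version = torch.version.cuda                 # e.g. '11.3', or None for CPU-only
    cuda_tag = f"cu{cuda_version.replace('.', '')}" if cuda_version else "cpu"

    find_links = (
        f"https://download.openmmlab.com/mmcv/dist/{cuda_tag}/"
        f"torch{torch_version}/index.html"
    )
    print(f"pip install mmcv-full -f {find_links}")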

Dataset

Another limitation is the dataset format used in mmsegmentation. The whole data flow is orchestrated through configuration files, including the dataset class and the image/annotation format. After long research, I found the function build_dataloader, which, according to the documentation, accepts a "PyTorch dataset". However, a problem still arises during training when our Dataset object is supplied. I do not understand why they mention a PyTorch dataset if the training loop actually expects a CustomDataset (source). The problem can be traced as follows:

  1. On line 214, the __getitem__ method returns the result of the prepare_train_img function.
  2. On line 247, the result of the pipeline is returned.
  3. Tracing deeper from line 229 into compose.py:44, it can be concluded that the dataset output is a Python dictionary, which makes it impossible to incorporate any of our datasets without wrapping or changing them (see the sketch after this list).
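
The dict-passing contract in compose.py boils down to the following shape (a paraphrase of the logic at that location, not a verbatim copy):

    # Paraphrase of the transform-composition logic: every transform consumes
    # a dict and returns a dict (or None to drop the sample), so the dataset
    # itself must emit dicts for the pipeline to work.
    def compose(transforms, data: dict):
        for transform in transforms:
            data = transform(data)
            if data is None:
                return None
        return data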

My proposal is to create a wrapper, or a completely separate class, for working with segmentation data specifically for MMSegmentation. I think this will lead to a more stable codebase and make OpenMMLab integrations easier.
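
A hypothetical sketch of such a wrapper, assuming our datasets return (image, mask) pairs; the class name and dict keys are assumptions and would have to match the configured pipeline:

    # Hypothetical adapter: wraps one of our (image, mask) datasets and emits
    # the dict-shaped samples the MMSegmentation pipeline expects.
    from torch.utils.data import Dataset

    class MMSegDatasetAdapter(Dataset):
        def __init__(self, base_dataset, pipeline=None):
            self.base = base_dataset
            self.pipeline = pipeline  # e.g. a composed mmseg transform pipeline

        def __len__(self):
            return len(self.base)

        def __getitem__(self, idx):
            image, mask = self.base[idx]
            sample = dict(img=image, gt_semantic_seg=mask)  # assumed keys
            return self.pipeline(sample) if self.pipeline else sample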

Conclusion

Both frameworks follow their own ideas of managing configurations and building datasets. The only way I see to integrate them is to implement an adapter interface dedicated specifically to the MMSegmentation classes. However, this task requires a clear understanding of segmentation as a process and broad knowledge of the project's entire codebase. Working alone, I would spend a significant amount of time integrating these two solutions.