# Implementation of semantic-segmentation-pytorch
> colab: [Semantic Segmentation Demo](https://colab.research.google.com/drive/1kUZrjMFuDjlrPsNF2l7FioQKFqTxO9oS?authuser=1#scrollTo=Ic3Belq4zmet)
> repo: [CSAILVision/semantic-segmentation-pytorch](https://github.com/CSAILVision/semantic-segmentation-pytorch) <- credit to those folks
> my note: [Segmentation Overview](https://hackmd.io/Bo8ujt3LSOOK_1pfFaHKMg)
[Toc]
## Disclaimer (?)
Hi~ If you are reading this: this is a "helpme" md that tries to document my notes while I adapt the repo for my own project. **I am not one of the authors of the repo**, and **this page is under development**. Any comments and suggestions are welcome~
## Bug
If you are like me, training on a single GPU, you might encounter [this issue](https://github.com/CSAILVision/semantic-segmentation-pytorch/issues/203). The solution is already in [#203](https://github.com/CSAILVision/semantic-segmentation-pytorch/issues/203#issuecomment-562524601):
Add the following to `model.py`, around line 30:
```python=
class SegmentationModule(SegmentationModuleBase):
    def forward(self, feed_dict, *, segSize=None):
        # training
        if type(feed_dict) is list:
            feed_dict = feed_dict[0]
            # move the batch onto the GPU manually (single-GPU workaround)
            if torch.cuda.is_available():
                feed_dict['img_data'] = feed_dict['img_data'].cuda()
                feed_dict['seg_label'] = feed_dict['seg_label'].cuda()
            else:
                raise RuntimeError('Cannot convert torch.FloatTensor into torch.cuda.FloatTensor')
```
## Configuration
The default configuration of the datasets, models, and train/val/test hyperparameters is located in `mit_semseg/config/default.py`; each option is well documented there.
The official custom configurations can be found in the `config` directory.
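For reference, a minimal sketch of how `train.py` combines the defaults with a custom `.yaml` (assuming the `cfg` object the repo exposes from `mit_semseg.config`; the option names below are from my reading of `default.py`, so double-check them there):
```python
# A sketch, not the repo's exact code: load defaults, then override with a .yaml file.
from mit_semseg.config import cfg

cfg.merge_from_file("config/ade20k-resnet50dilated-ppm_deepsup.yaml")

print(cfg.DATASET.num_class)          # 150 for ADE20K
print(cfg.TRAIN.batch_size_per_gpu)   # option names: see mit_semseg/config/default.py
```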
## Loss
Their loss is [NLLLoss](https://pytorch.org/docs/stable/generated/torch.nn.NLLLoss.html). One might switch to cross-entropy loss, or to a Dice-coefficient loss; whether either brings any efficiency or accuracy improvement is unknown to me.
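For context, the criterion is constructed roughly as below; the swap to cross-entropy is sketched only as a comment (an assumption on my side, not something the repo provides), because the decoders apply `log_softmax` before the loss, and `CrossEntropyLoss` expects raw logits instead.
```python
import torch.nn as nn

# What the repo uses (the decoder outputs log-probabilities):
crit = nn.NLLLoss(ignore_index=-1)

# Possible swap (untested assumption): CrossEntropyLoss = log_softmax + NLLLoss,
# so the decoder would have to return raw logits for this to be equivalent.
# crit = nn.CrossEntropyLoss(ignore_index=-1)
```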
## Model
See [here](https://github.com/CSAILVision/semantic-segmentation-pytorch/blob/9aff40de31ee4b21f18514d31e5d6e4ba056924d/mit_semseg/models/models.py#L50); the model definitions are in `mit_semseg/models/models.py`.
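For orientation, this is roughly how the linked colab demo assembles a model from those definitions (the weight paths here are placeholders):
```python
import torch
from mit_semseg.models import ModelBuilder, SegmentationModule

# Encoder and decoder are built separately, then wrapped together with the criterion.
net_encoder = ModelBuilder.build_encoder(
    arch='resnet50dilated', fc_dim=2048,
    weights='path/to/encoder_weights.pth')
net_decoder = ModelBuilder.build_decoder(
    arch='ppm_deepsup', fc_dim=2048, num_class=150,
    weights='path/to/decoder_weights.pth', use_softmax=True)  # use_softmax=True for inference
crit = torch.nn.NLLLoss(ignore_index=-1)
segmentation_module = SegmentationModule(net_encoder, net_decoder, crit)
```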
## Evaluation
Use `eval_multipro.py` to display the IoU of each class in the terminal, and to compute the mean IoU over all classes as well as the pixel accuracy. If visualization is set to `True` (one can set it in the `.yaml` config file), the prediction results (`.png`) will be saved under `./ckpt/the_model_u_trained/result`. You will get something like the following:
<center>
<img
src=https://i.imgur.com/xu8m6p8.png
width=600>
</center>
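For reference, the invocation looks much like the training one (I haven't listed every flag; check the argparser in `eval_multipro.py`):
```shell
python3 eval_multipro.py --gpus 0 --cfg config/ade20k-resnet50dilated-ppm_deepsup.yaml
```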
## Train on your own dataset
### Test run
:::success
Here I assume that you have already followed the steps in the Train section of the repo's readme, e.g. downloaded the dataset, etc.
:::
To make sure everything is in the right place and ready to go, do a test run with
```shell=
python train.py --gpus 1 --cfg config/ade20k-resnet50dilated-ppm_deepsup.yaml
```
If there is an error message saying the GPU memory is not enough, lowering `imgMaxSize` in `config/ade20k-resnet50dilated-ppm_deepsup.yaml` _usually_ solves the problem.
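For example (the values below are just an illustration, not a recommendation; the keys live under `DATASET` in the default config):
```yaml
DATASET:
  # shrink these if the GPU runs out of memory
  imgSizes: (300, 375, 450, 525, 600)
  imgMaxSize: 800
```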
### Dataset Preparation
#### `.json` to Dataset
Here, I use labelme for annotation ([tutorial here](https://github.com/wkentaro/labelme/tree/master/examples/tutorial#tutorial-single-image-example)). To let the model consume the label data, we need to transform the `.json` files into `.png` label images. The file `json_to_dataset.py` converts a single `.json` file into a single-image dataset.
:::info
It can be found in `~/anaconda3/envs/labelme/lib/pythonX.X/site-packages/labelme/cli/json_to_dataset.py` if labelme was installed into a conda env.
Otherwise, one can find it here: `~/anaconda3/lib/pythonX.X/site-packages/labelme/cli/json_to_dataset.py`.
:::
To run it repeatedly over all the `.json` files, I modified it as below:
```python=
import argparse
import base64
import json
import os
import os.path as osp

import imgviz
import PIL.Image

from labelme.logger import logger
from labelme import utils


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("json_file")
    # parser.add_argument("-o", "--out", default=None)
    parser.add_argument("--save", type=str)
    args = parser.parse_args()

    json_file = args.json_file

    # if args.out is None:
    #     out_dir = osp.basename(json_file).replace(".", "_")
    #     out_dir = osp.join(osp.dirname(json_file), out_dir)
    # else:
    #     out_dir = args.out
    # if not osp.exists(out_dir):
    #     os.mkdir(out_dir)

    data = json.load(open(json_file))
    imageData = data.get("imageData")

    if not imageData:
        imagePath = os.path.join(os.path.dirname(json_file), data["imagePath"])
        with open(imagePath, "rb") as f:
            imageData = f.read()
            imageData = base64.b64encode(imageData).decode("utf-8")
    img = utils.img_b64_to_arr(imageData)

    # map label names to integer values; background is fixed to 0
    label_name_to_value = {"_background_": 0}
    for shape in sorted(data["shapes"], key=lambda x: x["label"]):
        label_name = shape["label"]
        if label_name in label_name_to_value:
            label_value = label_name_to_value[label_name]
        else:
            label_value = len(label_name_to_value)
            label_name_to_value[label_name] = label_value
    lbl, _ = utils.shapes_to_label(
        img.shape, data["shapes"], label_name_to_value
    )

    label_names = [None] * (max(label_name_to_value.values()) + 1)
    for name, value in label_name_to_value.items():
        label_names[value] = name

    lbl_viz = imgviz.label2rgb(
        label=lbl, img=imgviz.asgray(img), label_names=label_names, loc="rb"
    )  # no longer saved; kept from the original script

    # Modified here: save only the label image, named after the .json file
    # PIL.Image.fromarray(img).save(osp.join(out_dir, "img.png"))
    utils.lblsave(osp.join(args.save, f"{os.path.basename(args.json_file)[:-5]}.png"), lbl)
    # PIL.Image.fromarray(lbl_viz).save(osp.join(out_dir, "label_viz.png"))
    # with open(osp.join(out_dir, "label_names.txt"), "w") as f:
    #     for lbl_name in label_names:
    #         f.write(lbl_name + "\n")

    logger.info("Saved to: {}".format(args.save))


if __name__ == "__main__":
    main()
```
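Because the modified script replaces the installed `json_to_dataset.py` (the path from the info box above), the `labelme_json_to_dataset` command picks up the change, so a single file can be converted like this (paths are placeholders):
```shell
labelme_json_to_dataset path/to/example.json --save path/to/labels/
```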
Note that you can adjust how the background is labeled (here label 0 is assigned to the background). In the following, I call the above script repeatedly; remember that both the training and validation sets have to be prepared:
```python
import os

if __name__ == "__main__":
    # run once with 'train' and once with 'val'
    mode = 'val'  # or 'train'
    path = "path/to/your_json"   # directory holding the .json files for this split
    dest = "path/to/save/"       # where the converted .png labels go
    dirs = os.listdir(path)
    dirs = [d for d in dirs if d.endswith('.json')]
    dirs = [os.path.join(path, d) for d in dirs]
    for item in dirs:
        # call the (modified) labelme CLI for each .json file
        os.system("labelme_json_to_dataset " + item + " --save " + dest)
```
We still need to create several files before training, but first, let's check the directory structure the repo expects.
#### Directory Structure
```
# under the `data` directory
yourdataset
├── annotations
│   ├── training
│   └── validation
└── images
    ├── training
    └── validation

# helper files (directly under `data`)
training.odgt
validation.odgt
```
If you have your own dataset, split the data and create this directory structure under the repo's `data` directory. The `.odgt` files list the image and annotation paths together with their width and height. The content of `training.odgt`/`validation.odgt` looks like the following (one JSON object per line):
```
{"fpath_img": "path/to/.jpg", "fpath_segm": "path/to/.png", "width": 683, "height": 512}
```
To create these files, I wrote the following (note that I presume you have already put the data under the `images`/`annotations` directories):
```python
import os
import json

import cv2


def odgt(img_path):
    """Build one .odgt record: image path, annotation path, width, height."""
    seg_path = img_path.replace('images', 'annotations')
    seg_path = seg_path.replace('.jpg', '.png')
    if os.path.exists(seg_path):
        img = cv2.imread(img_path)
        h, w, _ = img.shape  # cv2 gives (height, width, channels)
        odgt_dic = {}
        odgt_dic["fpath_img"] = img_path
        odgt_dic["fpath_segm"] = seg_path
        odgt_dic["width"] = w   # width is the second shape entry, not the first
        odgt_dic["height"] = h
        return odgt_dic
    else:
        # the corresponding annotation does not exist
        return None


if __name__ == "__main__":
    modes = ['train', 'val']
    saves = ['metal_training.odgt', 'metal_validation.odgt']  # customized
    for mode, save in zip(modes, saves):
        dir_path = f"your/data/{mode}"
        img_list = [os.path.join(dir_path, img) for img in sorted(os.listdir(dir_path))]
        # expanduser so the '~' in the output path is actually resolved
        out_path = os.path.expanduser(f'~/semantic-segmentation-pytorch/data/{save}')
        with open(out_path, mode='wt', encoding='utf-8') as myodgt:
            for img in img_list:
                a_odgt = odgt(img)
                if a_odgt is not None:
                    myodgt.write(f'{json.dumps(a_odgt)}\n')
```
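A quick sanity check on the result (the repo's dataset code reads the list as one JSON object per line, and this mirrors that parsing):
```python
import json

# hypothetical path; use whatever .odgt file the script above produced
with open('data/metal_training.odgt') as f:
    samples = [json.loads(line) for line in f]
print(len(samples), samples[0])
```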
#### Modify `dataset.py`
Note that the annotation file should be converted to mode `L`. I modified this in `__getitem__` of `TrainDataset` (and in the val and test datasets as well):
```python
# Original
# segm = Image.open(segm_path)
segm = Image.open(segm_path).convert('L')
```
In addition, the class labels produced by `labelme` will not start from 0. I had to change the following lines in `BaseDataset` manually (there are only two classes in my case):
```python=
def segm_transform(self, segm):
    # to tensor: -1 to 149 for the default ADE20K dataset
    # for ours: -1 background, 0 and 1 for the two classes
    # after .convert('L') the label values come out as (0, 38, 75)
    # and we need to map them to (-1, 0, 1)
    segm = np.array(segm).astype(np.int64)  # signed dtype so that -1 is representable
    segm = np.where(segm == 0, -1, segm)
    segm = np.where(segm == 38, 0, segm)
    segm = np.where(segm == 75, 1, segm)
    # print(np.unique(segm))
    # Original
    # segm = torch.from_numpy(segm).long() - 1
    segm = torch.from_numpy(segm).long()
    return segm
```
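The magic values `38` and `75` here are simply whatever grayscale values my two classes end up with after `.convert('L')`; they depend on labelme's colormap, so check them on one of your own converted label images first:
```python
import numpy as np
from PIL import Image

# Print the distinct grayscale values of one converted label image,
# e.g. [ 0 38 75] for background plus two classes (your values may differ).
segm = Image.open('path/to/annotations/training/example.png').convert('L')
print(np.unique(np.array(segm)))
```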
#### Modify Config
Customize the config file for your own case:
```yaml=
DATASET:
  root_dataset: "./data/"
  list_train: "./data/metal_training.odgt"
  list_val: "./data/metal_validation.odgt"
  num_class: 2
  ...
```
### Training
```shell=
python3 train.py --gpus 1 --cfg config/custom.yaml
```
### Loading a pre-trained model
TODO